rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 13:32:57 +00:00

Author	SHA1	Message	Date
Arthur Petukhovsky	47e5bf3bbb	Improve term reject message in walproposer (#8164 ) Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-06-26 15:26:52 +01:00
Heikki Linnakangas	fdadd6a152	Remove primary_is_running (#8162 ) This was a half-finished mechanism to allow a replica to enter hot standby mode sooner, without waiting for a running-xacts record. It had issues, and we are working on a better mechanism to replace it. The control plane might still set the flag in the spec file, but compute_ctl will simply ignore it.	2024-06-26 15:13:03 +03:00
MMeent	fd0b22f5cd	Make sure we can handle temporarily offline PS when we first connect (#8094 ) Fixes https://github.com/neondatabase/neon/issues/7897 ## Problem `shard->delay_us` was potentially uninitialized when we connect to PS, as it wasn't set to a non-0 value until we've first connected to the shard's pageserver. That caused the exponential backoff to use an initial value (multiplier) of 0 for the first connection attempt to that pageserver, thus causing a hot retry loop with connection attempts to the pageserver without significant delay. That in turn caused attempts to reconnect to fail quickly, rather than showing the expected 'wait until pageserver is available' behaviour. ## Summary of changes We initialize shard->delay_us before connection initialization if we notice it is not initialized yet.	2024-06-19 15:05:31 +02:00
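The failure mode described in this commit can be illustrated with a minimal Python sketch. It is not the extension's C code: `ShardState`, `next_delay_us`, `MIN_DELAY_US` and `MAX_DELAY_US` are hypothetical names and values chosen for illustration, and only the field name `delay_us` comes from the commit message.

```python
from dataclasses import dataclass

# Hypothetical bounds, chosen only for illustration.
MIN_DELAY_US = 1_000
MAX_DELAY_US = 1_000_000


@dataclass
class ShardState:
    # Zero models the "not yet initialized" state described in the commit.
    delay_us: int = 0


def next_delay_us(shard: ShardState, initialize_first: bool) -> int:
    """Return the delay to sleep before the next connection attempt,
    then double the stored delay (capped) for the attempt after that."""
    if initialize_first and shard.delay_us == 0:
        # The fix: give the backoff a non-zero starting point.
        shard.delay_us = MIN_DELAY_US
    delay = shard.delay_us
    shard.delay_us = min(shard.delay_us * 2, MAX_DELAY_US)
    return delay


# Without initialization, doubling zero stays zero: a hot retry loop.
buggy = ShardState()
buggy_delays = [next_delay_us(buggy, initialize_first=False) for _ in range(4)]

# With initialization, the delays grow as expected.
fixed = ShardState()
fixed_delays = [next_delay_us(fixed, initialize_first=True) for _ in range(4)]
```

In this sketch the uninitialized state never leaves zero because doubling a zero delay yields zero, which is the hot retry loop the commit describes.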
Arseny Sher	6bb8b1d7c2	Remove dead code from walproposer_pg.c Now that logical walsenders fetch WAL from safekeepers, recovery in walproposer is not needed. Fixes warnings.	2024-06-18 21:12:02 +03:00
Heikki Linnakangas	dc2ab4407f	Fix on-demand SLRU download on standby starting at WAL segment boundary (#8031 ) If a standby is started right after switching to a new WAL segment, the request in the SLRU download request would point to the beginning of the segment (e.g. 0/5000000), while the not-modified-since LSN would point to just after the page header (e.g. 0/5000028). It's effectively the same position, as there cannot be any WAL records in between, but the pageserver rightly errors out on any request where the request LSN < not-modified since LSN. To fix, round down the not-modified since LSN to the beginning of the page like the request LSN. Fixes issue #8030	2024-06-13 00:31:31 +03:00
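The rounding described in this commit can be sketched in Python as follows. This is only an illustration, not the extension's code: `parse_lsn`, `format_lsn` and `round_down_to_page` are hypothetical helpers, and the WAL page size is assumed to be the PostgreSQL default of 8192 bytes.

```python
# Assumption: default PostgreSQL WAL page size (XLOG_BLCKSZ).
XLOG_BLCKSZ = 8192


def parse_lsn(text: str) -> int:
    """Parse an LSN in the usual 'hi/lo' hexadecimal notation into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format an integer LSN back into 'hi/lo' hexadecimal notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def round_down_to_page(lsn: int) -> int:
    """Round an LSN down to the start of the WAL page that contains it."""
    return lsn - (lsn % XLOG_BLCKSZ)


request_lsn = parse_lsn("0/5000000")
not_modified_since = parse_lsn("0/5000028")  # just after the page header

# Before the fix: request LSN < not-modified-since LSN, which the pageserver rejects.
violates_before = request_lsn < not_modified_since

# After rounding down, both LSNs point at the same position.
rounded = round_down_to_page(not_modified_since)
```

With the example values from the commit message, `rounded` equals the request LSN, so the `request LSN < not-modified-since LSN` condition no longer holds.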
Sasha Krassovsky	b7a0c2b614	Add On-demand WAL Download to logicalfuncs (#7960 ) We implemented on-demand WAL download for walsender, but other things that may want to read the WAL from safekeepers don't do that yet. This PR makes it do that by adding the same set of hooks to logicalfuncs. Addresses https://github.com/neondatabase/neon/issues/7959 Also relies on: https://github.com/neondatabase/postgres/pull/438 https://github.com/neondatabase/postgres/pull/437 https://github.com/neondatabase/postgres/pull/436	2024-06-11 17:59:32 -07:00
Heikki Linnakangas	78a59b94f5	Copy editor config for the neon extension from PostgreSQL (#8009 ) This makes IDEs and github diff format the code the same way as PostgreSQL sources, which is the style we try to maintain.	2024-06-11 23:19:18 +03:00
Anastasia Lubennikova	66c6b270f1	Downgrade No response from reading prefetch entry WARNING to LOG	2024-06-06 20:56:19 +01:00
Arseny Sher	e6db8069b0	neon_walreader: check after local read that the segment still exists. Otherwise read might receive zeros/garbage if the file is recycled (renamed) as a future segment.	2024-05-31 12:57:56 +03:00
Konstantin Knizhnik	d61e924103	Fix connect to PS on MacOS/X (#7885 ) ## Problem After [`0e4f182680`], which introduced async connect, Neon is not able to connect to the page server. ## Summary of changes Perform sync connect on MacOS/X --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-27 15:57:57 +03:00
MMeent	0e4f182680	Rework PageStream connection state handling: (#7611 ) * Make PS connection startup use async APIs This allows for improved query cancellation when we start connections * Make PS connections have per-shard connection retry state. Previously they shared global backoff state, which is bad for quickly getting all connections started and/or back online. * Make sure we clean up most connection state on failed connections. Previously, we could technically leak some resources that we'd otherwise clean up. Now, the resources are correctly cleaned up. * pagestore_smgr.c now PANICs on unexpected response message types. Unexpected responses are likely a symptom of having a desynchronized view of the connection state. As a desynchronized connection state can cause corruption, we PANIC, as we don't know what data may have been written to buffers: the only solution is to fail fast & hope we didn't write wrong data. * Catch errors in sync pagestream request handling. Previously, if a query was cancelled after a message was sent to the pageserver, but before the data was received, the backend could forget that it sent the synchronous request, and let others deal with the repercussions. This could then lead to incorrect responses, or errors such as "unexpected response from page server with tag 0x68"	2024-05-23 23:26:42 +02:00
Heikki Linnakangas	37f81289c2	Make 'neon.protocol_version = 2' the default, take two (#7819 ) Once all the computes in production have restarted, we can remove protocol version 1 altogether. See issue #6211. This was done earlier already in commit `0115fe6cb2`, but reverted before it was released to production in commit `bbe730d7ca` because of issue https://github.com/neondatabase/neon/issues/7692. That issue was fixed in commit `22afaea6e1`, so we are ready to change the default again.	2024-05-22 18:24:52 +03:00
Heikki Linnakangas	9217564026	Fix issues with determining request LSN in read replica (#7795 ) Don't set last-written LSN of a page when the record is replayed, only when the page is evicted from cache. For comparison, we don't update the last-written LSN on every page modification on the primary either, only when the page is evicted. Do update the last-written LSN when the page update is skipped in WAL redo, however. In neon_get_request_lsns(), don't be surprised if the last-written LSN is equal to the record being replayed. Use the LSN of the record being replayed as the request LSN in that case. Add a long comment explaining how that can happen. In neon_wallog_page, update last-written LSN also when Shutdown has been requested. We might still fetch and evict pages for a while, after shutdown has been requested, so we better continue to do that correctly. Enable the check that we don't evict a page with zero LSN also in standby, but make it a LOG message instead of PANIC Fixes issue https://github.com/neondatabase/neon/issues/7791	2024-05-22 18:24:21 +03:00
Heikki Linnakangas	3404e76a51	Fix confusion between 1-based Buffer and 0-based index (#7825 ) The code was working correctly, but was incorrectly using Buffer for a 0-based index into the BufferDesc array.	2024-05-22 18:24:21 +03:00
Arseny Sher	d43dcceef9	Minimize hot standby feedback xmins to next_xid. Hot standby feedback xmins can be greater than next_xid due to sparse updates of nextXid on the pageserver (to do fewer writes, it advances next xid by 1024). ProcessStandbyHSFeedback ignores such xids from the future; to fix, minimize received xmin to next_xid. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Arseny Sher	f54c3b96e0	Fix bugs in hot standby feedback propagation and add test for it. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Peter Bendel	a7b84cca5a	Upgrade of pgvector to 0.7.0 (#7726 ) Upgrade pgvector to 0.7.0. This PR is based on Heikki's PR #6753 and just uses pgvector 0.7.0 instead of 0.6.0 I have now done all planned manual tests. The pull request is ready to be reviewed and merged and can be deployed in production together / after swap enablement. See (https://github.com/neondatabase/autoscaling/issues/800) Fixes https://github.com/neondatabase/neon/issues/6516 Fixes https://github.com/neondatabase/neon/issues/7780 ## Documentation input for usage recommendations ### maintenance_work_mem In Neon `maintenance_work_mem` is very small by default (depends on configured RAM for your compute but can be as low as 64 MB). To optimize pgvector index build time you may have to bump it up according to your working set size (size of tuples for vector index creation). You can do so in the current session using `SET maintenance_work_mem='10 GB';` The target value you choose should fit into the memory of your compute size and not exceed 50-60% of available RAM. The value above has been successfully used on a 7CU endpoint. ### max_parallel_maintenance_workers max_parallel_maintenance_workers is also small by default (2). For efficient parallel pgvector index creation you have to bump it up with `SET max_parallel_maintenance_workers = 7` to make use of all the CPUs available, assuming you have configured your endpoint to use 7CU. ## ID input for changelog pgvector extension in Neon has been upgraded from version 0.5.1 to version 0.7.0. 
Please see https://github.com/pgvector/pgvector/ for documentation of new capabilities in pgvector version 0.7.0 If you have existing databases with pgvector 0.5.1 already installed there is a slight difference in behavior in the following corner cases even if you don't run `ALTER EXTENSION UPDATE`: ### L2 distance from NULL::vector For the following script, comparing the NULL::vector to non-null vectors the resulting output changes: ```sql SET enable_seqscan = off; CREATE TABLE t (val vector(3)); INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL); CREATE INDEX ON t USING hnsw (val vector_l2_ops); INSERT INTO t (val) VALUES ('[1,2,4]'); SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector); ``` and now the output is ``` val --------- [1,1,1] [1,2,4] [1,2,3] [0,0,0] (4 rows) ``` For the following script ```sql SET enable_seqscan = off; CREATE TABLE t (val vector(3)); INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL); CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1); INSERT INTO t (val) VALUES ('[1,2,4]'); SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector); ``` the output now is ``` val --------- [0,0,0] [1,2,3] [1,1,1] [1,2,4] (4 rows) ``` ### changed error messages If you provide invalid literals for datatype vector you may get improved/changed error messages, for example: ```sql neondb=> SELECT '[4e38,1]'::vector; ERROR: "4e38" is out of range for type vector LINE 1: SELECT '[4e38,1]'::vector; ^ ``` --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-20 12:07:25 +02:00
Heikki Linnakangas	22afaea6e1	Always use Lsn::MAX as the request LSN in the primary (#7708 ) The new protocol version supports sending two LSNs to the pageserver: request LSN and a "not_modified_since" hint. A primary always wants to read the latest version of each page, so having two values was not strictly necessary, and the old protocol worked fine with just the "not_modified_since" LSN and a flag to request the latest page version. Nevertheless, it seemed like a good idea to set the request LSN to the current insert/flush LSN, because that's logically the page version that the primary wants to read. However, that made the test_gc_aggressive test case flaky. When the primary requests a page with the last inserted or flushed LSN, it's possible that by the time that the pageserver processes the request, more WAL has been generated by other processes in the compute and already digested by the pageserver. Furthermore, if the PITR horizon in the pageserver is set to 0, and GC runs during that window, it's possible that the GC horizon has advanced past the request LSN before the pageserver processes the request. It is still correct to send the latest page version in that case, because the compute either has the page locked so that it cannot have been modified in the primary, or it's a prefetch request, and we will validate the LSNs when the prefetch response is processed and discard it if the page has been modified. But the pageserver doesn't know that and rightly complains. To fix, modify the compute so that the primary always uses Lsn::MAX in the requests. This reverts the primary's behavior to how the protocol version 1 worked. In protocol version 1, there was only one LSN, the "not_modified_since" hint, and a flag was set to read the latest page version, whatever that might be. Requests from computes that are still using protocol version 1 were already mapped to Lsn::MAX in the pageserver, now we do the same with protocol version 2 for primary's requests. 
(I'm a bit sad about losing the information in the pageserver, what the last LSN was at the time that the request was made. We never had it with protocol version 1, but I wanted to make it available for debugging purposes.) Add another field, 'effective_request_lsn', to track what the flush LSN was when the request was made. It's not sent to the pageserver, Lsn::MAX is now used as the request LSN, but it's still needed internally in the compute to track the validity of prefetch requests. Fixes issue https://github.com/neondatabase/neon/issues/7692	2024-05-14 09:32:43 +03:00
Heikki Linnakangas	ba20752b76	Refactor the request LSNs to a separate struct (#7708 ) We had a lot of code that passed around the two LSNs that are associated with each GetPage request. Introduce a new struct to encapsulate them. I'm about to add a third LSN to the struct in the next commit, this is a mechanical refactoring in preparation for that.	2024-05-14 09:32:43 +03:00
Vlad Lazar	bbe730d7ca	Revert protocol version upgrade (#7727 ) ## Problem "John pointed out that the switch to protocol version 2 made test_gc_aggressive test flaky: https://github.com/neondatabase/neon/issues/7692. I tracked it down, and that is indeed an issue. Conditions for hitting the issue: The problem occurs in the primary. GC horizon is set to a very low value, e.g. 0. If the primary is actively writing WAL, and GC runs in the pageserver at the same time that the primary sends a GetPage request, it's possible that the GC advances the GC horizon past the GetPage request's LSN. I'm working on a fix here: https://github.com/neondatabase/neon/pull/7708." - Heikki ## Summary of changes Use protocol version 1 as default.	2024-05-13 13:41:14 +01:00
Alex Chi Z	2682e0254f	Revert "chore(neon_test_utils): restrict installation to superuser" (#7679 ) This reverts commit `1173ee6a7e`. ## Problem It breaks autoscaling tests	2024-05-09 15:15:19 +00:00
Alex Chi Z	1173ee6a7e	chore(neon_test_utils): restrict installation to superuser (#7624 ) The test utils should only be used during tests. Users should not be able to create this extension on their own. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-08 11:53:54 -04:00
Sasha Krassovsky	7dd58e1449	On-demand WAL download for walsender (#6872 ) ## Problem There's allegedly a bug where if we connect a subscriber before WAL is downloaded from the safekeeper, it creates an error. ## Summary of changes Adds support for pausing safekeepers from sending WAL to computes, and then creates a compute and attaches a subscriber while it's in this paused state. Fails to reproduce the issue, but probably a good test to have --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-05-06 10:54:07 -07:00
Heikki Linnakangas	0115fe6cb2	Make 'neon.protocol_version = 2' the default (#7616 ) Once all the computes in production have restarted, we can remove protocol version 1 altogether. See issue #6211.	2024-05-06 14:37:55 +03:00
Heikki Linnakangas	a2a44ea213	Refactor how the request LSNs are tracked in compute (#7377 ) Instead of thinking in terms of 'latest' and 'lsn' of the request, each request has two LSNs: the request LSN and 'not_modified_since' LSN. The request is nominally made at the request LSN, that determines what page version we want to see. But as a hint, we also include 'not_modified_since'. It tells the pageserver that the page has not been modified since that LSN, which allows the pageserver to skip waiting for newer WAL to arrive, and could allow more optimizations in the future. Refactor the internal functions to calculate the request LSN to calculate both LSNs. Sending two LSNs to the pageserver requires using the new protocol version 2. The previous commit added the server support for it, but we still default to the old protocol for compatibility with old pageservers. The 'neon.protocol_version' GUC can be used to use the new protocol. The new protocol addresses one cause of issue #6211, although you can still get the same error if you have a standby that is lagging behind so that the page version it needs is genuinely GC'd away.	2024-04-25 20:45:37 +03:00
Konstantin Knizhnik	ae15acdee7	Fix bug in prefetch cleanup (#7277 ) ## Problem Running test_pageserver_restarts_under_workload in PR #7275 I get the following assertion failure in prefetch: ``` #5 0x00005587220d4bf0 in ExceptionalCondition ( conditionName=0x7fbf24d003c8 "(ring_index) < MyPState->ring_unused && (ring_index) >= MyPState->ring_last", fileName=0x7fbf24d00240 "/home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c", lineNumber=644) at /home/knizhnik/neon.main//vendor/postgres-v16/src/backend/utils/error/assert.c:66 #6 0x00007fbf24cebc9b in prefetch_set_unused (ring_index=1509) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:644 #7 0x00007fbf24cec613 in prefetch_register_buffer (tag=..., force_latest=0x0, force_lsn=0x0) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:891 #8 0x00007fbf24cef21e in neon_prefetch (reln=0x5587233b7388, forknum=MAIN_FORKNUM, blocknum=14110) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:2055 (gdb) p ring_index $1 = 1509 (gdb) p MyPState->ring_unused $2 = 1636 (gdb) p MyPState->ring_last $3 = 1636 ``` ## Summary of changes Check status of `prefetch_wait_for` --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-04-04 13:28:22 +03:00
Anastasia Lubennikova	722f271f6e	Specify caller in 'unexpected response from page server' error (#7272 ) Tiny improvement for log messages to investigate https://github.com/neondatabase/cloud/issues/11559	2024-03-28 15:28:58 +00:00
Konstantin Knizhnik	63b2060aef	Drop connections with all shards involved in prefetch in case of error (#7249 ) ## Problem See https://github.com/neondatabase/cloud/issues/11559 If we have multiple shards, we need to reset connections to all shards involved in prefetch (having active prefetch requests) if connection with any of them is lost. ## Summary of changes In `prefetch_on_ps_disconnect` drop connection to all shards with active page requests. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-03-28 08:16:05 +02:00
Konstantin Knizhnik	35f4c04c9b	Remove Get/SetZenithCurrentClusterSize from Postgres core (#7196 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1711003752072899 ## Summary of changes Move keeping of cluster size to neon extension --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-03-22 13:14:31 -04:00
Alex Chi Z	55c4ef408b	safekeeper: correctly handle signals (#7167 ) errno is not preserved in the signal handler. This pull request fixes it. Maybe related: https://github.com/neondatabase/neon/issues/6969, but does not fix the flaky test problem. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-20 15:22:25 -04:00
Arthur Petukhovsky	ad5efb49ee	Support backpressure for sharding (#7100 ) Add shard_number to PageserverFeedback and parse it on the compute side. When compute receives a new ps_feedback, it calculates min LSNs among feedbacks from all shards, and uses those LSNs for backpressure. Add `test_sharding_backpressure` to verify that backpressure slows down compute to wait for the slowest shard.	2024-03-18 21:54:44 +00:00
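The per-shard minimum described in this commit could look roughly like the Python sketch below. It is an assumption-laden illustration rather than the compute's actual code: the `PageserverFeedback` dataclass, its LSN field names and the `min_feedback` helper are hypothetical, and only `shard_number` and the "min LSNs among feedbacks from all shards" rule come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class PageserverFeedback:
    shard_number: int
    # The three LSN field names below are assumptions made for this sketch.
    last_received_lsn: int
    disk_consistent_lsn: int
    remote_consistent_lsn: int


def min_feedback(feedbacks: Iterable[PageserverFeedback]) -> Optional[Dict[str, int]]:
    """Keep the most recent feedback per shard, then take the minimum of
    each LSN across shards so that the slowest shard drives backpressure."""
    latest: Dict[int, PageserverFeedback] = {}
    for fb in feedbacks:
        latest[fb.shard_number] = fb  # a newer feedback replaces the older one
    if not latest:
        return None
    shards = list(latest.values())
    return {
        "last_received_lsn": min(fb.last_received_lsn for fb in shards),
        "disk_consistent_lsn": min(fb.disk_consistent_lsn for fb in shards),
        "remote_consistent_lsn": min(fb.remote_consistent_lsn for fb in shards),
    }


result = min_feedback([
    PageserverFeedback(0, 300, 200, 100),
    PageserverFeedback(1, 250, 220, 90),
    PageserverFeedback(0, 400, 350, 150),  # shard 0 reports progress
])
```

Here shard 1 lags behind, so its LSNs determine every field of `result` even after shard 0 reports progress.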
Heikki Linnakangas	74d09b78c7	Keep walproposer alive until shutdown checkpoint is safe on safekeepers The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one difference from real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited. As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example. Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core. After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always. In commit `568f91420a`, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again. Some tests wrongly assumed that no WAL could have been written between pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing flush_ep_to_pageserver which first stops the endpoint and then waits till all committed WAL reaches the pageserver. 
In passing extract safekeeper http client to its own module.	2024-03-11 23:29:32 +04:00
Sasha Krassovsky	98723844ee	Don't return from inside PG_TRY (#7095 ) ## Problem Returning from PG_TRY is a bug, and we currently do that ## Summary of changes Make it break and then return false. This should also help stabilize test_bad_connection.py	2024-03-11 18:36:39 +00:00
Alex Chi Z	73a8c97ac8	fix: warnings when compiling neon extensions (#7053 ) proceeding https://github.com/neondatabase/neon/pull/7010, close https://github.com/neondatabase/neon/issues/6188 ## Summary of changes This pull request (should) fix all warnings except `-Wdeclaration-after-statement` in the neon extension compilation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-11 17:49:58 +00:00
Anastasia Lubennikova	86e8c43ddf	Add downgrade scripts for neon extension. (#7065 ) ## Problem When we start compute with newer version of extension (i.e. 1.2) and then rollback the release, downgrading the compute version, next compute start will try to update extension to the latest version available in neon.control (i.e. 1.1). Thus we need to provide downgrade scripts like neon--1.2--1.1.sql. These scripts must revert the changes made by the upgrade scripts in the reverse order. This is necessary to ensure that the next upgrade will work correctly. In general, we need to write upgrade and downgrade scripts to be more robust and add IF EXISTS / CREATE OR REPLACE clauses to all statements (where applicable). ## Summary of changes Adds downgrade scripts. Adds test cases for extension downgrade/upgrade. fixes #7066 This is a follow-up for https://app.incident.io/neondb/incidents/167?tab=follow-ups Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Alex Chi Z <iskyzh@gmail.com> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2024-03-08 20:42:35 +00:00
Alex Chi Z	b036c32262	fix -Wmissing-prototypes for neon extension (#7010 ) ## Problem ref https://github.com/neondatabase/neon/issues/6188 ## Summary of changes This pull request fixes `-Wmissing-prototypes` for the neon extension. Note that (1) the gcc versions in CI and on macOS are different, therefore some of the warnings do not get reported when developing the neon extension locally. (2) the CI env variable `COPT = -Werror` does not get passed into the docker build process, therefore warnings are not treated as errors on CI. `e62baa9704/.github/workflows/build_and_test.yml (L22)` There will be follow-up pull requests on solving other warnings. By the way, I did not figure out the default compile parameters in the CI env, and therefore this pull request is tested by manually adding `-Wmissing-prototypes` into the `COPT`. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-05 10:03:44 -05:00
Konstantin Knizhnik	3eb83a0ebb	Provide approximation of working set using hyper-log-log algorithm in LFC (#6935 ) ## Summary of changes Calculate number of unique page accesses at compute. It can be used to estimate working set size and adjust cache size (shared_buffers or local file cache). Approximation is made using the HyperLogLog algorithm. It is performed by local file cache and so is available only when local file cache is enabled. This calculation doesn't take into account accesses to the pages present in shared buffers, but includes pages available in local file cache. This information can be retrieved using the approximate_working_set_size(reset bool) function from the neon extension. The reset parameter can be used to reset the statistic and so collect unique accesses for a particular interval. Below is an example of estimating working set size after pgbench -c 10 -S -T 100 -s 10: ``` postgres=# select approximate_working_set_size(false); approximate_working_set_size ------------------------------ 19052 (1 row) postgres=# select pg_table_size('pgbench_accounts')/8192; ?column? ---------- 16402 (1 row) ``` --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-29 15:54:58 +02:00
Konstantin Knizhnik	e895644555	Show LFC statistic in EXPLAIN (#6851 ) ## Problem LFC has high impact on Neon application performance but there is no way for user to check efficiency of its usage ## Summary of changes Show LFC statistic in EXPLAIN ANALYZE ## Description Local file cache (LFC) A layer of caching that stores frequently accessed data from the storage layer in the local memory of the Neon compute instance. This cache helps to reduce latency and improve query performance by minimizing the need to fetch data from the storage layer repeatedly. Externalization of LFC in explain output The EXPLAIN ANALYZE output is extended to display important counts for local file cache (LFC) hits and misses. This works both for EXPLAIN text and json output. File cache: hits Whenever the Postgres backend retrieves a page/block from SMGR that is not found in shared buffers but is already found in the LFC, this counter is incremented. File cache: misses Whenever the Postgres backend retrieves a page/block from SMGR that is not found in shared buffers and also not in the LFC, so that the page is retrieved from Neon storage (page server), this counter is incremented. 
Example (for explain text output) ```sql explain (analyze,buffers,prefetch,filecache) select count(*) from pgbench_accounts; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=214486.94..214486.95 rows=1 width=8) (actual time=5195.378..5196.034 rows=1 loops=1) Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Gather (cost=214486.73..214486.94 rows=2 width=8) (actual time=5195.366..5196.025 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Partial Aggregate (cost=213486.73..213486.74 rows=1 width=8) (actual time=5187.670..5187.670 rows=1 loops=3) Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Parallel Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..203003.02 rows=4193481 width=0) (actual time=0.574..4928.995 rows=3333333 loops=3) Heap Fetches: 3675286 Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 ``` The json output uses the following keys and provides integer values for those keys: ``` ... "File Cache Hits": 141826, "File Cache Misses": 1865 ... ``` --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-27 14:45:54 +02:00
Bodobolero	75baf83fce	externalize statistics on LFC cache usage (#6906 ) ## Problem Customers should be able to determine the size of their workload's working set to right size their compute. Since Neon uses Local file cache (LFC) instead of shared buffers on bigger compute nodes to cache pages we need to externalize a means to determine LFC hit ratio in addition to shared buffer hit ratio. Currently the following end user documentation `fb7cd3af0e/content/docs/manage/endpoints.md (L137)` is wrong because it describes how to right size a compute node based on shared buffer hit ratio. Note that the existing functionality in extension "neon" is NOT available to end users but only to superuser / cloud_admin. ## Summary of changes - externalize functions and views in neon extension to end users - introduce a new view `NEON_STAT_FILE_CACHE` with the following DDL ```sql CREATE OR REPLACE VIEW NEON_STAT_FILE_CACHE AS WITH lfc_stats AS ( SELECT stat_name, count FROM neon_get_lfc_stats() AS t(stat_name text, count bigint) ), lfc_values AS ( SELECT MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE NULL END) AS file_cache_misses, MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE NULL END) AS file_cache_hits, MAX(CASE WHEN stat_name = 'file_cache_used' THEN count ELSE NULL END) AS file_cache_used, MAX(CASE WHEN stat_name = 'file_cache_writes' THEN count ELSE NULL END) AS file_cache_writes, -- Calculate the file_cache_hit_ratio within the same CTE for simplicity CASE WHEN MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE 0 END) + MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END) = 0 THEN NULL ELSE ROUND((MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END)::DECIMAL / (MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END) + MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE 0 END))) * 100, 2) END AS file_cache_hit_ratio FROM lfc_stats ) SELECT file_cache_misses, file_cache_hits, 
file_cache_used, file_cache_writes, file_cache_hit_ratio from lfc_values; ``` This view can be used by an end user as follows: ```sql CREATE EXTENSION NEON; SELECT * FROM neon.NEON_STAT_FILE_CACHE; ``` The output looks like the following: ``` select * from NEON_STAT_FILE_CACHE; file_cache_misses \| file_cache_hits \| file_cache_used \| file_cache_writes \| file_cache_hit_ratio -------------------+-----------------+-----------------+-------------------+---------------------- 2133643 \| 108999742 \| 607 \| 10767410 \| 98.08 (1 row) ``` ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-02-26 16:06:00 +00:00
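The hit-ratio arithmetic in the view above can be sketched outside SQL. This is only an illustration of the formula, not code from the repository: the function name `file_cache_hit_ratio` is hypothetical, and it uses `f64` where the view uses `DECIMAL`, so rounding could differ in rare edge cases.

```rust
/// Hit ratio in percent, rounded to two decimals, mirroring the view's
/// `ROUND(hits / (hits + misses) * 100, 2)` expression.
/// Returns `None` when there were no lookups at all (the view yields NULL),
/// and also if `hits + misses` would overflow a `u64`.
fn file_cache_hit_ratio(hits: u64, misses: u64) -> Option<f64> {
    let total = hits.checked_add(misses)?;
    if total == 0 {
        return None;
    }
    let pct = hits as f64 / total as f64 * 100.0;
    // Round to two decimal places.
    Some((pct * 100.0).round() / 100.0)
}
```

With the counters from the sample output (108999742 hits, 2133643 misses) this evaluates to 98.08, matching the `file_cache_hit_ratio` column.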
Anastasia Lubennikova	a12e4261a3	Add neon.primary_is_running GUC. (#6705 ) We set it for a Neon replica if the primary is running. Postgres uses this GUC at startup to determine whether the replica should wait for RUNNING_XACTS from the primary or not. Corresponding cloud PR is https://github.com/neondatabase/cloud/pull/10183 * Add a test for hot-standby replica startup. * Extract oldest_running_xid from XlRunningXacts WAL records. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-23 13:56:41 +00:00
Arseny Sher	5bcae3a86e	Drop LR slots if too many .snap files are found. PR #6655 turned out not to be enough to prevent .snap file bloat; some subscribers just don't ack the flushed position, thus never advancing the slot. Other bloating scenarios are probably also possible, so add a more direct restriction: drop all slots if too many .snap files have been discovered.	2024-02-23 01:12:49 +04:00
Tristan Partin	76b92e3389	Fix multithreaded postmaster on macOS curl_global_init() with an IPv6 enabled curl build on macOS will cause the calling program to become multithreaded. Unfortunately for shared_preload_libraries, that means the postmaster becomes multithreaded, which CANNOT happen. There are checks in Postgres to make sure that this is not the case.	2024-02-21 13:22:30 -06:00
Shayan Hosseini	fff2468aa2	Add resource consume test funcs (#6747 ) ## Problem Building on #5875 to add handy test functions for autoscaling. Resolves #5609 ## Summary of changes This PR makes the following changes to #5875: - Enable `neon_test_utils` extension in the compute node docker image, so we could use it in the e2e tests (as discussed with @kelvich). - Removed test functions related to disk as we don't use them for autoscaling. - Fix the warning with printf-ing unsigned long variables. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-14 18:45:05 +00:00
Konstantin Knizhnik	b6e070bf85	Do not perform fast exit for catalog pages in redo filter (#6730 ) ## Problem See https://github.com/neondatabase/neon/issues/6674 The current implementation of `neon_redo_read_buffer_filter` performs a fast exit for catalog pages: ``` /* * Out of an abundance of caution, we always run redo on shared catalogs, * regardless of whether the block is stored in shared buffers. See also * this function's top comment. */ if (!OidIsValid(NInfoGetDbOid(rinfo))) return false; ``` As a result, the last written LSN and the relation size for the FSM fork are not correctly updated for catalog relations. ## Summary of changes Do not perform the fast-path return for catalog relations. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-13 20:41:17 +02:00
Arthur Petukhovsky	4be2223a4c	Discrete event simulation for safekeepers (#5804 ) This PR contains the first version of a [FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc) simulation testing for safekeeper and walproposer. ### desim This is a core "framework" for running deterministic simulations. It operates on threads, which makes it possible to test synchronous code (like walproposer). `libs/desim/src/executor.rs` contains the implementation of deterministic thread execution. This is achieved by blocking all threads and, at each step, allowing only a single thread to make progress. All of the executor's threads are blocked using the `yield_me(after_ms)` function. This function is called when a thread wants to sleep or wait for an external notification (like blocking on a channel until it has a ready message). `libs/desim/src/chan.rs` contains the implementation of a channel (a basic sync primitive). It has unlimited capacity and any thread can push messages to it or read messages from it. `libs/desim/src/network.rs` has a very naive implementation of a network (only reliable TCP-like connections are supported for now) that can have arbitrary delays for each packet and failure injections for breaking connections with some probability. `libs/desim/src/world.rs` ties everything together, providing a concept of virtual nodes that can have network connections between them. ### walproposer_sim Has everything needed to run walproposer and safekeepers in a simulation. `safekeeper.rs` reimplements all the necessary parts of `receive_wal.rs`, `send_wal.rs` and `timelines_global_map.rs`. `walproposer_api.rs` implements all walproposer callbacks on top of the simulation library. `simulation.rs` defines a schedule – a set of events like `restart <sk>` or `write_wal` that should happen at time `<ts>`. It also has code to spawn walproposer/safekeeper threads and provide config to them. 
### tests `simple_test.rs` has tests that just start walproposer and 3 safekeepers together in a simulation, and check that they do not crash right away. `misc_test.rs` has tests checking more advanced simulation cases, like crashing or restarting threads, testing memory deallocation, etc. `random_test.rs` is the main test; it checks thousands of random seeds (schedules) for correctness. It roughly corresponds to running a real python integration test in an environment with a very unstable network and CPU, but in a deterministic way (each seed results in the same execution log) and much faster. Closes #547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-02-12 20:29:57 +00:00
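The core idea described in the desim commit above (only one task makes a step at a time, sleeps go through a virtual clock, and a seed fixes the whole execution log) can be illustrated with a much simpler toy. The sketch below is not the desim code: it is single-threaded rather than thread-based, and `Lcg`, `simulate`, and the 0..100 ms delay range are all assumptions made for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Tiny linear congruential generator so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        // Multiplier and increment of Knuth's MMIX generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Runs `tasks` hypothetical tasks for `steps` steps under a virtual clock.
/// Exactly one task is "running" at any moment; the rest wait in the timer
/// queue. Returns the execution log as (virtual time in ms, task id) pairs.
fn simulate(seed: u64, tasks: u32, steps: usize) -> Vec<(u64, u32)> {
    let mut rng = Lcg(seed);
    let mut seq: u64 = 0;
    // Min-heap ordered by (wake-up time, insertion sequence, task id).
    // The sequence number breaks ties deterministically.
    let mut timers = BinaryHeap::new();
    for id in 0..tasks {
        timers.push(Reverse((0u64, seq, id)));
        seq += 1;
    }

    let mut log = Vec::new();
    for _ in 0..steps {
        // Wake the task with the earliest deadline and advance the clock to it.
        let Some(Reverse((now, _, id))) = timers.pop() else { break };
        log.push((now, id));
        // Stand-in for a `yield_me(after_ms)` call: the task goes back to
        // sleep for a seeded pseudo-random delay instead of a real sleep.
        let after_ms = rng.next() % 100;
        timers.push(Reverse((now + after_ms, seq, id)));
        seq += 1;
    }
    log
}
```

Because every source of nondeterminism (delays and tie-breaking) is derived from the seed, calling `simulate` twice with the same arguments yields the same log, which is the property that lets a failing seed be replayed.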
Konstantin Knizhnik	529a79d263	Increment generation when LFC is disabled by assigning 0 to neon.file_cache_size_limit (#6692 ) ## Problem test_lfc_resize sometimes failed with an assertion failure when reacquiring the lock in a write operation: ``` if (lfc_ctl->generation == generation) { Assert(LFC_ENABLED()); ``` ## Summary of changes Increment generation when 0 is assigned to neon.file_cache_size_limit ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-09 08:14:41 +02:00
Konstantin Knizhnik	43eae17f0d	Drop unused replication slots (#6655 ) ## Problem See #6626 If there is an inactive replication slot, Postgres will not be able to shrink WAL and delete unused snapshots. If some other active subscription is present, then snapshots created every 15 seconds will overflow AUX_DIR. Setting `max_slot_wal_keep_size` doesn't solve the problem, because even a small WAL segment will be enough to overflow AUX_DIR if there is no other activity on the system. ## Summary of changes If there are active subscriptions and some logical replication slots are not used during the `neon.logical_replication_max_time_lag` interval, then the unused slots are dropped. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-08 17:31:15 +02:00
Heikki Linnakangas	df7bee7cfa	Fix compilation with recent glibc headers with close_range(2). I was getting an error: /home/heikki/git-sandbox/neon//pgxn/neon_walredo/walredoproc.c:161:5: error: conflicting types for ‘close_range’; have ‘int(unsigned int, unsigned int, unsigned int)’ 161 \| int close_range(unsigned int start_fd, unsigned int count, unsigned int flags) { \| ^~~~~~~~~~~ In file included from /usr/include/x86_64-linux-gnu/bits/sigstksz.h:24, from /usr/include/signal.h:328, from /home/heikki/git-sandbox/neon//pgxn/neon_walredo/walredoproc.c:50: /usr/include/unistd.h:1208:12: note: previous declaration of ‘close_range’ with type ‘int(unsigned int, unsigned int, int)’ 1208 \| extern int close_range (unsigned int __fd, unsigned int __max_fd, \| ^~~~~~~~~~~ The discrepancy is in the 3rd argument. Apparently in the glibc wrapper it's signed. As a quick fix, rename our close_range() function, the one that calls syscall() directly, to avoid the clash with the glibc wrapper. In the long term, an autoconf test would be nice, and some equivalent on macOS, see issue #6580.	2024-02-05 11:50:45 +02:00
Vadim Kharitonov	7e8529bec1	Revert "Update pgvector to v0.6.0, third attempt" (#6610 ) The issue is still unsolved because of shmem size in VMs. Need to figure it out before applying this patch. For more details: ``` ERROR: could not resize shared memory segment "/PostgreSQL.2892504480" to 16774205952 bytes: No space left on device ``` As an example, the same issue in community pgvector/pgvector#453.	2024-02-04 22:27:07 +00:00
Heikki Linnakangas	647b85fc15	Update pgvector to v0.6.0, third attempt This includes a compatibility patch that is needed because pgvector now skips WAL-logging during the index build, and WAL-logs the index only in one go at the end. That's how GIN, GiST and SP-GIST index builds work in core PostgreSQL too, but we need some Neon-specific calls to mark the beginning and end of those build phases. pgvector is the first index AM that does that with parallel workers, so I had to modify those functions in the Neon extension to be aware of parallel workers. Only the leader needs to create the underlying file and perform the WAL-logging. (In principle, the parallel workers could participate in the WAL-logging too, but pgvector doesn't do that. This will need some further work if that changes). The previous attempt at this (#6592) missed that parallel workers needed those changes, and segfaulted in parallel build that spilled to disk. Testing ------- We don't have a place for regression tests of extensions at the moment. I tested this manually with the following script: ``` CREATE EXTENSION IF NOT EXISTS vector; DROP TABLE IF EXISTS tst; CREATE TABLE tst (i serial, v vector(3)); INSERT INTO tst (v) SELECT ARRAY[random(), random(), random()] FROM generate_series(1, 15000) g; -- Serial build, in memory ALTER TABLE tst SET (parallel_workers=0); SET maintenance_work_mem='50 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); -- Test that the index works. (The table contents are random, and the -- search is approximate anyway, so we cannot check the exact values. 
-- For now, just eyeball that they look reasonable) set enable_seqscan=off; explain SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Serial build, spills to disk ALTER TABLE tst SET (parallel_workers=0); SET maintenance_work_mem='5 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Parallel build, in memory ALTER TABLE tst SET (parallel_workers=4); SET maintenance_work_mem='50 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Parallel build, spills to disk ALTER TABLE tst SET (parallel_workers=4); SET maintenance_work_mem='5 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; ```	2024-02-03 09:19:37 +02:00


185 Commits