rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-27 10:00:38 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	ae15acdee7	Fix bug in prefetch cleanup (#7277) ## Problem Running test_pageserver_restarts_under_workload in PR #7275 I get the following assertion failure in prefetch: ``` #5 0x00005587220d4bf0 in ExceptionalCondition ( conditionName=0x7fbf24d003c8 "(ring_index) < MyPState->ring_unused && (ring_index) >= MyPState->ring_last", fileName=0x7fbf24d00240 "/home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c", lineNumber=644) at /home/knizhnik/neon.main//vendor/postgres-v16/src/backend/utils/error/assert.c:66 #6 0x00007fbf24cebc9b in prefetch_set_unused (ring_index=1509) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:644 #7 0x00007fbf24cec613 in prefetch_register_buffer (tag=..., force_latest=0x0, force_lsn=0x0) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:891 #8 0x00007fbf24cef21e in neon_prefetch (reln=0x5587233b7388, forknum=MAIN_FORKNUM, blocknum=14110) at /home/knizhnik/neon.main//pgxn/neon/pagestore_smgr.c:2055 (gdb) p ring_index $1 = 1509 (gdb) p MyPState->ring_unused $2 = 1636 (gdb) p MyPState->ring_last $3 = 1636 ``` ## Summary of changes Check status of `prefetch_wait_for` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-04-04 13:28:22 +03:00
Anastasia Lubennikova	722f271f6e	Specify caller in 'unexpected response from page server' error (#7272 ) Tiny improvement for log messages to investigate https://github.com/neondatabase/cloud/issues/11559	2024-03-28 15:28:58 +00:00
Konstantin Knizhnik	63b2060aef	Drop connections with all shards involved in prefetch in case of error (#7249) ## Problem See https://github.com/neondatabase/cloud/issues/11559 If we have multiple shards, we need to reset connections to all shards involved in prefetch (having active prefetch requests) if the connection with any of them is lost. ## Summary of changes In `prefetch_on_ps_disconnect` drop connection to all shards with active page requests. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-03-28 08:16:05 +02:00
Konstantin Knizhnik	35f4c04c9b	Remove Get/SetZenithCurrentClusterSize from Postgres core (#7196 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1711003752072899 ## Summary of changes Move keeping of cluster size to neon extension --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-03-22 13:14:31 -04:00
Alex Chi Z	55c4ef408b	safekeeper: correctly handle signals (#7167 ) errno is not preserved in the signal handler. This pull request fixes it. Maybe related: https://github.com/neondatabase/neon/issues/6969, but does not fix the flaky test problem. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-20 15:22:25 -04:00
Arthur Petukhovsky	ad5efb49ee	Support backpressure for sharding (#7100 ) Add shard_number to PageserverFeedback and parse it on the compute side. When compute receives a new ps_feedback, it calculates min LSNs among feedbacks from all shards, and uses those LSNs for backpressure. Add `test_sharding_backpressure` to verify that backpressure slows down compute to wait for the slowest shard.	2024-03-18 21:54:44 +00:00
Heikki Linnakangas	74d09b78c7	Keep walproposer alive until shutdown checkpoint is safe on safekeepers The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one difference from real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited. As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example. Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core. After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always. In commit `568f91420a`, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again. Some tests wrongly assumed that no WAL could have been written between pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing flush_ep_to_pageserver which first stops the endpoint and then waits till all committed WAL reaches the pageserver. 
In passing extract safekeeper http client to its own module.	2024-03-11 23:29:32 +04:00
Sasha Krassovsky	98723844ee	Don't return from inside PG_TRY (#7095 ) ## Problem Returning from PG_TRY is a bug, and we currently do that ## Summary of changes Make it break and then return false. This should also help stabilize test_bad_connection.py	2024-03-11 18:36:39 +00:00
Alex Chi Z	73a8c97ac8	fix: warnings when compiling neon extensions (#7053 ) proceeding https://github.com/neondatabase/neon/pull/7010, close https://github.com/neondatabase/neon/issues/6188 ## Summary of changes This pull request (should) fix all warnings except `-Wdeclaration-after-statement` in the neon extension compilation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-11 17:49:58 +00:00
Anastasia Lubennikova	86e8c43ddf	Add downgrade scripts for neon extension. (#7065 ) ## Problem When we start compute with newer version of extension (i.e. 1.2) and then rollback the release, downgrading the compute version, next compute start will try to update extension to the latest version available in neon.control (i.e. 1.1). Thus we need to provide downgrade scripts like neon--1.2--1.1.sql These scripts must revert the changes made by the upgrade scripts in the reverse order. This is necessary to ensure that the next upgrade will work correctly. In general, we need to write upgrade and downgrade scripts to be more robust and add IF EXISTS / CREATE OR REPLACE clauses to all statements (where applicable). ## Summary of changes Adds downgrade scripts. Adds test cases for extension downgrade/upgrade. fixes #7066 This is a follow-up for https://app.incident.io/neondb/incidents/167?tab=follow-ups Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Alex Chi Z <iskyzh@gmail.com> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2024-03-08 20:42:35 +00:00
Alex Chi Z	b036c32262	fix -Wmissing-prototypes for neon extension (#7010) ## Problem ref https://github.com/neondatabase/neon/issues/6188 ## Summary of changes This pull request fixes `-Wmissing-prototypes` for the neon extension. Note that (1) the gcc versions in CI and on macOS differ, therefore some of the warnings are not reported when developing the neon extension locally. (2) the CI env variable `COPT = -Werror` does not get passed into the docker build process, therefore warnings are not treated as errors on CI. `e62baa9704/.github/workflows/build_and_test.yml (L22)` There will be follow-up pull requests on solving other warnings. By the way, I did not figure out the default compile parameters in the CI env, and therefore this pull request is tested by manually adding `-Wmissing-prototypes` into the `COPT`. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-05 10:03:44 -05:00
Konstantin Knizhnik	3eb83a0ebb	Provide approximation of working set using HyperLogLog algorithm in LFC (#6935) ## Summary of changes Calculate the number of unique page accesses at compute. It can be used to estimate working set size and adjust cache size (shared_buffers or local file cache). The approximation is made using the HyperLogLog algorithm. It is performed by the local file cache and so is available only when the local file cache is enabled. This calculation doesn't take into account accesses to pages present in shared buffers, but includes pages available in the local file cache. This information can be retrieved using the approximate_working_set_size(reset bool) function from the neon extension. The reset parameter can be used to reset the statistic and so collect unique accesses for a particular interval. Below is an example of estimating working set size after pgbench -c 10 -S -T 100 -s 10: ``` postgres=# select approximate_working_set_size(false); approximate_working_set_size ------------------------------ 19052 (1 row) postgres=# select pg_table_size('pgbench_accounts')/8192; ?column? ---------- 16402 (1 row) ``` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-29 15:54:58 +02:00
Konstantin Knizhnik	e895644555	Show LFC statistic in EXPLAIN (#6851) ## Problem LFC has high impact on Neon application performance but there is no way for a user to check the efficiency of its usage ## Summary of changes Show LFC statistic in EXPLAIN ANALYZE ## Description Local file cache (LFC): a layer of caching that stores frequently accessed data from the storage layer in the local memory of the Neon compute instance. This cache helps to reduce latency and improve query performance by minimizing the need to fetch data from the storage layer repeatedly. Externalization of LFC in explain output: the EXPLAIN ANALYZE output is extended to display important counts for local file cache (LFC) hits and misses. This works for both EXPLAIN text and json output. File cache: hits. This counter is incremented whenever the Postgres backend retrieves a page/block from SMGR that is not found in shared buffers but is already present in the LFC. File cache: misses. This counter is incremented whenever the Postgres backend retrieves a page/block from SMGR that is found neither in shared buffers nor in the LFC, so the page is retrieved from Neon storage (page server). 
Example (for explain text output) ```sql explain (analyze,buffers,prefetch,filecache) select count(*) from pgbench_accounts; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=214486.94..214486.95 rows=1 width=8) (actual time=5195.378..5196.034 rows=1 loops=1) Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Gather (cost=214486.73..214486.94 rows=2 width=8) (actual time=5195.366..5196.025 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Partial Aggregate (cost=213486.73..213486.74 rows=1 width=8) (actual time=5187.670..5187.670 rows=1 loops=3) Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 -> Parallel Index Only Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.43..203003.02 rows=4193481 width=0) (actual time=0.574..4928.995 rows=3333333 loops=3) Heap Fetches: 3675286 Buffers: shared hit=178875 read=143691 dirtied=128597 written=127346 Prefetch: hits=0 misses=1865 expired=0 duplicates=0 File cache: hits=141826 misses=1865 ``` The json output uses the following keys and provides integer values for those keys: ``` ... "File Cache Hits": 141826, "File Cache Misses": 1865 ... ``` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? 
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-27 14:45:54 +02:00
Bodobolero	75baf83fce	externalize statistics on LFC cache usage (#6906 ) ## Problem Customers should be able to determine the size of their workload's working set to right size their compute. Since Neon uses Local file cache (LFC) instead of shared buffers on bigger compute nodes to cache pages we need to externalize a means to determine LFC hit ratio in addition to shared buffer hit ratio. Currently the following end user documentation `fb7cd3af0e/content/docs/manage/endpoints.md (L137)` is wrong because it describes how to right size a compute node based on shared buffer hit ratio. Note that the existing functionality in extension "neon" is NOT available to end users but only to superuser / cloud_admin. ## Summary of changes - externalize functions and views in neon extension to end users - introduce a new view `NEON_STAT_FILE_CACHE` with the following DDL ```sql CREATE OR REPLACE VIEW NEON_STAT_FILE_CACHE AS WITH lfc_stats AS ( SELECT stat_name, count FROM neon_get_lfc_stats() AS t(stat_name text, count bigint) ), lfc_values AS ( SELECT MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE NULL END) AS file_cache_misses, MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE NULL END) AS file_cache_hits, MAX(CASE WHEN stat_name = 'file_cache_used' THEN count ELSE NULL END) AS file_cache_used, MAX(CASE WHEN stat_name = 'file_cache_writes' THEN count ELSE NULL END) AS file_cache_writes, -- Calculate the file_cache_hit_ratio within the same CTE for simplicity CASE WHEN MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE 0 END) + MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END) = 0 THEN NULL ELSE ROUND((MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END)::DECIMAL / (MAX(CASE WHEN stat_name = 'file_cache_hits' THEN count ELSE 0 END) + MAX(CASE WHEN stat_name = 'file_cache_misses' THEN count ELSE 0 END))) * 100, 2) END AS file_cache_hit_ratio FROM lfc_stats ) SELECT file_cache_misses, file_cache_hits, 
file_cache_used, file_cache_writes, file_cache_hit_ratio from lfc_values; ``` This view can be used by an end user as follows: ```sql CREATE EXTENSION NEON; SELECT * from neon.NEON_STAT_FILE_CACHE; ``` The output looks like the following: ``` select * from NEON_STAT_FILE_CACHE; file_cache_misses | file_cache_hits | file_cache_used | file_cache_writes | file_cache_hit_ratio -------------------+-----------------+-----------------+-------------------+---------------------- 2133643 | 108999742 | 607 | 10767410 | 98.08 (1 row) ``` ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-02-26 16:06:00 +00:00
Anastasia Lubennikova	a12e4261a3	Add neon.primary_is_running GUC. (#6705) We set it for a neon replica if the primary is running. Postgres uses this GUC at startup to determine whether the replica should wait for RUNNING_XACTS from the primary or not. Corresponding cloud PR is https://github.com/neondatabase/cloud/pull/10183 * Add test for hot-standby replica startup. * Extract oldest_running_xid from XlRunningXacts WAL records. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-23 13:56:41 +00:00
Arseny Sher	5bcae3a86e	Drop LR slots if too many .snap files are found. PR #6655 turned out to be not enough to prevent .snap files bloat; some subscribers just don't ack flushed position, thus never advancing the slot. Probably other bloating scenarios are also possible, so add a more direct restriction -- drop all slots if too many .snap files have been discovered.	2024-02-23 01:12:49 +04:00
Tristan Partin	76b92e3389	Fix multithreaded postmaster on macOS curl_global_init() with an IPv6 enabled curl build on macOS will cause the calling program to become multithreaded. Unfortunately for shared_preload_libraries, that means the postmaster becomes multithreaded, which CANNOT happen. There are checks in Postgres to make sure that this is not the case.	2024-02-21 13:22:30 -06:00
Shayan Hosseini	fff2468aa2	Add resource consume test funcs (#6747 ) ## Problem Building on #5875 to add handy test functions for autoscaling. Resolves #5609 ## Summary of changes This PR makes the following changes to #5875: - Enable `neon_test_utils` extension in the compute node docker image, so we could use it in the e2e tests (as discussed with @kelvich). - Removed test functions related to disk as we don't use them for autoscaling. - Fix the warning with printf-ing unsigned long variables. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-14 18:45:05 +00:00
Konstantin Knizhnik	b6e070bf85	Do not perform fast exit for catalog pages in redo filter (#6730) ## Problem See https://github.com/neondatabase/neon/issues/6674 Current implementation of `neon_redo_read_buffer_filter` performs fast exit for catalog pages: ``` /* * Out of an abundance of caution, we always run redo on shared catalogs, * regardless of whether the block is stored in shared buffers. See also * this function's top comment. */ if (!OidIsValid(NInfoGetDbOid(rinfo))) return false; ``` As a result, the last written LSN and relation size for the FSM fork are not correctly updated for catalog relations. ## Summary of changes Do not perform fast path return for catalog relations. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-13 20:41:17 +02:00
Arthur Petukhovsky	4be2223a4c	Discrete event simulation for safekeepers (#5804) This PR contains the first version of a [FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc) simulation testing for safekeeper and walproposer. ### desim This is a core "framework" for running deterministic simulation. It operates on threads, allowing to test synchronous code (like walproposer). `libs/desim/src/executor.rs` contains implementation of a deterministic thread execution. This is achieved by blocking all threads, and each time allowing only a single thread to make an execution step. All executor's threads are blocked using `yield_me(after_ms)` function. This function is called when a thread wants to sleep or wait for an external notification (like blocking on a channel until it has a ready message). `libs/desim/src/chan.rs` contains implementation of a channel (basic sync primitive). It has unlimited capacity and any thread can push or read messages to/from it. `libs/desim/src/network.rs` has a very naive implementation of a network (only reliable TCP-like connections are supported for now), that can have arbitrary delays for each packet and failure injections for breaking connections with some probability. `libs/desim/src/world.rs` ties everything together, to have a concept of virtual nodes that can have network connections between them. ### walproposer_sim Has everything to run walproposer and safekeepers in a simulation. `safekeeper.rs` reimplements all necessary stuff from `receive_wal.rs`, `send_wal.rs` and `timelines_global_map.rs`. `walproposer_api.rs` implements all walproposer callbacks to use the simulation library. `simulation.rs` defines a schedule – a set of events like `restart <sk>` or `write_wal` that should happen at time `<ts>`. It also has code to spawn walproposer/safekeeper threads and provide config to them. 
### tests `simple_test.rs` has tests that just start walproposer and 3 safekeepers together in a simulation, and tests that they are not crashing right away. `misc_test.rs` has tests checking more advanced simulation cases, like crashing or restarting threads, testing memory deallocation, etc. `random_test.rs` is the main test, it checks thousands of random seeds (schedules) for correctness. It roughly corresponds to running a real python integration test in an environment with very unstable network and cpu, but in a deterministic way (each seed results in the same execution log) and much much faster. Closes #547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-02-12 20:29:57 +00:00
Konstantin Knizhnik	529a79d263	Increment generation when LFC is disabled by assigning 0 to neon.file_cache_size_limit (#6692) ## Problem test_lfc_resize sometimes failed with an assertion failure when acquiring the lock in the write operation: ``` if (lfc_ctl->generation == generation) { Assert(LFC_ENABLED()); ``` ## Summary of changes Increment generation when 0 is assigned to neon.file_cache_size_limit ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-09 08:14:41 +02:00
Konstantin Knizhnik	43eae17f0d	Drop unused replication slots (#6655) ## Problem See #6626 If there is an inactive replication slot then Postgres will not be able to shrink WAL and delete unused snapshots. If some other active subscription is present, then snapshots created every 15 seconds will overflow AUX_DIR. Setting `max_slot_wal_keep_size` doesn't solve the problem, because even a small WAL segment will be enough to overflow AUX_DIR if there is no other activity on the system. ## Summary of changes If there are active subscriptions and some logical replication slots are not used during the `neon.logical_replication_max_time_lag` interval, then the unused slots are dropped. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-08 17:31:15 +02:00
Heikki Linnakangas	df7bee7cfa	Fix compilation with recent glibc headers with close_range(2). I was getting an error: /home/heikki/git-sandbox/neon//pgxn/neon_walredo/walredoproc.c:161:5: error: conflicting types for ‘close_range’; have ‘int(unsigned int, unsigned int, unsigned int)’ 161 | int close_range(unsigned int start_fd, unsigned int count, unsigned int flags) { | ^~~~~~~~~~~ In file included from /usr/include/x86_64-linux-gnu/bits/sigstksz.h:24, from /usr/include/signal.h:328, from /home/heikki/git-sandbox/neon//pgxn/neon_walredo/walredoproc.c:50: /usr/include/unistd.h:1208:12: note: previous declaration of ‘close_range’ with type ‘int(unsigned int, unsigned int, int)’ 1208 | extern int close_range (unsigned int __fd, unsigned int __max_fd, | ^~~~~~~~~~~ The discrepancy is in the 3rd argument. Apparently in the glibc wrapper it's signed. As a quick fix, rename our close_range() function, the one that calls syscall() directly, to avoid the clash with the glibc wrapper. In the long term, an autoconf test would be nice, and some equivalent on macOS, see issue #6580.	2024-02-05 11:50:45 +02:00
Vadim Kharitonov	7e8529bec1	Revert "Update pgvector to v0.6.0, third attempt" (#6610 ) The issue is still unsolved because of shmem size in VMs. Need to figure it out before applying this patch. For more details: ``` ERROR: could not resize shared memory segment "/PostgreSQL.2892504480" to 16774205952 bytes: No space left on device ``` As an example, the same issue in community pgvector/pgvector#453.	2024-02-04 22:27:07 +00:00
Heikki Linnakangas	647b85fc15	Update pgvector to v0.6.0, third attempt This includes a compatibility patch that is needed because pgvector now skips WAL-logging during the index build, and WAL-logs the index only in one go at the end. That's how GIN, GiST and SP-GIST index builds work in core PostgreSQL too, but we need some Neon-specific calls to mark the beginning and end of those build phases. pgvector is the first index AM that does that with parallel workers, so I had to modify those functions in the Neon extension to be aware of parallel workers. Only the leader needs to create the underlying file and perform the WAL-logging. (In principle, the parallel workers could participate in the WAL-logging too, but pgvector doesn't do that. This will need some further work if that changes). The previous attempt at this (#6592) missed that parallel workers needed those changes, and segfaulted in parallel build that spilled to disk. Testing ------- We don't have a place for regression tests of extensions at the moment. I tested this manually with the following script: ``` CREATE EXTENSION IF NOT EXISTS vector; DROP TABLE IF EXISTS tst; CREATE TABLE tst (i serial, v vector(3)); INSERT INTO tst (v) SELECT ARRAY[random(), random(), random()] FROM generate_series(1, 15000) g; -- Serial build, in memory ALTER TABLE tst SET (parallel_workers=0); SET maintenance_work_mem='50 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); -- Test that the index works. (The table contents are random, and the -- search is approximate anyway, so we cannot check the exact values. 
-- For now, just eyeball that they look reasonable) set enable_seqscan=off; explain SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Serial build, spills to disk ALTER TABLE tst SET (parallel_workers=0); SET maintenance_work_mem='5 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Parallel build, in memory ALTER TABLE tst SET (parallel_workers=4); SET maintenance_work_mem='50 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; -- Parallel build, spills to disk ALTER TABLE tst SET (parallel_workers=4); SET maintenance_work_mem='5 MB'; CREATE INDEX idx ON tst USING hnsw (v vector_l2_ops); SELECT * FROM tst ORDER BY v <-> ARRAY[0, 0, 0]::vector LIMIT 5; DROP INDEX idx; ```	2024-02-03 09:19:37 +02:00
Heikki Linnakangas	c9876b0993	Fix double-free bug in walredo process. (#6534) At the end of ApplyRecord(), we called pfree on the decoded record, if it was "oversized". However, we had already linked it to the "decode queue" list in XLogReaderState. If we later called XLogBeginRead(), it called ResetDecoder and tried to free the same record again. The conditions to hit this are: - a large WAL record (larger than about 64 kB I think, per DEFAULT_DECODE_BUFFER_SIZE), and - another WAL record processed by the same WAL redo process after the large one. I think the reason we haven't seen this earlier is that you don't get WAL records that large that are sent to the WAL redo process, except when logical replication is enabled. Logical replication adds data to the WAL records, making them larger. To fix, allocate the buffer ourselves, and don't link it to the decode queue. Alternatively, we could perhaps have just removed the pfree(), but frankly I'm a bit scared about the whole queue thing.	2024-02-02 21:49:11 +02:00
Christian Schwarz	1be5e564ce	feat(walredo): use posix_spawn by moving close_fds() work to walredo C code (#6574 ) The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually using `gdb` that we're now forking out the walredo process using `posix_spawn`. refs https://github.com/neondatabase/neon/issues/6565	2024-02-01 22:38:34 +01:00
Konstantin Knizhnik	9a9d9beaee	Download SLRU segments on demand (#6151) ## Problem See https://github.com/neondatabase/cloud/issues/8673 ## Summary of changes Download missing SLRU segments from page server ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-31 21:39:18 +02:00
Konstantin Knizhnik	e10a7ee391	Prevent too frequent reconnects in case of race condition errors returned by PS (tenant not found) (#6522) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1706531433057289 ## Summary of changes 1. Do not decrease reconnect timeout until maximal interval value (1 second) is reached 2. Compute reconnect time after the connection attempt is made, to exclude connect time itself from the interval measurement. So now a backend should not perform more than 4 reconnect attempts per second. But please note that backoff is performed locally in each backend and so if there are many active backends, then the connection (and so error) rate may be much higher. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-31 09:17:32 +02:00
Konstantin Knizhnik	19ed230708	Add support for PS sharding in compute (#6205) refer #5508 replaces #5837 ## Problem This PR implements sharding support on the compute side. Relations are split into stripes, and `get_page` requests are redirected to the particular shard where the stripe is located. All other requests (i.e. get relation or database size) are always sent to shard 0. ## Summary of changes Sharding support on the compute side includes three things: 1. Make it possible to specify, and change at runtime, connections to more than one page server 2. Send `get_page` requests to the particular shard (determined by hash of page key) 3. Support multiple servers in prefetch ring requests --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: John Spray <john@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-25 15:53:31 +02:00
Konstantin Knizhnik	00d9bf5b61	Implement lockless update of pageserver_connstring GUC in shared memory (#6314) ## Problem There is a "neon.pageserver_connstring" GUC with the PGC_SIGHUP option, which allows it to be changed using pg_reload_conf(). It is used by the control plane to update the pageserver connection string if a page server has crashed or been relocated, or if new shards are added. It is copied to shared memory because the config cannot be loaded during query execution and we need to reestablish the connection to the page server. ## Summary of changes Copying the connection string to shared memory is done by the postmaster. Other backends should check the update counter to determine whether the connection URL has changed and the connection needs to be reestablished. We cannot use standard Postgres LW-locks, because the postmaster has no proc entry and so cannot wait on this primitive. This is why a lockless access algorithm is implemented, using two atomic counters to enforce consistent reading of the connection string value from shared memory. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-23 07:55:05 +02:00
Sasha Krassovsky	a40ed86d87	Add test for migrations, add initial migration	2024-01-22 14:53:29 -08:00
Arthur Petukhovsky	a092127b17	Fix truncateLsn initialization (#6396) In `7f828890cf` we changed the logic for persisting control_files. Previously it was updated if `peer_horizon_lsn` jumped more than one segment, which made `peer_horizon_lsn` initialized on disk as soon as safekeeper has received a first `AppendRequest`. This caused an issue with `truncateLsn`, which now can be zero sometimes. This PR fixes it, and now `truncateLsn/peer_horizon_lsn` can never be zero once we know `timeline_start_lsn`. Closes https://github.com/neondatabase/neon/issues/6248	2024-01-18 18:55:24 +00:00
Konstantin Knizhnik	02b916d3c9	Use [NEON_SMGR] tag for all messages in neon extension (#6313) ## Problem Use [NEON_SMGR] for all log messages produced by neon extension. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-18 17:08:34 +02:00
Konstantin Knizhnik	0dc4c9b0b8	Relsize hash lru eviction (#6353) ## Problem Currently the relation hash size is limited by the "neon.relsize_hash_size" GUC, with a default value of 64k. 64k relations is not such a small number... but creating 376 databases is enough to exhaust it. ## Summary of changes Use an LRU replacement algorithm to prevent hash overflow --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-17 20:34:30 +02:00
Sasha Krassovsky	7662df6ca0	Fix minimum backoff to 1ms	2024-01-03 21:09:19 -08:00
Arseny Sher	65b4e6e7d6	Remove empty safekeeper init since truncateLsn. It has caveats such as creating half empty segment which can't be offloaded. Instead we'll pursue approach of pull_timeline, seeding new state from some peer.	2024-01-03 18:20:19 +04:00
Arseny Sher	f71110383c	Remove second check for max_slot_wal_keep_size download size. Already checked in GetLogRepRestartLSN, a rebase artifact.	2024-01-03 13:13:32 +04:00
Arseny Sher	ae3eaf9995	Add [WP] prefix to all walproposer logging. - rename walpop_log to wp_log - create also wpg_log which is used in postgres-specific code - in passing format messages to start with lower case	2024-01-03 11:10:27 +04:00
Sasha Krassovsky	ce13281d54	MIN not MAX	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	4e1d16f311	Switch to exponential rate-limiting	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	091a0cda9d	Switch to rate-limiting strategy	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	ea9fad419e	Add exponential backoff to page_server->send	2024-01-02 06:28:49 -08:00
Arseny Sher	ddc431fc8f	pgindent walproposer condvar comment	2023-12-26 14:12:53 +04:00
Arseny Sher	bfc98f36e3	Refactor handling responses in walproposer. Remove confirm_wal_streamed; we already apply both write and flush positions of the slot to commit_lsn which is fine because 1) we need to wake up waiters 2) committed WAL can be fetched from safekeepers by neon_walreader now.	2023-12-26 14:12:53 +04:00
Arseny Sher	1f1c50e8c7	Don't re-add neon_walreader socket to waiteventset if possible. Should make recovery slightly more efficient (likely negligibly).	2023-12-26 14:12:53 +04:00
Arseny Sher	854df0f566	Do PQgetCopyData before PQconsumeInput in libpqwp_async_read. To avoid a lot of redundant memmoves and bloated input buffer. fixes https://github.com/neondatabase/neon/issues/6055	2023-12-26 14:12:53 +04:00
Arseny Sher	9c493869c7	Perform synchronous WAL download in wp only for logical replication. wp -> sk communication now uses neon_walreader which will fetch missing WAL on demand from safekeepers, so doesn't need this anymore. Also, cap WAL download by max_slot_wal_keep_size to be able to start compute if lag is too high.	2023-12-26 14:12:53 +04:00
Arseny Sher	cdb08f0362	Introduce NeonWALReader downloading sk -> compute WAL on demand. It is similar to XLogReader, but when either the requested segment is missing locally or the requested LSN is before basebackup_lsn, NeonWALReader asynchronously fetches WAL from one of the safekeepers. The patch includes the walproposer switch to NeonWALReader; splitting wouldn't make much sense as it is hard to test otherwise. This finally removes the risk of pg_wal explosion (as well as slow start time) when one safekeeper is lagging, at the same time allowing it to recover. In the future the reader should also be used by logical walsender for similar reasons (currently we download the tail on compute start synchronously). The main test is test_lagging_sk. However, I also ran it manually a lot, varying MAX_SEND_SIZE on both sides (on safekeeper and on walproposer), testing various fragmentations (one side having a small buffer, the other, both), which brought up https://github.com/neondatabase/neon/issues/6055 closes https://github.com/neondatabase/neon/issues/1012	2023-12-26 14:12:53 +04:00
Konstantin Knizhnik	572bc06011	Do not copy WAL for lagged slots (#6221) ## Problem See https://neondb.slack.com/archives/C026T7K2YP9/p1702813041997959 ## Summary of changes Do not take invalidated slots into account when calculating the restart_lsn position for basebackup at the page server --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-12-22 20:47:55 +02:00

160 Commits