Before this PR, `is_rel_block_key` returned true for the blknum
`0xffffffff`,
a block number that Postgres never actually writes, but that
Neon Pageserver uses to store the relation size (relsize).
Quoting @MMeent:
> PostgreSQL can't extend the relation beyond size of 0xFFFFFFFF blocks,
> so block number 0xFFFFFFFE is the last valid block number.
This PR changes the definition of the function to exclude blknum
0xffffffff.
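A minimal sketch of the new behavior, assuming a simplified signature (the real function takes a pageserver `Key`, not a bare block number):

```rust
// Hypothetical, simplified form of the check; the real function operates
// on a pageserver Key, not just the block number.
const RELSIZE_BLKNUM: u32 = 0xffff_ffff; // reserved by Pageserver for the relsize

fn is_rel_block_key(blknum: u32) -> bool {
    // 0xfffffffe is the last block number Postgres can actually write,
    // so the reserved relsize slot is not a relation block.
    blknum != RELSIZE_BLKNUM
}

fn main() {
    assert!(is_rel_block_key(0xffff_fffe));
    assert!(!is_rel_block_key(0xffff_ffff));
}
```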
My motivation for doing this change is to fix the `pagebench` getpage
benchmark, which uses `is_rel_block_key` to filter the keyspace for
valid pages to request from page_service.
fixes https://github.com/neondatabase/neon/issues/6210
I checked the other users of the function.
The first one is `key_is_shard0`, which already had an exemption
for 0xffffffff, so there is no functional change for it.
The second one is `DatadirModification::flush`[^1]. With this PR,
`.flush()` will skip the relsize key, whereas it didn't
before. This means we will pile up all the relsize key-value pairs
`(Key,u32)`
in `DatadirModification::pending_updates` until `.commit()` is called.
The only place I can think of where that would be a problem is importing
from a full basebackup without calling `.commit()` regularly,
as we currently don't in `import_basebackup_from_tar`.
It exposes us to input-controlled allocations.
However, that was already the case for the other keys that are skipped,
so one can argue that this change does not make the situation much
worse.
[^1]: That type's `flush()` and `commit()` methods are terribly named, but
that's for another time.
Implement an API for cloning a single timeline inside a safekeeper. Also
add an API for calculating a sha256 hash of WAL, which is used in tests.
The `/copy` API works by copying objects inside S3 for all but the last
segments; the last segments are copied on disk. A special temporary
directory is created for the timeline, because the copy can take a lot of
time, especially for large timelines. After all segment files have been
prepared, this directory is moved into the main tree and the timeline is
loaded into memory.
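Roughly, the flow looks like the sketch below; all names are hypothetical stand-ins, not the actual safekeeper API.

```rust
// Hypothetical sketch of the /copy flow; the stub functions stand in for
// the real S3 and disk operations.
#[derive(Clone, Copy)]
struct Lsn(u64);
struct TimelineId(String);

fn copy_s3_segments(_src: &TimelineId, _dst: &TimelineId, _until: Lsn) { /* server-side S3 object copies */ }
fn copy_last_segment_on_disk(_src: &TimelineId, _tmp_dir: &std::path::Path, _until: Lsn) { /* copy the partial segment locally */ }
fn promote_temp_dir(_tmp_dir: &std::path::Path, _dst: &TimelineId) { /* move into the main timelines tree */ }
fn load_timeline(_dst: &TimelineId) { /* load the new timeline into memory */ }

fn copy_timeline(src: &TimelineId, dst: &TimelineId, until_lsn: Lsn) {
    // Prepare everything in a temporary directory first: copying a large
    // timeline can take a long time.
    let tmp_dir = std::env::temp_dir().join("timeline-copy");

    // All segments except the last ones are copied object-to-object in S3.
    copy_s3_segments(src, dst, until_lsn);

    // The last, partial segment is copied on local disk up to until_lsn.
    copy_last_segment_on_disk(src, &tmp_dir, until_lsn);

    // Only after all segment files are prepared is the directory moved into
    // the main tree and the timeline loaded into memory.
    promote_temp_dir(&tmp_dir, dst);
    load_timeline(dst);
}
```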
Some caveats:
- large timelines can take a lot of time to copy, because we need to
copy many S3 segments
- the caller should wait indefinitely for the HTTP call to finish and must not
close the HTTP connection, because closing it stops the process, which is
not continued in the background
- `until_lsn` must be a valid LSN, otherwise bad things can happen
- the API will return 200 if the specified `timeline_id` already exists, even
if it's not a copy
- each safekeeper will try to copy S3 segments, so it's better not to
call this API in parallel on different safekeepers
## Problem
- When a client requests a key that isn't found in any shard on the node
(an edge case that only happens if a compute's config is out of date), we
should prompt them to reconnect (as this includes a backoff), since they
will not be able to complete the request until they eventually get a
correct pageserver connection string.
- `QueryError::Other` is used excessively: it contains a type-ambiguous
`anyhow::Error` and is logged very verbosely (including a backtrace).
## Summary of changes
- Introduce PageStreamError to replace use of anyhow::Error in request
handlers for getpage, etc.
- Introduce Reconnect and NotFound variants to QueryError
- Map the "shard routing error" case to PageStreamError::Reconnect ->
QueryError::Reconnect
- Update type conversions for LSN timeouts and tenant/timeline not found
errors to use PageStreamError::NotFound -> QueryError::NotFound (see the
sketch below)
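An illustrative sketch of the shape of the mapping; the real variants and payloads in page_service differ.

```rust
// Illustrative only: the real PageStreamError/QueryError carry richer payloads.
enum PageStreamError {
    /// The key is not present on any shard attached to this node; the
    /// compute's config is likely stale, so ask the client to reconnect
    /// (reconnecting includes a backoff).
    Reconnect(String),
    /// Tenant/timeline not found, or the LSN wait timed out.
    NotFound(String),
    /// Everything else; no longer a bare anyhow::Error logged with a backtrace.
    Other(String),
}

enum QueryError {
    Reconnect(String),
    NotFound(String),
    Other(String),
}

impl From<PageStreamError> for QueryError {
    fn from(e: PageStreamError) -> Self {
        match e {
            PageStreamError::Reconnect(m) => QueryError::Reconnect(m),
            PageStreamError::NotFound(m) => QueryError::NotFound(m),
            PageStreamError::Other(m) => QueryError::Other(m),
        }
    }
}
```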
Previously, we would wait for the LSN to be visible on whichever
timeline we happened to load at the start of the connection, then
proceed to look up the correct timeline for the key and do the read.
If the timeline holding the key was behind the timeline we used
for the LSN wait, then we might serve an apparently successful read result
that actually contains data from behind the requested LSN.
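A sketch of the corrected ordering, with hypothetical types and helpers (the actual page_service code differs): shard routing happens first, and the LSN wait runs on the timeline that will serve the read.

```rust
// Hypothetical stand-ins for the real pageserver types.
struct Timeline;
impl Timeline {
    fn wait_lsn(&self, _lsn: u64) { /* block until WAL up to lsn is ingested on this timeline */ }
    fn get_page(&self, _key: u64, _lsn: u64) -> Vec<u8> { Vec::new() }
}
fn resolve_timeline_for_key(_key: u64) -> Timeline { Timeline }

fn handle_get_page(key: u64, lsn: u64) -> Vec<u8> {
    // Route to the shard/timeline that actually holds the key...
    let timeline = resolve_timeline_for_key(key);
    // ...and only then wait for the LSN on *that* timeline, so the read
    // can never return data from behind the requested LSN.
    timeline.wait_lsn(lsn);
    timeline.get_page(key, lsn)
}
```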
If a tenant has a secondary shard on the same node as an attached shard,
the secondary shard could trip up this code and cause page_service to
incorrectly return an error instead of finding the attached shard.
It has caveats such as creating a half-empty segment which can't be
offloaded. Instead, we'll pursue the pull_timeline approach, seeding the new
state from some peer.
## Problem
We need to add one more patch to pgbouncer (for
https://github.com/neondatabase/neon/issues/5801). I've decided to
cherry-pick all required patches to a pgbouncer fork
(`neondatabase/pgbouncer`) and use it instead.
See
https://github.com/neondatabase/pgbouncer/releases/tag/pgbouncer_1_21_0-neon-1
## Summary of changes
- Revert the previous patch (for deallocate/discard all) — the fork
already contains it.
- Remove `libssl-dev` dependency — we build pgbouncer without `openssl`
support.
- Clone the git tag and build pgbouncer from source.
## Problem
For context, this problem was observed in a research project where we
try to make Neon run in multiple regions; @hlinnaka asked me to
submit this PR.
In our project, we use the pageserver in an unconventional way, sending
a much larger number of requests to the pageserver than normal (imagine
Postgres without the buffer pool). I measured the time
from the moment a WAL record left the safekeeper to when it reached the
pageserver
([code](e593db1f5a/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs (L282-L287)))
and observed that when the number of get_page_at_lsn requests was high,
the WAL receive time increased significantly (see the left side of the
graphs below).
Upon further investigation, I found that the delay was caused by this
line
d2ca410919/pageserver/src/tenant/timeline.rs (L2348)
The `get_layer_for_write` method is called for every value during WAL
ingestion, and it tries to acquire the layers write lock every time; this
results in high contention when the read lock is also acquired
frequently.


## Summary of changes
It is unnecessary to call `get_layer_for_write` repeatedly for all
values in a WAL message, since they would end up in the same in-memory layer
anyway, so I created batched versions of `InMemoryLayer::put_value`,
`InMemoryLayer::put_tombstone`, `Timeline::put_value`, and
`Timeline::put_tombstone` that acquire the locks once for a batch of
values.
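Conceptually, the batched variant takes the layer's write lock once per record rather than once per value, roughly as in this illustrative sketch (not the real `InMemoryLayer` API):

```rust
use std::sync::Mutex;

// Illustrative stand-in for the in-memory layer's index/value storage.
struct InMemoryLayer {
    values: Mutex<Vec<(u64 /* key */, u64 /* lsn */, Vec<u8> /* value */)>>,
}

impl InMemoryLayer {
    // Old shape: one lock acquisition per value.
    fn put_value(&self, key: u64, lsn: u64, val: Vec<u8>) {
        self.values.lock().unwrap().push((key, lsn, val));
    }

    // New shape: one lock acquisition for a whole batch of values coming
    // from a single WAL record, which removes the per-value contention.
    fn put_values(&self, batch: Vec<(u64, u64, Vec<u8>)>) {
        let mut guard = self.values.lock().unwrap();
        guard.extend(batch);
    }
}
```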
Additionally, `DatadirModification` is changed to store multiple
versions of uncommitted values, and `WalIngest::ingest_record()` can now
ingest records without immediately committing them.
With these new APIs, the ingestion loop can be changed to commit every
`ingest_batch_size` records. The `ingest_batch_size` variable is
exposed as a config option. If it is set to 1, we get the same behavior as
before this change. I found that setting this value to 100 seems to work
best, and you can see its effect on the right side of the above
graphs.
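The batched ingestion loop then looks roughly like this (hypothetical names; the real `DatadirModification` and WAL decoding are more involved):

```rust
// Hypothetical stand-ins for the real ingestion types.
struct WalRecord;
struct DatadirModification {
    pending: usize, // uncommitted key-value pairs
}
impl DatadirModification {
    fn ingest_record(&mut self, _rec: &WalRecord) { self.pending += 1; }
    fn commit(&mut self) { /* write the whole batch to the in-memory layer */ self.pending = 0; }
}

fn ingest_loop(records: &[WalRecord], ingest_batch_size: usize) {
    let mut modification = DatadirModification { pending: 0 };
    let mut uncommitted = 0;
    for rec in records {
        // Records are ingested without committing immediately...
        modification.ingest_record(rec);
        uncommitted += 1;
        // ...and the batch is committed every ingest_batch_size records.
        // With ingest_batch_size = 1 this degenerates to the old behavior.
        if uncommitted >= ingest_batch_size {
            modification.commit();
            uncommitted = 0;
        }
    }
    if uncommitted > 0 {
        modification.commit(); // flush the tail of the last batch
    }
}
```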
---------
Co-authored-by: John Spray <john@neon.tech>
## Problem
The version of pytest we were using emits a number of
DeprecationWarnings on latest python: these are fixed in latest release.
boto3 and python-dateutil also have deprecation warnings, but
unfortunately these aren't fixed upstream yet.
## Summary of changes
- Update pytest
- Update boto3 (this doesn't fix deprecation warnings, but by the time I
figured that out I had already done the update, and it's good hygiene
anyway)
The tool still needs a lot of work. These are the easiest fix and
feature:
- use an adaptive S3 config similar to remote_storage's, with retries
- process only particular tenants
Tenants need to be from the correct region and they are not deduplicated,
but the feature is useful for re-checking a small number of tenants after
a large run.
To exercise MAX_SEND_SIZE sending from the safekeeper: we've had a bug with WAL
records torn across several XLogData messages. Add a failpoint to the safekeeper to
slow down sending. Also check for corrupted-WAL complaints in the standby log.
In passing, make the test a bit simpler, e.g. we don't need explicit commits since
autocommit is enabled by default.
https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719
https://github.com/neondatabase/cloud/issues/9057
Otherwise they are left orphaned when compute_ctl is terminated with a
signal. It was invisible most of the time because normally neon_local or k8s
kills postgres directly and then compute_ctl finishes gracefully. However, in
some tests compute_ctl gets stuck waiting for sync-safekeepers, which
intentionally never ends because the safekeepers are offline, and we want to stop
compute_ctl without leaving orphans behind.
This is a rather rough approach which doesn't wait for the children to terminate. A
better way would be to convert compute_ctl to async, which would make waiting
easy.
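A rough sketch of the idea, assuming the `signal-hook` and `nix` crates (the actual compute_ctl code may differ): trap termination signals and forward one to the spawned child before exiting, without waiting for it.

```rust
use nix::sys::signal::{kill, Signal};
use nix::unistd::Pid;
use signal_hook::consts::{SIGINT, SIGQUIT, SIGTERM};
use signal_hook::iterator::Signals;
use std::process::Command;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Stand-in for spawning postgres / sync-safekeepers.
    let child = Command::new("sleep").arg("600").spawn()?;
    let child_pid = Pid::from_raw(child.id() as i32);

    let mut signals = Signals::new([SIGTERM, SIGINT, SIGQUIT])?;
    std::thread::spawn(move || {
        if signals.forever().next().is_some() {
            // Don't leave the child orphaned; don't wait for it either.
            let _ = kill(child_pid, Signal::SIGTERM);
            std::process::exit(1);
        }
    });

    // ... the rest of compute_ctl's work would run here ...
    Ok(())
}
```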
## Problem
`cargo test` doesn't support timeouts
or junit output format
## Summary of changes
- Add `nextest` to `build-tools` image
- Replace `cargo test` with `cargo nextest` on CI
- Set a timeout
## Problem
The control plane should be able to return domain names and not just IP
addresses.
## Summary of changes
1. add regression tests
2. use `rsplit` to split the port off from the back, then trim the IPv6
brackets (see the sketch below)
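A minimal sketch of the parsing (hypothetical helper, not the actual proxy code): split the port off from the right with `rsplit_once`, then strip IPv6 brackets from the host.

```rust
// Hypothetical helper illustrating the approach; the real code differs.
fn split_host_port(addr: &str) -> (String, Option<u16>) {
    match addr.rsplit_once(':') {
        // Only treat the suffix as a port if it actually parses as one.
        Some((host, port)) if !port.is_empty() && port.chars().all(|c| c.is_ascii_digit()) => {
            let host = host.trim_start_matches('[').trim_end_matches(']');
            (host.to_string(), port.parse().ok())
        }
        _ => (addr.to_string(), None),
    }
}

fn main() {
    // Works for domain names, IPv4, and bracketed IPv6 alike.
    assert_eq!(split_host_port("db.example.com:5432"), ("db.example.com".to_string(), Some(5432)));
    assert_eq!(split_host_port("127.0.0.1:5432"), ("127.0.0.1".to_string(), Some(5432)));
    assert_eq!(split_host_port("[2001:db8::1]:5432"), ("2001:db8::1".to_string(), Some(5432)));
}
```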
## Problem
The pg_stat_statements extension has the function pg_stat_statements_reset().
In vanilla Postgres this function can only be called by a superuser role
or by other users/roles that have been explicitly granted execute on it.
In Neon no end user can use a superuser role.
Instead we have the neon_superuser role.
We need to grant execute on pg_stat_statements_reset() to neon_superuser.
## Summary of changes
Modify the Postgres v14, v15, and v16 contrib in our compute Dockerfile to
grant execute on pg_stat_statements_reset() to neon_superuser.
(Modifying it in our Dockerfile is preferable to changing
neondatabase/postgres because we want to limit the changes in our fork
that we have to carry with each new version of Postgres.)
Note that the interface of the pg_stat_statements_reset function
changed in pg_stat_statements version 1.7.
So for versions up to and including 1.6 we must run
`GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO
neon_superuser;`
and for versions starting from 1.7 we must run
`GRANT EXECUTE ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint) TO
neon_superuser;`
If we just use `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO
neon_superuser;` for all versions, this results in the following error for
versions 1.7+:
```sql
neondb=> create extension pg_stat_statements;
ERROR: function pg_stat_statements_reset() does not exist
```
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
## I have run the following test and could then invoke
pg_stat_statements_reset() using the default user
```bash
(neon) peterbendel@Peters-MBP neon % kubectl get pods | grep compute-quiet-mud-88416983
compute-quiet-mud-88416983-74f4bf67db-crl4c 3/3 Running 0 7m26s
(neon) peterbendel@Peters-MBP neon % kubectl set image deploy/compute-quiet-mud-88416983 compute-node=neondatabase/compute-node-v15:7307610371
deployment.apps/compute-quiet-mud-88416983 image updated
(neon) peterbendel@Peters-MBP neon % psql postgresql://peterbendel:<secret>@ep-bitter-sunset-73589702.us-east-2.aws.neon.build/neondb
psql (16.1, server 15.5)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
neondb=> select version();
version
---------------------------------------------------------------------------------------------------
PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
neondb=> create extension pg_stat_statements;
CREATE EXTENSION
neondb=> select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------
(1 row)
```
- add a pgbouncer_settings section to the compute spec;
- add a pgbouncer-connstr option to compute_ctl;
- add a pgbouncer-ini-path option to compute_ctl (default: /etc/pgbouncer/pgbouncer.ini).
Apply the pgbouncer config on compute start and on respec to override the default spec.
Save pgbouncer config updates to pgbouncer.ini to preserve them across pgbouncer restarts.
Remove confirm_wal_streamed; we already apply both the write and flush positions of
the slot to commit_lsn, which is fine because 1) we need to wake up waiters and 2)
committed WAL can now be fetched from safekeepers by neon_walreader.
The walproposer -> safekeeper communication now uses neon_walreader, which fetches missing WAL on
demand from safekeepers, so it doesn't need this anymore. Also, cap WAL download by
max_slot_wal_keep_size to be able to start the compute if the lag is too high.
It is similar to XLogReader, but when either the requested segment is missing
locally or the requested LSN is before basebackup_lsn, NeonWALReader asynchronously
fetches the WAL from one of the safekeepers.
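The decision it makes can be sketched as follows (illustrative only; the real NeonWALReader lives in the Postgres extension and is written in C):

```rust
// Illustrative sketch of where NeonWALReader reads WAL from.
enum WalSource {
    Local,      // read from pg_wal as a normal XLogReader would
    Safekeeper, // fetch asynchronously over the network
}

fn choose_wal_source(req_lsn: u64, basebackup_lsn: u64, segment_exists_locally: bool) -> WalSource {
    if req_lsn < basebackup_lsn || !segment_exists_locally {
        // WAL from before the basebackup was never written locally by this
        // compute, and missing segments can't be read from disk, so fetch
        // it from one of the safekeepers.
        WalSource::Safekeeper
    } else {
        WalSource::Local
    }
}
```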
The patch includes switching the walproposer to NeonWALReader; splitting that out wouldn't make much
sense, as it is hard to test otherwise. This finally removes the risk of pg_wal
explosion (as well as slow start time) when one safekeeper is lagging, while at the
same time allowing it to recover.
In the future the reader should also be used by the logical walsender for similar
reasons (currently we download the tail synchronously on compute start).
The main test is test_lagging_sk. However, I also ran it manually a lot, varying
MAX_SEND_SIZE on both sides (on the safekeeper and on the walproposer) and testing various
fragmentations (one side having a small buffer, the other, or both), which brought up
https://github.com/neondatabase/neon/issues/6055
closes https://github.com/neondatabase/neon/issues/1012
## Problem
See https://neondb.slack.com/archives/C026T7K2YP9/p1702813041997959
## Summary of changes
Do not take invalidated slots into account when calculating the restart_lsn
position for basebackup at the pageserver.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Store the content of the `last-modified` and `etag` HTTP headers in
`Download`.
This serves both as the first step towards #6199 and as a preparation
for tests in #6155.
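Roughly, the download result now also carries the two headers, something like this (field names assumed, not the exact `remote_storage` definition):

```rust
use std::time::SystemTime;

// Assumed shape; the real Download struct in remote_storage differs in details.
struct Download {
    // ... the download stream/body ...
    last_modified: Option<SystemTime>, // from the last-modified header
    etag: Option<String>,              // from the etag header
}
```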
## Problem
During startup, a tenant delete request might have to retry for many
minutes waiting for a tenant to enter Active state.
## Summary of changes
- Refactor delete_tenant into TenantManager: this is not a functional
change, but will avoid merge conflicts with
https://github.com/neondatabase/neon/pull/6105 later
- Add 412 responses to the swagger definition of this endpoint.
- Use `Tenant::wait_to_become_active` in `TenantManager::delete_tenant`
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>