rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 13:40:37 +00:00

Author	SHA1	Message	Date
Christian Schwarz	c87c19a646	move the logic of emitting the clear visibility wal records into a common function	2024-01-04 13:03:39 +00:00
Christian Schwarz	92280727df	turns out ingest_neonrmgr_record is just copy-pasta, re-do copy-pasta	2024-01-04 12:52:34 +00:00
Christian Schwarz	31fc069482	fixup	2024-01-04 12:48:49 +00:00
Christian Schwarz	16090c876d	and now it's obvious that new_heap_blkno and old_heap_blkno really are independent	2024-01-04 12:47:42 +00:00
Christian Schwarz	02dc0db633	comments	2024-01-04 12:36:45 +00:00
Christian Schwarz	8e04de6ef9	fixup 'restructure match block to make the special case clear'	2024-01-04 12:36:35 +00:00
Christian Schwarz	0713f367d4	restructure match block to make the special case clear	2024-01-04 12:23:23 +00:00
Christian Schwarz	93d0f5e93d	lift up the vm_size checking logic	2024-01-04 12:14:31 +00:00
Christian Schwarz	20957d6c4e	lift up HEAPBLK_TO_MAPBLOCK call	2024-01-04 11:54:08 +00:00
Christian Schwarz	f4de9adb1d	same for the Some,Some case	2024-01-04 11:10:12 +00:00
Christian Schwarz	98ee0d9012	propagate Some()-ness	2024-01-04 11:05:26 +00:00
Christian Schwarz	6933f5d089	transform the nested `if` into a flattened `match`	2024-01-04 10:57:19 +00:00
Christian Schwarz	853f77eb11	some constant propagation	2024-01-04 10:52:21 +00:00
Christian Schwarz	ccfc9741f6	move vm_rel out of match	2024-01-04 10:48:53 +00:00
Christian Schwarz	c6d09f8942	transform outermost `if` to a `match`	2024-01-04 10:47:05 +00:00
Christian Schwarz	c8d36dab59	walredo: DRY ClearVisibilityMapFlags record handling	2024-01-04 10:41:36 +00:00
Sasha Krassovsky	7662df6ca0	Fix minimum backoff to 1ms	2024-01-03 21:09:19 -08:00
John Spray	c119af8ddd	pageserver: run at least 2 background task threads. Otherwise an assertion in CONCURRENT_BACKGROUND_TASKS will trip if you try to run the pageserver on a single core.	2024-01-03 14:22:40 +00:00
John Spray	a2e083ebe0	pageserver: make walredo shard-aware. This does not have a functional impact, but enables all the logging in this code to include the shard_id label.	2024-01-03 14:22:40 +00:00
John Spray	73a944205b	pageserver: log details on shard routing error	2024-01-03 14:22:40 +00:00
John Spray	34ebfbdd6f	pageserver: fix handling getpage with multiple shards on one node. Previously, we would wait for the LSN to be visible on whichever timeline we happened to load at the start of the connection, then proceed to look up the correct timeline for the key and do the read. If the timeline holding the key was behind the timeline we used for the LSN wait, then we might serve an apparently-successful read result that actually contains data from behind the requested lsn.	2024-01-03 14:22:40 +00:00
John Spray	ef7c9c2ccc	pageserver: fix active tenant lookup hitting secondaries with sharding. If there is some secondary shard for a tenant on the same node as an attached shard, the secondary shard could trip up this code and cause page_service to incorrectly get an error instead of finding the attached shard.	2024-01-03 14:22:40 +00:00
John Spray	6c79e12630	pageserver: drop unwanted keys during compaction after split	2024-01-03 14:22:40 +00:00
John Spray	753d97bd77	pageserver: don't delete ancestor shard layers	2024-01-03 14:22:40 +00:00
John Spray	edc962f1d7	test_runner: test_issue_5878 log allow list (#6259) ## Problem https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6254/7388706419/index.html#suites/5a4b8734277a9878cb429b80c314f470/e54c4f6f6ed22672 ## Summary of changes Permit the log message: because the test helper's detach function increments the generation number, a detach/attach cycle can cause the error if the test runner node is slow enough for the opportunistic deletion queue flush on detach not to complete by the time we call attach.	2024-01-03 14:22:17 +00:00
Arseny Sher	65b4e6e7d6	Remove empty safekeeper init since truncateLsn. It has caveats such as creating half empty segment which can't be offloaded. Instead we'll pursue approach of pull_timeline, seeding new state from some peer.	2024-01-03 18:20:19 +04:00
Alexander Bayandin	17b256679b	vm-image-spec: build pgbouncer from Neon's fork (#6249) ## Problem We need to add one more patch to pgbouncer (for https://github.com/neondatabase/neon/issues/5801). I've decided to cherry-pick all required patches to a pgbouncer fork (`neondatabase/pgbouncer`) and use it instead. See https://github.com/neondatabase/pgbouncer/releases/tag/pgbouncer_1_21_0-neon-1 ## Summary of changes - Revert the previous patch (for deallocate/discard all); the fork already contains it. - Remove `libssl-dev` dependency; we build pgbouncer without `openssl` support. - Clone git tag and build pgbouncer from source code.	2024-01-03 13:02:04 +00:00
John Spray	673a865055	tests: tolerate 304 when evicting layers (#6261) In tests that evict layers, explicit eviction can race with automatic eviction of the same layer and result in a 304	2024-01-03 11:50:58 +00:00
Cuong Nguyen	fb518aea0d	Add batch ingestion mechanism to avoid high contention (#5886) ## Problem For context, this problem was observed in a research project where we try to make neon run in multiple regions and I was asked by @hlinnaka to make this PR. In our project, we use the pageserver in a non-conventional way such that we would send a larger number of requests to the pageserver than normal (imagine postgres without the buffer pool). I measured the time from the moment a WAL record left the safekeeper to when it reached the pageserver ([code](`e593db1f5a/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs (L282-L287)`)) and observed that when the number of get_page_at_lsn requests was high, the WAL receiving time increased significantly (see the left side of the graphs below). Upon further investigation, I found that the delay was caused by this line: `d2ca410919/pageserver/src/tenant/timeline.rs (L2348)`. The `get_layer_for_write` method is called for every value during WAL ingestion and it tries to acquire the layers write lock every time, which results in high contention when the read lock is acquired more frequently. ![Untitled](https://github.com/neondatabase/neon/assets/6244849/85460f4d-ead1-4532-bc64-736d0bfd7f16) ![Untitled2](https://github.com/neondatabase/neon/assets/6244849/84199ab7-5f0e-413b-a42b-f728f2225218) ## Summary of changes It is unnecessary to call `get_layer_for_write` repeatedly for all values in a WAL message since they would end up in the same memory layer anyway, so I created batched versions of `InMemoryLayer::put_value`, `InMemoryLayer::put_tombstone`, `Timeline::put_value`, and `Timeline::put_tombstone` that acquire the locks once for a batch of values. Additionally, `DatadirModification` is changed to store multiple versions of uncommitted values, and `WalIngest::ingest_record()` can now ingest records without immediately committing them. With these new APIs, the new ingestion loop can be changed to commit for every `ingest_batch_size` records. The `ingest_batch_size` variable is exposed as a config. If it is set to 1 then we get the same behavior as before this change. I found that setting this value to 100 seems to work the best, and you can see its effect on the right side of the above graphs. --------- Co-authored-by: John Spray <john@neon.tech>	2024-01-03 10:41:58 +00:00
John Spray	42f41afcbd	tests: update pytest and boto3 dependencies (#6253) ## Problem The version of pytest we were using emits a number of DeprecationWarnings on latest python: these are fixed in latest release. boto3 and python-dateutil also have deprecation warnings, but unfortunately these aren't fixed upstream yet. ## Summary of changes - Update pytest - Update boto3 (this doesn't fix deprecation warnings, but by the time I figured that out I had already done the update, and it's good hygiene anyway)	2024-01-03 10:36:53 +00:00
Arseny Sher	f71110383c	Remove second check for max_slot_wal_keep_size download size. Already checked in GetLogRepRestartLSN, a rebase artifact.	2024-01-03 13:13:32 +04:00
Arseny Sher	ae3eaf9995	Add [WP] prefix to all walproposer logging. - rename walpop_log to wp_log - create also wpg_log which is used in postgres-specific code - in passing format messages to start with lower case	2024-01-03 11:10:27 +04:00
Christian Schwarz	aa9f1d4b69	pagebench get-page: default to latest=true, make configurable via flag (#6252) fixes https://github.com/neondatabase/neon/issues/6209	2024-01-02 16:57:29 +00:00
Joonas Koivunen	946c6a0006	scrubber: use adaptive config with retries, check subset of tenants (#6219) The tool still needs a lot of work. These are the easiest fix and feature: - use similar adaptive config with s3 as remote_storage, use retries - process only particular tenants Tenants need to be from the correct region, they are not deduplicated, but the feature is useful for re-checking a small number of tenants after a large run.	2024-01-02 15:22:16 +00:00
Sasha Krassovsky	ce13281d54	MIN not MAX	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	4e1d16f311	Switch to exponential rate-limiting	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	091a0cda9d	Switch to rate-limiting strategy	2024-01-02 06:28:49 -08:00
Sasha Krassovsky	ea9fad419e	Add exponential backoff to page_server->send	2024-01-02 06:28:49 -08:00
Arseny Sher	e92c9f42c0	Don't split WAL record across two XLogData's when sending from safekeepers. As protocol demands. Not following this makes standby complain about corrupted WAL in various ways. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 closes https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:50:20 +04:00
Arseny Sher	aaaa39d9f5	Add large insertion and slow WAL sending to test_hot_standby. To exercise MAX_SEND_SIZE sending from safekeeper; we've had a bug with WAL records torn across several XLogData messages. Add failpoint to safekeeper to slow down sending. Also check for corrupted WAL complaints in standby log. Make the test a bit simpler in passing, e.g. we don't need explicit commits as autocommit is enabled by default. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:50:20 +04:00
Arseny Sher	e79a19339c	Add failpoint support to safekeeper. Just a copy paste from pageserver.	2024-01-02 10:50:20 +04:00
Arseny Sher	dbd36e40dc	Move failpoint support code to utils. To enable them in safekeeper as well.	2024-01-02 10:50:20 +04:00
Arseny Sher	90ef48aab8	Fix safekeeper START_REPLICATION (term=n). It was giving WAL only up to commit_lsn instead of flush_lsn, so recovery of uncommitted WAL since `cdb08f03` hung. Add test for this.	2024-01-01 20:44:05 +04:00
Arseny Sher	9a43c04a19	compute_ctl: kill postgres and sync-safekeepers on exit. Otherwise they are left orphaned when compute_ctl is terminated with a signal. It was invisible most of the time because normally neon_local or k8s kills postgres directly and then compute_ctl finishes gracefully. However, in some tests compute_ctl gets stuck waiting for sync-safekeepers which intentionally never ends because safekeepers are offline, and we want to stop compute_ctl without leaving orphans behind. This is quite a rough approach which doesn't wait for children termination. A better way would be to convert compute_ctl to async which would make waiting easy.	2024-01-01 20:44:05 +04:00
Abhijeet Patil	f28bdb6528	Use nextest for rust unittests (#6223) ## Problem `cargo test` doesn't support timeouts or junit output format ## Summary of changes - Add `nextest` to `build-tools` image - Switch `cargo test` with `cargo nextest` on CI - Set timeout	2023-12-30 13:45:31 +00:00
Conrad Ludgate	1c037209c7	proxy: fix compute addr parsing (#6237) ## Problem control plane should be able to return domain names and not just IP addresses. ## Summary of changes 1. add regression tests 2. use rsplit to split the port from the back, then trim the ipv6 brackets	2023-12-29 09:32:24 +00:00
Bodobolero	e5a3b6dfd8	Pg stat statements reset for neon superuser (#6232) ## Problem Extension pg_stat_statements has function pg_stat_statements_reset(). In vanilla Postgres this function can only be called by the superuser role or by other users/roles that have been explicitly granted access. In Neon no end user can use the superuser role. Instead we have the neon_superuser role. We need to grant execute on pg_stat_statements_reset() to neon_superuser ## Summary of changes Modify the Postgres v14, v15, v16 contrib in our compute docker file to grant execute on pg_stat_statements_reset() to neon_superuser. (Modifying it in our docker file is preferable to changes in neondatabase/postgres because we want to limit the changes in our fork that we have to carry with each new version of Postgres). Note that the interface of proc/function pg_stat_statements_reset changed in pg_stat_statements version 1.7. So for versions up to and including 1.6 we must `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO neon_superuser;` and for versions starting from 1.7 we must `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint) TO neon_superuser;` If we just use `GRANT EXECUTE ON FUNCTION pg_stat_statements_reset() TO neon_superuser;` for all versions this results in the following error for versions 1.7+: ```sql neondb=> create extension pg_stat_statements; ERROR: function pg_stat_statements_reset() does not exist ``` ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist ## I have run the following test and could now invoke pg_stat_statements_reset() using default user ```bash (neon) peterbendel@Peters-MBP neon % kubectl get pods | grep compute-quiet-mud-88416983 compute-quiet-mud-88416983-74f4bf67db-crl4c 3/3 Running 0 7m26s (neon) peterbendel@Peters-MBP neon % kubectl set image deploy/compute-quiet-mud-88416983 compute-node=neondatabase/compute-node-v15:7307610371 deployment.apps/compute-quiet-mud-88416983 image updated (neon) peterbendel@Peters-MBP neon % psql postgresql://peterbendel:<secret>@ep-bitter-sunset-73589702.us-east-2.aws.neon.build/neondb psql (16.1, server 15.5) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off) Type "help" for help. neondb=> select version(); version --------------------------------------------------------------------------------------------------- PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit (1 row) neondb=> create extension pg_stat_statements; CREATE EXTENSION neondb=> select pg_stat_statements_reset(); pg_stat_statements_reset -------------------------- (1 row) ```	2023-12-27 18:15:17 +01:00
Sasha Krassovsky	136aab5479	Bump postgres submodule versions	2023-12-27 08:39:00 -08:00
Anastasia Lubennikova	6e40900569	Manage pgbouncer configuration from compute_ctl: - add pgbouncer_settings section to compute spec; - add pgbouncer-connstr option to compute_ctl. - add pgbouncer-ini-path option to compute_ctl. Default: /etc/pgbouncer/pgbouncer.ini Apply pgbouncer config on compute start and respec to override default spec. Save pgbouncer config updates to pgbouncer.ini to preserve them across pgbouncer restarts.	2023-12-26 15:17:09 +00:00
Arseny Sher	ddc431fc8f	pgindent walproposer condvar comment	2023-12-26 14:12:53 +04:00
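
Note on 1c037209c7 (proxy: fix compute addr parsing): the message describes splitting the port off from the back and then trimming the IPv6 brackets. A minimal sketch of that idea follows, assuming a plain `host:port` string as input. The function name `split_host_port` is hypothetical and is not taken from the proxy code; only the standard library is used.

```rust
/// Hypothetical helper (not the proxy's actual function): split "host:port".
/// Splitting from the right keeps the colons inside an IPv6 literal intact.
fn split_host_port(addr: &str) -> Option<(&str, u16)> {
    // rsplit_once finds the LAST ':', so "[::1]:5432" yields ("[::1]", "5432").
    let (host, port) = addr.rsplit_once(':')?;
    let port: u16 = port.parse().ok()?;
    // Strip one pair of IPv6 brackets if present; domain names and IPv4
    // addresses pass through unchanged.
    let host = host
        .strip_prefix('[')
        .and_then(|h| h.strip_suffix(']'))
        .unwrap_or(host);
    if host.is_empty() {
        return None;
    }
    Some((host, port))
}
```

Splitting from the front would cut an address such as `[::1]:5432` at the first colon inside the brackets, which is presumably why the commit splits from the back instead.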

4326 Commits
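
Note on fb518aea0d (batch ingestion): the message explains that taking the layer write lock once per value causes contention, and that committing every `ingest_batch_size` records takes it once per batch instead. The sketch below illustrates only that locking pattern. `InMemoryLayerSketch`, `put_batch`, `ingest`, and the `Record` type are hypothetical stand-ins, not the pageserver's real types or signatures.

```rust
use std::sync::RwLock;

// Hypothetical record type: (key, value bytes).
type Record = (u64, Vec<u8>);

// Hypothetical stand-in for an in-memory layer guarded by a read/write lock.
struct InMemoryLayerSketch {
    values: RwLock<Vec<Record>>,
}

impl InMemoryLayerSketch {
    fn new() -> Self {
        Self { values: RwLock::new(Vec::new()) }
    }

    // Take the write lock once and append the whole batch under it.
    fn put_batch(&self, batch: Vec<Record>) {
        let mut guard = self.values.write().unwrap();
        guard.extend(batch);
    }

    fn len(&self) -> usize {
        self.values.read().unwrap().len()
    }
}

// Buffer records and commit every `ingest_batch_size` of them.
// Returns the number of commits, i.e. write-lock acquisitions.
fn ingest<I>(layer: &InMemoryLayerSketch, records: I, ingest_batch_size: usize) -> usize
where
    I: IntoIterator<Item = Record>,
{
    let mut pending: Vec<Record> = Vec::with_capacity(ingest_batch_size);
    let mut commits = 0;
    for record in records {
        pending.push(record);
        if pending.len() >= ingest_batch_size {
            layer.put_batch(std::mem::take(&mut pending));
            commits += 1;
        }
    }
    // Flush whatever is left over after the last full batch.
    if !pending.is_empty() {
        layer.put_batch(pending);
        commits += 1;
    }
    commits
}
```

With a batch size of 1 this sketch takes the write lock once per record, which corresponds to the pre-change behavior the message describes; larger batch sizes reduce how often readers have to contend with the writer.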