rust/neon - neon - Gitea: Git with a cup of tea

mirror of https://github.com/neondatabase/neon.git synced 2026-05-15 20:20:38 +00:00

Author	SHA1	Message	Date
Peter Bendel	f457cef8d4	make it compile on Linux again	2025-04-23 17:14:29 +00:00
Peter Bendel	d763caa3a9	still compiles on Linux, too	2025-04-23 16:31:04 +00:00
BodoBolero	4d99c10c5e	now removed ALL pageserver usage of metrics	2025-04-23 17:15:47 +02:00
BodoBolero	ce1e575db1	removed many more metrics, still compiles	2025-04-23 16:12:01 +02:00
BodoBolero	a12369be43	remove some more metrics, still compiles	2025-04-22 18:32:04 +02:00
BodoBolero	6d77432ed2	remove more metrics, still compiles	2025-04-22 17:04:55 +02:00
BodoBolero	2a5b0d1b99	remove more metrics, still compiles	2025-04-22 16:02:23 +02:00
BodoBolero	b811ae4fe5	remove more metrics, still compiles	2025-04-22 15:34:20 +02:00
BodoBolero	0c6defd8da	many metrics removed and still compiles and can be started	2025-04-17 17:44:21 +02:00
BodoBolero	9584f65950	remove more metrics, still compiles	2025-04-17 16:12:06 +02:00
BodoBolero	ef81d0b81d	remove some more metrics	2025-04-17 14:16:07 +02:00
BodoBolero	e019b82d87	remove more metrics - still compiles	2025-04-17 11:12:39 +02:00
BodoBolero	cfe9a8ad11	remove some metrics usages	2025-04-17 10:30:59 +02:00
BodoBolero	f72a1505e6	remove warnings	2025-04-16 19:47:43 +02:00
BodoBolero	4ba997c3e5	fix execution errors	2025-04-16 19:39:24 +02:00
BodoBolero	1882674a8a	Merge remote-tracking branch 'origin/main' into bodobolero/remove_global_locks	2025-04-16 19:05:13 +02:00
Vlad Lazar	0e00faf528	tests: stability fixes for `test_migration_to_cold_secondary` (#11606 ) 1. Compute may generate WAL on shutdown. The test assumes that after shutdown, no further ingest happens. Tweak the compute shutdown to make the assumption true. 2. Assertion of local layer count post cold migration is not right since we may have downloaded layers due to ingest. Remove it. Closes https://github.com/neondatabase/neon/issues/11587	2025-04-16 16:31:23 +00:00
Anastasia Lubennikova	7747a9619f	compute: fix copy-paste typo for neon GUC parameters check (#11610 ) fix for commit `5063151271`	2025-04-16 15:55:11 +00:00
Erik Grinaker	46100717ad	pageserver: add `VectoredBlob::raw_with_header` (#11607 ) ## Problem To avoid recompressing page images during layer filtering, we need access to the raw header and data from vectored reads such that we can pass them through to the target layer. Touches #11562. ## Summary of changes Adds `VectoredBlob::raw_with_header()` to return a raw view of the header+data, and updates `read()` to track it. Also adds `blob_io::Header` with header metadata and decode logic, to reuse for tests and assertions. This isn't yet widely used.	2025-04-16 15:38:10 +00:00
Erik Grinaker	00eeff9b8d	pageserver: add `compaction_shard_ancestor` to disable shard ancestor compaction (#11608 ) ## Problem Splits of large tenants (several TB) can cause a huge amount of shard ancestor compaction work, which can overload Pageservers. Touches https://github.com/neondatabase/cloud/issues/22532. ## Summary of changes Add a setting `compaction_shard_ancestor` (default `true`) to disable shard ancestor compaction on a per-tenant basis.	2025-04-16 14:41:02 +00:00
Matthias van de Meent	2a46426157	Update neon GUCs with new default settings (#11595 ) Staging and prod both have these settings configured like this, so let's update this so we can eventually drop the overrides in prod.	2025-04-16 13:42:22 +00:00
BodoBolero	2033aeead1	still compiles	2025-04-16 15:28:21 +02:00
BodoBolero	d84c534922	metrics disabled still compiles	2025-04-16 15:09:38 +02:00
Tristan Partin	edc11253b6	Fix neon_local public key parsing when create compute JWKS (#11602 ) Finally figured out the right incantation. I had had this in my original go, but due to some refactoring and apparently missed testing, I committed a mistake. The reason this doesn't currently break anything is that we bypass the authorization middleware when the "testing" cargo feature is enabled. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-16 12:51:48 +00:00
Heikki Linnakangas	b4e26a6284	Set last-written LSN as part of smgr_end_unlogged_build() (#11584 ) This way, the callers don't need to do it, reducing the footprint of changes we've had to made to various index AM's build functions.	2025-04-16 12:34:18 +00:00
Vlad Lazar	96b46365e4	tests: attach final metrics to allure report (#11604 ) ## Problem Metrics are saved in https://github.com/neondatabase/neon/pull/11559, but the file is not matched by the attachment regex. ## Summary of changes Make attachment regex match the metrics file.	2025-04-16 10:26:47 +00:00
BodoBolero	fea8c98b59	remove usages of metrics	2025-04-16 12:07:45 +02:00
BodoBolero	eba08ab0a8	comment usages of counters, gauges and histograms	2025-04-16 11:45:58 +02:00
Alex Chi Z.	aa19f10e7e	fix(test): allow shutdown warning in preempt tests (#11600 ) ## Problem test_gc_compaction_preempt is still flaky ## Summary of changes - allow shutdown warning logs Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-15 21:50:28 +00:00
Konstantin Knizhnik	35170656fe	Allocate WalProposerConn using TopMemoryAllocator (#11577 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1744659631698609 `WalProposerConn` is allocated using current memory context which life time is not long enough. ## Summary of changes Allocate `WalProposerConn` using `TopMemoryContext`. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-04-15 19:13:12 +00:00
Tristan Partin	cd9ad75797	Remove compute_ctl authorization bypass on localhost (#11597 ) For whatever reason, this never worked in production computes anyway. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-15 19:12:34 +00:00
Tristan Partin	eadb05f78e	Teach neon_local to pass the Authorization header to compute_ctl (#11490 ) This allows us to remove hacks in the compute_ctl authorization middleware which allowed for bypasses of auth checks. Fixes: https://github.com/neondatabase/neon/issues/11316 Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-15 17:27:49 +00:00
Peter Bendel	ccf32412eb	give 500 tenants more time to start up (however root cause was ulimit -n)	2025-04-15 16:21:07 +00:00
Fedor Dikarev	c5115518e9	remove temp file from repo (#11586 ) ## Problem In https://github.com/neondatabase/neon/pull/11409 we added temp file to the repo. ## Summary of changes Remove temp file from the repo.	2025-04-15 15:29:15 +00:00
Alex Chi Z.	931f8c4300	fix(pageserver): check if cancelled before waiting logical size (2/2) (#11575 ) ## Problem close https://github.com/neondatabase/neon/issues/11486, proceeding https://github.com/neondatabase/neon/pull/11531 ## Summary of changes This patch fixes the rest 50% of instability of `test_create_churn_during_restart`. During tenant warmup, we'll request logical size; however, if the startup gets cancelled, we won't be able to spawn the initial logical size calculation task that sets the `cancel_wait_for_background_loop_concurrency_limit_semaphore`. Therefore, we check `cancelled` before proceeding to get `cancel_wait_for_background_loop_concurrency_limit_semaphore`. There will still be a race if the timeline shutdown happens after L5710 and before L5711, but it should be enough to reduce the flakiness of the test. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-15 15:16:16 +00:00
Alexander Bayandin	0f7c2cc382	CI(release): add time to RC PR branch names (#11547 ) ## Problem We can't have more than one open release PR created on the same day (due to non-unique enough branch names). ## Summary of changes - Add time (hours and minutes) to RC PR branch names - Also make sure we use UTC for releases	2025-04-15 15:08:05 +00:00
Erik Grinaker	983d56502b	pageserver: reduce shard ancestor rewrite threshold to 30% (#11582 ) ## Problem When doing power-of-two shard splits (i.e. 4 → 8 → 16), we end up rewriting all layers since half of the pages will be local due to striping. This causes a lot of resource usage when splitting large tenants. ## Summary of changes Drop the threshold of local/total pages to 30%, to reduce the amount of layer rewrites after splits.	2025-04-15 14:26:29 +00:00
Erik Grinaker	bcef542d5b	pageserver: don't rewrite invisible layers during ancestor compaction (#11580 ) ## Problem Shard ancestor compaction can be very expensive following shard splits of large tenants. We currently rewrite garbage layers after shard splits as well, which can be a significant amount of data. Touches https://github.com/neondatabase/cloud/issues/22532. ## Summary of changes Don't rewrite invisible layers after shard splits.	2025-04-15 14:25:58 +00:00
a-masterov	e31455d936	Add the tests for the extensions `pg_jsonschema` and `pg_session_jwt` (#11323 ) ## Problem `pg_jsonschema` and `pg_session_jwt` are not yet covered by tests ## Summary of changes Added the tests for these extensions.	2025-04-15 14:06:01 +00:00
Alex Chi Z.	a4ea7d6194	fix(pageserver): gc-compaction verification false failure (#11564 ) ## Problem https://github.com/neondatabase/neon/pull/11515 introduced a bug that some key history cannot be verified. If a key only exists above the horizon, the verification will fail for its first occurrence because the history does not exist at that point. As gc-compaction skips a key range whenever an error occurs, it might be doing some wasted work in staging/prod now. But I'm not planning a hotfix this week as the bug doesn't affect correctness/performance. ## Summary of changes Allow keys with only above horizon history in the verification. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-15 13:58:32 +00:00
Alexander Bayandin	19bea5fd0c	CI: do not wait for tests to trigger deploy job (#11548 ) ## Problem There is too much delay between merging a PR into `main` and deploying the changes to staging ## Summary of changes - Trigger `deploy` job without waiting for `build-and-test-locally` job	2025-04-15 11:23:41 +00:00
a-masterov	5be94e28c4	Update the documentation of the cloud regress test (#11539 ) ## Problem The information in the README.md contained errors, and some information was missing. ## Summary of changes Found errors are fixed, and new information is added. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-04-15 11:00:25 +00:00
Alexander Bayandin	63a106021a	CI(allure-report-generate): Install allure to /tmp (#11579 ) ## Problem The `/__w/neon/neon` directory is mounted from host to container and persists between runs. Sometimes the next workflow run fails to delete it: ``` Deleting the contents of '/__w/neon/neon' Error: File was unable to be removed Error: EACCES: permission denied, rmdir '/__w/neon/neon/allure-2.32.2/bin' ``` ## Summary of changes - Download and install allure to `/tmp` which exists in container only Ref https://github.com/neondatabase/cloud/issues/27186	2025-04-15 09:29:36 +00:00
Fedor Dikarev	9a6ace9bde	introduce new runners: unit-perf and use them for benchmark jobs (#11409 ) ## Problem Benchmarks results are inconsistent on existing small-metal runners ## Summary of changes Introduce new `unit-perf` runners, and lets run benchmark on them. The new hardware has slower, but consistent, CPU frequency - if run with default governor schedutil. Thus we needed to adjust some testcases' timeouts and add some retry steps where hard-coded timeouts couldn't be increased without changing the system under test. - [wait_for_last_record_lsn](`6592d69a67/test_runner/fixtures/pageserver/utils.py (L193)`) 1000s -> 2000s - [test_branch_creation_many](https://github.com/neondatabase/neon/pull/11409/files#diff-2ebfe76f89004d563c7e53e3ca82462e1d85e92e6d5588e8e8f598bbe119e927) 1000s - [test_ingest_insert_bulk](https://github.com/neondatabase/neon/pull/11409/files#diff-e90e685be4a87053bc264a68740969e6a8872c8897b8b748d0e8c5f683a68d9f) - with back throttling disabled compute becomes unresponsive for more than 60 seconds (PG hard-coded client authentication connection timeout) - [test_sharded_ingest](https://github.com/neondatabase/neon/pull/11409/files#diff-e8d870165bd44acb9a6d8350f8640b301c1385a4108430b8d6d659b697e4a3f1) 600s -> 1200s Right now there are only 2 runners of that class, and if we decide to go with them, we have to check how much that type of runners we need, so jobs not stuck with waiting for that type of runners available. However we now decided to run those runners with governor performance instead of schedutil. 
This achieves almost same performance as previous runners but still achieves consistent results for same commit Related issue to activate performance governor on these runners https://github.com/neondatabase/runner/pull/138 ## Verification that it helps ### analyze runtimes on new runner for same commit Table of runtimes for the same commit on different runners in [run](https://github.com/neondatabase/neon/actions/runs/14417589789)
| Run | Benchmarks (1) | Benchmarks (2) | Benchmarks (3) | Benchmarks (4) | Benchmarks (5) |
|--------|--------|---------|---------|---------|---------|
| 1 | 1950.37s | 6374.55s | 3646.15s | 4149.48s | 2330.22s |
| 2 | - | 6369.27s | 3666.65s | 4162.42s | 2329.23s |
| Delta % | - | 0,07 % | 0,5 % | 0,3 % | 0,04 % |
| with governor performance | 1519.57s | 4131.62s | - | - | - |
| second run gov. perf. | 1513.62s | 4134.67s | - | - | - |
| Delta % | 0,3 % | 0,07 % | - | - | - |
| speedup gov. performance | 22 % | 35 % | - | - | - |
| current desktop class hetzner runners (main) | 1487.10s | 3699.67s | - | - | - |
| slower than desktop class | 2 % | 12 % | - | - | - |
In summary, the runtimes for the same commit on this hardware varies less than 1 %. --------- Co-authored-by: BodoBolero <peterbendel@neon.tech>	2025-04-15 08:21:44 +00:00
Erik Grinaker	8c77ccfc01	pageserver: log total progress during shard ancestor compaction (#11565 ) ## Problem Shard ancestor compaction doesn't currently log any global progress information, only for the current batch. ## Summary of changes Log the number of layers checked for eligibility this iteration, and the total number of layers to check. This will indicate how far along the total shard ancestor compaction has gotten for this iteration.	2025-04-15 07:25:09 +00:00
Tristan Partin	cbd2fc2395	Clean up logs and error messages in compute_ctl authorize middleware (#11576 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-15 01:21:18 +00:00
Tristan Partin	028a191040	Continue with s/spec/config changes (#11574 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-14 21:18:21 +00:00
Vlad Lazar	8cce27bedb	pageserver: add a randomized read path test (#11519 ) ## Problem Every time we make changes to the read path to fix a bug or add a feature, we end up adding another incomprehensible test. ## Summary of changes Add some generic infrastructure for generating a layer map from a type spec and use that for a read path test. The test is randomized but uses a fixed seed by default. A fuzzing mode is available for confidence building. See [Notion page](https://www.notion.so/neondatabase/Read-Path-Unit-Testing-Fuzzing-1d1f189e0047806c8e5cd37781b0a350?pvs=4) for a diagram of the layer map used. Just for fun I tried removing this commit (`9990199cb4`) from https://github.com/neondatabase/neon/pull/11494 and it caught the bug in the normal mode (no fuzzing required).	2025-04-14 15:31:32 +00:00
Vlad Lazar	90b706cd96	tests: save pageserver metrics at the end of the test (#11559 ) ## Problem Sometimes it's useful to see the pageserver metrics after a test in order to debug stuff. For example, for https://github.com/neondatabase/neon/issues/11465 I'd like to know what the remote storage latencies are from the client. ## Summary of changes When stopping the env, record the pageserver metrics into a file in the pageserver's workdir.	2025-04-14 15:13:20 +00:00
Alex Chi Z.	057ce115de	fix(test): allow stale generation errors (1/2) (#11531 ) ## Problem Part of https://github.com/neondatabase/neon/issues/11486 ## Summary of changes 50% of the test instability of `test_create_churn_during_restart` are due to error message gets changed. Allow the new error message. Still need to fix other errors due to failure to acquire semaphore in this or the next patch. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-14 14:51:17 +00:00
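A minimal Rust sketch of the threshold arithmetic behind 983d56502b (#11582), assuming a shard-ancestor layer is rewritten when the fraction of its pages local to the shard falls below the threshold. The function name `should_rewrite` and the 0.6 comparison value are hypothetical illustrations, not the pageserver's actual code; the commit itself only states the new 30% value.

```rust
/// Hypothetical rule: rewrite a shard-ancestor layer when the fraction of its
/// pages that belong to this shard is below `threshold`.
fn should_rewrite(local_pages: u64, total_pages: u64, threshold: f64) -> bool {
    if total_pages == 0 {
        // Nothing to measure; leave the layer alone.
        return false;
    }
    (local_pages as f64) / (total_pages as f64) < threshold
}

fn main() {
    // After a power-of-two split (e.g. 4 -> 8 shards), striping leaves roughly
    // half of an ancestor layer's pages local to each child shard.
    let (local, total) = (500, 1000);

    // Any threshold above 50% (0.6 here, chosen only for illustration)
    // triggers a rewrite of the layer.
    assert!(should_rewrite(local, total, 0.6));

    // With the 30% threshold from the commit, the layer is kept as-is.
    assert!(!should_rewrite(local, total, 0.3));
}
```

Under this assumed rule, a half-local layer no longer qualifies for rewriting once the threshold drops below 50%, which is the effect the commit describes for power-of-two splits.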
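A hypothetical sketch of how a per-tenant boolean such as `compaction_shard_ancestor` from 00eeff9b8d (#11608), which defaults to `true`, can resolve against its default when a tenant does or does not override it. The `TenantConfOverrides` struct, the constant, and the function name are illustrative assumptions, not the pageserver's real tenant config types.

```rust
/// Hypothetical per-tenant overrides: `None` means "use the default".
#[derive(Default)]
struct TenantConfOverrides {
    compaction_shard_ancestor: Option<bool>,
}

/// Default taken from the commit message ("default `true`").
const DEFAULT_COMPACTION_SHARD_ANCESTOR: bool = true;

/// Shard ancestor compaction runs unless the tenant explicitly disables it.
fn shard_ancestor_compaction_enabled(conf: &TenantConfOverrides) -> bool {
    conf.compaction_shard_ancestor
        .unwrap_or(DEFAULT_COMPACTION_SHARD_ANCESTOR)
}

fn main() {
    // A tenant with no override keeps shard ancestor compaction enabled.
    let default_tenant = TenantConfOverrides::default();
    assert!(shard_ancestor_compaction_enabled(&default_tenant));

    // A large tenant can opt out individually after a split.
    let large_tenant = TenantConfOverrides {
        compaction_shard_ancestor: Some(false),
    };
    assert!(!shard_ancestor_compaction_enabled(&large_tenant));
}
```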

7742 Commits