## Problem
Follow-up of https://github.com/neondatabase/neon/pull/10550, handling the
case where the compaction threshold is set larger than the upper limit. It
does not make sense to enforce behavior like "if there are >= 50 L0s, only
compact 10 of them".
## Summary of changes
Use the maximum of compaction threshold and upper limit when selecting
L0 files to compact.
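As a rough sketch of that rule (assuming the `compaction_threshold` and `compaction_upper_limit` settings referenced in these PRs; the function name is illustrative, not the actual pageserver API):
```rust
/// Sketch only: the effective number of L0 layers to select per compaction pass.
fn effective_l0_compaction_limit(compaction_threshold: usize, compaction_upper_limit: usize) -> usize {
    // Never pick fewer layers than the threshold that triggered compaction:
    // "if there are >= 50 L0s, only compact 10 of them" is never intended.
    std::cmp::max(compaction_threshold, compaction_upper_limit)
}
```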
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Repartition is slow, but it's only used in image layer creation. We can
skip it if we have a lot of L0 layers to ingest.
## Summary of changes
If L0 compaction is not complete, do not repartition and do not create
image layers.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We don't have good observability for per-timeline compaction debt,
specifically the number of delta layers in the frozen, L0, and L1
levels.
Touches https://github.com/neondatabase/cloud/issues/23283.
## Summary of changes
* Add a `level` label for `pageserver_layer_{count,size}` with values
`l0`, `l1`, and `frozen`.
* Track metrics for frozen layers.
There is already a `kind={delta,image}` label. `kind=image` is only
possible for `level=l1`.
We don't include the currently open ephemeral layer, only frozen layers.
There is always exactly 1 ephemeral layer, with a dynamic size which is
already tracked in `pageserver_timeline_ephemeral_bytes`.
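For illustration only (not the exact metric definitions), a labelled gauge along these lines could back `pageserver_layer_count`, following the `prometheus` crate conventions:
```rust
use prometheus::{register_int_gauge_vec, IntGaugeVec};

// Sketch: per-timeline layer counts split by kind and level. The real metric
// additionally carries tenant/shard/timeline labels; this is simplified.
fn layer_count_metric() -> IntGaugeVec {
    register_int_gauge_vec!(
        "pageserver_layer_count",
        "Number of layers, by kind (delta/image) and level (frozen/l0/l1)",
        &["kind", "level"]
    )
    .expect("failed to register metric")
}

// Example: frozen layers are always deltas; kind=image only appears with level=l1.
// layer_count_metric().with_label_values(&["delta", "frozen"]).set(n_frozen);
```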
## Problem
benchmarking.yml so far is only running benchmarks with PostgreSQL
version 16.
However, Neon recently changed the default for new customers to
PostgreSQL version 17.
See related [epic](https://github.com/neondatabase/cloud/issues/23295)
## Summary of changes
We do not want to run every job step with both pg 16 and 17 because this
would need excessive resources (runners, computes) and extend the
benchmarking run wall clock time too much.
So we select an opinionated subset of testcases that we also report in
weekly reporting and add a postgres v17 job step.
For reuse, the associated Neon projects have been created and their
connection strings have been added to the Neon database organization
secrets.
A follow up is to add the reporting for these new runs to some grafana
dashboards.
## Problem
1. d04d924 added separate metrics for total requests and failures, which
doesn't make much sense. We could just have a unified counter with an
`http_status` label.
2. `test_compute_migrations_retry` had a race: it was waiting for
the last successful migration, not an actual failure. This was revealed
after adding an assert on the failure metric in d04d924.
## Summary of changes
1. Switch to unified counters for `compute_ctl` requests.
2. Add a waiting loop into `test_compute_migrations_retry` to eliminate
the race.
Part of neondatabase/cloud#17590
## Problem
If GC runs because a new image layer was added while the timeline is being
traversed, it can remove layers that are required for
fulfilling the current get request (the read path cannot "look back" and
notice the new image layer).
## Summary of Changes
Prevent GC from progressing on the current timeline while it is being
visited for a read.
Epic: https://github.com/neondatabase/neon/issues/9376
Luckily they were the same version, so we didn't spend time compiling
two versions, which could have been the case in the future.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
The approach of having CancelMap as an in-memory structure increases
code complexity,
as well as putting additional load on Redis streams.
## Summary of changes
- Implement a set of KV ops for Redis client;
- Remove cancel notifications code;
- Send KV ops over the bounded channel to the handling background task
for removing and adding the cancel keys.
Closes #9660
## Problem
We have to test the extensions shipped with Neon for compatibility
before the upgrade.
## Summary of changes
Added the test for compatibility with the upgraded extensions.
## Problem
Follow-up of the incident: we should not use the same bound for the
lower and upper limits on the number of compaction files. This patch adds
an upper limit, which is set to 50 for now.
## Summary of changes
Add `compaction_upper_limit`.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
Ref: https://github.com/neondatabase/cloud/issues/23314
We suspect some inconsistency in benchmark test runs could be due to the
different types of runners they land on.
To align both failure rates and benchmark results, let's run them on
`small-metal` servers for now and see how test stability progresses.
## Summary of changes
## Problem
There are several parts of `compute_ctl` with very low visibility into
errors:
1. DB migrations that run async in the background after compute start.
2. Requests made to control plane (currently only `GetSpec`).
3. Requests made to the remote extensions server.
## Summary of changes
Add new counters to quickly evaluate the amount of errors among the
fleet.
Part of neondatabase/cloud#17590
## Problem
https://github.com/neondatabase/neon/pull/10448 removed release notes,
because if their generation failed, the whole release was failing.
People liked them though, and wanted some basic release notes as a
fall-back instead of completely removing them.
## Summary of changes
Include basic release notes that link to the release PR and to a diff to
the previous release.
## Problem
We've seen the ingest connection manager get stuck shortly after a
migration.
## Summary of changes
A speculative mitigation is to use the same mechanism as get page
requests for kicking LSN ingest. The connection manager monitors
LSN waits and queries the broker if no updates are received for the
timeline.
Closes https://github.com/neondatabase/neon/issues/10351
## Problem
We need a setting to disable the flush upload wait, to test L0 flush
backpressure in staging.
## Summary of changes
Add `l0_flush_wait_upload` setting.
## Problem
The request-data and usage-metrics S3 requests use the same identifier
in logs, causing confusion about which type of upload failed.
## Summary of changes
Use the correct identifier for usage metrics uploads.
neondatabase/cloud#23084
Only a few things needed updating:
- async_trait was removed
- Message::Text takes a Utf8Bytes object instead of a String
Signed-off-by: Tristan Partin <tristan@neon.tech>
Co-authored-by: Conrad Ludgate <connor@neon.tech>
In #10308, we noticed many warnings about the local layer having
different sizes on-disk compared to the metadata.
However, the layer downloader would never redownload layer files if the
sizes or generation numbers change. This is obviously a bug, which we
aim to fix with this PR.
This change also moves the code deciding what to do about a layer to a
dedicated function: before we handled the "routing" via control flow,
but now it's become too complicated and it is nicer to have the
different verdicts for a layer spelled out in a list/match.
This reverts commit 9e55d79803.
We'll still need this until we can tune L0 flush backpressure and
compaction. I'll add a setting to disable this separately.
## Problem
This one is fairly embarrassing. The safekeeper node id was used in the
pageserver application name when connecting to safekeepers.
## Summary of changes
Use the right node id.
Closes https://github.com/neondatabase/neon/issues/10461
We want to verify if pageserver stripe size has an impact on ingest
performance.
We want to verify if ingest performance has improved or regressed with
postgres version 17.
## Summary of changes
- Allow creating the new project with different Postgres versions
- Allow pre-sharding the new project with different stripe sizes instead of
relying on the storage manager to shard_split the project once a threshold
is exceeded
Replaces https://github.com/neondatabase/neon/pull/10509
Test run https://github.com/neondatabase/neon/actions/runs/12986410381
We now don't need libpq any more for the build of the storage
controller, as we use `diesel-async` since #10280. Therefore, we remove
the env var that gave cargo/rustc the location for libpq.
Follow-up of #10280
During broker deploys, pageservers log this noisy WARN en masse.
I can trivially reproduce the WARN message in neon_local by SIGKILLing
broker during e.g. `pgbench -i`.
I don't understand why tonic is not detecting the error as
`Code::Unavailable`.
Until we find time to understand that / fix upstream, this PR adds the
error message to the existing list of known error messages that get
demoted to INFO level.
Refs:
- refs https://github.com/neondatabase/neon/issues/9562
## Problem
We were logging a warning after a single request timeout, while listing
objects.
Closes: https://github.com/neondatabase/neon/issues/10166
## Summary of changes
- These timeouts are a pretty normal part of life, so back it off to
only log a warning after two in a row.
## Problem
From time to time, folks discover our `control_plane/` folder and make
the (reasonable) mistake of thinking it's a tool for running full-sized
Neon systems, whereas in reality it is a tool for dev/test.
## Summary of changes
- Change control_plane's readme title to "Local Development Control
Plane (`neon_local`)"
- Change "Running local installation" to "Running a local development
environment" in the main readme
## Problem
The intent of this parameter is to have pageservers consider themselves
"full" if they've got lots of shards, even if they have plenty of
capacity. It works, but because we typically successfully oversubscribe
capacity up to 200%, the MAX_SHARDS limit is effectively doubled, so
this 20,000 value ends up meaning 40,000, whereas the original intent
was to limit nodes to ~10000 shards.
## Summary of changes
- Change MAX_SHARDS to 5000, so that a node with 5000 will get a 100%
utilization, which is equivalent in practice to being considered "half
full" by the storage controller in capacity terms.
This is all a bit subtle and indirect. Originally the limit was baked
into the pageserver with the idea that the pageserver knows better what
its own resources tolerate than the storage controller does, but in
practice it would probably be easier to understand all this if we
just did it controller-side. So there's scope to refactor here in the
future.
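For illustration, a hedged sketch of how a shard-count cap can feed the utilization score (names are only indicative of the mechanism described above):
```rust
// Sketch: a node holding MAX_SHARDS shards reports 100% utilization. With the
// storage controller oversubscribing to ~200%, that effectively caps nodes at
// about 2 * MAX_SHARDS = 10,000 shards, matching the original intent.
const MAX_SHARDS: u64 = 5000;

fn shard_utilization_pct(shard_count: u64) -> u64 {
    shard_count * 100 / MAX_SHARDS
}
```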
Switches the storcon away from using diesel's synchronous APIs in favour
of `diesel-async`.
Advantages:
* fewer C dependencies, especially no openssl, which might be behind the
bug: https://github.com/neondatabase/cloud/issues/21010
* better to have only async rather than a mix of async plus `spawn_blocking`
We had to turn off usage of the connection pool for migrations, as
diesel migrations don't support async APIs. Thus we still use
`spawn_blocking` in that one place. But this is explicitly done in one
of the `diesel-async` examples.
## Problem
In ingest benchmarks, we see L0 compaction delays of over 10 minutes due
to image compaction. We can't stall L0 flushes for that long.
## Summary of changes
Disable L0 flush stalls, and bump the default L0 flush delay threshold
from 20 to 30 L0 layers.
## Problem
If compaction fails, we disable L0 flush stalls to avoid persistent
stalls. However, the logic would unset the failure marker on offload
failures or shutdown. This can lead to sudden L0 flush stalls if we try
and fail to offload a timeline with compaction failures, or if there is
some kind of shutdown race.
Touches #10405.
## Summary of changes
Don't touch the compaction failure marker on offload failures or
shutdown.
## Problem
After talking about it with @bayandin again, this should replace
the changes from https://github.com/neondatabase/neon/pull/10475. While
the previous changes worked, they are less visually clear about what
happens, and we might end up in a situation where we update `latest`
but don't actually have the tagged image pushed that contains the same
changes. The latter would result in potentially hard-to-debug
situations.
## Summary of changes
Revert c283aaaf8d and make
promote-images-prod depend on promote-images-dev instead.
## Problem
The containers' log output is mixed with the tests' output, so you must
scroll up to find the error.
## Summary of changes
Printing of containers' logs moved to a separate step.
Note: this has to merge after the release is cut on `2025-01-17` for
compat tests to start passing.
## Problem
SK wal reader fan-out is not enabled in tests by default.
## Summary of changes
Enable it.
## Problem
There is no direct backpressure for compaction and L0 read
amplification. This allows a large buildup of compaction debt and read
amplification.
Resolves #5415.
Requires #10402.
## Summary of changes
Delay layer flushes based on the number of level 0 delta layers:
* `l0_flush_delay_threshold`: delay flushes such that they take 2x as
long (default `2 * compaction_threshold`).
* `l0_flush_stall_threshold`: stall flushes until level 0 delta layers
drop below threshold (default `4 * compaction_threshold`).
If either threshold is reached, ephemeral layer rolls also synchronously
wait for layer flushes to propagate this backpressure up into WAL
ingestion. This will bound the number of frozen layers to 1 once
backpressure kicks in, since all other frozen layers must flush before
the rolled layer.
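A minimal sketch of the delay/stall decision, assuming the thresholds above (names and types are illustrative, not the exact pageserver code):
```rust
use std::time::Duration;

enum FlushBackpressure {
    /// Below both thresholds: flush at full speed.
    None,
    /// At or above `l0_flush_delay_threshold`: sleep roughly one flush duration,
    /// so flushes take ~2x as long.
    Delay(Duration),
    /// At or above `l0_flush_stall_threshold`: stall until the L0 count drops.
    Stall,
}

fn flush_backpressure(
    l0_count: usize,
    delay_threshold: usize, // default: 2 * compaction_threshold
    stall_threshold: usize, // default: 4 * compaction_threshold
    last_flush_duration: Duration,
) -> FlushBackpressure {
    if l0_count >= stall_threshold {
        FlushBackpressure::Stall
    } else if l0_count >= delay_threshold {
        FlushBackpressure::Delay(last_flush_duration)
    } else {
        FlushBackpressure::None
    }
}
```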
## Analysis
This will significantly change the compute backpressure characteristics.
Recall the three compute backpressure knobs:
* `max_replication_write_lag`: 500 MB (based on Pageserver
`last_received_lsn`).
* `max_replication_flush_lag`: 10 GB (based on Pageserver
`disk_consistent_lsn`).
* `max_replication_apply_lag`: disabled (based on Pageserver
`remote_consistent_lsn`).
Previously, the Pageserver would keep ingesting WAL and build up
ephemeral layers and L0 layers until the compute hit
`max_replication_flush_lag` at 10 GB and began backpressuring. Now, once
we delay/stall WAL ingestion, the compute will begin backpressuring
after `max_replication_write_lag`, i.e. 500 MB. This is probably a good
thing (we're not building up a ton of compaction debt), but we should
consider tuning these settings.
`max_replication_flush_lag` probably doesn't serve a purpose anymore,
and we should consider removing it.
Furthermore, the removal of the upload barrier in #10402 will mean that
we no longer backpressure flushes based on S3 uploads, since
`max_replication_apply_lag` is disabled. We should consider enabling
this as well.
### When and what do we compact?
Default compaction settings:
* `compaction_threshold`: 10 L0 delta layers.
* `compaction_period`: 20 seconds (between each compaction loop check).
* `checkpoint_distance`: 256 MB (size of L0 delta layers).
* `l0_flush_delay_threshold`: 20 L0 delta layers.
* `l0_flush_stall_threshold`: 40 L0 delta layers.
Compaction characteristics:
* Minimum compaction volume: 10 layers * 256 MB = 2.5 GB.
* Additional compaction volume (assuming 128 MB/s WAL): 128 MB/s * 20
seconds = 2.5 GB (10 L0 layers).
* Required compaction bandwidth: 5.0 GB / 20 seconds = 256 MB/s.
### When do we hit `max_replication_write_lag`?
Depending on how fast compaction and flushes happen, the compute will
begin backpressuring somewhere between `l0_flush_delay_threshold` +
`max_replication_write_lag` and `l0_flush_stall_threshold` +
`max_replication_write_lag`.
* Minimum compute backpressure lag: 20 layers * 256 MB + 500 MB = 5.6 GB
* Maximum compute backpressure lag: 40 layers * 256 MB + 500 MB = 10.0
GB
This seems like a reasonable range to me.
This reapplies #10135. Just removing this flush backpressure without
further mitigations caused read amp increases during bulk ingestion
(predictably), so it was reverted. We will replace it by
compaction-based backpressure.
## Problem
In #8550, we made the flush loop wait for uploads after every layer.
This was to avoid unbounded buildup of uploads, and to reduce compaction
debt. However, the approach has several problems:
* It prevents upload parallelism.
* It prevents flush and upload pipelining.
* It slows down ingestion even when there is no need to backpressure.
* It does not directly backpressure based on compaction debt and read
amplification.
We will instead implement compaction-based backpressure in a PR
immediately following this removal (#5415).
Touches #5415.
Touches #10095.
## Summary of changes
Remove waiting on the upload queue in the flush loop.
## Problem
If gc-compaction decides to rewrite an image layer, it can cause
index_part to lose the reference to that layer. In detail:
* Assume there's only one image layer of key 0000...AAAA at LSN 0x100
and generation 0xA in the system.
* gc-compaction kicks in at gc-horizon 0x100, and then produces
0000...AAAA at LSN 0x100 and generation 0xB.
* It submits a compaction result update into the index part that unlinks
0000-AAAA-100-A and adds 0000-AAAA-100-B.
On the remote storage / local disk side, this is fine -- it unlinks
things correctly and uploads the new file. However, the
`index_part.json` itself doesn't record generations. The buggy procedure
is as follows:
1. upload the new file,
2. update the index part to remove the old file and add the new file,
3. remove the new file.
Because the old and new layers share the same name in the index, the
removal ends up dropping the reference to the freshly rewritten layer.
Therefore, the correct update process for gc-compaction should be
as follows:
* When modifying the layer map, delete the old one and upload the new
one.
* When updating the index, upload the new one without deleting the old
one.
## Summary of changes
* Modify `finish_gc_compaction` to correctly order insertions and
deletions.
* Update the way gc-compaction uploads the layer files.
* Add new tests.
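The gist of the ordering issue, as a hedged sketch with simplified types (the index is keyed by layer name without the generation; the value stands in for the layer metadata):
```rust
use std::collections::HashMap;

// Sketch only: applying a gc-compaction rewrite to an index keyed by layer name.
fn apply_gc_compaction_rewrite(
    index: &mut HashMap<String, String>, // layer name -> remote file (carries the generation)
    layer_name: &str,
    rewritten_remote_file: String,
) {
    // Inserting the rewritten layer replaces the old generation's entry in place.
    index.insert(layer_name.to_string(), rewritten_remote_file);
    // Deliberately *no* delete of `layer_name` for the old generation afterwards:
    // that would drop the reference to the freshly rewritten layer as well.
}
```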
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
We've finally transitioned to using a separate `release-compute` branch.
Now, we can automatically create release PRs on Friday and release
them during the following week.
Part of neondatabase/cloud#11698
Sometimes, especially when the host running the tests is overloaded, we
can run into reconcile timeouts in
`test_timeline_ancestor_detach_idempotent_success`, making the test
flaky. By increasing the timeouts from 30 seconds to 120 seconds, we can
address the flakiness.
Fixes #10464
## Problem
Currently, the report does not contain the LFC state of the failed
tests.
## Summary of changes
Added the LFC state to the link to the allure report.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Drop logical replication subscribers
before compute starts on a non-main branch.
Add a new compute_ctl spec flag: `drop_subscriptions_before_start`.
If it is set, drop all the subscriptions from the compute node
before it starts.
To avoid a race on compute start, use the new GUC
`neon.disable_logical_replication_subscribers`
to temporarily disable logical replication workers until we drop the
subscriptions.
Ensure that we drop subscriptions exactly once when an endpoint starts on a
new branch.
This is essential because otherwise we may drop not only inherited, but
also newly created subscriptions.
We cannot rely only on the `spec.drop_subscriptions_before_start` flag,
because if for some reason the compute restarts inside the VM,
it will start again with the same spec and flag value.
To handle this, we record the fact of the operation in the database
in the `neon.drop_subscriptions_done` table.
If the table does not exist, we assume that the operation was never
performed, so we must do it.
If the table exists, we check whether the operation was performed on the
current timeline.
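A hedged sketch of that check using `tokio_postgres`; the table name comes from the description above, while the `timeline_id` column is an assumption for illustration:
```rust
use tokio_postgres::Client;

// Sketch only: decide whether the subscription drop still needs to happen.
async fn should_drop_subscriptions(
    client: &Client,
    timeline_id: &str, // hypothetical column; the real schema may differ
) -> Result<bool, tokio_postgres::Error> {
    // If the bookkeeping table doesn't exist, the drop never ran on this branch.
    let table_exists: bool = client
        .query_one(
            "SELECT to_regclass('neon.drop_subscriptions_done') IS NOT NULL",
            &[],
        )
        .await?
        .get(0);
    if !table_exists {
        return Ok(true);
    }
    // Otherwise, drop only if we haven't recorded it for the current timeline.
    let already_done: bool = client
        .query_one(
            "SELECT EXISTS (
                 SELECT 1 FROM neon.drop_subscriptions_done WHERE timeline_id = $1
             )",
            &[&timeline_id],
        )
        .await?
        .get(0);
    Ok(!already_done)
}
```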
fixes: https://github.com/neondatabase/neon/issues/8790
## Problem
Not really a bug fix, but hopefully this helps reproduce
https://github.com/neondatabase/neon/issues/10482 more often.
If the layer map does not contain layers that end at exactly the end
range of the compaction job, the current split algorithm will produce
the last job that ends at the maximum layer key. This patch extends it
all the way to the compaction job end key.
For example, the user requests a compaction of 0000...FFFF. However, we
only have a layer 0000..3000 in the layer map, and the split job will
have a range of 0000..3000 instead of 0000..FFFF.
This is not a correctness issue but it would be better to fix it so that
we can get consistent job splits.
## Summary of changes
Compaction job split will always cover the full specified key range.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
PR #10457 was supposed to fix the flakiness of
`test_scrubber_physical_gc_ancestors`, but instead it made it even more
flaky. However, the original error causes disappeared, now replaced by
"key not found" errors.
See this for a longer explanation:
https://github.com/neondatabase/neon/issues/10391#issuecomment-2608018967
## Solution
This does one round of row churn after all compactions, and before we do any
timeline GCs. That way, we remain more accessible at older LSNs.
## Problem
The trust connection to the compute required for `pg_anon` was removed.
However, the PGPASSWORD environment variable was not added to
`docker-compose.yml`.
This caused connection errors, which were interpreted as success due to
errors in the bash script.
## Summary of changes
The environment variable was added, and the logic in the bash script was
fixed.
## Problem
https://github.com/neondatabase/neon/actions/runs/12896686483/job/35961290336#step:5:107
showed that `promote-images-prod` was missing another dependency.
## Summary of changes
Modify `promote-images-prod` to tag based on docker-hub images, so that
`promote-images-prod` does not rely on `promote-images-dev`. The result
should be the exact same, but allows the two jobs to run in parallel.
## Refs
- Epic: https://github.com/neondatabase/neon/issues/9378
Co-authored-by: Vlad Lazar <vlad@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
The read path does its IOs sequentially.
This means that if N values need to be read to reconstruct a page,
we will do N IOs and getpage latency is `O(N*IoLatency)`.
## Solution
With this PR we gain the ability to issue IO concurrently within one
layer visit **and** to move on to the next layer without waiting for IOs
from the previous visit to complete.
This is an evolved version of the work done at the Lisbon hackathon,
cf https://github.com/neondatabase/neon/pull/9002.
## Design
### `will_init` now sourced from disk btree index keys
On the algorithmic level, the only change is that the
`get_values_reconstruct_data`
now sources `will_init` from the disk btree index key (which is
PS-page_cache'd), instead
of from the `Value`, which is only available after the IO completes.
### Concurrent IOs, Submission & Completion
To separate IO submission from waiting for its completion, while
simultaneously
feature-gating the change, we introduce the notion of an `IoConcurrency`
struct
through which IO futures are "spawned".
An IO is an opaque future, and waiting for completions is handled
through
`tokio::sync::oneshot` channels.
The oneshot Receivers take the place of the `img` and `records` fields
inside `VectoredValueReconstructState`.
When we're done visiting all the layers and submitting all the IOs along
the way
we concurrently `collect_pending_ios` for each value, which means
for each value there is a future that awaits all the oneshot receivers
and then calls into walredo to reconstruct the page image.
Walredo is now invoked concurrently for each value instead of
sequentially.
Walredo itself remains unchanged.
The spawned IO futures are driven to completion by a sidecar tokio task
that
is separate from the task that performs all the layer visiting and
spawning of IOs.
That task receives the IO futures via an unbounded mpsc channel and
drives them to completion inside a `FuturesUnordered`.
(The behavior from before this PR is available through
`IoConcurrency::Sequential`,
which awaits the IO futures in place, without "spawning" or "submitting"
them
anywhere.)
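A heavily simplified sketch of the submission/completion split (gate guards, feature gating, and error handling omitted; names are illustrative, not the exact implementation):
```rust
use futures::future::BoxFuture;
use futures::stream::{FuturesUnordered, StreamExt};
use tokio::sync::{mpsc, oneshot};

struct IoConcurrency {
    tx: mpsc::UnboundedSender<BoxFuture<'static, ()>>,
}

impl IoConcurrency {
    /// Spawn the sidecar task that drives submitted IO futures to completion.
    fn spawn_sidecar() -> Self {
        let (tx, mut rx) = mpsc::unbounded_channel::<BoxFuture<'static, ()>>();
        tokio::spawn(async move {
            let mut in_flight = FuturesUnordered::new();
            loop {
                tokio::select! {
                    Some(io) = rx.recv() => in_flight.push(io),
                    Some(()) = in_flight.next(), if !in_flight.is_empty() => {}
                    else => break, // sender dropped and all IOs completed
                }
            }
        });
        IoConcurrency { tx }
    }

    /// "Spawn" an opaque IO future; the caller keeps a oneshot receiver to await
    /// the result later (e.g. in `collect_pending_ios`).
    fn spawn_io<T: Send + 'static>(
        &self,
        io: impl std::future::Future<Output = T> + Send + 'static,
    ) -> oneshot::Receiver<T> {
        let (done_tx, done_rx) = oneshot::channel();
        let _ = self.tx.send(Box::pin(async move {
            let _ = done_tx.send(io.await);
        }));
        done_rx
    }
}
```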
#### Alternatives Explored
A few words on the rationale behind having a sidecar *task* and what
alternatives were considered.
One option is to queue up all IO futures in a FuturesUnordered that is
polled
the first time when we `collect_pending_ios`.
Firstly, the IO futures are opaque, compiler-generated futures that need
to be polled at least once to submit their IO. "At least once" because
tokio-epoll-uring may not be able to submit the IO to the kernel on
first
poll right away.
Second, there are deadlocks if we don't drive the IO futures to
completion
independently of the spawning task.
The reason is that both the IO futures and the spawning task may hold
some
_and_ try to acquire _more_ shared limited resources.
For example, both spawning task and IO future may try to acquire
* a VirtualFile file descriptor cache slot async mutex (observed during
impl)
* a tokio-epoll-uring submission slot (observed during impl)
* a PageCache slot (currently this is not the case but we may move more
code into the IO futures in the future)
Another option is to spawn a short-lived `tokio::task` for each IO
future.
We implemented and benchmarked it during development, but found little
throughput improvement and moderate mean & tail latency degradation.
Concerns about pressure on the tokio scheduler made us discard this
variant.
The sidecar task could be obsoleted if the IOs were not arbitrary code
but a well-defined struct.
However,
1. the opaque futures approach taken in this PR allows leaving the
existing
code unchanged, which
2. allows us to implement the `IoConcurrency::Sequential` mode for
feature-gating
the change.
Once the new mode sidecar task implementation is rolled out everywhere,
and `::Sequential` removed, we can think about a descriptive submission
& completion interface.
The problems around deadlocks pointed out earlier will need to be solved
then.
For example, we could eliminate VirtualFile file descriptor cache and
tokio-epoll-uring slots.
The latter has been drafted in
https://github.com/neondatabase/tokio-epoll-uring/pull/63.
See the lengthy doc comment on `spawn_io()` for more details.
### Error handling
There are two error classes during reconstruct data retrieval:
* traversal errors: index lookup, move to next layer, and the like
* value read IO errors
A traversal error fails the entire get_vectored request, as before this
PR.
A value read error only fails that value.
In any case, we preserve the existing behavior that once
`get_vectored` returns, all IOs are done. Panics and failing
to poll `get_vectored` to completion will leave the IOs dangling,
which is safe but shouldn't happen, and so, a rate-limited
log statement will be emitted at warning level.
There is a doc comment on `collect_pending_ios` giving more code-level
details and rationale.
### Feature Gating
The new behavior is opt-in via pageserver config.
The `Sequential` mode is the default.
The only significant change in `Sequential` mode compared to before
this PR is the buffering of results in the `oneshot`s.
## Code-Level Changes
Prep work:
* Make `GateGuard` clonable.
Core Feature:
* Traversal code: track `will_init` in `BlobMeta` and source it from
the Delta/Image/InMemory layer index, instead of determining `will_init`
after we've read the value. This avoids having to read the value to
determine whether traversal can stop.
* Introduce `IoConcurrency` & its sidecar task.
* `IoConcurrency` is the clonable handle.
* It connects to the sidecar task via an `mpsc`.
* Plumb through `IoConcurrency` from high level code to the
individual layer implementations' `get_values_reconstruct_data`.
We piggy-back on the `ValuesReconstructState` for this.
* The sidecar task should be long-lived, so, `IoConcurrency` needs
to be rooted up "high" in the call stack.
* Roots as of this PR:
* `page_service`: outside of pagestream loop
* `create_image_layers`: when it is called
* `basebackup` (only auxfiles + replorigin + SLRU segments)
* Code with no roots that uses `IoConcurrency::sequential`
* any `Timeline::get` call
* `collect_keyspace` is a good example
* follow-up: https://github.com/neondatabase/neon/issues/10460
* `TimelineAdaptor` code used by the compaction simulator, unused in
practice
* `ingest_xlog_dbase_create`
* Transform Delta/Image/InMemoryLayer to
* do their values IO in a distinct `async {}` block
* extend the residence of the Delta/Image layer until the IO is done
* buffer their results in a `oneshot` channel instead of straight
in `ValuesReconstructState`
* the `oneshot` channel is wrapped in `OnDiskValueIo` /
`OnDiskValueIoWaiter`
types that aid in expressiveness and are used to keep track of
in-flight IOs so we can print warnings if we leave them dangling.
* Change `ValuesReconstructState` to hold the receiving end of the
`oneshot` channel aka `OnDiskValueIoWaiter`.
* Change `get_vectored_impl` to `collect_pending_ios` and issue walredo
concurrently, in a `FuturesUnordered`.
Testing / Benchmarking:
* Support queue-depth in pagebench for manual benchmarking.
* Add test suite support for setting concurrency mode ps config
field via a) an env var and b) via NeonEnvBuilder.
* Hacky helper to have sidecar-based IoConcurrency in tests.
This will be cleaned up later.
More benchmarking will happen post-merge in nightly benchmarks, plus in
staging/pre-prod.
Some intermediate helpers for manual benchmarking have been preserved in
https://github.com/neondatabase/neon/pull/10466 and will be landed in
later PRs.
(L0 layer stack generator!)
Drive-By:
* test suite actually didn't enable batching by default because
`config.compatibility_neon_binpath` is always Truthy in our CI
environment
=> https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309
* initial logical size calculation wasn't always polled to completion,
which was
surfaced through the added WARN logs emitted when dropping a
`ValuesReconstructState` that still has inflight IOs.
* remove the timing histograms
`pageserver_getpage_get_reconstruct_data_seconds`
and `pageserver_getpage_reconstruct_seconds` because with planning,
value read
IO, and walredo happening concurrently, one can no longer attribute
latency
to any one of them; we'll revisit this when Vlad's work on
tracing/sampling
through RequestContext lands.
* remove code related to `get_cached_lsn()`.
The logic around this has been dead at runtime for a long time,
ever since the removal of the materialized page cache in #8105.
## Testing
Unit tests use the sidecar task by default and run both modes in CI.
Python regression tests and benchmarks also use the sidecar task by
default.
We'll test more in staging and possibly preprod.
# Future Work
Please refer to the parent epic for the full plan.
The next step will be to fold the plumbing of IoConcurrency
into RequestContext so that the function signatures get cleaned up.
Once `Sequential` isn't used anymore, we can take the next
big leap which is replacing the opaque IOs with structs
that have well-defined semantics.
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
Both these versions are binary compatible, but the way pgvector
structures the SQL files forbids installing 0.7.4 if you have a 0.8.0
distribution. Yet, some users may need a previous version for backward
compatibility, e.g., restoring the dump.
See this thread for discussion
https://neondb.slack.com/archives/C04DGM6SMTM/p1735911490242919?thread_ts=1731343604.259169&cid=C04DGM6SMTM
## Summary of changes
Put `vector--0.7.4.sql` file into compute image to allow installing this
version as well.
Tested on staging and it seems to be working as expected:
```sql
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | (null) | vector data type and ivfflat and hnsw access methods
create extension vector version '0.7.4';
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | 0.7.4 | vector data type and ivfflat and hnsw access methods
alter extension vector update;
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | 0.8.0 | vector data type and ivfflat and hnsw access methods
drop extension vector;
create extension vector;
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | 0.8.0 | vector data type and ivfflat and hnsw access methods
```
If we find out it's a good approach, we can adopt the same for other
extensions with a stable ABI -- support both `current` and `current - 1`
releases.
# Refs
- extracted from https://github.com/neondatabase/neon/pull/9353
# Problem
Before this PR, when task_mgr shutdown is signalled, e.g. during
pageserver shutdown or Tenant shutdown, initial logical size calculation
stops polling and drops the future that represents the calculation.
This is against the current policy that we poll all futures to
completion.
This became apparent during development of concurrent IO which warns if
we drop a `Timeline::get_vectored` future that still has in-flight IOs.
We may revise the policy in the future, but, right now initial logical
size calculation is the only part of the codebase that doesn't adhere to
the policy, so let's fix it.
## Code Changes
- make the calculation sensitive exclusively to `Timeline::cancel`
- This should be sufficient for all cases of shutdowns; the sensitivity
to task_mgr shutdown is unnecessary.
- this broke the various cancel tests in `test_timeline_size.py`, e.g.,
`test_timeline_initial_logical_size_calculation_cancellation`
- the tests would time out because the await point was not sensitive to
cancellation
- to fix this, refactor `pausable_failpoint` so that it accepts a
cancellation token
- side note: we _really_ should write our own failpoint library; maybe
after we get heap-allocated RequestContext, we can plumb failpoints
through there.
## Problem
PR #9993 was supposed to enable `page_service_pipelining` by default for
all `NeonEnv`s, but this was ineffective in our CI environment.
Thus, CI Python-based tests and benchmarks, unless explicitly
configuring pipelining, were still using serial protocol handling.
## Analysis
The root cause was that in our CI environment,
`config.compatibility_neon_binpath` is always Truthy.
It's not in local environments, which is why this slipped through in
local testing.
Lesson: always add a log line to pageserver startup and spot-check tests
to ensure the intended default is picked up.
## Summary of changes
Fix it. Since enough time has passed, the compatibility snapshot contains
a recent enough software version so we don't need to worry about
`compatibility_neon_binpath` anymore.
## Future Work
The question of how to add a new default except for compatibility tests,
which is what the broken code was supposed to do, is still unsolved.
Slack discussion:
https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309
## Problem
Currently, the layer flush loop will continue flushing layers as long as
any are pending, and only notify waiters once there are no further
layers to flush. This can cause waiters to wait longer than necessary,
and potentially starve them if pending layers keep arriving faster than
they can be flushed. The impact of this will increase when we add
compaction backpressure and propagate it up into the WAL receiver.
Extracted from #10405.
## Summary of changes
Break out of the layer flush loop once we've flushed up to the requested
LSN. If further flush requests have arrived in the meanwhile, flushing
will resume immediately after.
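A minimal sketch of the changed loop condition (with `u64` standing in for the real `Lsn` type and a closure standing in for the actual flush of one frozen layer):
```rust
/// Sketch only: flush until we have reached the LSN waiters asked for, then stop.
fn flush_up_to(requested_lsn: u64, mut flush_one_layer: impl FnMut() -> u64) -> u64 {
    let mut disk_consistent_lsn = 0;
    while disk_consistent_lsn < requested_lsn {
        disk_consistent_lsn = flush_one_layer();
    }
    // Waiters for `requested_lsn` are notified here; flush requests that arrived
    // in the meanwhile simply re-enter the loop instead of being drained now.
    disk_consistent_lsn
}
```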
## Problem
For compaction backpressure, we need a mechanism to signal when
compaction has reduced the L0 delta layer count below the backpressure
threshold.
Extracted from #10405.
## Summary of changes
Add `LayerMap::watch_level0_deltas()` which returns a
`tokio::sync::watch::Receiver` signalling the current L0 delta layer
count.
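A small sketch of the primitive (simplified; the real method lives on `LayerMap` and the surrounding names are illustrative):
```rust
use tokio::sync::watch;

struct Level0Counter {
    tx: watch::Sender<usize>,
}

impl Level0Counter {
    fn new() -> Self {
        Self { tx: watch::channel(0).0 }
    }

    /// Called whenever the set of L0 delta layers changes.
    fn set(&self, l0_count: usize) {
        let _ = self.tx.send(l0_count);
    }

    /// Analogous to `LayerMap::watch_level0_deltas()`.
    fn watch(&self) -> watch::Receiver<usize> {
        self.tx.subscribe()
    }
}

// A stalled flush can then wait for compaction to catch up, e.g.:
//   let mut rx = counter.watch();
//   rx.wait_for(|&n| n < stall_threshold).await?;
```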
## Problem
It's sometimes useful to obtain the elapsed duration from a
`StorageTimeMetricsTimer` for purposes beyond just recording it in
metrics (e.g. to log it).
Extracted from #10405.
## Summary of changes
Add `StorageTimeMetricsTimer.elapsed()` and return the duration from
`stop_and_record()`.
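Roughly (a sketch with the metrics recording elided; the struct name is shortened for illustration):
```rust
use std::time::{Duration, Instant};

struct TimerSketch {
    start: Instant,
}

impl TimerSketch {
    /// New: expose the elapsed duration directly, e.g. for logging.
    fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }

    /// `stop_and_record()` now also returns the duration it recorded.
    fn stop_and_record(self) -> Duration {
        let elapsed = self.elapsed();
        // ... record `elapsed` into the storage-time metrics here ...
        elapsed
    }
}
```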
## Problem
part of https://github.com/neondatabase/neon/issues/9114
The automatic trigger is already implemented at
https://github.com/neondatabase/neon/pull/10221 but I need to write some
tests and finish my experiments in staging before I can merge it with
confidence. Given that I have some other patches that will modify the
config items, I'd like to get the config items merged first to reduce
conflicts.
## Summary of changes
* add `l2_lsn` to index_part.json -- below that LSN, data have been
processed by gc-compaction
* add a set of gc-compaction auto trigger control items into the config
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We are removing the `pg_anon` v1 extension from Neon. So we don't need
to test it anymore and can remove the code for simplicity.
## Summary of changes
The code required for testing `pg_anon` is removed.
We did not have any tests for the `fast_import` binary yet.
In this PR I have introduced:
- `FastImport` class and tools for testing in python
- basic test that runs fast import against vanilla postgres and checks
that data is there
Should be merged after https://github.com/neondatabase/neon/pull/10251
We currently have some flakiness in
`test_scrubber_physical_gc_ancestors`, see #10391.
The first flakiness kind is about the reconciler not actually becoming
idle within the timeout of 30 seconds. We see continuous forward
progress so this is likely not a hang. We also see this happen in
parallel to a test failure, so is likely due to runners being
overloaded. Therefore, we increase the timeout.
The second flakiness kind is an assertion failure. This one is a little
bit more tricky, but we saw in the successful run that there was some
advance of the LSN between when the compaction ran (which created layer
files) and when gc ran. Apparently gc rejects reductions to the single
image layer setting if the cutoff lsn is the same as the lsn of the
image layer: it will claim that that layer is newer than the space
cutoff and therefore skip it, while thinking the old layer (that we want
to delete) is the latest one (so it's not deleted).
We address the second flakiness kind by inserting a tiny amount of WAL
between the compaction and gc. This should hopefully fix things.
Related issue: #10391
(not closing it with the merger of the PR as we'll need to validate that
these changes had the intended effect).
Thanks to Chi for going over this together with me in a call.
## Problem
When releasing `release-7574`, the Github Release creation failed with
"body is too long" (see
https://github.com/neondatabase/neon/actions/runs/12834025431/job/35792346745#step:5:77).
There's lots of room for improvement of the release notes, but for now
we'll disable them instead.
## Summary of changes
- Disable automatic generation of release notes for Github releases
- Enable creation of Github releases for proxy/compute
This simplifies the code in `pageserver_physical_gc` a little bit after
the feedback in #10007 that the code is too complicated.
Most importantly, we don't pass around `GcSummary` any more in a
complicated fashion, and we save on async stream-combinator-inception in
one place in favour of `try_stream!{}`.
Follow-up of #10007
Otherwise we might hit ERRORs in otherwise safe situations (such as user
queries), which isn't a great user experience.
## Problem
https://github.com/neondatabase/neon/pull/10376
## Summary of changes
Instead of treating internal errors as acceptable, we ensure we don't
exceed our allocated usage.
## Refs
- fixes https://github.com/neondatabase/neon/issues/10444
## Problem
We're seeing a panic `handles are only shut down once in their lifetime`
in our performance testbed.
## Hypothesis
Annotated code in
https://github.com/neondatabase/neon/issues/10444#issuecomment-2602286415.
```
T1: drop Cache, executes up to (1)
=> HandleInner is now in state ShutDown
T2: Timeline::shutdown => PerTimelineState::shutdown executes shutdown() again => panics
```
Likely this snuck in during the final touches of #10386, where I narrowed
down the locking rules.
## Summary of changes
Make duplicate shutdowns a no-op.
## Summary
Whereas currently we send all WAL to all pageserver shards, and each
shard filters out the data that it does not need,
in this RFC we add a mechanism to filter the WAL on the safekeeper, so
that each shard receives
only the data it needs.
This will place some extra CPU load on the safekeepers, in exchange for
reducing the network bandwidth
for ingesting WAL back to scaling as O(1) with shard count, rather than
O(N_shards).
Touches #9329.
---------
Co-authored-by: Vlad Lazar <vlalazar.vlad@gmail.com>
Co-authored-by: Vlad Lazar <vlad@neon.tech>
The other tests focus on the external interface usage, while the tests
added in this PR add some testing around HandleInner's lifecycle,
ensuring we don't leak it once either connection gets dropped or
per-timeline-state is shut down explicitly.
It was requested by review in #10305 to use an enum or something like it
for distinguishing the different modes instead of two parameters,
because two flags allow four combinations, and two of them don't really
make sense/ aren't used.
follow-up of #10305
## Problem
Since #9916 , the chaos code is actively fighting the optimizer: tenants
tend to be attached in their preferred AZ, so most chaos migrations were
moving them to a non-preferred AZ.
## Summary of changes
- When picking migrations, prefer to migrate things _toward_ their
preferred AZ when possible. Then pick shards to move the other way when
necessary.
The resulting behavior should be an alternating "back and forth" where
the chaos code migrates things away from home, and then migrates them
back on the next iteration.
The side effect will be that the chaos code actively helps to push
things into their home AZ. That's not contrary to its purpose though: we
mainly just want it to continuously migrate things to exercise
migration+notification code.
## Problem
Occasionally, we encounter bugs in test environments that can be
detected at the point of uploading an index, but we proceed to upload it
anyway and leave a tenant in a broken state that's awkward to handle.
## Summary of changes
- Validate index when submitting it for upload, so that we can see the
issue quickly e.g. in an API invoking compaction
- Validate index before executing the upload, so that we have a hard
enforcement that any code path that tries to upload an index will not
overwrite a valid index with an invalid one.
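A hedged sketch of the two validation points (the types and the `validate()` method are stand-ins, not the real API):
```rust
struct IndexPart { /* layer references, disk_consistent_lsn, ... */ }

impl IndexPart {
    fn validate(&self) -> Result<(), String> {
        // e.g. check that the referenced layers and metadata are consistent
        Ok(())
    }
}

fn schedule_index_upload(index: &IndexPart) -> Result<(), String> {
    // Fail fast so e.g. an API-triggered compaction surfaces the problem immediately.
    index.validate()?;
    // ... enqueue the upload ...
    Ok(())
}

fn execute_index_upload(index: &IndexPart) -> Result<(), String> {
    // Hard enforcement: never overwrite a valid remote index with an invalid one.
    index.validate()?;
    // ... perform the actual upload ...
    Ok(())
}
```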
Add an endpoint to obtain the utilization of a safekeeper. Future
changes to the storage controller can use this endpoint to find the most
suitable safekeepers for newly created timelines, analogously to how
it's done for pageservers already.
Initially we just want to assign by timeline count, then we can iterate
from there.
Part of https://github.com/neondatabase/neon/issues/9011
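A sketch of what the utilization payload might look like initially, given assignment by timeline count (field names are illustrative):
```rust
use serde::Serialize;

#[derive(Serialize)]
struct SafekeeperUtilization {
    timeline_count: u64,
}

// The storage controller can then place new timelines on the safekeepers with
// the lowest `timeline_count`, analogous to how pageserver utilization already
// feeds scheduling.
```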
## Problem
871e8b325f failed CI on main because a job
ran too soon. This was caused by
ea84ec357f. While `promote-images-dev`
does not inherently need `neon-image`, a few jobs depending on
`promote-images-dev` do need it, and previously had it when it was
`promote-images`, which depended on `test-images`, which in turn
depended on `neon-image`.
## Summary of changes
To ensure jobs depending on `docker.io/neondatabase/neon` images get them,
`promote-images-dev` gets the dependency to `neon-image` back which it
previously had transitively through `test-images`.
Instead of generating our own request ID, we can just use the one
provided by the control plane. In the event we get a request from a
client which doesn't set X-Request-ID, we just generate one, which
is useful for tracing purposes.
Signed-off-by: Tristan Partin <tristan@neon.tech>
# Refs
- fixes https://github.com/neondatabase/neon/issues/10309
- fixup of batching design, first introduced in
https://github.com/neondatabase/neon/pull/9851
- refinement of https://github.com/neondatabase/neon/pull/8339
# Problem
`Tenant::shutdown` was occasionally taking many minutes (sometimes up to
20) in staging and prod if
`page_service_pipelining.mode="concurrent-futures"` is enabled.
# Symptoms
The issue happens during shard migration between pageservers.
There is page_service unavailability and hence effectively downtime for
customers in the following case:
1. The source (state `AttachedStale`) gets stuck in `Tenant::shutdown`,
waiting for the gate to close.
2. Cplane/Storcon decides to transition the target `AttachedMulti` to
`AttachedSingle`.
3. That transition comes with a bump of the generation number, causing
the `PUT .../location_config` endpoint to do a full `Tenant::shutdown` /
`Tenant::attach` cycle for the target location.
4. That `Tenant::shutdown` on the target gets stuck, waiting for the
gate to close.
5. Eventually the gate closes (`close completed`), correlating with a
`page_service` connection handler logging that it's exiting because of a
network error (`Connection reset by peer` or `Broken pipe`).
While in (4):
- `Tenant::shutdown` is stuck waiting for all `Timeline::shutdown` calls
to complete.
So, really, this is a `Timeline::shutdown` bug.
- retries from Cplane/Storcon to complete above state transitions, fail
with errors related to the tenant mgr slot being in state
`TenantSlot::InProgress`, the tenant state being
`TenantState::Stopping`, and the timelines being in
`TimelineState::Stopping`, and the `Timeline::cancel` being cancelled.
- Existing (and/or new?) page_service connections log errors `error
reading relation or page version: Not found: Timed out waiting 30s for
tenant active state. Latest state: None`
# Root-Cause
After a lengthy investigation ([internal
write-up](https://www.notion.so/neondatabase/2025-01-09-batching-deadlock-Slow-Log-Analysis-in-Staging-176f189e00478050bc21c1a072157ca4?pvs=4))
I arrived at the following root cause.
The `spsc_fold` channel (`batch_tx`/`batch_rx`) that connects the
Batcher and Executor stages of the pipelined mode was storing a `Handle`
and thus `GateGuard` of the Timeline that was not shutting down.
The design assumption with pipelining was that this would always be a
short transient state.
However, that was incorrect: the Executor was stuck on writing/flushing
an earlier response into the connection to the client, i.e., socket
write being slow because of TCP backpressure.
The probable scenario of how we end up in that case:
1. Compute backend process sends a continuous stream of getpage prefetch
requests into the connection, but never reads the responses (why this
happens: see Appendix section).
2. Batch N is processed by Batcher and Executor, up to the point where
Executor starts flushing the response.
3. Batch N+1 is processed by Batcher and queued in the `spsc_fold`.
4. Executor is still waiting for batch N flush to finish.
5. Batcher eventually hits the `TimeoutReader` error (10min).
From here on it waits on the
`spsc_fold.send(Err(QueryError(TimeoutReader_error)))`
which will never finish because the batch already inside the `spsc_fold`
is not
being read by the Executor, because the Executor is still stuck in the
flush.
(This state is not observable at our default `info` log level)
6. Eventually, Compute backend process is killed (`close()` on the
socket) or Compute as a whole gets killed (probably no clean TCP
shutdown happening in that case).
7. Eventually, Pageserver TCP stack learns about (6) through RST packets
and the Executor's flush() call fails with an error.
8. The Executor exits, dropping `cancel_batcher` and its end of the
spsc_fold.
This wakes Batcher, causing the `spsc_fold.send` to fail.
Batcher exits.
The pipeline shuts down as intended.
We return from `process_query` and log the `Connection reset by peer` or
`Broken pipe` error.
The following diagram visualizes the wait-for graph at (5)
```mermaid
flowchart TD
Batcher --spsc_fold.send(TimeoutReader_error)--> Executor
Executor --flush batch N responses--> socket.write_end
socket.write_end --wait for TCP window to move forward--> Compute
```
# Analysis
By holding the GateGuard inside the `spsc_fold` open, the pipelining
implementation
violated the principle established in
(https://github.com/neondatabase/neon/pull/8339).
That is, that `Handle`s must only be held across an await point if that
await point
is sensitive to the `<Handle as Deref<Target=Timeline>>::cancel` token.
In this case, we were holding the Handle inside the `spsc_fold` while
awaiting the
`pgb_writer.flush()` future.
One may jump to the conclusion that we should simply peek into the
spsc_fold to get
that Timeline cancel token and be sensitive to it during flush, then.
But that violates another principle of the design from
https://github.com/neondatabase/neon/pull/8339.
That is, that the page_service connection lifecycle and the Timeline
lifecycles must be completely decoupled.
It must be possible to shut down one shard without shutting down the
page_service connection, because on that single connection we might be
serving other shards attached to this pageserver.
(The current compute client opens separate connections per shard, but,
there are plans to change that.)
# Solution
This PR adds a `handle::WeakHandle` struct that does _not_ hold the
timeline gate open.
It must be `upgrade()`d to get a `handle::Handle`.
That `handle::Handle` _does_ hold the timeline gate open.
The batch queued inside the `spsc_fold` only holds a `WeakHandle`.
We only upgrade it while calling into the various `handle_` methods,
i.e., while interacting with the `Timeline` via `<Handle as
Deref<Target=Timeline>>`.
All that code has always been required to be (and is!) sensitive to
`Timeline::cancel`, and therefore we're guaranteed to bail from it
quickly when `Timeline::shutdown` starts.
We will drop the `Handle` immediately, before we start
`pgb_writer.flush()`ing the responses.
Thereby letting go of our hold on the `GateGuard`, allowing the timeline
shutdown to complete while the page_service handler remains intact.
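A greatly simplified sketch of the `WeakHandle`/`Handle` split (gate guards, cache types, and the mutexed `HandleInner` states omitted):
```rust
use std::sync::{Arc, Weak};

struct Timeline { /* gate, cancellation token, ... */ }

/// Held inside the queued batch: does NOT keep the timeline gate open.
struct WeakHandle {
    timeline: Weak<Timeline>,
}

/// Held only while a `handle_*` method runs; in the real code this also holds
/// an `Arc<GateGuard>`, keeping the timeline gate open for that duration.
struct Handle {
    timeline: Arc<Timeline>,
}

impl WeakHandle {
    fn upgrade(&self) -> Option<Handle> {
        self.timeline.upgrade().map(|timeline| Handle { timeline })
    }
}

impl Handle {
    fn downgrade(&self) -> WeakHandle {
        WeakHandle { timeline: Arc::downgrade(&self.timeline) }
    }
}
```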
# Code Changes
* Reproducer & Regression Test
* Developed and proven to reproduce the issue in
https://github.com/neondatabase/neon/pull/10399
* Add a `Test` message to the pagestream protocol (`cfg(feature =
"testing")`).
* Drive-by minimal improvement to the parsing code, we now have a
`PagestreamFeMessageTag`.
* Refactor `pageserver/client` to allow sending and receiving
`page_service` requests independently.
* Add a Rust helper binary to produce situation (4) from above
* Rationale: (4) and (5) are the same bug class, we're holding a gate
open while `flush()`ing.
* Add a Python regression test that uses the helper binary to
demonstrate the problem.
* Fix
* Introduce and use `WeakHandle` as explained earlier.
* Replace the `shut_down` atomic with two enum states for `HandleInner`,
wrapped in a `Mutex`.
* To make `WeakHandle::upgrade()` and `Handle::downgrade()`
cache-efficient:
* Wrap the `Types::Timeline` in an `Arc`
* Wrap the `GateGuard` in an `Arc`
* The separate `Arc`s enable uncontended cloning of the timeline
reference in `upgrade()` and `downgrade()`.
If instead we cloned a shared `Arc<Timeline>`, different connection handlers
would be hitting the same cache line on every upgrade()/downgrade(),
causing contention.
* Please read the updated module-level comment in `mod handle` for
details.
# Testing & Performance
The reproducer test that failed before the changes now passes, and
obviously other tests are passing as well.
We'll do more testing in staging, where the issue happens every ~4h if
chaos migrations are enabled in storcon.
Existing perf testing will be sufficient, no perf degradation is
expected.
It's a few more allocations due to the added Arcs, but they're low
frequency.
# Appendix: Why Compute Sometimes Doesn't Read Responses
Remember, the whole problem surfaced because flush() was slow because
Compute was not reading responses. Why is that?
In short, the way the compute works, it only advances the page_service
protocol processing when it has an interest in data, i.e., when the
pagestore smgr is called to return pages.
Thus, if compute issues a bunch of requests as part of prefetch but then
it turns out it can service the query without reading those pages, it
may very well happen that these messages stay in the TCP until the next
smgr read happens, either in that session, or possibly in another
session.
If there are too many unread responses in the TCP connection, the kernel on
the Pageserver side is going to backpressure into userspace, resulting in
our stuck flush().
All of this stems from the way vanilla Postgres does prefetching and
"async IO":
it issues `fadvise()` to make the kernel do the IO in the background,
buffering results in the kernel page cache.
It then consumes the results through synchronous `read()` system calls,
which hopefully will be fast because of the `fadvise()`.
If it turns out that some / all of the prefetch results are not needed,
Postgres will not be issuing those `read()` system calls.
The kernel will eventually react to that by reusing page cache pages
that hold completed prefetched data.
Uncompleted prefetch requests may or may not be processed -- it's up to
the kernel.
In Neon, the smgr + Pageserver together take on the role of the kernel
in above paragraphs.
In the current implementation, all prefetches are sent as GetPage
requests to Pageserver.
The responses are only processed in the places where vanilla Postgres
would do the synchronous `read()` system call.
If we never get to that, the responses are queued inside the TCP
connection, which, once buffers run full, will backpressure into
Pageserver's sending code, i.e., the `pgb_writer.flush()` that was the
root cause of the problems we're fixing in this PR.
The extension now supports Postgres 17. The release also seems to be
binary compatible with the previous version.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
`test_storage_controller_node_deletion` sometimes failed because shards
were moving around during timeline creation, and neon_local isn't
tolerant of that. The movements were unexpected because the shards had
only just been created.
This was a regression from #9916.
Closes: #10383
## Summary of changes
- Make this test use multiple AZs -- this makes the storage controller's
scheduling reliably stable
Why this works: in #9916 , I made a simplifying assumption that we would
have multiple AZs to get nice stable scheduling -- it's much easier,
because each tenant has a well defined primary+secondary location when
they have an AZ preference and nodes have different AZs. Everything
still works if you don't have multiple AZs, but you just have this quirk
that sometimes the optimizer can disagree with initial scheduling, so
once in a while a shard moves after being created -- annoying for tests,
harmless IRL.
## Problem
All pageservers have the same application name, which makes it hard to
distinguish them.
## Summary of changes
Include the node id in the application name sent to the safekeeper. This
should give us
more visibility in logs. There are a few metrics whose cardinality will
increase by `pageserver_count`,
but that's fine.
## Problem
Node fills were limited to moving (total shards / node_count) shards. In
systems that aren't perfectly balanced already, that leads us to skip
migrating some of the shards that belong on this node, generating work
for the optimizer later to gradually move them back.
## Summary of changes
- Where a shard has a preferred AZ and is currently attached outside
this AZ, then always promote it during fill, irrespective of target fill
count
## Problem
We were comparing serialized configs from the database with serialized
configs from memory. If fields have been added/removed to TenantConfig,
this generates spurious consistency errors. This is fine in test
environments, but limits the usefulness of this debug API in the field.
Closes: https://github.com/neondatabase/neon/issues/10369
## Summary of changes
- Do a decode/encode cycle on the config before comparing it, so that it
will have exactly the expected fields.
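The normalization step, as a small sketch (round-tripping the persisted JSON through the in-memory config type with `serde_json`):
```rust
use serde::{de::DeserializeOwned, Serialize};
use serde_json::Value;

/// Sketch: round-trip persisted JSON through the current config type so that
/// fields added or removed since the row was written don't show up as diffs.
fn normalize<T: Serialize + DeserializeOwned>(persisted: &Value) -> serde_json::Result<Value> {
    let typed: T = serde_json::from_value(persisted.clone())?;
    serde_json::to_value(&typed)
}

// The consistency check then becomes, roughly:
//   normalize::<TenantConfig>(&db_json)? == serde_json::to_value(&in_memory_config)?
```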
## Problem
gc-compaction needs the partitioning data to decide the job split. This
refactor allows concurrent access/computing the partitioning.
## Summary of changes
Make `partitioning` an ArcSwap so that others can access the
partitioning while we compute it. Fully eliminate the `repartition is
called concurrently` warning when gc-compaction is going on.
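As a rough sketch of the pattern with the `arc-swap` crate (the real type holds the repartition output; names here are hypothetical):
```rust
use std::sync::Arc;
use arc_swap::ArcSwap;

// Hypothetical partitioning result; the real one is the repartition output.
#[derive(Default)]
struct Partitioning {
    ranges: Vec<(u64, u64)>,
}

struct TimelineState {
    partitioning: ArcSwap<Partitioning>,
}

impl TimelineState {
    fn new() -> Self {
        Self { partitioning: ArcSwap::from_pointee(Partitioning::default()) }
    }

    // Readers (e.g. gc-compaction planning a job split) grab a consistent
    // snapshot without blocking the writer.
    fn current(&self) -> Arc<Partitioning> {
        self.partitioning.load_full()
    }

    // The repartition task computes a new partitioning off to the side and
    // atomically swaps it in when done.
    fn publish(&self, new: Partitioning) {
        self.partitioning.store(Arc::new(new));
    }
}
```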
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Rename the safekeeper scheduling policy "disabled" to "pause".
A rename was requested in
https://github.com/neondatabase/neon/pull/10400#discussion_r1916259124,
as the "disabled" policy is meant to be analogous to the "pause" policy
for pageservers.
Also simplify the `SkSchedulingPolicyArg::from_str` function, relying on
the `from_str` implementation of `SkSchedulingPolicy`. The latter is used
for the database format as well, so it is quite stable. If we ever want
to change the UI, we'll need to duplicate the function again, but this is
cheap.
## Problem
Threads spawned in `test_tenant_delete_races_timeline_creation` are not
joined before the test ends, and can generate
`PytestUnhandledThreadExceptionWarning` in other tests.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10419/12805365523/index.html#/testresult/53a72568acd04dbd
## Summary of changes
- Wrap threads in ThreadPoolExecutor which will join them before the
test ends
- Remove a spurious deletion call -- the background thread doing
deletion ought to succeed.
## Problem
When multiple changes are grouped in a merge group to be merged as part
of the merge queue, the changes might individually pass
`check-codestyle-rust` but not in their combined form.
## Summary of changes
- Move `check-codestyle-rust` into a reusable workflow that is called
from its previous location in `build_and_test.yml`, and additionally
call it from `pre_merge_checks.yml`. The additional call does not run on
ARM, only x86, to ensure the merge queue continues being responsive.
- Trigger `pre_merge_checks.yml` on PRs that change any of the workflows
running in `pre_merge_checks.yml`, so that we get feedback on those
early and not only after trying to merge those changes.
This should fix the largest source of flakiness of
`test_nbtree_pagesplit_cycleid`.
## Problem
https://github.com/neondatabase/neon/issues/10390
## Summary of changes
By using a guaranteed-flushed LSN, we ensure that PS won't have to wait
forever.
(If it does wait forever, we know the issue can't be with Compute's WAL)
## Problem
As part of https://github.com/neondatabase/neon/issues/8614 we need to
pass options to START_WAL_PUSH.
## Summary of changes
Add two options. `allow_timeline_creation` (default true) controls
implicit timeline creation in the connection from compute; setting it to
false disables it. Eventually such creation will be forbidden completely,
but as we migrate to configurations we need to support both the current
mode and the configurations-enabled mode where creation by compute is
disabled.
`proto_version` specifies the compute <-> sk protocol version. We
currently have it in the first greeting message as well, but I plan to
change the tag size from u64 to u8, which would make it hard to use
there. The command is a more appropriate place for it anyway.
This reduces pressure on the OS TCP read buffer by increasing how often
we read data out of the receive buffer, and by increasing the number of
bytes we can pull from that buffer on each read.
## Problem
A backend may not always consume its prefetch data quickly enough.
## Summary of changes
We add a new function `prefetch_pump_state` which pulls as many prefetch
requests from the OS TCP receive buffer as possible, but without
blocking.
This reduces pressure on OS-level TCP buffers, increasing throughput by
limiting the throttling caused by full TCP buffers.
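The actual extension code is C; purely as a sketch of the idea in Rust, a non-blocking drain loop over a plain `TcpStream` could look like this (buffer size and names are arbitrary):
```rust
use std::io::{ErrorKind, Read};
use std::net::TcpStream;

// Pull whatever bytes the kernel already has in the receive buffer, without
// ever blocking; stop as soon as a read would have to wait.
fn pump_receive_buffer(stream: &mut TcpStream, sink: &mut Vec<u8>) -> std::io::Result<()> {
    stream.set_nonblocking(true)?;
    let mut buf = [0u8; 16 * 1024];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => break,                                       // peer closed the connection
            Ok(n) => sink.extend_from_slice(&buf[..n]),           // got some prefetch bytes
            Err(e) if e.kind() == ErrorKind::WouldBlock => break, // nothing more is ready
            Err(e) => {
                stream.set_nonblocking(false)?;
                return Err(e);
            }
        }
    }
    stream.set_nonblocking(false)?;
    Ok(())
}
```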
## Problem
part of https://github.com/neondatabase/neon/issues/9114
part of investigation of
https://github.com/neondatabase/neon/issues/10049
## Summary of changes
* If `cfg!(test)` or `cfg!(feature = "testing")` is set, then we will always
try generating an image to ensure the history is replayable, but not put the
image layer into the final layer results, thereby discovering a broken key
history before we hit a read error.
* I suspect it's easier to trigger some races if gc-compaction is
continuously run on a timeline, so I increased the frequency to twice
per 10 churns.
* Also, create branches in gc-compaction smoke tests to get more test
coverage.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad@neon.tech>
## Problem
`postgres` is a system database at Neon, so we need to do `pg_restore`
into `neondb` instead.
https://github.com/neondatabase/cloud/issues/22100
## Summary of changes
Changed fast_import a little bit:
1. After a successful connection, create `neondb` in the postgres instance.
2. Changed the restore connstring to use the new db.
3. Added an optional `source_connection_string`, which allows skipping
`s3_prefix` and just connecting directly.
4. Added `-i`, which stops the process until SIGTERM.
## TODO
- [x] test image in cplane e2e
- [ ] Change import job image back to latest after this merged (partial
revert of https://github.com/neondatabase/cloud/pull/22338)
## Problem
When a pageserver is receiving high rates of requests, we don't have a
good way to efficiently discover what the client's access pattern is.
Closes: https://github.com/neondatabase/neon/issues/10275
## Summary of changes
- Add
`/v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=...`
API, which returns a binary buffer.
- Add `pagectl page-trace` tool to decode and analyze the output.
---------
Co-authored-by: Erik Grinaker <erik@neon.tech>
Implementing the last missing endpoint of #9981, this adds support to
set the scheduling policy of an individual safekeeper, as specified in
the RFC. However, unlike in the RFC, we call the endpoint
`scheduling_policy`, not `status`.
Closes #9981.
As for why not use the upsert endpoint for this: we want to have the
safekeeper upsert endpoint be used for testing and for deploying new
safekeepers, but not for changes of the scheduling policy. We don't want
to change any of the other fields when marking a safekeeper as
decommissioned for example, so we'd have to first fetch them only to
then specify them again. Of course one can also design an endpoint where
one can omit any field and it doesn't get modified, but it's still not
great for observability to put everything into one big "change something
about this safekeeper" endpoint.
## Problem
Safekeepers currently decode and interpret WAL for each shard
separately.
This is wasteful in terms of CPU and memory usage - we've seen this in
profiles.
## Summary of changes
Fan-out interpreted WAL to multiple shards.
The basic idea is that WAL decoding and interpretation happen in a
separate tokio task and senders attach to it. Senders only receive
batches concerning their shard and only past the LSN they've last seen.
Fan-out is gated behind the `wal_reader_fanout` safekeeper flag
(disabled by default for now).
When fan-out is enabled, it might be desirable to control the absolute
delta between the
current position and a new shard's desired position (i.e. how far behind
or ahead a shard may be).
`max_delta_for_fanout` is a new optional safekeeper flag which dictates
whether to create a new
WAL reader or attach to the existing one. By default, this behaviour is
disabled. Let's consider enabling
it if we spot the need for it in the field.
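A rough sketch of the shape of the fan-out (not the real implementation; the types, field names, and the use of a tokio broadcast channel are illustrative assumptions):
```rust
use tokio::sync::broadcast;

// Illustrative batch of interpreted WAL for one shard.
#[derive(Clone)]
struct InterpretedBatch {
    shard_id: u32,
    end_lsn: u64,
    records: Vec<u8>,
}

async fn decode_next_batch() -> InterpretedBatch {
    unimplemented!("stand-in for WAL decoding + interpretation")
}

// One task per timeline decodes and interprets WAL once, fanning batches
// out to every attached sender.
async fn wal_reader(tx: broadcast::Sender<InterpretedBatch>) {
    loop {
        let batch = decode_next_batch().await;
        // An error just means no sender is currently attached.
        let _ = tx.send(batch);
    }
}

// Each shard's sender attaches to the shared reader and only forwards
// batches for its shard that are past the LSN it has already sent.
async fn shard_sender(
    mut rx: broadcast::Receiver<InterpretedBatch>,
    shard_id: u32,
    mut last_sent_lsn: u64,
) {
    while let Ok(batch) = rx.recv().await {
        if batch.shard_id != shard_id || batch.end_lsn <= last_sent_lsn {
            continue;
        }
        last_sent_lsn = batch.end_lsn;
        // ... send `batch` to the pageserver for this shard ...
    }
}
```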
## Testing
Tests passed [here](https://github.com/neondatabase/neon/pull/10301)
with wal reader fanout enabled
as of
34f6a71718.
Related: https://github.com/neondatabase/neon/issues/9337
Epic: https://github.com/neondatabase/neon/issues/9329
## Problem
https://github.com/neondatabase/neon/issues/9965
## Summary of changes
Add a safekeeper http endpoint to switch the membership configuration.
Also add it to the python client for tests, and add a simple test itself.
## Problem
Successful `benchmarks` runs don't have enough visibility.
Ref https://neondb.slack.com/archives/C069Z2199DL/p1736868055094539
## Summary of changes
- Report both successful and failed `benchmarks` to Slack
- Update `slackapi/slack-github-action` action
## Problem
Currently, we call `InterpretedWalRecord::from_bytes_filtered`
from each shard. To serve multiple shards at the same time,
the API needs to allow for enquiring about multiple shards.
## Summary of changes
This commit tweaks it in a pretty brute-force way. Naively, we could
just compute the shard for a key, but pre- and post-split shards
may be subscribed at the same time, so doing it efficiently is more
complex.
## Problem
https://github.com/neondatabase/neon/issues/9965
## Summary of changes
Add the safekeeper membership configuration struct itself and store it in
the control file. In passing, also add a creation timestamp to the control
file (there were cases where I wanted it in the past).
Remove the obsolete, unused PersistedPeerInfo struct from the control file
(still keep it in control_file_upgrade.rs to have it in the old upgrade code).
Remove the binary representation of cfile in the roundtrip test.
Updating it is annoying, and we still test the actual roundtrip.
Also add the configuration to the timeline creation http request, currently
used only in one python test. In passing, slightly change the meaning of
LSNs in the request: normally start_lsn is passed (the same as
ancestor_start_lsn in the similar pageserver call), but we allow specifying
a higher commit_lsn for manual intervention if needed. Also, when an LSN is
given, initialize term_history with it.
## Problem
https://github.com/neondatabase/neon/pull/8455 wasn't specific enough on
migration from current situation to enabling generations.
## Summary of changes
Describe the missing parts, including the control plane pushing the
generation to compute, which also defines whether generations are enabled
-- a non-zero value enables them.
## Problem
For large deployments, the `control/v1/tenant` listing API can time out
transmitting a monolithic serialized response.
## Summary of changes
- Add `limit` and `start_after` parameters to listing API
- Update storcon_cli to use these parameters and limit requests to 1000
items at a time
## Problem
With upload queue reordering in #10218, we can easily get into a
situation where multiple index uploads are queued back to back, which
can't be parallelized. This will happen e.g. when multiple layer flushes
enqueue layer/index/layer/index/... and the layers skip the queue and
are uploaded in parallel.
These index uploads will incur serial S3 roundtrip latencies, and may
block later operations.
Touches #10096.
## Summary of changes
When multiple back-to-back index uploads are ready to upload, only
upload the most recent index and drop the rest.
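As a sketch of the coalescing rule (simplified; the real queue operations carry full index metadata):
```rust
use std::collections::VecDeque;

#[derive(Debug)]
enum UploadOp {
    Layer(String), // layer file name
    Index(u64),    // hypothetical: monotonically increasing index version
}

// If the ready-to-upload front of the queue is a run of consecutive index
// uploads, only the newest one needs to reach remote storage; the older
// ones are superseded and can be dropped.
fn coalesce_index_uploads(queue: &mut VecDeque<UploadOp>) {
    while queue.len() >= 2
        && matches!(queue[0], UploadOp::Index(_))
        && matches!(queue[1], UploadOp::Index(_))
    {
        queue.pop_front();
    }
}
```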
## Problem
The upload queue can currently schedule an arbitrary number of tasks.
This can both spawn an unbounded number of Tokio tasks, and also
significantly slow down upload queue scheduling as it's quadratic in
number of operations.
Touches #10096.
## Summary of changes
Limit the number of inprogress tasks to the remote storage upload
concurrency. While this concurrency limit is shared across all tenants,
there's certainly no point in scheduling more than this -- we could even
consider setting the limit lower, but don't for now to avoid
artificially constraining tenants.
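A sketch of the general idea using a semaphore (the real queue limits how many tasks it launches rather than using a semaphore; `upload_concurrency` is an assumed config value):
```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Cap the number of in-flight upload tasks at `upload_concurrency`; further
// operations stay queued until a permit frees up.
async fn run_uploads(ops: Vec<String>, upload_concurrency: usize) {
    let permits = Arc::new(Semaphore::new(upload_concurrency));
    let mut handles = Vec::new();
    for op in ops {
        let permit = permits.clone().acquire_owned().await.expect("semaphore closed");
        handles.push(tokio::spawn(async move {
            println!("uploading {op}");
            // ... perform the remote storage upload for `op` here ...
            drop(permit); // releases the slot for the next queued operation
        }));
    }
    for h in handles {
        let _ = h.await;
    }
}
```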
By setting PATH in the 'pg-build' layer, all the extension build layers
will inherit it. No need to pass PG_CONFIG to all the various make
invocations either: once pg_config is in PATH, the Makefiles will pick
it up from there.
## Problem
The upload queue currently sees significant head-of-line blocking. For
example, index uploads act as upload barriers, and for every layer flush
we schedule a layer and index upload, which effectively serializes layer
uploads.
Resolves #10096.
## Summary of changes
Allow upload queue operations to bypass the queue if they don't conflict
with preceding operations, increasing parallelism.
NB: the upload queue currently schedules an explicit barrier after every
layer flush as well (see #8550). This must be removed to enable
parallelism. This will require a better mechanism for compaction
backpressure, see e.g. #8390 or #5415.
## Problem
Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ
of a tenant is much more impactful, and we would like to make it more
visible in tooling.
## Summary of changes
- Include AZ in node describe API
- Include AZ info in node & tenant outputs in CLI
- Add metrics for per-node shard counts, labelled by AZ
- Add a CLI for setting preferred AZ on a tenant
- Extend AZ-setting API+CLI to handle None for clearing preferred AZ
## Problem
Before this PR, the pagestream throttle was applied weighted on a
per-batch basis.
This had several problems:
1. The throttle occurrence counters were only bumped by `1` instead of
`batch_size`.
2. The throttle wait time aggregator metric only counted one wait time,
irrespective
of `batch_size`. That makes sense in some ways of looking at it but not
in others.
3. If the last request in the batch runs into the throttle, the other
requests in the
batch are also throttled, i.e., over-throttling happens (theoretical,
didn't measure
it in practice).
## Solution
It occurred to me that we can simply push the throttling upwards into
`pagestream_read_message`.
This has the added benefit that in pipeline mode, the `executor` stage
will, if it is idle,
steal whatever requests already made it into the `spsc_fold` and execute
them; before this
change, that was not the case - the throttling happened in the
`executor` stage instead of
the `batcher` stage.
## Code Changes
The changes in this PR:
1. Lifting up the throttling into the `pagestream_read_message` method.
2. Move the throttling metrics out of the `Throttle` type into
`SmgrOpMetrics`.
Unlike the other smgr metrics, throttling is per-tenant, hence the Arc.
3. Refactor the `SmgrOpTimer` implementation to account for the new
observation states,
and simplify its design.
4. Drive-by fix for the flush time metrics. They were using the same `now`
in the `observe_guard` every time.
The `SmgrOpTimer` is now a state machine.
Each observation point moves the state machine forward.
If a timer object is dropped early some "pair"-like metrics still
require an increment or observation.
That's done in the Drop implementation, by driving the state machine to
completion.
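A reduced sketch of that shape (states and method names here are illustrative, not the actual `SmgrOpTimer` API):
```rust
use std::time::Instant;

#[derive(Clone, Copy)]
enum TimerState {
    Started { at: Instant },
    ThrottleDone { at: Instant },
    Finished,
}

struct OpTimer {
    state: TimerState,
}

impl OpTimer {
    fn observe_throttle_done(&mut self) {
        if let TimerState::Started { at } = self.state {
            let _throttle_wait = at.elapsed(); // record the throttle-wait metric here
            self.state = TimerState::ThrottleDone { at: Instant::now() };
        }
    }

    fn observe_finished(&mut self) {
        if let TimerState::ThrottleDone { at } = self.state {
            let _execution = at.elapsed(); // record the execution-latency metric here
            self.state = TimerState::Finished;
        }
    }
}

impl Drop for OpTimer {
    // Dropping the timer early drives the state machine to completion so
    // that paired metrics (e.g. started/finished counters) stay balanced.
    fn drop(&mut self) {
        self.observe_throttle_done();
        self.observe_finished();
    }
}
```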
## Problem
Discovered during the relation dir refactor work.
If we do not create images as in this patch, we would get two sets of
image layers:
```
0000...METADATA_KEYS
0000...REL_KEYS
```
They overlap at the same LSN and would cause data loss for relation
keys. This doesn't happen in prod because initial image layer generation
is never called, but it is better to fix this to avoid future issues with
the reldir refactors.
## Summary of changes
* Consolidate create_image_layers call into a single one.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Because of https://github.com/Azure/azure-sdk-for-rust/issues/1739, our
identity token file was not being refreshed. This caused our uploads to
start failing when the storage token expired.
## Summary of changes
Drop and recreate the remote storage config every time we upload in
order to force reload the identity token file.
## Problem
See https://github.com/neondatabase/neon/issues/10167
A too-small `max_connections` (2) can cause failures of the
test_physical_replication_config_mismatch_too_many_known_xids test.
## Summary of changes
Increase `max_connections` to 5
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We want to do a more robust job of scheduling tenants into their home
AZ: https://github.com/neondatabase/neon/issues/8264.
Closes: https://github.com/neondatabase/neon/issues/8969
## Summary of changes
### Scope
This PR combines prioritizing AZ with a larger rework of how we do
optimisation. The rationale is that just bumping AZ in the order of
Score attributes is a very tiny change: the interesting part is lining
up all the optimisation logic to respect this properly, which means
rewriting it to use the same scores as the scheduler, rather than the
fragile hand-crafted logic that we had before. Separating these changes
out is possible, but would involve doing two rounds of test updates
instead of one.
### Scheduling optimisation
`TenantShard`'s `optimize_attachment` and `optimize_secondary` methods
now both use the scheduler to pick a new "favourite" location. Then
there is some refined logic for whether + how to migrate to it:
- To decide if a new location is sufficiently "better", we generate
scores using some projected ScheduleContexts that exclude the shard
under consideration, so that we avoid migrating from a node with
AffinityScore(2) to a node with AffinityScore(1), only to migrate back
later.
- Score types get a `for_optimization` method so that when we compare
scores, we will only do an optimisation if the scores differ by their
highest-ranking attributes, not just because one pageserver is lower in
utilization. Eventually we _will_ want a mode that does this, but doing
it here would make scheduling logic unstable and harder to test, and to
do this correctly one needs to know the size of the tenant that one is
migrating.
- When we find a new attached location that we would like to move to, we
will create a new secondary location there, even if we already had one
on some other node. This handles the case where we have a home AZ A, and
want to migrate the attachment between pageservers in that AZ while
retaining a secondary location in some other AZ as well.
- A unit test is added for
https://github.com/neondatabase/neon/issues/8969, which is implicitly
fixed by reworking optimisation to use the same scheduling scores as
scheduling.
## Problem
In preparation to https://github.com/neondatabase/neon/issues/9516. We
need to store rel size and directory data in the sparse keyspace, but it
does not support inheritance yet.
## Summary of changes
Add a new type of keyspace "sparse but inherited" into the system.
On the read path: we don't remove the key range when we descend into the
ancestor. The search will stop when (1) the full key range is covered by
image layers (which has already been implemented before), or (2) we
reach the end of the ancestor chain.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Generally, ed25519 nowadays seems to be much preferred over P256 for
cryptographic strength, and it is finally NIST approved. We should use it
where we can, as it's also faster than P256.
This PR makes the re-signed JWTs between local_proxy and pg_session_jwt
use ed25519.
This does introduce a new dependency on ed25519, but I do recall some
Neon Authorise customers asking for support for ed25519, so I am
justifying this dependency addition in the context that we can then
introduce support for customer ed25519 keys.
Sources:
* https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA)
* https://datatracker.ietf.org/doc/html/rfc8037#section-3.1
## Problem
Currently, if we want to move a secondary there isn't a neat way to do
that: we just have migration API for the attached location, and it is
only clean to use that if you've manually created a secondary via
pageserver API in the place you're going to move it to.
Secondary migration API enables:
- Moving the secondary somewhere because we would like to later move the
attached location there.
- Move the secondary location because we just want to reclaim some disk
space from its current location.
## Summary of changes
- Add `/migrate_secondary` API
- Add `tenant-shard-migrate-secondary` CLI
- Add tests for above
## Problem
We would sometimes fail to retry compute notifications:
1. Try and send, set compute_notify_failure if we can't
2. On next reconcile, reconcile() fails for some other reason (e.g.
tried to talk to an offline node), and we fail the `result.is_ok() &&
must_notify` condition around the re-sending.
Closes: https://github.com/neondatabase/cloud/issues/22612
## Summary of changes
- Clarify the meaning of the reconcile result: it should be Ok(()) if
configuring attached location worked, even if secondary or detach
locations cannot be reached.
- Skip trying to talk to secondaries if they're offline
- Even if reconcile fails and we can't send the compute notification (we
can't send it because we're not sure if it's really attached), make sure
we save the `compute_notify_failure` flag so that subsequent reconciler
runs will try again
- Add a regression test for the above
Taking continuous profiles every 20 seconds is likely too expensive (in
dollar terms). Let's try 60-second profiles. We can now interrupt
running profiles via `?force=true`, so this should be fine.
For testing the proxy's websockets support.
I wrote this to test https://github.com/neondatabase/neon/issues/3822.
Unfortunately, that bug can *not* be reproduced with this tunnel. The
bug only appears when the client pipelines the first query with the
authentication messages. The tunnel doesn't do that.
---
Update (@conradludgate 2025-01-10):
We have since added some websocket tests, but they manually implemented
a very simplistic setup of the postgres protocol. Introducing the tunnel
would make more complex testing simpler in the future.
---------
Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
## Problem
Limitations found while using this to investigate
https://github.com/neondatabase/neon/issues/10234:
- If we hit a node consistency issue, we drop out and don't check shards
for consistency
- The messages printed after a shard consistency issue are huge, and
grafana appears to drop them.
## Summary of changes
- Defer node consistency errors until the end of the function, so that
we always proceed to check shards for consistency
- Print out smaller log lines that just point out the diffs between
expected and persistent state
With a new beta build of the rust compiler, it's good to check out the
new lints. Either to find false positives, or find flaws in our code.
Additionally, it helps reduce the effort required to update to 1.85 in 6
weeks.
## Problem
It's only possible to take one CPU profile at a time. With Grafana
continuous profiling, a (low-frequency) CPU profile will always be
running, making it hard to take an ad hoc CPU profile at the same time.
Resolves #10072.
## Summary of changes
Add a `force` parameter for `/profile/cpu` which will end and return an
already running CPU profile, starting a new one for the current caller.
## Problem
Currently, heap profiling samples every 1 MB allocated. Taking
a profile stack trace takes about 1 µs, and allocating 1 MB takes about
15 µs, so the overhead is about 6.7% which is a bit high. This is a
fixed cost regardless of whether heap profiles are actually accessed.
## Summary of changes
Increase the heap profiling sample interval from 1 MB to 2 MB, which
reduces the overhead to about 3.3%. This seems acceptable, considering
performance-sensitive code will avoid allocations as far as possible
anyway.
There used to be some pg version dependencies in these extensions, but
now that there aren't, follow the simpler pattern used in other
extensions. No change in the produced images.
We don't need or want the `active` column. Remove it. Vlad pointed out
that this is safe.
Thanks to the separation of the schemata in earlier PRs, this is easy.
follow-up of #10205
Part of https://github.com/neondatabase/neon/issues/9981
When we moved throttling up from Timeline::get into page_service,
we stopped being sensitive to `Timeline::cancel`, even though we're
holding a Handle and thus a guard on the `Timeline::gate` open.
This PR rectifies the situation.
Refs
- Found while investigating #10309 (hung detach because gate kept open),
but not expected to be the root cause of that issue because the
affected tenants are not being throttled according to their metrics.
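Schematically (not the actual pageserver code; `acquire_throttle_permit` is a stand-in), the fix amounts to racing the throttle wait against the cancellation signal:
```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

// Stand-in for the real throttle wait.
async fn acquire_throttle_permit() {
    tokio::time::sleep(Duration::from_millis(10)).await;
}

// The throttled request gives up promptly when the timeline is cancelled,
// instead of sitting in the throttle while holding the gate open.
async fn throttled_wait(cancel: &CancellationToken) -> Result<(), &'static str> {
    tokio::select! {
        _ = cancel.cancelled() => Err("timeline is shutting down"),
        _ = acquire_throttle_permit() => Ok(()),
    }
}
```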
## Problem
https://github.com/neondatabase/infra/pull/2725 updated the scrubber to
use a storcon endpoint without an explicit port. That breaks when
unwrapping the port.
## Summary of changes
Support both `host:port` and `host` formats for the storcon api.
## Problem
These two tests came up in #9537 as doing multi-gigabyte I/O, and from
inspection of the tests it doesn't seem like they need that to fulfil
their purpose.
## Summary of changes
- In test_local_file_cache_unlink, run fewer background threads with a
smaller number of rows. These background threads AFAICT exist to make
sure some I/O is going on while we unlink the LFC directory, but 5
threads should be enough for "some".
- In test_lfc_resize, tweak the test to validate that the cache size is
larger than the final size before resizing it, so that we're sure we're
writing enough data to really be doing something. Then decrease the
pgbench scale.
## Problem
When poetry v2 (released Jan 5) is used, it needs the `packaging.metadata`
module, but we downgrade `packaging` to 23.0; `packaging==23.1`
introduced the metadata submodule.
## Summary of changes
Update `packaging` to 24.2.
## Problem
This test writes ~5GB of data. It is not suitable to run in parallel
with all the other small tests in test_runner/regress.
via #9537
## Summary of changes
- Move test_parallel_copy into the performance directory, so that it
does not run in parallel with other tests
## Problem
I noticed in https://github.com/neondatabase/neon/pull/9537 that tests
which work with compat snapshots were writing several hundred MB of
data, which isn't really necessary.
Also, the snapshots are large but don't have the proper variety of
storage format features, e.g. they could just have L0 deltas.
## Summary of changes
- Use smaller scale factor and runtime to generate less data
- Configure a small layer size and use force image layer generation so
that our output contains L1 deltas and image layers, and has a decent
number of entries in the layer map
## Problem
Unlike CPU profiles, the `/profile/heap` endpoint can't automatically
generate SVG flamegraphs. This requires the user to install and use
`pprof` tooling, which is unnecessary and annoying.
Resolves #10203.
## Summary of changes
Add `format=svg` for the `/profile/heap` route, and generate an SVG
flamegraph using the `inferno` crate, similarly to what `pprof-rs`
already does for CPU profiles.
# Problem
Before this PR, there were cases where send() in state
SenderWaitsForReceiverToConsume would never be woken up
by the receiver, because it never registered with `wake_sender`.
Example Scenario 1: we stop polling a send() future A that was waiting
for the receiver to consume. We drop A and create a new send() future B.
B would return Poll::Pending and never register a waker.
Example Scenario 2: a send() future A transitions from HasData
to SenderWaitsForReceiverToConsume. This registers the context X
with `wake_sender`. But before the Receiver consumes the data,
we poll A from a different context Y.
The state is still SenderWaitsForReceiverToConsume, but we wouldn't
register the new context with `wake_sender`.
When the Receiver comes around to consume and `wake_sender.notify()`s,
it wakes the old context X instead of Y.
# Fix
Register the waker in the case where we're polled in
state `SenderWaitsForReceiverToConsume`.
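In sketch form (a simplified stand-in for the real spsc_fold types):
```rust
use std::task::{Context, Poll, Waker};

enum State {
    HasData,
    SenderWaitsForReceiverToConsume,
}

struct Inner {
    state: State,
    wake_sender: Option<Waker>,
}

impl Inner {
    fn poll_send(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        match self.state {
            State::SenderWaitsForReceiverToConsume => {
                // The fix: re-register the waker of whatever context is
                // polling us *now*, so the receiver's notify() wakes the
                // current send() future, not a stale one.
                self.wake_sender = Some(cx.waker().clone());
                Poll::Pending
            }
            State::HasData => {
                // ... hand the data over, transition state, register waker ...
                Poll::Ready(())
            }
        }
    }
}
```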
# Relation to #10309
I found this bug while investigating #10309.
There was never proof that this bug here is the root cause for #10309.
In the meantime, we found a more probable hypothesis
for the root cause than what is being fixed here.
Regardless, let's walk through my thought process about
how it might have been relevant:
There (in page_service), Scenario 1 does not apply because
we poll the send() future to completion.
Scenario 2 (`tokio::join!`) also does not apply with the
current `tokio::join!()` impl, because it will just poll each
future every time, each with the same context.
Although if we ever used something like a FuturesUnordered anywhere,
that will be using a different context, so, in that case,
the bug might materialize.
Regarding tokio & spurious polls in general:
@conradludgate is not aware of any spurious wakeup cases in current tokio,
but within a `tokio::join!()`, any wake meant for one future will poll all
the futures, so that can appear as a spurious wake-up to the N-1 other
futures of the `tokio::join!()`.
## Problem
We were incorrectly constructing the ComputeUserInfo, used for
cancellation checks, based on the return parameters from postgres. This
didn't contain the correct info.
## Summary of changes
Propagate down the existing ComputeUserInfo.
## Problem
When the proxy receives a `Notification` with an unknown topic it's
supposed to use the `UnknownTopic` unit variant. Unfortunately, for
adjacently tagged enums serde will not simply ignore the configured
content field if it is present; it tries to deserialize a map/object instead.
## Summary of changes
* Use a custom deserialize function to ignore variant content.
* Add a little unit test covering both cases.
## Problem
Auto-offloading as requested by the compaction task is racy with
unarchival, in that the compaction task might attempt to offload an
unarchived timeline. By that point it will already have set the timeline
to the `Stopping` state however, which makes it unusable for any
purpose. For example:
1. compaction task decides to offload timeline
2. timeline gets unarchived
3. `offload_timeline` gets called by compaction task
* sets timeline's state to `Stopping`
* realizes that the timeline can't be unarchived, errors out
4. the endpoint can't be started as the timeline is `Stopping` and thus
'can't be found'.
A future iteration of the compaction task can't "heal" this state either
as the timeline will still not be archived, same goes for other
automatic stuff. The only way to heal this is a tenant detach+attach, or
alternatively a pageserver restart.
Furthermore, the compaction task is especially prone to such races,
as it first stores `can_offload` into a variable, figures out whether
compaction is needed (which takes some time), and only then does it
attempt an offload operation: the time window between "check" and
"use" is non-trivial.
To make it even worse, we start the compaction task right after attach
of a tenant, and it is a common pattern by pageserver users to attach a
tenant to then immediately unarchive a timeline, so that an endpoint can
be started.
## Solutions not adopted
The simplest solution is to move the `can_offload` check to right before
attempting the offload. But this is not a good solution, as no lock
is held between that check and timeline shutdown. So races would still
be possible, just less likely.
I explored using the timeline state for this, as in adding an additional
enum variant. But `Timeline::set_state` is racy (#10297).
## Adopted solution
We use the lock on the timeline's upload queue as an arbiter: either
unarchival gets to it first and sours the state for auto-offloading, or
auto-offloading shuts it down, which stops any parallel unarchival in
its tracks. The key part is not releasing the upload queue's lock
between the check whether the timeline is archived or not, and shutting
it down (the actual implementation only sets `shutting_down` but it has
the same effect on `initialized_mut()` as a full shutdown). The rest of
the patch is stuff that follows from this.
We also move the part where we set the state to `Stopping` to after that
arbiter has decided the fate of the timeline. For deletions, we do keep
it inside `DeleteTimelineFlow::prepare` however, so that it is called
with all of the locks held that the function acquires (most importantly
the timelines lock). This is only a precautionary measure
however, as I didn't want to analyze deletion related code for possible
races.
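Schematically, the arbiter looks roughly like this (types and fields are simplified stand-ins, not the actual pageserver structures):
```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

#[derive(Default)]
struct UploadQueue {
    shutting_down: bool,
}

struct Timeline {
    is_archived: AtomicBool,
    upload_queue: Mutex<UploadQueue>,
}

// The archived-check and the queue shutdown marker happen under one critical
// section: a concurrent unarchival either finishes before we look, or finds
// the queue already shutting down and fails.
fn try_offload(tl: &Timeline) -> Result<(), &'static str> {
    let mut q = tl.upload_queue.lock().unwrap();
    if !tl.is_archived.load(Ordering::SeqCst) {
        return Err("timeline was unarchived concurrently; skipping offload");
    }
    q.shutting_down = true;
    Ok(())
}
```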
## Future changes
It might make sense to move `can_offload` to right before the offload
attempt. Maybe some other properties might have changed as well.
Although this will not be perfect either as no lock is held. I want to
keep it out of this change to emphasize that this move wasn't the main
reason we are race free now.
Fixes #10220
This is a refactor to create better abstractions related to our
management server. It cleans up the code, and prepares everything for
authorized communication to and from the control plane.
Signed-off-by: Tristan Partin <tristan@neon.tech>
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.
[Release notes](https://releases.rs/docs/1.84.0/).
Prior update was in #9926.
## Problem
In Postgres, one cannot drop a role if it has any dependent objects in
the DB. In `compute_ctl`, we automatically reassign all dependent
objects in every DB to the corresponding DB owner. Yet, it seems that it
doesn't help with some implicit permissions. The issue is reproduced by
installing a `postgis` extension because it creates some views and
tables in the public schema.
## Summary of changes
Added a repro test without using `postgis`: i) create a role via
`compute_ctl` (with `neon_superuser` grant); ii) create a test role, a
table in schema public, and grant permissions via the role in
`neon_superuser`.
To fix the issue, I added a new `compute_ctl` code that removes such
dangling permissions before dropping the role. It's done in the least
invasive way, i.e., only touches the schema public, because i) that's
the problem we had with PostGIS; ii) it creates a smaller chance of
messing anything up and getting a stuck operation again, just for a
different reason.
Properly, any API-based catalog operations should fail gracefully and
provide an actionable error and status code to the control plane,
allowing the latter to unwind the operation and propagate an error
message and hint to the user. In this sense, it's aligned with another
feature request https://github.com/neondatabase/cloud/issues/21611.
Resolves neondatabase/cloud#13582
## Problem
Initially we defaulted this to zero to reduce risk. We have now been
using pooling in staging for some time without issues, so let's make it
the default for anyone using this software without setting the config
explicitly.
Closes: https://github.com/neondatabase/cloud/issues/20971
## Summary of changes
- Set Azure blob storage connection pool size to 8 by default
## Problem
Occasionally we see an unexpected error like:
```
ERROR spawn_heartbeat_driver: Failed to update node state 1 after heartbeat round: Shutting down\n')
Hint: use scripts/check_allowed_errors.sh to test any new allowed_error you add
```
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10324/12690404952/index.html#/testresult/63406a0687bf6eca
## Summary of changes
- Explicitly handle ApiError::ShuttingDown as a no-op when mutating node
status
## Problem
If for some reason we already garbage-collected the data under an LSN
but the caller uses a past LSN for the find_time_cutoff function, we
will report a missing key error and GC will never proceed.
Note that a missing key error can also happen if the key is really missing
(e.g., during the past offload incidents).
## Summary of changes
Make sure GC proceeds by bumping the LSN. When time_cutoff=None, we will
not increase the time_cutoff (it will be set to latest_gc_cutoff). If we
really need to bump the GC LSN for maintenance purpose, we need a
separate API to do that.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We have had several serious data corruption incidents caused by mismatched
getpage requests:
https://neondb.slack.com/archives/C07FJS4QF7V/p1723032720164359
We hope that the problem is fixed now, but it is better to prevent this
kind of problem in the future.
Part of https://github.com/neondatabase/cloud/issues/16472
## Summary of changes
This PR introduces a new V3 version of the compute<->pageserver protocol,
adding a tag to the getpage response.
Now the compute is able to check whether it really got the response for
the requested page.
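A minimal sketch of the idea (field names are hypothetical, not the actual V3 wire format):
```rust
struct GetPageRequest {
    request_id: u64,
    rel: u32,
    blkno: u32,
}

struct GetPageResponse {
    request_id: u64,
    rel: u32,
    blkno: u32,
    page: Vec<u8>,
}

// With the response carrying a tag, the compute can verify that the page it
// received is actually the one it asked for.
fn check_response(req: &GetPageRequest, resp: &GetPageResponse) -> Result<(), String> {
    if resp.request_id != req.request_id || resp.rel != req.rel || resp.blkno != req.blkno {
        return Err(format!(
            "getpage response mismatch: requested rel {} blk {}, got rel {} blk {}",
            req.rel, req.blkno, resp.rel, resp.blkno
        ));
    }
    Ok(())
}
```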
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
The filtered record metric doesn't make sense for interpreted ingest.
## Summary of changes
This patch replaces the filtered record metric (of dubious utility in the
first place) with records-received and records-observed metrics for
interpreted ingest:
* received records cause the pageserver to do _something_: write a key,
value pair to storage, update some metadata or flush pending
modifications
* observed records are a shard 0 concept and contain only key metadata
used in tracking relation sizes (received records include observed
records)
## Problem
We want to define the algorithm for safekeeper membership change.
## Summary of changes
Add spec for it, several models and logs of checking them.
ref https://github.com/neondatabase/neon/issues/8699
## Problem
This was causing the storage controller to still use the neon-built libpq
instead of vanilla libpq.
Since https://github.com/neondatabase/neon/pull/10269 we have a vanilla
postgres in the system path -- anything that wants a postgres library
will use that.
## Summary of changes
- Remove LD_LIBRARY_PATH assignment in Dockerfile
This PR removes CancelClosure's direct dependency on the IP allowlist,
allowing for more scalable and flexible IP restrictions
and enabling the future use of Redis-based CancelMap storage.
Changes:
- Introduce a new BackendAuth async trait that retrieves the IP
allowlist through existing authentication methods;
- Improve cancellation error handling by instrument()ing the async
cancel_session() rather than dropping it.
- Set and store the IP allowlist for SCRAM Proxy to consistently perform
the IP allowance check.
Relates to #9660
## Problem
A project gets stuck if a database with subscriptions was deleted via the
API / UI.
https://github.com/neondatabase/cloud/issues/18646
## Summary of changes
Before dropping the database, drop all the subscriptions in it.
Do not drop slot on publisher, because we have no guarantee that the
slot still exists or that the publisher is reachable.
Add a `DropSubscriptionsForDeletedDatabases` phase to run these operations
in all databases we're about to delete.
Ignore the error if the database does not exist.
## Problem
Typical deployments of neon have some tenants that stay in use
continuously, and a background churning population of tenants that are
created and then fall idle, and are configured to Detached state.
Currently, this churn of short lived tenants results in an
ever-increasing memory footprint.
Closes: https://github.com/neondatabase/neon/issues/9712
## Summary of changes
- At startup, filter to only load shards that don't have Detached policy
- In process_result, check if a tenant's shards are all Detached and
observed=={}, and if so drop them from memory
- In tenant_location_conf and other tenant mutators, load the tenants'
shards on-demand if they are not present
## Problem
The observed state removal may race with the inline updates of the
observed state done from `Service::node_activate_reconcile`.
This was intended to work as follows:
1. Detaches while the node is unavailable remove the entry from the
observed state.
2. `Service::node_activate_reconcile` diffs the locations returned
by the pageserver with the observed state and detaches in-line
when required.
## Summary of changes
This PR removes step (1) and lets background reconciliations
deal with the mismatch between the intent and observed state.
A follow up will attempt to remove `Service::node_activate_reconcile`
altogether.
Closes https://github.com/neondatabase/neon/issues/10253
## Problem
Consider a pageserver doing the following sequence of operations:
* upload X files
* update index_part to add X and remove Y
* delete Y files
When the storage scrubber obtains the initial timeline snapshot before
"update index_part" (that is, the old version that contains Y but not X),
and then obtains the index_part file after it gets updated, it will
report that all Y files are missing.
## Summary of changes
Do not report a layer file as missing if the listed and downloaded
index_part are not the same (i.e., they have different last_modified times).
Signed-off-by: Alex Chi Z <chi@neon.tech>
Closes: https://github.com/neondatabase/cloud/issues/17784
## Problem
Currently, we run the whole CI pipeline for any changes. It's slow and
expensive.
## Suggestion
Starting with macOS builds:
- check what files were changed
- rebuild only needed parts
- reuse results from previous builds when available
- run builds in parallel when possible
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
We currently parse Notification twice even in the happy path.
## Summary of changes
Use `#[serde(other)]` to catch unknown topics and defer the second
parsing.
## Problem
`promote-images` was split into `promote-images-dev` and
`promote-images-prod` in
https://github.com/neondatabase/neon/pull/10267.
`dev` credentials were loaded in `promote-images-dev` and `prod`
credentials were loaded in `promote-images-prod`, but
`promote-images-prod` needs `dev` credentials as well to access the
`dev` images to replicate them from `dev` to `prod`.
## Summary of changes
Load `dev` credentials in `promote-images-prod` as well.
Apparently, we failed to do this bookkeeping in quite a few places...
## Problem
Fixes https://github.com/neondatabase/cloud/issues/22364
## Summary of changes
Add accounting of dropped requests. Note that this includes prefetches
dropped due to things like "PS connection dropped unexpectedly" or
"prefetch queue is already full", but *not* (yet?) "dropped due to
backend shutdown".
## Problem
`trigger-e2e-tests` waits half an hour before starting to run. Nearly
half of that time can be saved by promoting images before tests on them
are complete, so the e2e tests can run in parallel.
On `main` and `release{,-proxy,-compute}`, `promote-images` updates
`latest` and pushes things to prod ecr, so we want to run
`promote-images` only after `test-images` is done, but on other
branches, there is no harm in promoting images that aren't tested yet.
## Summary of changes
To promote images into dev container registries sooner, `promote-images`
is split into `promote-images-dev` and `promote-images-prod`. The former
pushes to dev container registries, the latter to prod ones. The latter
also waits for `test-images`, while the former doesn't. This allows
running `trigger-e2e-tests` sooner.
Using `min(0, ...)` causes us to fail to wait in most situations, so a
lack of data would be a hot wait loop, which is bad.
## Problem
We noticed high CPU usage in some situations
## Problem
On macOS:
```
error: unused variable: `disable_lfc_resizing`
--> compute_tools/src/bin/compute_ctl.rs:431:9
|
431 | disable_lfc_resizing,
| ^^^^^^^^^^^^^^^^^^^^ help: try ignoring the field: `disable_lfc_resizing: _`
|
= note: `-D unused-variables` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unused_variables)]`
```
## Summary of changes
- Initialise `disable_lfc_resizing` only on Linux (because it's used on
Linux only in a further block).
## Problem
It's impossible to run regression tests with Python 3.13 as some
dependencies don't support it (some of them are outdated, and `jsonnet`
doesn't support it at all yet)
## Summary of changes
- Update dependencies for Python 3.13
- Install `jsonnet` only on Python < 3.13 and skip relevant tests on
Python 3.13
Closes #10237
## Problem
close https://github.com/neondatabase/neon/issues/10192
## Summary of changes
* `find_gc_time_cutoff` takes `now` parameter so that all branches
compute the cutoff based on the same start time, avoiding races.
* gc-compaction uses a single `get_gc_compaction_watermark` function to
get the safe LSN to compact.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Frame pointers are typically disabled by default (depending on CPU
architecture), to improve performance. This frees up a CPU register, and
avoids a couple of instructions per function call. However, it makes
stack unwinding much more inefficient, since it has to use DWARF debug
information instead, and gives worse results with e.g. `perf` and eBPF
profiles. The `backtrace` implementation of `libunwind` is also
suspected to cause seg faults.
The performance benefit of frame pointer omission doesn't appear to
matter that much on modern 64-bit CPU architectures (which have plenty
of registers and optimized instruction execution), and benchmarks did
not show measurable overhead.
The Rust standard library and jemalloc already enable frame pointers by
default.
For more information, see
https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html.
Resolves #10224.
Touches #10225.
## Summary of changes
Enable frame pointers in all builds, and use frame pointers for pprof-rs
stack sampling.
## Problem
Before the holidays, and just before our code freeze, a change to cplane
was made that started publishing the topics from #10197. This triggered
our alerts and put us in a sticky situation as it was not an error, and
we didn't want to silence the alert for the entire holidays, and we
didn't want to release proxy 2 days in a row if it was not essential.
We fixed it eventually by rewriting the alert based on logs, but this is
not a good solution.
## Summary of changes
Introduces an intermediate parsing step to check the topic name first,
to allow us to ignore parsing errors for any topics we do not know
about.
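A sketch of the intermediate step (topic names and payload fields here are made up for illustration, not the real proxy types):
```rust
use serde::Deserialize;

// First pass: only look at the topic, so an unknown topic never produces a
// parse error we would alert on.
#[derive(Deserialize)]
struct TopicOnly {
    topic: String,
}

// Second pass: full parse, attempted only for topics we know about.
#[derive(Debug, Deserialize)]
#[serde(tag = "topic", content = "data")]
enum KnownNotification {
    #[serde(rename = "/password_updated")]
    PasswordUpdated { project_id: String },
}

fn parse_notification(msg: &str) -> Result<Option<KnownNotification>, serde_json::Error> {
    let header: TopicOnly = serde_json::from_str(msg)?;
    match header.topic.as_str() {
        "/password_updated" => serde_json::from_str(msg).map(Some),
        _ => Ok(None), // unknown topic: ignore instead of treating it as an error
    }
}
```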
## Problem
We are chasing down segfaults in the storage controller
https://github.com/neondatabase/cloud/issues/21010
This is for use by the storage controller, which links dynamically with
`libpq`. We currently use the neon-built libpq, but this may be unsafe
for use from multi-threaded programs like the controller, as it uses a
statically linked openssl
Precursor to https://github.com/neondatabase/neon/pull/10258
## Summary of changes
- Include `postgresql-15` in container builds.
The reason for using version 15 is simply because that is what's
available in Debian 12 without adding any extra repositories, and we
don't have any special need for the latest version in our libpq usage.
## Problem
It's not legal to modify layers that are referenced by the current layer
index. Assert this in the upload queue, as preparation for upload queue
reordering.
Touches #10096.
## Summary of changes
Add a debug assertion that the upload queue does not modify layers
referenced by the current index.
I could be convinced that this should be a plain assertion, but will be
conservative for now.
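In sketch form (simplified; the real queue tracks layer metadata, not bare names):
```rust
use std::collections::HashSet;

// Before scheduling an operation that mutates or deletes a layer file, assert
// (in debug builds only) that the latest scheduled index does not still
// reference it.
fn assert_layer_not_referenced(layer_name: &str, layers_in_current_index: &HashSet<String>) {
    debug_assert!(
        !layers_in_current_index.contains(layer_name),
        "must not modify layer {layer_name}: it is referenced by the current index"
    );
}
```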
## Problem
Since enabling continuous profiling in staging, we've seen frequent seg
faults. This is suspected to be because jemalloc and pprof-rs take a
stack trace at the same time, and the handlers aren't signal safe.
jemalloc does this probabilistically on every allocation, regardless of
whether someone is taking a heap profile, which means that any CPU
profile has a chance to cause a seg fault.
Touches #10225.
## Summary of changes
For now, just disable heap profiles -- CPU profiles are more important,
and we need to be able to take them without risking a crash.
There is a race condition between `Tenant::shutdown`'s `defuse_for_drop`
loop and `offload_timeline`, where timeline offloading can insert into a
tenant that is in the process of shutting down, in fact so far
progressed that the `defuse_for_drop` has already been called.
This prevents warn log lines of the form:
```
offloaded timeline <hash> was dropped without having cleaned it up at the ancestor
```
The solution piggybacks on the `offloaded_timelines` lock: both the
defuse loop and the offloaded timeline insertion need to acquire the
lock, and we know that the defuse loop only runs after the tenant has
set its `TenantState` to `Stopping`.
So if we hold the `offloaded_timelines` lock, and know that the
`TenantState` is not `Stopping`, then we know that the defuse loop has
not run yet, and holding the lock ensures that it doesn't start running
while we are inserting the offloaded timeline.
Fixes #10070
## Problem
When we do a timeline CRUD operation, we check that the shards we need
to mutate are currently attached to a pageserver, by reading
`generation` and `generation_pageserver` from the database.
If any don't appear to be attached, we respond with a 503 and "One or
more shards in tenant is not yet attached".
This is happening more often than expected, and it's not obvious with
current logging what's going on: specifically which shard has a problem,
and exactly what we're seeing in these persistent generation columns.
(Aside: it's possible that we broke something with the change in #10011
which clears generation_pageserver when we detach a shard, although if
so the mechanism isn't trivial: what should happen is that if we stamp
on generation_pageserver if a reconciler is running, then it shouldn't
matter because we're about to
## Summary of changes
- When we are in Attached mode but find that
generation_pageserver/generation are unset, output details while looping
over shards.
## Problem
We see periodic failures in `test_scrubber_physical_gc_ancestors`, where
the logs show that the pageserver is creating image layers that should
cause child shards to no longer reference their parents' layers, but
then the scrubber runs and doesn't find any unreferenced layers.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/78ea06dea6ba8dd3
From inspecting the code & test, it seems like this could be as simple
as the test failing to wait for uploads before running the scrubber. It
had a 2 second delay built in to satisfy the scrubber's time threshold
checks, which on a lightly loaded machine would also have been easily
enough for uploads to complete, but our test machines are more heavily
loaded all the time.
## Summary of changes
- Wait for uploads to complete after generating images layers in
test_scrubber_physical_gc_ancestors, so that the scrubber should
reliably see the post-compaction metadata.
## Problem
Versions of `diesel` and `pq-sys` were somewhat stale. I was checking on
libpq->openssl versions while investigating a segfault via
https://github.com/neondatabase/cloud/issues/21010. I don't think these
rust bindings are likely to be the source of issues, but we might as
well freshen them as a precaution.
## Summary of changes
- Update diesel to 2.2.6
- Update pq-sys to 0.6.3
There was no value in saving them off to temporary variables.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Signed-off-by: Tristan Partin <tristan@neon.tech>
ref neondatabase/cloud#21731
## Problem
When we manually override the LFC size for particular computes,
autoscaling will typically undo that because vm-monitor will resize LFC
itself.
So, we'd like a way to make vm-monitor not set LFC size — this actually
already exists, if we just don't give vm-monitor a postgres connection
string.
## Summary of changes
Add a new field to the compute spec, `disable_lfc_resizing`. When set to
`true`, we pass in `None` for its postgres connection string. That
matches the configuration tested in `neondatabase/autoscaling` CI.
## Problem
Building local_proxy and compute_tools involves largely the same dependency
tree, but as they are currently built in separate clean layers, all that
progress is wasted. For our arm builds that's an extra 10 minutes.
## Summary of changes
Combines the compute_tools and local_proxy build layers.
## Problem
https://neondb.slack.com/archives/C085MBDUSS2/p1734604792755369
## Summary of changes
Recognize and ignore the 3 new broadcast messages:
- `/block_public_or_vpc_access_updated`
- `/allowed_vpc_endpoints_updated_for_org`
- `/allowed_vpc_endpoints_updated_for_projects`
## Problem
Running clippy with `cargo hack --feature-powerset` in CI isn't
particularly fast. This PR follows-up on
https://github.com/neondatabase/neon/pull/8912 to improve the speed of
our clippy runs.
Parallelism as suggested in
https://github.com/neondatabase/neon/issues/9901 was tested, but didn't
show consistent enough improvements to be worth it. It actually
increased the amount of work done, as there's less cache hits when
clippy runs are spread out over multiple target directories.
Additionally, parallelism makes it so caching needs to be thought about
more actively and copying around target directories to enable
parallelism eats the rest of the performance gains from parallel
execution.
After some discussion, the decision was to instead cut down on the
number of jobs that are running further. The easiest way to do this is
to not run clippy *without* default features. The list of default
features is empty for all crates, and I haven't found anything using
`cfg(feature = "default")` either, so this is likely not going to change
anything except speeding the runs up.
## Summary of changes
Reduce the amount of feature combinations tried by `cargo hack` (as
suggested in
https://github.com/neondatabase/neon/pull/8912#pullrequestreview-2286482368)
by never disabling default features.
## Alternatives
- We can split things out into different jobs which reduces the time
until everything is finished by running more things in parallel. This
does, however, decrease the number of cache hits and increase the amount
of time spent on overhead tasks like repo cloning and restoring caches
by doing those multiple times instead of once.
- We could replace `cargo hack [...] clippy` with `cargo clippy [...];
cargo clippy --features testing`. I'm not 100% sure how this compares to
the change here in the PR, but it does seem to run a bit faster. That
likely means it's doing less work, but without understanding what
exactly we loose by that I'd rather not do that for now. I'd appreciate
input on this though.
Now that we construct the TLS client config for cancellation as well as
connect, it feels appropriate to construct the same config once and
re-use it elsewhere. It might also help should #7500 require any extra
setup, so we can easily add it to all the appropriate call sites.
In #10207 it was clear there was some confusion with the current
connection logic. To analyse the flow to make sure there was no poll
stalling, I ended up with the following refactor.
Notable changes:
1. Now all functions named `poll_xyz` that have a `cx: &mut
Context` argument must return a `Poll<_>` type, and can only return
`Pending` iff an internal poll call also returned `Pending`
2. State management is handled entirely by `poll_messages`. There are
now only 2 states which makes it much easier to keep track of.
Each commit should be self-reviewable and should be simple to verify
that it keeps the same behaviour
## Problem
Currently, the default value of the storage controller heartbeat interval
is 100 msec. This means that 10 times per second it establishes a
connection to the PS, which seems to be quite expensive.
On macOS right now, storage_controller consumes 70% CPU and trusts 30%,
so together they completely utilize one core.
A lot of us have Macs. Let's save the environment a little bit and not
waste electricity or contribute to global warming.
By the way, on prod we have an interval of 10 seconds.
## Summary of changes
Increase the heartbeat interval from 100 msec to 1 second.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
In https://github.com/neondatabase/neon/pull/9897 we temporarily
disabled the layer valid check because the current one only considers
the end result of all compaction algorithms, but partial gc-compaction
would temporarily produce an "invalid" layer map.
part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
Allow LSN splits to overlap in the slow path check. Currently, the valid
check is only used in storage scrubber (background job) and during
gc-compaction (without taking layer lock). Therefore, it's fine for such
checks to be a little bit inefficient but more accurate.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Safekeeper may currently send a batch to the pageserver even if it
hasn't decoded a new record.
I think this is quite unlikely in the field, but worth addressing.
## Summary of changes
Don't send anything if we haven't decoded a full record. Once this
merges and releases, the `InterpretedWalRecords` struct can be updated
to remove the Option wrapper for `next_record_lsn`.
## Problem
The benchmarking utilities are also useful for testing. We want to write
tests in the safekeeper crate.
## Summary of changes
This commit lifts the utils to the safekeeper crate. They are compiled
if the benchmarking feature is enabled or if in test mode.
## Problem
test_timeline_archival_chaos does timeline creation with failure
injection, and thereby sometimes leaves timelines in a partially created
state. This was being reported as corruption by the scrubber on test
teardown, because it considered a layer without an index to be an
invalid state. This was incorrect: the scrubber should accept this
state, as it occurs legitimately during timeline creation.
Closes: https://github.com/neondatabase/neon/issues/9988
## Summary of changes
- Report a timeline with layers but no index as Relic rather than
MissingIndexPart.
- We retain the MissingIndexPart variant for the case where an index
_was_ found in the listing, but was not found by a subsequent GET, i.e.
racing with deletion.
## Problem
`test_pgdata_import_smoke` writes two gigabytes of pages and then reads
them back serially. This is CPU bottlenecked and results in a long
runtime, and sensitivity to CPU load from other tests on the same
machine.
Closes: https://github.com/neondatabase/neon/issues/10071
## Summary of changes
- Use effective_io_concurrency=32 when doing sequential scans through
2GiB of pages in test_pgdata_import_smoke. This is a ~10x runtime
decrease in the parts of the test that do sequential scans.
- Also set `effective_io_concurrency=2` for tests, as I noticed while
debugging that we were doing all getpage requests serially, which is bad
for checking the stability of the batching code.
## Problem
We want to verify how much / if pgbench throughput and latency on Neon
suffers if the database contains many other relations, too.
## Summary of changes
Modify the benchmarking.yml pgbench-compare job to
- create an additional project at scale factor 10 GiB
- before running pgbench add n tables (initially 10k) to the database
- then compare the pgbench throughput and latency to the existing
pgbench-compare at 10 GiB scale factor
We use a realistic template for the n relations that is a partitioned
table with some realistic data types, indexes and constraints - similar
to a table that we use internally.
Example run:
https://github.com/neondatabase/neon/actions/runs/12377565956/job/34547386959
## Problem
s5cmd doesn't pick up the pod service account
```
2024/12/16 16:26:01 Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed. <nil>
ERROR "ls s3://neon-dev-bulk-import-us-east-2/import-pgdata/fast-import/v1/br-wandering-hall-w2xobawv": NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
```
## Summary of changes
Switch to the official CLI.
## Testing
Tested the pre-merge image in staging, using `job_image` override in
project settings.
https://neondb.slack.com/archives/C033RQ5SPDH/p1734554944391949?thread_ts=1734368383.258759&cid=C033RQ5SPDH
## Future Work
Switch back to s5cmd once https://github.com/peak/s5cmd/pull/769 gets
merged.
## Refs
- fixes https://github.com/neondatabase/cloud/issues/21876
---------
Co-authored-by: Gleb Novikov <NanoBjorn@users.noreply.github.com>
## Problem
In https://github.com/neondatabase/neon/pull/8103 we changed the test
case to have more test coverage of gc_compaction. Now that we have
`test_gc_compaction_smoke`, we can revert this test case to serve its
original purpose and revert the parameter changes.
part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
* Revert pitr_interval from 60s to 10s.
* Assert the physical/logical size ratio in the benchmark.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
We cannot get the size of the compaction queue or access its info.
Part of #9114
## Summary of changes
* Add an API endpoint to get the compaction queue.
* gc_compaction test case now waits until the compaction finishes.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
`neon_local` has always been unsafe to run concurrently with itself: it
uses simple text files for persistent state, and concurrent runs will
step on each other.
In some test environments we intentionally handle this with mutexes in
python land, but it's fragile to try and always remember to do that.
## Summary of changes
- Add a `flock` based mutex around the `main` function of neon_local,
using the repo directory as the file to lock
- Clean up an `Option<>` around control_plane_api; this is a drive-by
change because it was one of the fields that behaved oddly when
previous concurrent runs stamped on it.
## Problem
part of https://github.com/neondatabase/neon/issues/9114
In https://github.com/neondatabase/neon/pull/10127 we fixed the race,
but we didn't add the errors to the allowlist.
## Summary of changes
* Allow repartition errors in the gc-compaction smoke test.
I think it might be worth refactoring the code to allow multiple threads
to get a copy of the repartition status (i.e., using Rcu) in the future.
Signed-off-by: Alex Chi Z <chi@neon.tech>
Add a `safekeepers` subcommand to `storcon_cli` that allows listing the
safekeepers.
```
$ curl -X POST --url http://localhost:1234/control/v1/safekeeper/42 --data \
'{"active":true, "id":42, "created_at":"2023-10-25T09:11:25Z", "updated_at":"2024-08-28T11:32:43Z","region_id":"neon_local","host":"localhost","port":5454,"http_port":0,"version":123,"availability_zone_id":"us-east-2b"}'
$ cargo run --bin storcon_cli -- --api http://localhost:1234 safekeepers
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.38s
Running `target/debug/storcon_cli --api 'http://localhost:1234' safekeepers`
+----+---------+-----------+------+-----------+------------+
| Id | Version | Host | Port | Http Port | AZ Id |
+==========================================================+
| 42 | 123 | localhost | 5454 | 0 | us-east-2b |
+----+---------+-----------+------+-----------+------------+
```
Also:
* Don't return the raw `SafekeeperPersistence` struct that contains the
raw database representation, but instead return a new
`SafekeeperDescribeResponse` struct.
* The `SafekeeperDescribeResponse` struct leaves out the `active` field on
purpose because we want to deprecate it and replace it with a
`scheduling_policy` one.
Part of https://github.com/neondatabase/neon/issues/9981
## Problem
The allure report finishes with the error `HttpError: Resource not
accessible by integration` while running the `pg_regress` test against a
cloud staging project due to a lack of permissions.
## Summary of changes
The permissions are added.
## Problem
It is unreliable for the control plane to infer the AZ for computes from
where the tenant is currently attached, because if a tenant happens to
be in a degraded state or a release is ongoing while a compute starts,
then the tenant's attached AZ can be a different one to where it will
run long-term, and the control plane doesn't check back later to restart
the compute.
This can land in parallel with
https://github.com/neondatabase/neon/pull/9947
## Summary of changes
- Thread through the preferred AZ into the compute hook code via the
reconciler
- Include the preferred AZ in the body of compute hook notifications
## Problem
Jemalloc heap profiles aren't symbolized. This is inconvenient, and
doesn't work with Grafana Cloud Profiles.
Resolves #9964.
## Summary of changes
Symbolize the heap profiles in-process, and strip unnecessary cruft.
This uses about 100 MB additional memory to cache the DWARF information,
but I believe this is already the case with CPU profiles, which use the
same library for symbolization. With cached DWARF information, the
symbolization CPU overhead is negligible.
Example profiles:
*
[pageserver.pb.gz](https://github.com/user-attachments/files/18141395/pageserver.pb.gz)
*
[safekeeper.pb.gz](https://github.com/user-attachments/files/18141396/safekeeper.pb.gz)
Don't build tests in h3 and rdkit: ~15 min speedup.
Use Ninja as cmake generator where possible: ~10 min speedup.
Clean apt cache for smaller images: around 250 MB size reduction for
intermediate layers.
## Problem
It was reported as `gauge`, but it's actually a `counter`.
Also add `_total` suffix as that's the convention for counters.
The corresponding flux-fleet PR:
https://github.com/neondatabase/flux-fleet/pull/386
## Problem
The ABS SDK's default behavior is to do no connection pooling, i.e. open
and close a fresh connection for each request. Under high request rates,
this can result in an accumulation of TCP connections in TIME_WAIT or
CLOSE_WAIT state, and in extreme cases exhaustion of client ports.
Related: https://github.com/neondatabase/cloud/issues/20971
## Summary of changes
- Add a configurable `conn_pool_size` parameter for Azure storage,
defaulting to zero (current behavior)
- Construct a custom reqwest client using this connection pool size.
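A minimal sketch of the idea, assuming a `conn_pool_size` knob wired straight into `reqwest::ClientBuilder` (the actual integration with the Azure SDK transport is more involved):
```rust
// Build an HTTP client whose idle connection pool is bounded by the new
// config value; 0 keeps the old behavior of opening and closing a fresh
// TCP connection for every request.
fn make_azure_http_client(conn_pool_size: usize) -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .pool_max_idle_per_host(conn_pool_size)
        .build()
}
```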
## Problem
It's impossible to run docker compose with compute v17 due to `pg_anon`
extension which is not supported under PG17.
## Summary of changes
The auto-loading of `pg_anon` is disabled by default
## Problem
To debug issues with TLS connections there's no easy way to decrypt
packets unless a client has special support for logging the keys.
## Summary of changes
Add TLS session keys logging to proxy via `SSLKEYLOGFILE` env var gated
by flag.
As the title says, I updated the lint rules to no longer allow unwrap or
unimplemented.
Three special cases:
* Tests are allowed to use them
* std::sync::Mutex lock().unwrap() is common because it's usually
correct to continue panicking on poison
* `tokio::spawn_blocking(...).await.unwrap()` is common because it will
only error if the blocking fn panics, so continuing the panic is also
correct
I've introduced two extension traits to help with these last two, that
are a bit more explicit so they don't need an expect message every time.
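As an illustration, the mutex case could be handled by an extension trait roughly like this (names are made up, not necessarily what the PR introduces):
```rust
use std::sync::{Mutex, MutexGuard};

pub trait MutexExt<T> {
    /// Like `lock().unwrap()`, but spelled out: if a previous holder
    /// panicked, continue the panic instead of handling the poison.
    fn lock_propagate_poison(&self) -> MutexGuard<'_, T>;
}

impl<T> MutexExt<T> for Mutex<T> {
    fn lock_propagate_poison(&self) -> MutexGuard<'_, T> {
        self.lock().expect("mutex poisoned by a panicking thread")
    }
}
```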
## Problem
We had a similar test in test_logical_replication, but then removed it
because it wasn't needed to trigger the LR-related bug. Restarting at a WAL
page boundary is still a useful test, so add it back separately.
## Summary of changes
Add the test.
## Problem
We want to use safekeeper http client in storage controller and
neon_local.
## Summary of changes
Extract it to separate crate. No functional changes.
## Problem
While reviewing #10152 I found it tricky to actually determine whether
the connection used `allow_self_signed_compute` or not.
I've tried to remove this setting in the past:
* https://github.com/neondatabase/neon/pull/7884
* https://github.com/neondatabase/neon/pull/7437
* https://github.com/neondatabase/cloud/pull/13702
But each time it seems it is used by e2e tests
## Summary of changes
The `node_info.allow_self_signed_computes` is always initialised to
false, and then sometimes inherits the proxy config value. There's no
reason this needs to be in the node_info, so removing it and propagating
it via `TcpMechanism` is simpler.
## Problem
Changes in #9786 were functionally complete but missed some edges that
made testing less robust than it should have been:
- `is_key_disposable` didn't consider SLRU dir keys disposable
- Timeline `init_empty` was always creating SLRU dir keys on all shards
The result was that when we had a bug
(https://github.com/neondatabase/neon/pull/10080), it wasn't apparent in
tests, because one would only encounter the issue if running on a
long-lived timeline with enough compaction to drop the initially created
empty SLRU dir keys, _and_ some CLog truncation going on.
Closes: https://github.com/neondatabase/cloud/issues/21516
## Summary of changes
- Update is_key_global and init_empty to handle SLRU dir keys properly
-- the only functional impact is that we avoid writing some spurious
keys in shards >0, but this makes testing much more robust.
- Make `test_clog_truncate` explicitly use a sharded tenant
The net result is that if one reverts #10080, then tests fail (i.e. this
PR is a reproducer for the issue)
## Problem
In #8550, we made the flush loop wait for uploads after every layer.
This was to avoid unbounded buildup of uploads, and to reduce compaction
debt. However, the approach has several problems:
* It prevents upload parallelism.
* It prevents flush and upload pipelining.
* It slows down ingestion even when there is no need to backpressure.
* It does not directly backpressure WAL ingestion (only via
`disk_consistent_lsn`), and will build up in-memory layers.
* It does not directly backpressure based on compaction debt and read
amplification.
An alternative solution to these problems is proposed in #8390.
In the meanwhile, we revert the change to reduce the impact on ingest
throughput. This does reintroduce some risk of unbounded
upload/compaction buildup. Until
https://github.com/neondatabase/neon/issues/8390, this can be addressed
in other ways:
* Use `max_replication_apply_lag` (aka `remote_consistent_lsn`), which
will more directly limit upload debt.
* Shard the tenant, which will spread the flush/upload work across more
Pageservers and move the bottleneck to Safekeeper.
Touches #10095.
## Summary of changes
Remove waiting on the upload queue in the flush loop.
## Problem
When an entry was dropped and the password wasn't set, the new entry
had uninitialized memory in the control plane adapter.
Resolves: https://github.com/neondatabase/cloud/issues/14914
## Summary of changes
Initialize password in all cases, add tests.
Minor formatting for less indentation
## Problem
`benchmarking` job fails because `aws-oicd-role-arn` input is not set
## Summary of changes
- Set `aws-oicd-role-arn` for the `benchmarking` job
- Always require `aws-oicd-role-arn` to be set
- Rename `aws_oicd_role_arn` to `aws-oicd-role-arn` for consistency
## Problem
close https://github.com/neondatabase/neon/issues/10124
gc-compaction split_gc_jobs is holding the repartition lock for too
long.
## Summary of changes
* Ensure split_gc_compaction_jobs drops the repartition lock once it
finishes cloning the structures.
* Update comments.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Improved comments will help others when they read the code, and the log
messages will help others understand why the logical replication monitor
works the way it does.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
LFC used_pages statistic is not updated in case of LFC resize (shrinking
`neon.file_cache_size_limit`)
## Summary of changes
Update `lfc_ctl->used_pages` in `lfc_change_limit_hook`
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
The test was failing with the scary but generic message `Remote storage
metadata corrupted`.
The underlying scrubber error is `Orphan layer detected: ...`.
The test kills pageserver at random points, hence it's expected that we
leak layers if we're killed in the window after layer upload but before
it's referenced from index part.
Refer to generation numbers RFC for details.
Refs:
- fixes https://github.com/neondatabase/neon/issues/9988
- root-cause analysis
https://github.com/neondatabase/neon/issues/9988#issuecomment-2520673167
## Problem
`test_prefetch` is flaky
(https://github.com/neondatabase/neon/issues/9961), but if it passes,
the run time is less than 30 seconds — we don't need an extended timeout
for it.
## Summary of changes
- Remove extended test timeout for `test_prefetch`
## Problem
We want to extract safekeeper http client to separate crate for use in
storage controller and neon_local. However, many types used in the API
are internal to safekeeper.
## Summary of changes
Move them to safekeeper_api crate. No functional changes.
ref https://github.com/neondatabase/neon/issues/9011
## Problem
When moving the comment on proxy-releases from the yaml doc into a
javascript code block, I missed converting the comment marker from `#`
to `//`.
## Summary of changes
Correctly convert comment marker.
## Problem
I've noticed that debug builds with LFC fail more frequently and, for
some reason, their failures do block merging (but they should not).
## Summary of changes
- Do not run Debug builds with LFC
## Problem
When we update our scheduler/optimization code to respect AZs properly
(https://github.com/neondatabase/neon/pull/9916), the choice of AZ
becomes a much higher-stakes decision. We will pretty much always run a
tenant in its preferred AZ, and that AZ is fixed for the lifetime of the
tenant (unless a human intervenes)
Eventually, when we do auto-balancing based on utilization, I anticipate
that part of that will be to automatically change the AZ of tenants if
our original scheduling decisions have caused imbalance, but as an
interim measure, we can at least avoid making this scheduling decision
based purely on which AZ contains the emptiest node.
This is a precursor to https://github.com/neondatabase/neon/pull/9947
## Summary of changes
- When creating a tenant, instead of scheduling a shard and then reading
its preferred AZ back, make the AZ decision first.
- Instead of choosing AZ based on which node is emptiest, use the median
utilization of nodes in each AZ to pick the AZ to use. This avoids bad
AZ decisions during periods when some node has very low utilization
(such as after replacing a dead node)
I considered also making the selection a weighted pseudo-random choice
based on utilization, but wanted to avoid destabilising tests with that
for now.
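A simplified sketch of the selection rule, with per-node utilization reduced to a plain number (the real scheduler uses richer types):
```rust
use std::collections::HashMap;

// Pick the AZ whose nodes have the lowest *median* utilization. Using the
// median, rather than the single emptiest node, keeps one freshly replaced,
// nearly empty node from attracting every new tenant to its AZ.
fn pick_preferred_az(utilization_by_az: &HashMap<String, Vec<u64>>) -> Option<String> {
    utilization_by_az
        .iter()
        .filter(|(_, utils)| !utils.is_empty())
        .map(|(az, utils)| {
            let mut sorted = utils.clone();
            sorted.sort_unstable();
            (az.clone(), sorted[sorted.len() / 2])
        })
        .min_by_key(|(_, median)| *median)
        .map(|(az, _)| az)
}
```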
## Problem
Now notifications about failures in `pg_regress` tests run on the
staging cloud instance reach the channel `on-call-staging-stream`,
while they should reach `on-call-qa-staging-stream`.
## Summary of changes
The channel changed.
## Problem
CI currently uses static credentials in some places. These are less
secure and hard to maintain, so we are going to deprecate them and use
OIDC auth.
## Summary of changes
- ci(fix): Use OIDC auth to upload artifact on s3
- ci(fix): Use OIDC auth to login on ECR
## Problem
Now that https://github.com/neondatabase/cloud/issues/15245 is done, we
can remove the old code.
## Summary of changes
Removes support for the ManagementV2 API, in favour of the ProxyV1 API.
## Problem
To give Storage more time on preprod — create a release branch on Friday
## Summary of changes
- Automatically create Storage release PR on Friday instead of Monday
This adds an API to the storage controller to list safekeepers
registered to it.
This PR does a `diesel print-schema > storage_controller/src/schema.rs`
because of an inconsistency between up.sql and schema.rs, introduced by
[this](2c142f14f7)
commit, so there are some updates to `schema.rs` due to that. As a
follow-up to this, we should maybe think about running `diesel
print-schema` in CI.
Part of #9981
## Problem
`test_check_visibility_map` has been seen to time out in debug tests.
## Summary of changes
Bump the timeout to 10 minutes (test reports indicate 7 minutes is
sufficient).
We don't want to disable the test entirely in debug builds, to exercise
this with debug assertions enabled.
Resolves #10069.
This adds some validation of invariants that we want to uphold wrt the
tenant manifest and `index_part.json`:
* the data the manifest has about a timeline must match with the data in
`index_part.json`. It might actually change, e.g. when we do reparenting
during detach ancestor, but that requires the timeline to be
unoffloaded, i.e. removed from the manifest.
* any timeline mentioned in the manifest must, if its index part is
present, be archived. If we unarchive, we first update the tenant manifest
to unoffload, and only then update the index part. And one needs to archive
before offloading.
* it is legal for timelines to be mentioned in the manifest but have no
`index_part`: this is a temporary state visible during deletion of the
timeline. if the pageserver crashed, an attach of the tenant will clean
the state up.
* it is also legal for offloaded timelines to have an
`ancestor_retain_lsn` of None while having an `ancestor_timeline_id`.
This is for the to-be-added flattening functionality: the plan is to set
the former to None if we have flattened a timeline.
follow-up of #9942
part of #8088
## Problem
We saw the drain/fill operations not drain fast enough in ap-southeast.
## Summary of changes
These are some quick changes to speed it up:
* double reconcile concurrency - this is now half of the available
reconcile bandwidth
* reduce the waiter polling timeout - this way we can spawn new
reconciliations faster
## Problem
Cplane and storage controller tenant config changes are not additive.
Any change overrides all existing tenant configs. This would be fine if
both did client side patching, but that's not the case.
Once this merges, we must update cplane to use the PATCH endpoint.
## Summary of changes
### High Level
Allow for patching of tenant configuration with a `PATCH
/v1/tenant/config` endpoint.
It takes the same data as its PUT counterpart. For example, the payload
below will update `gc_period` and unset `compaction_period`. All other
fields are left in their original state.
```
{
"tenant_id": "1234",
"gc_period": "10s",
"compaction_period": null
}
```
### Low Level
* PS and storcon gain `PATCH /v1/tenant/config` endpoints. PS endpoint
is only used for cplane managed instances.
* `storcon_cli` is updated to have separate commands for
`set-tenant-config` and `patch-tenant-config`
Related https://github.com/neondatabase/cloud/issues/21043
Add an owned_by_superuser field to filter out system extensions.
While on it, also correct related code:
- fix the metric setting: use set() instead of inc() in a loop.
inc() is not idempotent and can lead to incorrect results
if the function is called multiple times. Currently it is only called at
compute start, but this will change soon.
- fix the return type of the installed_extensions endpoint
to match the metric. Currently it is only used in the test.
## Problem
Linking walproposer library (e.g. `cargo t`) produces linker errors:
/home/myrrc/neon/pgxn/neon/walproposer_compat.c:169: undefined reference
to `pg_snprintf'
The library with these symbols (libpgcommon.a) is present
## Summary of changes
Changed order of libraries resolution for linker
## Problem
We added support for LFC for tests but are still using it only for the
PG17 release.
## Summary of changes
LFC is enabled for all PG versions. Errors in tests with LFC enabled now
block merging as usual. We keep tests with disabled LFC for PG17
release. Tests on debug builds with LFC enabled still don't affect
permission to merge.
## Problem
If the control plane cannot be reached for some reason, compute_ctl
panics
## Summary of changes
The panic is removed in favour of returning an error.
The code is reformatted a bit for flatter control flow.
Resolves: #5391
## Problem
We get slru truncation commands on non-zero shards.
Compaction will drop the slru dir keys and ingest will fail when
receiving such records.
https://github.com/neondatabase/neon/pull/10080 fixed it for clog, but
not for multixact.
## Summary of changes
Only truncate multixact SLRUs on shard zero. I audited the rest of the
ingest code and it looks fine from this point of view.
## Problem
With pipelining enabled, the time a request spends in the batcher stage
counts towards the smgr op latency.
If pipelining is disabled, that time is not accounted for.
In practice, this results in a jump in smgr getpage latencies in various
dashboards and degrades the internal SLO.
## Solution
In a similar vein to #10042 and with a similar rationale, this PR stops
counting the time spent in batcher stage towards smgr op latency.
The smgr op latency metric is reduced to the actual execution time.
Time spent in batcher stage is tracked in a separate histogram.
I expect to remove that histogram after batching rollout is complete,
but it will be helpful in the meantime to reason about the rollout.
## Problem
Protobuf doesn't support 128 bit integers, so we encode the keys as two
64 bit integers. Issue is that when we split the 128 bit compact key we
use signed 64 bit integers to represent the two halves. This may result
in a negative lower half when relnode is larger than `0x00800000`. When
we convert the lower half to an i128 we get a negative `CompactKey`.
## Summary of Changes
Use unsigned integers when encoding into Protobuf.
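A self-contained illustration of the bug and the fix (not the actual proto encoding code):
```rust
fn main() {
    // A key whose low 64 bits have the top bit set (e.g. a large relnode).
    let key: u128 = 0x0000_0000_0000_0001_8000_0000_0000_0000;

    // Buggy path: a signed lower half comes back negative, and widening it
    // to i128 sign-extends, yielding a negative "compact key".
    let low_signed = key as i64;
    let decoded_signed = low_signed as i128;
    assert!(decoded_signed < 0);

    // Fixed path: unsigned halves round-trip exactly.
    let (high, low) = ((key >> 64) as u64, key as u64);
    let decoded = ((high as u128) << 64) | low as u128;
    assert_eq!(decoded, key);
}
```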
## Deployment
* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then
release the fixed version.
We do this because a negative int32 will convert to a large uint32 value
and could give a key in the actual pageserver space. In production we
would work around this by adding new fields to the proto and deprecating
the old ones, but we can make our lives easy here.
* Pre-prod: Same as staging
## Problem
When dev deployments are disabled (or fail), the tags for releases
aren't created. It makes more sense to have tag and release creation
before the deployment to prevent situations like
[this](https://github.com/neondatabase/neon/pull/9959).
It is not enough to move the tag creation before the deployment. If the
deployment fails, re-running the job isn't possible because the API call
to create the tag will fail.
## Summary of changes
- Tag/Release creation now happens before the deployment
- The two steps for tag and release have been merged into a bigger one
- There are new checks to ensure that if the tags/releases already
exist as expected, things will continue just fine.
## Problem
In #9786 we stop storing SLRUs on non-zero shards.
However, there was one code path during ingest that still tries to
enumerate SLRU relations on all shards. This fails if it sees a tenant
who has never seen any write to an SLRU, or who has done such thorough
compaction+GC that it has dropped its SLRU directory key.
## Summary of changes
- Avoid trying to list SLRU relations on nonzero shards
Neon doesn't have seqscan detection of its own, so stop read_stream from
trying to utilize that readahead, and instead make it issue readahead of
its own.
## Problem
@knizhnik noticed that we didn't issue smgrprefetch[v] calls for
seqscans in PG17 due to the move to the read_stream API, which assumes
that the underlying IO facilities do seqscan detection for readahead.
That is a wrong assumption when Neon is involved, so let's remove the
code that applies that assumption.
## Summary of changes
Remove the cases where seqscans are detected and prefetch is disabled as
a consequence, and instead don't do that detection.
PG PR: https://github.com/neondatabase/postgres/pull/532
## Problem
resolve
https://github.com/neondatabase/neon/issues/9988#issuecomment-2528239437
## Summary of changes
* New verbose mode for storage scrubber scan metadata (pageserver) that
contains the error messages.
* Filter allowed_error list from the JSON output to determine the
healthy flag status.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
We have metrics for GetPage request latencies, but this is an extra
measure to capture requests that take way too long in the logs. The log
message is printed every 10 s, until the response is received:
```
PG:2024-12-09 16:02:07.715 GMT [1782845] LOG: [NEON_SMGR] [shard 0] no response received from pageserver for 10.000 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:17.723 GMT [1782845] LOG: [NEON_SMGR] [shard 0] no response received from pageserver for 20.008 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:19.719 GMT [1782845] LOG: [NEON_SMGR] [shard 0] received response from pageserver after 22.006 s
```
## Problem
close https://github.com/neondatabase/cloud/issues/19671
```
Timeline -----------------------------
^ last GC happened LSN
^ original retention period setting = 24hr
> refresh-gc-info updates the gc_info
^ planned cutoff (gc_info)
^ customer set retention to 48hr, and it's still within the last GC LSN
^1 ^2 we have two choices: (1) update the planned cutoff to
move backwards, or (2) keep the current one
```
In this patch, we decided to keep the current cutoff instead of moving
back the gc_info to avoid races. In the future, we could allow the
planned gc cutoff to go back once cplane sends a retention_history
tenant config update, but this requires a careful revisit of the code.
## Summary of changes
Ensure that GC cutoffs never go back if retention settings get changed.
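The rule itself is tiny; as a sketch, with `Lsn` simplified to a `u64`:
```rust
use std::cmp::max;

// When refresh-gc-info recomputes the planned cutoff, it never moves
// backwards, even if a larger retention window would otherwise produce an
// older (smaller) LSN. Types are simplified for illustration.
fn next_planned_cutoff(current_planned: u64, newly_computed: u64) -> u64 {
    max(current_planned, newly_computed)
}
```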
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
See https://github.com/neondatabase/neon/issues/9961
Current implementation of prefetch buffer resize doesn't correctly
handle in-flight requests
## Summary of changes
1. Fix index of entry we should wait for if new prefetch buffer size is
smaller than number of in-flight requests.
2. Correctly set flush position
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Azure has a different per-request limit of 256 items for bulk deletion
compared to the number of 1000 on AWS. Therefore, we need to support
multiple values. Due to `GenericRemoteStorage`, we can't add an
associated constant, but it has to be a function.
The PR replaces the `MAX_KEYS_PER_DELETE` constant with a function of
the same name, implemented on both the `RemoteStorage` trait as well as
on `GenericRemoteStorage`.
The value serves as a hint of how many objects to pass to the
`delete_objects` function.
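The shape of the change, roughly (signatures here are illustrative; the real `RemoteStorage` trait has many more methods):
```rust
pub trait RemoteStorage {
    /// Hint for how many keys to pass to `delete_objects` per request.
    fn max_keys_per_delete(&self) -> usize;
}

struct S3Bucket;
struct AzureBlobStorage;

impl RemoteStorage for S3Bucket {
    fn max_keys_per_delete(&self) -> usize {
        1000 // S3 DeleteObjects accepts up to 1000 keys per request
    }
}

impl RemoteStorage for AzureBlobStorage {
    fn max_keys_per_delete(&self) -> usize {
        256 // Azure blob batch accepts up to 256 subrequests
    }
}
```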
Reading:
* https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch
* https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
Part of #7931
Hello! I was interested in potentially making some contributions to Neon
and looking through the issue backlog I found
[8200](https://github.com/neondatabase/neon/issues/8200) which seemed
like a good first issue to attempt to tackle. I see it was assigned a
while ago so apologies if I'm stepping on any toes with this PR. I also
apologize for the size of this PR. I'm not sure if there is a simple way
to reduce it given the footprint of the components being changed.
## Problem
This PR is attempting to address part of the problem outlined in issue
[8200](https://github.com/neondatabase/neon/issues/8200). Namely to
remove global static usage of timeline state in favour of
`Arc<GlobalTimelines>` and to replace wasteful clones of
`SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle
`RemoteStorage` in this PR to minimize the amount of changes as this PR
is already quite large. I also did not opt to introduce an
`SafekeeperApp` wrapper struct to similarly minimize changes but I can
tackle either or both of these omissions in this PR if folks would like.
## Summary of changes
- Remove static usage of `GlobalTimelines` in favour of
`Arc<GlobalTimelines>`
- Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the
underlying struct
## Some additional thoughts
- We seem to currently store `SafeKeeperConf` in `GlobalTimelines` and
then expose it through a public `get_global_config` function which
requires locking. This seems needlessly wasteful and based on observed
usage we could remove this public accessor and force consumers to
acquire `SafeKeeperConf` through the new Arc reference.
## Problem
close https://github.com/neondatabase/neon/issues/10049, close
https://github.com/neondatabase/neon/issues/10030, close
https://github.com/neondatabase/neon/issues/8861
part of https://github.com/neondatabase/neon/issues/9114
The legacy gc process calls `get_latest_gc_cutoff`, which uses an Rcu
different from the gc_info struct. In the gc_compaction_smoke test case,
the "latest" cutoff could be lower than the gc_info struct, causing
gc-compaction to collect data that could be accessed by
`latest_gc_cutoff`. Technically speaking, there's nothing wrong with
gc-compaction using gc_info without considering latest_gc_cutoff,
because gc_info is the source of truth. But anyways, let's fix it.
## Summary of changes
* gc-compaction uses `latest_gc_cutoff` instead of gc_info to determine
the gc horizon.
* if a gc-compaction is scheduled via tenant compaction iteration, it
will take the gc_block lock to avoid racing with functionalities like
detach ancestor (if it's triggered via manual compaction API without
scheduling, then it won't take the lock)
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Currently, we run the `pg_regress` tests only for PG16
However, PG17 is a part of Neon and should be tested as well
## Summary of changes
Modified the workflow and added a patch for PG17 enabling the
`pg_regress` tests.
The problem with leftovers was solved by using branches.
For a while already, we've been unable to update the Azure SDK crates
due to Azure adopting use of a non-tokio async runtime, see #7545.
The effort to upstream the fix got stalled, and I think it's better to
switch to a patched version of the SDK that is up to date.
Now we have a fork of the SDK under the neondatabase github org, to
which I have applied Conrad's rebased patches to:
https://github.com/neondatabase/azure-sdk-for-rust/tree/neon .
The existence of a fork will also help with shipping bulk delete support
before it's upstreamed (#7931).
Also, in related news, the Azure SDK has gotten a rift in development,
where the main branch pertains to a future, to-be-officially-blessed
release of the SDK, and the older versions, which we are currently
using, are on the `legacy` branch. Upstream doesn't really want patches
for the `legacy` branch any more, they want to focus on the `main`
efforts. However, even then, the `legacy` branch is still newer than
what we have right now, so let's switch to `legacy` for now.
Depending on how long it takes, we can switch to the official version of
the SDK once it's released, or switch to the upstream `main` branch if
there are changes we want before that.
As a nice side effect of this PR, we now use reqwest 0.12 everywhere,
dropping the dependency on version 0.11.
Fixes #7545
Result of running:
cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p
aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p
aws-sdk-s3 -p aws-config
We want to keep the AWS SDK up to date as that way we benefit from new
developments and improvements.
## Problem
We saw a tenant get stuck when it had been put into Pause scheduling
mode to pin it to a pageserver, then it was left idle for a while and
the control plane tried to detach it.
Close: https://github.com/neondatabase/neon/issues/9957
## Summary of changes
- When changing policy to Detached or Secondary, set the scheduling
policy to Active.
- Add a test that exercises this
- When persisting tenant shards, set their `generation_pageserver` to
null if the placement policy is not Attached (this enables consistency
checks to work, and avoids leaving state in the DB that could be
confusing/misleading in future)
## Problem
In #9962 I changed the smgr metrics to include time spent on flush.
It isn't under our (=storage team's) control how long that flush takes
because the client can stop reading requests.
## Summary of changes
Stop the timer as soon as we've buffered up the response in the
`pgb_writer`.
Track flush time in a separate metric.
---------
Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>
If the pageserver connection is lost while receiving the prefetch
request, the prefetch queue is cleared. The error message prints the
values from the prefetch slot, but because the slot was already cleared,
they're all zeros:
LOG: [NEON_SMGR] [shard 0] No response from reading prefetch entry 0:
0/0/0.0 block 0. This can be caused by a concurrent disconnect
To fix, make local copies of the values.
In passing, also add a sanity check that if the receive() call
succeeds, the prefetch slot is still intact.
## Problem
part of https://github.com/neondatabase/neon/issues/9114, stacked PR
over #9809
The compaction scheduler now schedules partial compaction jobs.
## Summary of changes
* Add the compaction job splitter based on size.
* Schedule subcompactions using the compaction scheduler.
* Test subcompaction scheduler in the smoke regress test.
* Temporarily disable layer map checks
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
With the current metrics we can't identify which shards are ingesting
data at any given time.
## Summary of changes
Add a metric for the number of wal records received for processing by
each shard. This is per (tenant, timeline, shard).
## Problem
We didn't have a codeowner for `/compute`, so nobody was auto-assigned
for PRs like #9973
## Summary of changes
While on it:
1. Group codeowners into sections.
2. Remove control plane from the `/compute_tools` because it's primarily
the internal `compute_ctl` code.
3. Add control plane (and compute) to `/libs/compute_api` because that's
the shared public interface of the compute.
We've seen cases where stray keys end up on the wrong shard. This
shouldn't happen. Add debug assertions to prevent this. In release
builds, we should be lenient in order to handle changing key ownership
policies.
Touches #9914.
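A sketch of the strict-in-debug, lenient-in-release pattern (key and shard types simplified; not the actual assertion site):
```rust
// Assert in debug builds that a key landed on the shard that owns it, but
// only log in release builds so that changes in key ownership policy don't
// turn into crashes.
fn note_key_on_shard(key_belongs_to_this_shard: bool, key: u128, shard: u8) {
    debug_assert!(
        key_belongs_to_this_shard,
        "stray key {key:#x} routed to shard {shard}"
    );
    if !key_belongs_to_this_shard {
        eprintln!("stray key {key:#x} on shard {shard}");
    }
}
```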
## Problem
There's no metrics for disk consistent LSN and remote LSN. This stuff is
useful when looking at ingest performance.
## Summary of changes
Two per timeline metrics are added: `pageserver_disk_consistent_lsn` and
`pageserver_projected_remote_consistent_lsn`. I went for the projected
remote lsn instead of the visible one because that more closely matches
remote storage write throughput. Ideally we would have both, but these
metrics are expensive.
## Problem
I'm writing an ingest benchmark in #9812. To time S3 uploads, I need to
schedule a flush of the Pageserver's in-memory layer, but don't actually
want to wait around for it to complete (which will take a minute).
## Summary of changes
Add a parameter `wait_until_flush` (default `true`) for
`timeline/checkpoint` to control whether to wait for the flush to
complete.
## Problem
FSM pages are managed like regular relation pages, and owned by a single
shard. However, when truncating the FSM relation the last FSM page was
zeroed out on all shards. This is unnecessary and potentially confusing.
The superfluous keys will be removed during compactions, as they do not
belong on these shards.
Resolves #10027.
## Summary of changes
Only zero out the truncated FSM page on the owning shard.
## Problem
part of https://github.com/neondatabase/neon/issues/9114
gc-compaction can take a long time. This patch adds support for
scheduling a gc-compaction job. The compaction loop will first handle
L0->L1 compaction, and then gc compaction. The scheduled jobs are stored
in a non-persistent queue within the tenant structure.
This will be the building block for the partial compaction trigger -- if
the system determines that we need to do a gc compaction, it will
partition the keyspace and schedule several jobs. Each of these jobs
will run for a short amount of time (i.e, 1 min). L0 compaction will be
prioritized over gc compaction.
## Summary of changes
* Add compaction scheduler in tenant.
* Run scheduled compaction in integration tests.
* Change the manual compaction API to allow schedule a compaction
instead of immediately doing it.
* Add LSN upper bound as gc-compaction parameter. If we schedule partial
compactions, gc_cutoff might move across different runs. Therefore, we
need to pass a pre-determined gc_cutoff beforehand. (TODO: support LSN
lower bound so that we can compact arbitrary "rectangle" in the layer
map)
* Refactor the gc_compaction internal interface.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
We need a higher concurrency during reconfiguration in case of many DBs,
but the instance is already running and used by the client. We can
easily get out of `max_connections` limit, and the current code won't
handle that.
## Summary of changes
Default to 1, but also allow control plane to override this value for
specific projects. It's also recommended to bump
`superuser_reserved_connections` += `reconfigure_concurrency` for such
projects to ensure that we always have enough spare connections for
reconfiguration process to succeed.
Quick workaround for neondatabase/cloud#17846
## Problem
The node shard scan timeout of 1 second is a bit too aggressive, and
we've seen this cause test failures. The scans are performed in parallel
across nodes, and the entire operation has a 15 second timeout.
Resolves #9801.
## Summary of changes
Increase the timeout to 5 seconds. This is still enough to time out on a
network failure and retry successfully within 15 seconds.
Like #9931 but without rebasing upstream just yet, to try and minimise
the differences.
Removes all proxy-specific commits from the rust-postgres fork, now that
proxy no longer depends on them. Merging upstream changes to come later.
Closes #9387.
## Problem
`BufferedWriter` cannot proceed while the owned buffer is flushing to
disk. We want to implement double buffering so that the flush can happen
in the background. See #9387.
## Summary of changes
- Maintain two owned buffers in `BufferedWriter`.
- The writer is in charge of copying the data into owned, aligned
buffer, once full, submit it to the flush task.
- The flush background task is in charge of flushing the owned buffer to
disk, and returning the buffer to the writer for reuse.
- The writer and the flush background task communicate through a
bi-directional channel.
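A heavily simplified, self-contained sketch of the double-buffering scheme, using std threads and `Vec<u8>` in place of the real flush task, `IoBufferMut`, and `VirtualFile`:
```rust
use std::io::Write;
use std::sync::mpsc;
use std::thread;

fn main() -> std::io::Result<()> {
    // One channel carries full buffers to the flusher, the other returns
    // cleared buffers to the writer for reuse.
    let (full_tx, full_rx) = mpsc::channel::<Vec<u8>>();
    let (empty_tx, empty_rx) = mpsc::channel::<Vec<u8>>();

    // Flush task: write each submitted buffer to disk, then hand it back.
    let flusher = thread::spawn(move || -> std::io::Result<()> {
        let mut file = std::fs::File::create("double_buffer_demo.bin")?;
        for mut buf in full_rx {
            file.write_all(&buf)?;
            buf.clear();
            let _ = empty_tx.send(buf);
        }
        file.sync_all()
    });

    // Writer side: two owned buffers, so filling and flushing can overlap.
    let mut current = Vec::with_capacity(8192);
    let mut spare = Some(Vec::with_capacity(8192));

    for chunk in 0u8..4 {
        current.extend_from_slice(&[chunk; 4096]);
        if current.len() >= 8192 {
            // Submit the full buffer; continue with the spare, or block
            // until the flusher returns one (both buffers in flight).
            let next = spare
                .take()
                .unwrap_or_else(|| empty_rx.recv().expect("flusher exited"));
            full_tx.send(std::mem::replace(&mut current, next)).unwrap();
        }
    }
    if !current.is_empty() {
        full_tx.send(current).unwrap();
    }
    drop(full_tx); // closes the channel so the flusher loop ends
    flusher.join().expect("flusher panicked")
}
```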
For in-memory layer, we also need to be able to read from the buffered
writer in `get_values_reconstruct_data`. To handle this case, we did the
following
- Replace `VirtualFile::write_all` with `VirtualFile::write_all_at`,
and use `Arc` to share it between writer and background task.
- leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`,
one clone will be submitted to the channel, the other clone will be
saved within the writer to serve reads. When we want to reuse the
buffer, we can invoke `IoBuffer::into_mut`, which gives us back the
mutable aligned buffer.
- InMemoryLayer reads is now aware of the maybe_flushed part of the
buffer.
**Caveat**
- We removed the owned version of write, because this interface does not
work well with buffer alignment. The result is that without direct IO
enabled,
[`download_object`](a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243))
does one more memcpy than before this PR due to the switch to the
`_borrowed` version of the write.
- "Bypass aligned part of write" could be implemented later to avoid
a large amount of memcpy.
**Testing**
- Use a oneshot-channel-based control mechanism to make flush behavior
deterministic in tests.
- Test reading from `EphemeralFile` when the last submitted buffer is
not flushed, in progress, and done flushing to disk.
## Performance
We see performance improvement for small values, and regression on big
values, likely due to being CPU bound + disk write latency.
[Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4)
---------
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
We have a scale test for the storage controller which also acts as a
good stress test for scheduling stability. However, it created nodes
with no AZs set.
## Summary of changes
- Bump node count to 6 and set AZs on them.
This is a precursor to other AZ-related PRs, to make sure any new code
that's landed is getting scale tested in an AZ-aware environment.
## Problem
We practice a manual release flow for the compute module. This will
allow automation of the compute release process.
## Summary of changes
The workflow was modified to make a compute release automatically on the
branch release-compute.
## Problem
Reqwest errors don't include details about the inner source error. This
means that we get opaque errors like:
```
receive body: error sending request for url (http://localhost:9898/v1/location_config)
```
Instead of the more helpful:
```
receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out
```
Touches #9801.
## Summary of changes
Include the source error for `reqwest::Error` wherever it's displayed.
## Problem
When client specifies `application_name`, pgbouncer propagates it to the
Postgres. Yet, if client doesn't do it, we have hard time figuring out
who opens a lot of Postgres connections (including the `cloud_admin`
ones).
See this investigation as an example:
https://neondb.slack.com/archives/C0836R0RZ0D
## Summary of changes
I haven't found this documented, but it looks like pgbouncer accepts
standard Postgres connstring parameters in the connstring in the
`[databases]` section, so put the default `application_name=pgbouncer`
there. That way, we will always see who opens Postgres connections. I
did tests, and if client specifies a `application_name`, pgbouncer
overrides this default, so it only works if it's not specified or set to
blank `&application_name=` in the connection string.
This is the last place we could potentially open some Postgres
connections without `application_name`. Everything else should be either
of two:
1. Direct client connections without `application_name`, but these
should be strictly non-`cloud_admin` ones
2. Some ad-hoc internal connections, so if we see spikes of unidentified
`cloud_admin` connections, we will need to investigate it again.
Fixes neondatabase/cloud#20948
(stacked on #9990 and #9995)
Partially fixes #1287 with a custom option field to enable the fixed
behaviour. This allows us to gradually roll out the fix without silently
changing the observed behaviour for our customers.
related to https://github.com/neondatabase/cloud/issues/15284
## Problem
During deploys, we see a lot of 500 errors due to heapmap uploads for
inactive tenants. These should be 503s instead.
Resolves #9574.
## Summary of changes
Make the secondary tenant scheduler use `ApiError` rather than
`anyhow::Error`, to propagate the tenant error and convert it to an
appropriate status code.
## Problem
We tried different parallelism settings for the ingest bench.
## Summary of changes
the following settings seem optimal after merging
- SK side Wal filtering
- batched getpages
Settings:
- effective_io_concurrency 100
- concurrency limit 200 (different from Prod!)
- jobs 4, maintenance workers 7
- 10 GB chunk size
## Problem
```
2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local
2024-12-03T15:42:49.5325077Z Traceback (most recent call last):
2024-12-03T15:42:49.5325603Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module>
2024-12-03T15:42:49.5326029Z main()
2024-12-03T15:42:49.5326316Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main
2024-12-03T15:42:49.5326739Z ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp)
2024-12-03T15:42:49.5327488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-03T15:42:49.5327914Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result
2024-12-03T15:42:49.5328321Z psycopg2.extras.execute_values(
2024-12-03T15:42:49.5328940Z File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values
2024-12-03T15:42:49.5335618Z cur.execute(b''.join(parts))
2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures"
2024-12-03T15:42:49.5336287Z LINE 57: 'concurrent-futures',
2024-12-03T15:42:49.5336462Z ^
```
## Summary of changes
- `test_page_service_batching`: save non-numeric params as `labels`
- Add a runtime check that `metric_value` is NUMERIC
Before this PR, some override callbacks used `.default()`, others
used `.setdefault()`.
As of this PR, all callbacks use `.setdefault()` which I think is least
prone to failure.
Aligning on a single way will set the right example for future tests
that need such customization.
The change to `test_pageserver_getpage_throttle.py` is technically a change
in behavior: before, it replaced the `tenant_config` field; now it just
configures the throttle. This is what I believe was intended anyway.
Support tenant manifests in the storage scrubber:
* list the manifests, order them by generation
* delete all manifests except for the two most recent generations
* for the latest manifest: try parsing it.
I've tested this patch by running it against a staging bucket, and it
successfully deleted stuff (and avoided deleting the latest two
generations).
In follow-up work, we might want to also check some invariants of the
manifest, as mentioned in #8088.
Part of #9386
Part of #8088
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
The Pageserver signal handler would only respond to a single signal and
initiate shutdown. Subsequent signals were ignored. This meant that a
`SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a
slow or stalled shutdown). The `test_runner` uses this to force shutdown
if graceful shutdown is slow.
Touches #9740.
## Summary of changes
Keep responding to signals after the initial shutdown signal has been
received.
Arguably, the `test_runner` should also use `SIGKILL` rather than
`SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT`
regardless.
Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres
allows us to remove a lot of dead weight from our actual postgres
connection logic.
## Problem
We saw a peculiar case where a pageserver apparently got a 0-tenant
response to `/re-attach` but we couldn't see the request landing on a
storage controller. It was hard to confirm retrospectively that the
pageserver was configured properly at the moment it sent the request.
## Summary of changes
- Log the URL to which we are sending the request
- Log the NodeId and metadata that we sent
## Problem
Sharded tenants should be run in a single AZ for best performance, so
that computes have AZ-local latency to all the shards.
Part of https://github.com/neondatabase/neon/issues/8264
## Summary of changes
- When we split a tenant, instead of updating each shard's preferred AZ
to wherever it is scheduled, propagate the preferred AZ from the parent.
- Drop the check in `test_shard_preferred_azs` that asserts shards end
up in their preferred AZ: this will not be true again until the
optimize_attachment logic is updated to make this so. The existing check
wasn't testing anything about scheduling, it was just asserting that we
set preferred AZ in a way that matches the way things happen to be
scheduled at time of split.
## Problem
In the batching PR
- https://github.com/neondatabase/neon/pull/9870
I stopped deducting the time-spent-in-throttle from latency metrics,
i.e.,
- smgr latency metrics (`SmgrOpTimer`)
- basebackup latency (+scan latency, which I think is part of
basebackup).
The reason for stopping the deduction was that with the introduction of
batching, the trick with tracking time-spent-in-throttle inside
RequestContext and swap-replacing it from the `impl Drop for
SmgrOpTimer` no longer worked with >1 requests in a batch.
However, deducting time-spent-in-throttle is desirable because our
internal latency SLO definition does not account for throttling.
## Summary of changes
- Redefine throttling to be a page_service pagestream request throttle
instead of a throttle for repository `Key` reads through `Timeline::get`
/ `Timeline::get_vectored`.
- This means reads done by `basebackup` are no longer subject to any
throttle.
- The throttle applies after batching, before handling of the request.
- Drive-by fix: make throttle sensitive to cancellation.
- Rename metric label `kind` from `timeline_get` to `pagestream` to
reflect the new scope of throttling.
To avoid config format breakage, we leave the config field named
`timeline_get_throttle` and ignore the `task_kinds` field.
This will be cleaned up in a future PR.
## Trade-Offs
Ideally, we would apply the throttle before reading a request off the
connection, so that we queue the minimal amount of work inside the
process.
However, that's not possible because we need to do shard routing.
The redefinition of the throttle to limit pagestream request rate
instead of repository `Key` rate comes with several downsides:
- We're no longer able to use the throttle mechanism for other
tasks, e.g. image layer creation.
However, in practice, we never used that capability anyways.
- We no longer throttle basebackup.
## Problem
`test_sharded_ingest` ingests a lot of data, which can cause shutdown to
be slow e.g. due to local "S3 uploads" or compactions. This can cause
test flakes during teardown.
Resolves #9740.
## Summary of changes
Perform an immediate shutdown of the cluster.
## Problem
We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.
We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).
This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.
Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.
## Summary of changes
Adds an HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:
* `format`: output format (`jemalloc` or `pprof`; default `pprof`).
Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
process server-side -- this is left as future work, as is other output
formats like SVG.
Heap profiles don't work on macOS due to limitations in jemalloc.
## Problem
The extensions for Postgres v17 are ready but we do not test the
extensions shipped with v17
## Summary of changes
Build the test image based on Postgres v17. Run the tests for v17.
---------
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
This PR
- fixes smgr metrics https://github.com/neondatabase/neon/issues/9925
- adds an additional startup log line logging the current batching
config
- adds a histogram of batch sizes global and per-tenant
- adds a metric exposing the current batching config
The issue described #9925 is that before this PR, request latency was
only observed *after* batching.
This means that smgr latency metrics (most importantly getpage latency)
don't account for
- `wait_lsn` time
- time spent waiting for batch to fill up / the executor stage to pick
up the batch.
The fix is to use a per-request batching timer, like we did before the
initial batching PR.
We funnel those timers through the entire request lifecycle.
I noticed that even before the initial batching changes, we weren't
accounting for the time spent writing & flushing the response to the
wire.
This PR drive-by fixes that deficiency by dropping the timers at the
very end of processing the batch, i.e., after the `pgb.flush()` call.
I was **unable** to maintain the behavior that we deduct
time-spent-in-throttle from various latency metrics.
The reason is that we're using a *single* counter in `RequestContext` to
track micros spent in throttle.
But there are *N* metrics timers in the batch, one per request.
As a consequence, the practice of consuming the counter in the drop
handler of each timer no longer works because all but the first timer
will encounter error `close() called on closed state`.
A failed attempt to maintain the current behavior can be found in
https://github.com/neondatabase/neon/pull/9951.
So, this PR removes the deduction behavior from all metrics.
I started a discussion on Slack about the implications this has for
our internal SLO calculation:
https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029
# Refs
- fixes https://github.com/neondatabase/neon/issues/9925
- sub-issue https://github.com/neondatabase/neon/issues/9377
- epic: https://github.com/neondatabase/neon/issues/9376
Before this PR, the storcon_cli didn't have a way to show the
tenant-wide information of the TenantDescribeResponse.
Sadly, the `Serialize` impl for the tenant config doesn't skip on
`None`, so the output becomes a bit bloated.
Maybe we can use `skip_serializing_if(Option::is_none)` in the future.
=> https://github.com/neondatabase/neon/issues/9983
## Problem
I was touching `test_storage_controller_node_deletion` because for AZ
scheduling work I was adding a change to the storage controller (kick
secondaries during optimisation) that made a FIXME in this test defunct.
While looking at it I also realized that we can easily fix the way node
deletion currently doesn't use a proper ScheduleContext, using the
iterator type recently added for that purpose.
## Summary of changes
- A testing-only behavior in storage controller where if a secondary
location isn't yet ready during optimisation, it will be actively
polled.
- Remove workaround in `test_storage_controller_node_deletion` that
previously was needed because optimisation would get stuck on cold
secondaries.
- Update node deletion code to use a `TenantShardContextIterator` and
thereby a proper ScheduleContext
## Problem
After enabling LFC in tests and lowering `shared_buffers` we started
having more problems with `test_pg_regress`.
## Summary of changes
Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and
still have enough room for Postgres to operate. Anything smaller might
not be enough for Postgres under load, and can cause errors like 'no
unpinned buffers available'.
See Konstantin's comment [1] as well.
Fixes #9956
[1]:
https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097
On reconfigure, we no longer passed a port for the extension server
which caused us to not write out the neon.extension_server_port line.
Thus, Postgres thought we were setting the port to the default value of
0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the
following log messages:
> LOG: parameter "neon.extension_server_port" cannot be changed without
restarting the server
> LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied
Fixes: https://github.com/neondatabase/neon/issues/9945
Signed-off-by: Tristan Partin <tristan@neon.tech>
The spec was written for the buggy protocol which we had before the one
more similar to Raft was implemented. Update the spec with what we
currently have.
ref https://github.com/neondatabase/neon/issues/8699
## Problem
The credentials providers tries to connect to AWS STS even when we use
plain Redis connections.
## Summary of changes
* Construct the CredentialsProvider only when needed ("irsa").
## Problem
`if: ${{ github.event.schedule }}` gets skipped if a previous step has
failed, but we want to run the step for both `success` and `failure`
## Summary of changes
- Add `!cancelled()` to notification step if-condition, to skip only
cancelled jobs
Fixes https://github.com/neondatabase/cloud/issues/20973.
This refactors `connect_raw` in order to return direct access to the
delayed notices.
I cannot find a way to test this with psycopg2 unfortunately, although
testing it with psql does return the expected results.
## Problem
We can't easily tell how far the state of shards is from their AZ
preferences. This can be a cause of performance issues, so it's
important for diagnosability that we can tell easily if there are
significant numbers of shards that aren't running in their preferred AZ.
Related: https://github.com/neondatabase/cloud/issues/15413
## Summary of changes
- In reconcile_all, count shards that are scheduled into the wrong AZ
(if they have a preference), and publish it as a prometheus gauge.
- Also calculate a statistic for how many shards wanted to reconcile but
couldn't.
This is clearly a lazy calculation: reconcile all only runs
periodically. But that's okay: shards in the wrong AZ is something that
only matters if it stays that way for some period of time.
Improves `wait_until` by:
* Use `timeout` instead of `iterations`. This allows changing the
timeout/interval parameters independently.
* Make `timeout` and `interval` optional (default 20s and 0.5s). Most
callers don't care.
* Only output status every 1s by default, and add optional
`status_interval` parameter.
* Remove `show_intermediate_error`, this was always emitted anyway.
Most callers have been updated to use the defaults, except where they
had good reason otherwise.
## Problem
We saw unexpected container terminations when running in k8s with
small CPU resource requests.
The /status and /ready handlers called `maybe_forward`, which always
takes the lock on Service::inner.
If there is a lot of writer lock contention, and the container is
starved of CPU, this increases the likelihood that we will get killed by
the kubelet.
It isn't certain that this was a cause of issues, but it is a potential
source that we can eliminate.
## Summary of changes
- Revise logic to return immediately if the URL is in the non-forwarded
list, rather than calling maybe_forward
## Problem
See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479
We observe the following error in the logs
```
[XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128
```
most likely caused by changing `neon.readahead_buffer_size`
## Summary of changes
1. Copy shard state
2. Do not use prefetch_set_unused in readahead_buffer_resize
3. Change prefetch buffer overflow criteria
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Current compute images for Postgres 14-16 don't build on Debian 12
because of issues with extensions.
This PR fixes that, but for the current setup, it is mostly a no-op
change.
## Summary of changes
- Use `/bin/bash -euo pipefail` as SHELL to fail earlier
- Fix `plv8` build: backport a trivial patch for v8
- Fix `postgis` build: depend `sfgal` version on Debian version instead
of Postgres version
Tested in: https://github.com/neondatabase/neon/pull/9849
#8564
## Problem
The main and backup consumption metric pushes are completely
independent,
resulting in different event time windows and different idempotency
keys.
## Summary of changes
* Merge the push tasks, but keep chunks the same size.
# Problem
The timeout-based batching adds latency to unbatchable workloads.
We can choose a short batching timeout (e.g. 10us) but that requires
high-resolution timers, which tokio doesn't have.
I thoroughly explored options to use OS timers (see
[this](https://github.com/neondatabase/neon/pull/9822) abandoned PR).
In short, it's not an attractive option because any timer implementation
adds non-trivial overheads.
# Solution
The insight is that, in the steady state of a batchable workload, the
time we spend in `get_vectored` will be hundreds of microseconds anyway.
If we prepare the next batch concurrently to `get_vectored`, we will
have a sizeable batch ready once `get_vectored` of the current batch is
done and do not need an explicit timeout.
This can be reasonably described as **pipelining of the protocol
handler**.
# Implementation
We model the sub-protocol handler for pagestream requests
(`handle_pagerequests`) as two futures that form a pipeline:
1. Batching: read requests from the connection and fill the current
batch
2. Execution: `take` the current batch, execute it using `get_vectored`,
and send the response.
The Batching and Execution stages are connected through a new type of
channel called `spsc_fold`.
See the long comment in `handle_pagerequests_pipelined` for details.
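As a rough sketch of the pipelining idea (simplified: a plain depth-1 tokio mpsc channel stands in for `spsc_fold`, and the batch/request types are toy stand-ins for `BatchedFeMessage`):
```rust
use tokio::sync::mpsc;

// Stand-in for `BatchedFeMessage`: a batch of getpage requests.
struct Batch {
    requests: Vec<u64>, // e.g. block numbers
}

// Stand-in for reading one protocol message off the connection.
async fn read_request(remaining: &mut u32) -> Option<u64> {
    if *remaining == 0 {
        return None;
    }
    *remaining -= 1;
    Some(42)
}

// Stand-in for `get_vectored` plus sending the responses back.
async fn execute_batch(batch: Batch) {
    println!("executing batch of {} requests", batch.requests.len());
}

// Batching stage: read requests and fold them into the current batch.
async fn batching_stage(tx: mpsc::Sender<Batch>) {
    let mut remaining = 100u32;
    let mut current = Batch { requests: Vec::new() };
    while let Some(req) = read_request(&mut remaining).await {
        current.requests.push(req);
        // If the execution stage is free, hand the batch over; otherwise keep
        // folding requests into `current` (roughly what `spsc_fold` does,
        // without ever blocking the reader on a full channel).
        if tx.capacity() > 0 {
            let batch = std::mem::replace(&mut current, Batch { requests: Vec::new() });
            if tx.send(batch).await.is_err() {
                return;
            }
        }
    }
    if !current.requests.is_empty() {
        let _ = tx.send(current).await;
    }
}

// Execution stage: take the current batch and execute it.
async fn execution_stage(mut rx: mpsc::Receiver<Batch>) {
    while let Some(batch) = rx.recv().await {
        execute_batch(batch).await;
    }
}

#[tokio::main]
async fn main() {
    // Depth-1 channel: at most one batch is "in flight" to the executor.
    let (tx, rx) = mpsc::channel(1);
    // While one batch executes, the next one fills up concurrently,
    // so no batching timeout is needed.
    tokio::join!(batching_stage(tx), execution_stage(rx));
}
```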
# Changes
- Refactor `handle_pagerequests`
- separate functions for
- reading one protocol message; produces a `BatchedFeMessage` with just
one page request in it
- batching; tries to merge an incoming `BatchedFeMessage` into an
existing `BatchedFeMessage`; returns `None` on success and returns the
incoming message back in case merging isn't possible
- execution of a batched message
- unify the timeline handle acquisition & request span construction; it
now happens in the function that reads the protocol message
- Implement serial and pipelined model
- serial: what we had before any of the batching changes
- read one protocol message
- execute protocol messages
- pipelined: the design described above
- optionality for execution of the pipeline: either via concurrent
futures or via tokio tasks
- Pageserver config
- remove batching timeout field
- add ability to configure pipelining mode
- add ability to limit max batch size for pipelined configurations
(required for the rollout, cf
https://github.com/neondatabase/cloud/issues/20620 )
- ability to configure execution mode
- Tests
- remove `batch_timeout` parametrization
- rename `test_getpage_merge_smoke` to `test_throughput`
- add parametrization to test different max batch sizes and execution
modes
- rename `test_timer_precision` to `test_latency`
- rename the test case file to `test_page_service_batching.py`
- better descriptions of what the tests actually do
## On holding the `TimelineHandle` in the pending batch
While batching, we hold the `TimelineHandle` in the pending batch.
Therefore, the timeline will not finish shutting down while we're
batching.
This is not a problem in practice because the concurrently ongoing
`get_vectored` call will fail quickly with an error indicating that the
timeline is shutting down.
This results in the Execution stage returning a `QueryError::Shutdown`,
which causes the pipeline / entire page service connection to shut down.
This drops all references to the
`Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the
contained `TimelineHandle`s.
- fixes https://github.com/neondatabase/neon/issues/9850
# Performance
Local run of the benchmarks, results in [this empty
commit](1cf5b1463f)
in the PR branch.
Key take-aways:
* `concurrent-futures` and `tasks` deliver identical `batching_factor`
* tail latency impact unknown, cf
https://github.com/neondatabase/neon/issues/9837
* `concurrent-futures` has higher throughput than `tasks` in all
workloads (=lower `time` metric)
* In unbatchable workloads, `concurrent-futures` has 5% higher
`CPU-per-throughput` than that of `tasks`, and 15% higher than that of
`serial`.
* In batchable-32 workload, `concurrent-futures` has 8% lower
`CPU-per-throughput` than that of `tasks` (comparison to tput of
`serial` is irrelevant)
* In unbatchable workloads, mean and tail latencies of
`concurrent-futures` are practically identical to `serial`, whereas
`tasks` adds 20-30us of overhead
Overall, `concurrent-futures` seems like a slightly more attractive
choice.
# Rollout
This change is disabled-by-default.
Rollout plan:
- https://github.com/neondatabase/cloud/issues/20620
# Refs
- epic: https://github.com/neondatabase/neon/issues/9376
- this sub-task: https://github.com/neondatabase/neon/issues/9377
- the abandoned attempt to improve batching timeout resolution:
https://github.com/neondatabase/neon/pull/9820
- closes https://github.com/neondatabase/neon/issues/9850
- fixes https://github.com/neondatabase/neon/issues/9835
## Problem
It appears that the Azure storage API tends to hang TCP connections more
than S3 does.
Currently we use a 2 minute timeout for all downloads. This is large
because sometimes the objects we download are large. However, waiting 2
minutes when doing something like downloading a manifest on tenant
attach is problematic, because when someone is doing a "create tenant,
create timeline" workflow, that 2 minutes is long enough for them
reasonably to give up creating that timeline.
Rather than propagate oversized timeouts further up the stack, we should
use a different timeout for objects that we expect to be small.
Closes: https://github.com/neondatabase/neon/issues/9836
## Summary of changes
- Add a `small_timeout` configuration attribute to remote storage,
defaulting to 30 seconds (still a very generous period to do something
like download an index)
- Add a DownloadKind parameter to DownloadOpts, so that callers can
indicate whether they expect the object to be small or large.
- In the azure client, use small timeout for HEAD requests, and for GET
requests if DownloadKind::Small is used.
- Use DownloadKind::Small for manifests, indices, and heatmap downloads.
This PR intentionally does not make the equivalent change to the S3
client, to reduce blast radius in case this has unexpected consequences
(we could accomplish the same thing by editing lots of configs, but just
skipping the code is simpler for right now)
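Conceptually, the timeout selection described above boils down to something like this (a sketch with simplified types; the real `DownloadOpts` and remote storage config carry more fields):
```rust
use std::time::Duration;

#[derive(Clone, Copy)]
enum DownloadKind {
    Small, // manifests, indices, heatmaps
    Large, // layer files
}

struct RemoteStorageTimeouts {
    timeout: Duration,       // generous, for large objects (e.g. 2 minutes)
    small_timeout: Duration, // e.g. 30 seconds
}

struct DownloadOpts {
    kind: DownloadKind,
    // ... byte range, etag, etc. in the real type
}

impl RemoteStorageTimeouts {
    // Pick the timeout based on the caller's expectation of object size.
    fn for_download(&self, opts: &DownloadOpts) -> Duration {
        match opts.kind {
            DownloadKind::Small => self.small_timeout,
            DownloadKind::Large => self.timeout,
        }
    }
}

fn main() {
    let timeouts = RemoteStorageTimeouts {
        timeout: Duration::from_secs(120),
        small_timeout: Duration::from_secs(30),
    };
    let opts = DownloadOpts { kind: DownloadKind::Small };
    assert_eq!(timeouts.for_download(&opts), Duration::from_secs(30));
}
```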
## Problem
It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.
## Summary of changes
Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent
Follow-up for #9919
Part of neondatabase/cloud#20948
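A minimal sketch of the unified "build a config, always set `application_name`" approach with `tokio_postgres::Config` (connection details below are placeholders):
```rust
use tokio_postgres::{Config, NoTls};

async fn connect_as(app_name: &str) -> Result<tokio_postgres::Client, tokio_postgres::Error> {
    let mut config = Config::new();
    config
        .host("localhost") // placeholder connection details
        .port(5432)
        .user("cloud_admin")
        .dbname("postgres")
        .application_name(app_name); // always set, so connections are identifiable
    let (client, connection) = config.connect(NoTls).await?;
    // Drive the connection in the background.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {e}");
        }
    });
    Ok(client)
}
```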
## Problem
To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.
Touches #9534.
## Summary of changes
Use jemalloc in Safekeeper.
## Problem
When picking locations for a shard, we should use a ScheduleContext that
includes all the other shards in the tenant, so that we apply proper
anti-affinity between shards. If we don't do this, then it can lead to
unstable scheduling, where we place a shard somewhere that the optimizer
will then immediately move it away from.
We didn't always do this, because it was a bit awkward to accumulate the
context for a tenant rather than just walking tenants.
This was a TODO in `handle_node_availability_transition`:
```
// TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
// for tenants without secondary locations: if they have a secondary location, then this
// schedule() call is just promoting an existing secondary)
```
This is a precursor to https://github.com/neondatabase/neon/issues/8264,
where the current imperfect scheduling during node evacuation hampers
testing.
## Summary of changes
- Add an iterator type that yields each shard along with a
ScheduleContext that includes all the other shards from the same tenant
(see the sketch below)
- Use the iterator to replace hand-crafted logic in optimize_all_plan
(functionally identical)
- Use the iterator in `handle_node_availability_transition` to apply
proper anti-affinity during node evacuation.
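A simplified sketch of that iterator's behavior (callback-style here for brevity, with toy types standing in for the real shard and context types):
```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TenantId(u64);
#[derive(Clone, Copy)]
struct NodeId(u64);

// Simplified shard: we only care about where it is currently attached.
struct TenantShard {
    attached: Option<NodeId>,
}

// Stand-in for ScheduleContext: how many of this tenant's shards sit on each node.
#[derive(Default)]
struct ScheduleContext {
    attached_per_node: BTreeMap<u64, usize>,
}

impl ScheduleContext {
    fn avoid(&mut self, node: NodeId) {
        *self.attached_per_node.entry(node.0).or_default() += 1;
    }
}

/// Visit each shard together with a ScheduleContext built from *all* shards of
/// the same tenant, so scheduling decisions see proper anti-affinity.
fn for_each_shard_with_context(
    shards: &BTreeMap<(TenantId, u8), TenantShard>,
    mut f: impl FnMut(&TenantShard, &ScheduleContext),
) {
    // Group shards by tenant.
    let mut by_tenant: BTreeMap<TenantId, Vec<&TenantShard>> = BTreeMap::new();
    for ((tenant_id, _shard_number), shard) in shards {
        by_tenant.entry(*tenant_id).or_default().push(shard);
    }
    for (_tenant_id, tenant_shards) in by_tenant {
        // One context for the whole tenant: every sibling's current location
        // counts against placing another shard on the same node.
        let mut ctx = ScheduleContext::default();
        for shard in &tenant_shards {
            if let Some(node) = shard.attached {
                ctx.avoid(node);
            }
        }
        for shard in tenant_shards {
            f(shard, &ctx);
        }
    }
}

fn main() {
    let mut shards = BTreeMap::new();
    shards.insert((TenantId(1), 0), TenantShard { attached: Some(NodeId(10)) });
    shards.insert((TenantId(1), 1), TenantShard { attached: Some(NodeId(11)) });
    for_each_shard_with_context(&shards, |shard, ctx| {
        println!(
            "shard on {:?}, nodes already used by siblings: {}",
            shard.attached.map(|n| n.0),
            ctx.attached_per_node.len()
        );
    });
}
```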
Our rust-postgres fork is getting messy. Mostly because proxy wants more
control over the raw protocol than tokio-postgres provides. As such,
it's diverging more and more. Storage and compute also make use of
rust-postgres, but in more normal usage, thus they don't need our crazy
changes.
Idea:
* proxy maintains their subset
* other teams use a minimal patch set against upstream rust-postgres
Reviewing this code will be difficult. To implement it, I
1. Copied tokio-postgres, postgres-protocol and postgres-types from
00940fcdb5
2. Updated their package names with the `2` suffix to make them compile
in the workspace.
3. Updated proxy to use those packages
4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches
applied: https://github.com/jbg/tokio-postgres-rustls/pull/32 and
https://github.com/jbg/tokio-postgres-rustls/pull/33)
5. Removed as much dead code as I could find in the vendored libraries
6. Updated the tokio-postgres-rustls code to use our existing channel
binding implementation
Adds a benchmark for logical message WAL ingestion throughput
end-to-end. Logical messages are essentially noops, and thus ignored by
the Pageserver.
Example results from my MacBook, with fsync enabled:
```
postgres_ingest: 14.445 s
safekeeper_ingest: 29.948 s
pageserver_ingest: 30.013 s
pageserver_recover_ingest: 8.633 s
wal_written: 10,340 MB
message_count: 1310720 messages
postgres_throughput: 715 MB/s
safekeeper_throughput: 345 MB/s
pageserver_throughput: 344 MB/s
pageserver_recover_throughput: 1197 MB/s
```
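(Each throughput figure is simply `wal_written` divided by the corresponding ingest time, e.g. 10,340 MB / 14.445 s ≈ 715 MB/s for Postgres.)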
See
https://github.com/neondatabase/neon/issues/9642#issuecomment-2475995205
for running analysis.
Touches #9642.
## Problem
We used `set_path()` to replace the database name in the connection
string. It automatically does url-safe encoding if the path is not
already encoded, but it does it as per the URL standard, which assumes
that tabs can be safely removed from the path without changing the
meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks
for DBs with properly %-encoded names, like with `%20`, as they are kept
intact, but actually should be escaped.
Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name.
I think this is the PR that caused this regression
https://github.com/neondatabase/neon/pull/9717, as it switched from
`postgres::config::Config` back to `set_path()`.
This was fixed a while ago already [1], btw, I just haven't added a test
to catch this regression back then :(
## Summary of changes
This commit changes the code back to use
`postgres/tokio_postgres::Config` everywhere.
While on it, also do some changes around, as I had to touch this code:
1. Bump some logging from `debug` to `info` in the spec apply path. We
do not use `debug` in prod, and it was tricky to understand what was
going on with this bug in prod.
2. Refactor configuration concurrency calculation code so it was
reusable. Yet, still keep `1` in the case of reconfiguration. The
database can be actively used at this moment, so we cannot guarantee
that there will be enough spare connection slots, and the underlying
code won't handle connection errors properly.
3. Simplify the installed extensions code. It was spawning a blocking
task inside async function, which doesn't make much sense. Instead, just
have a main sync function and call it with `spawn_blocking` in the API
code -- the only place we need it to be async.
4. Add regression python test to cover this and related problems in the
future. Also, add more extensive testing of schema dump and DBs and
roles listing API.
[1]:
4d1e48f3b9
[2]:
https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org
Resolves neondatabase/cloud#20869
## Problem
Currently, we rerun only known flaky tests. This approach was chosen to
reduce the number of tests that go unnoticed (by forcing people to take
a look at failed tests and rerun the job manually), but it has some
drawbacks:
- In PRs, people tend to push new changes without checking failed tests
(that's ok)
- On the main branch, tests are just restarted without checking
(understandable)
- Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky,
`test[2]` is not marked as flaky automatically (which may or may not
be the case).
I suggest rerunning all failed tests to increase the stability of GitHub
jobs and using the Grafana Dashboard with flaky tests for deeper
analysis.
## Summary of changes
- Rerun all failed tests twice at max
## Problem
For the interpreted proto the pageserver is not returning the correct
LSN
in replies to keep alive requests. This is because the interpreted
protocol arm
was not updating `last_rec_lsn`.
## Summary of changes
* Return correct LSN in keep-alive responses
* Fix shard field in wal sender traces
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.
[Release notes](https://releases.rs/docs/1.83.0/).
Also update `cargo-hakari`, `cargo-deny`, `cargo-hack` and
`cargo-nextest` to their latest versions.
Prior update was in #9445.
## Problem
We currently see elevated levels of errors for GetBlob requests. This is
because 404 and 304 are counted as errors for metric reporting.
## Summary of Changes
Bring the implementation in line with the S3 client and treat 404 and
304 responses as ok for metric purposes.
Related: https://github.com/neondatabase/cloud/issues/20666
## Problem
For cancellation, a connection is open during all the cancel checks.
## Summary of changes
Spawn cancellation checks in the background, and close connection
immediately.
Use task_tracker for cancellation checks.
Like #9931 but without rebasing upstream just yet, to try and minimise
the differences.
Removes all proxy-specific commits from the rust-postgres fork, now that
proxy no longer depends on them. Merging upstream changes to come later.
Closes #9387.
## Problem
`BufferedWriter` cannot proceed while the owned buffer is flushing to
disk. We want to implement double buffering so that the flush can happen
in the background. See #9387.
## Summary of changes
- Maintain two owned buffers in `BufferedWriter`.
- The writer is in charge of copying the data into an owned, aligned
buffer; once full, it submits the buffer to the flush task.
- The flush background task is in charge of flushing the owned buffer to
disk and returning the buffer to the writer for reuse.
- The writer and the flush background task communicate through a
bi-directional channel.
For in-memory layer, we also need to be able to read from the buffered
writer in `get_values_reconstruct_data`. To handle this case, we did the
following
- Replace `VirtualFile::write_all` with `VirtualFile::write_all_at`,
and use `Arc` to share it between the writer and the background task.
- leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`,
one clone will be submitted to the channel, the other clone will be
saved within the writer to serve reads. When we want to reuse the
buffer, we can invoke `IoBuffer::into_mut`, which gives us back the
mutable aligned buffer.
- InMemoryLayer reads are now aware of the maybe_flushed part of the
buffer.
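A rough sketch of the double-buffering handshake (simplified: plain `Vec<u8>` buffers instead of aligned `IoBufferMut`s, and a pair of tokio mpsc channels instead of the actual bi-directional channel):
```rust
use tokio::sync::mpsc;

// Writer-side half of the handshake: submit full buffers, get empty ones back.
async fn writer_loop(
    to_flush: mpsc::Sender<Vec<u8>>,
    mut returned: mpsc::Receiver<Vec<u8>>,
    chunks: Vec<Vec<u8>>,
) {
    // One of the two owned buffers starts with the writer.
    let mut current = Vec::with_capacity(8192);
    for chunk in chunks {
        if current.len() + chunk.len() > current.capacity() {
            // Hand the full buffer to the flush task and reuse a returned one.
            to_flush.send(std::mem::take(&mut current)).await.unwrap();
            current = returned.recv().await.unwrap();
        }
        current.extend_from_slice(&chunk);
    }
    if !current.is_empty() {
        let _ = to_flush.send(current).await;
    }
}

// Flush-side half: write the buffer out, then return it (cleared) for reuse.
async fn flush_loop(mut to_flush: mpsc::Receiver<Vec<u8>>, returned: mpsc::Sender<Vec<u8>>) {
    while let Some(mut buf) = to_flush.recv().await {
        // Stand-in for VirtualFile::write_all_at(...).
        println!("flushing {} bytes", buf.len());
        buf.clear();
        // The writer may already have exited after its final send.
        let _ = returned.send(buf).await;
    }
}

#[tokio::main]
async fn main() {
    let (to_flush_tx, to_flush_rx) = mpsc::channel(1);
    let (returned_tx, returned_rx) = mpsc::channel(1);
    // Pre-seed the "returned" side with the second owned buffer.
    returned_tx.send(Vec::with_capacity(8192)).await.unwrap();
    tokio::join!(
        writer_loop(to_flush_tx, returned_rx, vec![vec![0u8; 3000]; 10]),
        flush_loop(to_flush_rx, returned_tx),
    );
}
```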
**Caveat**
- We removed the owned version of write because this interface does not
work well with buffer alignment. The result is that, without direct IO
enabled,
[`download_object`](a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243))
does one more memcpy than before this PR due to the switch to the
`_borrowed` version of the write.
- "Bypass aligned part of write" could be implemented later to avoid
large amounts of memcpy.
**Testing**
- Use a oneshot-channel-based control mechanism to make flush behavior
deterministic in tests.
- Test reading from `EphemeralFile` when the last submitted buffer is
not yet flushed, in progress, or done flushing to disk.
## Performance
We see performance improvement for small values, and regression on big
values, likely due to being CPU bound + disk write latency.
[Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4)
---------
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
We have a scale test for the storage controller which also acts as a
good stress test for scheduling stability. However, it created nodes
with no AZs set.
## Summary of changes
- Bump node count to 6 and set AZs on them.
This is a precursor to other AZ-related PRs, to make sure any new code
that's landed is getting scale tested in an AZ-aware environment.
## Problem
We practice a manual release flow for the compute module. This will
allow automation of the compute release process.
## Summary of changes
The workflow was modified to make a compute release automatically on the
branch release-compute.
## Problem
Reqwest errors don't include details about the inner source error. This
means that we get opaque errors like:
```
receive body: error sending request for url (http://localhost:9898/v1/location_config)
```
Instead of the more helpful:
```
receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out
```
Touches #9801.
## Summary of changes
Include the source error for `reqwest::Error` wherever it's displayed.
## Problem
When a client specifies `application_name`, pgbouncer propagates it to
Postgres. Yet, if the client doesn't, we have a hard time figuring out
who opens a lot of Postgres connections (including the `cloud_admin`
ones).
See this investigation as an example:
https://neondb.slack.com/archives/C0836R0RZ0D
## Summary of changes
I haven't found this documented, but it looks like pgbouncer accepts
standard Postgres connstring parameters in the connstring in the
`[databases]` section, so put the default `application_name=pgbouncer`
there. That way, we will always see who opens Postgres connections. I
did tests, and if a client specifies an `application_name`, pgbouncer
overrides this default, so ours only applies if the client leaves it
unspecified or sets it to blank (`&application_name=`) in the connection
string.
This is the last place we could potentially open some Postgres
connections without `application_name`. Everything else should be either
of two:
1. Direct client connections without `application_name`, but these
should be strictly non-`cloud_admin` ones
2. Some ad-hoc internal connections, so if we see spikes of unidentified
`cloud_admin` connections, we will need to investigate it again.
Fixes neondatabase/cloud#20948
(stacked on #9990 and #9995)
Partially fixes #1287 with a custom option field to enable the fixed
behaviour. This allows us to gradually roll out the fix without silently
changing the observed behaviour for our customers.
related to https://github.com/neondatabase/cloud/issues/15284
## Problem
During deploys, we see a lot of 500 errors due to heatmap uploads for
inactive tenants. These should be 503s instead.
Resolves #9574.
## Summary of changes
Make the secondary tenant scheduler use `ApiError` rather than
`anyhow::Error`, to propagate the tenant error and convert it to an
appropriate status code.
## Problem
We tried different parallelism settings for the ingest bench.
## Summary of changes
The following settings seem optimal after merging:
- SK-side WAL filtering
- batched getpages
Settings:
- effective_io_concurrency 100
- concurrency limit 200 (different from Prod!)
- jobs 4, maintenance workers 7
- 10 GB chunk size
## Problem
```
2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local
2024-12-03T15:42:49.5325077Z Traceback (most recent call last):
2024-12-03T15:42:49.5325603Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module>
2024-12-03T15:42:49.5326029Z main()
2024-12-03T15:42:49.5326316Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main
2024-12-03T15:42:49.5326739Z ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp)
2024-12-03T15:42:49.5327488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-03T15:42:49.5327914Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result
2024-12-03T15:42:49.5328321Z psycopg2.extras.execute_values(
2024-12-03T15:42:49.5328940Z File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values
2024-12-03T15:42:49.5335618Z cur.execute(b''.join(parts))
2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures"
2024-12-03T15:42:49.5336287Z LINE 57: 'concurrent-futures',
2024-12-03T15:42:49.5336462Z ^
```
## Summary of changes
- `test_page_service_batching`: save non-numeric params as `labels`
- Add a runtime check that `metric_value` is NUMERIC
Before this PR, some override callbacks used `.default()`, others
used `.setdefault()`.
As of this PR, all callbacks use `.setdefault()` which I think is least
prone to failure.
Aligning on a single way will set the right example for future tests
that need such customization.
The change to `test_pageserver_getpage_throttle.py` is technically a
change in behavior: before, it replaced the `tenant_config` field; now it
just configures the throttle. This is what I believe is intended anyway.
Support tenant manifests in the storage scrubber:
* list the manifests, order them by generation
* delete all manifests except for the two most recent generations
* for the latest manifest: try parsing it.
I've tested this patch by running it against a staging bucket and it
successfully deleted stuff (and avoided deleting the latest two
generations).
In follow-up work, we might want to also check some invariants of the
manifest, as mentioned in #8088.
Part of #9386
Part of #8088
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
The Pageserver signal handler would only respond to a single signal and
initiate shutdown. Subsequent signals were ignored. This meant that a
`SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a
slow or stalled shutdown). The `test_runner` uses this to force shutdown
if graceful shutdown is slow.
Touches #9740.
## Summary of changes
Keep responding to signals after the initial shutdown signal has been
received.
Arguably, the `test_runner` should also use `SIGKILL` rather than
`SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT`
regardless.
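A sketch of a signal loop that keeps listening after the first signal (tokio-based; the escalation policy and exit code are illustrative, not the Pageserver's actual handler):
```rust
use tokio::signal::unix::{signal, SignalKind};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigquit = signal(SignalKind::quit())?;
    let mut shutdown_requested = false;

    loop {
        tokio::select! {
            _ = sigterm.recv() => {
                if !shutdown_requested {
                    shutdown_requested = true;
                    eprintln!("SIGTERM: starting graceful shutdown");
                    // kick off graceful shutdown here
                } else {
                    eprintln!("SIGTERM received again, shutdown already in progress");
                }
            }
            _ = sigquit.recv() => {
                // Even with a graceful shutdown already running, a later
                // SIGQUIT still forces an immediate exit.
                eprintln!("SIGQUIT: immediate shutdown");
                std::process::exit(111);
            }
        }
    }
}
```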
Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres
allows us to remove a lot of dead weight from our actual postgres
connection logic.
## Problem
We saw a peculiar case where a pageserver apparently got a 0-tenant
response to `/re-attach` but we couldn't see the request landing on a
storage controller. It was hard to confirm retrospectively that the
pageserver was configured properly at the moment it sent the request.
## Summary of changes
- Log the URL to which we are sending the request
- Log the NodeId and metadata that we sent
## Problem
Sharded tenants should be run in a single AZ for best performance, so
that computes have AZ-local latency to all the shards.
Part of https://github.com/neondatabase/neon/issues/8264
## Summary of changes
- When we split a tenant, instead of updating each shard's preferred AZ
to wherever it is scheduled, propagate the preferred AZ from the parent.
- Drop the check in `test_shard_preferred_azs` that asserts shards end
up in their preferred AZ: this will not be true again until the
optimize_attachment logic is updated to make this so. The existing check
wasn't testing anything about scheduling, it was just asserting that we
set preferred AZ in a way that matches the way things happen to be
scheduled at time of split.
## Problem
In the batching PR
- https://github.com/neondatabase/neon/pull/9870
I stopped deducting the time-spent-in-throttle from latency metrics,
i.e.,
- smgr latency metrics (`SmgrOpTimer`)
- basebackup latency (+scan latency, which I think is part of
basebackup).
The reason for stopping the deduction was that with the introduction of
batching, the trick with tracking time-spent-in-throttle inside
RequestContext and swap-replacing it from the `impl Drop for
SmgrOpTimer` no longer worked with >1 requests in a batch.
However, deducting time-spent-in-throttle is desirable because our
internal latency SLO definition does not account for throttling.
## Summary of changes
- Redefine throttling to be a page_service pagestream request throttle
instead of a throttle for repository `Key` reads through `Timeline::get`
/ `Timeline::get_vectored`.
- This means reads done by `basebackup` are no longer subject to any
throttle.
- The throttle applies after batching, before handling of the request.
- Drive-by fix: make throttle sensitive to cancellation.
- Rename metric label `kind` from `timeline_get` to `pagestream` to
reflect the new scope of throttling.
To avoid config format breakage, we leave the config field named
`timeline_get_throttle` and ignore the `task_kinds` field.
This will be cleaned up in a future PR.
## Trade-Offs
Ideally, we would apply the throttle before reading a request off the
connection, so that we queue the minimal amount of work inside the
process.
However, that's not possible because we need to do shard routing.
The redefinition of the throttle to limit pagestream request rate
instead of repository `Key` rate comes with several downsides:
- We're no longer able to use the throttle mechanism for other
tasks, e.g. image layer creation.
However, in practice, we never used that capability anyways.
- We no longer throttle basebackup.
## Problem
`test_sharded_ingest` ingests a lot of data, which can cause shutdown to
be slow e.g. due to local "S3 uploads" or compactions. This can cause
test flakes during teardown.
Resolves#9740.
## Summary of changes
Perform an immediate shutdown of the cluster.
## Problem
We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.
We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).
This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.
Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.
## Summary of changes
Adds a HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:
* `format`: output format (`jemalloc` or `pprof`; default `pprof`).
Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
process server-side -- this is left as future work, as is other output
formats like SVG.
Heap profiles don't work on macOS due to limitations in jemalloc.
## Problem
The extensions for Postgres v17 are ready but we do not test the
extensions shipped with v17
## Summary of changes
Build the test image based on Postgres v17. Run the tests for v17.
---------
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
This PR
- fixes smgr metrics https://github.com/neondatabase/neon/issues/9925
- adds an additional startup log line logging the current batching
config
- adds a histogram of batch sizes global and per-tenant
- adds a metric exposing the current batching config
The issue described #9925 is that before this PR, request latency was
only observed *after* batching.
This means that smgr latency metrics (most importantly getpage latency)
don't account for
- `wait_lsn` time
- time spent waiting for batch to fill up / the executor stage to pick
up the batch.
The fix is to use a per-request batching timer, like we did before the
initial batching PR.
We funnel those timers through the entire request lifecycle.
I noticed that even before the initial batching changes, we weren't
accounting for the time spent writing & flushing the response to the
wire.
This PR drive-by fixes that deficiency by dropping the timers at the
very end of processing the batch, i.e., after the `pgb.flush()` call.
I was **unable to maintain the behavior that we deduct
time-spent-in-throttle from various latency metrics.
The reason is that we're using a *single* counter in `RequestContext` to
track micros spent in throttle.
But there are *N* metrics timers in the batch, one per request.
As a consequence, the practice of consuming the counter in the drop
handler of each timer no longer works because all but the first timer
will encounter error `close() called on closed state`.
A failed attempt to maintain the current behavior can be found in
https://github.com/neondatabase/neon/pull/9951.
So, this PR remvoes the deduction behavior from all metrics.
I started a discussion on Slack about it the implications this has for
our internal SLO calculation:
https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029
# Refs
- fixes https://github.com/neondatabase/neon/issues/9925
- sub-issue https://github.com/neondatabase/neon/issues/9377
- epic: https://github.com/neondatabase/neon/issues/9376
Before this PR, the storcon_cli didn't have a way to show the
tenant-wide information of the TenantDescribeResponse.
Sadly, the `Serialize` impl for the tenant config doesn't skip on
`None`, so, the output becomes a bit bloated.
Maybe we can use `skip_serializing_if(Option::is_none)` in the future.
=> https://github.com/neondatabase/neon/issues/9983
## Problem
I was touching `test_storage_controller_node_deletion` because for AZ
scheduling work I was adding a change to the storage controller (kick
secondaries during optimisation) that made a FIXME in this test defunct.
While looking at it I also realized that we can easily fix the way node
deletion currently doesn't use a proper ScheduleContext, using the
iterator type recently added for that purpose.
## Summary of changes
- A testing-only behavior in storage controller where if a secondary
location isn't yet ready during optimisation, it will be actively
polled.
- Remove workaround in `test_storage_controller_node_deletion` that
previously was needed because optimisation would get stuck on cold
secondaries.
- Update node deletion code to use a `TenantShardContextIterator` and
thereby a proper ScheduleContext
## Problem
After enabling LFC in tests and lowering `shared_buffers` we started
having more problems with `test_pg_regress`.
## Summary of changes
Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and
still have enough room for Postgres to operate. Everything smaller might
be not enough for Postgres under load, and can cause errors like 'no
unpinned buffers available'.
See Konstantin's comment [1] as well.
Fixes#9956
[1]:
https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097
On reconfigure, we no longer passed a port for the extension server
which caused us to not write out the neon.extension_server_port line.
Thus, Postgres thought we were setting the port to the default value of
0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the
following log messages:
> LOG: parameter "neon.extension_server_port" cannot be changed without
restarting the server
> LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied
Fixes: https://github.com/neondatabase/neon/issues/9945
Signed-off-by: Tristan Partin <tristan@neon.tech>
The spec was written for the buggy protocol which we had before the one
more similar to Raft was implemented. Update the spec with what we
currently have.
ref https://github.com/neondatabase/neon/issues/8699
## Problem
The credentials providers tries to connect to AWS STS even when we use
plain Redis connections.
## Summary of changes
* Construct the CredentialsProvider only when needed ("irsa").
## Problem
`if: ${{ github.event.schedule }}` gets skipped if a previous step has
failed, but we want to run the step for both `success` and `failure`
## Summary of changes
- Add `!cancelled()` to notification step if-condition, to skip only
cancelled jobs
Fixes https://github.com/neondatabase/cloud/issues/20973.
This refactors `connect_raw` in order to return direct access to the
delayed notices.
I cannot find a way to test this with psycopg2 unfortunately, although
testing it with psql does return the expected results.
## Problem
We can't easily tell how far the state of shards is from their AZ
preferences. This can be a cause of performance issues, so it's
important for diagnosability that we can tell easily if there are
significant numbers of shards that aren't running in their preferred AZ.
Related: https://github.com/neondatabase/cloud/issues/15413
## Summary of changes
- In reconcile_all, count shards that are scheduled into the wrong AZ
(if they have a preference), and publish it as a prometheus gauge.
- Also calculate a statistic for how many shards wanted to reconcile but
couldn't.
This is clearly a lazy calculation: reconcile all only runs
periodically. But that's okay: shards in the wrong AZ is something that
only matters if it stays that way for some period of time.
Improves `wait_until` by:
* Use `timeout` instead of `iterations`. This allows changing the
timeout/interval parameters independently.
* Make `timeout` and `interval` optional (default 20s and 0.5s). Most
callers don't care.
* Only output status every 1s by default, and add optional
`status_interval` parameter.
* Remove `show_intermediate_error`, this was always emitted anyway.
Most callers have been updated to use the defaults, except where they
had good reason otherwise.
## Problem
We saw unexpected container terminations when running in k8s with with
small CPU resource requests.
The /status and /ready handlers called `maybe_forward`, which always
takes the lock on Service::inner.
If there is a lot of writer lock contention, and the container is
starved of CPU, this increases the likelihood that we will get killed by
the kubelet.
It isn't certain that this was a cause of issues, but it is a potential
source that we can eliminate.
## Summary of changes
- Revise logic to return immediately if the URL is in the non-forwarded
list, rather than calling maybe_forward
## Problem
See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479
We observe the following error in the logs
```
[XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128
```
most likely caused by changing `neon.readahead_buffer_size`
## Summary of changes
1. Copy shard state
2. Do not use prefetch_set_unused in readahead_buffer_resize
3. Change prefetch buffer overflow criteria
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Current compute images for Postgres 14-16 don't build on Debian 12
because of issues with extensions.
This PR fixes that, but for the current setup, it is mostly a no-op
change.
## Summary of changes
- Use `/bin/bash -euo pipefail` as SHELL to fail earlier
- Fix `plv8` build: backport a trivial patch for v8
- Fix `postgis` build: depend `sfgal` version on Debian version instead
of Postgres version
Tested in: https://github.com/neondatabase/neon/pull/9849
#8564
## Problem
The main and backup consumption metric pushes are completely
independent,
resulting in different event time windows and different idempotency
keys.
## Summary of changes
* Merge the push tasks, but keep chunks the same size.
# Problem
The timeout-based batching adds latency to unbatchable workloads.
We can choose a short batching timeout (e.g. 10us) but that requires
high-resolution timers, which tokio doesn't have.
I thoroughly explored options to use OS timers (see
[this](https://github.com/neondatabase/neon/pull/9822) abandoned PR).
In short, it's not an attractive option because any timer implementation
adds non-trivial overheads.
# Solution
The insight is that, in the steady state of a batchable workload, the
time we spend in `get_vectored` will be hundreds of microseconds anyway.
If we prepare the next batch concurrently to `get_vectored`, we will
have a sizeable batch ready once `get_vectored` of the current batch is
done and do not need an explicit timeout.
This can be reasonably described as **pipelining of the protocol
handler**.
# Implementation
We model the sub-protocol handler for pagestream requests
(`handle_pagrequests`) as two futures that form a pipeline:
2. Batching: read requests from the connection and fill the current
batch
3. Execution: `take` the current batch, execute it using `get_vectored`,
and send the response.
The Reading and Batching stage are connected through a new type of
channel called `spsc_fold`.
See the long comment in the `handle_pagerequests_pipelined` for details.
# Changes
- Refactor `handle_pagerequests`
- separate functions for
- reading one protocol message; produces a `BatchedFeMessage` with just
one page request in it
- batching; tried to merge an incoming `BatchedFeMessage` into an
existing `BatchedFeMessage`; returns `None` on success and returns back
the incoming message in case merging isn't possible
- execution of a batched message
- unify the timeline handle acquisition & request span construction; it
now happen in the function that reads the protocol message
- Implement serial and pipelined model
- serial: what we had before any of the batching changes
- read one protocol message
- execute protocol messages
- pipelined: the design described above
- optionality for execution of the pipeline: either via concurrent
futures vs tokio tasks
- Pageserver config
- remove batching timeout field
- add ability to configure pipelining mode
- add ability to limit max batch size for pipelined configurations
(required for the rollout, cf
https://github.com/neondatabase/cloud/issues/20620 )
- ability to configure execution mode
- Tests
- remove `batch_timeout` parametrization
- rename `test_getpage_merge_smoke` to `test_throughput`
- add parametrization to test different max batch sizes and execution
moes
- rename `test_timer_precision` to `test_latency`
- rename the test case file to `test_page_service_batching.py`
- better descriptions of what the tests actually do
## On the holding The `TimelineHandle` in the pending batch
While batching, we hold the `TimelineHandle` in the pending batch.
Therefore, the timeline will not finish shutting down while we're
batching.
This is not a problem in practice because the concurrently ongoing
`get_vectored` call will fail quickly with an error indicating that the
timeline is shutting down.
This results in the Execution stage returning a `QueryError::Shutdown`,
which causes the pipeline / entire page service connection to shut down.
This drops all references to the
`Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the
contained `TimelineHandle`s.
- => fixes https://github.com/neondatabase/neon/issues/9850
# Performance
Local run of the benchmarks, results in [this empty
commit](1cf5b1463f)
in the PR branch.
Key take-aways:
* `concurrent-futures` and `tasks` deliver identical `batching_factor`
* tail latency impact unknown, cf
https://github.com/neondatabase/neon/issues/9837
* `concurrent-futures` has higher throughput than `tasks` in all
workloads (=lower `time` metric)
* In unbatchable workloads, `concurrent-futures` has 5% higher
`CPU-per-throughput` than that of `tasks`, and 15% higher than that of
`serial`.
* In batchable-32 workload, `concurrent-futures` has 8% lower
`CPU-per-throughput` than that of `tasks` (comparison to tput of
`serial` is irrelevant)
* in unbatchable workloads, mean and tail latencies of
`concurrent-futures` is practically identical to `serial`, whereas
`tasks` adds 20-30us of overhead
Overall, `concurrent-futures` seems like a slightly more attractive
choice.
# Rollout
This change is disabled-by-default.
Rollout plan:
- https://github.com/neondatabase/cloud/issues/20620
# Refs
- epic: https://github.com/neondatabase/neon/issues/9376
- this sub-task: https://github.com/neondatabase/neon/issues/9377
- the abandoned attempt to improve batching timeout resolution:
https://github.com/neondatabase/neon/pull/9820
- closes https://github.com/neondatabase/neon/issues/9850
- fixes https://github.com/neondatabase/neon/issues/9835
## Problem
It appears that the Azure storage API tends to hang TCP connections more
than S3 does.
Currently we use a 2 minute timeout for all downloads. This is large
because sometimes the objects we download are large. However, waiting 2
minutes when doing something like downloading a manifest on tenant
attach is problematic, because when someone is doing a "create tenant,
create timeline" workflow, that 2 minutes is long enough for them
reasonably to give up creating that timeline.
Rather than propagate oversized timeouts further up the stack, we should
use a different timeout for objects that we expect to be small.
Closes: https://github.com/neondatabase/neon/issues/9836
## Summary of changes
- Add a `small_timeout` configuration attribute to remote storage,
defaulting to 30 seconds (still a very generous period to do something
like download an index)
- Add a DownloadKind parameter to DownloadOpts, so that callers can
indicate whether they expect the object to be small or large.
- In the azure client, use small timeout for HEAD requests, and for GET
requests if DownloadKind::Small is used.
- Use DownloadKind::Small for manifests, indices, and heatmap downloads.
This PR intentionally does not make the equivalent change to the S3
client, to reduce blast radius in case this has unexpected consequences
(we could accomplish the same thing by editing lots of configs, but just
skipping the code is simpler for right now)
## Problem
It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.
## Summary of changes
Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent
Follow-up for #9919
Part of neondatabase/cloud#20948
## Problem
To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.
Touches #9534.
## Summary of changes
Use jemalloc in Safekeeper.
## Problem
When picking locations for a shard, we should use a ScheduleContext that
includes all the other shards in the tenant, so that we apply proper
anti-affinity between shards. If we don't do this, then it can lead to
unstable scheduling, where we place a shard somewhere that the optimizer
will then immediately move it away from.
We didn't always do this, because it was a bit awkward to accumulate the
context for a tenant rather than just walking tenants.
This was a TODO in `handle_node_availability_transition`:
```
// TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
// for tenants without secondary locations: if they have a secondary location, then this
// schedule() call is just promoting an existing secondary)
```
This is a precursor to https://github.com/neondatabase/neon/issues/8264,
where the current imperfect scheduling during node evacuation hampers
testing.
## Summary of changes
- Add an iterator type that yields each shard along with a
schedulecontext that includes all the other shards from the same tenant
- Use the iterator to replace hand-crafted logic in optimize_all_plan
(functionally identical)
- Use the iterator in `handle_node_availability_transition` to apply
proper anti-affinity during node evacuation.
Our rust-postgres fork is getting messy. Mostly because proxy wants more
control over the raw protocol than tokio-postgres provides. As such,
it's diverging more and more. Storage and compute also make use of
rust-postgres, but in more normal usage, thus they don't need our crazy
changes.
Idea:
* proxy maintains their subset
* other teams use a minimal patch set against upstream rust-postgres
Reviewing this code will be difficult. To implement it, I
1. Copied tokio-postgres, postgres-protocol and postgres-types from
00940fcdb5
2. Updated their package names with the `2` suffix to make them compile
in the workspace.
3. Updated proxy to use those packages
4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches
applied https://github.com/jbg/tokio-postgres-rustls/pull/32https://github.com/jbg/tokio-postgres-rustls/pull/33)
5. Removed as much dead code as I could find in the vendored libraries
6. Updated the tokio-postgres-rustls code to use our existing channel
binding implementation
Adds a benchmark for logical message WAL ingestion throughput
end-to-end. Logical messages are essentially noops, and thus ignored by
the Pageserver.
Example results from my MacBook, with fsync enabled:
```
postgres_ingest: 14.445 s
safekeeper_ingest: 29.948 s
pageserver_ingest: 30.013 s
pageserver_recover_ingest: 8.633 s
wal_written: 10,340 MB
message_count: 1310720 messages
postgres_throughput: 715 MB/s
safekeeper_throughput: 345 MB/s
pageserver_throughput: 344 MB/s
pageserver_recover_throughput: 1197 MB/s
```
See
https://github.com/neondatabase/neon/issues/9642#issuecomment-2475995205
for running analysis.
Touches #9642.
## Problem
We used `set_path()` to replace the database name in the connection
string. It automatically does url-safe encoding if the path is not
already encoded, but it does it as per the URL standard, which assumes
that tabs can be safely removed from the path without changing the
meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks
for DBs with properly %-encoded names, like with `%20`, as they are kept
intact, but actually should be escaped.
Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name.
I think this is the PR that caused this regression
https://github.com/neondatabase/neon/pull/9717, as it switched from
`postgres::config::Config` back to `set_path()`.
This was fixed a while ago already [1], btw, I just haven't added a test
to catch this regression back then :(
## Summary of changes
This commit changes the code back to use
`postgres/tokio_postgres::Config` everywhere.
While on it, also do some changes around, as I had to touch this code:
1. Bump some logging from `debug` to `info` in the spec apply path. We
do not use `debug` in prod, and it was tricky to understand what was
going on with this bug in prod.
2. Refactor configuration concurrency calculation code so it was
reusable. Yet, still keep `1` in the case of reconfiguration. The
database can be actively used at this moment, so we cannot guarantee
that there will be enough spare connection slots, and the underlying
code won't handle connection errors properly.
3. Simplify the installed extensions code. It was spawning a blocking
task inside async function, which doesn't make much sense. Instead, just
have a main sync function and call it with `spawn_blocking` in the API
code -- the only place we need it to be async.
4. Add regression python test to cover this and related problems in the
future. Also, add more extensive testing of schema dump and DBs and
roles listing API.
[1]:
4d1e48f3b9
[2]:
https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.orgResolvesneondatabase/cloud#20869
## Problem
Currently, we rerun only known flaky tests. This approach was chosen to
reduce the number of tests that go unnoticed (by forcing people to take
a look at failed tests and rerun the job manually), but it has some
drawbacks:
- In PRs, people tend to push new changes without checking failed tests
(that's ok)
- In the main, tests are just restarted without checking
(understandable)
- Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky
`, test[2]` is not marked as flaky automatically (which may or may not
be the case).
I suggest rerunning all failed tests to increase the stability of GitHub
jobs and using the Grafana Dashboard with flaky tests for deeper
analysis.
## Summary of changes
- Rerun all failed tests twice at max
## Problem
For the interpreted proto the pageserver is not returning the correct
LSN
in replies to keep alive requests. This is because the interpreted
protocol arm
was not updating `last_rec_lsn`.
## Summary of changes
* Return correct LSN in keep-alive responses
* Fix shard field in wal sender traces
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.
[Release notes](https://releases.rs/docs/1.83.0/).
Also update `cargo-hakari`, `cargo-deny`, `cargo-hack` and
`cargo-nextest` to their latest versions.
Prior update was in #9445.
## Problem
We currently see elevated levels of errors for GetBlob requests. This is
because 404 and 304 are counted as errors for metric reporting.
## Summary of Changes
Bring the implementation in line with the S3 client and treat 404 and
304 responses as ok for metric purposes.
Related: https://github.com/neondatabase/cloud/issues/20666
## Problem
For cancellation, a connection is kept open during all the cancel checks.
## Summary of changes
Spawn cancellation checks in the background, and close connection
immediately.
Use task_tracker for cancellation checks.
# Problem
VM (visibility map) pages are stored and managed as any regular relation
page, in the VM fork of the main relation. They are also sharded like
other pages. Regular WAL writes to the VM pages (typically performed by
vacuum) are routed to the correct shard as usual. However, VM pages are
also updated via `ClearVmBits` metadata records emitted when main
relation pages are updated. These metadata records were sent to all
shards, like other metadata records. This had the following effects:
* On shards responsible for VM pages, the `ClearVmBits` applies as
expected.
* On shard 0, which knows about the VM relation and its size but doesn't
necessarily have any VM pages, the `ClearVmBits` writes may have been
applied without also having applied the explicit WAL writes to VM pages.
* If VM pages are spread across multiple shards (unlikely with 256MB
stripe size), all shards may have applied `ClearVmBits` if the pages
fall within their local view of the relation size, even for pages they
do not own.
* On other shards, this caused a relation size cache miss and a DbDir
and RelDir lookup before dropping the `ClearVmBits`. With many
relations, this could cause significant CPU overhead.
This is not believed to be a correctness problem, but this will be
verified in #9914.
Resolves #9855.
# Changes
Route `ClearVmBits` metadata records only to the shards responsible for
the VM pages.
Verification of the current VM handling and cleanup of incomplete VM
pages on shard 0 (and potentially elsewhere) is left as follow-up work.
## Problem
close https://github.com/neondatabase/neon/issues/9859
## Summary of changes
Ensure that the deletion queue gets fully flushed (i.e., the deletion
lists get applied) during a graceful shutdown.
It is still possible that an incomplete shutdown would leave deletion
lists behind and cause a race upon the next startup, but we assume this
is unlikely to happen, and even if it happened, the pageserver should
already be in a tainted state and the tenant should be moved to a new
location with a new generation number.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
* Promote two logs from mpsc send errors to error level. The channels
are unbounded and there shouldn't be errors.
* Fix one multiline log from anyhow::Error. Use Debug instead of
Display.
## Problem
When ingesting implicit `ClearVmBits` operations, we silently drop the
writes if the relation or page is unknown. There are implicit
assumptions around VM pages wrt. explicit/implicit updates, sharding,
and relation sizes, which can possibly drop writes incorrectly. Adding a
few metrics will allow us to investigate further and tighten up the
logic.
Touches #9855.
## Summary of changes
Add a `pageserver_wal_ingest_clear_vm_bits_unknown` metric to record
dropped `ClearVmBits` writes.
Also add comments clarifying the behavior of relation sizes on non-zero
shards.
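A minimal sketch of the new counter using the plain `prometheus` crate (the
pageserver has its own metrics wrappers; only the metric name below comes from
this PR):
```
use once_cell::sync::Lazy;
use prometheus::{register_int_counter, IntCounter};

static CLEAR_VM_BITS_UNKNOWN: Lazy<IntCounter> = Lazy::new(|| {
    register_int_counter!(
        "pageserver_wal_ingest_clear_vm_bits_unknown",
        "ClearVmBits writes dropped due to an unknown VM relation or page"
    )
    .expect("failed to register metric")
});

fn on_dropped_clear_vm_bits() {
    // Called wherever the ingest path silently drops the write today.
    CLEAR_VM_BITS_UNKNOWN.inc();
}
```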
The valid layer assumption is a necessary condition for a layer map to be
valid. It's a stronger check imposed by gc-compaction than the actual
valid layer map definition. Actually, the system can work as long as
there are no overlapping layers in the layer map. Therefore, we degrade
that check into a warning.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We don't have any observability for the relation size cache. We have
seen cache misses cause significant performance impact with high
relation counts.
Touches #9855.
## Summary of changes
Adds the following metrics:
* `pageserver_relsize_cache_entries`
* `pageserver_relsize_cache_hits`
* `pageserver_relsize_cache_misses`
* `pageserver_relsize_cache_misses_old`
## Problem
https://github.com/neondatabase/neon/pull/9746 lifted decoding and
interpretation of WAL to the safekeeper.
This reduced the ingested amount on the pageservers by around 10x for a
tenant with 8 shards, but doubled
the ingested amount for single sharded tenants.
Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which
doesn't support schema evolution.
Technically the schema can be evolved, but it's very cumbersome.
## Summary of changes
This patch set addresses both problems by adding protobuf support for
the interpreted wal records and adding compression support. Compressed
protobuf reduced the ingested amount by 100x on the 32 shards
`test_sharded_ingest` case (compared to non-interpreted proto). For the
1 shard case the reduction is 5x.
Sister change to `rust-postgres` is
[here](https://github.com/neondatabase/rust-postgres/pull/33).
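As a rough, self-contained sketch of the "protobuf + compression" idea
(illustrative only; the real message schema lives in the repo, and the field
layout below is made up, using the `prost` and `zstd` crates):
```
use prost::Message;

// Illustrative stand-in for the interpreted-record batch; not the real schema.
#[derive(Clone, PartialEq, prost::Message)]
struct InterpretedWalRecords {
    #[prost(bytes = "vec", repeated, tag = "1")]
    records: Vec<Vec<u8>>,
    #[prost(uint64, tag = "2")]
    next_record_lsn: u64,
}

fn encode_compressed(batch: &InterpretedWalRecords) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(batch.encoded_len());
    batch
        .encode(&mut buf)
        .expect("encoding into a Vec cannot run out of space");
    // zstd gives the bulk of the size reduction on top of the compact encoding.
    zstd::stream::encode_all(buf.as_slice(), 3)
}
```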
## Links
Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
## Problem
The `pre-merge-checks` workflow relies on the build-tools image.
If changes to the `build-tools` image have been merged into the main
branch since the last CI run for a PR (with other changes to the
`build-tools`), the image will be rebuilt during the merge queue run.
Otherwise, cached images are used.
Rebuilding the image adds approximately 10 minutes on x86-64 and 20
minutes on arm64 to the process.
## Summary of changes
- parametrise `build-build-tools-image` job with arch and Debian version
- Run `pre-merge-checks` only on Debian 12 x86-64 image
## Problem
The ingest benchmark tests project migration to Neon, involving these steps:
- COPY relation data
- create indexes
- create constraints
Previously we used only 4 copy jobs, 4 create index jobs and 7
maintenance workers. After increasing effective_io_concurrency on
compute we see that we can sustain more parallelism in the ingest bench
## Summary of changes
Increase copy jobs to 8, create index jobs to 8 and maintenance workers
to 16
## Problem
The RequestContext::span shouldn't live for the entire postgres
connection, only the handshake.
## Summary of changes
* Slight refactor to the RequestContext to discard the span upon
handshake completion.
* Make sure the temporary future for the handshake is dropped (not bound
to a variable)
* Runs our nightly fmt script
Before, we hardcoded the pg_version to 140000, while the code expected
version numbers like 14. Now we use an enum, and code from
`extension_server.rs` to auto-detect the correct version. The enum helps
when we add support for a new version: enums ensure that compilation fails
if one forgets to add the version to one of the `match` locations.
cc https://github.com/neondatabase/neon/pull/9218
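A minimal sketch of why the enum helps (illustrative names, not the actual
type in the repo): adding a new variant makes every exhaustive `match` fail to
compile until it is handled.
```
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PgMajorVersion {
    V14,
    V15,
    V16,
    V17,
}

impl PgMajorVersion {
    // Numeric form used where a plain major version number is needed.
    fn as_number(self) -> u32 {
        // If e.g. `V18` is added above, this match stops compiling until
        // the new variant is handled here as well.
        match self {
            PgMajorVersion::V14 => 14,
            PgMajorVersion::V15 => 15,
            PgMajorVersion::V16 => 16,
            PgMajorVersion::V17 => 17,
        }
    }
}
```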
## Problem
Any errors from these async blocks are unconditionally logged at error
level
even though we already handle such errors based on context.
## Summary of changes
* Log raw errors from creating and executing cplane requests at debug
level.
* Inline macro calls to retain the correct callsite.
## Problem
The vast majority of the error/warn logs from cplane are about time or
data transfer quotas exceeded or endpoint-not-found errors and not
operational errors in proxy or cplane.
## Summary of changes
* Demote cplane error replies to info level.
* Raise other errors from warn back to error.
## Problem
For any given tenant shard, pageservers receive all of the tenant's WAL
from the safekeeper.
This soft-blocks us from using larger shard counts due to bandwidth
concerns and CPU overhead of filtering
out the records.
## Summary of changes
This PR lifts the decoding and interpretation of WAL from the pageserver
into the safekeeper.
A customised PG replication protocol is used where instead of sending
raw WAL, the safekeeper sends
filtered, interpreted records. The receiver drives the protocol
selection, so, on the pageserver side, usage
of the new protocol is gated by a new pageserver config:
`wal_receiver_protocol`.
More granularly the changes are:
1. Optionally inject the protocol and shard identity into the arguments
used for starting replication
2. On the safekeeper side, implement a new wal sending primitive which
decodes and interprets records
before sending them over
3. On the pageserver side, implement the ingestion of this new
replication message type. It's very similar
to what we already have for raw wal (minus decoding and interpreting).
## Notes
* This PR currently uses my [branch of
rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support)
which includes the deserialization logic for the new replication message
type. PR for that is open
[here](https://github.com/neondatabase/rust-postgres/pull/32).
* This PR contains changes for both pageservers and safekeepers. It's
safe to merge because the new protocol is disabled by default on the
pageserver side. We can gradually start enabling it in subsequent
releases.
* CI tests are running on https://github.com/neondatabase/neon/pull/9747
## Links
Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
## Problem
Prefetch is disabled on macOS because `posix_fadvise` is not available.
But Neon prefetch does not use this function, and for testing on macOS it
is very convenient to have prefetch available.
## Summary of changes
Define `USE_PREFETCH` in Makefile.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We have a couple of CI workflows that still run on Debian Bullseye, and
the default Debian version in images is Bullseye as well (we explicitly
set building on Bookworm)
## Summary of changes
- Run `pgbench-pgvector` on Bookworm (fix a couple of packages)
- Run `trigger_bench_on_ec2_machine_in_eu_central_1` on Bookworm
- Change default `DEBIAN_VERSION` in Dockerfiles to Bookworm
- Make `pinned` docker tag an alias to `pinned-bookworm`
## Problem
close https://github.com/neondatabase/neon/issues/9761
The test assumed that no new L0 layers are flushed throughout the
process, which is not true.
## Summary of changes
Fix the test case `test_compaction_l0_memory` by flushing in-memory
layers before compaction.
Signed-off-by: Alex Chi Z <chi@neon.tech>
* The futures-util crate we use was yanked. Bump it and its siblings to
new patch release.
https://github.com/rust-lang/futures-rs/releases/tag/0.3.31
* cargo-deny: Drop an unused license.
* cargo-deny: Don't warn about duplicate crate. Duplicate crates are
unavoidable and the noise just hides real warnings.
## Problem
LFC is not enabled by default in tests, but it is enabled in production.
This increases the risk of errors in the production environment, which
were not found during the routine workflow.
However, enabling LFC for all the tests may overload the disk on our
servers and increase the number of failures.
So, we try enabling LFC in one case to evaluate the possible risk.
## Summary of changes
A new environment variable, `USE_LFC`, is introduced. If it is set to true,
LFC is enabled by default in all the tests.
In our workflow, we enable LFC for PG17, release, x86-64, and disable it
for all other combinations.
---------
Co-authored-by: Alexey Masterov <alexeymasterov@neon.tech>
Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Stas Kelvic <stas@neon.tech>
# Context
This PR contains PoC-level changes for a product feature that allows
onboarding large databases into Neon without going through the regular
data path.
# Changes
This internal RFC provides all the context
* https://github.com/neondatabase/cloud/pull/19799
In the language of the RFC, this PR covers
* the Importer code (`fast_import`)
* all the Pageserver changes (mgmt API changes, flow implementation,
etc)
* a basic test for the Pageserver changes
# Reviewing
As acknowledged in the RFC, the code added in this PR is not ready for
general availability.
Also, the **architecture is not to be discussed in this PR**, but in the
RFC and associated Slack channel instead.
Reviewers of this PR should take that into consideration.
The quality bar to apply during review depends on what area of the code
is being reviewed:
* Importer code (`fast_import`): practically anything goes
* Core flow (`flow.rs`):
* Malicious input data must be expected and the existing threat models
apply.
* The code only needs to be safe to execute on *dedicated* Pageserver
instances:
* This means in particular that tenants *on other* Pageserver instances
must not be affected negatively wrt data confidentiality, integrity or
availability.
* Other code: the usual quality bar
* Pay special attention to correct use of gate guards, timeline
cancellation in all places during shutdown & migration, etc.
* Consider the broader system impact; if you find potentially
problematic interactions with Storage features that were not covered in
the RFC, bring that up during the review.
I recommend submitting three separate reviews, for the three high-level
areas with different quality bars.
# References
(Internal-only)
* refs https://github.com/neondatabase/cloud/issues/17507
* refs https://github.com/neondatabase/company_projects/issues/293
* refs https://github.com/neondatabase/company_projects/issues/309
* refs https://github.com/neondatabase/cloud/issues/20646
---------
Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
to keep it consistent with existing compute metrics.
A flux-fleet change is not needed, because it doesn't have any filter by
metric name for compute metrics.
## Problem
Follow up of https://github.com/neondatabase/neon/pull/9682, that patch
didn't fully address the problem: what if shutdown fails due to whatever
reason and then we reattach the tenant? Then we will still remove the
future layer. The underlying problem is that the fix for #5878 gets
voided because of the generation optimizations.
Of course, we also need to ensure that delete happens after uploads, but
note that we only schedule deletes when there are no ongoing upload
tasks, so that's fine.
## Summary of changes
* Add a test case to reproduce the behavior (by changing the original
test case to attach the same generation).
* If layer upload happens after the deletion, drain the deletion queue
before uploading.
* If blocked_deletion is enabled, directly remove it from the
blocked_deletion queue.
* Local fs backend fix to avoid race between deletion and preload.
* test_emergency_mode does not need to wait for uploads (and it's
generally not possible to wait for uploads).
* ~~Optimize deletion executor to skip validation if there are no files
to delete.~~ this doesn't work
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Follow up to https://github.com/neondatabase/neon/pull/9682, hopefully
we can detect some issues or assure ourselves that this is ready for
production.
## Summary of changes
* Add a compaction-detach-ancestor smoke test.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The HTTP router allowlists matched both on the path and the query
string. This meant that only `/profile/cpu` would be allowed without
auth, while `/profile/cpu?format=svg` would require auth.
Follows #9764.
## Summary of changes
* Match allowlists on URI path, rather than the entire URI.
* Fix the allowlist for Safekeeper to use `/profile/cpu` rather than the
old `/pprof/profile`.
* Just use a constant slice for the allowlist; it's only a handful of
items, and these handlers are not on hot paths.
## Problem
close https://github.com/neondatabase/neon/issues/9836
Looking at Azure SDK, the only related issue I can find is
https://github.com/azure/azure-sdk-for-rust/issues/1549. Azure uses
reqwest as the backend, so I assume there's some underlying magic
unknown to us that might have caused the hang in #9836.
The observation is:
* We didn't get an explicit out of resource HTTP error from Azure.
* The connection simply gets stuck and times out.
* But when we retry after we reach the timeout, it succeeds.
This issue is hard to identify -- maybe something went wrong on the ABS
side, or something on our side. But we know that a retry will
usually succeed if we give up on the stuck connection.
Therefore, I propose the fix that we preempt stuck HTTP operations and
actively retry. This would mitigate the problem, while in the long run,
we need to keep an eye on ABS usage and see if we can fully resolve this
problem.
The reasoning behind this timeout mechanism: we use a much smaller timeout
than before to preempt, while it is possible that a normal listing
operation would take a longer time than the initial timeout if it
contains a lot of keys. Therefore, after we terminate the connection, we
should double the timeout, so that such requests would eventually
succeed.
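A rough sketch of this retry-with-growing-timeout idea (illustrative; not the
actual `remote_storage` code, and the starting timeout is a made-up value):
```
use std::future::Future;
use std::time::Duration;
use anyhow::{anyhow, Result};

async fn list_with_preemption<T, Fut>(
    mut attempt: impl FnMut() -> Fut,
    max_retries: u32,
) -> Result<T>
where
    Fut: Future<Output = Result<T>>,
{
    let mut timeout = Duration::from_secs(30); // start small
    for retry in 0..=max_retries {
        match tokio::time::timeout(timeout, attempt()).await {
            // The attempt finished (successfully or with a real error).
            Ok(result) => return result,
            // The attempt looks stuck: give up on this connection and retry,
            // doubling the timeout so a genuinely long listing can still finish.
            Err(_elapsed) if retry < max_retries => timeout *= 2,
            Err(_elapsed) => break,
        }
    }
    Err(anyhow!("listing still stuck after {max_retries} retries"))
}
```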
## Summary of changes
* Use exponential growth for ABS list timeout.
* Rather than using a fixed timeout, use a timeout that starts small and
grows
* Rather than exposing timeouts to the list_streaming caller as soon as
we see them, only do so after we have retried a few times
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Along with the migration to Python 3.11, I replaced `C(str, Enum)` with
`C(StrEnum)`; one such example is the `PgVersion` enum.
It required more changes in `PgVersion` itself (before, it accepted both
`str` and `int`, and after it, it supports only `str`), which caused the
`test_bulk_insert` test to fail.
## Summary of changes
- `test_bulk_insert`: explicitly cast pg_version from `timeline_detail`
to str
I found the rightward drift of the `renew_jwks` function hard to review.
This PR splits out some major logic and uses early returns to make the
happy path more linear.
## Problem
We use a pretty old version of `mypy` 1.3 (released 1.5 years ago), it
produces false positives for `typing.Self`.
## Summary of changes
- Bump `mypy` from 1.3 to 1.13
- Fix new warnings and errors
- Use `typing.Self` whenever we `return self`
Without adding a newline, we can end up with a conf line that looks like
the following:
dynamic_shared_memory_type = mmap# Managed by compute_ctl: begin
This leads to Postgres logging:
LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied
Signed-off-by: Tristan Partin <tristan@neon.tech>
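A minimal sketch of the fix (illustrative; not the actual compute_ctl code):
make sure the existing config ends with a newline before appending the
managed section.
```
fn append_managed_section(mut existing: String, managed_settings: &str) -> String {
    // Without this, the marker would be glued onto the last user setting,
    // producing lines like `dynamic_shared_memory_type = mmap# Managed by ...`.
    if !existing.is_empty() && !existing.ends_with('\n') {
        existing.push('\n');
    }
    existing.push_str("# Managed by compute_ctl: begin\n");
    existing.push_str(managed_settings);
    existing.push_str("# Managed by compute_ctl: end\n");
    existing
}
```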
The notifications need to be sent whenever the waiters heap changes, per
the comment in `update_status`. But if 'advance' is called when there
are no waiters, or the new LSN is lower than the waiters so that no one
needs to be woken up, there's no need to send notifications. This saves
some CPU cycles in the common case that there are no waiters.
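A minimal sketch of the condition (illustrative; not the actual waiter types):
the notification is only produced when the heap actually changes.
```
use std::cmp::Reverse;
use std::collections::BinaryHeap;

struct WaitQueue {
    // Min-heap of LSNs that waiters are blocked on.
    waiters: BinaryHeap<Reverse<u64>>,
}

impl WaitQueue {
    /// Returns true if any waiter was completed, i.e. a notification is needed.
    fn advance(&mut self, new_lsn: u64) -> bool {
        let mut woke_any = false;
        while let Some(&Reverse(lsn)) = self.waiters.peek() {
            if lsn > new_lsn {
                break;
            }
            self.waiters.pop(); // in the real code: complete this waiter
            woke_any = true;
        }
        // With no waiters, or all waiters above new_lsn, this is a no-op and
        // the (comparatively expensive) notification can be skipped.
        woke_any
    }
}
```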
## Problem
In https://github.com/neondatabase/neon/issues/9754 and the flakiness of
`test_readonly_node_gc`, we saw that although our logic for controlling
GC was sound, the validation of getpage requests was not, because it
could not consider LSN leases when requests arrived shortly after
restart.
Closes https://github.com/neondatabase/neon/issues/9754
## Summary of changes
This is the "Option 3" discussed verbally -- rather than holding back gc
cutoff, we waive the usual validation of request LSN if we are still
waiting for leases to be sent after startup
- When validating LSN in `wait_or_get_last_lsn`, skip the validation
relative to GC cutoff if the timeline is still in its LSN lease grace
period
- Re-enable test_readonly_node_gc
Calling unwrap on the encoder is a little overzealous. One of the errors
that can be returned by the encode function in particular is the
non-existence of metrics for a metric family, so we should preemptively
filter instances like that out. I believe this panic was caused by a
race condition between the prometheus collector and the
compute collecting the installed extensions metric for the first time.
The HTTP server is spawned on a separate thread before we even start
bringing up Postgres.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We don't have a convenient way to gather CPU profiles from a running
binary, e.g. during production incidents or end-to-end benchmarks, nor
during microbenchmarks (particularly on macOS).
We would also like to have continuous profiling in production, likely
using [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/).
We may choose to use either eBPF profiles or pprof profiles for this
(pending testing and discussion with SREs), but pprof profiles appear
useful regardless for the reasons listed above. See
https://github.com/neondatabase/cloud/issues/14888.
This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.
Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
## Summary of changes
Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns
it. Defaults to a 5-second pprof Protobuf profile for use with e.g.
`pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query
parameters:
* `format`: output format (`pprof` or `svg`)
* `frequency`: sampling frequency in microseconds (default 100)
* `seconds`: number of seconds to profile (default 5)
Also integrates pprof profiles into Criterion benchmarks, such that
flamegraph reports can be taken with `cargo bench ... --profile-duration
<seconds>`. Output under `target/criterion/*/profile/flamegraph.svg`.
Example profiles:
* pprof profile (use [`pprof`](https://github.com/google/pprof)):
[profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz)
* Web interface: `pprof -http :6060 profile.pb.gz`
* Interactive flamegraph:
[profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)
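A minimal sketch of what the handler does under the hood with the `pprof`
crate (illustrative; assumes the crate's `flamegraph` feature and omits the
HTTP plumbing and the protobuf output path):
```
use std::time::Duration;
use anyhow::Result;

async fn profile_cpu_svg(seconds: u64, frequency: i32) -> Result<Vec<u8>> {
    // Start sampling the whole process.
    let guard = pprof::ProfilerGuardBuilder::default()
        .frequency(frequency) // sampling frequency
        .build()?;
    // Keep sampling for the requested duration.
    tokio::time::sleep(Duration::from_secs(seconds)).await;
    // Render the collected samples as an SVG flamegraph.
    let report = guard.report().build()?;
    let mut svg = Vec::new();
    report.flamegraph(&mut svg)?;
    Ok(svg)
}
```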
Fixes the masking for the CancelKeyData display format. Due to negative
i32 cast to u64, the top-bits all had `0xffffffff` prefix. On the
bitwise-or that followed, these took priority.
This PR also compresses 3 logs during sql-over-http into 1 log with
durations as label fields, as prior discussed.
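A minimal sketch of the sign-extension pitfall behind the masking fix
(illustrative values, not the proxy code):
```
fn main() {
    let backend_pid: i32 = 0x1234;
    let secret_key: i32 = -0x5678; // keys with the high bit set show up as negative i32

    // Buggy: `secret_key as u64` sign-extends, so its upper 32 bits are all
    // ones and the bitwise-or clobbers the pid half of the displayed value.
    let buggy = ((backend_pid as u64) << 32) | (secret_key as u64);
    assert_eq!(buggy >> 32, 0xffff_ffff);

    // Fixed: truncate to 32 bits (cast through u32) before widening.
    let fixed = ((backend_pid as u32 as u64) << 32) | (secret_key as u32 as u64);
    assert_eq!(fixed >> 32, 0x1234);
}
```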
## Problem
On Debian 12 (Bookworm), Python 3.11 is the latest available version.
## Summary of changes
- Update Python to 3.11 in build-tools
- Fix ruff check / format
- Fix mypy
- Use `StrEnum` instead of pair `str`, `Enum`
- Update docs
## Problem
I made a mistake when merging Postgres PRs.
## Summary of changes
Restore consistency of the submodule references.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We saw a scale test failure when one shard went
secondary->attached->secondary in a short period of time -- the metrics
for the shard failed a validation assertion that is meant to ensure the
size metric matches the sum of layer sizes in the SecondaryDetail
struct.
This appears to be due to two SecondaryTenants being alive at the same
time -- the first one was shut down but still had its contributions to
the metrics.
Closes: https://github.com/neondatabase/neon/issues/9628
## Summary of changes
- Refactor code for validating metrics and call it in shutdown as well
as during downloads
- Move code for dropping per-tenant secondary metrics from drop() into
shutdown(), so that once shutdown() completes it is definitely safe to
instantiate another SecondaryTenant for the same tenant.
Before, `OpenTelemetry` errors were printed to stdout/stderr directly,
causing one of the few log lines without a timestamp, like:
```
OpenTelemetry trace error occurred. error sending request for url (http://localhost:4318/v1/traces)
```
Now, we print:
```
2024-11-21T02:24:20.511160Z INFO OpenTelemetry error: error sending request for url (http://localhost:4318/v1/traces)
```
I found this while investigating #9731.
## Problem
```
curl -H "Neon-Connection-String: postgresql://neondb_owner:PASSWORD@ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/neondb?sslmode=require" https://ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/sql -d '{"query":"SELECT 1","params":[]}'
```
For such a query, I also need to send `params`. Do I really need it?
## Summary of changes
I've marked `params` as optional
Adds support to the `find_garbage` command to restrict itself to a
partial tenant ID prefix, say `a`, and then it only traverses tenants
with IDs starting with `a`. One can now pass the `--tenant-id-prefix`
parameter.
That way, one can shard the `find_garbage` command and make it run in
parallel.
The PR also changes how `remote_storage` first removes trailing
`/`s, only to then add them back in the listing function. It turns out that
this isn't necessary, and it prevents the prefix functionality from
working. S3 doesn't do this either.
## Problem
We were hitting this assertion in debug mode tests sometimes.
This case was being hit when the parent shard has no resident layers.
For instance, this is the case on split retry where the previous attempt
shut-down the parent and deleted local state for it. If the logical size
calculation does not download some layers before we get to the
hardlinking, then the assertion is hit.
## Summary of Changes
Remove the assertion. It's fine for the ancestor to not have any
resident layers at the time of the split.
Closes https://github.com/neondatabase/neon/issues/9412
Follow up to #9803
See https://github.com/neondatabase/cloud/issues/14378
In collaboration with @cloneable and @awarus, we sifted through logs and
simply demoted some logs to debug. This is not at all finished and there
are more logs to review, but we ran out of time in the session we
organised. In any slightly more nuanced cases, we didn't touch the log,
instead leaving a TODO comment.
I've also slightly refactored the sql-over-http body read/length reject
code. I can split that into a separate PR. It just felt natural after I
switched to `read_body_with_limit` as we discussed during the meet.
## Problem
Long ago, in #5299 the tenant states for migration were added, but
respected only in a coarse-grained way: when hinted not to do deletions,
tenants would just skip all GC and compaction.
Skipping compaction is not necessary for AttachedMulti, as we will soon
become the primary attached location, and it is not a waste of resources
to proceed with compaction. Instead, per the RFC
(https://github.com/neondatabase/neon/pull/5029/files), deletions should
be queued up in this state, and executed later when we switch to
AttachedSingle.
Avoiding compaction in AttachedMulti can have an operational impact if a
tenant is under significant write load, as a long-running migration can
result in a large accumulation of delta layers with commensurate impact
on read latency.
Closes: https://github.com/neondatabase/neon/issues/5396
## Summary of changes
- Add a 'config' part to RemoteTimelineClient so that it can be aware of
the mode of the tenant it belongs to, and wire this through for
construction + updates
- Add a special buffer for delayed deletions, and when in AttachedMulti
route deletions here instead of into the main remote client queue. This
is drained when transitioning to AttachedSingle. If the tenant is
detached or our process dies before then, then these objects are leaked.
- As a quality of life improvement, also use the remote timeline
client's knowledge of the tenant state to avoid submitting remote
consistent LSN updates for validation when in AttachedStale (as we know
these will fail)
## Problem
SLRU blocks, which can add up to several gigabytes, are currently
ingested by all shards, multiplying their capacity cost by the shard
count and slowing down ingest. We do this because all shards need the
SLRU pages to do timestamp->LSN lookup for GC.
Related: https://github.com/neondatabase/neon/issues/7512
## Summary of changes
- On non-zero shards, learn the GC offset from shard 0's index instead
of calculating it.
- Add a test `test_sharding_gc` that exercises this
- Do GC in test_pg_regress as a general smoke test that GC functions run
(e.g. this would fail if we were using SLRUs we didn't have)
In this PR we are still ingesting SLRUs everywhere, but not using them
any more. Part 2 PR (https://github.com/neondatabase/neon/pull/9786)
makes the change to not store them at all.
## Problem
This test uses a gratuitous number of pageservers (16). This works fine
when there are plenty of system resources, but causes issues on test
runners that have limited resources and run many tests concurrently.
Related: https://github.com/neondatabase/neon/issues/9802
## Summary of changes
- Split from 2 shards to 4, instead of 4 to 8
- Don't give every shard a separate pageserver, let two locations share
each pageserver.
Net result is 4 pageservers instead of 16
## Problem
It is called context/ctx everywhere and the Monitoring suffix needlessly
confuses with proper monitoring code.
## Summary of changes
* Rename RequestMonitoring to RequestContext
* Rename RequestMonitoringInner to RequestContextInner
## Problem
I've noticed that we have 2 flaky tests which failed with error:
```
re.error: missing ), unterminated subpattern at position 21
```
- `test_timeline_archival_chaos` — has been already fixed
- `test_sharded_tad_interleaved_after_partial_success` — I didn't manage
to find the incorrect regex
[Internal link](https://neonprod.grafana.net/goto/yfmVHV7NR?orgId=1)
## Summary of changes
- Wrap `re.match` in `try..except` block and print incorrect regex
## Problem
We want to keep `#on-call-staging-stream` channel close to the prod one
and redirect notifications from failing benchmarks to another channel
for investigation.
## Summary of changes
- Send notifications regarding failures in `benchmarking` job to
`#on-call-staging-stream`
- Send notifications regarding failures in `periodic_pagebench` job to
`#on-call-staging-stream`
## Problem
Two recently observed log errors indicate safekeeper tasks for a
timeline running after that timeline's deletion has started.
- https://github.com/neondatabase/neon/issues/8972
- https://github.com/neondatabase/neon/issues/8974
These code paths do not have a mechanism that coordinates task shutdown
with the overall shutdown of the timeline.
## Summary of changes
- Add a `Gate` to `Timeline`
- Take the gate as part of resident timeline guard: any code that holds
a guard over a timeline staying resident should also hold a guard over
the timeline's total lifetime.
- Take the gate from the wal removal task
- Respect Timeline::cancel in WAL send/recv code, so that we do not
block shutdown indefinitely.
- Add a test that deletes timelines with open pageserver+compute
connections, to check these get torn down as expected.
There is some risk to introducing gates: if there is code holding a gate
which does not properly respect a cancellation token, it can cause
shutdown hangs. The risk of this for safekeepers is lower in practice
than it is for other services, because in a healthy timeline deletion,
the compute is shutdown first, then the timeline is deleted on the
pageserver, and finally it is deleted on the safekeepers -- that makes
it much less likely that some protocol handler will still be running.
Closes: #8972
Closes: #8974
See https://github.com/neondatabase/cloud/issues/14378
In collaboration with @cloneable and @awarus, we sifted through logs and
simply demoted some logs to debug. This is not at all finished and there
are more logs to review, but we ran out of time in the session we
organised. In any slightly more nuanced cases, we didn't touch the log,
instead leaving a TODO comment.
In timeline preloading, we also do a preload for offloaded timelines.
This includes the download of `index-part.json`. Ultimately, such a
download is wasteful, therefore avoid it. Same goes for the remote
client, we just discard it immediately thereafter.
Part of #8088
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
Before, compute_ctl didn't have a good registry for what command would
run when, depending exclusively on sync code to apply changes. When
users have many databases/roles to manage, this step can take a
substantial amount of time, breaking assumptions about low (re)start
times in other systems.
This commit reduces the time compute_ctl takes to restart when changes
must be applied, by making all commands more or less blind writes, and
applying these commands in an asynchronous context, only waiting for
completion once we know the commands have all been sent.
Additionally, this reduces time spent by batching per-database
operations where previously we would create a new SQL connection for
every user-database operation we planned to execute.
## Problem
We have a bunch of duplicated code for automated releases. There will be
even more, once we have `release-compute` branch
(https://github.com/neondatabase/neon/pull/9637).
Another issue with the current `release` workflow is that it creates a
PR from the main as is. If we create 2 different releases from the
same commit, GitHub could mix up results from different PRs.
## Summary of changes
- Create a reusable workflow for releases
- Create an empty commit to differentiate releases
part of https://github.com/neondatabase/neon/issues/9114, we want to be
able to run partial gc-compaction in tests. In the future, we can also
expand this functionality to legacy compaction, so that we can trigger
compaction for a specific key range.
## Summary of changes
* Support passing compaction key range through pageserver routes.
* Refactor input parameters of compact related function to take the new
`CompactOptions`.
* Add tests for partial compaction. Note that the test may or may not
trigger compaction based on GC horizon. We need to improve the test case
to ensure things always get below the gc_horizon and the gc-compaction
can be triggered.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
close https://github.com/neondatabase/neon/issues/9730
The test case tests whether anything goes wrong during a pageserver restart
*while timeline creation is not yet complete*. Therefore, a "queue is
stopped" error is normal in this case, except that it should be categorized
as a shutdown error instead of a real error.
## Summary of changes
* More comments for the test case.
* Queue stopped error will now be forwarded as
CreateTimelineError::ShuttingDown.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
The community decided to make a new off-schedule release due to ABI
breakage in last week's release. We're not affected by the ABI
breakage because we rebuild all extensions in our docker images, but
let's stay up-to-date. There were a few other fixes in the release
too.
Currently, local_proxy will write an error log if it doesn't find the
config file. This is expected for startup, so it's just noise. It is an
error if we do receive an explicit SIGHUP though.
I've also demoted the build info logs to be debug level. We don't need
them in the compute image since we have other ways to determine what
code is running.
Lastly, I've demoted SIGHUP signal handling from warn to info, since
it's not really a warning event.
See https://github.com/neondatabase/cloud/issues/10880 for more details
## Problem
The first version of the ingest benchmark had some parsing and reporting
logic in a shell script inside the GitHub workflow.
It is better to move that logic into a python testcase so that we can
also run it locally.
## Summary of changes
- Create new python testcase
- invoke pgcopydb inside python test case
- move the following logic into python testcase
- determine backpressure
- invoke pgcopydb and report its progress
- parse pgcopydb log and extract metrics
- insert metrics into perf test database
- add an additional column to the perf test database that can receive the
endpoint ID used for the pgcopydb run, so that it is available in the
grafana dashboard when retrieving other metrics for an endpoint
## Example run
https://github.com/neondatabase/neon/actions/runs/11860622170/job/33056264386
Earlier work (#7547) has made the scrubber internally generic, but one
could only configure it to use S3 storage.
This is the final piece to make the scrubber (mostly; snapshotting still
requires S3) configurable via GenericRemoteStorage.
I.e. you can now set an env var like:
```
REMOTE_STORAGE_CONFIG='remote_storage = { bucket_name = "neon-dev-safekeeper-us-east-2d", bucket_region = "us-east-2" }'
```
and the scrubber will read it instead.
There is a potential data corruption issue, not one I've encountered,
but it's still not hard to hit with some correct-looking code given our
current architecture. It has to do with the timeline's in-memory object
being stored via reference-counted `Arc`s, and the removal of `retain_lsn`
entries when the last `Arc` reference is dropped.
The corruption steps are as follows:
1. timeline gets offloaded. timeline object A doesn't get dropped
though, because some long-running task accesses it
2. the same timeline gets unoffloaded again. timeline object B gets
created for it, timeline object A still referenced. both point to the
same timeline.
3. the task keeping the reference to timeline object A exits. destructor
for object A runs, removing `retain_lsn` in the timeline's parent.
4. the timeline's parent runs gc without the `retain_lsn` of the still
extant child timeline, leading to data corruption.
In general we are susceptible each time when we recreate a `Timeline`
object in the same process, which happens both during a timeline
offload/unoffload cycle, as well as during an ancestor detach operation.
The solution this PR implements is to make the destructor for a timeline
as well as an offloaded timeline remove at most one `retain_lsn`.
PR #9760 has added a log line to print the refcounts at timeline
offload, but this only detects one of the places where we do such a
recycle operation. Plus it doesn't prevent the actual issue.
I doubt that this occurs in practice; it is more of a defense-in-depth
measure. Usually I'd assume that the timeline gets dropped immediately in
step 1, as there are no background tasks referencing it after its shutdown.
But one never knows, and reducing the stakes of step 1 actually occurring,
from potential data corruption down to a waste of CPU time,
is a really good idea.
Part of #8088
## Problem
We don't take advantage of queue depth generated by the compute
on the pageserver. We can process getpage requests more efficiently
by batching them.
## Summary of changes
Batch up incoming getpage requests that arrive within a configurable
time window (`server_side_batch_timeout`).
Then process the entire batch via one `get_vectored` timeline operation.
By default, no merging takes place.
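A rough sketch of the time-window batching (illustrative; not the actual
page_service code, and the parameter names are made up):
```
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{timeout_at, Instant};

// Collect one batch: wait for the first request, then keep accepting requests
// until the configured window elapses or the batch is full.
async fn collect_batch<T>(
    rx: &mut mpsc::Receiver<T>,
    batch_timeout: Duration,
    max_batch_size: usize,
) -> Vec<T> {
    let mut batch = Vec::new();
    let Some(first) = rx.recv().await else {
        return batch; // channel closed
    };
    batch.push(first);
    let deadline = Instant::now() + batch_timeout;
    while batch.len() < max_batch_size {
        match timeout_at(deadline, rx.recv()).await {
            Ok(Some(req)) => batch.push(req),
            _ => break, // window elapsed or channel closed
        }
    }
    // The caller then serves the whole batch with a single get_vectored call.
    batch
}
```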
## Testing
* **Functional**: https://github.com/neondatabase/neon/pull/9792
* **Performance**: will be done in staging/pre-prod
# Refs
* https://github.com/neondatabase/neon/issues/9377
* https://github.com/neondatabase/neon/issues/9376
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
Tests that are marked with `run_only_on_default_postgres` do not run on
debug builds on CI because we run debug builds only for the latest
Postgres version (which is 17)
## Summary of changes
- Bump `PgVersion.DEFAULT` to `v17`
- Skip `test_timeline_archival_chaos` in debug builds
## Problem
We call `check-build-tools-image` twice for each workflow whenever we
use it, along with `build-build-tools-image`, once as a workflow itself,
and the second time from `build-build-tools-image`. This is not
necessary.
## Summary of changes
- Inline `check-build-tools-image` into `build-build-tools-image`
- Remove separate `check-build-tools-image` workflow
## Problem
Due to #9471, the scale test occasionally gets 404s while trying to
modify the config of a timeline that belongs to a tenant being migrated.
We rarely see this narrow race in the field, but the test is quite good
at reproducing it.
## Summary of changes
- Ignore 404 errors in this test.
## Problem
See https://github.com/neondatabase/neon/issues/7750
test_wal_restore.sh copies a file to the current working directory, which
can cause interference between test_wal_restore.py tests spawned with
different configurations.
## Summary of changes
Copy file to $DATA_DIR
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
The Postgres version was updated. The patch has to be updated
accordingly.
## Summary of changes
The patch of the regression test was updated.
## Problem
`no_sync` initially just skipped syncfs on startup (#9677). I'm also
interested in flaky tests that time out during pageserver shutdown while
flushing L0s, so to eliminate disk throughput as a source of issues
there, extend `no_sync` to cover those syncs as well.
## Summary of changes
- Drive-by change for test timeouts: add a couple more ::info logs
during pageserver startup so it's obvious which part got stuck.
- Add a SyncMode enum to configure VirtualFile and respect it in
sync_all and sync_data functions
- During pageserver startup, set SyncMode according to `no_sync`
## Problem
When processing pipelined `AppendRequest`s, we explicitly flush the WAL
every second and return an `AppendResponse`. However, the WAL is also
implicitly flushed on segment bounds, but this does not result in an
`AppendResponse`. Because of this, concurrent transactions may take up
to 1 second to commit and writes may take up to 1 second before being
sent to the pageserver.
## Summary of changes
Advance `flush_lsn` when a WAL segment is closed and flushed, and emit
an `AppendResponse`. To accommodate this, track the `flush_lsn` in
addition to the `flush_record_lsn`.
## Problem
It turns out that `WalStreamDecoder::poll_decode` returns the start LSN
of the next record and not the end LSN of the current record. They are
not always equal. For example, they're not equal when the record in
question is an XLOG SWITCH record.
## Summary of changes
Rename things to reflect that.
In ea32f1d0a3, Matthias added a feature to
our extension to expose more granular wait events. However, due to the
typo, those wait events were never registered, so we used the more
generic wait events instead.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We want to serialize interpreted records to send them over the wire from
safekeeper to pageserver.
## Summary of changes
Make `InterpretedWalRecord` ser/de. This is a temporary change to get
the bulk of the lift merged in
https://github.com/neondatabase/neon/pull/9746. For going to prod, we
don't want to use bincode since we can't evolve the schema.
Questions on serialization will be tackled separately.
PR #9308 has modified tenant activation code to take offloaded child
timelines into account for populating the list of `retain_lsn` values.
However, there are more places than just tenant activation where one
needs to update the `retain_lsn`s.
This PR fixes some bugs of the current code that could lead to
corruption in the worst case:
1. Deleting of an offloaded timeline would not get its `retain_lsn`
purged from its parent. With the patch we now do it, but as the parent
can be offloaded as well, the situation is a bit trickier than for
non-offloaded timelines which can just keep a pointer to their parent.
Here we can't keep a pointer because the parent might get offloaded,
then unoffloaded again, creating a dangling pointer situation. Keeping a
pointer to the *tenant* is not good either, because we might drop the
offloaded timeline in a context where an `offloaded_timelines` lock is
already held: so we don't want to acquire a lock in the drop code of
OffloadedTimeline.
2. Unoffloading a timeline would not get its `retain_lsn` values
populated, leading to it maybe garbage collecting values that its
children might need. We now call `initialize_gc_info` on the parent.
3. Offloading of a timeline would not get its `retain_lsn` values
registered as offloaded at the parent. So if we drop the `Timeline`
object, and its registration is removed, the parent would not have any
of the child's `retain_lsn`s around. Also, before, the `Timeline` object
would delete anything related to its timeline ID, now it only deletes
`retain_lsn`s that have `MaybeOffloaded::No` set.
Incorporates Chi's reproducer from #9753. cc
https://github.com/neondatabase/cloud/issues/20199
The `test_timeline_retain_lsn` test is extended:
1. it gains a new dimension, duplicating each mode, to either have the
"main" branch be the direct parent of the timeline we archive, or the
"test_archived_parent" branch intermediary, creating a three timeline
structure. This doesn't test anything fixed by this PR in particular,
just explores the vast space of possible configurations a little bit
more.
2. it gains two new modes, `offload-parent`, which tests the second
point, and `offload-no-restart` which tests the third point.
It's easy to verify the test actually is "sharp" by removing one of the
respective `self.initialize_gc_info()`, `gc_info.insert_child()` or
`ancestor_children.push()`.
Part of #8088
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
#9764, which adds profiling support to Safekeeper, pulls in the
dependency [`inferno`](https://crates.io/crates/inferno) via
[`pprof-rs`](https://crates.io/crates/pprof). This is licenced under the
[Common Development and Distribution License
1.0](https://spdx.org/licenses/CDDL-1.0.html), which is not allowed by
`cargo-deny`.
This patch allows the CDDL-1.0 license. It is a derivative of the
Mozilla Public License, which we already allow, but avoids some issues
around European copyright law that the MPL had. As such, I don't expect
this to be problematic.
## Problem
We didn't have a neat way to prevent auto-splitting of tenants. This
could be useful during incidents or for testing.
Closes https://github.com/neondatabase/neon/issues/9332
## Summary of changes
- Filter splitting candidates by scheduling policy
Analysis of the LR benchmarking tests indicates that over the duration of
test_subscriber_lag, a leftover 'slotter' replication slot can lead to
retained WAL growing on the publisher. This replication slot is not used
by any subscriber. The only purpose of the slot is to generate snapshot
files for the purpose of test_snap_files.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
After investigation, we think that to make `test_readonly_node_gc` less
flaky, we need a proper fix (likely involving persisting part of
the lease state). See https://github.com/neondatabase/neon/issues/9754
for details.
## Summary of changes
- skip the test until proper fix.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
https://github.com/neondatabase/neon/issues/9240
## Summary of changes
Correctly truncate the VM page instead of just replacing it with a zero page.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
We are pinning our fork of rust-postgres to a commit hash, and that
prevents us from making
further changes to it. The latest commit in rust-postgres requires
https://github.com/neondatabase/neon/pull/8747,
but that seems to have gone stale. I reverted rust-postgres `neon`
branch to the pinned commit in
https://github.com/neondatabase/rust-postgres/pull/31.
## Summary of changes
Switch back to using the `neon` branch of the rust-postgres fork.
If WAL truncation fails in the middle it might leave some data on disk
above the write/flush LSN. In theory, concatenated with previous records
it might form bogus WAL (though very unlikely in practice because CRC
would protect from that). To protect from that, set
pending_wal_truncation flag: means before any WAL writes truncation must
be retried until it succeeds. We already did that in case of safekeeper
restart, now extend this mechanism for failures without restart. Also,
importantly, reset LSNs in the beginning of the operation, not in the
end, because once on disk deletion starts previous pointers are wrong.
All this most likely hasn't created any problems in practice because
CRC protects from the consequences.
Tests for this are hard; simulation infrastructure might be useful here
in the future, but not yet.
## Problem
Historically, if a control component passed a pageserver "generation: 1"
this could be a quick way to corrupt a tenant by loading a historic
index.
Follows https://github.com/neondatabase/neon/pull/9383
Closes #6951
## Summary of changes
- Introduce a Fatal variant to DownloadError, to enable index downloads
to signal when they have encountered a scary enough situation that we
shouldn't proceed to load the tenant.
- Handle this variant by putting the tenant into a broken state (no
matter which timeline within the tenant reported it)
- Add a test for this case
In the event that this behavior fires when we don't want it to, we have
ways to intervene:
- "Touch" an affected index to update its mtime (download+upload S3
object)
- If this behavior is triggered, it indicates we're attaching in some
old generation, so we should be able to fix that by manually bumping
generation numbers in the storage controller database (this should never
happen, but it's an option if it does)
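A minimal sketch of the Fatal-variant handling described above (illustrative
types; not the actual remote_storage/tenant code):
```
#[derive(Debug)]
enum DownloadError {
    NotFound,
    Other(String),
    /// The download worked, but its contents indicate we must not proceed to
    /// load the tenant (e.g. the index looks like it came from a newer
    /// generation than the one we were attached with).
    Fatal(String),
}

#[derive(Debug)]
enum TenantState {
    Active,
    Retrying,
    Broken { reason: String },
}

fn after_index_download(result: Result<(), DownloadError>) -> TenantState {
    match result {
        Ok(()) => TenantState::Active,
        // Any timeline reporting a fatal download breaks the whole tenant.
        Err(DownloadError::Fatal(reason)) => TenantState::Broken { reason },
        // Ordinary failures stay retryable.
        Err(_) => TenantState::Retrying,
    }
}
```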
and add a /metrics endpoint to compute_ctl to expose such metrics.
Metric format example for extension pg_rag, with versions 1.2.3 and 1.4.2
installed in 3 and 1 databases respectively:
neon_extensions_installed{extension="pg_rag", version="1.2.3"} = 3
neon_extensions_installed{extension="pg_rag", version="1.4.2"} = 1
------
infra part: https://github.com/neondatabase/flux-fleet/pull/251
---------
Co-authored-by: Tristan Partin <tristan@neon.tech>
## Problem
Followup to https://github.com/neondatabase/neon/pull/9677 which enables
`no_sync` in tests. This can be merged once the next release has
happened.
## Summary of changes
- Always run pageserver with `no_sync = true` in tests.
psycopg2 has the following warning related to autocommit:
> By default, any query execution, including a simple SELECT will start
> a transaction: for long-running programs, if no further action is
> taken, the session will remain “idle in transaction”, an undesirable
> condition for several reasons (locks are held by the session, tables
> bloat…). For long lived scripts, either ensure to terminate a
> transaction as soon as possible or use an autocommit connection.
In the 2.9 release notes, psycopg2 also made the following change:
> `with connection` starts a transaction on autocommit transactions too
Some of these connections are indeed long-lived, so we were retaining
tons of WAL on the endpoints because we had a transaction pinned in the
past.
Link: https://www.psycopg.org/docs/news.html#what-s-new-in-psycopg-2-9
Link: https://github.com/psycopg/psycopg2/issues/941
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
`TimelinePersistentState::empty()`, used for tests and benchmarks, had a
hardcoded 16 MB WAL segment size. This caused confusion when attempting
to change the global segment size.
## Summary of changes
Inherit from `WAL_SEGMENT_SIZE` in `TimelinePersistentState::empty()`.
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.
Part-of: https://github.com/neondatabase/neon/issues/8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
The original value that we get is measured in microseconds. It comes
from a calculation using Postgres' GetCurrentTimestamp(), which is
implemented in terms of gettimeofday(2).
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
WAL segment fsyncs significantly affect WAL ingestion throughput.
`durable_rename()` is used when initializing every 16 MB segment, and
issues 3 fsyncs of which 1 was unnecessary.
## Summary of changes
Remove an fsync in `durable_rename` which is unnecessary with Linux and
ext4 (which we currently use). This improves WAL ingestion throughput by
up to 23% with large appends on my MacBook.
I had an impression that gc-compaction didn't test the case where the
first record of the key history is will_init because there are some
code paths that would panic in this case. Luckily that got fixed in
https://github.com/neondatabase/neon/pull/9026, so we can now implement
such tests.
Part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
* Randomly changed some images into will_init neon wal records
* Split `test_simple_bottom_most_compaction_deltas` into two test cases,
one of them has the bottom layer as delta layer with will_init flags,
while the other is the original one with image layers.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The control file is flushed on the WAL ingest path when the commit LSN
advances by one segment, to bound the amount of recovery work in case of
a crash. This involves 3 additional fsyncs, which can have a significant
impact on WAL ingest throughput. This is to some extent mitigated by
`AppendResponse` not being emitted on segment bound flushes, since this
will prevent commit LSN advancement, which will be addressed separately.
## Summary of changes
Don't flush the control file on the WAL ingest path at all. Instead,
leave that responsibility to the timeline manager, but ask it to flush
eagerly if the control file lags the in-memory commit LSN by more than
one segment. This should not cause more than `REFRESH_INTERVAL` (300 ms)
additional latency before flushing the control file, which is
negligible.
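A minimal sketch of the eager-flush condition handed to the timeline manager
(illustrative; not the actual safekeeper code):
```
const WAL_SEGMENT_SIZE: u64 = 16 * 1024 * 1024;

// Ask the background timeline manager to flush the control file eagerly once
// the persisted commit LSN lags the in-memory one by more than one segment.
fn control_file_needs_eager_flush(mem_commit_lsn: u64, persisted_commit_lsn: u64) -> bool {
    mem_commit_lsn.saturating_sub(persisted_commit_lsn) > WAL_SEGMENT_SIZE
}
```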
Add a test that ensures the `retain_lsn` functionality works. Right now,
there is not a single test that is broken if offloaded or non-offloaded
timelines don't get registered at their parents, preventing gc from
discarding the ancestor_lsns of the children. This PR fills that gap.
The test has four modes:
* `offloaded`: offload the child timeline, run compaction on the parent
timeline, unarchive the child timeline, then try reading from it.
hopefully the data is still there.
* `offloaded-corrupted`: offload the child timeline, then corrupt the
manifest in a way that makes the pageserver believe the timeline was
flattened. This is the closest we can get to pretending the `retain_lsn`
mechanism doesn't exist for offloaded timelines, so we can avoid adding
endpoints to the pageserver that do this manually for tests. The test
then checks that indeed data is corrupted and the endpoint can't be
started. That way we know that the test is actually working, and
actually tests the `retain_lsn` mechanism, instead of say the lsn lease
mechanism, or one of the many other mechanisms that impede gc.
* `archived`: the child timeline gets archived but doesn't get
offloaded. this currently matches the `None` case but we might have
refactors in the future that make archived timelines sufficiently
different from non-archived ones.
* `None`: the child timeline doesn't even get archived. this tests that
normal timelines participate in `retain_lsn`. I've made them locally not
participate in `retain_lsn` (via commenting out the respective
`ancestor_children.push` statement in tenant.rs) and ran the testsuite,
and not a single test failed. So this test is first of its kind.
Part of #8088.
This exporter logs an ERROR if a file called `postgres_exporter.yml` is
not located in its current working directory. We can silence it by
adding an empty config file and pointing the exporter at it.
Signed-off-by: Tristan Partin <tristan@neon.tech>
We found that exporting GH Workflow Runs in batches is more efficient due
to
- better utilisation of the GitHub API
- GH runner usage being rounded up to minutes, so even when an ad-hoc export
is done in 5-10 seconds, we are billed for one minute of usage
So now we introduce batch exporting, with version v0.2.x of the github
workflow stats exporter.
How it's expected to work now:
- every 15 minutes we query for the workflow runs created in the last 2
hours
- to avoid missing workflows that ran for more than 2 hours, every night
(00:25) we will query workflows created in the past 24 hours and export them
as well
- should we query for even longer periods?
- let's see how it works with the current schedule
- for longer periods like days or weeks, it may require adjusting the
logic and concurrency of querying data, so for now let's use the simpler
version
The final patch for partial compaction, part of
https://github.com/neondatabase/neon/issues/9114, close
https://github.com/neondatabase/neon/issues/8921 (note that we didn't
implement parallel compaction or compaction scheduler for partial
compaction -- currently this needs to be scheduled by using a Python
script to split the keyspace, and in the future, automatically split
based on the key partitioning when the pageserver wants to trigger a
gc-compaction)
## Summary of changes
* Update the layer selection algorithm to use the same selection as full
compaction (everything intersecting with or below the gc horizon)
* Update the layer selection algorithm to also generate a list of delta
layers that need to be rewritten
* Add the logic to rewrite delta layers and add them back to the layer
map
* Update test case to do partial compaction on deltas
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Removes some unnecessary initdb arguments, and fixes Neon for MacOS
since it doesn't seem to ship a C.UTF-8 locale.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
Running `pytest.skip(...)` in a test body instead of marking the test
with `@pytest.mark.skipif(...)` causes all fixtures to be initialised,
which is unnecessary if the test is going to be skipped anyway.
Also, according to comments, some tests are unnecessarily skipped (e.g.
`test_layer_bloating` on Postgres 17, or `test_idle_reconnections`
entirely) or unnecessarily run (e.g.
`test_parse_project_git_version_output_positive` on more than one
configuration).
## Summary of changes
- Move `skip_on_postgres` / `xfail_on_postgres` /
`run_only_on_default_postgres` decorators to `fixture.utils`
- Add new `skip_in_debug_build` and `skip_on_ci` decorators
- Replace `pytest.skip(...)` calls with decorators where possible
## Problem
We have no specific benchmark testing the migration of a PostgreSQL
project with existing data into Neon.
Typical steps of such a project migration are
- schema creation in the neon project
- initial COPY of relations
- creation of indexes and constraints
- vacuum analyze
## Summary of changes
Add a periodic benchmark running at 9 AM UTC every day.
In each run:
- copy a 200 GiB project that has realistic schema, data, tables,
indexes and constraints from another project into
- a new Neon project (7 CU fixed)
- an existing tenant (but a new branch and new database) that already
has 4 TiB of data
- use the pgcopydb tool to automate all steps and parallelize COPY and
index creation
- parse pgcopydb output and report performance metrics in Neon
performance test database
## Logs
This benchmark has been tested first manually and then as part of
benchmarking.yml workflow, example run see
https://github.com/neondatabase/neon/actions/runs/11757679870
## Problem
Once we enable the merge queue for the `main` branch, it won't be
possible to adjust the commit message right after pressing the "Squash
and merge" button and the PR title + description will be used as is.
To avoid extra noise in the commits in the `main` with the checklist
leftovers, I propose removing the checklist from the PR template and
keeping only the Problem / Summary of changes.
## Summary of changes
- Remove the checklist from the PR template
## Problem
We don't have a metric capturing the latency of segment initialization.
This can be significant due to fsyncs.
## Summary of changes
Add an `initialize_segment` variant of
`safekeeper_wal_storage_operation_seconds`.
Perf benchmarks produce a lot of layers.
## Summary of changes
Bump the threshold and ignore the warning.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
GitHub API can return error 500, and it fails jobs that use
`actions/github-script` action.
## Summary of changes
- Add `retry: 500` to all `actions/github-script` usage
## Problem
We wish to stop using admin tokens in the infra repo, but step down
requests use the admin token.
## Summary of Changes
Introduce a new "ControllerPeer" scope and use it for step-down requests.
## Problem
The Merge queue doesn't work because it expects certain jobs, which we
don't have in the `pre-merge-checks` workflow.
But it turns out we can just create jobs/checks with the same names in
any workflow that we run.
## Summary of changes
- Add `conclusion` jobs
- Create `neon-cloud-e2e` status check
- Add a bunch of `if`s to handle cases with no relevant changes found
and prepare the workflow to run rust checks in the future
- List the workflow in `report-workflow-stats` to collect stats about it
It is possible that, at the point we shut down the timeline, there are
still layer files we did not upload.
## Summary of changes
* If the queue is not empty, avoid offloading.
* Shutdown the timeline gracefully using the flush mode to
ensure all local files are uploaded before deleting the timeline
directory.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
We saw pageserver OOMs
https://github.com/neondatabase/cloud/issues/19715 for tenants doing
large writes. Add log lines around in-memory layers to hopefully collect
some info during my on-call shift next week.
## Summary of changes
* Estimate in-memory size of an in-mem layer.
* Print the number of frozen layers if too many layers have accumulated
in memory.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Right now, our environments create databases with the C locale, which is
really unfortunate for users who have data stored in other languages
that they want to analyze. For instance, show_trgm on Hebrew text
currently doesn't work in staging or production.
I don't envision this being the final solution. I think this is just a
way to set a known value so the pageserver doesn't use its parent
environment. The final solution to me is exposing initdb parameters to
users in the console. Then they could use a different locale or encoding
if they so chose.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
To prevent breaking main after the Python 3.11 PR gets merged, we need
to enable the merge queue and run the `check-codestyle-python` job on
it.
## Summary of changes
- Move `check-codestyle-python` to a reusable workflow
- Run this workflow on `merge_group` event
In INC-317
https://neondb.slack.com/archives/C033RQ5SPDH/p1730815677932209, we saw
an interesting series of operations that would remove valid layer files
existing in the layer map.
* Timeline A starts compaction and generates an image layer Z but does
not upload it yet.
* Timeline B/C starts ancestor detaching (which should not affect
timeline A)
* The tenant gets restarted as part of the ancestor detaching process,
without increasing the generation number.
* Timeline A reloads, discovering the layer Z is a future layer, and
schedules a **deletion into the deletion queue**. This means that the
file will be deleted any time in the future.
* Timeline A starts compaction and generates layer Z again, adding it to
the layer map. Note that because we don't bump generation number during
ancestor detach, it has the same filename + generation number as the
original Z.
* Timeline A deletes layer Z from s3 + disk, and now we have a dangling
reference in the layer map, blocking all
compaction/logical_size_calculation processes.
## Summary of changes
* We wait until all layers are uploaded before shutting down tenants in
`Flush` mode.
* Ancestor detach restarts now use this mode.
* Ancestor detach also waits for remote queue completion before starting
the detaching process.
* The patch ensures that we don't have any future image layer (or
something similar) after restart, but it does not fix the underlying
problem around generation numbers.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Replication slots are now persisted using the AUX files mechanism and
included in the basebackup when a replica is launched.
These slots are not actually used on the replica but hold back WAL,
which may cause local disk space exhaustion.
## Summary of changes
Add `--replica` parameter to basebackup request and do not include
replication slot state files in basebackup for replica.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
In test environments, the `syncfs` that the pageserver does on startup
can take a long time, as other tests running concurrently might have
many gigabytes of dirty pages.
## Summary of changes
- Add a `no_sync` option to the pageserver's config.
- Skip syncfs on startup if this is set
- A subsequent PR (https://github.com/neondatabase/neon/pull/9678) will
enable this by default in tests. We need to wait until after the next
release to avoid breaking compat tests, which would fail if we set
no_sync & use an old pageserver binary.
Q: Why is this a different mechanism than the safekeeper, which has a
`--no-sync` CLI flag?
A: Because the way we manage pageservers in neon_local depends on the
pageserver.toml containing the full configuration, whereas safekeepers
have a config file which is neon-local-specific and can drive a CLI
flag.
Q: Why is the option no_sync rather than sync?
A: For boolean configs with a dangerous value, it's preferable to make
"false" the safe option, so that any downstream future config tooling
that might have a "booleans are false by default" behavior (e.g. golang
structs) is safe by default.
Q: Why only skip the syncfs, and not all fsyncs?
A: Skipping all fsyncs would require more code changes, and the most
acute problem isn't fsyncs themselves (these just slow down a running
test), it's the syncfs (which makes a pageserver startup slow as a
result of _other_ tests)
The set-docker-config-dir action was replicated over multiple
repositories. The replica of this action was removed from this
repository, and we now use the version from
github.com/neondatabase/dev-actions instead.
Compiling with a nightly Rust compiler, I'm getting a lot of errors
like this:
error: `if let` assigns a shorter lifetime since Edition 2024
--> proxy/src/auth/backend/jwt.rs:226:16
|
226 | if let Some(permit) = self.try_acquire_permit() {
| ^^^^^^^^^^^^^^^^^^^-------------------------
| |
| this value has a significant drop implementation which may observe a
major change in drop order and requires your discretion
|
= warning: this changes meaning in Rust 2024
= note: for more information, see issue #124085
<https://github.com/rust-lang/rust/issues/124085>
help: the value is now dropped here in Edition 2024
--> proxy/src/auth/backend/jwt.rs:241:13
|
241 | } else {
| ^
note: the lint level is defined here
--> proxy/src/lib.rs:8:5
|
8 | rust_2024_compatibility
| ^^^^^^^^^^^^^^^^^^^^^^^
= note: `#[deny(if_let_rescope)]` implied by
`#[deny(rust_2024_compatibility)]`
and this:
error: these values and local bindings have significant drop
implementation that will have a different drop order from that of
Edition 2021
--> proxy/src/auth/backend/jwt.rs:376:18
|
369 | let client = Client::builder()
| ------ these values have significant drop implementation and will
observe changes in drop order under Edition 2024
...
376 | map: DashMap::default(),
| ^^^^^^^^^^^^^^^^^^
|
= warning: this changes meaning in Rust 2024
= note: for more information, see issue #123739
<https://github.com/rust-lang/rust/issues/123739>
= note: `#[deny(tail_expr_drop_order)]` implied by
`#[deny(rust_2024_compatibility)]`
They are caused by the `rust_2024_compatibility` lint option.
When we actually switch to the 2024 edition, it makes sense to go
through all of these and check that the drop order changes don't break
anything, but in the meantime there's no easy way to avoid these
errors. Disable the lint to allow compiling with nightly again.
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
The build-tools image does not provide a superuser, so additional
packages cannot be installed during GitHub benchmarking workflows and
instead need to be added to the image.
## Summary of changes
Install pgcopydb version 0.17-1 or higher into the build-tools bookworm
image:
```bash
docker run -it neondatabase/build-tools:<tag>-bookworm-arm64 /bin/bash
...
nonroot@c23c6f4901ce:~$ LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version;
13:58:19.768 8 INFO Running pgcopydb version 0.17 from "/pgcopydb/bin/pgcopydb"
pgcopydb version 0.17
compiled with PostgreSQL 16.4 (Debian 16.4-1.pgdg120+2) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
compatible with Postgres 11, 12, 13, 14, 15, and 16
```
Example usage of that image in a workflow
https://github.com/neondatabase/neon/actions/runs/11725718371/job/32662681172#step:7:14
While setting up some tests, I noticed that we didn't support Keycloak.
It makes use of encryption JWKs as well as signature ones. Our current
jwks crate does not support parsing encryption keys, which caused the
entire JWK set to fail to parse. Switching to lazy parsing fixes this.
Also while setting up tests, I couldn't use a localhost JWKS server, as
we require HTTPS and we were using webpki, so it was impossible to add a
custom CA. Enabling native roots addresses this.
I saw that some of our current e2e tests against our custom JWKS in S3
were taking a while to fetch. I've added a timeout + retries to address this.
ref https://github.com/neondatabase/neon/issues/9441
The metrics from the LR publisher testing project show ~300KB of aux
key deltas per 256MB file. Therefore, I think we can do compaction more
aggressively, as these deltas are small and compaction can reduce layer
download latency. We also have a read path perf fix in
https://github.com/neondatabase/neon/pull/9631, but I'd still combine
the read path fix with the reduction of the compaction threshold.
## Summary of changes
* reduce metadata compaction threshold
* use num of L1 delta layers as an indicator for metadata compaction
* dump more logs
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
When we create a new segment, we zero it out in order to avoid changing
the length and fsyncing metadata on every write. However, we zeroed it
out by writing 8 KB zero-pages, and Tokio file writes have non-trivial
overhead.
## Summary of changes
Zero out the segment using
[`File::set_len()`](https://docs.rs/tokio/latest/i686-unknown-linux-gnu/tokio/fs/struct.File.html#method.set_len)
instead. This will typically (depending on the filesystem) just write a
sparse file and omit the 16 MB of data entirely. This improves WAL
append throughput for large messages by over 400% with fsync disabled,
and 100% with fsync enabled.
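As a rough sketch of the approach (paths and sizes are illustrative, not the actual safekeeper code), the zeroed segment can be created with a single `set_len` call:

```rust
use std::path::Path;
use tokio::fs::OpenOptions;

/// Pre-allocate a WAL segment by setting its length instead of writing
/// 8 KB zero pages in a loop. On most filesystems this creates a sparse
/// file, so the zeroes are never physically written.
async fn create_zeroed_segment(path: &Path, segment_size: u64) -> std::io::Result<()> {
    let file = OpenOptions::new()
        .create(true)
        .write(true)
        .open(path)
        .await?;
    file.set_len(segment_size).await?; // extend to the full segment size with zeroes
    file.sync_all().await?;            // persist the length change
    Ok(())
}
```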
## Problem
We don't have any benchmarks for Safekeeper WAL ingestion.
## Summary of changes
Add some basic benchmarks for WAL ingestion, specifically for
`SafeKeeper::process_msg()` (single append) and `WalAcceptor` (pipelined
batch ingestion). Also add some baseline file write benchmarks.
Fix direct reading from WAL buffers.
The pointer wasn't advanced, which resulted in sending corrupted WAL if
part of the read used WAL buffers and part was read from the file. Also
move it to neon_walreader so that e.g. replication can also make use of
it.
ref https://github.com/neondatabase/cloud/issues/19567
## Problem
Benchmarks need more control over the WAL generated by `WalGenerator`.
In particular, they need to vary the size of logical messages.
## Summary of changes
* Make `WalGenerator` generic over `RecordGenerator`, which constructs
WAL records.
* Add `LogicalMessageGenerator` which emits logical messages, with a
configurable payload.
* Minor tweaks and code reorganization.
There are no changes to the core logic or emitted WAL.
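A minimal sketch of the resulting shape (trait and field names are assumptions based on the description above, not the actual definitions):

```rust
/// Produces the payload of individual WAL records.
trait RecordGenerator {
    fn next_record(&mut self) -> Vec<u8>;
}

/// Emits logical messages (noops) with a configurable payload.
struct LogicalMessageGenerator {
    payload: Vec<u8>,
}

impl RecordGenerator for LogicalMessageGenerator {
    fn next_record(&mut self) -> Vec<u8> {
        self.payload.clone()
    }
}

/// Iterator over generated WAL records; benchmarks pick the record generator.
struct WalGenerator<R: RecordGenerator> {
    record_gen: R,
}

impl<R: RecordGenerator> Iterator for WalGenerator<R> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        Some(self.record_gen.next_record())
    }
}
```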
We need to use the shard associated with the layer file, not the shard
associated with our current tenant shard ID.
Due to shard splits, the shard IDs can refer to older files.
close https://github.com/neondatabase/neon/issues/9667
Fixes #9518.
## Problem
After removing the assertion `layers_removed == 0` in #9506, we could
miss breakage if we solely rely on the successful execution of the
`SELECT` query to check if lease is properly protecting layers. Details
listed in #9518.
Also, in integration tests, we sometimes run into a race condition
where a getpage request comes in before the lease gets renewed (item 2
of #8817), even if compute_ctl sends a lease renewal as soon as it sees
a `/configure` API call that updates the `pageserver_connstr`. In this
case, we would observe a getpage request error stating that we `tried to
request a page version that was garbage collected` (as seen in the
[Allure
Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8613/11550393107/index.html#suites/3ccffb1d100105b98aed3dc19b717917/d1a1ba47bc180493)).
## Summary of changes
- Use the layer map dump to verify that the lease protects what it
claims: record all historical layers that have `start_lsn <= lease_lsn`
before and after running timeline gc. This is the same check as
ad79f42460/pageserver/src/tenant/timeline.rs (L5025-L5027)
The set recorded after GC should contain every layer in the set recorded
before GC.
- Wait until the log contains another successful lease request before
running the `SELECT` query after GC. We argued in #8817 that the bad
request can only exist within a short period after migration/restart,
and our test shows that as long as a lease renewal is done before the
first getpage request is sent after reconfiguration, we will not have a
bad request.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
In https://github.com/neondatabase/neon/issues/9441, the tenant has a
lot of aux keys spread across multiple aux files. The perf tool shows
that a significant amount of time is spent on remove_overlapping_keys.
For sparse keyspaces, we don't need to report missing key errors anyway,
and it's very likely that we will need to read all layers intersecting
with the key range. Therefore, this patch adds a new fast path for
sparse keyspace reads in which we do not track `unmapped_keyspace` in a
fine-grained way; we only modify it when we find an image layer.
In debug mode, it took ~5min to read the aux files for a dump of the
tenant, and now it takes only 8s, a 60x speedup.
## Summary of changes
* Do not add sparse keys into `keys_done` so that remove_overlapping
does nothing.
* Allow `ValueReconstructSituation::Complete` to be updated again in
`ValuesReconstructState::update_key` for sparse keyspaces.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
I think I meant to make these changes over 6 months ago. Alas, better
late than never.
1. `should_reject` doesn't eagerly intern the endpoint string.
2. The rate limiter uses a std Mutex instead of a tokio Mutex.
3. I recently introduced a `-local-proxy` endpoint suffix; I forgot to
add this to normalize.
4. A random but small cleanup, making `ControlPlaneEvent` deserialize
directly to the interned strings.
## Problem
Pinning a tenant by setting Pause scheduling policy doesn't work because
drain/fill code moves the tenant around during deploys.
Closes: https://github.com/neondatabase/neon/issues/9612
## Summary of changes
- In drain, only move a tenant if it is in Active or Essential mode
- In fill, only move a tenant if it is in Active mode.
The asymmetry is a bit annoying, but it faithfully respects the purposes
of the modes: Essential is meant to endeavor to keep the tenant
available, which means it needs to be drained but doesn't need to be
migrated during fills.
## Problem
https://github.com/neondatabase/neon/pull/9524 split the decoding and
interpretation step from ingestion.
The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`.
Before this patch set that struct contained a list of `Value` instances.
We wish to lift the decoding and interpretation step to the safekeeper,
but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values.
## Summary of changes
Main goal here is to make `InterpretedWalRecord` hold a raw buffer which
contains pre-serialized Values.
For this we do:
1. Add a `SerializedValueBatch` type (a rough sketch of its shape
follows this list). This is `inmemory_layer::SerializedBatch` with some
extra functionality for extending the batch, observing values for shard
0, and tests.
2. Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch`
3. Make `DatadirModification` maintain a `SerializedValueBatch`.
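A rough sketch of the batch's shape, with field names and types as assumptions rather than the actual definition:

```rust
/// Values are kept pre-serialized in one contiguous buffer, with lightweight
/// per-value metadata pointing into it, instead of a list of materialized
/// `Value` instances.
struct SerializedValueBatch {
    raw: Vec<u8>,             // pre-serialized value bytes, back to back
    metadata: Vec<ValueMeta>, // one entry per value in `raw`
}

struct ValueMeta {
    key: [u8; 18],     // placeholder for the real key type
    lsn: u64,          // placeholder for the real Lsn type
    batch_offset: u64, // where this value's bytes start in `raw`
    will_init: bool,   // whether the value initializes the page
}
```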
### `DatadirModification` changes
`DatadirModification` now maintains a `SerializedValueBatch` and extends
it as new WAL records come in (to avoid flushing to disk on every
record).
In turn, this cascaded into a number of modifications to
`DatadirModification`:
1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`.
2. Removal of `pending_zero_data_pages` and its cousin `on_wal_record_end`
3. Rename `pending_bytes` to `pending_metadata_bytes` since this is what it tracks now.
4. Adapting of various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`.
Removal of `pending_zero_data_pages` and the optimisation associated
with it ((1) and (2)) deserves more detail.
Previously, all zero data pages went through `pending_zero_data_pages`.
We wrote zero data pages when filling gaps caused by relation extension
(case A) and when handling special WAL records (case B). If the same WAL
record happened to contain a non-zero write for an entry in
`pending_zero_data_pages`, we skipped the zero write.
Case A: We handle this differently now. When ingesting the
`SerializedValueBatch` associated with one PG WAL record, we identify
the gaps and fill them in one go. Essentially, we move from a per-key
process (gaps were filled after each new key) to a per-record process.
Hence, the optimisation is not required anymore.
Case B: When the handling of a special record needs to zero out a key,
it just adds that to the current batch. I inspected the code, and I
don't think the optimisation kicked in here.
The overall idea of the PR is to rename a few types to make their
purpose clearer, reduce abstraction where it is not needed, and move
types to better-suited modules.
The PROXY protocol V2 offers a "command" concept with two possible
values: "Local" and "Proxy". The spec suggests that "Local" be used for
health checks. We can thus use this to silence logging for health checks
such as those from the NLB.
This additionally refactors the flow to be a bit more type-safe and
self-documenting, and to use zerocopy deserialization.
## Problem
While experimenting with `MAX_SEND_SIZE` for benchmarking, I saw stack
overflows when increasing it to 1 MB. Turns out a few buffers of this
size are stack-allocated rather than heap-allocated. Even at the default
128 KB size, that's a bit large to allocate on the stack.
## Summary of changes
Heap-allocate buffers of size `MAX_SEND_SIZE`.
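A minimal sketch of the change, with the constant and buffer usage as illustrative placeholders rather than the actual code:

```rust
const MAX_SEND_SIZE: usize = 128 * 1024;

fn fill_and_send(write: impl FnOnce(&mut [u8])) {
    // Before: the whole buffer lived on the task's stack, which overflows
    // once MAX_SEND_SIZE is raised to 1 MB.
    // let mut buf = [0u8; MAX_SEND_SIZE];

    // After: the buffer is heap-allocated; only a pointer and length remain
    // on the stack, so the size can grow safely.
    let mut buf = vec![0u8; MAX_SEND_SIZE].into_boxed_slice();
    write(&mut buf[..]);
}
```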
Unify client, EndpointConnPool and DbUserConnPool for remote and local
conn.
- Use new ClientDataEnum for additional client data.
- Add ClientInnerCommon client structure.
- Remove Client and EndpointConnPool code from local_conn_pool.rs
## Problem
These are the first issues noticed when trying to run scrubber
find-garbage on Azure:
- Azure staging contains projects with -1 set for max_project_size:
apparently the control plane treats this as a signed field.
- Scrubber code assumed that listing projects should filter to
aws-$REGION. This is no longer needed (per the comment in the code)
because we now hit region-local APIs.
This PR doesn't make it work all the way (`init_remote` still assumes
S3), but these are necessary precursors.
## Summary of changes
- Change max_project_size from unsigned to signed
- Remove region filtering in favor of simply using the right region's
API (which we already do)
Since 5f83c9290b482dc90006c400dfc68e85a17af785/#1504 we've had
duplication in construction of models::TenantConfig, where both
constructs contained the same code. This PR removes one of the two
locations to avoid the duplication.
## Problem
Tenant operations may return `409 Conflict` if the tenant is shutting
down. This status code is not retried by the control plane, causing
user-facing errors during pageserver restarts. Operations should instead
return `503 Service Unavailable`, which may be retried for idempotent
operations.
## Summary of changes
Convert
`GetActiveTenantError::WillNotBecomeActive(TenantState::Stopping)` to
`ApiError::ShuttingDown` rather than `ApiError::Conflict`. This error is
returned by `Tenant::wait_to_become_active` in most (all?)
tenant/timeline-related HTTP routes.
* Also rename the `AuthFailed` variant to `PasswordFailed`.
* Before this, all JWT errors ended up in `AuthError::AuthFailed()`,
which expects a username and also causes cache invalidation.
Problem
-------
Tests that directly call the Pageserver Management API to set tenant
config are flaky if the Pageserver is managed by Storcon because Storcon
is the source of truth and may (theoretically) reconcile a tenant at any
time.
Solution
--------
Switch all users of
`set_tenant_config`/`patch_tenant_config_client_side`
to use `env.storage_controller.pageserver_api()`.
Future Work
-----------
Prevent regressions from creeping in.
And generally clean up tenant configuration.
Maybe we can avoid the Pageserver having a default tenant config at all
and put the default into Storcon instead?
* => https://github.com/neondatabase/neon/issues/9621
Refs
----
fixes https://github.com/neondatabase/neon/issues/9522
## Problem
We don't have any observability for Safekeeper WAL receiver queues.
## Summary of changes
Adds a few WAL receiver metrics:
* `safekeeper_wal_receivers`: gauge of currently connected WAL
receivers.
* `safekeeper_wal_receiver_queue_depth`: histogram of queue depths per
receiver, sampled every 5 seconds.
* `safekeeper_wal_receiver_queue_depth_total`: gauge of total queued
messages across all receivers.
* `safekeeper_wal_receiver_queue_size_total`: gauge of total queued
message sizes across all receivers.
There are already metrics for ingested WAL volume: `written_wal_bytes`
counter per timeline, and `safekeeper_write_wal_bytes` per-request
histogram.
AWS/Azure Private Link shares extra information in the "TLV" values of
the PROXY protocol V2 header. This code doesn't act on it, but parses it
as appropriate.
It seems the ecosystem is not so keen on moving to aws-lc-rs, as its
build setup is more complicated than ring's (requiring cmake).
Eventually I expect the ecosystem to pivot to
https://github.com/ctz/graviola/tree/main/rustls-graviola as it
stabilises (it has a very simple build step and license), but for now
let's try not to have the headache of juggling two crypto libs.
I also noticed that tonic will just fail with TLS without a default
provider, so I added some defensive code for that.
## Problem
In https://github.com/neondatabase/neon/pull/9589, timeline offload code
is modified to return an explicit error type rather than propagating
anyhow::Error. One of the 'Other' cases there is I/O errors from local
timeline deletion, which shouldn't need to exist, because our policy is
not to try and continue running if the local disk gives us errors.
## Summary of changes
- Make `delete_local_timeline_directory` use `.fatal_err()` on I/O
errors
---------
Co-authored-by: Erik Grinaker <erik@neon.tech>
Adds a Python benchmark for sharded ingestion. This ingests 7 GB of WAL
(100M rows) into a Safekeeper and fans out to 10 shards running on 10
different pageservers. The ingest volume and duration are recorded.
## Problem
https://neondb.slack.com/archives/C04DGM6SMTM/p1727872045252899
See https://github.com/neondatabase/neon/issues/9240
## Summary of changes
Add `!page_is_new` check before assigning page lsn.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We may sometimes use scheduling modes like `Pause` to pin a tenant in
its current location for operational reasons. It is undesirable for the
chaos task to make any changes to such projects.
## Summary of changes
- Add a check for scheduling mode
- Add a log line when we do choose to do a chaos action for a tenant:
this will help us understand which operations originate from the chaos
task.
## Problem
We don't have any observability into full WalAcceptor queues per
timeline.
## Summary of changes
Logs a message when a WalAcceptor send has blocked for 5 seconds, and
another message when the send completes. This implies that the log
frequency is at most once every 5 seconds per timeline, so we don't need
further throttling.
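A sketch of the idea, assuming a tokio mpsc channel (names are illustrative, not the actual safekeeper code):

```rust
use std::time::Duration;
use tokio::sync::mpsc;

/// Try to enqueue a message; if the queue stays full for 5 seconds, log once,
/// then log again when the send finally completes.
async fn send_with_slow_warning<T>(tx: &mpsc::Sender<T>, msg: T, ttid: &str) {
    let permit = match tokio::time::timeout(Duration::from_secs(5), tx.reserve()).await {
        Ok(Ok(permit)) => permit,
        Ok(Err(_)) => return, // receiver dropped, nothing to do
        Err(_elapsed) => {
            eprintln!("WalAcceptor send blocked for over 5s on timeline {ttid}");
            match tx.reserve().await {
                Ok(permit) => {
                    eprintln!("WalAcceptor send unblocked on timeline {ttid}");
                    permit
                }
                Err(_) => return,
            }
        }
    };
    permit.send(msg);
}
```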
## Problem
The IAM role associated with our github action runner supports a max
token expiration which is lower than the value we tried.
## Summary of changes
Since we believe we have understood the performance regression (by
ensuring availability zone affinity of compute and pageserver), the job
should again run in under 5 hours, so we revert this change instead of
increasing the max session token expiration in the IAM role, which would
reduce our security.
## Problem
`tenant_get_shards()` does not work with a sharded tenant on 1
pageserver, as it assumes an unsharded tenant in this case. This special
case appears to have been added to handle e.g. `test_emergency_mode`,
where the storage controller is stopped. This breaks e.g. the sharded
ingest benchmark in #9591 when run with a single shard.
## Summary of changes
Correctly look up shards even with a single pageserver, but add a
special case that assumes an unsharded tenant if the storage controller
is stopped and the caller provides an explicit pageserver, in order to
accommodate `test_emergency_mode`.
## Problem
We wish for the deployment orchestrator to use infra scoped tokens, but
the storcon endpoints it uses require admin scoped tokens.
## Summary of Changes
Switch over all endpoints that are used by the deployment orchestrator
to use an infra scoped token. This causes no breakage during mixed
version scenarios because admin scoped tokens allow access to all
endpoints. The deployment orchestrator can cut over to the infra token
after this commit touches down in prod.
Once this commit is released, we should also update the test code to
use infra scoped tokens where appropriate. Currently it would fail on the
[compat tests](9761b6a64e/test_runner/regress/test_storage_controller.py (L69-L71)).
## Problem
The final part of https://github.com/neondatabase/neon/issues/9543 will
be a chaos test that creates/deletes/archives/offloads timelines while
restarting pageservers and migrating tenants. Developing that test
showed up a few places where we log errors during normal shutdown.
## Summary of changes
- UninitializedTimeline's drop should log at info severity: this is a
normal code path when some part of timeline creation encounters a
cancellation `?` path.
- When offloading and finding a `RemoteTimelineClient` in a
non-initialized state, this is not an error and should not be logged as
such.
- The `offload_timeline` function returned an anyhow error, so callers
couldn't gracefully pick out cancellation errors from real errors:
update this to have a structured error type and use it throughout.
## Problem
A clickbench regression causes clickbench to run for >9 hours, and the
AWS session token expires before the run completes.
## Summary of changes
Extend the lifetime of the session token for this job to 12 hours.
## Problem
Decoding and ingestion are still coupled in `pageserver::WalIngest`.
## Summary of changes
A new type is added to `wal_decoder::models`, InterpretedWalRecord. This
type contains everything that the pageserver requires in order to ingest
a WAL record. The highlights are `metadata_record`, which is an
optional special record type to be handled, and `blocks`, which stores
key, value pairs to be persisted to storage.
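As a rough illustration of the shape described above (placeholder types, not the actual definition):

```rust
// Placeholders standing in for the real pageserver key/value/record types.
type Key = [u8; 18];
type Value = Vec<u8>;
struct MetadataRecord; // special record kinds that need dedicated handling

/// Everything the pageserver needs in order to ingest one WAL record,
/// already decoded and interpreted.
struct InterpretedWalRecord {
    /// Optional special record that gets dedicated handling during ingestion.
    metadata_record: Option<MetadataRecord>,
    /// Key, value pairs to be persisted to storage.
    blocks: Vec<(Key, Value)>,
}
```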
This type is produced by
`wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG
wal record.
The rest of this commit separates decoding and interpretation of the PG
WAL record from its application in `WalIngest::ingest_record`.
Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
If we delete a timeline that has children, those children will have their
data corrupted. Therefore, extend the already existing safety check to
offloaded timelines as well.
Part of #8088
In https://github.com/neondatabase/neon/issues/9032, I would like to
eventually add a `generation` field to the consumption metrics cache.
The current encoding is not backward compatible, and it is hard to add
another field to the cache. Therefore, this patch refactors the format
to store "field -> value", making it easier to maintain backward/forward
compatibility with the new format.
## Summary of changes
* Add `NewRawMetric` as the new format.
* Add an upgrade path. When opening the disk cache, the codepath first
inspects the `version` field and decides how to decode.
* Refactor metrics generation code and tests.
* Add tests on upgrade / compatibility with the old format.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Constructing a remote client is no big deal. Yes, it means an extra
download from S3 but it's not that expensive. This simplifies code paths
and scenarios to test. This unifies timelines that have been recently
offloaded with timelines that have been offloaded in an earlier
invocation of the process.
Part of #8088
Disallow a request for timeline ancestor detach if either the to be
detached timeline, or any of the to be reparented timelines are
offloaded or archived.
In theory we could support timelines that are archived but not
offloaded, but archived timelines are at the risk of being offloaded, so
we treat them like offloaded timelines. As for offloaded timelines, any
code to "support" them would amount to unoffloading them, at which point
we can just demand to have the timelines be unarchived.
Part of #8088
This will tell us how much time the compute has spent throttled if
pageserver/safekeeper cannot keep up with WAL generation.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We don't have a convenient way to generate WAL records for benchmarks
and tests.
## Summary of changes
Adds a WAL generator, exposed as an iterator. It currently only
generates logical messages (noops), but will be extended to write actual
table rows later.
Some existing code for WAL generation has been replaced with this
generator, to reduce duplication.
In July of 2023, Bojan and Chi authored
92aee7e07f. Our in-production pageservers
are most definitely at a version where they all support gzipped
basebackups.
## Problem
When tenant manifest objects are written without a generation suffix,
concurrently attached pageservers may stamp on each others writes of the
manifest and cause undefined behavior.
Closes: #9543
## Summary of changes
- Use download_generation_object helper when reading manifests, to
search for the most recent generation
- Use Tenant::generation as the generation suffix when writing
manifests.
This patch contains various improvements for the pagectl tool.
## Summary of changes
* Rewrite layer name parsing: LayerName now supports all variants we use
now.
* Drop pagectl's own layer parsing function, use LayerName in the
pageserver crate.
* Support image layer dumping in the layer dump command using
ImageLayer::dump, drop the original implementation.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
See https://github.com/neondatabase/neon/pull/9458
This PR separates PS related changes in #9458 from compute_ctl changes
to enforce that PS is deployed before compute.
## Summary of changes
This PR adds handling of the `--replica` parameter of basebackup to the
pageserver.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Uploads of the tenant manifest could race between different tasks,
resulting in unexpected results in remote storage.
Closes: https://github.com/neondatabase/neon/issues/9556
## Summary of changes
- Create a central function for uploads that takes a tokio::sync::Mutex
- Store the latest upload in that Mutex, so that when there is lots of
concurrency (e.g. archive 20 timelines at once) we can coalesce their
manifest writes somewhat.
## Problem
Indices used to be the only kind of object where we had to search across
generations to find the most recent one. As of
https://github.com/neondatabase/neon/issues/9543, manifests will need
the same treatment.
## Summary of changes
- Refactor download_index_part to a generic download_generation_object
function, which will be usable for downloading manifest objects as well.
Python-based regression test setup for auth_broker. This uses an HTTP
mock for cplane as well as for the JWKS URL.
Complications:
1. We cannot just use the local_proxy binary, as that requires the
pg_session_jwt extension, which we don't have available in the current
test suite.
2. We cannot use just any old HTTP mock for local_proxy, as auth_broker
requires HTTP/2 to local_proxy.
As such, I used the h2 library to implement an echo server, copied from
the examples in the h2 docs.
In the base64 payload of an AWS Cognito JWT, I saw the following:
```
"iss":"https:\/\/cognito-idp.us-west-2.amazonaws.com\/us-west-2_redacted"
```
Issuers are supposed to be URLs, and URLs are always valid as
un-escaped JSON strings. However, `\/` is a valid escape sequence, so
what AWS is doing is technically correct... sigh...
This PR refactors the test suite and adds a new regression test for
cognito.
## Problem
The clickbench job in the benchmarking workflow has a performance
regression causing it to hit the maximum job run timeout.
Suspected root cause: the project was migrated from a single pageserver
to a storage-controller-managed project on Oct 14th. The regression has
shown up since then.
## Summary of changes
Increase the pytest timeout to 12 hours.
Increase the job timeout to 12 hours.
As pointed out in
https://github.com/neondatabase/neon/pull/9489#discussion_r1814699683,
we didn't support deletion of offloaded timelines once the timeline had
been loaded from the manifest rather than offloaded in the current
process. This was because the upload queue hadn't been initialized yet.
This PR thus initializes the timeline and shuts it down immediately.
Part of #8088
## Problem
We wish to have high level WAL decoding logic in `wal_decoder::decoder`
module.
## Summary of Changes
For this we need the `Value` and `NeonWalRecord` types accessible there, so:
1. Move `Value` and `NeonWalRecord` to `pageserver::value` and
`pageserver::record` respectively.
2. Get rid of `pageserver::repository` (follow up from (1))
3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In
theory they could live in `wal_decoder`, but it would create a circular
dependency between `wal_decoder` and `postgres_ffi`. Long term it makes
sense for those types to be PG version specific, so that will work out nicely.
4. Move higher level WAL record types (to be ingested by pageserver)
into `wal_decoder::models`
Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
Currently, all callers of `unoffload_timeline` ensure that the tenant
the unoffload operation is called on is active. We rely on it being
active as we activate the timeline below and don't want to race with the
activation code of the tenant (in the worst case, activating a timeline
twice).
Therefore, add this assertion.
Part of #8088
We will only have a C string if the specified role is a string.
Otherwise, we need to resolve references to public, current_role,
current_user, and session_user.
Fixes: https://github.com/neondatabase/cloud/issues/19323
Signed-off-by: Tristan Partin <tristan@neon.tech>
As a DBaaS provider, Neon needs to provide a stable platform for
customers to build applications upon. At the same time however, we also
need to enable customers to use the latest and greatest technology, so
they can prototype their work, and we can solicit feedback. If all
extensions are treated the same in terms of stability, it is hard to
meet that goal.
There are now two new GUCs created by the Neon extension:
- `neon.allow_unstable_extensions`: a session GUC which allows a session
to install and load unstable extensions.
- `neon.unstable_extensions`: a comma-separated list of extension names.
We can check if a CREATE EXTENSION statement is attempting to install an
unstable extension, and if so, deny the request if
`neon.allow_unstable_extensions` is not set to true.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Build the pgrag extensions (rag, rag_bge_small_en_v15, and
rag_jina_reranker_v1_tiny_en) as part of the compute node Dockerfile.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
Part of https://github.com/neondatabase/neon/issues/8623
## Summary of changes
Removed all aux-v1 config processing code. Note that we persisted it
into the index part file, so we cannot really remove the field from
index part. I also kept the config item within the tenant config, but we
will not read it any more.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The `WalAcceptor` main loop currently uses two nested loops to consume
inbound messages. This makes it hard to slot in periodic events like
metrics collection. It also duplicates the event processing code and
assumes all messages in steady state are AppendRequests (other message
types may be dropped if they follow an AppendRequest).
## Summary of changes
Refactor the `WalAcceptor` loop to be event driven.
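A minimal sketch of what an event-driven loop can look like (message type and channel are assumptions, not the actual safekeeper code):

```rust
use std::time::Duration;
use tokio::sync::mpsc;

/// Placeholder for the real acceptor message type.
enum Msg {
    Append(Vec<u8>),
    Other,
}

/// One select! loop handles every message type uniformly and leaves room
/// for periodic events such as metrics sampling, with no nested loops.
async fn acceptor_loop(mut rx: mpsc::Receiver<Msg>) {
    let mut metrics_tick = tokio::time::interval(Duration::from_secs(5));
    loop {
        tokio::select! {
            maybe_msg = rx.recv() => {
                let Some(msg) = maybe_msg else { break }; // channel closed
                match msg {
                    Msg::Append(wal) => { let _ = wal; /* process the append */ }
                    Msg::Other => { /* non-append messages are not dropped */ }
                }
            }
            _ = metrics_tick.tick() => {
                // periodic work, e.g. sample queue depth for metrics
            }
        }
    }
}
```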
## Problem
Based on a recent proxy deployment issue, we deployed another proxy
version (proxy-scram), which was not needed when deploying a specific
proxy type. We have a
[PR](https://github.com/neondatabase/infra/pull/2142) to update the
infra branch and need to update CI in this repo, which triggers proxy
deployment.
## Summary of changes
- Update proxy deployment flag
virtio-serial is much more performant than /dev/console emulation and
is therefore much more suitable for the verbose logs inside the VM. This
commit changes routing for pgbouncer logs, since we've recently noticed
it can emit large volumes of logs.
Manually tested on staging by pinning a compute image to my test
project.
Should help with https://github.com/neondatabase/cloud/issues/19072
## Problem
We haven't historically taken this API route where we would onboard a
tenant to the controller in detached state. It worked, but we didn't
have test coverage.
## Summary of changes
- Add a test that onboards a tenant to the storage controller in
Detached mode, and checks that deleting it without attaching it works as
expected.
## Problem
If something goes wrong with a live migration, we currently only have
awkward ways to interrupt that:
- Restart the storage controller
- Ask it to do some other modification/migration on the shard, which we
don't really want.
## Summary of changes
- Add a new `/cancel` control API, and storcon_cli wrapper for it, which
fires the Reconciler's cancellation token. This is just for on-call use
and we do not expect it to be used by any other services.
## Problem
When we use pull_timeline API on an evicted timeline, it gets downloaded
to serve the snapshot API request. That means that to evacuate all the
timelines from a node, the node needs enough disk space to download
partial segments from all timelines, which may not be physically the
case.
Closes: #8833
## Summary of changes
- Add a "try" variant of acquiring a residence guard, that returns None
if the timeline is offloaded
- During snapshot API handler, take a different code path if the
timeline isn't resident, where we just read the checkpoint and don't try
to read any segments.
In complement to
https://github.com/neondatabase/tokio-epoll-uring/pull/56.
## Problem
We want to make tokio-epoll-uring slots waiters queue depth observable
via Prometheus.
## Summary of changes
- Add `pageserver_tokio_epoll_uring_slots_submission_queue_depth`
metrics as a `Histogram`.
- Each thread-local tokio-epoll-uring system is given a `LocalHistogram`
to observe the metrics.
- Keep a list of `Arc<ThreadLocalMetrics>` used on-demand to flush data
to the shared histogram.
- Extend `Collector::collect` to report
`pageserver_tokio_epoll_uring_slots_submission_queue_depth`.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
This PR does two things:
1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This
prevents two unoffload tasks from racing with each other. While they
already obtain locks for `timelines` and `offloaded_timelines`, those
aren't sufficient, as we have already constructed an entire timeline at
that point. We shouldn't ever have two `Timeline` objects in the same
process at the same time.
2. Don't allow timeline creations for timelines that have been
offloaded. Obviously they already exist, so we should not allow
creation; the previous logic only looked at the timelines list.
Part of #8088
## Problem
The storage components take an entire `SafekeeperConf` during
construction, but only actually use the `no_sync` field. This makes it
hard to understand the storage inputs (which fields do they actually
care about?), and is also inconvenient for tests and benchmarks that
need to set up a lot of unnecessary boilerplate.
## Summary of changes
* Don't take the entire config, but pass in the `no_sync` field
explicitly.
* Take the timeline dir instead of `ttid` as an input, since it's the
only thing it cares about.
* Fix a couple of tests to not leak tempdirs.
* Various minor tweaks.
## Problem
The Postgres version in `TimelinePersistentState::empty()` is incorrect:
the major version should be multiplied by 10000.
## Summary of changes
Multiply the version by 10000.
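For example, following the usual Postgres numeric version convention (shown as a standalone snippet, not the actual code):

```rust
// Postgres 16 should be represented as 160000, not 16.
const PG_MAJORVERSION: u32 = 16;
const PG_VERSION_NUM: u32 = PG_MAJORVERSION * 10_000; // 160000
```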
## Problem
We have some known N^2 behaviors when it comes to large relation
counts, due to the monolithic encoding and full rewrites of RelDirectory
each time a relation is added. Ordinarily our backpressure mechanisms
give "slow but steady" performance when creating/dropping/truncating
relations. However, in the case of a transaction abort, it is possible
for a single WAL record to drop an unbounded number of relations. This
results in an unavailable compute, as when it sends one of these
records, it can stall the pageserver's ingest for many minutes, even
though the compute only sent a small amount of WAL.
Closes https://github.com/neondatabase/neon/issues/9505
## Summary of changes
- Rewrite relation-dropping code to do one read/modify/write cycle of
RelDirectory, instead of doing it separately for each relation in a loop
(see the sketch below).
- Add a test for the bug scenario encountered:
`test_tx_abort_with_many_relations`
The test has ~40s runtime on my workstation. About 1 second of that is
the part where we wait for ingest to catch up after a rollback, the rest
is the slowness of creating and truncating a large number of relations.
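A simplified sketch of the rewritten drop path (placeholder types; the real code reads and writes RelDirectory through the pageserver's key/value interface):

```rust
use std::collections::HashSet;

type RelTag = (u32, u32, u32, u8); // placeholder for the real RelTag

/// Drop a whole batch of relations with a single read/modify/write of the
/// directory, instead of one read/decode/encode/write cycle per relation.
fn drop_relations(dir: &mut HashSet<RelTag>, to_drop: &[RelTag]) {
    for rel in to_drop {
        dir.remove(rel);
    }
    // ...the modified directory is then re-encoded and written back once.
}
```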
---------
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
# Context
In the PGDATA import code
(https://github.com/neondatabase/neon/pull/9218) I add a third way to
create timelines, namely, by importing from a copy of a vanilla PGDATA
directory in object storage.
For idempotency, I'm using the PGDATA object storage location
specification, which is stored in the IndexPart for the entire lifespan
of the timeline. When loading the timeline from remote storage, that
value gets stored inside `struct Timeline` and timeline creation
compares the creation argument with that value to determine idempotency
of the request.
# Changes
This PR refactors the existing idempotency handling of Timeline
bootstrap and branching such that we simply compare the
`CreateTimelineIdempotency` struct, using the derive-generated
`PartialEq` implementation.
Also, by spelling idempotency out in the type names, I find it adds a
lot of clarity.
The pathway to idempotency via requester-provided idempotency key also
becomes very straight-forward, if we ever want to do this in the future.
# Refs
* platform context: https://github.com/neondatabase/neon/pull/9218
* product context: https://github.com/neondatabase/cloud/issues/17507
* stacks on top of https://github.com/neondatabase/neon/pull/9366
neon.c is getting crowded and the logical replication slot monitor is
a good candidate for reorganization. It is very self-contained, and
being in a separate file will make it that much easier to find.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Previously, it inserted ~150MiB of WAL while expecting page fetching to
work within 1s (wait_lsn_timeout=1s). It failed in CI in debug builds.
Instead, just directly wait for the wanted condition, i.e. that the
needed safekeepers are reported in the pageserver's "timed out waiting
for WAL" error message. Also set NEON_COMPUTE_TESTING_BASEBACKUP_RETRIES
to 1 in this test and the neighbouring one; it reduces execution time
from 2.5m to ~10s.
## Problem
`local_fs` doesn't return file sizes, which I need in PGDATA import
(#9218)
## Solution
Include file sizes in the result.
I would have liked to add a unit test, and started doing that in
* https://github.com/neondatabase/neon/pull/9510
by extending the common object storage tests
(`libs/remote_storage/tests/common/tests.rs`) to check for sizes as
well.
But it turns out that localfs is not even covered by the common object
storage tests and upon closer inspection, it seems that this area needs
more attention.
=> punt the effort into https://github.com/neondatabase/neon/pull/9510
This PR adds a pageserver mgmt API to scan a layer file for disposable
keys.
It hooks it up to the sharding compaction test, demonstrating that we're
not filtering out all disposable keys.
This is extracted from PGDATA import
(https://github.com/neondatabase/neon/pull/9218)
where I do the filtering of layer files based on `is_key_disposable`.
Fixes #9098
## Problem
`test_readonly_node_gc` is flaky. As shown in the [Allure
Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9469/11444519440/index.html#suites/3ccffb1d100105b98aed3dc19b717917/2c02073738fa2b39),
we would get an `AssertionError: No layers should be removed, old layers
are guarded by leases.` after the test restarts or reconfigures
pageservers.
During the investigation, we found that the removed layers have an LSN
(`0/1563088`) greater than the LSN (`0/1562000`) protected by the lease.
For instance,
**Layers removed**
<pre>
000000067F00000005000034540100000000-000000067F00000005000040050100000000__000000000<b><i>1563088</i></b>-00000001
(shard 0002)
000000068000000000000017E20000000001-010000000100000001000000000000000001__000000000<b><i>1563088</i></b>-00000001
(shard 0002)
</pre>
**Lsn Lease Granted**
<pre>
handle_make_lsn_lease{lsn=<b><i>0/1562000</i></b> shard_id=0002
shard_id=0002}: lease created, valid until 2024-10-21
</pre>
This means that these layers are not guarded by the leases: they are in
the "future" and not visible to the static endpoint.
## Summary of changes
- Remove the assertion `layers_removed == 0` after triggering timeline
GC while holding the lease. Instead, rely on the successful execution of
the `SELECT` query to test lease validity.
- Improve test logging
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Before, we didn't copy over the `index-part.json` of offloaded timelines
to the new shard's location, resulting in the new shard not knowing the
timeline even exists.
In #9444, we copy over the manifest, but we also need to do this for
`index-part.json`.
As the operations to do are mostly the same between offloaded and
non-offloaded timelines, we can iterate over all of them in the same
loop, after the introduction of a `TimelineOrOffloadedArcRef` type to
generalize over the two cases. This is analogous to the deletion code
added in #8907.
The added test also ensures that the sharded archival config endpoint
works, something that has not yet been ensured by tests.
Part of #8088
## Problem
https://github.com/neondatabase/neon/pull/9492 added a metric to track
the total count of block gaps filled on rel extend. More context is
needed to understand when this happens. The current theory is that it
may only happen on pg 14 and pg 15 since they do not WAL log relation extends.
## Summary of Changes
A rate limited log is added.
# Problem
Timeline creation can either be bootstrap or branch.
The distinction is made based on whether the `ancestor_*` fields are
present or not.
In the PGDATA import code
(https://github.com/neondatabase/neon/pull/9218), I add a third variant
to timeline creation.
# Solution
The above pushed me to refactor the code in Pageserver to distinguish
the different creation requests through enum variants.
There is no externally observable effect from this change.
On the implementation level, a notable change is that the acquisition of
the `TimelineCreationGuard` happens later than before. This is necessary
so that we have everything in place to construct the
`CreateTimelineIdempotency`. Notably, this moves the acquisition of the
creation guard _after_ the acquisition of the `gc_cs` lock in the case
of branching. This might appear as if we're at risk of holding `gc_cs`
longer than before this PR, but, even before this PR, we were holding
`gc_cs` until after the `wait_completion()` that makes the timeline
creation durable in S3 returns. I don't see any deadlock risk with
reversing the lock acquisition order.
As a drive-by change, I found that the `create_timeline()` function in
`neon_local` is unused, so I removed it.
# Refs
* platform context: https://github.com/neondatabase/neon/pull/9218
* product context: https://github.com/neondatabase/cloud/issues/17507
* next PR stacked atop this one:
https://github.com/neondatabase/neon/pull/9501
## Problem
WAL ingest couples decoding of special records with their handling
(updates to the storage engine mostly).
This is a roadblock for our plan to move WAL filtering (and implicitly
decoding) to safekeepers since they cannot
do writes to the storage engine.
## Summary of changes
This PR decouples the decoding of the special WAL records from their
application. The changes are done in place
and I've done my best to refrain from refactorings and attempted to
preserve the original code as much as possible.
Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
part of https://github.com/neondatabase/neon/issues/9114,
https://github.com/neondatabase/neon/issues/8836,
https://github.com/neondatabase/neon/issues/8362
The split layer writer code can be used in a more general way: the
caller puts unfinished writers into the batch layer writer and lets the
batch layer writer ensure the atomicity of the layers produced.
## Summary of changes
* Add batch layer writer, which atomically finishes the layers.
`BatchLayerWriter::finish` is simply a copy-paste from previous split
layer writers.
* Refactor split writers to use the batch layer writer.
* The current split writer tests cover all code paths of the batch layer
writer.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We have `git config --global --add safe.directory ...` leftovers from the
past, but `actions/checkout` does it by default (since v3.0.2, we use v4)
## Summary of changes
- Remove `git config --global --add safe.directory ...` hack
## Problem
When a pageserver is misbehaving (e.g. we hit an ingest bug or something
is pathologically slow), the storage controller could get stuck in the
part of live migration that waits for LSNs to catch up. This is a
problem, because it can prevent us migrating the troublesome tenant to
another pageserver.
Closes: https://github.com/neondatabase/cloud/issues/19169
## Summary of changes
- Respect Reconciler::cancel during await_lsn.
A sizeof on a pointer on a 64 bit machine is 8 bytes whereas
Entry::old_name is a 64 byte array of characters. There was most likely
no fallout since the string would start with NUL bytes, but best to fix
nonetheless.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
Filling in the gaps with zeroes is annoying for sharded ingest. We are
not sure it even happens in reality.
## Summary of Changes
Add one global counter which tracks how many such gap blocks we filled
on relation extends. We can add more metrics once we understand the
scope.
## Problem
Occasionally, we get failures to start the storage controller's db with
errors like:
```
aborting due to panic at /__w/neon/neon/control_plane/src/background_process.rs:349:67:
claim pid file: lock file
Caused by:
file is already locked
```
e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9428/11380574562/index.html#/testresult/1c68d413ea9ecd4a
This is happening in a stop,start cycle during a test. Presumably the
pidfile from the startup background process is still held at the point
we stop, because we let pg_ctl keep running in the background.
## Summary of changes
- Refactor pg_ctl invocations into a helper
- In the controller's `start` function, use pg_ctl & a wait loop for
pg_isready, instead of using background_process
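A rough sketch of that start path, with illustrative paths and flags rather than the actual storage controller code: start the database via pg_ctl and then poll pg_isready until it answers or we hit a deadline.
```rust
use std::io;
use std::process::Command;
use std::thread::sleep;
use std::time::{Duration, Instant};

// pg_ctl may leave postgres running in the background; instead of tracking it
// with background_process pidfiles, we simply wait until pg_isready succeeds.
fn start_controller_db(pgdata: &str, port: u16) -> io::Result<()> {
    let port_opt = format!("-p {port}");
    Command::new("pg_ctl")
        .args(["start", "-D", pgdata, "-o", port_opt.as_str()])
        .status()?;

    let port_str = port.to_string();
    let deadline = Instant::now() + Duration::from_secs(30);
    loop {
        let ready = Command::new("pg_isready")
            .args(["-p", port_str.as_str()])
            .status()?
            .success();
        if ready {
            return Ok(());
        }
        if Instant::now() > deadline {
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                "timed out waiting for pg_isready",
            ));
        }
        sleep(Duration::from_millis(200));
    }
}
```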
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
We can't call pg_current_wal_lsn() if we are a standby instance (read
replica). Any attempt to call this function while in recovery results
in:
ERROR: recovery is in progress
Signed-off-by: Tristan Partin <tristan@neon.tech>
Similar to https://github.com/neondatabase/neon/pull/8841, we make the
delta layer writer atomic when finishing the layers.
## Summary of changes
* `put_value` no longer takes a discard fn
* `finish` decides what layers to keep
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Persist timeline offloaded state to S3.
Right now, as of #8907, at each restart of the pageserver, all offloaded
state is lost, so we load the full timeline again. As it starts with an
empty local directory, we might potentially download some files again,
leading to downloads that are ultimately wasteful.
This patch adds support for persisting the offloaded state, allowing us
to never load offloaded timelines in the first place. The persistence
feature is facilitated via a new file in S3 that is tenant-global, which
contains a list of all offloaded timelines. It is updated each time we
offload or unoffload a timeline, and otherwise never touched.
This choice means that tenants where no offloading is happening will not
immediately get a manifest, keeping the change very minimal at the
start.
We leave generation support for future work. It is important to support
generations, as in the worst case, the manifest might be overwritten by
an older generation after a timeline has been unoffloaded (and
unarchived), so the next pageserver process instantiation might wrongly
believe that some timeline is still offloaded even though it should be
active.
Part of #9386, #8088
## Problem
If the environment variables `COMPATIBILITY_NEON_BIN` or
`COMPATIBILITY_POSTGRES_DISTRIB_DIR` are not set (this is usual during a
local run), the tests with the versions mix cannot run.
## Summary of changes
If these variables are not set turn off the version mix.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
Previously, figuring out how many tenant shards were managed by a
storage controller was typically done by peeking at the database or
calling into the API. A metric makes it easier to monitor, as
unexpectedly increasing shard counts can be indicative of problems
elsewhere in the system.
## Summary of changes
- Add metrics `storage_controller_pageserver_nodes` (updated on node
CRUD operations from Service) and `storage_controller_tenant_shards`
(updated RAII-style from TenantShard)
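The RAII-style update mentioned above can be sketched like this (gauge registration and helper names are illustrative, not the controller's actual metrics code): each shard increments a gauge when constructed and decrements it on Drop, so the metric cannot drift from the in-memory shard count.
```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge, IntGauge};

static TENANT_SHARDS: Lazy<IntGauge> = Lazy::new(|| {
    register_int_gauge!(
        "storage_controller_tenant_shards",
        "Number of tenant shards managed by this storage controller"
    )
    .unwrap()
});

// Held inside each TenantShard; dropping the shard automatically decrements.
struct ShardMetricGuard;

impl ShardMetricGuard {
    fn new() -> Self {
        TENANT_SHARDS.inc();
        ShardMetricGuard
    }
}

impl Drop for ShardMetricGuard {
    fn drop(&mut self) {
        TENANT_SHARDS.dec();
    }
}
```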
At least as far as removing individual files goes, this is the best
pattern for removing. I can't say the same for removing directories, but
I went ahead and changed those to `$(RM) -r` anyway.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Always do timeline init through atomic rename of temp directory. Add
GlobalTimelines::load_temp_timeline which does this, and use it from
both pull_timeline and basic timeline creation. Fixes a collection
of issues:
- previously, timeline creation didn't really flush the cfile to disk
due to the 'nothing to do if state didn't change' check;
- even if it did, without a tmp dir it is possible to lose the cfile
but leave the timeline dir in place, making it look corrupted;
- tenant directory creation fsync was missing in timeline creation;
- pull_timeline is now protected against running concurrently with both
itself and timeline creation;
- the global timelines map now has a special CreationInProgress entry
type which prevents anyone from getting access to the timeline while it
is being created (previously one could get access to it, but it was
locked during creation, which is valid but confusing if creation
failed).
Fixes #8927
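The atomic-rename pattern behind load_temp_timeline looks roughly like the sketch below (paths and the helper name are illustrative): everything is written and fsynced inside a temp directory, which is then renamed into place, so a crash leaves either a complete timeline directory or none at all.
```rust
use std::fs;
use std::io;
use std::path::Path;

fn promote_temp_timeline(tmp_dir: &Path, final_dir: &Path) -> io::Result<()> {
    // fsync the temp directory itself; individual files are assumed to have
    // been fsynced as they were written into it.
    fs::File::open(tmp_dir)?.sync_all()?;

    // Atomic rename: after this, either the whole directory exists or it doesn't.
    fs::rename(tmp_dir, final_dir)?;

    // fsync the parent so the rename itself survives a crash.
    if let Some(parent) = final_dir.parent() {
        fs::File::open(parent)?.sync_all()?;
    }
    Ok(())
}
```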
Add a way to list the offloaded timelines.
Before, one had to look at logs to figure out whether a timeline had
been offloaded, or rely on its absence from the list of normal
timelines. Now, one can list offloaded timelines directly.
Part of #8088
Part of #8130
## Problem
Pageserver previously went through the kernel page cache for all of its
IOs. The kernel page cache makes a lightly loaded pageserver appear
deceptively fast. Using direct IO would offer predictable latencies for
our virtual file IO operations.
In particular for reads, the data pages also have extremely low
temporal locality because the most frequently accessed pages are cached
on the compute side.
## Summary of changes
This PR enables pageserver to use direct IO for delta layer and image
layer reads. We can ship them separately because these layers are
write-once, read-many, so we will not be mixing buffered IO with direct
IO.
- implement `IoBufferMut`, a buffer type with aligned allocation
(currently set to 512).
- use `IoBufferMut` at all places we are doing reads on image + delta
layers.
- leverage Rust type system and use `IoBufAlignedMut` marker trait to
guarantee that the input buffers for the IO operations are aligned.
- page cache allocation is also made aligned.
_* in-memory layer reads and the write path will be shipped separately._
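A minimal sketch of the aligned-allocation idea behind `IoBufferMut` (illustrative only; the real type integrates with the owned-buffer IO traits): the backing memory is aligned to 512 bytes so it can be handed directly to O_DIRECT reads.
```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

const DIO_ALIGN: usize = 512;

struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        assert!(len > 0, "zero-sized allocations are not supported");
        let layout = Layout::from_size_align(len, DIO_ALIGN).expect("bad layout");
        // SAFETY: layout has non-zero size (asserted above).
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        Self { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: ptr is valid for layout.size() bytes and exclusively owned.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```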
## Testing
Integration test suite run with O_DIRECT enabled:
https://github.com/neondatabase/neon/pull/9350
## Performance
We evaluated performance based on the `get-page-at-latest-lsn`
benchmark. The results demonstrate a decrease in the number of IOps, no
significant change in the mean latency, and a slight improvement in the
p99.9 and p99.99 latencies.
[Benchmark](https://www.notion.so/neondatabase/Benchmark-O_DIRECT-for-image-and-delta-layers-2024-10-01-112f189e00478092a195ea5a0137e706?pvs=4)
## Rollout
We will add `virtual_file_io_mode=direct` region by region to enable
direct IO on image + delta layers.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Part of https://github.com/neondatabase/neon/issues/8836
## Summary of changes
This pull request makes the image layer split writer atomic when
finishing the layers. All the produced layers either finish at the same
time, or are discarded at the same time. Note that this does not
guarantee atomicity across a crash, but any leftover layers will be
cleaned up on pageserver restart.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
```
+ /tmp/neon/pg_install/v16/bin/psql '***' -c 'SELECT version()'
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
```
## Summary of changes
- Use `build-tools:pinned-bookworm` whenever we download Neon artefact
## Problem
Our dockerfiles, for some historical reason, have unconventional names
`Dockerfile.<something>`, and some tools (like GitHub UI) fail to highlight
the syntax in them.
> Some projects may need distinct Dockerfiles for specific purposes. A
common convention is to name these `<something>.Dockerfile`
From: https://docs.docker.com/build/concepts/dockerfile/#filename
## Summary of changes
- Rename `Dockerfile.build-tools` -> `build-tools.Dockerfile`
- Rename `compute/Dockerfile.compute-node` ->
`compute/compute-node.Dockerfile`
In #9453, we want to remove the non-gzipped basebackup code in the
computes, and always request gzipped basebackups.
However, right now the pageserver's page service only accepts basebackup
requests in the following formats:
* `basebackup <tenant_id> <timeline_id>`, lsn is determined by the
pageserver as the most recent one (`timeline.get_last_record_rlsn()`)
* `basebackup <tenant_id> <timeline_id> <lsn>`
* `basebackup <tenant_id> <timeline_id> <lsn> --gzip`
We add a fourth case, `basebackup <tenant_id> <timeline_id> --gzip` to
allow gzipping the request for the latest lsn as well.
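For illustration only (the real parsing lives in the pageserver page service and differs in detail), the four accepted shapes of the command can be sketched like this:
```rust
struct BasebackupCmd<'a> {
    tenant_id: &'a str,
    timeline_id: &'a str,
    lsn: Option<&'a str>,
    gzip: bool,
}

fn parse_basebackup(cmd: &str) -> Option<BasebackupCmd<'_>> {
    let mut parts = cmd.split_whitespace();
    if parts.next()? != "basebackup" {
        return None;
    }
    let tenant_id = parts.next()?;
    let timeline_id = parts.next()?;
    let (lsn, gzip) = match (parts.next(), parts.next()) {
        (None, _) => (None, false),                       // latest lsn, no gzip
        (Some("--gzip"), None) => (None, true),           // latest lsn, gzip (new case)
        (Some(lsn), None) => (Some(lsn), false),          // explicit lsn
        (Some(lsn), Some("--gzip")) => (Some(lsn), true), // explicit lsn, gzip
        _ => return None,
    };
    Some(BasebackupCmd { tenant_id, timeline_id, lsn, gzip })
}
```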
In neon_collector_autoscaling.jsonnet, the collector name is hardcoded
to neon_collector_autoscaling. This issue manifests itself as
sql_exporter failing to find the collector configuration.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
Pageserver returns 409 (Conflict) if any of the shards are already
deleting the timeline. This resulted in an error being propagated out of
the HTTP handler and to the client. It's an expected scenario so we
should handle it nicely.
This caused failures in `test_storage_controller_smoke`
[here](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9435/11390431900/index.html#suites/8fc5d1648d2225380766afde7c428d81/86eee4b002d6572d).
## Summary of Changes
Instead of returning an error on 409s, we now bubble the status code up
and let the HTTP handler code retry until it gets a 404 or times out.
Follow up on #9344. We want to install the extension automatically. We
didn't want to couple the extension into compute_ctl so instead
local_proxy is the one to issue requests specific to the extension.
depends on #9344 and #9395
## Problem
Consider the following sequence of events:
1. Shard location gets downgraded to secondary while there's a libpq
connection in pagestream mode from the compute
2. There's no active tenant, so we return `QueryError::Reconnect` from
`PageServerHandler::handle_get_page_at_lsn_request`.
3. Error bubbles up to `PostgresBackendIO::process_message`, bailing us
out of pagestream mode.
4. We instruct the client to reconnect, but continue serving the libpq
connection. The client isn't yet aware of the request to reconnect and
believes it is still in pagestream mode. Pageserver fails to deserialize
get page requests wrapped in `CopyData` since it's not in pagestream
mode.
## Summary of Changes
When we wish to instruct the client to reconnect, also disconnect from
the server side after flushing the error.
Closes https://github.com/neondatabase/cloud/issues/17336
Adds endpoint to install extensions:
**POST** `/extensions`
```
{"extension":"pg_sessions_jwt","database":"neondb","version":"1.0.0"}
```
Will be used by `local-proxy`.
For example, for JWT authentication to work, the database needs to have
the pg_session_jwt extension installed, and JWT also has to be enabled
to work in RLS policies.
---------
Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
## Problem
If we migrate A->B, then B->A, and the notification of A->B fails, then
we might have retained state that makes us think "A" is the last state
we sent to the compute hook, whereas when we migrate B->A we should
really be sending a fresh notification in case our earlier failed
notification has actually mutated the remote compute config.
Closes: #9417
## Summary of changes
- Add a reproducer for the bug
(`test_storage_controller_compute_hook_revert`)
- Refactor compute hook code to represent remote state with
`ComputeRemoteState` which stores a boolean for whether the compute has
fully applied the change as well as the request that the compute
accepted.
- The actual bug fix: after sending a compute notification, if we got a
423 response then update our ComputeRemoteState to reflect that we have
mutated the remote state. This way, when we later try and notify for our
historic location, we will properly see that as a change and send the
notification.
Co-authored-by: Vlad Lazar <vlad@neon.tech>
This PR introduces a `/grants` endpoint which allows granting specific
`privileges` to a certain `role` for a certain `schema`.
Related to #9344
Together, these endpoints will be used to configure the JWT extension
and grant the correct usage on its schema to the specific roles that
need it.
---------
Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
The forever ongoing effort of juggling multiple versions of rustls :3
now with the new crypto library aws-lc.
Because of dependencies, it is currently impossible to not have both
ring and aws-lc in the dep tree, therefore our only options are not
updating rustls or having both crypto backends enabled.
According to benchmarks run by the rustls maintainer, aws-lc is faster
than ring in some cases too <https://jbp.io/graviola/>, so it's not
without its upsides.
## Problem
The pageserver generally trusts the storage controller/control plane to
give it valid generations. However, sometimes it should be obvious that
a generation is bad, and for defense in depth we should detect that on
the pageserver.
This PR is part 1 of 2:
1. In this PR we detect and warn on such situations, but do not block
starting up the tenant, until we have confidence that the check is not
firing unexpectedly in the field.
2. Part 2 of 2 will introduce a condition that refuses to start a tenant
in this situation, and a test for that (maybe, if we can figure out how
to spoof an ancient mtime).
Related: #6951
## Summary of changes
- When loading an index older than 2 weeks, log an INFO message noting
that we will check for other indices
- When loading an index older than 2 weeks _and_ a newer-generation
index exists, log a warning.
Part of the aux v1 retirement
https://github.com/neondatabase/neon/issues/8623
## Summary of changes
Remove the write/read path for aux v1, but keep the config item and the
index part field for now.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Simple PR to log installed_extensions statistics in the following
format:
```
2024-10-17T13:53:02.860595Z INFO [NEON_EXT_STAT] {"extensions":[{"extname":"plpgsql","versions":["1.0"],"n_databases":2},{"extname":"neon","versions":["1.5"],"n_databases":1}]}
```
## Problem
In #9259, we found that the `check_safekeepers_synced` fast path could
result in a lower basebackup LSN than the `flush_lsn` reported by
Safekeepers in `VoteResponse`, causing the compute to panic once on
startup.
This would happen if the Safekeeper had unflushed WAL records due to a
compute disconnect. The `TIMELINE_STATUS` query would report a
`flush_lsn` below these unflushed records, while `VoteResponse` would
flush the WAL and report the advanced `flush_lsn`. See
https://github.com/neondatabase/neon/issues/9259#issuecomment-2410849032.
## Summary of changes
Flush the WAL if the compute disconnects during WAL processing.
## Problem
Tenant deletion only removes the current shards from remote storage. Any
stale parent shards (before splits) will be left behind. These shards
are kept since child shards may reference data from the parent until new
image layers are generated.
## Summary of changes
* Document a special case for pageserver tenant deletion that deletes
all shards in remote storage when given an unsharded tenant ID, as well
as any unsharded tenant data.
* Pass an unsharded tenant ID to delete all remote storage under the
tenant ID prefix.
* Split out `RemoteStorage::delete_prefix()` to delete a bucket prefix,
with additional test coverage.
* Add a `delimiter` argument to `assert_prefix_empty()` to support
partial prefix matches (i.e. all shards starting with a given tenant
ID).
part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
gc-compaction may take a lot of disk space, and if it does, the caller
should do a partial gc-compaction instead. This patch adds a space check
for the compaction job.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Adds a configuration variable for timeline offloading support. The added
pageserver-global config option controls whether the pageserver
automatically offloads timelines during compaction.
Therefore, already offloaded timelines are not affected by this, nor is
the manual testing endpoint.
This allows the rollout of timeline offloading to be driven by the
storage team.
Part of #8088
First PR for #9284
Start unification of the client and connection pool interfaces:
- Exclude 'global_connections_count' from get_conn_entry()
- Move remote connection pools to the conn_pool_lib as a reference
- Unify clients among all the conn pools
## Problem
While running `find-garbage` and `purge-garbage`, I encountered two
things that needed updating:
- Console API may omit `user_id` since org accounts were added
- When we cut over to using GenericRemoteStorage, the object listings we
do during purge did not get proper retry handling, so could easily fail
on usual S3 errors, and make the whole process drop out.
...and one bug:
- We had a `.unwrap` which expects that after finding an object in a
tenant path, a listing in that path will always return objects. This is
not true, because a pageserver might be deleting the path at the same
time as we scan it.
## Summary of changes
- When listing objects during purge, use backoff::retry
- Make `user_id` an `Option`
- Handle the case where a tenant's objects go away during find-garbage.
## Problem
See
https://neondb.slack.com/archives/C033A2WE6BZ/p1729007738526309?thread_ts=1722942856.987979&cid=C033A2WE6BZ
When a replica receives a WAL record whose target page is not present in
shared buffers, we evict this page from the LFC.
If all pages of the LFC chunk are evicted, the chunk is moved to the
beginning of the LRU list to force its reuse.
Unfortunately, access_count is not checked, and if the entry is being
accessed at that moment, this operation can corrupt the LRU list.
## Summary of changes
Check `access_count` in `lfc_evict`
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Checkpointer related statistics moved from pg_stat_bgwriter to
pg_stat_checkpointer, so we need to adjust our queries accordingly.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Seemingly a bad copy-paste. This manifested itself as a failure to start
the sql_exporter, which was just dying in a loop in staging. A future PR
will have E2E testing of sql_exporter.
Signed-off-by: Tristan Partin <tristan@neon.tech>
```
warning: first doc comment paragraph is too long
--> compute_tools/src/installed_extensions.rs:35:1
|
35 | / /// Connect to every database (see list_dbs above) and get the list of installed extensions.
36 | | /// Same extension can be installed in multiple databases with different versions,
37 | | /// we only keep the highest and lowest version across all databases.
| |_
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph
= note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default
help: add an empty line
|
35 ~ /// Connect to every database (see list_dbs above) and get the list of installed extensions.
36 + ///
|
```
part of https://github.com/neondatabase/neon/issues/9255
## Summary of changes
Upgrade the remote_storage crate to use hyper1. Hyper0 was used when
providing the streaming HTTP body to the S3 SDK, and that path is
refactored to use hyper1.
Signed-off-by: Alex Chi Z <chi@neon.tech>
The current code forgot to activate timelines during unoffload, leading
to an inability to receive the basebackup because the timeline was
still in the Loading state.
```
stderr:
command failed: compute startup failed: failed to get basebackup@0/0 from pageserver postgresql://no_user@localhost:15014
Caused by:
0: db error: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading
1: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading
```
Therefore, also activate the timeline during unoffloading.
Part of #8088
## Problem
The reconciler uses `seq`, but processing of results uses `sequence`.
The field order is different too, which makes the logs annoying to read.
## Summary of Changes
Use the same tracing fields in both.
## Problem
In test `test_timeline_offloading`, we see failures like:
```
PageserverApiException: queue is in state Stopped
```
Example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/11356917668/index.html#testresult/ff0e348a78a974ee/retries
## Summary of changes
- Amend code paths that handle errors from RemoteTimelineClient to check
for cancellation and emit the Cancelled error variant in these cases
(will give clients a 503 to retry)
- Remove the implicit `#[from]` for the Other error case, to make it
harder to add code that accidentally squashes errors into this
(500-equivalent) error variant.
This would be neater if we made RemoteTimelineClient return a structured
error instead of anyhow::Error, but that's a bigger refactor.
I'm not sure if the test really intends to hit this path, but the error
handling fix makes sense either way.
This should make it a little bit easier for people wanting to check if
their files are formatted correctly. It has the added bonus of making
the CI check simpler as well.
Signed-off-by: Tristan Partin <tristan@neon.tech>
Our replication bench project is stuck because generating the basebackup
is too slow, which caused the compute to disconnect.
https://neondb.slack.com/archives/C03438W3FLZ/p1728330685012419
The compute timeout for waiting for the basebackup is 10m (is it true?).
Generating the basebackup directly on the pageserver takes ~3min.
Therefore, I suspect it's because there is too much wasted round-trip
time when writing the 10000+ snapshot aux files. It is also possible
that the basebackup process spends so long retrieving all aux files that
it does not write anything over the wire protocol, causing a read
timeout.
The basebackup size is 800KB gzipped for that project and was a 55MB tar
before compression.
## Summary of changes
* Potentially fix the issue by placing a write buffer for basebackup.
* Log how many aux files we read and the time spent on it.
Signed-off-by: Alex Chi Z <chi@neon.tech>
There are quite a few benefits to this approach:
- Reduce config duplication
- The two sql_exporter configs were super similar with just a few
differences
- Pull SQL queries into standalone files
- That means we could run a SQL formatter on the file in the future
- It also means access to syntax highlighting
- In the future, run different queries for different PG versions
- This is relevant because right now, we have queries that are failing
on PG 17 due to catalog updates
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
There is a double update of the resize cache in `put_rel_truncation`.
Also, `page_server_request` contains a check that the fork is
MAIN_FORKNUM, which
1. is incorrect (because VM/FSM pages are sharded in the same way as
MAIN fork pages), and
2. is redundant, because `page_server_request` is never called for `get
page` requests, so the first part of the OR condition is always true.
## Summary of changes
Remove redundant code
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Add a test for timeline offloading, and subsequent unoffloading.
Also adds a manual endpoint, and issues a proper timeline shutdown
during offloading which prevents a pageserver hang at shutdown.
Part of #8088.
## Problem
We were seeing timeouts on migrations in this test.
The test unfortunately tends to saturate local storage, which is shared
between the pageservers and the control plane database, which makes the
test kind of unrealistic. We will also want to increase the scale of
this test, so it's worth fixing that.
## Summary of changes
- Instead of randomly creating timelines at the same time as the other
background operations, explicitly identify a subset of tenant which will
have timelines, and create them at the start. This avoids pageservers
putting a lot of load on the test node during the main body of the test.
- Adjust the tenants created to create some number of 8 shard tenants
and the rest 1 shard tenants, instead of just creating a lot of 2 shard
tenants.
- Use archival_config to exercise tenant-mutating operations, instead of
using timeline creation for this.
- Adjust reconcile_until_idle calls to avoid waiting 5 seconds between
calls, which causes timeouts with tenants that have large shard counts.
- Fix a pageserver bug where calls to archival_config during activation
get 404
## Problem
This PR switches CI and Storage to Debian 12 (Bookworm) based images.
## Summary of changes
- Add Debian codename (`bookworm`/`bullseye`) to most of docker tags,
create un-codenamed images to be used by default
- `vm-compute-node-image`: create a separate spec for `bookworm` (we
don't need to build cgroups in the future)
- `neon-image`: Switch to `bookworm`-based `build-tools` image
- Storage components and Proxy use it
- CI: run lints and tests on `bookworm`-based `build-tools` image
We now also track:
- Number of PS IOs in-flight
- Number of pages cached by smgr prefetch implementation
- IO timing histograms for LFC reads and writes, per IO issued
## Problem
There's little insight into the timing metrics of LFC, and what the
prefetch state of each backend is.
This changes that, by measuring (and subsequently exposing) these data
points.
## Summary of changes
- Extract IOHistogram as separate type, rather than a collection of
fields on NeonMetrics
- others, see items above.
Part of https://github.com/neondatabase/neon/issues/8926
Also consider offloaded timelines for obtaining `retain_lsn`. This is
required for correctness for all timelines that have not been flattened
yet: otherwise we GC data that might still be required for reading.
This somewhat counteracts the original purpose of timeline offloading of
not having to iterate over offloaded timelines, but sadly it's required.
In the future, we can improve the way the offloaded timelines are
stored.
We also make the `retain_lsn` optional so that in the future, when we
implement flattening, we can make it None. This also applies to full
timeline objects by the way, where it would probably make most sense to
add a bool flag whether the timeline is successfully flattened, and if
it is, one can exclude it from `retain_lsn` as well.
Also, track whether a timeline was offloaded or not in `retain_lsn` so
that the `retain_lsn` can be excluded from visibility and size
calculation.
Part of #8088
Adds a test for the (now fixed) storage broker limit issue, see #9268
for the description and #9299 for the fix.
Also fix a race condition where endpoint creation and start run in
parallel, leading to file-not-found errors.
## Problem
When `Dockerfile.build-tools` gets changed, several PRs catch up with
it, and some might have their workflows unexpectedly cancelled because
of GitHub's concurrency model for workflows.
See the comment in the code for more details.
It should be possible to revert it after
https://github.com/orgs/community/discussions/41518 (I don't expect it
anytime soon, but I subscribed)
## Summary of changes
- Do not queue `build-build-tools-image` workflows in the concurrency
group
## Problem
On macOS, a `pg_dynshmem` file is created in PGDATADIR, which causes a
mismatch in the directory comparison.
## Summary of changes
Add this file to the ignore list.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Removes the ConsoleRedirect backend from the main auth::Backends enum
and copy-pastes the existing crate::proxy::task_main structure to use
the ConsoleRedirectBackend exclusively.
This makes the logic a bit simpler at the cost of some fairly trivial
code duplication.
preliminary for #9270
The auth::Backend didn't need to be in the mega ProxyConfig object, so I
split it off and passed it manually in the few places it was necessary.
I've also refined some of the uses of config I saw while doing this
small refactor.
I've also followed the trend and made the console redirect backend its
own struct, same as LocalBackend and ControlPlaneBackend.
## Problem
Action `run-python-test-set` fails if it is not used for `regress_tests`
on release PR, because it expects
`test_compatibility.py::test_create_snapshot` to generate a snapshot,
and the test exists only in `regress_tests` suite.
For example, in https://github.com/neondatabase/neon/pull/9291
[`test-postgres-client-libs`](https://github.com/neondatabase/neon/actions/runs/11209615321/job/31155111544)
job failed.
## Summary of changes
- Add `skip-if-does-not-exist` input to `.github/actions/upload` action
(the same way we do for `.github/actions/download`)
- Set `skip-if-does-not-exist=true` for "Upload compatibility snapshot"
step in `run-python-test-set` action
## Problem
We faced the problem of incompatibility between different components of
different versions.
This should be detected automatically to prevent production bugs.
## Summary of changes
A test for this situation was implemented.
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
Fixes #8340
## Summary of changes
Introduced ErrorKind::quota to handle quota-related errors
This job takes an extraordinary amount of time for what I understand it
to do. The obvious win is caching dependencies.
Rory disabled caching in cd5732d9d8.
I assume this was to get gen3 runners up and running.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
This test restarts services in an undefined order (whatever neon_local
does), which means we should be tolerant of warnings that come from
restarting the storage controller while a pageserver is running.
We can see failures with warnings from dropped requests, e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9307/11229000712/index.html#/testresult/d33d5cb206331e28
```
WARN request{method=GET path=/v1/location_config request_id=b7dbda15-6efb-4610-8b19-a3772b65455f}: request was dropped before completing\n')
```
## Summary of changes
- allow-list the `request was dropped before completing` message on
pageservers before restarting services
When there are no timelines in remote storage, the storage scrubber
would incorrectly trip an assertion with "Must be set if results are
present", referring to the last processed tenant ID. When there are no
timelines we don't expect there to be a tenant ID either.
The assertion was introduced in 37aa6fd.
Only apply the assertion when any timelines are present.
## Problem
Storage controller `/control` API mostly requires admin tokens, for
interactive use by engineers. But for endpoints used by scripts, we
should not require admin tokens.
Discussion at
https://neondb.slack.com/archives/C033RQ5SPDH/p1728550081788989?thread_ts=1728548232.265019&cid=C033RQ5SPDH
## Summary of changes
- Introduce the 'infra' JWT scope, which was not previously used in the
neon repo
- For pageserver & safekeeper node registrations, require infra scope
instead of admin
Note that admin will still work, as the controller auth checks permit
admin tokens for all endpoints irrespective of what scope they require.
This seems to paper over a behavioral difference in Python 3.9 and
Python 3.12 with how dataclasses work with mutable variables. On Python
3.12, I get the following error:
ValueError: mutable default <class 'dict'> for field EXTRACTORS is not allowed: use default_factory
This obviously doesn't occur in our testing environment. When I do what
the error tells me, EXTRACTORS doesn't seem to exist as an attribute on
the class in at least Python 3.9.
The solution provided in this commit seems like the least amount of
friction to keep the wheels turning.
Signed-off-by: Tristan Partin <tristan@neon.tech>
This is simpler than using subprocess.
One difference is in how moto's log output is now collected. Previously,
moto's logs went to stderr, and were collected and printed at the end of
the test by pytest, like this:
2024-10-07T22:45:12.3705222Z ----------------------------- Captured stderr call -----------------------------
2024-10-07T22:45:12.3705577Z 127.0.0.1 - - [07/Oct/2024 22:35:14] "PUT /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7 HTTP/1.1" 200 -
2024-10-07T22:45:12.3706181Z 127.0.0.1 - - [07/Oct/2024 22:35:15] "GET /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7/?list-type=2&delimiter=/&prefix=/tenants/43da25eac0f41412696dd31b94dbb83c/timelines/ HTTP/1.1" 200 -
2024-10-07T22:45:12.3706894Z 127.0.0.1 - - [07/Oct/2024 22:35:16] "PUT /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7//tenants/43da25eac0f41412696dd31b94dbb83c/timelines/eabba5f0c1c72c8656d3ef1d85b98c1d/initdb.tar.zst?x-id=PutObject HTTP/1.1" 200 -
Note the timestamps: the timestamp at the beginning of the line is the
time that the stderr was dumped, i.e. the end of the test, which makes
those timestamps rather useless. The timestamp in the middle of the line
is when the operation actually happened, but it has only 1 s
granularity.
With this change, moto's log lines are printed in the "live log call"
section, as they happen, which makes the timestamps more useful:
2024-10-08 12:12:31.129 INFO [_internal.py:97] 127.0.0.1 - - [08/Oct/2024 12:12:31] "GET /pageserver-test-deletion-queue-e24e7525d437e1874d8a52030dcabb4f/?list-type=2&delimiter=/&prefix=/tenants/7b6a16b1460eda5204083fba78bc360f/timelines/ HTTP/1.1" 200 -
2024-10-08 12:12:32.612 INFO [_internal.py:97] 127.0.0.1 - - [08/Oct/2024 12:12:32] "PUT /pageserver-test-deletion-queue-e24e7525d437e1874d8a52030dcabb4f//tenants/7b6a16b1460eda5204083fba78bc360f/timelines/7ab4c2b67fa8c712cada207675139877/initdb.tar.zst?x-id=PutObject HTTP/1.1" 200 -
## Problem
We need a way to incrementally switch to direct IO. During the rollout
we might want to switch to O_DIRECT on image and delta layer read path
first before others.
## Summary of changes
- Revisited and simplified direct io config in `PageserverConf`.
- We could add a fallback mode for open, but for read there isn't a
reasonable alternative (without creating another buffered virtual file).
- Added a wrapper around `VirtualFile`; the current implementation
becomes `VirtualFileInner`
- Use `open_v2`, `create_v2`, `open_with_options_v2` when we want to use
the IO mode specified in PS config.
- Once we onboard all IO through VirtualFile using this new API, we will
delete the old code path.
- Make io mode live configurable for benchmarking.
- Only guaranteed for files opened after the config change, so do it
before the experiment.
As an example, we are using `open_v2` with
`virtual_file::IoMode::Direct` in
https://github.com/neondatabase/neon/pull/9169
We also remove the `io_buffer_alignment` config in
a04cfd754b and use a compile-time
constant instead. This way we don't have to carry the alignment around
or make frequent calls to retrieve this information from the static
variable.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Add /installed_extensions endpoint to collect
statistics about extension usage.
It returns a list of installed extensions in the format:
```json
{
"extensions": [
{
"extname": "extension_name",
"versions": ["1.0", "1.1"],
"n_databases": 5,
}
]
}
```
---------
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
The path to TPC-H queries was incorrectly changed in #9306.
This path is used for `test_tpch` parameterization, so all perf tests
started to fail:
```
==================================== ERRORS ====================================
__________ ERROR collecting test_runner/performance/test_perf_olap.py __________
test_runner/performance/test_perf_olap.py:205: in <module>
@pytest.mark.parametrize("query", tpch_queuies())
test_runner/performance/test_perf_olap.py:196: in tpch_queuies
assert queries_dir.exists(), f"TPC-H queries dir not found: {queries_dir}"
E AssertionError: TPC-H queries dir not found: /__w/neon/neon/test_runner/performance/performance/tpc-h/queries
E assert False
E + where False = <bound method Path.exists of PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries')>()
E + where <bound method Path.exists of PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries')> = PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries').exists
```
## Summary of changes
- Fix the path to tpc-h queries
## Problem
Previously, observed state updates from the reconciler may have
clobbered inline changes made to the observed state by other code paths.
## Summary of changes
Model observed state changes from reconcilers as deltas. This means that
we only update what has changed. Handling for a node going offline
concurrently during the reconcile is also added: we set the observed
state to None in such cases to respect the convention.
Closes https://github.com/neondatabase/neon/issues/9124
Instead of printing the full absolute path for every file, print just
the filenames.
Before:
2024-10-08 13:19:39.98 INFO [test_pageserver_generations.py:669] Found file /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001
2024-10-08 13:19:39.99 INFO [test_pageserver_generations.py:673] Renamed /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 -> /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0
After:
2024-10-08 13:24:39.726 INFO [test_pageserver_generations.py:667] Renaming files in /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/3439538816c520adecc541cc8b1de21c/timelines/6a7be8ee707b355de48dd91b326d6ae1
2024-10-08 13:24:39.728 INFO [test_pageserver_generations.py:673] Renamed
000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 -> 000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0
`download_byte_range()` is basically a copy of `download()` with an
additional option passed to the backend SDKs. This can cause these code
paths to diverge, and prevents combining various options.
This patch adds `DownloadOpts::byte_(start|end)` and move byte range
handling into `download()`.
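A sketch of the resulting shape, with illustrative field names (the real options struct differs in detail): a ranged read is expressed through the options passed to a normal `download()` call rather than through a separate method.
```rust
#[derive(Default)]
struct DownloadOpts {
    // Inclusive start offset of the byte range to fetch, if any.
    byte_start: Option<u64>,
    // Exclusive end offset of the byte range; None means "read to the end".
    byte_end: Option<u64>,
}

// What used to be download_byte_range(path, 0, Some(512)) becomes a regular
// download(path, &opts) call with the range carried in the options.
fn first_512_bytes_opts() -> DownloadOpts {
    DownloadOpts {
        byte_start: Some(0),
        byte_end: Some(512),
    }
}
```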
The "Starting postgres endpoint <name>" message is not needed, because
the neon_cli.py prints the neon_local command line used to start the
endpoint. That contains the same information. The "Postgres startup took
XX seconds" message is not very useful because no one pays attention to
those in the python test logs when things are going smoothly, and if you
do wonder about the startup speed, the same information and more can be
found in the compute log.
Before:
2024-10-07 22:32:27.794 INFO [neon_fixtures.py:3492] Starting postgres endpoint ep-1
2024-10-07 22:32:27.794 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local endpoint start --safekeepers 1 ep-1"
2024-10-07 22:32:27.901 INFO [neon_fixtures.py:3690] Postgres startup took 0.11398935317993164 seconds
After:
2024-10-07 22:32:27.794 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local endpoint start --safekeepers 1 ep-1"
Implements an initial mechanism for offloading of archived timelines.
Offloading is implemented as specified in the RFC.
For now, there is no persistence, so a restart of the pageserver will
retrigger downloads until the timeline is offloaded again.
We trigger offloading in the compaction loop because we need the signal
for whether compaction is done and everything has been uploaded or not.
Part of #8088
## Problem
When a node goes offline, we trigger reconciles to migrate shards away
from it. If multiple nodes go offline at the same time, we handled them in
sequence. Hence, we might migrate shards from the first offline node to the second
offline node and increase the unavailability period.
## Summary of changes
Refactor heartbeat delta handling to:
1. Update in memory state for all nodes first
2. Handle availability transitions one by one (we have full picture for each node after (1))
Closes https://github.com/neondatabase/neon/issues/9126
## Problem
On macOS:
```
/Users/runner/work/neon/neon//pgxn/neon/file_cache.c:623:19: error: variable 'has_remaining_pages' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
```
## Summary of changes
- Initialise `has_remaining_pages` with `false`
It didn't serve much value, and was only used twice.
Path(__file__).parent is a pretty easy invocation to use.
Signed-off-by: Tristan Partin <tristan@neon.tech>
If you override CFLAGS, you also override any flags that PostgreSQL
configure script had picked. That includes many options that enable
extra compiler warnings, like '-Wall', '-Wmissing-prototypes', and so
forth. The override was added in commit 171385ac14, but the intention
of that was to be *more* strict, by enabling '-Werror', not less
strict. The proper way of setting '-Werror', as documented in the docs
and mentioned in PR #2405, is to set COPT='-Werror', but leave CFLAGS
alone.
All the compiler warnings with the standard PostgreSQL flags have now
been fixed, so we can do this without adding noise.
Part of the cleanup issue #9217.
It's not a bug, the variable is initialized when it's used, but the
compiler isn't smart enough to see that through all the conditions.
Part of the cleanup issue #9217.
/pgxn/neon/walproposer_compat.c:192:9: warning: function ‘WalProposerLibLog’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
192 | vsnprintf(buf, sizeof(buf), fmt, args);
| ^~~~~~~~~
The warning:
warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
It's PostgreSQL project style to stick to the old C90 style.
(Alternatively, we could disable it for our extension.)
Part of the cleanup issue #9217.
Prototypes for neon_writev(), neon_readv(), and neon_registersync()
were missing. But instead of adding the missing prototypes, mark all
the smgr functions 'static'.
Part of the cleanup issue #9217.
Silences these compiler warnings:
/pgxn/neon_walredo/walredoproc.c:452:1: warning: ‘CreateFakeSharedMemoryAndSemaphores’ was used with no prototype before its definition [-Wmissing-prototypes]
452 | CreateFakeSharedMemoryAndSemaphores()
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/pgxn/neon/walproposer_pg.c:541:1: warning: no previous prototype for ‘GetWalpropShmemState’ [-Wmissing-prototypes]
541 | GetWalpropShmemState()
| ^~~~~~~~~~~~~~~~~~~~
Part of the cleanup issue #9217.
In v16 merge, we copied much of heap RMGR, to distinguish vanilla
Postgres heap records from records generated with neon patches, with
the additional CID fields. This function is only used by the
HEAP_TRUNCATE records, however, which we didn't need to copy.
Part of the cleanup issue #9217.
In short: currently we reserve 75% of memory for the LFC, meaning that
we scale up to keep postgres using less than 25% of the compute's
memory.
This means that for certain memory-heavy workloads, we end up scaling
much higher than is actually needed (in the worst case, up to 4x),
although in practice it tends not to be quite so bad.
Part of neondatabase/autoscaling#1030.
Update hyper and tonic again in the storage broker, this time with a fix
for the issue that made us revert the update last time.
The first commit is a revert of #9268, the second a fix for the issue.
Fixes #9231.
I'm trying to debug a situation with the LR benchmark publisher not
being in the correct state. This should aid in debugging, while just
being generally useful.
PR: https://github.com/neondatabase/neon/pull/9265
Signed-off-by: Tristan Partin <tristan@neon.tech>
In PostgreSQL v16, the BUFFERTAGS_EQUAL macro was replaced with a static
inline function, BufferTagsEqual. Let's use the new name going forward,
and have backwards-compatibility glue to allow using the new name on v14
and v15, rather than the other way round. This also makes BufferTagsEqual
consistent with InitBufferTag, for which we were already using the new
name.
Requires https://github.com/neondatabase/neon/pull/9086 first to have
`local_proxy_config`. This logic can still be reviewed implementation
wise.
Create JWT Auth functionality related roles without attributes and
`neon_superuser` group.
Read the JWT related roles from `local_proxy_config` `JWKS` settings and
handle them differently than other console created roles.
The prefetch-queue hash table uses a BufferTag struct as the hash key,
and it's hashed using hash_bytes(). It's important that all the padding
bytes in the key are cleared, because hash_bytes() will include them.
I was getting compiler warnings like this on v14 and v15, when compiling
with -Warray-bounds:
In function ‘prfh_lookup_hash_internal’,
inlined from ‘prfh_lookup’ at
pg_install/v14/include/postgresql/server/lib/simplehash.h:821:9,
inlined from ‘neon_read_at_lsnv’ at pgxn/neon/pagestore_smgr.c:2789:11,
inlined from ‘neon_read_at_lsn’ at pgxn/neon/pagestore_smgr.c:2904:2:
pg_install/v14/include/postgresql/server/storage/relfilenode.h:90:43:
warning: array subscript ‘PrefetchRequest[0]’ is partly outside array
bounds of ‘BufferTag[1]’ {aka ‘struct buftag[1]’} [-Warray-bounds]
89 | ((node1).relNode == (node2).relNode && \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
90 | (node1).dbNode == (node2).dbNode && \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
91 | (node1).spcNode == (node2).spcNode)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pg_install/v14/include/postgresql/server/storage/buf_internals.h:116:9:
note: in expansion of macro ‘RelFileNodeEquals’
116 | RelFileNodeEquals((a).rnode, (b).rnode) && \
| ^~~~~~~~~~~~~~~~~
pgxn/neon/neon_pgversioncompat.h:25:31: note: in expansion of macro
‘BUFFERTAGS_EQUAL’
25 | #define BufferTagsEqual(a, b) BUFFERTAGS_EQUAL(*(a), *(b))
| ^~~~~~~~~~~~~~~~
pgxn/neon/pagestore_smgr.c:220:34: note: in expansion of macro
‘BufferTagsEqual’
220 | #define SH_EQUAL(tb, a, b) (BufferTagsEqual(&(a)->buftag,
&(b)->buftag))
| ^~~~~~~~~~~~~~~
pg_install/v14/include/postgresql/server/lib/simplehash.h:280:77: note:
in expansion of macro ‘SH_EQUAL’
280 | #define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash ==
SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey))
| ^~~~~~~~
pg_install/v14/include/postgresql/server/lib/simplehash.h:799:21: note:
in expansion of macro ‘SH_COMPARE_KEYS’
799 | if (SH_COMPARE_KEYS(tb, hash, key, entry))
| ^~~~~~~~~~~~~~~
pgxn/neon/pagestore_smgr.c: In function ‘neon_read_at_lsn’:
pgxn/neon/pagestore_smgr.c:2742:25: note: object ‘buftag’ of size 20
2742 | BufferTag buftag = {0};
| ^~~~~~
This commit silences those warnings, although it's not clear to me why
the compiler complained like that in the first place. I found the issue
with padding bytes while looking into those warnings, but that was
coincidental, I don't think the padding bytes explain the warnings as
such.
In v16, the BUFFERTAGS_EQUAL macro was replaced with a static inline
function, and that also silences the compiler warning. Not clear to me
why.
## Problem
See #9199
## Summary of changes
Fix update of hits/misses for LFC and prefetch introduced in
78938d1b59
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
If a peer safekeeper needs a garbage-collected segment, it will now be
fetched from S3 using on-demand WAL download. This reduces the danger of
running out of disk space when a safekeeper fails.
1. Adds local-proxy to compute image and vm spec
2. Updates local-proxy config processing, writing PID to a file eagerly
3. Updates compute-ctl to understand local proxy compute spec and to
send SIGHUP to local-proxy over that pid.
closes https://github.com/neondatabase/cloud/issues/16867
NeonWALReader needs to know the LSN before which WAL is not available
locally, that is, the basebackup LSN. Previously it was taken from
WalpropShmemState, but that's racy, as walproposer sets it there only
after a successful election. Get it directly with GetRedoStartLsn.
Should fix flakiness of
test_ondemand_wal_download_in_replication_slot_funcs etc.
ref #9201
## Problem
Creation of timelines during a reconciliation can lead to unavailability
if the user attempts to start a compute before the storage controller
has notified cplane of the cut-over.
## Summary of changes
Create timelines on all currently attached locations. For the latest
location, we still look at the database (as previously). With this
change, we also look into the observed state to find *other* attached
locations.
Related https://github.com/neondatabase/neon/issues/9144
## Problem
Secondary tenant heatmaps were always downloaded, even when they hadn't
changed. This can be avoided by using a conditional GET request passing
the `ETag` of the previous heatmap.
## Summary of changes
The `ETag` was already plumbed down into the heatmap downloader, and
just needed further plumbing into the remote storage backends.
* Add a `DownloadOpts` struct and pass it to
`RemoteStorage::download()`.
* Add an optional `DownloadOpts::etag` field, which uses a conditional
GET and returns `DownloadError::Unmodified` on match.
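A minimal, self-contained sketch of the conditional GET flow described above (all types are illustrative stand-ins for the remote storage API): the cached ETag rides along in the options, and an `Unmodified` result means we keep the heatmap we already have.
```rust
#[derive(Clone)]
struct Etag(String);

enum DownloadError {
    // Backend returned 304 Not Modified for the supplied ETag.
    Unmodified,
    Other(String),
}

#[derive(Default)]
struct DownloadOpts {
    // When set, the backend issues a conditional GET (If-None-Match).
    etag: Option<Etag>,
}

fn refresh_heatmap(
    cached: Option<(Etag, Vec<u8>)>,
    fetch: impl Fn(&DownloadOpts) -> Result<(Etag, Vec<u8>), DownloadError>,
) -> Result<(Etag, Vec<u8>), DownloadError> {
    let opts = DownloadOpts {
        etag: cached.as_ref().map(|(etag, _)| etag.clone()),
    };
    match fetch(&opts) {
        // Not modified: reuse the heatmap we already downloaded last time.
        Err(DownloadError::Unmodified) => Ok(cached.expect("unmodified implies cached")),
        other => other,
    }
}
```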
## Problem
The S3 tests couldn't use SSO authentication for local tests against S3.
## Summary of changes
Enable the `sso` feature of `aws-config`. Also run `cargo hakari
generate` which made some updates to `workspace_hack`.
In passing, rename it to NeonLocalCli, to reflect that the binary is
called 'neon_local'.
Add a wrapper for the 'timeline_import' command, eliminating the last
raw call to the raw_cli() function from tests, except for a few in
test_neon_cli.py which are about testing 'neon_local' itself. All the
other calls are now made through the strongly-typed wrapper functions.
Add wrappers for a few commands that didn't have them before. Move the
logic to generate tenant and timeline IDs from NeonCli to the callers,
so that NeonCli is more purely just a type-safe wrapper around
'neon_local'.
* I had to install `m4` in order to be able to run locally
* The docs/docker.md was missing a pointer to where the compute node
code is
(Was originally on #8888 but I am pulling this out)
## Problem
`Oversized vectored read [...]` logs are spewing in prod because we have
a few keys that
are unexpectedly large:
* reldir/relblock - these are unbounded, so it's known technical debt
* slru block - they can be a bit bigger than 128KiB due to storage
format overhead
## Summary of changes
* Bump threshold to 130KiB
* Don't warn on oversized reldir and dbdir keys
Closes https://github.com/neondatabase/neon/issues/8967
Follow-up of #9234 to give hyper 1.0 the version-free name, and the
legacy version of hyper the one with the version number inside. As we
move away from hyper 0.14, we can remove the `hyper0` name piece by
piece.
Part of #9255
Because:
- it's nice to be up-to-date,
- we already had axum 0.7 in our dependency tree, so this avoids having
to compile two versions, and
- removes one of the remaining dependencies on hyper version 0
Also bumps the 'tokio-tungstenite' dependency, to avoid having two
versions in the dependency tree.
The apt install stage before this commit:
0 upgraded, 391 newly installed, 0 to remove and 9 not upgraded.
Need to get 261 MB of archives.
after:
0 upgraded, 367 newly installed, 0 to remove and 9 not upgraded.
Need to get 220 MB of archives.
As seen in https://github.com/neondatabase/cloud/issues/17335, during
releases we can have ingest lags that are above the limits for warnings.
However, such lags are part of normal pageserver startup.
Therefore, calculate a cooldown timestamp until which we accept lags up
to a certain size. The heuristic is chosen to grow the longer it takes
to fully load the tenant, and we also add 60 seconds as a grace period
after that.
Fixes #9231.
Upgrade hyper to 1.4.0 and use hyper 1.4 instead of 0.14 in the storage
broker, together with tonic 0.12. The two upgrades go hand in hand.
Thanks to the broker being independent from other components, we can
upgrade its hyper version without touching the other components, which
makes things easier.
## Problem
```
Warning: The file chosen for install of requests 2.32.0 (requests-2.32.0-py3-none-any.whl) is yanked. Reason for being yanked: Yanked due to conflicts with CVE-2024-35195 mitigation
```
## Summary of changes
- Update `requests` to fix the warning
- Update `psycopg2-binary`
Some parentheses in conditional expressions in walproposer are
redundant, while others are needed for clarity.
## Summary of changes
Change some parentheses to clarify the conditions in walproposer.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
We are seeing frequent pageserver startup timeouts while it calls
syncfs(). There is an existing fixture that syncs _after_ tests, but not
before the first one. We hypothesize that some failures are happening on
the first test in a job.
## Summary of changes
- extend the existing sync_after_each_test to be a sync between all
tests, including sync'ing before running the first test. That should
remove any ambiguity about whether the sync is happening on the correct
node.
This is an alternative to https://github.com/neondatabase/neon/pull/8957
-- I didn't realize until I saw Alexander's comment on that PR that we
have an existing hook that syncs filesystems and can be extended.
close https://github.com/neondatabase/neon/issues/9160
For whatever reason, pg17's WAL pattern seems different from others,
which triggers some flaky behavior within the compaction smoke test.
## Summary of changes
* Run L0 compaction before proceeding with the read benchmark, so that
we can ensure the number of L0 layers is 0 and test the compaction
behavior only with L1 layers.
We have a threshold for triggering L0 compaction. In some cases, the
test case did not produce enough L0 layers to do an L0 compaction,
therefore leaving the layer map with 3+ L0 layers above the L1 layers.
This increases the average read depth for the timeline.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We don't have an alert for long running reconciles. Stuck reconciles are
problematic
as we've seen in a recent incident.
## Summary of changes
Add a new metric `storage_controller_reconcile_long_running_total` with
labels: `{tenant_id, shard_number, seq}`.
The metric is removed after the long running reconcile finishes. These
events should be rare, so we won't break
the bank on cardinality.
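For illustration, a sketch of how such a per-reconcile series could be registered and later dropped, using the `prometheus` crate directly; the codebase's own metrics wrappers and exact label handling may differ:
```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// Hypothetical registration; label names follow the description above.
static RECONCILE_LONG_RUNNING: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "storage_controller_reconcile_long_running_total",
        "Reconciles that exceeded the long-running threshold",
        &["tenant_id", "shard_number", "seq"]
    )
    .expect("failed to register metric")
});

fn on_reconcile_became_long_running(tenant_id: &str, shard_number: &str, seq: &str) {
    RECONCILE_LONG_RUNNING
        .with_label_values(&[tenant_id, shard_number, seq])
        .inc();
}

// Dropping the series once the reconcile finishes keeps cardinality bounded.
fn on_long_running_reconcile_finished(tenant_id: &str, shard_number: &str, seq: &str) {
    let _ = RECONCILE_LONG_RUNNING.remove_label_values(&[tenant_id, shard_number, seq]);
}
```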
Related https://github.com/neondatabase/neon/issues/9150
## Problem
If a timeline was deleted right before waiting for LSNs to catch up
before the cut-over,
then we would wait forever.
## Summary of changes
Fix the issue and add a test for timeline deletions mid migration.
Related https://github.com/neondatabase/neon/issues/9144
## Problem
Recent change to avoid the "became visible" log messages from certain
tasks missed a task: the logical size calculation that happens as a
child of synthetic size calculation.
Related: https://github.com/neondatabase/neon/issues/9058
## Summary of changes
- Add OnDemandLogicalSize to the list of permitted tasks for reads
making a covered layer visible
- Tweak the log message to use layer name instead of key: this is more
terse, and easier to use when debugging, as one can search for it
elsewhere to see when the layer was written/downloaded etc.
```shell
$ cargo run -p proxy --bin proxy -- --auth-backend=web --webauth-confirmation-timeout=5s
```
```
$ psql -h localhost -p 4432
NOTICE: Welcome to Neon!
Authenticate by visiting within 5s:
http://localhost:3000/psql_session/e946900c8a9bc6e9
psql: error: connection to server at "localhost" (::1), port 4432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 4432 failed: ERROR: Disconnected due to inactivity after 5s.
```
In PG17, there is this newfangled custom wait events system. This commit
adds that feature to Neon, so that users can see what their backends may
be waiting for when a PostgreSQL backend is playing the waiting game in
Neon code.
check-rust-style fails because the tonic version is too old. This does not seem
to be an easy fix, so ignore it in the deny list.
Signed-off-by: Alex Chi Z <chi@neon.tech>
MaxBackends doesn't include auxiliary processes. Whenever an aux process
made IO operations that updated the counters, they would scribble over
shared memory beyond the end of the array. The relsize cache hash table
comes after the array, so the symptom was an error about hash table
corruption in the relsize cache hash.
When an endpoint is stopped in immediate mode and started again, there is a
chance of an old connection delivering some WAL to safekeepers after the
second start has checked the need for sync-safekeepers and thus grabbed the
basebackup LSN. That makes the basebackup unusable, so the compute panics. Avoid flakiness by
waiting for walreceivers on safekeepers to be gone in such cases. A
better way would be to bump term on safekeepers if sync-safekeepers is
skipped, but it needs more infrastructure.
ref https://github.com/neondatabase/neon/issues/9079
Found while searching for other issues in shared memory.
The bug should be benign, in that it over-allocates memory for this
struct, but doesn't allow for out-of-bounds writes.
Following #7656, `TenantConfOpt::TryFrom<toml_edit::Item>` appears to be
dead code. This patch removes `TenantConfOpt::TryFrom<toml_edit::Item>`.
The code does appear to be dead, since the TOML config is deserialized
into `TenantConfig` (via `LocationConfig`) and then converted into
`TenantConfOpt`.
This was verified by adding a panic to `try_from()` and running the
pageserver unit tests as well as a local end-to-end cluster (including
creating a new tenant and restarting the pageserver). This did not fail,
so this is not used on the common happy path at least. No explicit
`try_from` or `try_into` calls were found either.
Resolves #8918.
The aux v2 migration is near the end, and I rewrote the RFC based on what I
proposed (several months ago...) and what I actually implemented.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
These are the perf counters added in commit 263dfba6ee.
Note: This relies on 'neon' extension version 1.5. The default was
bumped to 1.5 in commit d696c41807.
---------
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
On bookworm, 'cmake' is new enough that we can just use it. On bullseye,
we can get a new-enough package from backports. By including 'cmake' in
the build-deps stage, we don't need to install it separately in all the
later build stages that need it.
See https://github.com/neondatabase/neon/pull/2699, where we switched to
downloading and building a specific version.
Microsoft exposes JWKs without the alg header. It's only included on the
tokens. Not a problem.
Also noticed that wrt the `typ` header:
> It will typically not be used by applications when it is already known
that the object is a JWT. This parameter is ignored by JWT
implementations; any processing of this parameter is performed by the
JWT application.
Since we know we are expecting JWTs only, I've followed the guidance and
removed the validation.
## Problem
We need the
[pg_session_jwt](https://github.com/neondatabase/pg_session_jwt/)
extension in the compute image. This PR adds it.
## Summary of changes
I added the `pg_session_jwt` extension in a very similar way to how the
pggraphql and pgtiktoken extensions were added (since they're all
written with pgrx). Then I tested this.
```
$ cd docker-compose/
$ PG_VERSION=16 TAG=10667533475 docker-compose up --build -d
$ psql postgresql://cloud_admin:cloud_admin@localhost:55433/postgres
cloud_admin@postgres=# create extension pg_session_jwt;
CREATE EXTENSION
Time: 43.048 ms
cloud_admin@postgres=# \df auth.*;
List of functions
┌────────┬──────────────────┬──────────────────┬─────────────────────┬──────┐
│ Schema │ Name │ Result data type │ Argument data types │ Type │
├────────┼──────────────────┼──────────────────┼─────────────────────┼──────┤
│ auth │ get │ jsonb │ s text │ func │
│ auth │ init │ void │ kid bigint, s jsonb │ func │
│ auth │ jwt_session_init │ void │ s text │ func │
│ auth │ user_id │ text │ │ func │
└────────┴──────────────────┴──────────────────┴─────────────────────┴──────┘
(4 rows)
cloud_admin@postgres=# select auth.init(cast('1' as bigint), to_jsonb(TEXT '{ "kty": "EC", "kid": "571683be-33cf-4e67-bccc-8905c0ebb862", "crv": "P-521", "alg": "ES512", "x": "AM_GsnQvKML2yXdn_OsN8PdgO1Sf9XMXih5vQMKLmJkp-Iz_FFWJUt6uyR_qp4brr8Ji2kjGJgN4cQJpg2kskH7V", "y": "AZg-salw24lCmsBP-BCBa5jT6INkTwLtCOC7o0BIxDVvmIEH1-PQAJVYVJPTFvPMi_PLa0QlOm-ufJYkynwa2Mau" }'));
ERROR: called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"{ \\\"kty\\\": \\\"EC\\\", \\\"kid\\\": \\\"571683be-33cf-4e67-bccc-8905c0ebb862\\\", \\\"crv\\\": \\\"P-521\\\", \\\"alg\\\": \\\"ES512\\\", \\\"x\\\": \\\"AM_GsnQvKML2yXdn_OsN8PdgO1Sf9XMXih5vQMKLmJkp-Iz_FFWJUt6uyR_qp4brr8Ji2kjGJgN4cQJpg2kskH7V\\\", \\\"y\\\": \\\"AZg-salw24lCmsBP-BCBa5jT6INkTwLtCOC7o0BIxDVvmIEH1-PQAJVYVJPTFvPMi_PLa0QlOm-ufJYkynwa2Mau\\\" }\", expected struct JwkEcKey", line: 0, column: 0)
Time: 6.991 ms
```
## Checklist before merging
- [ ] Move the download location to a proper URL
## Problem
`test_multi_attach` is sometimes failing with `invalid compute status
for configuration request: Configuration`. This is likely a result of
the test attempting to reconfigure the compute at the same time as the
storage controller is doing so.
This test was originally written before the storage controller existed,
and is not expecting anything else to be reconfiguring computes at the
same time.
## Summary of changes
- Configure the tenant into scheduling policy `Stop` in the storage
controller at the start of the test, so that it won't try to do anything
to the tenant while the test is running.
* tracing-utils now returns a `Layer` impl. Removes the need for crates
to
import OTel crates.
* Drop the /v1/traces URI check. Verified that the code does the right
thing.
* Leave a TODO to hook in an error handler for OTel to log errors to
when it
assumes the regular pipeline cannot be used/is broken.
## Problem
The live migration code waits forever for the compute notification hook,
on the basis that until it succeeds, the compute is probably using the
old location and we shouldn't detach it.
However, if a pageserver stops or restarts in the background, then this
original location might no longer be available, so there is no point
waiting. Waiting is also actively harmful, because it prevents other
reconciliations happening for the tenant shard, such as during an
upgrade where a stuck "drain" migration might prevent the later "fill"
migration from moving the shard back to its original location.
## Summary of changes
- Refactor the notification wait loop into a function
- Add a checks during the loop, for the origin node's cancellation token
and an explicit HTTP request to the origin node to confirm the shard is
still attached there.
Closes: https://github.com/neondatabase/neon/issues/8901
neon_cli.create_tenant() creates a new tenant *and* a timeline on the
tenant, with name "main". In most tests, there's no need to create
another timeline on the same tenant.
There are some more tests that do that, but in the remaining cases, I
wasn't 100% sure whether the presence of extra root timelines affects what the
tests test, so I left them alone.
## Problem
This test waits for a request to finish, and then expects deletion to
complete almost immediately. The request completes, but it's a 202, the
timeline is still deleting in the background: we need to be more
patient.
## Summary of changes
- Adjust iterations from 2 to 10 when waiting for deletion
## Problem
The Neon components built locally and by the GitHub workflow have
slightly different version prefixes (`git:` vs `git-env:`).
This prevents running tests against local builds correctly.
## Summary of changes
The regular expressions were changed to work with both
prefixes.
## Problem
Legacy functions that were called as `mgr::` and relied on the static
TENANTS, see #5796
## Summary of changes
- Move the last stray function (immediate_gc) into TenantManager
Closes: https://github.com/neondatabase/neon/issues/5796
Commit 263dfba6ee introduced neon extension version 1.5, which included
some new functions and views for metrics. It didn't bump the default
neon extension number yet, so that we could still safely roll back to
the old binary if necessary. This bumps the default version.
## Problem
`pgbench-pgvector` job from Nightly Benchmarks fails with the error:
```
/__w/_temp/f45bc2eb-4c4c-4f0a-8030-99079303fa65.sh: line 17: LD_LIBRARY_PATH: unbound variable
```
## Summary of changes
- Fix `LD_LIBRARY_PATH: unbound variable` error in benchmarks
## Problem
Nightly Benchmarks have been broken for some time due to various
reasons, this PR fixes it
## Summary of changes
- Pull `build-tools` image from dockerhub for `benchmarking` workflow
- Use `aws-actions/configure-aws-credentials` to upload/download
artifacts from S3
- Fix Postgres 16 installation (for pgbench)
Part of https://github.com/neondatabase/cloud/issues/13127. Resolves
#9153
What changed in this PR:
1. Adds `ComputeSpec.disk_quota_bytes: Option<u64>`
2. Adds new arg to compute_ctl: `--set-disk-quota-for-fs <mountpoint>`
3. Implements running `/neonvm/bin/set-disk-quota` with the right value
if both cmdline arg AND field in the spec are specified
4. Patches `/etc/sudoers.d` to allow `compute_ctl` to set quota with
sudo
This PR is very similar to the swap support added earlier, you can take
a look at it as prior art: #7434
In theory, it can be implemented outside of compute_ctl once we have a
separate neonvm daemon, but we are not there yet. The current
implementation is the simplest possible one to unblock computes with larger
disks.
All code related to usage of `/neonvm/bin/set-disk-quota` is located in
`disk_quota.rs`. We need to call this script with the following
arguments: `/neonvm/bin/set-disk-quota {size_kb} {mountpoint}`. Quotas
are set at the filesystem level, so we need to provide the path to the
directory where the filesystem is mounted.
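A sketch of that invocation (argument handling and error reporting here are illustrative; the real `disk_quota.rs` may differ):
```rust
use std::process::Command;

/// Sketch: invoke the helper as `sudo /neonvm/bin/set-disk-quota <size_kb> <mountpoint>`.
fn set_disk_quota(disk_quota_bytes: u64, mountpoint: &str) -> anyhow::Result<()> {
    let size_kb = disk_quota_bytes / 1024;
    let status = Command::new("sudo")
        .arg("/neonvm/bin/set-disk-quota")
        .arg(size_kb.to_string())
        .arg(mountpoint)
        .status()?;
    anyhow::ensure!(status.success(), "set-disk-quota exited with {status}");
    Ok(())
}
```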
I tested this change locally with
https://github.com/neondatabase/cloud/pull/17270. It should be safe to
merge, because this feature is gated by both cmdline arg and field in
the spec. If control-plane doesn't set values in both places,
compute_ctl won't be affected by this change.
close https://github.com/neondatabase/neon/issues/8140
The original issue is rather vague on what we should do. After
discussion w/ @problame we decided to narrow down the problems we want
to solve in that issue.
* read path -- do not panic for now.
* write path -- panic only on write errors (i.e., device error, fsync
error), but not on no-space for now.
The guideline is that if the pageserver behavior could lead to violation
of persistent constraints (i.e., return an operation as successful but
not actually persisting things), we should panic. Fsync is the place
where both of us agree that we should panic, because if fsync fails, the
kernel will mark dirty pages as clean, and the next fsync will not
necessarily fail again. This would make the storage client assume the
operation was successful.
## Summary of changes
Make fsync panic on fatal errors.
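As a minimal sketch of that policy (the real code classifies errors more carefully, e.g. treating out-of-space differently):
```rust
use std::fs::File;

/// Sketch: treat fsync failure as fatal, because after a failed fsync the
/// kernel may mark dirty pages clean and a retry can falsely succeed.
fn fsync_or_panic(file: &File, path: &str) {
    if let Err(e) = file.sync_all() {
        panic!("failed to fsync {path}: {e}");
    }
}
```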
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
To make it faster. On my laptop, it takes about 30 s before this commit.
In the arm64 debug variant in CI, it takes about 120 s. Reduce it by a
factor of 4.
Part of #8130
## Problem
After deploying https://github.com/neondatabase/infra/pull/1927, we
shipped `io_buffer_alignment=512` to all prod region. The
`AdjacentVectoredReadBuilder` code path is no longer taken and we are
running pageserver unit tests 6 times in the CI. Removing it would
reduce the test duration by 30-60s.
## Summary of changes
- Remove `AdjacentVectoredReadBuilder` code.
- Bump the minimum `io_buffer_alignment` requirement to at least 512
bytes.
- Use default `io_buffer_alignment` for Rust unit tests.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Part of #7497, closes #8817.
## Problem
See #8817.
## Summary of changes
**compute_ctl**
- Renew lsn lease as soon as `/configure` updates pageserver_connstr,
use `state_changed` Condvar for synchronization.
**pageserver**
As mentioned in
https://github.com/neondatabase/neon/issues/8817#issuecomment-2315768076,
we still want some permanent error reported if a lease cannot be
granted. By considering attachment mode and the added
`lsn_lease_deadline` when processing lease requests, we can also bound
the case of bad requests to a very short period after migration/restart.
- Refactor https://github.com/neondatabase/neon/pull/9024 and move
`lsn_lease_deadline` to `AttachedTenantConf` so timeline can easily
access it.
- Have separate HTTP `init_lsn_lease` and libpq `renew_lsn_lease` API.
- Always do LSN verification for the initial HTTP lease request.
- LSN verification for the renewal is **still done** when tenants are
not in `AttachedSingle` and we have passed the `lsn_lease_deadline`, which
gives plenty of time for the compute to renew the lease.
**neon_local**
- add and call `timeline_init_lsn_lease` mgmt_api at static endpoint
start. The initial lsn lease http request is sent when we run `cargo
neon endpoint start <static endpoint>`.
## Testing
- Extend `test_readonly_node_gc` to do pageserver restarts and
migration.
## Future Work
- The control plane should make the initial lease request through HTTP
when creating a static endpoint. This is currently only done in
`neon_local`.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Verbosity in this case is good when reading the code. Short options are
better when operating in an interactive shell.
Signed-off-by: Tristan Partin <tristan@neon.tech>
I hope this lets us capture backtraces in CI. At least it makes it
work on my laptop, which is valuable even if we need to do more for
CI.
See issue #2800.
## Problem
We get some unexpected errors, but don't know who they're happening for.
## Summary of changes
Add tenant id and peer address to PG connection error logs.
Related https://github.com/neondatabase/cloud/issues/17336
We separated client error from basebackup error log lines in
https://github.com/neondatabase/neon/pull/7523, but we didn't do
anything for the metrics. In this patch, we fixed it.
ref https://github.com/neondatabase/neon/issues/8970
## Summary of changes
We use the same criteria as in `log_query_error` producing an info line
(instead of error) for the metrics. We added a `client_error` category
for the basebackup query time metrics.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
- In https://github.com/neondatabase/neon/pull/8784, the validate
controller API is modified to check generations directly in the
database. It batches tenants into separate queries to avoid generating a
huge statement, but a single validate request can still cover many tenants.
- While updating this, I realized that "control_plane_client" is a kind
of confusing name for the client code now that it primarily talks to the
storage controller (the case of talking to the control plane will go
away in a few months).
## Summary of changes
- Big rename to "ControllerUpcallClient" -- this reflects the storage
controller's api naming, where the paths used by the pageserver are in
`/upcall/`
- When sending validate requests, break them up into chunks so that we
avoid possible edge cases of generating any HTTP requests that require
database I/O across many thousands of tenants.
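A sketch of the chunking idea, with a placeholder chunk size and request type:
```rust
/// Sketch: split a large validate request into fixed-size chunks so a single
/// upcall never asks the controller to touch thousands of tenants at once.
/// The chunk size and the request type below are placeholders.
const VALIDATE_CHUNK_SIZE: usize = 128;

struct ValidateRequestTenant {
    tenant_id: String,
    generation: u32,
}

fn into_chunked_requests(
    tenants: Vec<ValidateRequestTenant>,
) -> Vec<Vec<ValidateRequestTenant>> {
    let mut chunks = Vec::new();
    let mut iter = tenants.into_iter().peekable();
    while iter.peek().is_some() {
        // Each inner Vec becomes one HTTP request to the controller.
        chunks.push(iter.by_ref().take(VALIDATE_CHUNK_SIZE).collect());
    }
    chunks
}
```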
This PR mixes a functional change with a refactor, but the commits are
cleanly separated -- only the last commit is a functional change.
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
There was a tricky race condition in compute_ctl that sometimes makes the
configurator skip updates. It causes a deadlock because:
- control-plane cannot configure compute, because it's in
ConfigurationPending state
- compute_ctl doesn't do any reconfiguration because
`configurator_main_loop` missed notification for it
Full sequence that reproduces the issue:
1. `start_compute` finishes its work and changes the status
`self.set_status(ComputeStatus::Running);`
2. configurator received update about `Running` state and dropped the
mutex lock in the iteration
3. `/configure` request was triggered at the same time as step 1, and
got the mutex lock
4. same `/configure` request set the spec and updated the state to
`ConfigurationPending`, also sent a notification
5. next iteration in configurator got the mutex lock, but missed the
notification
There are more details in this slack thread:
https://neondb.slack.com/archives/C03438W3FLZ/p1727281028478689?thread_ts=1727261220.483799&cid=C03438W3FLZ
---------
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
## Problem
The latest storage release has generated artifacts for Postgres 17,
so we can enable compatibility tests for this version.
## Summary of changes
- Unskip `test_backward_compatibility` / `test_forward_compatibility` on
Postgres 17
We don't want to allow any new child timelines of archived timelines. If
you want any new child timelines, you should first un-archive the
timeline.
Part of #8088
The warning:
warning: first doc comment paragraph is too long
--> pageserver/src/tenant/checks.rs:7:1
|
7 | / /// Checks whether a layer map is valid (i.e., is a valid result
of the current compaction algorithm if no...
8 | | /// The function checks if we can split the LSN range of a delta
layer only at the LSNs of the delta layer...
9 | | ///
10 | | /// ```plain
| |_
|
= help: for further information visit
https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph
= note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default
help: add an empty line
|
7 ~ /// Checks whether a layer map is valid (i.e., is a valid result of
the current compaction algorithm if nothing goes wrong).
8 + ///
|
Fix by applying the suggestion.
A cherry-pick from the previous release (#9131)
## Problem
Login to prod ECR doesn't work anymore:
```
Retrieving registries data through *** SDK...
*** ECR detected with eu-central-1 region
Error: The security token included in the request is invalid.
```
## Summary of changes
- Fix login to prod ECR by using `aws-actions/configure-aws-credentials`
The pg_install/build directory contains .o files and such intermediate
results from the build, which are not needed in the final tarball.
Except for src/test/regress/regress.so and a few other .so files in that
directory; keep those.
This reduces the size of the neon-Linux-X64-release-artifact.tar.zst
artifact from about 1.5 GB to 700 MB.
(I attempted this a long time ago already, by moving the build/
directory out of pg_install altogether, see PR #2127. But I never got
around to finish that work.)
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
These commits are split off from
https://github.com/neondatabase/neon/pull/8971/commits where I was
fixing this to make a better scale test pass -- Vlad also independently
recognized these issues with cloudbench in
https://github.com/neondatabase/neon/issues/9062.
1. The storage controller proxies GET requests to pageservers based on
their intent, not the ground truth of where they're really attached.
2. Proxied requests can race with scheduling to tenants, resulting in
404 responses if the request hits the wrong pageserver.
Closes: https://github.com/neondatabase/neon/issues/9062
## Summary of changes
1. If a shard has a running reconciler, then use the database
generation_pageserver to decide who to proxy the request to
2. If such a request gets a 404 response and its scheduled node has
changed since the request was dispatched, retry it against the newly
scheduled node.
## Problem
Storage controller didn't previously consider AZ locality between
compute and pageservers
when scheduling nodes. Control plane has this feature, and, since we are
migrating tenants
away from it, we need feature parity to avoid perf degradations.
## Summary of changes
The change itself is fairly simple:
1. Thread az info into the scheduler
2. Add an extra member to the scheduling scores
Step (2) deserves some more discussion. Let's break it down by the shard
type being scheduled:
**Attached Shards**
We wish for attached shards of a tenant to end up in the preferred AZ of
the tenant since that
is where the compute is likely to be.
The AZ member for `NodeAttachmentSchedulingScore` has been placed
below the affinity score (so it's got the second biggest weight for
picking the node). The rationale for going
below the affinity score is to avoid having all shards of a single
tenant placed on the same node in 2 node
regions, since that would mean that one tenant can drive the general
workload of an entire pageserver.
I'm not 100% sure this is the right decision, so open to discussing
hoisting the AZ up to first place.
**Secondary Shards**
We wish for secondary shards of a tenant to be scheduled in a different
AZ from the preferred one
for HA purposes.
The AZ member for `NodeSecondarySchedulingScore` has been placed first,
so nodes in different AZs
from the preferred one will always be considered first. On small
clusters, this can mean that all the secondaries
of a tenant are scheduled to the same pageserver, but secondaries don't
use up as many resources as the
attached location, so IMO the argument made for attached shards doesn't
hold.
Related: https://github.com/neondatabase/neon/issues/8848
As @koivunej mentioned in the storage channel, for regress test, we
don't need to create a log file for the scrubber, and we should reduce
noisy logs.
## Summary of changes
* Disable log file creation for storage scrubber
* Only log at info level
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
1. Increase statement_timeout. It defaults to 120 s, which is not quite
enough on slow or busy systems with debug build. On my laptop, the index
creation takes about 100 s. On buildfarm, we've seen failures, e.g:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9084/10997888708/index.html#suites/821f97908a487f1d7d3a2a4dd1571e99/db1834bddfe8c5b9/
2. Keep twiddling the LFC size through the whole test. Before, we would
do it for the first 10 seconds, but that only covers a small part of the
pgbench initialization phase. Change the loop so that the pgbench run
time determines how long the test runs, and we keep changing the LFC for
the whole time.
In passing, also fix a bogus test description, copy-pasted from a
completely unrelated test.
## Problem
Compilation of neon extension on macOS produces a warning
```
pgxn/neon/neon_perf_counters.c:50:1: error: non-void function does not return a value [-Werror,-Wreturn-type]
```
## Summary of changes
- Change the return type of `NeonPerfCountersShmemInit` to void
In the `imitate_synthetic_size_calculation_worker` function, we might
obtain the `Cancelled` error variant instead of hitting the cancellation
token based path. Therefore, catch `Cancelled` and handle it analogously
to the cancellation case.
Fixes#8886.
Part of #8130.
## Problem
Currently, decompression is performed within the `read_blobs`
implementation and the decompressed blob will be appended to the end of
the `BytesMut` buffer. We will lose this flexibility of extending the
buffer when we switch to using our own dio-aligned buffer (WIP in
https://github.com/neondatabase/neon/pull/8730). To facilitate the
adoption of aligned buffer, we need to refactor the code to perform
decompression outside `read_blobs`.
## Summary of changes
- `VectoredBlobReader::read_blobs` will return `VectoredBlob` without
performing decompression and appending decompressed blob. It becomes the
caller's responsibility to decompress the buffer.
- Added a new `BufView` type that functions as `Cow<Bytes, &[u8]>`.
- Perform decompression within `VectoredBlob::read` so that people don't
have to explicitly think about compression when using the reader
interface.
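Conceptually, `BufView` behaves like the sketch below (method names and variants are illustrative, not the exact type):
```rust
use bytes::Bytes;

/// Sketch of a Cow-like view over either an owned buffer (e.g. a freshly
/// decompressed blob) or a borrowed slice of the underlying read buffer.
enum BufView<'a> {
    Owned(Bytes),
    Borrowed(&'a [u8]),
}

impl<'a> BufView<'a> {
    /// Uniform read access regardless of whether the data was decompressed
    /// into a new buffer or sliced straight out of the read buffer.
    fn as_slice(&self) -> &[u8] {
        match self {
            BufView::Owned(b) => b.as_ref(),
            BufView::Borrowed(s) => s,
        }
    }
}
```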
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Even with the 100 ms interval, on my laptop the pageserver always
becomes available on second attempt, so this saves about 900 ms at
every test startup.
## Problem
All the other patches were moved to the compute directory, and only one
was left in the patches subdirectory in the root directory.
## Summary of changes
The patch was moved to the compute directory as others
The PostgreSQL 17 vendor module is now based on postgres/postgres @
d7ec59a63d745ba74fba0e280bbf85dc6d1caa3e, presumably the final code
change before the V17 tag.
## Problem
Scheduling on tenant creation uses different heuristics compared to the
scheduling done during
background optimizations. This results in scenarios where shards are
created and then immediately
migrated by the optimizer.
## Summary of changes
1. Make scheduler aware of the type of the shard it is scheduling
(attached vs secondary).
We wish to have different heuristics.
2. For attached shards, include the attached shard count from the
context in the node score
calculation. This brings initial shard scheduling in line with what the
optimization passes do.
3. Add a test for (2).
This looks like a bigger change than required, but the refactoring
serves as the basis for az-aware
shard scheduling where we also need to make the distinction between
attached and secondary shards.
Closes https://github.com/neondatabase/neon/issues/8969
## Problem
We need to be able to run the regression tests against a cloud-based
Neon staging instance to prepare the migration to the arm architecture.
## Summary of changes
Some tests were modified to work on the cloud instance (i.e. added
passwords, server-side copy changed to client-side, etc)
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Part of #8128, fixes #8872.
## Problem
See #8872.
## Summary of changes
- Retry `list_timeline_blobs` another time if
- there are layer file keys listed but not index.
- failed to download index.
- Instrument code with `analyze-tenant` and `analyze-timeline` span.
- Remove `initdb_archive` check, it could have been deleted.
- Return with exit code 1 on fatal error if `--exit-code` parameter is set.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Instead of adding them to the VM image late in the build process, when
putting together the final VM image, include them in the earlier
compute image already. That makes it more convenient to edit the
files, and to test them.
Seems nice to keep all these together. This also provides a nice place
for a README file to describe the compute image build process. For
now, it briefly describes the contents of the directory, but can be
expanded.
We can't FlushOneBuffer when we're in redo-only mode on PageServer, so
make execution of that function conditional on us not running in
pageserver walredo mode.
## Problem
An LFC cache entry is a chunk (right now the chunk size is 1 MB). LFC
statistics show the number of chunks, but not the number of used pages, and
the autoscaling team wants to know how sparse the LFC is:
https://neondb.slack.com/archives/C04DGM6SMTM/p1726782793595969
It is possible to obtain this from the view with `select count(*) from
local_cache`, but that is an expensive operation, enumerating all entries in
the LFC under lock.
## Summary of changes
This PR added "file_cache_used_pages" to `neon_lfc_stats` view:
```
select * from neon_lfc_stats;
lfc_key | lfc_value
-----------------------+-----------
file_cache_misses | 3139029
file_cache_hits | 4098394
file_cache_used | 1024
file_cache_writes | 3173728
file_cache_size | 1024
file_cache_used_pages | 25689
(6 rows)
```
Note that this PR doesn't change the neon extension API, so there is no need
to create a new version of the Neon extension.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
The metrics include a histogram of how long we need to wait for a
GetPage request, number of reconnects, and number of requests among
other things.
The metrics are not yet exported anywhere, but you can query them
manually.
Note: This does *not* bump the default version of the 'neon' extension. We
will do that later, as a separate PR. The reason is that this allows us to roll back
the compute image smoothly, if necessary. Once the image that includes the
new extension .so file with the new functions has been rolled out, and we're
confident that we don't need to roll back the image anymore, we can change
default extension version and actually start using the new functions and views.
This is what the view looks like:
```
postgres=# select * from neon_perf_counters ;
metric | bucket_le | value
---------------------------------------+-----------+----------
getpage_wait_seconds_count | | 300
getpage_wait_seconds_sum | | 0.048506
getpage_wait_seconds_bucket | 2e-05 | 0
getpage_wait_seconds_bucket | 3e-05 | 0
getpage_wait_seconds_bucket | 6e-05 | 71
getpage_wait_seconds_bucket | 0.0001 | 124
getpage_wait_seconds_bucket | 0.0002 | 248
getpage_wait_seconds_bucket | 0.0003 | 279
getpage_wait_seconds_bucket | 0.0006 | 297
getpage_wait_seconds_bucket | 0.001 | 298
getpage_wait_seconds_bucket | 0.002 | 298
getpage_wait_seconds_bucket | 0.003 | 298
getpage_wait_seconds_bucket | 0.006 | 300
getpage_wait_seconds_bucket | 0.01 | 300
getpage_wait_seconds_bucket | 0.02 | 300
getpage_wait_seconds_bucket | 0.03 | 300
getpage_wait_seconds_bucket | 0.06 | 300
getpage_wait_seconds_bucket | 0.1 | 300
getpage_wait_seconds_bucket | 0.2 | 300
getpage_wait_seconds_bucket | 0.3 | 300
getpage_wait_seconds_bucket | 0.6 | 300
getpage_wait_seconds_bucket | 1 | 300
getpage_wait_seconds_bucket | 2 | 300
getpage_wait_seconds_bucket | 3 | 300
getpage_wait_seconds_bucket | 6 | 300
getpage_wait_seconds_bucket | 10 | 300
getpage_wait_seconds_bucket | 20 | 300
getpage_wait_seconds_bucket | 30 | 300
getpage_wait_seconds_bucket | 60 | 300
getpage_wait_seconds_bucket | 100 | 300
getpage_wait_seconds_bucket | Infinity | 300
getpage_prefetch_requests_total | | 69
getpage_sync_requests_total | | 231
getpage_prefetch_misses_total | | 0
getpage_prefetch_discards_total | | 0
pageserver_requests_sent_total | | 323
pageserver_requests_disconnects_total | | 0
pageserver_send_flushes_total | | 323
file_cache_hits_total | | 0
(39 rows)
```
Part of https://github.com/neondatabase/neon/issues/8002
Close https://github.com/neondatabase/neon/issues/8920
Legacy compaction (as well as gc-compaction) relies on the GC process to
remove unused layer files, but this depends on many factors (e.g., key
partitioning) to ensure data in a dropped table can eventually be removed.
In gc-compaction, we consider the keyspace information when doing the
compaction process. If a key is not in the keyspace, we will skip that
key and not include it in the final output.
However, this is not easy to implement because gc-compaction considers
branch points (i.e., retain_lsns) and the retained keyspaces could
change across different LSNs. Therefore, for now, we only remove aux v1
keys in the compaction process.
## Summary of changes
* Add `FilterIterator` to filter out keys.
* Integrate `FilterIterator` with gc-compaction.
* Add `collect_gc_compaction_keyspace` for a spec of keyspaces that can
be retained during the gc-compaction process.
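A simplified sketch of what such a filtering iterator looks like, with placeholder key and keyspace types:
```rust
use std::ops::Range;

/// Sketch: wrap a key/value iterator and drop entries whose key falls outside
/// the retained keyspace. The real iterator works on the pageserver's own key
/// and value types; u64 keys and byte values are simplifications.
struct FilterIterator<I> {
    inner: I,
    retained: Vec<Range<u64>>,
}

impl<I: Iterator<Item = (u64, Vec<u8>)>> Iterator for FilterIterator<I> {
    type Item = (u64, Vec<u8>);

    fn next(&mut self) -> Option<Self::Item> {
        let retained = &self.retained;
        // Skip keys outside the retained keyspace; they never reach the output.
        self.inner
            .by_ref()
            .find(|(key, _)| retained.iter().any(|r| r.contains(key)))
    }
}
```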
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Not used in production, but in benchmarks, to demonstrate minimal RTT.
(It would be nice to not have to copy the 8KiB of zeroes, but that
would require larger protocol changes).
Found this useful in investigation
https://github.com/neondatabase/neon/pull/8952.
## Problem
Previously, the storage controller may send compute notifications
containing stale pageservers (i.e. pageserver serving the shard was
detached). This happened because detaches did not update the compute
hook state.
## Summary of Changes
Update compute hook state on shard detach.
Fixes#8928
These were not referenced in any of the other Cargo.toml files in the
workspace. They were not being built because of that, so there was
little harm in having them listed, but let's be tidy.
cargo update ciborium iana-time-zone lazy_static schannel uuid
cargo update hyper@0.14
cargo update --precise 2.9.7 ureq
It might be worthwhile to just update all our dependencies at some point,
but this is aimed at pruning the dependency tree, to make the build a
little faster. That's also why I didn't update ureq to the latest
version: that would've added a dependency to yet another version of
rustls.
We frequently mess up our submodule references. This adds one safeguard:
it checks that the submodule references are only updated "forwards", not
to some older commit, or a commit that's not a descendant of the previous
one.
As next step, I'm thinking that we should automate things so that when
you merge a PR to the 'neon' repository that updates the submodule
references, the REL_*_STABLE_neon branches are automatically updated to
match the submodule references. That way, you never need to manually
merge PRs in the postgres repository, it's all triggered from commits in
the 'neon' repository. But that's not included here.
Moves the per-timeline code to load timeline metadata into a new
dedicated function called `load_timeline_metadata`. The old
`load_timeline_metadata` becomes `load_timelines_metadata`.
Split out of #8907
Part of #8088
If the primary is started at an LSN within the first page of a 16 MB WAL
segment, the "long XLOG page header" at the beginning of the segment was
not initialized correctly. That has gone unnoticed, because under
normal circumstances, nothing looks at the page header. The WAL that is
streamed to the safekeepers starts at the new record's LSN, not at the
beginning of the page, so that bogus page header didn't propagate
elsewhere, and a primary server doesn't normally read the WAL it has
written. Which is good, because the contents of the page would be bogus
anyway, as it wouldn't contain any of the records before the LSN where
the new record is written.
Except that in the following cases a primary does read its own WAL:
1. When there are two-phase transactions in prepared state at
checkpoint. The checkpointer reads the two-phase state from the
XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.
2. Logical decoding reads the WAL starting from the replication slot's
restart LSN.
This PR fixes the problem with two-phase transactions. For that, it's
sufficient to initialize the page header correctly. The checkpointer
only needs to read XLOG_XACT_PREPARE records that were generated after
the server startup, so it's still OK that older WAL is missing / bogus.
I have not investigated if we have a problem with logical decoding,
however. Let's deal with that separately.
Special thanks to @Lzjing-1997, who independently found the same bug
and opened a PR to fix it, although I did not use that PR.
## Problem
When layer visibility was added, an info log was included for the
situation where actual access to a layer disagrees with the visibility
calculation. This situation is safe, but I was interested in seeing when
it happens.
The log is pretty high volume, so this PR refines it to fire less often.
## Summary of changes
- For cases where accessing non-visible layers is normal, don't log at
all.
- Extend a unit test to increase confidence that the updates to
visibility on access are working as expected
- During compaction, only call the visibility calculation routine if
some image layers were created: previously, frequent calls resulted in
the visibility of layers getting reset every time we passed through
create_image_layers.
## Problem
Seems that PS might be too eager in reporting throttled tasks
## Summary of changes
Introduce a sleep counter. If the sleep counter increases, then the
acquire task was throttled.
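A sketch of the detection logic (names are placeholders; the real throttle code is more involved):
```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch: count actual sleeps instead of reporting every acquire as throttled.
static THROTTLE_SLEEPS: AtomicU64 = AtomicU64::new(0);

fn acquire_was_throttled<F: FnOnce()>(acquire: F) -> bool {
    let before = THROTTLE_SLEEPS.load(Ordering::Relaxed);
    // Assumption for this sketch: the rate limiter bumps THROTTLE_SLEEPS
    // only when it really puts the task to sleep.
    acquire();
    THROTTLE_SLEEPS.load(Ordering::Relaxed) > before
}
```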
close https://github.com/neondatabase/neon/issues/8903
In https://github.com/neondatabase/neon/issues/8903 we observed JSON
decoding error to have the following error message in the log:
```
Error processing HTTP request: Resource temporarily unavailable: 3956 (pageserver-6.ap-southeast-1.aws.neon.tech) error receiving body: error decoding response body
```
This is hard to understand. In this patch, we make the error message
more reasonable.
## Summary of changes
* A receive-body error is now an internal server error; we pass through the
`reqwest::Error` (only the decoding error) as `anyhow::Error`.
* Instead of formatting the error using `to_string`, we use the
alternate `anyhow::Error` formatting, so that it prints out the cause
of the error (i.e., what exactly serde cannot decode).
I would expect to see something like `error receiving body: error
decoding response body: XXX field not found` after this patch, though I
didn't set up a testing environment to observe the exact behavior.
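For reference, `anyhow`'s alternate formatting is what prints the full cause chain; a small sketch:
```rust
// Sketch: "{:#}" (anyhow's alternate Display) prints the whole error chain,
// e.g. "error receiving body: error decoding response body: <serde cause>",
// whereas "{}" stops at the outermost message.
fn log_request_error(err: &anyhow::Error) {
    eprintln!("Error processing HTTP request: {err:#}");
}
```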
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
It's pretty expensive to run, and there is very little difference
between debug and release builds that could lead to different clippy
warnings.
This is extracted from PR #8912. That PR wandered off into various
improvements we could make, but we seem to have consensus on this part
at least.
## Problem
Safekeeper's OpenAPI spec is incorrect:
```
Semantic error at paths./v1/tenant/{tenant_id}/timeline/{timeline_id}.get.responses.404.content.application/json.schema.$ref
$refs must reference a valid location in the document
Jump to line 126
```
Checked on https://editor.swagger.io
## Summary of changes
- Add `NotFoundError`
- Add `description` and `license` fields to make Cloud OpenAPI spec
linter happy
We have 3 places where we implement layer map checks.
## Summary of changes
Now we have a single check function being called in all places.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Part of #7497, closes https://github.com/neondatabase/neon/issues/8890.
## Problem
Since leases are in-memory objects, we need to take special care of them
after pageserver restarts and while doing a live migration. The approach
we took for pageserver restart is to wait for at least lease duration
before doing first GC. We want to do the same for live migration. Since
we do not do any GC when a tenant is in `AttachedStale` or
`AttachedMulti` mode, only the transition from `AttachedMulti` to
`AttachedSingle` requires this treatment.
## Summary of changes
- Added `lsn_lease_deadline` field in `GcBlock::reasons`: the tenant is
temporarily blocked from GC until we reach the deadline. This
information does not persist to S3.
- In `GCBlock::start`, skip the GC iteration if we are blocked by the
lsn lease deadline.
- In `TenantManager::upsert_location`, set the lsn_lease_deadline to
`Instant::now() + lsn_lease_length` so the granted leases have a chance
to be renewed before we run GC for the first time after transitioned
from AttachedMulti to AttachedSingle.
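A sketch of the deadline check described above, with illustrative field and type names:
```rust
use std::time::Instant;

/// Sketch: GC is skipped while the in-memory lease deadline lies in the
/// future. This state is not persisted to S3.
struct GcBlockState {
    lsn_lease_deadline: Option<Instant>,
}

impl GcBlockState {
    fn gc_blocked_by_lsn_lease_deadline(&self) -> bool {
        matches!(self.lsn_lease_deadline, Some(deadline) if Instant::now() < deadline)
    }
}
```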
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
misc changes split out from #8855
- **allow cloning the request context in a read-only fashion for
background tasks**
- **propagate endpoint and request context through the jwk cache**
- **only allow password based auth for md5 during testing**
- **remove auth info from conn info**
https://github.com/neondatabase/neon/pull/9028 changed the image layer
creation log into trace level. However, I personally find logging image
layer creation useful when reading the logs -- it makes it clear that
the image layer creation is happening and gives a clear idea of the
progress. Therefore, I propose to continue logging them for
the `create_image_layers` set of functions.
## Summary of changes
* Add info logging for all image layers created in legacy compaction.
* Add info logging for all layers creation in testing functions.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Different keyspaces may require different floor LSNs in vectored
delta layer visits. This patch adds support for such cases.
## Summary of changes
Different keyspaces wishing to read the same layer might
require different stop lsns (or lsn floor). The start LSN
of the read (or the lsn ceil) will always be the same.
With this observation, we fix skipping of image layers by
indexing the fringe by layer id plus lsn floor.
This is very simple, but means that we can visit delta layers twice
in certain cases. Still, I think it's very unlikely for any extra
merging to have taken place in this case, so perhaps it makes sense to go
with the simpler patch.
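A simplified sketch of a fringe keyed by (layer id, lsn floor), with placeholder types:
```rust
use std::collections::HashMap;
use std::ops::Range;

type LayerId = u64;
type Lsn = u64;

/// Sketch: the fringe of layers still to visit, keyed by (layer id, lsn floor)
/// instead of layer id alone, so the same layer can be queued once per
/// distinct stop LSN required by different keyspaces.
struct Fringe {
    entries: HashMap<(LayerId, Lsn), Vec<Range<u64>>>, // keyspaces to read per entry
}

impl Fringe {
    fn push(&mut self, layer: LayerId, lsn_floor: Lsn, keyspace: Range<u64>) {
        self.entries
            .entry((layer, lsn_floor))
            .or_default()
            .push(keyspace);
    }
}
```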
Fixes https://github.com/neondatabase/neon/issues/9012
Alternative to https://github.com/neondatabase/neon/pull/9025
This constant in 'tenant_conf_defaults' was unused, but there's
another constant with the same name in the global 'defaults'. I wish
the setting was configurable per-tenant, but it isn't, so let's remove
the confusing duplicate.
The DEFAULT_CONCURRENT_TENANT_SIZE_LOGICAL_SIZE_QUERIES constant was
unused, because we had just hardcoded it to 1 where the constant
should've been used.
Remove the ConfigurableSemaphore::Default implementation, since it was
unused.
There's some more code that still checks for uninit and delete
markers, see callers of is_delete_mark and is_uninit_mark, and github
issue #5718. But these functions were outright dead.
Dead code is generally useless, but with Postgres constants in
particular, I'm also worried that if they're not used anywhere, we
might fail to update them at a Postgres version update, and get very
confused later when they have wrong values.
When I checked the log in Grafana I couldn't find the scrubber version.
Then I realized that it should be logged after the logger gets
initialized.
## Summary of changes
Log after initializing the logger for the scrubber.
Signed-off-by: Alex Chi Z <chi@neon.tech>
(Found this useful during investigation
https://github.com/neondatabase/cloud/issues/16886.)
Problem
-------
Before this PR, `neon_local` sequentially does the following:
1. launch storcon process
2. wait for storcon to signal readiness
[here](75310fe441/control_plane/src/storage_controller.rs (L804-L808))
3. start pageserver
4. wait for pageserver to become ready
[here](c43e664ff5/control_plane/src/pageserver.rs (L343-L346))
5. etc
The problem is that storcon's readiness waits for the
[`startup_reconcile`](cbcd4058ed/storage_controller/src/service.rs (L520-L523))
to complete.
But pageservers aren't started at this point.
So, worst case we wait for `STARTUP_RECONCILE_TIMEOUT/2`, i.e., 15s.
This is more than the 10s default timeout allowed by neon_local.
So, the result is that `neon_local start` fails to start storcon and
stops everything.
Solution
--------
In this PR I chose the radical solution of starting everything in
parallel.
It junks up the output because we do stuff like `print!(".")` to
indicate progress.
We should just abandon that.
And switch to `utils::logging` + `tracing` with separate spans for each
component.
I can do that in this PR or we leave it as a follow-up.
Alternatives Considered
-----------------------
The Pageserver's `/v1/status` or in fact any endpoint of the mgmt API
will not `accept()` on the mgmt API socket until after the `re-attach`
call to storcon returned success.
So, it's insufficient to change the startup order to start Pageservers
first.
We cannot easily change Pageserver startup order because
`init_tenant_mgr` must complete before we start serving the mgmt API.
Otherwise tenant detach calls et al can race with `init_tenant_mgr`.
We'd have to add a "loading" state to tenant mgr and make all API
endpoints except `/v1/status` wait for _that_ to complete.
Related
-------
- https://github.com/neondatabase/neon/pull/6475
## Problem
It turns out the previous approach (with `skip_if` input) doesn't work
(from https://github.com/neondatabase/neon/pull/9017).
Revert it and use more straightforward if-conditions.
## Summary of changes
- Revert efbe8db7f1
- Add an if-condition to the `promote-compatibility-data` job and relevant
comments
Commit ca5390a89d made a similar change to DeltaLayerWriter.
We bumped into this with Stas with our hackathon project, to create a
standalone program to create image layers directly from a Postgres data
directory. It needs to create image layers without having a Timeline and
other pageserver machinery.
This downgrades the "created image layer {}" message from INFO to TRACE
level. TRACE is used for the corresponding message on delta layer
creation too. The path logged in the message is now the temporary path,
before the file is renamed to its final name. Again commit ca5390a89d
made the same change for the message on delta layer creation.
There's currently no way to just start/stop broker from `neon_local`.
This PR
* adds a sub-command
* uses that sub-command from the test suite instead of the pre-existing
Python `subprocess` based approach.
Found this useful during investigation
https://github.com/neondatabase/cloud/issues/16886.
## Problem
We do use `actions/checkout` with `fetch-depth: 0` when it's not
required
## Summary of changes
- Remove unneeded `fetch-depth: 0`
- Add a comment if `fetch-depth: 0` is required
We added another migration in 5876c441ab,
but didn't bump this value. This had no effect, but best to fix it
anyway.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We've got 2 non-blocking failures on the release pipeline:
- `promote-compatibility-data` job got skipped _presumably_ because one
of the dependencies of `deploy` job (`push-to-acr-dev`) got skipped
(https://github.com/neondatabase/neon/pull/8940)
- `coverage-report` job fails because we don't build debug artifacts in
the release branch (https://github.com/neondatabase/neon/pull/8561)
## Summary of changes
- Always run `push-to-acr-dev` / `push-to-acr-prod` jobs, but add
`skip_if` parameter to the reusable workflow, which can skip the job
internally, without skipping externally
- Do not run `coverage-report` on release branches
## Problem
It turns out that we can't rely on external orchestration to promptly
route traffic to the new leader. This is downtime inducing.
Forwarding provides a safe way out.
## Safety
We forward when:
1. Request is not one of ["/control/v1/step_down", "/status", "/ready",
"/metrics"]
2. Current instance is in [`LeadershipStatus::SteppedDown`] state
3. There is a leader in the database to forward to
4. Leader from step (3) is not the current instance
If a storcon instance is persisted in the database, then we know that it
is the current leader.
There's one exception: the time between handling the step-down request and
the new leader updating the database.
Let's treat the happy case first. The stepped down node does not produce
any side effects,
since all request handling happens on the leader.
As for the edge case, we are guaranteed to always have a maximum of two
running instances.
Hence, if we are in the edge case scenario the leader persisted in the
database is the
stepped down instance that received the request. Condition (4) above
covers this scenario.
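A sketch of the forwarding predicate implied by conditions (1)-(4) above; types and constants are illustrative:
```rust
/// Paths that are always handled locally, never forwarded.
const NON_FORWARDED_PATHS: &[&str] =
    &["/control/v1/step_down", "/status", "/ready", "/metrics"];

fn should_forward(
    path: &str,
    stepped_down: bool,
    leader_in_db: Option<&str>,
    own_address: &str,
) -> bool {
    // (1) not an exempt endpoint, (2) we have stepped down,
    // (3) a leader is recorded in the database, (4) and it is not us.
    !NON_FORWARDED_PATHS.contains(&path)
        && stepped_down
        && leader_in_db.is_some()
        && leader_in_db != Some(own_address)
}
```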
## Summary of changes
* Conversion utilities for reqwest <-> hyper. I'm not happy with these,
but I don't see a better way. Open to suggestions.
* Add request forwarding logic
* Update each request handler. Again, not happy with this. If anyone
knows a nice way to wrap the handlers, lmk. Joonas and I tried :/
* Update each handler to maybe forward
* Tweak tests to showcase new behaviour
pg_distrib_dir doesn't include the Postgres version and only depends
on env variables which cannot change during a test run, so it can be
marked as session-scoped. Similarly, the platform cannot change during
a test run.
There was another copy of it in utils.py. The only difference is that
the version in utils.py tolerates files that are concurrently
removed. That seems fine for the few callers in neon_fixtures.py too.
This should generally be faster when running tests, especially those
that run with higher scales.
Ignoring test_lfc_resize since it seems like we are hitting a query
timeout for some reason that I have yet to investigate. A little bit of
improvement is better than none.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
`gather-rust-build-stats` extra CI job fails with
```
"PQ_LIB_DIR" doesn't exist in the configured path: "/__w/neon/neon/pg_install/v16/lib"
```
## Summary of changes
- Use the path to Postgres 17 for the `gather-rust-build-stats` job.
The job uses Postgres built by `make walproposer-lib`
Most extensions are not required to run Neon-based PostgreSQL, but the
Neon extension is _quite_ critical, so let's make sure we include it.
## Problem
Staging doesn't have working compute images for PG17
## Summary of changes
Disable some PG17 filters so that we get the critical components into the PG17 image
This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04
07b828e9d4
NOTICE: The data produced by the included version of the PostgreSQL fork
may not be compatible with the future full release of PostgreSQL 17 due to
expected or unexpected future changes in magic numbers and internals.
DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0
RELEASE!
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Makes it consistent with the "timeline create" and "timeline import"
commands, which allowed you to pass the timeline id as argument. This
also makes it unnecessary to parse the timeline ID from the output in
the python function that calls it.
The tenant ID was not actually generated here but in NeonEnvBuilder.
And the "neon_local init" command hasn't been able to generate the
initial tenant since 8712e1899e anyway.
## Problem
Having run in production for a while, we see that nodes are generally
safely oversubscribed by about a factor of 2.
## Summary of changes
Tweak the is_overloaded method to check for utilization over 200%
rather than over 100%
## Problem
In https://github.com/neondatabase/neon/pull/8621, validation of keys
during ingest was removed because the places where we actually store
keys are now past the point where we have already converted them to
CompactKey (i128) representation.
## Summary of changes
Reinstate validation at an earlier stage in ingest. This doesn't cover
literally every place we write a key, but it covers most cases where
we're trusting postgres to give us a valid key (i.e. one that doesn't
try and use a custom spacenode).
The current code assumes that most of this functionality is
version-independent, which is only true up to v16 - PostgreSQL 17 has a
new field in CheckPoint that we need to keep track of.
This basically removes the file-level dependency on v14, and replaces it
with switches that load the correct version dependencies where required.
PR #7782 set the dependency in Cargo.toml to 'master', and locked the
version to commit that contained a specific fix, because we needed the
fix before it was included in a versioned release. The fix was later
included in parquet crate version 52.0.0, so we can now switch back to
using a released version. The latest release is 53.0.0, switch straight
to that.
---------
Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
close https://github.com/neondatabase/neon/issues/8838
## Summary of changes
This patch modifies the split delta layer writer to avoid taking
start_key and end_key when creating/finishing the layer writer. The
start_key for the delta layers will be the first key provided to the
layer writer, and the end_key would be the `last_key.next()`. This
simplifies the delta layer writer API.
With that, the layer key hack is removed. Image layers now use the full
key range, and delta layers use the first/last key provided by the user.
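A simplified sketch of how the key range can be derived from the keys actually written, using placeholder types:
```rust
/// Sketch: derive the delta layer's key range from what was actually written,
/// rather than asking the caller for it up front. u64 keys and `+ 1` stand in
/// for the real Key type and its `next()` method.
struct SplitDeltaLayerWriter {
    first_key: Option<u64>,
    last_key: Option<u64>,
}

impl SplitDeltaLayerWriter {
    fn put(&mut self, key: u64) {
        self.first_key.get_or_insert(key);
        self.last_key = Some(key);
    }

    /// The finished layer covers [first_key, last_key.next()).
    fn key_range(&self) -> Option<std::ops::Range<u64>> {
        Some(self.first_key?..self.last_key?.checked_add(1)?)
    }
}
```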
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
I wish it worked, but it's been broken for a long time, so let's admit
defeat and remove it.
The idea of sharing the same pageserver and safekeeper environment
between tests is still sound, and it could save a lot of time in our
CI. We should perhaps put some time into doing that, but we're better
off starting from scratch than trying to make TEST_SHARED_FIXTURES
work in its current form.
I wanted to use some features from the newer version. The PR that needed
the new version is not ready yet (and might never be), but it seems nice to
stay up to date in any case.
We modified the crate in an incompatible way and upgraded to the new
version in PR #8076. However, it was reverted in #8654. The revert
reverted the Cargo.lock reference to it, but since Cargo.toml still
points to the (tip of the) 'neon' branch, every time you make any other
unrelated changes to Cargo.toml, it also tries to update the
rust-postgres crates to the tip of the 'neon' branch again, which
doesn't work.
To fix, lock the crates to the exact commit SHA that works.
Currently using gc blocking and unblocking with storage controller
managed pageservers is painful. Implement the API on storage controller.
Fixes: #8893
For control-plane managed tenants, we have the page in the admin console
that lists all tenants on a specific pageserver. But for
storage-controller managed ones, we don't have that functionality for
now.
## Summary of changes
Adds an API that lists all shards on a given node (intention + observed)
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
One of the PRs opened by a `neondatabase` org member got labelled as
`external` because the `gh api` call failed in the wrong way:
```
Get "https://api.github.com/orgs/neondatabase/members/<username>": dial tcp 140.82.114.5:443: i/o timeout
is-member=false
```
## Summary of changes
- Check that the error message is expected before labelling PRs
- Retry the `gh api` call up to 10 times in case of unexpected error messages
- Add `workflow_dispatch` trigger
When walproposer observes a higher term it now restarts instead of
crashing the whole compute with PANIC; this avoids a compute crash after a
term_bump call. After a successful election we still check the
last_log_term of the highest given vote to ensure the basebackup is good,
and PANIC otherwise.
It will be used for migration per
035-safekeeper-dynamic-membership-change.md
and
https://github.com/neondatabase/docs/pull/21
ref https://github.com/neondatabase/neon/issues/8700
Check that truncation point is not from the future by comparing it with
write_record_lsn, not write_lsn, and explain that xlog switch changes
their normal order.
ref https://github.com/neondatabase/neon/issues/8911
Addresses the 1.82 beta clippy lint `too_long_first_doc_paragraph` by
adding a newline after the first sentence if it is short enough, and by
writing a shorter first sentence where needed.
## Problem
We want to do AZ aware scheduling, but don't have enough metadata.
## Summary of changes
Introduce a `preferred_az_id` concept for each managed tenant shard.
In a future PR, the scheduler will use this as a soft preference.
The idea is to try and keep the shard attachments within the same AZ.
Under the assumption that the compute was placed in the correct AZ,
this reduces the chances of cross-AZ traffic between compute and PS.
In terms of code changes we:
1. Add a new nullable `preferred_az_id` column to the `tenant_shards`
table. Also include an in-memory counterpart.
2. Populate the preferred az on tenant creation and shard splits.
3. Add an endpoint which allows bulk-setting preferred AZs.
(3) gives us the migration path. I'll write a script which queries the
cplane db in the region and sets the preferred az of all shards with an
active compute to the AZ of said compute. For shards without an active compute,
I'll use the AZ of the currently attached pageserver
since this is what cplane uses now to schedule computes.
## Problem
https://github.com/neondatabase/neon/pull/8852 introduced a new nullable
column for the `nodes` table: `availability_zone_id`
## Summary of changes
* Make neon local and the test suite always provide an az id
* Make the az id field in the ps registration request mandatory
* Migrate the column to non-nullable and adjust in memory state
accordingly
* Remove the code that was used to populate the az id for pre-existing
nodes
## Problem
Building on macOS failed due to a missing m4. Although a window popped up
claiming to install m4, this did not help.
## Summary of changes
Add instructions to install m4 using brew and link it (thanks to Folke
for helping).
Sometimes, the benchmarks fail to start up pageserver in 10s without any
obvious reason. Benchmarks run sequentially on otherwise idle runners.
Try running `sync(2)` after each bench to force a cleaner slate.
Implement this via:
- SYNC_AFTER_EACH_TEST environment variable enabled autouse fixture
- autouse fixture seems to be outermost fixture, so it works as expected
- set SYNC_AFTER_EACH_TEST=true for benchmarks in build_and_test
workflow
Evidence:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/10678984691/index.html#suites/5008d72a1ba3c0d618a030a938fc035c/1210266507534c0f/
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
This PR simplifies the pageserver configuration parsing as follows:
* introduce the `pageserver_api::config::ConfigToml` type
* implement `Default` for `ConfigToml`
* use serde derive to do the brain-dead leg-work of processing the toml
document
* use `serde(default)` to fill in default values
* in `pageserver` crate:
* use `toml_edit` to deserialize the pageserver.toml string into a
`ConfigToml`
* `PageServerConfig::parse_and_validate` then
* consumes the `ConfigToml`
* destructures it exhaustively into its constituent fields
* constructs the `PageServerConfig`
The rules are:
* in `ConfigToml`, use `deny_unknown_fields` everywhere
* static default values go in `pageserver_api`
* if there cannot be a static default value (e.g. which default IO
engine to use, because it depends on the runtime), make the field in
`ConfigToml` an `Option`
* if runtime-augmentation of a value is needed, do that in
`parse_and_validate`
* a good example is `virtual_file_io_engine` or `l0_flush`, both of
which need to execute code to determine the effective value in
`PageServerConf`
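A minimal sketch of these rules, assuming serde derive and hypothetical field names (not the actual pageserver options):
```
use serde::Deserialize;

#[derive(Deserialize)]
#[serde(default, deny_unknown_fields)]
struct ConfigToml {
    // Static default comes from the Default impl via `serde(default)`.
    max_file_descriptors: usize,
    // No static default possible (depends on the runtime), so keep it
    // optional and resolve it in parse_and_validate.
    virtual_file_io_engine: Option<String>,
}

impl Default for ConfigToml {
    fn default() -> Self {
        Self { max_file_descriptors: 100, virtual_file_io_engine: None }
    }
}

struct PageServerConf {
    max_file_descriptors: usize,
    virtual_file_io_engine: String,
}

fn parse_and_validate(toml: ConfigToml) -> PageServerConf {
    // Exhaustive destructuring: removing a field from ConfigToml becomes
    // a compile-time error here.
    let ConfigToml { max_file_descriptors, virtual_file_io_engine } = toml;
    PageServerConf {
        max_file_descriptors,
        // Runtime-augmented value, resolved here rather than statically.
        virtual_file_io_engine: virtual_file_io_engine
            .unwrap_or_else(|| "std-fs".to_string()),
    }
}
```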
The benefits:
* massive amount of brain-dead repetitive code can be deleted
* "unused variable" compile-time errors when removing a config value,
due to the exhaustive destructuring in `parse_and_validate`
* compile-time errors guide you when adding a new config field
Drawbacks:
* serde derive is sometimes a bit too magical
* `deny_unknown_fields` is easy to miss
Future Work / Benefits:
* make `neon_local` use `pageserver_api` to construct `ConfigToml` and
write it to `pageserver.toml`
* This provides more type safety / compile-time errors than the current
approach.
### Refs
Fixes #3682
### Future Work
* `remote_storage` deser doesn't reject unknown fields
https://github.com/neondatabase/neon/issues/8915
* clean up `libs/pageserver_api/src/config.rs` further
* break up into multiple files, at least for tenant config
* move `models` as appropriate / refine distinction between config and
API models / be explicit about when it's the same
* use `pub(crate)` visibility on `mod defaults` to detect stale values
## Problem
A tenant may ingest a lot of data between being drained for node restart
and being moved back
in the fill phase. This is expensive and causes the fill to stall.
## Summary of changes
We make a tactical change to reduce secondary warm-up time for
migrations in fills.
Part of https://github.com/neondatabase/neon/issues/8623
## Summary of changes
It seems that we have tenants with aux policy set to v1 but don't have
any aux files in the storage. It is still safe to force migrate them
without notifying the customers. This patch adds more details to the
warning to identify the cases where we have to reach out to the users
before retiring aux v1.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
There was a confusion on the REL_14_STABLE_neon branch. PR
https://github.com/neondatabase/postgres/pull/471 was merged to the
branch, but the corresponding PRs on the other REL_15_STABLE_neon and
REL_16_STABLE_neon branches were not merged. Also, the submodule
reference in the neon repository was never updated, so even though the
REL_14_STABLE_neon branch contained the commit, it was never used.
That PR https://github.com/neondatabase/postgres/pull/471 was a few
bricks shy of a load (no tests, some differences between the different
branches), so to get us to a good state, revert that change from the
REL_14_STABLE_neon branch. This PR in the neon repository updates the
submodule reference past two commits on the REL_14_STABLE_neon branch:
first the commit from PR
https://github.com/neondatabase/postgres/pull/471, and immediately after
that the revert of the same commit. This brings us back to square one,
but now the submodule reference matches the tip of the
REL_14_STABLE_neon branch again.
## Problem
The initial implementation of the validate API treats the in-memory
generations as authoritative.
- This is true when only one storage controller is running, but if a
rogue controller was running that hadn't been shut down properly, and
some pageserver requests were routed to that bad controller, it could
incorrectly return valid=true for stale generations.
- The generation in the main in-memory map gets out of date while a live
migration is in flight, and if the origin location for the migration
tries to do some deletions even though it is in AttachedStale (for
example because it had already started compaction), these might be
wrongly validated + executed.
## Summary of changes
- Continue to do the in-memory check: if this returns valid=false it is
sufficient to reject requests.
- When valid=true, do an additional read from the database to confirm
the generation is fresh.
- Revise behavior for validation on missing shards: this used to always
return valid=true as a convenience for deletions and shard splits, so
that pageservers weren't prevented from completing any enqueued
deletions for these shards after they're gone. However, this becomes
unsafe when we consider split brain scenarios. We could reinstate this
in future if we wanted to store some tombstones for deleted shards.
- Update test_scrubber_physical_gc to cope with the behavioral change:
they must now explicitly flush the deletion queue before splits, to
avoid tripping up on deletions that are enqueued at the time of the
split (these tests assert "scrubber deletes nothing", which check fails
if the split leaves behind some remote objects that are legitimately
GC'able)
- Add `test_storage_controller_validate_during_migration`, which uses
failpoints to create a situation where incorrect generation validation
during a live migration could result in a corruption
The rate of validate calls for tenants is pretty low: it happens as a
consequence of deletions from GC and compaction, which are both
concurrency-limited on the pageserver side.
Commit cfa45ff5ee (PR #8860) updated the vendor/postgres submodules, but
didn't use the same commit SHAs that were pushed as the corresponding
REL_*_STABLE_neon branches in the postgres repository. The contents were
the same, but the REL_*_STABLE_neon branches pointed to squashed
versions of the commits, whereas the SHAs used in the submodules
referred to the pre-squash revisions.
Note: The vendor/postgres-v14 submodule still doesn't match with the tip
of REL_14_STABLE_neon branch, because there has been one more commit on
that branch since then. That's another confusion which we should fix,
but let's do that separately. This commit doesn't change the code that
gets built in any way, only changes the submodule references to point to
the correct SHAs in the REL_*_STABLE_neon branch histories, rather than
some detached commits.
We currently do not record safekeepers in the storage controller
database. We want to migrate timelines across safekeepers eventually, so
start recording the safekeepers on deploy.
Cc: #8698
## Problem
Each test might wait for up to 5s in order to heartbeat the pageserver.
## Summary of changes
Make the heartbeat interval configurable and use a really tight one for
neon local => startup quicker
ref https://github.com/neondatabase/neon/issues/8872
## Summary of changes
We saw a stuck storage scrubber in staging caused by infinite retries. I
believe here we should use `min` instead of `max` to avoid getting
minutes or hours of retry backoff.
Signed-off-by: Alex Chi Z <chi@neon.tech>
Set the field to optional; otherwise there will be decode errors when a
newer version of the storage controller receives the JSON from an older
version of the pageserver.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Currently, DatadirModification keeps a key-indexed map of all pending
writes, even though we (almost) never need to read back dirty pages for
anything other than metadata pages (e.g. relation sizes).
Related: https://github.com/neondatabase/neon/issues/6345
## Summary of changes
- commit() modifications before ingesting database creation wal records,
so that they are guaranteed to be able to get() everything they need
directly from the underlying Timeline.
- Split dirty pages in DatadirModification into pending_metadata_pages
and pending_data_pages. The data ones don't need to be in a
key-addressable format, so they just go in a Vec instead.
- Special case handling of zero-page writes in DatadirModification,
putting them in a map which is flushed on the end of a WAL record. This
handles the case where during ingest, we might first write a zero page,
and then ingest a postgres write to that page. We used to do this via
the key-indexed map of writes, but in this PR we change the data page
write path to not bother indexing these by key.
My least favorite thing about this PR is that I needed to change the
DatadirModification interface to add the on_record_end call. This is not
very invasive because there's really only one place we use it, but it
changes the object's behaviour from being clearly an aggregation of many
records to having some per-record state. I could avoid this by
implicitly doing the work when someone calls set_lsn or commit -- I'm
open to opinions on whether that's cleaner or dirtier.
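A rough sketch of the resulting shape (types heavily simplified; `Key`, `Lsn`, and `Value` stand in for the real pageserver types):
```
use std::collections::HashMap;

type Key = u128;
type Lsn = u64;
type Value = Vec<u8>;

struct DatadirModification {
    // Metadata pages (e.g. relation sizes) may be read back within the
    // same modification, so they stay key-addressable.
    pending_metadata_pages: HashMap<Key, Vec<(Lsn, Value)>>,
    // Data pages are write-only from ingest's point of view: no per-key
    // indexing, just an append-only list.
    pending_data_pages: Vec<(Key, Lsn, Value)>,
    // Zero pages buffered for the duration of the current WAL record.
    pending_zero_pages: HashMap<Key, Lsn>,
}

impl DatadirModification {
    fn put_zero_page(&mut self, key: Key, lsn: Lsn) {
        self.pending_zero_pages.insert(key, lsn);
    }

    fn put_data_page(&mut self, key: Key, lsn: Lsn, value: Value) {
        // A real write within the same record supersedes a buffered zero page.
        self.pending_zero_pages.remove(&key);
        self.pending_data_pages.push((key, lsn, value));
    }

    fn on_record_end(&mut self) {
        // Any zero pages that were not overwritten get flushed as data pages.
        for (key, lsn) in self.pending_zero_pages.drain() {
            self.pending_data_pages.push((key, lsn, vec![0u8; 8192]));
        }
    }
}
```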
## Performance
There may be some efficiency improvement here, but the primary
motivation is to enable an earlier stage of ingest to operate without
access to a Timeline. The `pending_data_pages` part is the "fast path"
bulk write data that can in principle be generated without a Timeline,
in parallel with other ingest batches, and ultimately on the safekeeper.
`test_bulk_insert` on AX102 shows approximately the same results as in
the previous PR #8591:
```
------------------------------ Benchmark results -------------------------------
test_bulk_insert[neon-release-pg16].insert: 23.577 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 637 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 18.264 s
test_bulk_insert[neon-release-pg16].compaction: 0.052 s
```
wal_storage.rs already checks this, but since this is quite a legitimate
scenario, check it in safekeeper.rs (at the consensus level) as well.
ref https://github.com/neondatabase/neon/issues/8212
This is a take 2; previous PR #8640 had been reverted because interplay
with another change broke test_last_log_term_switch.
Endpoint implementation sends msg to manager requesting to do the
reset. Manager stops current partial backup upload task if it exists and
performs the reset.
Also slightly tweak eviction condition: all full segments before
flush_lsn must be uploaded (and committed) and there must be only one
segment left on disk (partial). This allows evicting timelines which did
not start on the first segment and didn't fill the whole segment (the
previous condition wasn't good because last_removed_segno was
ref https://github.com/neondatabase/neon/issues/8759
## Problem
Neon local set-up does not inject an az id in `metadata.json`. See real
change in https://github.com/neondatabase/neon/pull/8852.
## Summary of changes
We piggyback on the existing `availability_zone` pageserver
configuration in order to avoid making neon local even more complex.
## Problem
Metrics event idempotency keys differ across S3 and Vector. The events
should be identical.
Resolves #8605.
## Summary of changes
Pre-generate the idempotency keys and pass the same set into both
metrics sinks.
Co-authored-by: John Spray <john@neon.tech>
Implement the timeline specific `archival_config` endpoint also in the
storage controller.
It's mostly a copy-paste of the detach handler: the task is the same: do
the same operation on all shards.
Part of #8088.
## Problem
This is a followup to #8783
- The old blocking ensure_attached function had been retained to handle
the case where a shard had a None generation_pageserver, but this wasn't
really necessary.
- There was a subtle `.1` in the code where a struct would have been
clearer
Closes #8819
## Summary of changes
- Add ShardGenerationState to represent the results of peek_generation
- Instead of calling ensure_attached when a tenant has a non-attached
shard, check the shard's policy and return 409 if it isn't Attached,
else return 503 if the shard's policy is attached but it hasn't been
reconciled yet (i.e. has a None generation_pageserver)
The test is very rudimentary, it only checks that before and after
tenant deletion, we can run `scan_metadata` for the safekeeper node
kind. Also, we don't actually expect any uploaded data, for that we
don't have enough WAL (needs to create at least one S3-uploaded file,
the scrubber doesn't recognize partial files yet).
The `scan_metadata` scrubber subcommand is extended to support either
specifying a database connection string (previously the only way, which
required a database to be present), or specifying the timeline
information manually via JSON. The latter is ideal for testing scenarios
because the number of timelines there is usually limited, and spinning up
a database just to write the timeline information is involved.
The pull request https://github.com/neondatabase/neon/pull/8679
explicitly mentioned that it would evict layers earlier than before.
Given that the eviction metric is based solely on the eviction threshold
(which is 86400s now), we should account for early eviction and not
fire an alert if the layer is covered.
## Summary of changes
Record eviction timer only when the layer is visible + accessed.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Summary of changes
- Setting default io_buffer_alignment to 512 bytes.
- Fix places that assumed `DEFAULT_IO_BUFFER_ALIGNMENT=0`
- Adapt unit tests to handle merge with `chunk size <= 4096`.
## Testing and Performance
We have done sufficient performance de-risking.
Enabling it by default completes our correctness de-risking before the
next release.
Context: https://neondb.slack.com/archives/C07BZ38E6SD/p1725026845455259
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
When implementing bottom-most gc-compaction, we analyzed the structure
of layer maps that the current compaction algorithm could produce, and
decided to only support structures without delta layer overlaps and LSN
intersections with the exception of single key layers.
## Summary of changes
This patch adds the layer map valid check in the storage scrubber.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
`promote-compatibility-data` job got broken and slightly outdated after
- https://github.com/neondatabase/neon/pull/8552 -- we don't upload
artifacts for ARM64
- https://github.com/neondatabase/neon/pull/8561 -- we don't prepare
`debug` artifacts in the release branch anymore
## Summary of changes
- Promote artifacts from release PRs to the latest version (but do it
from `release` branch)
- Upload artifacts for both X64 and ARM64
## Problem
Live migration retries when it fails to notify the compute of the new
location. It should sleep between attempts.
Closes: https://github.com/neondatabase/neon/issues/8820
## Summary of changes
- Do an `exponential_backoff` in the retry loop for compute
notifications
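The controller already has an `exponential_backoff` helper; as a generic illustration of the pattern (constants and names here are made up, not the actual code):
```
use std::time::Duration;

// Sketch: retry a compute notification with exponentially growing sleeps.
async fn notify_with_backoff<F, Fut, E>(mut attempt: F, max_retries: u32) -> Result<(), E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<(), E>>,
{
    let mut retry = 0u32;
    loop {
        match attempt().await {
            Ok(()) => return Ok(()),
            Err(e) if retry >= max_retries => return Err(e),
            Err(_) => {
                // 100ms, 200ms, 400ms, ... capped at 10s.
                let millis = (100u64 * 2u64.saturating_pow(retry)).min(10_000);
                tokio::time::sleep(Duration::from_millis(millis)).await;
                retry += 1;
            }
        }
    }
}
```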
`safekeeper::random_test test_random_schedules` debug test takes over 2
minutes to run on our arm runners. Running it 6 times with pageserver
settings seems redundant.
In #8863 I replaced the threadpool with tokio tasks, but there was a
behaviour I missed regarding cancellation. Adding the JoinHandle wrapper
that triggers abort on drop should fix this.
Another change: any panics that occur in password hashing will be
propagated through the resume_unwind functionality.
Removes additional async_trait usages from safekeeper and neon_local.
Also removes now redundant dependencies of the `async_trait` crate.
cc earlier work: #6305, #6464, #7303, #7342, #7212, #8296
It's better to reject invalid keys on the write path than to store them
and panic the pageserver.
https://github.com/neondatabase/neon/issues/8636
## Summary of changes
If a key cannot be represented using i128, we don't allow writing that
key into the pageserver.
There are two versions of the validity check: the normal one that simply
rejects keys that cannot be represented as i128, and the stronger one
that rejects all keys that we don't support.
The current behavior when a key gets rejected is that the safekeeper will
keep retrying to stream that key to the pageserver. And once such a key
gets written, no new computes can be started. Therefore, there could be
a large number of pageserver warnings if a key cannot be ingested. To
validate this behavior, the reviewer can (1) use the stronger version of
the validity check and (2) run the following SQL.
```
set neon.regress_test_mode = true;
CREATE TABLESPACE regress_tblspace LOCATION '/Users/skyzh/Work/neon-test/tablespace';
CREATE SCHEMA testschema;
CREATE TABLE testschema.foo (i int) TABLESPACE regress_tblspace;
insert into testschema.foo values (1), (2), (3);
```
For now, I'd like to merge the patch with only the rejection of keys that
are not representable as i128. It's still unknown whether the stronger
version covers all the cases that basebackup doesn't support. Furthermore,
the behavior of rejecting a key will produce large amounts of warnings due
to safekeeper retries. Therefore, I'd like to reject only the minimum set
of keys that we don't support for now. (Erroring out is still better than
panicking in `to_compact_key`.)
The next step is to fix the safekeeper behavior (i.e., on such key
rejections, stop streaming WAL), so that we can properly stop writing.
An alternative solution is to simply drop these keys on the write path.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Some tests were very slow and some tests occasionally stalled. This PR
improves some test performance and replaces the custom threadpool in
order to fix the stalling of tests.
refs https://github.com/neondatabase/neon/issues/7524
Problem
-------
When browsing Pageserver logs, background loop iterations that take a
long time are hard to spot / easy to miss because they tend to not
produce any log messages unless:
- they overrun their period, but that's only one message when the
iteration completes late
- they do something that produces logs (e.g., create image layers)
Further, a slow iteration that is still running will not
log nor bump the metrics of `warn_when_period_overrun` until _after_
it has finished. Again, that makes a still-running iteration hard to
spot.
Solution
--------
This PR adds a wrapper around the per-tenant background loops
that, while a slow iteration is ongoing, emits a log message
every $period.
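A minimal sketch of the wrapper idea, assuming tokio and tracing (the real code integrates with the existing task and metrics plumbing):
```
use std::time::Duration;

async fn run_with_progress_logging<F>(task_name: &str, period: Duration, iteration: F)
where
    F: std::future::Future<Output = ()>,
{
    tokio::pin!(iteration);
    let mut ticker = tokio::time::interval(period);
    ticker.tick().await; // the first tick completes immediately
    loop {
        tokio::select! {
            _ = &mut iteration => return,
            _ = ticker.tick() => {
                tracing::warn!("{task_name}: iteration still ongoing after another {period:?}");
            }
        }
    }
}
```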
If a timeline unarchival request comes in, give an error if the parent
timeline is archived. This prevents the situation of having an
archived timeline with children that are not archived.
Follow up of #8824
Part of #8088
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
# Motivation
In https://github.com/neondatabase/neon/pull/8832 I get tokio runtime
worker stack overflow errors in debug builds.
In a similar vein, I had tokio runtime worker stack overflow when
trying to eliminate `async_trait`
(https://github.com/neondatabase/neon/pull/8296).
The 2MiB default is kind of arbitrary - so this PR bumps it to 4MiB.
It also adds an env var to control it.
# Risk Assessment
With our 4 runtimes, the worst case stack memory usage is `4 (runtimes)
* ($num_cpus (executor threads) + 512 (blocking pool threads)) * 4MiB`.
On i3en.3xlarge, that's `8384 MiB`.
On im4gn.2xlarge, that's `8320 MiB`.
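(Assuming $num_cpus is 12 on i3en.3xlarge and 8 on im4gn.2xlarge: 4 × (12 + 512) × 4 MiB = 8384 MiB, and 4 × (8 + 512) × 4 MiB = 8320 MiB.)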
Before this change, it was half that.
Looking at production metrics, we _do_ have the headroom to accommodate
this worst case.
# Alternatives
The problems only occur with debug builds, so technically we could only
raise the stack size for debug builds.
However, it would be another configuration where `debug != release`.
# Future Work
If we ever enable single runtime mode in prod (=>
https://github.com/neondatabase/neon/issues/7312 ) then the worst case
will drop to 25% of its current value.
Eliminating the use of `tokio::spawn_blocking` / `tokio::fs` in favor of
`tokio-epoll-uring` (=> https://github.com/neondatabase/neon/issues/7370
) would reduce the worst case to `4 (runtimes) * $num_cpus (executor
threads) * 4 MiB`.
In proxy I switched to a leaky-bucket impl using the GCRA algorithm. I
figured I could share the code with pageserver and remove the
leaky_bucket crate dependency with some very basic tokio timers and
queues for fairness.
The underlying algorithm should be fairly clear how it works from the
comments I have left in the code.
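For reference, a minimal single-threaded sketch of GCRA (the real implementation adds tokio timers and queues for fairness; constants and names here are illustrative):
```
use std::time::{Duration, Instant};

struct Gcra {
    /// Time between two admitted requests at the sustained rate.
    emission_interval: Duration,
    /// How far ahead of "now" the theoretical arrival time may run (burst).
    tolerance: Duration,
    /// Theoretical arrival time of the next request.
    tat: Instant,
}

impl Gcra {
    fn new(rate_per_sec: u32, burst: u32) -> Self {
        let emission_interval = Duration::from_secs(1) / rate_per_sec;
        Self {
            emission_interval,
            tolerance: emission_interval * burst,
            tat: Instant::now(),
        }
    }

    /// Admit the request now, or return how long to wait before retrying.
    fn check(&mut self, now: Instant) -> Result<(), Duration> {
        let tat = self.tat.max(now);
        if tat.saturating_duration_since(now) <= self.tolerance {
            self.tat = tat + self.emission_interval;
            Ok(())
        } else {
            Err(tat.saturating_duration_since(now) - self.tolerance)
        }
    }
}
```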
---
In benchmarking pageserver, @problame found that the new implementation
fixes a getpage throughput discontinuity in pageserver under the
`pagebench get-page-latest-lsn` benchmark with the clickbench dataset
(`test_perf_olap.py`).
The discontinuity is that for any of `--num-clients={2,3,4}`, getpage
throughput remains 10k.
With `--num-clients=5` and greater, getpage throughput then jumps to the
configured 20k rate limit.
With the changes in this PR, the discontinuity is gone, and we scale
throughput linearly to `--num-clients` until the configured rate limit.
More context in
https://github.com/neondatabase/cloud/issues/16886#issuecomment-2315257641.
closes https://github.com/neondatabase/cloud/issues/16886
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
refs https://github.com/neondatabase/cloud/issues/13750
The logging in this commit will make it easier to detect lagging ingest.
We're trusting compute timestamps --- ideally we'd use SK timestamps
instead.
But trusting the compute timestamp is ok for now.
## Problem
See #8620
## Summary of changes
Remove WAL-logging of the replorigin file because it is reconstructed by PS
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Part of #7497
## Problem
Static computes pinned at some fixed LSN could initially be created within
the PITR interval but eventually fall out of it. To make sure that static
computes are not affected by GC, we need to start using the LSN lease
API (introduced in #8084) in compute_ctl.
## Summary of changes
**compute_ctl**
- Spawn a thread for when a static compute starts to periodically ping
pageserver(s) to make LSN lease requests.
- Add `test_readonly_node_gc` to test if static compute can read all
pages without error.
- (test will fail on main without the code change here)
**page_service**
- `wait_or_get_last_lsn` will now allow `request_lsn` less than
`latest_gc_cutoff_lsn` to proceed if there is a lease on `request_lsn`.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
Part of [Epic: Bypass PageCache for user data
blocks](https://github.com/neondatabase/neon/issues/7386).
# Problem
`InMemoryLayer` still uses the `PageCache` for all data stored in the
`VirtualFile` that underlies the `EphemeralFile`.
# Background
Before this PR, `EphemeralFile` is a fancy (and code-bloated) buffered
writer around a `VirtualFile` that supports `blob_io`.
The `InMemoryLayerInner::index` stores offsets into the `EphemeralFile`.
At those offset, we find a varint length followed by the serialized
`Value`.
Vectored reads (`get_values_reconstruct_data`) are not in fact vectored
- each `Value` that needs to be read is read sequentially.
The `will_init` bit of information which we use to early-exit the
`get_values_reconstruct_data` for a given key is stored in the
serialized `Value`, meaning we have to read & deserialize the `Value`
from the `EphemeralFile`.
The L0 flushing **also** needs to re-determine the `will_init` bit of
information, by deserializing each value during L0 flush.
# Changes
1. Store the value length and `will_init` information in the
`InMemoryLayer::index`. The `EphemeralFile` thus only needs to store the
values.
2. For `get_values_reconstruct_data`:
- Use the in-memory `index` to figure out which values need to be read.
Having the `will_init` stored in the index enables us to do that.
- View the EphemeralFile as a byte array of "DIO chunks", each 512 bytes
in size (adjustable constant). A "DIO chunk" is the minimal unit that we
can read under direct IO.
- Figure out which chunks need to be read to retrieve the serialized
bytes for the values we need to read (see the sketch after this list).
- Coalesce chunk reads such that each DIO chunk is only read once to
serve all value reads that need data from that chunk.
- Merge adjacent chunk reads into larger
`EphemeralFile::read_exact_at_eof_ok` of up to 128k (adjustable
constant).
3. The new `EphemeralFile::read_exact_at_eof_ok` fills the IO buffer
from the underlying VirtualFile and/or its in-memory buffer.
4. The L0 flush code is changed to use the `index` directly, `blob_io`
5. We can remove the `ephemeral_file::page_caching` construct now.
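A simplified sketch of the chunk-based read planning described in (2), assuming 512-byte chunks and a merge limit expressed in chunks (constants and types are not the real pageserver ones):
```
const DIO_CHUNK_SIZE: u64 = 512;

/// Given the byte ranges of the serialized values we need, return the
/// deduplicated, sorted set of chunk indexes that must be read.
fn chunks_to_read(value_ranges: &[(u64, u64)]) -> Vec<u64> {
    let mut chunks: Vec<u64> = value_ranges
        .iter()
        .flat_map(|&(start, end)| {
            let first = start / DIO_CHUNK_SIZE;
            let last = (end.max(start + 1) - 1) / DIO_CHUNK_SIZE;
            first..=last
        })
        .collect();
    chunks.sort_unstable();
    chunks.dedup();
    chunks
}

/// Merge runs of adjacent chunks into larger reads of at most `max_chunks`
/// chunks each (e.g. 128 KiB / 512 B = 256 chunks).
fn merge_adjacent(chunks: &[u64], max_chunks: u64) -> Vec<(u64, u64)> {
    let mut reads: Vec<(u64, u64)> = Vec::new();
    for &c in chunks {
        match reads.last_mut() {
            Some((start, end)) if c == *end && *end - *start < max_chunks => *end = c + 1,
            _ => reads.push((c, c + 1)),
        }
    }
    reads
}
```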
The `get_values_reconstruct_data` changes seem like a bit of an overkill,
but they are necessary so that we issue an equivalent number of read
system calls compared to before this PR, where it was highly likely that
even if the first PageCache access was a miss, the remaining reads within
the same `get_values_reconstruct_data` call from the same `EphemeralFile`
page were a hit.
The "DIO chunk" stuff is truly unnecessary for page cache bypass, but,
since we're working on [direct
IO](https://github.com/neondatabase/neon/issues/8130) and
https://github.com/neondatabase/neon/issues/8719 specifically, we need
to do _something_ like this anyways in the near future.
# Alternative Design
The original plan was to use the `vectored_blob_io` code, but it relies on
the invariant of delta & image layers that `index order == values order`.
Further, `vectored_blob_io` code's strategy for merging IOs is limited
to adjacent reads. However, with direct IO, there is another level of
merging that should be done, specifically, if multiple reads map to the
same "DIO chunk" (=alignment-requirement-sized and -aligned region of
the file), then it's "free" to read the chunk into an IO buffer and
serve the two reads from that buffer.
=> https://github.com/neondatabase/neon/issues/8719
# Testing / Performance
Correctness of the IO merging code is ensured by unit tests.
Additionally, minimal tests are added for the `EphemeralFile`
implementation and the bit-packed `InMemoryLayerIndexValue`.
Performance testing results are presented below.
All perf testing was done on my M2 MacBook Pro, running a Linux VM.
It's a release build without `--features testing`.
We see definitive improvement in ingest performance microbenchmark and
an ad-hoc microbenchmark for getpage against InMemoryLayer.
```
baseline: commit 7c74112b2a origin/main
HEAD: ef1c55c52e
```
<details>
```
cargo bench --bench bench_ingest -- 'ingest 128MB/100b seq, no delta'
baseline
ingest-small-values/ingest 128MB/100b seq, no delta
time: [483.50 ms 498.73 ms 522.53 ms]
thrpt: [244.96 MiB/s 256.65 MiB/s 264.73 MiB/s]
HEAD
ingest-small-values/ingest 128MB/100b seq, no delta
time: [479.22 ms 482.92 ms 487.35 ms]
thrpt: [262.64 MiB/s 265.06 MiB/s 267.10 MiB/s]
```
</details>
We don't have a micro-benchmark for InMemoryLayer and it's quite
cumbersome to add one. So, I did manual testing in `neon_local`.
<details>
```
./target/release/neon_local stop
rm -rf .neon
./target/release/neon_local init
./target/release/neon_local start
./target/release/neon_local tenant create --set-default
./target/release/neon_local endpoint create foo
./target/release/neon_local endpoint start foo
psql 'postgresql://cloud_admin@127.0.0.1:55432/postgres'
psql (13.16 (Debian 13.16-0+deb11u1), server 15.7)
CREATE TABLE wal_test (
id SERIAL PRIMARY KEY,
data TEXT
);
DO $$
DECLARE
i INTEGER := 1;
BEGIN
WHILE i <= 500000 LOOP
INSERT INTO wal_test (data) VALUES ('data');
i := i + 1;
END LOOP;
END $$;
-- => result is one L0 from initdb and one 137M-sized ephemeral-2
DO $$
DECLARE
i INTEGER := 1;
random_id INTEGER;
random_record wal_test%ROWTYPE;
start_time TIMESTAMP := clock_timestamp();
selects_completed INTEGER := 0;
min_id INTEGER := 1; -- Minimum ID value
max_id INTEGER := 100000; -- Maximum ID value, based on your insert range
iters INTEGER := 100000000; -- Number of iterations to run
BEGIN
WHILE i <= iters LOOP
-- Generate a random ID within the known range
random_id := min_id + floor(random() * (max_id - min_id + 1))::int;
-- Select the row with the generated random ID
SELECT * INTO random_record
FROM wal_test
WHERE id = random_id;
-- Increment the select counter
selects_completed := selects_completed + 1;
-- Check if a second has passed
IF EXTRACT(EPOCH FROM clock_timestamp() - start_time) >= 1 THEN
-- Print the number of selects completed in the last second
RAISE NOTICE 'Selects completed in last second: %', selects_completed;
-- Reset counters for the next second
selects_completed := 0;
start_time := clock_timestamp();
END IF;
-- Increment the loop counter
i := i + 1;
END LOOP;
END $$;
./target/release/neon_local stop
baseline: commit 7c74112b2a origin/main
NOTICE: Selects completed in last second: 1864
NOTICE: Selects completed in last second: 1850
NOTICE: Selects completed in last second: 1851
NOTICE: Selects completed in last second: 1918
NOTICE: Selects completed in last second: 1911
NOTICE: Selects completed in last second: 1879
NOTICE: Selects completed in last second: 1858
NOTICE: Selects completed in last second: 1827
NOTICE: Selects completed in last second: 1933
ours
NOTICE: Selects completed in last second: 1915
NOTICE: Selects completed in last second: 1928
NOTICE: Selects completed in last second: 1913
NOTICE: Selects completed in last second: 1932
NOTICE: Selects completed in last second: 1846
NOTICE: Selects completed in last second: 1955
NOTICE: Selects completed in last second: 1991
NOTICE: Selects completed in last second: 1973
```
NB: the ephemeral file sizes differ by ca 1MiB, ours being 1MiB smaller.
</details>
# Rollout
This PR changes the code in-place and is not gated by a feature flag.
We get many HTTP connect timeout errors from scrubber logs, and it
turned out that the scrubber is retrying, so this is not an actual
error. In the future, we should revisit all places where we log errors
in the storage scrubber, and only log at error level when necessary
(i.e., errors that might need manual fixing).
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
In order to build AZ aware scheduling, the storage controller needs to
know what AZ pageservers are in.
Related https://github.com/neondatabase/neon/issues/8848
## Summary of changes
This patch set adds a new nullable column to the `nodes` table:
`availability_zone_id`. The node registration
request is extended to include the AZ id (pageservers already have this
in their `metadata.json` file).
If the node is already registered, then we update the persistent and
in-memory state with the provided AZ.
Otherwise, we add the node with the AZ to begin with.
A couple assumptions are made here:
1. Pageserver AZ ids are stable
2. AZ ids do not change over time
Once all pageservers have a configured AZ, we can remove the optionals
in the code and make the database column not nullable.
Part of #8130, closes#8719.
## Problem
Currently, vectored blob IO only coalesces blocks if they are immediately
adjacent to each other. When we switch to direct IO, we need a way to
coalesce blobs that are within the dio-aligned boundary but have a gap
between them.
## Summary of changes
- Introduces a `VectoredReadCoalesceMode` for `VectoredReadPlanner` and
`StreamingVectoredReadPlanner` which has two modes:
- `AdjacentOnly` (current implementation)
- `Chunked(<alignment requirement>)`
- New `ChunkedVectorBuilder` that considers batching `dio-align`-sized
reads; the start and end of the vectored read will respect
`stx_dio_offset_align` / `stx_dio_mem_align` (`vectored_read.start` and
`vectored_read.blobs_at.first().start_offset` will be two different
values).
- Since we break the assumption that blobs within single `VectoredRead`
are next to each other (implicit end offset), we start to store blob end
offsets in the `VectoredRead`.
- Adapted existing tests to run in both `VectoredReadCoalesceMode`.
- The io alignment can also be live configured at runtime.
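A simplified sketch of the coalescing decision the two modes make (names are illustrative, not the exact types in the PR):
```
enum VectoredReadCoalesceMode {
    /// Only merge blobs that are immediately adjacent (previous behaviour).
    AdjacentOnly,
    /// Also merge blobs whose gap stays within the same aligned chunk.
    Chunked(u64),
}

fn can_extend(mode: &VectoredReadCoalesceMode, prev_end: u64, next_start: u64) -> bool {
    match mode {
        VectoredReadCoalesceMode::AdjacentOnly => prev_end == next_start,
        VectoredReadCoalesceMode::Chunked(align) => {
            // Adjacent blobs always merge; otherwise they merge if the last
            // byte already planned and the next blob start fall into the
            // same dio-aligned chunk, so the gap comes "for free".
            prev_end == next_start
                || prev_end.saturating_sub(1) / align == next_start / align
        }
    }
}
```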
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
Storage controller upgrades (restarts, more generally) can cause
multi-second availability gaps.
While the storage controller does not sit on the main data path, it's
generally not acceptable
to block management requests for extended periods of time (e.g.
https://github.com/neondatabase/neon/issues/8034).
## Summary of changes
This RFC describes the issues around the current storage controller
restart procedure
and describes an implementation which reduces downtime to a few
milliseconds on the happy path.
Related https://github.com/neondatabase/neon/issues/7797
## Problem
When folks open github issues for feature requests, they don't have a
clear recipient: engineers usually see them during bug triage, but that
doesn't necessarily get the work prioritized.
## Summary of changes
Give end users a clearer path to submitting feedback to Neon
Protocol version 2 has been the default for a while now, and we no
longer have any computes running in production that used protocol
version 1. This completes the migration by removing support for v1 in
both the pageserver and the compute.
See issue #6211.
## Problem
Currently, we compare `neon.safekeepers` values as is, so we
unnecessarily restart walproposer even if the safekeeper set didn't change.
This leads to errors like:
```log
FATAL: [WP] restarting walproposer to change safekeeper list
from safekeeper-8.us-east-2.aws.neon.tech:6401,safekeeper-11.us-east-2.aws.neon.tech:6401,safekeeper-10.us-east-2.aws.neon.tech:6401
to safekeeper-11.us-east-2.aws.neon.tech:6401,safekeeper-8.us-east-2.aws.neon.tech:6401,safekeeper-10.us-east-2.aws.neon.tech:6401
```
## Summary of changes
Split the GUC into the list of individual safekeepers and properly
compare. We could've done that somewhere on the upper level, e.g.,
control plane, but I think it's still better when the actual config
consumer is smarter and doesn't rely on upper levels.
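The actual consumer is walproposer (C code); the comparison it needs amounts to treating the GUC as a set of individual safekeepers rather than a raw string. A Rust-flavored sketch of that logic:
```
// Sketch: compare the safekeeper lists as sets, ignoring order, so a mere
// reordering does not force a walproposer restart.
fn safekeepers_changed(old: &str, new: &str) -> bool {
    let normalize = |s: &str| {
        let mut hosts: Vec<&str> = s
            .split(',')
            .map(str::trim)
            .filter(|h| !h.is_empty())
            .collect();
        hosts.sort_unstable();
        hosts
    };
    normalize(old) != normalize(new)
}
```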
## Problem
The pg_hintplan test seems to be flaky: sometimes it fails, while usually
it passes.
## Summary of changes
The regression test is changed to filter out the Neon service queries. The
expected file is changed as well.
Routes and their handlers were in slightly different orders in 1) the
routes list, 2) their implementation, 3) the Python client, and 4) the
OpenAPI spec, making the addition of new ones intimidating. Make the
order the same everywhere, roughly lexicographic but preserving some of
the existing logic.
No functional changes.
Part of #8002, the final big PR in the batch.
## Summary of changes
This pull request uses the new split layer writer in the gc-compaction.
* It changes how layers are split. Previously, we split layers based on
the original split point, but this creates too many layers
(test_gc_feedback has one key per layer).
* Therefore, we first verify if the layer map can be processed by the
current algorithm (See https://github.com/neondatabase/neon/pull/8191,
it's basically the same check)
* Based on that, we proceed with the compaction. This way, it creates a large
enough layer close to the target layer size.
* Added a new set of functions `with_discard` in the split layer writer.
This helps us skip layers if we are going to produce the same persistent
key.
* The delta writer will keep the updates of the same key in a single
file. This might create a super large layer, but we can optimize it
later.
* The split layer writer is used in the gc-compaction algorithm, and it
will split layers based on size.
* Fix the image layer summary block encoding the wrong key range.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
Co-authored-by: Christian Schwarz <christian@neon.tech>
refs https://github.com/neondatabase/neon/issues/6989
Problem
-------
After an unclean shutdown, we get restarted, start reading the local
filesystem, and make decisions based on those reads. However, some of the
data might not yet have been fsynced when the unclean shutdown completed.
Durability matters even though Pageservers are conceptually just a cache
of state in S3. For example:
- the cloud control plane is not a control loop => pageserver responses
to tenant attachment, etc., need to be durable.
- the storage controller does not rely on this (as much?)
- we don't have layer file checksumming, so, downloaded+renamed but not
fsynced layer files are technically not to be trusted
- https://github.com/neondatabase/neon/issues/2683
Solution
--------
`syncfs` the tenants directory during startup, before we start reading
from it.
This is a bit overkill because we do remove some temp files
(InMemoryLayer!) later during startup. Further, these temp files are
particularly likely to be dirty in the kernel page cache. However, we
don't want to refactor that cleanup code right now, and the dirty data on
pageservers is generally not that high. Last, with [direct
IO](https://github.com/neondatabase/neon/issues/8130) we're going to
have near-zero kernel page cache anyway quite soon.
This PR:
* Implements the rule that archived timelines require all of their
children to be archived as well, as specified in the RFC. There is no
fancy locking mechanism though, so the precondition can still be broken.
As a TODO for later, we still allow unarchiving timelines with archived
parents.
* Adds an `is_archived` flag to `TimelineInfo`
* Adds timeline_archival_config to `PageserverHttpClient`
* Adds a new `test_timeline_archive` test, loosely based on
`test_timeline_delete`
Part of #8088
## Problem
We need some metric to sneak peek into how many people use inbound
logical replication (Neon is a subscriber).
## Summary of changes
This commit adds a new metric `compute_subscriptions_count`, which is the
number of subscriptions grouped by enabled/disabled state.
Resolves: neondatabase/cloud#16146
Add a binary for local-proxy that uses the local auth backend. It runs
only the http serverless driver support and offers config reload based on
a config file and SIGHUP.
## Problem
- If a reconciler was waiting to be able to notify computes about a
change, but the control plane was waiting for the controller to finish a
timeline creation/deletion, the overall system can deadlock.
- If a tenant shard was migrated concurrently with a timeline
creation/deletion, there was a risk that the timeline operation could be
applied to a non-latest-generation location, and thereby not really be
persistent. This has never happened in practice, but would eventually
happen at scale.
Closes: #8743
## Summary of changes
- Introduce `Service::tenant_remote_mutation` helper, which looks up
shards & generations and passes them into an inner function that may do
remote I/O to pageservers. Before returning success, this helper checks
that generations haven't incremented, to guarantee that changes are
persistent.
- Convert tenant_timeline_create, tenant_timeline_delete, and
tenant_timeline_detach_ancestor to use this helper.
- These functions no longer block on ensure_attached unless the tenant
was never attached at all, so they should make progress even if we can't
complete compute notifications.
This increases the database load from timeline/create operations, but
only with cheap read transactions.
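A simplified sketch of the helper's shape (types and names condensed; not the real storage controller code): capture the generations, run the remote operation, and re-check the generations before reporting success.
```
use std::collections::HashMap;
use std::future::Future;

type TenantShardId = u32;
type Generation = u32;

async fn tenant_remote_mutation<F, Fut, R, E>(
    load_generations: impl Fn() -> HashMap<TenantShardId, Generation>,
    op: F,
) -> Result<R, String>
where
    F: FnOnce(HashMap<TenantShardId, Generation>) -> Fut,
    Fut: Future<Output = Result<R, E>>,
    E: std::fmt::Display,
{
    let before = load_generations();
    let result = op(before.clone())
        .await
        .map_err(|e| format!("remote op failed: {e}"))?;
    // If any generation advanced while the remote I/O was in flight, a
    // migration happened concurrently and the op may have landed on a
    // stale location: don't report it as persistent.
    if load_generations() != before {
        return Err("generation changed during remote mutation".to_string());
    }
    Ok(result)
}
```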
## Problem
Previously, the controller only used the shard counts for scheduling.
This works well when hosting only many-sharded tenants, but works much
less well when hosting single-sharded tenants that have a greater
deviation in size-per-shard.
Closes: https://github.com/neondatabase/neon/issues/7798
## Summary of changes
- Instead of UtilizationScore, carry the full PageserverUtilization
through into the Scheduler.
- Use the PageserverUtilization::score() instead of shard count when
ordering nodes in scheduling.
Q: Why did test_sharding_split_smoke need updating in this PR?
A: There's an interesting side effect during shard splits: because we do
not decrement the shard count in the utilization when we de-schedule the
shards from before the split, the controller will now prefer to pick
_different_ nodes for shards compared with which ones held secondaries
before the split. We could use our knowledge of splitting to fix up the
utilizations more actively in this situation, but I'm leaning toward
leaving the code simpler, as in practical systems the impact of one
shard on the utilization of a node should be fairly low (single digit
%).
In case of corrupted delta layers, we can detect the corruption and bail
out of the compaction.
## Summary of changes
* Detect wrong delta desc of key range
* Detect unordered deltas
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We have been naughty and curl-ed storcon to fix-up drains and fills.
## Summary of changes
Add support for starting/cancelling drain/fill operations via
`storcon_cli`.
close https://github.com/neondatabase/neon/issues/8579
## Summary of changes
The `is_l0` check now takes both layer key range and the layer type.
This allows us to have image layers covering the full key range in
btm-most compaction (upcoming PR). However, we still don't allow delta
layers to cover the full key range, and I will make btm-most compaction
to generate delta layers with the key range of the keys existing in the
layer instead of `Key::MIN..Key::HACK_MAX` (upcoming PR).
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Failed / flaky tests for different arches don't have any difference in
GitHub Autocomment
## Summary of changes
- Add arch to build type for GitHub autocomment
Our new policy is to use the "rebase" method and slice all the Neon
commits into a nice patch set when doing a new major version, and use
"merge" method on minor version upgrades on the release branches.
"git merge" preserves the git history of Neon commits on the Postgres
branches. While it's nice to rebase all the Neon changes to a logical
patch set against upstream, having to do it between every minor release
is a fair amount of work, loses the history, and is more error-prone.
part of https://github.com/neondatabase/neon/issues/8623
We want to discover potential aux v1 customers that we might have missed
from the migrations.
## Summary of changes
Log warnings on basebackup, load timeline, and the first put_file.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
See https://neondb.slack.com/archives/C07J14D8GTX/p1724347552023709
Manipulations of the LRU list in the relation size cache are performed
under a shared lock.
## Summary of changes
Take exclusive lock
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
This reverts commit cbe8c77997.
This change was originally made to test a hypothesis, but after that,
the proper fix #8669 was merged, so now it's not needed. Moreover, the
test is still flaky, so this bug was probably not the reason for the
flakiness.
Related to #8097
This copies a piece of code from `test_scrubber_physical_gc_ancestors`
to fix a source of flakiness: later on we rely on stuff being older than
a second, but the test can run faster under optimal conditions (as
happened to me locally, but also observable in
[this](https://neon-github-public-dev.s3.amazonaws.com/reports/main/10470762360/index.html#testresult/f713b02657db4b4c/retries)
allure report):
```
test_runner/regress/test_storage_scrubber.py:169: in test_scrubber_physical_gc
assert gc_summary["remote_storage_errors"] == 0
E assert 1 == 0
```
## Problem/Solution
TimelineWriter::put_batch is simply a loop over individual puts. Each
put acquires and releases locks, and checks for potentially starting a
new layer. Batching these is more efficient, but more importantly
unlocks future changes where we can pre-build serialized buffers much
earlier in the ingest process, potentially even on the safekeeper
(imagine a future model where some variant of DatadirModification lives
on the safekeeper).
Ensuring that the values in put_batch are written to one layer also
enables a simplification upstream, where we no longer need to write
values in LSN-order. This saves us a sort, but also simplifies follow-on
refactors to DatadirModification: we can store metadata keys and data
keys separately at that level without needing to zip them together in
LSN order later.
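An illustrative sketch of the batched write path (types heavily simplified): the lock is taken and the open-layer bookkeeping is done once per batch instead of once per value, and the whole batch lands in the same layer.
```
use std::sync::Mutex;

type Key = u128;
type Lsn = u64;
type Value = Vec<u8>;

struct OpenLayer {
    values: Vec<(Key, Lsn, Value)>,
}

struct TimelineWriter {
    open_layer: Mutex<OpenLayer>,
}

impl TimelineWriter {
    fn put_batch(&self, batch: Vec<(Key, Lsn, Value)>) {
        // One lock acquisition for the whole batch; all values are
        // guaranteed to be written into the same open layer.
        let mut layer = self.open_layer.lock().unwrap();
        layer.values.extend(batch);
        // (The real code would check the layer size and possibly roll it
        // here, once per batch rather than once per value.)
    }
}
```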
## Why?
In this PR, these changes are simply optimizations, but they are
motivated by evolving the ingest path in the direction of disentangling
and extracting DatadirModification from Timeline. It may not be obvious how
right now, but the general idea is that we'll end up with three phases
of ingest:
- A) Decode walrecords and build a datadirmodification with all the
simple data contents already in a big serialized buffer ready to write
to an ephemeral layer **<-- this part can be pipelined and parallelized,
and done on a safekeeper!**
- B) Let that datadirmodification see a Timeline, so that it can also
generate all the metadata updates that require a read-modify-write of
existing pages
- C) Dump the results of B into an ephemeral layer.
Related: https://github.com/neondatabase/neon/issues/8452
## Caveats
Doing a big monolithic buffer of values to write to disk is ordinarily
an anti-pattern: we prefer nice streaming I/O. However:
- In future, when we do this first decode stage on the safekeeper, it
would be inefficient to serialize a Vec of Value, and then later
deserialize it just to add blob size headers while writing into the
ephemeral layer format. The idea is that for bulk write data, we will
serialize exactly once.
- The monolithic buffer is a stepping stone to pipelining more of this:
by serializing earlier (rather than at the final put_value), we will be
able to parallelize the wal decoding and bulk serialization of data page
writes.
- The ephemeral layer's buffered writer already stalls writes while it
waits to flush: so while yes we'll stall for a couple milliseconds to
write a couple megabytes, we already have stalls like this, just
distributed across smaller writes.
## Benchmarks
This PR is primarily a stepping stone to safekeeper ingest filtering,
but also provides a modest efficiency improvement to the `wal_recovery`
part of `test_bulk_ingest`.
test_bulk_ingest:
```
test_bulk_insert[neon-release-pg16].insert: 23.659 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 626 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 18.981 s
test_bulk_insert[neon-release-pg16].compaction: 0.055 s
vs. tip of main:
test_bulk_insert[neon-release-pg16].insert: 24.001 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 604 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 23.586 s
test_bulk_insert[neon-release-pg16].compaction: 0.054 s
```
close https://github.com/neondatabase/neon/issues/8558
* Directly generate image layers for sparse keyspaces during initdb
optimization.
* Support force image layer generation for sparse keyspaces.
* Fix a bug of incorrect image layer key ranges in the case of duplicated
keys (the added line: `start = img_range.end;`). This bug can cause
overlapping image layers and keys to disappear.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
After #8655, we needed to mark some tests to shut down immediately. To
aid these tests, try the new pattern of `flush_ep_to_pageserver`
followed by a non-compacting checkpoint. This moves the general graceful
shutdown problem of having too much to flush at shutdown into the test.
Also, add logging for how long the graceful shutdown took (if we got to
complete it), for faster log eyeballing.
Fixes: #8712
Cc: #8715, #8708
Going through the list of recent flaky tests, trying to fix those
related to graceful shutdown.
- test_forward_compatibility: flush and wait for uploads to avoid
graceful shutdown
- test_layer_bloating: in the end the endpoint and vanilla are still up
=> immediate shutdown
- test_lagging_sk: pageserver shutdown is not related to the test =>
immediate shutdown
- test_lsn_lease_size: pageserver flushing is not needed => immediate
shutdown
Additionally:
- remove `wait_for_upload` usage from workload fixture
Cc: #8708
Fixes: #8710
This removes workspace hack from all libs, not from any binaries. This
does not change the behaviour of the hack.
Running
```
cargo clean
cargo build --release --bin proxy
```
Before this change it took 5m16s; after this change it took 3m3s. This is
because the change allows the build to be parallelised much more.
## Problem
We want to run our regression test suite on ARM.
## Summary of changes
- run regression tests on release ARM builds
- run `build-neon` (including rust tests) on debug ARM builds
- add `arch` parameter to test to distinguish them in the allure report
and in a database
## Problem
Compaction jobs and other background loops are concurrency-limited
through a global semaphore.
The current counters allow quantifying how _many_ tasks are waiting.
But there is no way to tell how _much_ delay is added by the semaphore.
So, add a counter that aggregates the wall clock time seconds spent
acquiring the semaphore.
The metrics can be used as follows:
* retroactively calculate average acquisition time in a given time range
* compare the degree of background loop backlog among pageservers
The metric is insufficient to calculate
* run-up of ongoing acquisitions that haven't finished acquiring yet
* Not easily feasible because ["Cancelling a call to acquire makes you
lose your place in the
queue"](https://docs.rs/tokio/latest/tokio/sync/struct.Semaphore.html#method.acquire)
## Summary of changes
* Refactor the metrics to follow the current best practice for typed
metrics in `metrics.rs`.
* Add the new counter.
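A minimal sketch of how the counter gets its value, assuming a tokio semaphore (the elapsed time is then added to the new wall-clock-seconds counter):
```
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// Acquire a permit from the global background-task semaphore, returning
/// how long the acquisition blocked so the caller can add it to the
/// counter.
async fn acquire_timed(semaphore: Arc<Semaphore>) -> (OwnedSemaphorePermit, Duration) {
    let start = Instant::now();
    let permit = semaphore
        .acquire_owned()
        .await
        .expect("the background task semaphore is never closed");
    (permit, start.elapsed())
}
```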
## Problem
We don't have a convenient way for a human to ask "how far along are
secondary downloads for this tenant".
This is useful when driving migrations of tenants to the storage
controller, as we first create a secondary location and want to see it
warm up before we cut over. That can already be done via storcon_cli,
but we would like a way that doesn't require direct API access.
## Summary of changes
Add a metric that reports the total size of layers in the heatmap: this
may be used in conjunction with the existing
`pageserver_secondary_resident_physical_size` to estimate "warmth" of
the secondary location.
## Problem
Storage controllers did not have the right token to speak to their peers
for leadership transitions.
## Summary of changes
Accept a peer jwt token for the storage controller.
Epic: https://github.com/neondatabase/cloud/issues/14701
## Problem
#8736 is getting too big. splitting off some simple changes here
## Summary of changes
Local proxy won't always be using TLS, so make it optional. Local proxy
won't be using WebSockets for now, so make that optional too. Remove a dead config var.
## Problem
Previously, we would run db migrations before doing the step-down
sequence. This meant that the current leader would have to deal with
the schema changes and that's generally not safe.
## Summary of changes
Push the step-down procedure earlier in start-up and
do db migrations right after it (but before we load-up the in-memory
state from the db).
Epic: https://github.com/neondatabase/cloud/issues/14701
## Problem
The default Postgres version is set to 15 in code, while we use 16 in
most of the other places (and Postgres 17 is coming)
## Summary of changes
- Run `benchmarks` job with Postgres 16 (instead of Postgres 14)
- Set `DEFAULT_PG_VERSION` to 16 in all places
- Remove deprecated `--pg-version` pytest argument
- Update `test_metadata_bincode_serde_ensure_roundtrip` for Postgres 16
Removes the `_generic` postfix from the APIs using `GenericRemoteStorage`,
as `remote_storage` is the "default" now, and adds a `_s3` postfix
to the remaining APIs using the S3 SDK (only in tenant snapshot). Also,
removes two unused functions: `list_objects_with_retries` and
`stream_tenants`.
Part of https://github.com/neondatabase/neon/issues/7547
Migrates most of the remaining parts of the scrubber to remote_storage:
* `pageserver_physical_gc`
* `scan_metadata` for pageservers (safekeepers were done in #8595)
* `download()` in `tenant_snapshot`. The main `tenant_snapshot` is not
migrated as it uses version history to be able to work in the face of
ongoing changes.
Part of #7547
It's been rolled out everywhere, no configs are referencing it.
All code that's made dead by the removal of the config option is removed
as part of this PR.
The `page_caching::PreWarmingWriter` in `::No` mode is equivalent to a
`size_tracking_writer`, so, use that.
part of https://github.com/neondatabase/neon/issues/7418
After the rollout has succeeded, we now set the default image
compression to be enabled.
We also remove its explicit mention from `neon_fixtures.py` added in
#8368 as it is now the default (and we switch to `zstd(1)` which is a
bit nicer on CPU time).
Part of https://github.com/neondatabase/neon/issues/5431
Part of #8128.
## Description
This PR creates a unified command to run both physical gc and metadata
health check as a cron job. This also enables us to add additional tasks
to the cron job in the future.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
Multiple increases/decreases of the LFC limit may cause unbounded growth of the LFC
file, because the holes punched while shrinking the LFC are not reused when it
is extended.
## Summary of changes
Keep track of holes and reuse them when the LFC size is increased.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
The current superuser check always passes because the query returns a tuple like
`(False,)`, which is truthy, so the `if not superuser` check never triggers.
## Summary of changes
Fixes the issue by unwrapping the tuple. Verified that it works against
a project where I don't have superuser.
## Problem
When a secondary location is trying to catch up while a tenant is
receiving new writes, it can become quite wasteful:
- Downloading L0s which are soon destroyed by compaction to L1s
- Downloading older layer files which are soon made irrelevant when
covered by image layers.
## Summary of changes
Sort the layer files in the heatmap:
- L0 layers are the lowest priority
- Other layers are sorted to download the highest LSNs first.
Previously, we protected against multiple ProposerElected messages from the same
walproposer with the following condition:
msg.term == self.get_last_log_term() && self.flush_lsn() > msg.start_streaming_at
It is not exhaustive, i.e. we could still proceed to truncating WAL even though
the safekeeper had inserted something after the divergence point was
calculated. While this was most likely safe because the walproposer can't use
the safekeeper's position to commit WAL until last_log_term reaches the current
walproposer term, let's be more careful and properly calculate the divergence
point like the walproposer does.
## Problem
https://github.com/neondatabase/neon/pull/8588 implemented the mechanism
for storage controller
leadership transfers. However, there's no tests that exercise the
behaviour.
## Summary of changes
1. Teach `neon_local` how to handle multiple storage controller
instances. Each storage controller
instance gets its own subdirectory (`storage_controller_1, ...`).
`storage_controller start|stop` subcommands
have also been extended to optionally accept an instance id.
2. Add a storage controller proxy test fixture. It's a basic HTTP server
that forwards requests from pageserver
and test env to the currently configured storage controller.
3. Add a test which exercises storage controller leadership transfer.
4. Finally fix a couple bugs that the test surfaced
We've had physical replication support for a long time, but we never
created an RFC for the feature. This RFC does that after the fact. Even
though we've already implemented the feature, let's have a design
discussion as if we hadn't; it can still be quite insightful.
This is written from a pretty compute-centric viewpoint, without much
on how it works in the control plane.
## Problem
We want to store Nightly Replication test results in the database and
notify the relevant Slack channel about failures
## Summary of changes
- Store test results in the database
- Notify `on-call-compute-staging-stream` about failures
## Problem
`secrets.GITHUB_TOKEN` (with any permissions) is not enough to get
a user's membership info if they decide to hide it.
## Summary of changes
- Use `secrets.CI_ACCESS_TOKEN` for `gh api` call
- Use `pull_request_target` instead of `pull_request` event to get
access to secrets
## Problem
Logical replication BGW checking replication lag is not reloading config
## Summary of changes
Add handling of reload config request
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Per #8674, disallow node configuration while drain/fill are ongoing.
Implement it by adding an HTTP-only wrapper
`Service::external_node_configure` which checks for an existing operation
before configuring.
Additionally:
- allow cancelling drain/fill after a pageserver has restarted and
transitioned to WarmingUp
Fixes: #8674
## Problem
On macOS, clippy fails with the following error:
```
error: unused import: `crate::virtual_file::owned_buffers_io::io_buf_ext::IoBufExt`
--> pageserver/src/tenant/remote_timeline_client/download.rs:26:5
|
26 | use crate::virtual_file::owned_buffers_io::io_buf_ext::IoBufExt;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D unused-imports` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unused_imports)]`
```
Introduced in https://github.com/neondatabase/neon/pull/8717
## Summary of changes
- allow `unused_imports` for
`crate::virtual_file::owned_buffers_io::io_buf_ext::IoBufExt` on macOS
in download.rs
## Problem
The control file contains the id of the safekeeper that uploaded it.
Previously, when sending a snapshot of the control file to another sk,
it would eventually be gc-ed by the receiving sk. This is incorrect
because the original sk might still need it later.
## Summary of Changes
When sending a snapshot and the control file contains an uploaded
segment:
* Create a copy of the segment in s3 with the destination sk in the
object name
* Tweak the streamed control file to point to the object created in the
previous step
Note that the snapshot endpoint now has to know the id of the requestor,
so the API has been extended to include the node id of the destination
sk.
Closes https://github.com/neondatabase/neon/issues/8542
The `tokio_epoll_uring::Slice` / `tokio_uring::Slice` type is weird.
The new `FullSlice` newtype is better. See the doc comment for details.
The naming is not ideal, but we'll clean that up in a future refactoring
where we move the `FullSlice` into `tokio_epoll_uring`. Then, we'll do
the following:
* tokio_epoll_uring::Slice is removed
* `FullSlice` becomes `tokio_epoll_uring::IoBufView`
* new type `tokio_epoll_uring::IoBufMutView` for the current
`tokio_epoll_uring::Slice<IoBufMut>`
Context
-------
I did this work in preparation for
https://github.com/neondatabase/neon/pull/8537.
There, I'm changing the type that the `inmemory_layer.rs` passes to
`DeltaLayerWriter::put_value_bytes` and thus it seemed like a good
opportunity to make this cleanup first.
## Problem
A bunch of small fixes and improvements for CI, that are too small to
have a separate PR for them
## Summary of changes
- CI(build-and-test): fix parenthesis
- CI(actionlint): fix path to workflow file
- CI: remove default args from actions/checkout
- CI: remove `gen3` label; using a combination of `self-hosted` +
`small{,-arm64}`/`large{,-arm64}` is enough
- CI: prettify Slack messages, hide links behind text messages
- CI(build-and-test): add more dependencies to the `conclusion` job
## Problem
It's recommended that a couple of additional RUSTFLAGS be set up to
improve the performance of Rust applications on AWS Graviton.
See
57dc813626/rust.md
Note: Apple Silicon is compatible with neoverse-n1:
```
$ clang --version
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /Applications/Xcode_15.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
$
$ clang --print-supported-cpus 2>&1 | grep neoverse-
neoverse-512tvb
neoverse-e1
neoverse-n1
neoverse-n2
neoverse-v1
neoverse-v2
```
## Summary of changes
- Add `-Ctarget-feature=+lse -Ctarget-cpu=neoverse-n1` to RUSTFLAGS for
ARM images
Some benchmarks and tests might still fail because of #8655 (tracked in
#8708) because we are not fast enough to shut down ([one example]).
Partly this is explained by the current validation mode of the streaming
k-merge, but otherwise it is because that is where we spend a lot of time in
compaction. Outside of L0 => L1 compaction, image layer generation
is already guarded by vectored reads doing cancellation checks.
32768 is a wild guess based on looking at how many keys we put in each
layer in a bench (1-2 million), but I assume it will be a good enough
divisor. Doing checks more often will start showing up as contention,
which we cannot currently measure. Doing checks less often might be
reasonable.
[one example]:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/10384136483/index.html#suites/9681106e61a1222669b9d22ab136d07b/96e6d53af234924/
Earlier PR: #8706.
## Problem
During `Run rust tests` step (for debug builds), we accidentally rebuild
neon twice (by `cargo test --doc` and by `cargo nextest run`).
It happens because we don't set `cov_prefix` for the `cargo test --doc`
command, which triggers rebuilding with different build flags, and one
more rebuild by `cargo nextest run`.
## Summary of changes
- Set `cov_prefix` for `cargo test --doc` to prevent unneeded rebuilds
## Problem
This command is kind of a hack, used when we're migrating large tenants
and want to get their resident size down. It sets the tenant config to a
fixed value which omitted heatmap_period, and so caused secondaries to go
out of date.
## Summary of changes
- Set heatmap period to the same 300s default that we use elsewhere when
updating eviction settings
This is not as elegant as some general purpose partial modification of
the config, but it practically makes the command safer to use.
Basic JWT implementation that caches JWKs and verifies signatures.
This code is currently not reachable from proxy; I just wanted to get
something merged in.
We can get CompactionError::Other(Cancelled) via the error handling in
a few ways
([evidence](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8655/10301613380/index.html#suites/cae012a1e6acdd9fdd8b81541972b6ce/653a33de17802bb1/)).
Hopefully fix it by:
1. replacing the `map_err` which hid the
`GetReadyAncestorError::Cancelled` with a `From<GetReadyAncestorError> for
GetVectoredError` conversion
2. simplifying the code in pgdatadir_mapping to eliminate the
anyhow wrapping for deserialization errors
3. no longer wrapping GetVectoredError as an anyhow error
4. no longer wrapping PageReconstructError as an anyhow error
Additionally, produce warnings if we treat any other error (as was legal
before this PR) as a missing key.
Cc: #8708.
## Problem
`author_association` doesn't properly work if a GitHub user decides not
to show affiliation with the org in their profile (i.e. if it's private)
## Summary of changes
- Call
`/orgs/ORG/members/USERNAME` API to check whether
a PR/issue author is a member of the org
## Problem
When pageservers do compaction, they frequently create image layers that
make earlier layers unneeded for reads, but then keep those earlier
layers around for 24 hours waiting for time-based eviction to expire
them.
Now that we track layer visibility, we can use it as an input to
eviction, and avoid the 24 hour "disk bump" that happens around
pageserver restarts.
## Summary of changes
- During time-based eviction, if a layer is marked Covered, use the
eviction period as the threshold: i.e. these layers get to remain
resident for at least one iteration of the eviction loop, but then get
evicted. With current settings this means they get evicted after 1h
instead of 24h.
- During disk usage eviction, prioritize evicting covered layers above
all other layers.
Caveats:
- Using the period as the threshold for time based eviction in this case
is a bit of a hack, but it avoids adding yet another configuration
property, and in any case the value of a new property would be somewhat
arbitrary: there's no "right" length of time to keep covered layers
around just in case.
- We had previously planned on removing time-based eviction: this change
would motivate us to keep it around, but we can still simplify the code
later to just do the eviction of covered layers, rather than applying a
TTL policy to all layers.
With additional phases from #8430 the `detach_ancestor::Error` became
untenable. Split it up by phase, and introduce laundering for the
remaining `anyhow::Error`s so that they are propagated, most often as
`Error::ShuttingDown`.
Additionally, complete FIXMEs.
Cc: #6994
## Problem
The storage scrubber was reporting warnings for lots of timelines like:
```
WARN Missed some shards at count ShardCount(0) tenant_id=25eb7a83d9a2f90ac0b765b6ca84cf4c
```
These were spurious: these tenants are fine. There was a bug in
accumulating the ShardIndex for each tenant, whereby multiple timelines
would lead us to add the same ShardIndex more than once.
Closes: #8646
## Summary of changes
- Accumulate ShardIndex in a BTreeSet instead of a Vec
- Extend the test to reproduce the issue
## Problem
See https://github.com/neondatabase/neon/issues/8499
## Summary of changes
Save HEAP_COMBOCID flag in WAL and do not clear it in redo handlers.
Related Postgres PRs:
https://github.com/neondatabase/postgres/pull/457
https://github.com/neondatabase/postgres/pull/458
https://github.com/neondatabase/postgres/pull/459
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
With the persistent gc blocking, we can now retry reparenting timelines
which had failed for whatever reason on the previous attempt(s).
Restructure the detach_ancestor into three phases:
- prepare (insert persistent gc blocking, copy lsn prefix, layers)
- detach and reparent
- reparenting can fail, so we might need to retry this portion
- complete (remove persistent gc blocking)
Cc: #6994
After #8655 we've had a few issues (mostly tracked on #8708) with the
graceful shutdown. In order to shutdown more of the processes and catch
more errors, for example, from all pageservers, do an immediate shutdown
for those nodes which fail the initial (possibly graceful) shutdown.
Cc: #6485
## Problem
We use a set of **Neon** reuse databases in benchmarking.yml which are
still using pg14.
Because we want to compare apples to apples and have migrated the AWS
reuse clusters to pg16 we should also use pg16 for Neon.
## Summary of changes
- Automatically restore the test databases for Neon project
A few of the benchmarks have started failing after #8655 where they are
waiting for compactor task. Reads done by image layer creation should
already be cancellation sensitive because vectored get does a check each
time, but try sprinkling additional cancellation points to:
- each partition
- after each vectored read batch
It seems that some benchmarks are failing because they simply do not stop
ingesting WAL on shutdown. It might mean that the tests were
never run with the pageserver in a stable state and WAL has always been left
to be ingested on safekeepers, but let's see if this silences the
failures and "stops the bleeding".
Cc: https://github.com/neondatabase/neon/issues/8712
## Problem
We want to mark new PRs and issues created by external users
## Summary of changes
- Add a new workflow which adds `external` label for issues and PRs
created by external users
## Problem
When the utilization API was added, it was just a stub with disk space
information.
Disk space information isn't a very good metric for assigning tenants to
pageservers, because pageservers making full use of their disks would
always just have 85% utilization, irrespective of how much pressure they
had for disk space.
## Summary of changes
- Use the new layer visibility metric to calculate a "wanted size" per
tenant, and sum these to get a total local disk space wanted per
pageserver. This acts as the primary signal for utilization.
- Also use the shard count to calculate a utilization score, and take
the max of this and the disk-driven utilization. The shard count limit
is currently set as a constant 20,000, which matches contemporary
operational practices when loading pageservers.
The shard count limit means that for tiny/empty tenants, on a machine
with 3.84TB disk, each tiny tenant influences the utilization score as
if it had size 160MB.
## Problem
When pooled connections are used, session semantics are not preserved,
including GUC settings.
Many customers have a particular problem with setting search_path.
But pgbouncer 1.20 has a `track_extra_parameters` setting which allows
tracking parameters included in the startup packet that are reported by
Postgres. Postgres has [an official list of parameters that it reports
to the
client](https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-ASYNC).
This PR makes Postgres also report `search_path`, and so allows
including it in `track_extra_parameters`.
## Summary of changes
Set GUC_REPORT flag for `search_path`.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Pageserver exposes some vectored get related configs which are not in
use.
## Summary of changes
Remove the following pageserver configs: `get_impl`, `get_vectored_impl`,
and `validate_get_vectored`.
They are not used in the pageserver since
https://github.com/neondatabase/neon/pull/8601.
Manual overrides have been removed from the aws repo in
https://github.com/neondatabase/aws/pull/1664.
## Problem
This follows a PR that insists all input keys are representable in 16
bytes:
- https://github.com/neondatabase/neon/pull/8648
& a PR that prevents postgres from sending us keys that use the high
bits of field2:
- https://github.com/neondatabase/neon/pull/8657
Motivation for this change:
1. Ingest is bottlenecked on CPU
2. InMemoryLayer can create huge (~1M value) BTreeMap<Key,_> for its
index.
3. Maps over i128 are much faster than maps over an arbitrary 18 byte
struct.
It may still be worthwhile to make the index two-tier to optimize for
the case where only the last 4 bytes (blkno) of the key vary frequently,
but simply using the i128 representation of keys has a big impact for
very little effort.
Related: #8452
## Summary of changes
- Introduce `CompactKey` type which contains an i128
- Use this instead of Key in InMemoryLayer's index, converting back and
forth as needed.
## Performance
All the small-value `bench_ingest` cases show improved throughput.
The one that exercises this index most directly shows a 35% throughput
increase:
```
ingest-small-values/ingest 128MB/100b seq, no delta
time: [374.29 ms 378.56 ms 383.38 ms]
thrpt: [333.88 MiB/s 338.13 MiB/s 341.98 MiB/s]
change:
time: [-26.993% -26.117% -25.111%] (p = 0.00 < 0.05)
thrpt: [+33.531% +35.349% +36.974%]
Performance has improved.
```
## Problem
We use infrastructure as code (TF) to deploy AWS Aurora and AWS RDS
Postgres database clusters.
Whenever we have a change in TF (e.g. **every year** to upgrade to a
higher Postgres version or when we change the cluster configuration) TF
will apply the change and create a new AWS database cluster.
However our benchmarking testcase also expects databases in these
clusters and tables loaded with data.
So we add auto-detection - if the AWS RDS instances are "empty" we
create the necessary databases and restore a pg_dump.
**Important Notes:**
- These steps are NOT run in each benchmarking run, but only after a new
RDS instance has been deployed.
- the benchmarking workflows use GitHub secrets to find the connection
string for the database. These secrets still need to be (manually or
programmatically using git cli) updated if some part of the connection
string (e.g. user, password or hostname) changes.
## Summary of changes
In each benchmarking run check if
- database has already been created - if not create it
- database has already been restored - if not restore it
Supported databases
- tpch
- clickbench
- user example
Supported platforms:
- AWS RDS Postgres
- AWS Aurora serverless Postgres
Sample workflow run - but this one uses Neon database to test the
restore step and not real AWS databases
https://github.com/neondatabase/neon/actions/runs/10321441086/job/28574350581
Sample workflow run - with real AWS database clusters
https://github.com/neondatabase/neon/actions/runs/10346816389/job/28635997653
Verification in second run - with real AWS database clusters - that
second time the restore is skipped
https://github.com/neondatabase/neon/actions/runs/10348469517/job/28640778223
## Problem
Storage controller restarts cause temporary unavailability from the
control plane POV. See RFC for more details.
## Summary of changes
* A couple of small refactors of the storage controller start-up
sequence to make extending it easier.
* A leader table is added to track the storage controller instance
that's currently the leader (if any)
* A peer client is added such that storage controllers can send
`step_down` requests to each other (implemented in
https://github.com/neondatabase/neon/pull/8512).
* Implement the leader cut-over as described in the RFC
* Add `start-as-candidate` flag to the storage controller to gate the
rolling restart behaviour. When the flag is `false` (the default), the
only change from the current start-up sequence is persisting the leader
entry to the database.
It should give us all possible allowed_errors more consistently.
While getting the workflows to pass on
https://github.com/neondatabase/neon/pull/8632 it was noticed that
allowed_errors are rarely hit (1/4). This made me realize that we always
do an immediate stop by default. Doing a graceful shutdown would have
made the draining more apparent, and likely we would not have needed the
#8632 hotfix.
Downside of doing this is that we will see more timeouts if tests are
randomly leaving pause failpoints which fail the shutdown.
The net outcome should however be positive, we could even detect too
slow shutdowns caused by a bug or deadlock.
## Problem
This test was disabled.
## Summary of changes
- Remove the skip marker.
- Explicitly avoid doing compaction & gc during checkpoints (the default
scale doesn't do anything here, but when experimenting with larger scales
it messes things up)
- Set a data size that gives a ~20s runtime on a Hetzner dev machine,
previous one gave very noisy results because it was so small
For reference on a Hetzner AX102:
```
------------------------------ Benchmark results -------------------------------
test_bulk_insert[neon-release-pg16].insert: 25.664 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 5,428 MB
test_bulk_insert[neon-release-pg16].peak_mem: 577 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 1,922 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 8
test_bulk_insert[neon-release-pg16].wal_written: 1,382 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 25.373 s
test_bulk_insert[neon-release-pg16].compaction: 0.035 s
```
It should take syncrep flush_lsn into account because WAL before it on endpoint
restart is lost, which makes replication miss some data if the slot had already
been advanced too far. This commit adds a test reproducing the issue and bumps
vendor/postgres to a commit with the actual fix.
## Problem
In several workflows, we have repeating code which is separated into
two steps:
```bash
mkdir -p $(pwd)/.docker-custom
echo DOCKER_CONFIG=/tmp/.docker-custom >> $GITHUB_ENV
...
rm -rf $(pwd)/.docker-custom
```
Such copy-paste is prone to errors; for example, in one case, instead of
`$(pwd)/.docker-custom`, we use `/tmp/.docker-custom`, which is shared
between workflows.
## Summary of changes
- Create a new action `actions/set-docker-config-dir`, which sets
`DOCKER_CONFIG` and deletes it in a Post action part
Noticed this while debugging a test failure in #8673 which only occurs
with real S3 instead of mock S3: if you authenticate to S3 via
`AWS_PROFILE`, then it requires the `HOME` env var to be set so that it
can read inside the `~/.aws` directory.
The scrubber abstraction `StorageScrubber::scrubber_cli` in
`neon_fixtures.py` would otherwise not work. My earlier PR #6556 has
done similar things for the `neon_local` wrapper.
You can try:
```
aws sso login --profile dev
export ENABLE_REAL_S3_REMOTE_STORAGE=y REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests REMOTE_STORAGE_S3_REGION=eu-central-1 AWS_PROFILE=dev
RUST_BACKTRACE=1 BUILD_TYPE=debug DEFAULT_PG_VERSION=16 ./scripts/pytest -vv --tb=short -k test_scrubber_tenant_snapshot
```
before and after this patch: this patch fixes it.
## Problem
This page had many dead links, and was confusing for folks looking for
documentation about our product.
Closes: https://github.com/neondatabase/neon/issues/8535
## Summary of changes
- Add a link to the product docs up top
- Remove dead/placeholder links
## Problem
We install and try to use `cachepot`. But it is not configured correctly
and doesn't work (after https://github.com/neondatabase/neon/pull/2290)
## Summary of changes
- Remove `cachepot`
## Problem
Migrations of tenant shards with cold secondaries are holding up drains
during production deployments.
## Summary of changes
If a secondary location is lagging by more than 256MiB (configurable,
but that's the default), then skip cutting over to it as part of the node drain.
## Problem
This type of error can happen during shutdown & was triggering a circuit
breaker alert.
## Summary of changes
- Map NotInitialized::Stopped to CompactionError::ShuttingDown, so that
we may handle it cleanly
## Problem
Azure login fails in `pin-build-tools-image` workflow because the job
doesn't have the required permissions.
```
Error: Please make sure to give write permissions to id-token in the workflow.
Error: Login failed with Error: Error message: Unable to get ACTIONS_ID_TOKEN_REQUEST_URL env variable. Double check if the 'auth-type' is correct. Refer to https://github.com/Azure/login#readme for more information.
```
## Summary of changes
- Add `id-token: write` permission to `pin-build-tools-image`
- Add an input to force image tagging
- Unify pushing to Docker Hub with other registries
- Split the job into two to have less if's
part of https://github.com/neondatabase/neon/issues/8653
Disable the create tablespace statement. It turns out it requires much less
effort to add the regress test mode flag than to patch the test cases,
and given that we might need to support tablespaces in the future, I
decided to add a new flag `regress_test_mode` to change the behavior of
create tablespace.
Tested manually that without setting regress_test_mode, create
tablespace will be rejected.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
This reverts #8076 - which was already reverted from the release branch
since forever (it would have been a breaking change to release for all
users who currently set TimeZone options). It's causing conflicts now so
we should revert it here as well.
## Problem
Latency from one cloud provider to another one is higher than within the
same cloud provider.
Some of our benchmarks are latency sensitive - we run a pgbench or psql
in the github action runner and the system under test is running in Neon
(database project).
For realistic perf tps and latency results we need to compare apples to
apples and run the database client in the same "latency distance" for
all tests.
## Summary of changes
Move job steps that test Neon databases deployed on Azure into Azure
action runners.
- bench strategy variant using azure database
- pgvector strategy variant using azure database
- pgbench-compare strategy variants using azure database
## Test run
https://github.com/neondatabase/neon/actions/runs/10314848502
## Problem
We're adding more third party dependencies to support more diverse +
realistic test cases in `test_runner/logical_repl`. I ❤️ these
tests, they are a good thing.
The slight glitch is that python packaging is hard, and some third party
python packages have issues. For example the current kafka dependency
doesn't work on latest python. We can mitigate that by only importing
these more specialized dependencies in the tests that use them.
## Summary of changes
- Move the `kafka` import into a test body, so that folks running the
regular `test_runner/regress` tests don't have to have a working kafka
client package.
## Problem
This code was to mitigate risk in
https://github.com/neondatabase/neon/pull/8427
As expected, we did not hit this code path - the new continuous updates
of gc_info are working fine, we can remove this code now.
## Summary of changes
- Remove block that double-checks retain_lsns
avoid "leaking" the completions of BackgroundPurges by:
1. switching it to TaskTracker for provided close+wait
2. stop using tokio::fs::remove_dir_all which will consume two units of
memory instead of one blocking task
Additionally, use a more graceful shutdown in tests which actually do some
background cleanup.
## Problem
See
https://neondb.slack.com/archives/C03QLRH7PPD/p1723038557449239?thread_ts=1722868375.476789&cid=C03QLRH7PPD
Logical replication subscriptions by default use `synchronous_commit=off`,
which causes problems with the safekeeper
## Summary of changes
Set `synchronous_commit=on` for logical replication subscription in
test_subscriber_restart.py
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
Some developers build on MacOS, which doesn't have io_uring.
## Summary of changes
- Add `io_engine_for_bench`, which on linux will give io_uring or panic
if it's unavailable, and on MacOS will always panic.
We do not want to run such benchmarks with StdFs: the results aren't
interesting, and will actively waste the time of any developers who
start investigating performance before they realize they're using a
known-slow I/O backend.
Why not just conditionally compile this benchmark on linux only? Because
even on linux, I still want it to refuse to run if it can't get
io_uring.
Part of #8130, [RFC: Direct IO For Pageserver](https://github.com/neondatabase/neon/blob/problame/direct-io-rfc/docs/rfcs/034-direct-io-for-pageserver.md)
## Description
Add pageserver config for evaluating/enabling direct I/O.
- Disabled: current default, uses buffered io as is.
- Evaluate: still uses buffered io, but could do alignment checking and
perf simulation (pad latency by direct io RW to a fake file).
- Enabled: uses direct io, behavior on alignment error is configurable.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
We've noticed increased memory usage with the latest release. Drain the
joinset of `page_service` connection handlers to avoid leaking them
until shutdown. An alternative would be to use a TaskTracker.
TaskTracker was not discussed in the original PR #8339 review, so we're not
hotfixing it here either.
Earlier I was thinking we'd need a (ancestor_lsn, timeline_id) ordered
list of reparented. Turns out we did not need it at all. Replace it with
an unordered hashset. Additionally refactor the reparented direct
children query out, it will later be used from more places.
Split off from #8430.
Cc: #6994
Ephemeral files clean up on drop but did not delay shutdown, leading to
problems with restarting the tenant. The solution is as proposed:
- make ephemeral files carry the gate guard to delay `Timeline::gate`
closing
- flush in-memory layers and strong references to those on
`Timeline::shutdown`
The above are realized by making LayerManager an `enum` with `Open` and
`Closed` variants, and fail requests to modify `LayerMap`.
Additionally:
- fix too eager anyhow conversions in compaction
- unify how we freeze layers and handle errors
- optimize likely_resident_layers to read LayerFileManager hashmap
values instead of bouncing through LayerMap
Fixes: #7830
## Problem
1. Hard to correlate startup parameters with the endpoint that provided
them.
2. Some configurations are not needed in the `ProxyConfig` struct.
## Summary of changes
Because of some borrow checker fun, I needed to switch to an
interior-mutability implementation of our `RequestMonitoring` context
system. Using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock
for such a use-case (needed to be thread safe).
Removed the log of each startup message; instead, just log the
startup params on a successful handshake.
Also removed some values from `ProxyConfig` and kept them as arguments
(needed for the local-proxy config).
Timeline cancellation running in parallel with gc yields error log lines
like:
```
Gc failed 1 times, retrying in 2s: TimelineCancelled
```
They are completely harmless and normal to occur, though. Therefore, only
print those messages at info level. We still print them at all so that
we know what is going on if we focus on a single timeline.
Part of #8128.
## Problem
Currently, scrubber `scan_metadata` command will return with an error
code if the metadata on remote storage is corrupted with fatal errors.
To safely deploy this command in a cronjob, we want to differentiate
between failures while running scrubber command and the erroneous
metadata. At the same time, we also want our regression tests to catch
corrupted metadata using the scrubber command.
## Summary of changes
- Return with error code only when the scrubber command fails
- Uses explicit checks on errors and warnings to determine metadata
health in regression tests.
**Resolve conflict with `tenant-snapshot` command (after shard split):**
[`test_scrubber_tenant_snapshot`](https://github.com/neondatabase/neon/blob/yuchen/scrubber-scan-cleanup-before-prod/test_runner/regress/test_storage_scrubber.py#L23)
failed before applying 422a8443dd
- When taking a snapshot, the old `index_part.json` in the unsharded
tenant directory is not kept.
- The current `list_timeline_blobs` implementation considers a missing
`index_part.json` a parse error.
- During the scan, we are only analyzing shards with the highest shard
count, so we will not get a parse error, but we do need to add the
layers to the tenant object listing, otherwise we will get an "index is
referencing a layer that is not in remote storage" error.
- **Action:** Add s3_layers from `list_timeline_blobs` regardless of
parsing errors
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
We lack a rust bench for the inmemory layer and delta layer write paths:
it is useful to benchmark these components independent of postgres & WAL
decoding.
Related: https://github.com/neondatabase/neon/issues/8452
## Summary of changes
- Refactor DeltaLayerWriter to avoid carrying a Timeline, so that it can
be cleanly tested + benched without a Tenant/Timeline test harness. It
only needed the Timeline for building `Layer`, so this can be done in a
separate step.
- Add `bench_ingest`, which exercises a variety of workload "shapes"
(big values, small values, sequential keys, random keys)
- Include a small uncontroversial optimization: in `freeze`, only
exhaustively walk values to assert ordering relative to end_lsn in debug
mode.
These benches are limited by drive performance on a lot of machines, but
still useful as a local tool for iterating on CPU/memory improvements
around this code path.
Anecdotal measurements on Hetzner AX102 (Ryzen 7950xd):
```
ingest-small-values/ingest 128MB/100b seq
time: [1.1160 s 1.1230 s 1.1289 s]
thrpt: [113.38 MiB/s 113.98 MiB/s 114.70 MiB/s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Benchmarking ingest-small-values/ingest 128MB/100b rand: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 18.9s.
ingest-small-values/ingest 128MB/100b rand
time: [1.9001 s 1.9056 s 1.9110 s]
thrpt: [66.982 MiB/s 67.171 MiB/s 67.365 MiB/s]
Benchmarking ingest-small-values/ingest 128MB/100b rand-1024keys: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 11.0s.
ingest-small-values/ingest 128MB/100b rand-1024keys
time: [1.0715 s 1.0828 s 1.0937 s]
thrpt: [117.04 MiB/s 118.21 MiB/s 119.46 MiB/s]
ingest-small-values/ingest 128MB/100b seq, no delta
time: [425.49 ms 429.07 ms 432.04 ms]
thrpt: [296.27 MiB/s 298.32 MiB/s 300.83 MiB/s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
ingest-big-values/ingest 128MB/8k seq
time: [373.03 ms 375.84 ms 379.17 ms]
thrpt: [337.58 MiB/s 340.57 MiB/s 343.13 MiB/s]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
ingest-big-values/ingest 128MB/8k seq, no delta
time: [81.534 ms 82.811 ms 83.364 ms]
thrpt: [1.4994 GiB/s 1.5095 GiB/s 1.5331 GiB/s]
Found 1 outliers among 10 measurements (10.00%)
```
## Problem
Sometimes, a layer is Covered but hasn't yet been evicted from local disk
(e.g. shortly after image layer generation). It is not good use of
resources to download these to a secondary location, as there's a good
chance they will never be read.
This follows the previous change that added layer visibility:
- #8511
Part of epic:
- https://github.com/neondatabase/neon/issues/8398
## Summary of changes
- When generating heatmaps, only include Visible layers
- Update test_secondary_downloads to filter to visible layers when
listing layers from an attached location
## Problem
In staging, we could see that occasionally tenants were wrapping their
pageserver_visible_physical_size metric past zero to 2^64.
This is harmless right now, but will matter more later when we start
using visible size in things like the /utilization endpoint.
## Summary of changes
- Add debug asserts that detect this case. `test_gc_of_remote_layers`
works as a reproducer for this issue once the asserts are added.
- Tighten up the interface around access_stats so that only Layer can
mutate it.
- In Layer, wrap calls to `record_access` in code that will update the
visible size statistic if the access implicitly marks the layer visible
(this was what caused the bug)
- In LayerManager::rewrite_layers, use the proper set_visibility layer
function instead of directly using access_stats (this is an additional
path where metrics could go bad.)
- Removed unused instances of LayerAccessStats in DeltaLayer and
ImageLayer which I noticed while reviewing the code paths that call
record_access.
## Problem
The controller scale test does random migrations. These mutate secondary
locations, and therefore can cause secondary optimizations to happen in
the background, violating the test's expectation that consistency_check
will work as there are no reconciliations running.
Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/10247161379/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/6316beacd3fb3060/
## Summary of changes
- Only migrate to existing secondary locations, not randomly picked
nodes, so that we can do a fast reconcile_until_idle (otherwise
reconcile_until_idle takes a long time to create new secondary
locations).
- Do a reconcile_until_idle before consistency_check.
## Problem
We need to test the logical replication with some external consumers.
## Summary of changes
A test of the logical replication with Debezium as a consumer was added.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
We don't use it for packaging, and 'poetry install' will soon error
otherwise. Also remove name and version fields as these are not required for
non-packaging mode.
#8600 missed the hunk changing the index_part.json informative version.
Include it in this PR; in addition, add more non-warning index_part.json
versions to the scrubber.
## Problem
We have been maintaining two read paths (legacy and vectored) for a
while now. The legacy read-path was only used for cross validation in some tests.
## Summary of changes
* Tweak all tests that were using the legacy read path to use the
vectored read path instead
* Remove the read path dispatching based on the pageserver configs
* Remove the legacy read path code
We will be able to remove the single blob io code in
`pageserver/src/tenant/blob_io.rs` when https://github.com/neondatabase/neon/issues/7386 is complete.
Closes https://github.com/neondatabase/neon/issues/8005
Currently, we do not have facilities to persistently block GC on a
tenant for whatever reason. We could do a tenant configuration update,
but that is risky for generation numbers and would also be transient.
Introduce a `gc_block` facility in the tenant, which manages per
timeline blocking reasons.
Additionally, add HTTP endpoints for enabling/disabling manual gc
blocking for a specific timeline. For debugging, individual tenant
status now includes a similar string representation logged when GC is
skipped.
Cc: #6994
Add dry-run mode that does not produce any image layer + delta layer. I
will use this code to do some experiments and see how much space we can
reclaim for tenants on staging. Part of
https://github.com/neondatabase/neon/issues/8002
* Add dry-run mode that runs the full compaction process without
updating the layer map. (We never call finish on the writers and the
files will be removed before exiting the function).
* Add compaction statistics and print them at the end of compaction.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
> Currently, long-running LR tests recreate endpoints every night. We'd
like to have a long-running buildup of history to exercise the pageserver
in this case (instead of "unit-testing" the same behavior every night).
Closes #8317
## Summary of changes
- Update Postgres version for replication tests
- Set `BENCHMARK_PROJECT_ID_PUB`/`BENCHMARK_PROJECT_ID_SUB` env vars to
projects that were created for this purpose
---------
Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>
Currently if `GET
/v1/tenant/x/timeline/y?force-await-initial-logical-size=true` is
requested for a root timeline created within the current pageserver
session, the request handler panics hitting the debug assertion. These
timelines will always have an accurate (at initdb import) calculated
logical size. The fix is to never attempt prioritizing timeline size
calculation if we already have an exact value.
Split off from #8528.
## Problem
In some cases, a deadlock between `build-and-test` and
`trigger-e2e-tests` workflows can happen:
```
Build and Test
Canceling since a deadlock for concurrency group 'Build and Test-8600/merge-anysha' was detected between 'top level workflow' and 'trigger-e2e-tests'
```
I don't understand the reason completely, probably `${{ github.workflow
}}` got evaluated to the same value and somehow caused the issue.
We don't need to limit concurrency for `trigger-e2e-tests`
workflow.
See
https://neondb.slack.com/archives/C059ZC138NR/p1722869486708179?thread_ts=1722869027.960029&cid=C059ZC138NR
## Problem
We don't trigger e2e tests for draft PRs, but we do trigger them once a
PR is in the "Ready for review" state.
Sometimes, a PR can be marked as "Ready for review" before we finish
image building. In such cases, triggering e2e tests fails.
## Summary of changes
- Make `trigger-e2e-tests` job poll status of `promote-images` job from
the build-and-test workflow for the last commit. And trigger only if the
status is `success`
- Remove explicit image checking from the workflow
- Add `concurrency` for the `trigger-e2e-tests` workflow to make it
possible to cancel jobs in progress (if PR moves from "Draft" to "Ready
for review" several times in a row)
## Problem
PR #7992 was merged without corresponding changes in the Postgres submodules,
which is why test_oid_overflow.py is failing now.
## Summary of changes
Bump Postgres versions
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
There is an unused safekeeper option `partial_backup_enabled`.
`partial_backup_enabled` was implemented in #6530, but this option was
made always enabled in #8022.
If you intended to keep this option for a specific reason, I will close
this PR.
## Summary of changes
I removed an unused safekeeper option `partial_backup_enabled`.
part of https://github.com/neondatabase/neon/issues/8002
## Summary of changes
Add a `SplitImageWriter` that automatically splits image layers based on
the estimated target image layer size. This does not consider compression
and we might need a better metric.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
We need both compaction and gc lock for gc-compaction. The lock order
should be the same everywhere, otherwise there could be a deadlock where
A waits for B and B waits for A.
We also had a double-lock issue: the compaction lock gets acquired in
the outer `compact` function. Note that the unit tests call
`compact_with_gc` directly, and therefore do not trigger the issue.
## Summary of changes
Ensure all places acquire the compaction lock and then the gc lock. Remove an
extra compaction lock acquisition.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Currently, our backward compatibility tests only look one release back.
That means, for example, that when we switch on image layer compression
by default, we'll test reading of uncompressed layers for one release,
and then stop doing it. When we make an index_part.json format change,
we'll test against the old format for a week, then stop (unless we write
separate unit tests for each old format).
The reality in the field is that data in old formats will continue to
exist for weeks/months/years. When we make major format changes, we
should retain examples of the old format data, and continuously verify
that the latest code can still read them.
This test uses contents from a new path in the public S3 bucket,
`compatibility-data-snapshots/`. It is populated by hand. The first
important artifact is one from before we switch on compression, so that
we will keep testing reads of uncompressed data. We will generate more
artifacts ahead of other key changes, like when we update remote storage
format for archival timelines.
Closes: https://github.com/neondatabase/cloud/issues/15576
This commit tries to fix regular load spikes on staging, caused by too
many eviction and partial upload operations running at the same time.
Usually it was happening after a restart; for partial backup the load was
delayed.
- Add a semaphore for evictions (2 permits by default)
- Rename `resident_since` to `evict_not_before` and smooth out the curve
by using random duration
- Use random duration in partial uploads as well
related to https://github.com/neondatabase/neon/issues/6338
some discussion in
https://neondb.slack.com/archives/C033RQ5SPDH/p1720601531744029
Makes `flush_frozen_layer` add a barrier to the upload queue and makes
it wait for that barrier to be reached until it lets the flushing be
completed.
This gives us backpressure and ensures that writes can't build up in an
unbounded fashion.
Fixes #7317
Chaos injection bridges the gap between automated testing (where we do
lots of different things with small, short-lived tenants), and staging
(where we do many fewer things, but with larger, long-lived tenants).
This PR adds a first type of chaos which isn't really very chaotic: it's
live migration of tenants between healthy pageservers. This nevertheless
provides continuous checks that things like clean, prompt shutdown of
tenants works for realistically deployed pageservers with realistically
large tenants.
## Problem
Previously, when we do a timeline deletion, shards will delete layers
that belong to an ancestor. That is not a correctness issue, because
when we delete a timeline, we're always deleting it from all shards, and
destroying data for that timeline is clearly fine.
However, there exists a race where one shard might start doing this
deletion while another shard has not yet received the deletion request,
and might try to access an ancestral layer. This creates ambiguity over
the "all layers referenced by my index should always exist" invariant,
which is important to detecting and reporting corruption.
Now that we have a GC mode for clearing up ancestral layers, we can rely
on that to clean up such layers, and avoid deleting them right away.
This makes things easier to reason about: there are now no cases where a
shard will delete a layer that belongs to a ShardIndex other than
itself.
## Summary of changes
- Modify behavior of RemoteTimelineClient::delete_all
- Add `test_scrubber_physical_gc_timeline_deletion` to exercise this
case
- Tweak AWS SDK config in the scrubber to enable retries. Motivated by
seeing the test for this feature encounter some transient "service
error" S3 errors (which are probably nothing to do with the changes in
this PR)
## Problem
`allure_attach_from_dir` method might create `tar.zst` archives even
if `--alluredir` is not set (i.e. Allure results collection is disabled)
## Summary of changes
- Don't run `allure_attach_from_dir` if `--alluredir` is not set
part of https://github.com/neondatabase/neon/issues/8002
Due to the limitation of the current layer map implementation, we cannot
directly replace a layer. It's interpreted as an insert and a deletion,
and there will be a "file exists" error when renaming the newly-created layer
to replace the old layer. We work around that by changing the end key of
the image layer. A long-term fix would involve a refactor around the
layer file naming. For delta layers, we simply skip layers with the same
key range produced, though it is possible to add an extra key as an
alternative solution.
* The image layer range for the layers generated from gc-compaction will
be Key::MIN..(Key::MAX-1), to avoid being recognized as an L0 delta
layer.
* Skip existing layers if it turns out that we need to generate a layer
with the same persistent key in the same generation.
Note that it is possible that the newly-generated layer has different
content from the existing layer. For example, when the user drops a
retain_lsn, the compaction could have combined or dropped some records,
therefore creating a smaller layer than the existing one. We discard the
"optimized" layer for now because we cannot deal with such rewrites
within the same generation.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
We recently added a "visibility" state to layers, but nothing
initializes it.
Part of:
- #8398
## Summary of changes
- Add a dependency on `range-set-blaze`, which is used as a fast
incrementally updated alternative to KeySpace. We could also use this to
replace the internals of KeySpaceRandomAccum if we wanted to. Writing a
type that does this kind of "BtreeMap & merge overlapping entries" thing
isn't super complicated, but no reason to write this ourselves when
there's a third party impl available.
- Add a function to layermap to calculate visibilities for each layer
- Add a function to Timeline to call into layermap and then apply these
visibilities to the Layer objects.
- Invoke the calculation during startup, after image layer creations,
and when removing branches. Branch removal and image layer creation are
the two ways that a layer can go from Visible to Covered.
- Add unit test & benchmark for the visibility calculation
- Expose `pageserver_visible_physical_size` metric, which should always
be <= `pageserver_remote_physical_size`.
- This metric will feed into the /v1/utilization endpoint later: the
visible size indicates how much space we would like to use on this
pageserver for this tenant.
- When `pageserver_visible_physical_size` is greater than
`pageserver_resident_physical_size`, this is a sign that the tenant has
long-idle branches, which result in layers that are visible in
principle, but not used in practice.
This does not keep visibility hints up to date in all cases:
particularly, when creating a child timeline, any previously covered
layers will not get marked Visible until they are accessed.
Updates after image layer creation could be implemented as more of a
special case, but this would require more new code: the existing depth
calculation code doesn't maintain+yield the list of deltas that would be
covered by an image layer.
## Performance
This operation is done rarely (at startup and at timeline deletion), so
needs to be efficient but not ultra-fast.
There is a new `visibility` bench that measures runtime for a synthetic
100k layers case (`sequential`) and a real layer map (`real_map`) with
~26k layers.
The benchmark shows runtimes of single-digit milliseconds (on a Ryzen
7950). This confirms that the runtime shouldn't be a problem at startup
(as we already incur S3-level latencies there), but it's slow enough that
we definitely shouldn't call it more often than necessary, and it may be
worthwhile to optimize further later (e.g. when removing a branch, only
scan layers below the branchpoint).
```
visibility/sequential time: [4.5087 ms 4.5894 ms 4.6775 ms]
change: [+2.0826% +3.9097% +5.8995%] (p = 0.00 < 0.05)
Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
2 (2.00%) high mild
22 (22.00%) high severe
min: 0/1696070, max: 93/1C0887F0
visibility/real_map time: [7.0796 ms 7.0832 ms 7.0871 ms]
change: [+0.3900% +0.4505% +0.5164%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
min: 0/1696070, max: 93/1C0887F0
visibility/real_map_many_branches
time: [4.5285 ms 4.5355 ms 4.5434 ms]
change: [-1.0012% -0.8004% -0.5969%] (p = 0.00 < 0.05)
Change within noise threshold.
```
Before, we had four versions of linux-raw-sys in our dependency graph:
```
linux-raw-sys@0.1.4
linux-raw-sys@0.3.8
linux-raw-sys@0.4.13
linux-raw-sys@0.6.4
```
Now it's only two:
```
linux-raw-sys@0.4.13
linux-raw-sys@0.6.4
```
The changes in this PR are minimal. To get to this state, one only has
to update `procfs` to 0.16 in `Cargo.toml` and run `cargo update -p
tempfile -p is-terminal -p prometheus`.
# Motivation
The working theory for hung systemd during PS deploy
(https://github.com/neondatabase/cloud/issues/11387) is that leftover
walredo processes trigger a race condition.
In https://github.com/neondatabase/neon/pull/8150 I arranged that a
clean Tenant shutdown does actually kill its walredo processes.
But many prod machines don't manage to shut down all their tenants before
the 10s systemd timeout hits and, presumably, triggers the race
condition in systemd / the Linux kernel that causes systemd to freeze.
# Solution
This PR bolts on a rather ugly mechanism to shut down walredo managers
out of order, 8s after we've received the SIGTERM from systemd.
# Changes
- add a global registry of `Weak<WalRedoManager>`
- add a special thread spawned during `shutdown_pageserver` that sleeps
for 8s, then shuts down all redo managers in the registry and prevents
new redo managers from being created
- propagate the new failure mode of tenant spawning throughout the code
base
- make sure a shut-down walredo manager results in
`PageReconstructError::Cancelled`, so that `Timeline::get` calls arriving
after the shutdown do the right thing
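A minimal sketch of the registry-plus-delayed-shutdown mechanism described above, with hypothetical names (the real registry and manager types differ):
```rust
use std::sync::{Mutex, Weak};
use std::time::Duration;

/// Hypothetical stand-in for the real walredo manager.
struct WalRedoManager;
impl WalRedoManager {
    fn shutdown(&self) { /* kill the walredo child process */ }
}

/// Global registry of weak refs; `None` once late shutdown has begun, so
/// no new managers can be registered afterwards.
static REGISTRY: Mutex<Option<Vec<Weak<WalRedoManager>>>> = Mutex::new(Some(Vec::new()));

/// The "8s after SIGTERM" escape hatch: a detached thread that waits out
/// the grace period, then force-shuts every manager that is still alive.
fn spawn_delayed_walredo_shutdown() {
    std::thread::spawn(|| {
        std::thread::sleep(Duration::from_secs(8));
        // Taking the Vec also flips the registry into its "closed" state.
        let managers = REGISTRY.lock().unwrap().take().unwrap_or_default();
        for weak in managers {
            if let Some(mgr) = weak.upgrade() {
                mgr.shutdown();
            }
        }
    });
}
```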
## Problem
In https://github.com/neondatabase/neon/pull/8241 I've accidentally
removed `create-test-report` dependency on `benchmarks` job
## Summary of changes
- Run `create-test-report` after `benchmarks` job
Uses the newly added APIs from #8541 named `stream_tenants_generic` and
`stream_objects_with_retries` and extends them with
`list_objects_with_retries_generic` and
`stream_tenant_timelines_generic` to migrate the `find-garbage` command
of the scrubber to `GenericRemoteStorage`.
Part of https://github.com/neondatabase/neon/issues/7547
## Problem
This code was confusing, untested, and covered:
- an impossible case, where the intent state is AttachedStale (we never
do this)
- a rare edge case (going from AttachedMulti to Attached), which we were
not testing, and in any case the pageserver internally does the same
Tenant reset in this transition as it would do if we incremented
generation.
Closes: https://github.com/neondatabase/neon/issues/8367
## Summary of changes
- Simplify the logic to only skip incrementing the generation if the
location already has the expected generation and the exact same mode.
In some cases, we can get a negative metric for replication_delay_bytes.
My best guess from all the research I've done is that we evaluate
pg_last_wal_receive_lsn() before pg_last_wal_replay_lsn(), and that by
the time everything is said and done, the replay LSN has advanced past
the receive LSN. In this case, our lag can effectively be modeled as
0 due to the speed of the WAL reception and replay.
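Conceptually, the fix is just to clamp the difference at zero. A sketch with plain integers standing in for LSNs:
```rust
/// If replay has overtaken receive between the two reads, report zero lag
/// instead of a negative (or wrapped) value.
fn replication_delay_bytes(receive_lsn: u64, replay_lsn: u64) -> u64 {
    receive_lsn.saturating_sub(replay_lsn)
}
```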
Since the introduction of sharding, the protocol handling loop in
`handle_pagerequests` cannot know anymore which concrete
`Tenant`/`Timeline` object any of the incoming `PagestreamFeMessage`
resolves to.
In fact, one message might resolve to one `Tenant`/`Timeline` while
the next one may resolve to another one.
To avoid going to the tenant manager, we added `shard_timelines`, which
acted as an ever-growing cache that held timeline gate guards open for
the lifetime of the connection.
The consequence of holding the gate guards open was that we had to be
sensitive to every cached `Timeline::cancel` on each interaction with
the network connection, so that Timeline shutdown would not have to wait
for network connection interaction.
We can do better than that, meaning more efficiency & better
abstraction.
I proposed a sketch for it in
* https://github.com/neondatabase/neon/pull/8286
and this PR implements an evolution of that sketch.
The main idea is that `mod page_service` shall be solely concerned
with the following:
1. receiving requests by speaking the protocol / pagestream subprotocol
2. dispatching the request to a corresponding method on the correct
shard/`Timeline` object
3. sending response by speaking the protocol / pagestream subprotocol.
The cancellation sensitivity responsibilities are clear cut:
* while in `page_service` code, sensitivity to page_service cancellation
is sufficient
* while in `Timeline` code, sensitivity to `Timeline::cancel` is
sufficient
To enforce these responsibilities, we introduce the notion of a
`timeline::handle::Handle` to a `Timeline` object that is checked out
from a `timeline::handle::Cache` for **each request**.
The `Handle` derefs to `Timeline` and is supposed to be used for a
single async method invocation on `Timeline`.
See the lengthy doc comment in `mod handle` for details of the design.
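For a rough feel of the checkout pattern, here is a drastically simplified sketch; this is not the actual API, and the real `Cache`/`Handle` also deal with gate guards, shard routing, and cancellation:
```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::{Arc, Weak};

/// Stand-in for the real pageserver Timeline.
struct Timeline {
    id: u32,
}

/// A Handle keeps the Timeline usable for exactly one request; it derefs
/// to the Timeline and is dropped as soon as the request is answered.
struct Handle(Arc<Timeline>);

impl Deref for Handle {
    type Target = Timeline;
    fn deref(&self) -> &Timeline {
        &self.0
    }
}

/// Per-connection cache. It holds only weak references, so dropping the
/// per-request Handles is enough to let a Timeline shut down without
/// waiting on the network connection.
#[derive(Default)]
struct Cache {
    map: HashMap<u32, Weak<Timeline>>,
}

impl Cache {
    fn get(&mut self, id: u32, resolve: impl Fn(u32) -> Arc<Timeline>) -> Handle {
        if let Some(tl) = self.map.get(&id).and_then(Weak::upgrade) {
            return Handle(tl);
        }
        let tl = resolve(id); // e.g. go through the tenant manager
        self.map.insert(id, Arc::downgrade(&tl));
        Handle(tl)
    }
}
```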
part of https://github.com/neondatabase/neon/issues/8002
For child branches, we will pull the image of the modified keys from the
parent into the child branch, which creates a full history for
generating key retention. If there are not enough delta keys, the image
will eventually not be written, and we will only keep the deltas inside
the child branch. In the future, we could avoid the wasteful work of
pulling the image from the parent if we knew the number of deltas in
advance (currently we always pull the image for all modified keys in the
child branch).
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We run regression tests on `release` & `debug` builds for each of the
three supported Postgres versions (6 in total).
With upcoming ARM support and Postgres 17, the number of jobs will jump
to 16, which is a lot.
See the internal discussion here:
https://neondb.slack.com/archives/C033A2WE6BZ/p1722365908404329
## Summary of changes
- Run `regress-tests` job in debug builds only with the latest Postgres
version
- Do not do `debug` builds on release branches
part of https://github.com/neondatabase/neon/issues/8184
# Problem
We want to bypass PS PageCache for all data block reads, but
`compact_level0_phase1` currently uses `ValueRef::load` to load the WAL
records from delta layers.
Internally, that maps to `FileBlockReader::read_blk`, which hits the
PageCache
[here](e78341e1c2/pageserver/src/tenant/block_io.rs (L229-L236)).
# Solution
This PR adds a mode for `compact_level0_phase1` that uses the
`MergeIterator` for reading the `Value`s from the delta layer files.
`MergeIterator` is a streaming k-merge that uses vectored blob_io under
the hood, which bypasses the PS PageCache for data blocks.
Other notable changes:
* change the `DiskBtreeReader::into_stream` to buffer the node, instead
of holding a `PageCache` `PageReadGuard`.
* Without this, we run out of page cache slots in
`test_pageserver_compaction_smoke`.
* Generally, `PageReadGuard`s aren't supposed to be held across await
points, so, this is a general bugfix.
# Testing / Validation / Performance
`MergeIterator` has not yet been used in production; it's being
developed as part of
* https://github.com/neondatabase/neon/issues/8002
Therefore, this PR adds a validation mode that compares the existing
approach's value iterator with the new approach's stream output, item by
item.
If they're not identical, we log a warning / fail the unit/regression
test.
To avoid flooding the logs, we apply a global rate limit of once per 10
seconds.
In any case, we use the existing approach's value.
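A minimal sketch of what the validation mode does conceptually, with a generic item type standing in for the real values and without the rate limiting:
```rust
/// Compare the old and new value streams in lockstep; warn on mismatch
/// (the real code rate-limits this to once per 10 seconds) and always
/// keep the old approach's value.
fn validate<T: PartialEq + std::fmt::Debug>(
    old: impl IntoIterator<Item = T>,
    new: impl IntoIterator<Item = T>,
) -> Vec<T> {
    let mut new = new.into_iter();
    let mut out = Vec::new();
    for old_item in old {
        match new.next() {
            Some(ref new_item) if *new_item == old_item => {}
            other => eprintln!("validation mismatch: old={:?} new={:?}", old_item, other),
        }
        out.push(old_item); // in any case, use the existing approach's value
    }
    out
}
```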
Expected performance impact that will be monitored in staging / nightly
benchmarks / eventually pre-prod:
* with validation:
* increased CPU usage
* ~doubled VirtualFile read bytes/second metric
* no change in disk IO usage because the kernel page cache will likely
have the pages buffered on the second read
* without validation:
* slightly higher DRAM usage because each iterator participating in the
k-merge has a dedicated buffer (as opposed to before, where compactions
would rely on the PS PageCache as a shared evicting buffer)
* less disk IO if previously there were repeat PageCache misses (likely
case on a busy production Pageserver)
* lower CPU usage: PageCache out of the picture, fewer syscalls are made
(vectored blob io batches reads)
# Rollout
The new code is used with validation mode enabled by default.
This gets us validation everywhere by default, specifically in
- Rust unit tests
- Python tests
- Nightly pagebench (shouldn't really matter)
- Staging
Before the next release, I'll merge the following aws.git PR that
configures prod to continue using the existing behavior:
* https://github.com/neondatabase/aws/pull/1663
# Interactions With Other Features
This work & rollout should complete before Direct IO is enabled because
Direct IO would double the IOPS & latency for each compaction read
(#8240).
# Future Work
The streaming k-merge's memory usage is proportional to the amount of
memory per participating layer.
But `compact_level0_phase1` still loads all keys into memory for
`all_keys_iter`.
Thus, it continues to have active memory usage proportional to the
number of keys involved in the compaction.
Future work should replace `all_keys_iter` with a streaming keys
iterator.
This PR has a draft in its first commit, which I later reverted because
it's not necessary to achieve the goal of this PR / issue #8184.
Change Azure storage configuration to point to new variables/secrets. They have
the `_NEW` suffix in order not to disrupt any tests while we complete the
switch.
Part of #8128, followup to #8480. Closes #8421.
Enable scrubber to optionally post metadata scan health results to
storage controller.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Part of #8128, followed by #8502.
## Problem
Currently we lack a mechanism to alert on an unhealthy `scan_metadata`
status if we start running this scrubber command as part of a cronjob.
With the storage controller client introduced to the storage scrubber in
#8196, it is viable to set up alerting by storing health status in the
storage controller database.
We intentionally do not store the full output in the database, as the
JSON blobs could make the table really huge. Instead, we store only a
health status and a timestamp recording the last time a metadata health
status was posted for a tenant shard.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
This tests the ability to push into ACR using OIDC. Proved it worked by running a slightly modified YAML.
In `promote-images` we push the following images into `neoneastus2`: `neon`, `compute-tools`, and `{vm-,}compute-node-{v14,v15,v16}`.
https://github.com/neondatabase/cloud/issues/14640
## Problem
We don't allow regular end-users to use `k8s-pod` provisioner,
but we still use it in nightly benchmarks
## Summary of changes
- Remove `provisioner` input from `neon-create-project` action, use
`k8s-neonvm` as the default provisioner
- Change `neon-` platform prefix to `neonvm-`
- Remove `neon-captest-freetier` and `neon-captest-new` as we already
have their `neonvm` counterparts
Add two new functions `stream_objects_with_retries` and
`stream_tenants_generic` and use them in the `find-large-objects`
subcommand, migrating it to `remote_storage`.
Also adds the `size` field to the `ListingObject` struct.
Part of #7547
If compression is enabled, we currently try compressing each image
larger than a specific size; if the compressed version is smaller, we
write that one, otherwise we use the uncompressed image. However, this
might sometimes be a wasteful process, if there is a substantial number
of images that don't compress well.
The compression metrics added in #8420,
`pageserver_compression_image_in_bytes_total` and
`pageserver_compression_image_out_bytes_total`, are well designed for
answering the question of how space-efficient the total compression
process is end-to-end, which helps one decide whether to enable it or
not. To answer the question of how much waste there is in terms of trial
compression, i.e. CPU time, we add two metrics:
* one about the images that have been trial-compressed (considered), and
* one about the images where the compressed image has actually been
written (chosen).
There are different ways of weighting them; for example, one could look
at the count, or at the compressed data. But the main contributor to
compression CPU usage is the amount of data processed, so we weight the
images by their *uncompressed* size. In other words, the two metrics
are:
* `pageserver_compression_image_in_bytes_considered`
* `pageserver_compression_image_in_bytes_chosen`
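A sketch of where the two counters are bumped; the metric names match the ones above, but the surrounding types and the compressor call are stand-ins:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for a Prometheus counter.
struct Counter(AtomicU64);
impl Counter {
    fn inc_by(&self, v: u64) { self.0.fetch_add(v, Ordering::Relaxed); }
}

/// Stand-in for the real zstd call; any compressor works for the sketch.
fn compress(data: &[u8]) -> Vec<u8> { data.to_vec() }

/// Both counters are weighted by the image's *uncompressed* size, so their
/// ratio tells us how much trial-compression CPU work actually paid off.
fn write_image(image: &[u8], min_size: usize, considered: &Counter, chosen: &Counter) -> Vec<u8> {
    if image.len() < min_size {
        return image.to_vec(); // small images are not trial-compressed at all
    }
    considered.inc_by(image.len() as u64); // ..._in_bytes_considered
    let candidate = compress(image);
    if candidate.len() < image.len() {
        chosen.inc_by(image.len() as u64); // ..._in_bytes_chosen
        candidate
    } else {
        image.to_vec()
    }
}
```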
Part of #5431
## Problem
Old storage buckets can contain a lot of tenants that aren't known to
the control plane at all, because they belonged to test jobs that get
their control plane state cleaned up shortly after running.
In general, it's somewhat unsafe to purge these, as it's hard to
distinguish "control plane doesn't know about this, so it's garbage"
from "control plane said it didn't know about this, which is a bug in
the scrubber, control plane, or API URL configured".
However, the most common case is that we see only a small husk of a
tenant in S3 from a specific old behavior of the software, for example:
- We had a bug where heatmaps weren't deleted on tenant delete
- When WAL DR was first deployed, we didn't delete initdb.tar.zst on
tenant deletion
## Summary of changes
- Add a KnownBug variant for the garbage reason
- Include such cases in the "safe" deletion mode (`--mode=deleted`)
- Add code that inspects tenants missing in control plane to identify
cases of known bugs (this is kind of slow, but should go away once we've
cleaned all these up)
- Add an additional `-min-age` safety check similar to physical GC,
where even if everything indicates objects aren't needed, we won't
delete something that has been modified too recently.
---------
Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
The secondary download HTTP API is meant to return 200 if the download
is complete, and 202 if it is still in progress. In #8198 the download
implementation was changed to drop out with success early if it
over-runs a time budget, which resulted in 200 responses for incomplete
downloads.
This breaks storcon_cli's "tenant-warmup" command, which uses the OK
status to indicate download complete.
## Summary of changes
- Only return 200 if we get an Ok() _and_ the progress stats indicate
the download is complete.
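A sketch of the combined condition, with a hypothetical progress type:
```rust
/// Hypothetical progress stats; the fix is the combined condition below.
struct DownloadProgress {
    layers_total: usize,
    layers_downloaded: usize,
}

fn response_status(task_result: Result<(), String>, progress: &DownloadProgress) -> u16 {
    // A time-budget early exit also returns Ok, so Ok alone is not enough:
    // only report 200 when the progress stats agree the download finished.
    if task_result.is_ok() && progress.layers_downloaded == progress.layers_total {
        200
    } else {
        202
    }
}
```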
## Problem
We need to test logical replication with 3rd-party tools regularly.
## Summary of changes
Added a test using ClickHouse as a client
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Uses the Stream based `list_streaming` function added by #8457 in tenant
deletion, as suggested in https://github.com/neondatabase/neon/pull/7932#issuecomment-2150480180 .
We don't have to worry about retries, as the function is wrapped inside
an outer retry block. If there is a retryable error either during the
listing or during deletion, we just do a fresh start.
Also adds `+ Send` bounds as they are required by the
`delete_tenant_remote` function.
## Problem
After https://github.com/neondatabase/neon/pull/7990 `regress_test` job
started to fail with an error:
```
...
File "/__w/neon/neon/test_runner/fixtures/benchmark_fixture.py", line 485, in pytest_terminal_summary
terminalreporter.write(f"{test_report.head_line}.{recorded_property['name']}: ")
TypeError: 'bool' object is not subscriptable
```
https://github.com/neondatabase/neon/actions/runs/10125750938/job/28002582582
It happens because the current implementation doesn't expect that pytest's
`user_properties` can be used for anything other than benchmarks (and
https://github.com/neondatabase/neon/pull/7990 started to use it for
tracking the `preserve_database_files` parameter).
## Summary of changes
- Make NeonBenchmarker use only records with the `neon_benchmarker_` prefix
## Problem
There's a `NeonEnvBuilder#preserve_database_files` parameter that allows
you to keep database files for debugging purposes (by default, files get
cleaned up), but there's no way to get these files from a CI run.
This PR adds handling of `NeonEnvBuilder#preserve_database_files` and
adds the compressed test output directory to Allure reports (for tests
with this parameter enabled).
Ref https://github.com/neondatabase/neon/issues/6967
## Summary of changes
- Compress and add the whole test output directory to Allure reports
- Currently works only with `neon_env_builder` fixture
- Remove `preserve_database_files = True` from sharding tests as
unneeded
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
Persists whether a timeline is archived or not in `index_part.json`. We
only return success if the upload has actually worked successfully.
Also introduces a new `index_part.json` version number.
Fixes #8459
Part of #8088
close https://github.com/neondatabase/neon/issues/8435
## Summary of changes
If L0 compaction did not include all L0 layers, skip image generation.
There are multiple possible solutions to the original issue; an
alternative is to wrap the partial L0 compaction in a loop until it
compacts all L0 layers. However, considering that we should weight all
tenants equally, the current solution ensures everyone gets a chance to
run compaction, while tenants that write too much won't get a chance to
create image layers. This creates a natural backpressure feedback loop:
they get slower reads because no image layers are created, which slows
down their writes, and eventually compaction can keep up with their
writes and generate image layers.
For deployment, we should add an alert on "skipping image layer
generation", so that we don't run into incidents again because image
layers are not being generated.
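A sketch of the skip condition described above, with hypothetical names (the real code derives the flag from the L0 compaction pass and emits a proper rate-limited warning):
```rust
/// Hypothetical outcome flag returned by the L0 compaction pass.
struct CompactionOutcome {
    l0_fully_compacted: bool,
}

fn maybe_create_image_layers(outcome: &CompactionOutcome) {
    if !outcome.l0_fully_compacted {
        // The real code logs a warning here, which is what the proposed
        // "skipping image layer generation" alert would key off.
        eprintln!("skipping image layer generation: L0 compaction incomplete");
        return;
    }
    // ... repartition and create image layers ...
}
```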
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Problem
-------
wait_lsn timeouts result in user-facing errors like
```
$ /tmp/neon/pg_install/v16/bin/pgbench -s3424 -i -I dtGvp user=neondb_owner dbname=neondb host=ep-tiny-wave-w23owa37.eastus2.azure.neon.build sslmode=require options='-cstatement_timeout=0 '
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (server-side)...
vacuuming...
pgbench: error: query failed: ERROR: [NEON_SMGR] [shard 0] could not read block 214338 in rel 1663/16389/16839.0 from page server at lsn C/E1C12828
DETAIL: page server returned error: LSN timeout: Timed out while waiting for WAL record at LSN C/E1418528 to arrive, last_record_lsn 6/999D9CA8 disk consistent LSN=6/999D9CA8, WalReceiver status: (update 2024-07-25 08:30:07): connecting to node 25, safekeeper candidates (id|update_time|commit_lsn): [(21|08:30:16|C/E1C129E0), (23|08:30:16|C/E1C129E0), (25|08:30:17|C/E1C129E0)]
CONTEXT: while scanning block 214338 of relation "public.pgbench_accounts"
pgbench: detail: Query was: vacuum analyze pgbench_accounts
```
Solution
--------
It's better to be slow than to fail the queries.
If the app has a deadline, it can use `statement_timeout`.
In the long term, we want to eliminate wait_lsn timeout.
In the short term (this PR), we bump the wait_lsn timeout to
a larger value to reduce the frequency at which these wait_lsn timeouts
occur.
We will observe SLOs and specifically
`pageserver_wait_lsn_seconds_bucket`
before we eliminate the timeout completely.
## Problem
We are missing the step-down primitive required to implement rolling
restarts of the storage controller.
## Summary of changes
Add `/control/v1/step_down` endpoint which puts the storage controller
into a state where it rejects
all API requests apart from `/control/v1/step_down`, `/status` and
`/metrics`. When receiving the request,
storage controller cancels all pending reconciles and waits for them to
exit gracefully. The response contains
a snapshot of the in-memory observed state.
Related:
* https://github.com/neondatabase/cloud/issues/14701
* https://github.com/neondatabase/neon/issues/7797
* https://github.com/neondatabase/neon/pull/8310
## Problem
Vectored get is already enabled in all prod regions without validation.
The pageserver defaults
are out of sync however.
## Summary of changes
Update the pageserver defaults to match the prod config. This also means
that when running tests locally, people don't have to use env vars to
get the prod config.
## Problem
This is an experiment to see if 16x concurrency is actually helping, or
if it's just giving us very noisy results. If the total runtime with a
lower concurrency is similar, then a lower concurrency is preferable to
reduce the impact of resource-hungry tests running concurrently.
## Problem
This test relies on writing image layers before the split. It can fail
to do so durably if the image layers are written ahead of the remote
consistent LSN, so we should have been doing a checkpoint rather than
just a compaction
## Problem
The scrubber would like to check the highest mtime in a tenant's objects
as a safety check during purges. It recently switched to use
GenericRemoteStorage, so we need to expose that in the listing methods.
## Summary of changes
- In `Listing::keys`, return a `ListingObject` including a `last_modified`
field, instead of a `RemotePath`
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
follow up for #8475
## Summary of changes
Using own private docker registry in `cache-from` and `cache-to`
settings in docker build-push actions
There is a race condition between timeline shutdown and the split task.
Timeline shutdown first shuts down the upload queue, and only then fires
the cancellation token. A parallel running timeline split operation
might thus encounter a cancelled upload queue before the cancellation
token is fired, and print a noisy error.
Fix this by mapping an `anyhow::Error` wrapping `NotInitialized::ShuttingDown`
to `FlushLayerError::Cancelled` instead of `FlushLayerError::Other(_)`.
Fixes #8496
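A sketch of the mapping with simplified stand-in enums (the real code goes through `anyhow::Error` downcasting):
```rust
#[derive(Debug)]
enum NotInitialized { ShuttingDown, Uninitialized, Stopped }

#[derive(Debug)]
enum FlushLayerError { Cancelled, Other(String) }

/// A flush that fails because the upload queue is already shutting down
/// is an expected race with timeline shutdown, so it maps to `Cancelled`.
fn map_flush_error(err: NotInitialized) -> FlushLayerError {
    match err {
        NotInitialized::ShuttingDown => FlushLayerError::Cancelled,
        other => FlushLayerError::Other(format!("{other:?}")),
    }
}
```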
update pg_jsonschema extension to v0.3.1
update pg_graphql extension to v1.5.7
update pgx_ulid extension to v0.1.5
update pg_tiktoken extension, patch Cargo.toml to use new pgrx
This pull request (should) fix the failure of test_gc_feedback. See the
explanation in the newly-added test case.
Part of https://github.com/neondatabase/neon/issues/8002
Allow incomplete history for the compaction algorithm.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Storcon shutdown did not produce a clean observed state. This is not a
problem at the moment, but we will need to stop all reconciles with a
clean observed state for rolling restarts.
I tried to test this by collecting the observed state during shutdown
and comparing it with the in-memory observed
state, but it doesn't work because a lot of tests use the cursed attach
hook to create tenants directly through the ps.
## Summary of Changes
Rework storcon shutdown as follows:
* Reconcilers get a separate cancellation token which is a child token
of the global `Service::cancel`.
* Reconcilers get a separate gate
* Add a mechanism to drain the reconciler result queue before shutting down
* Put all of this together into a clean shutdown sequence
Related https://github.com/neondatabase/cloud/issues/14701
## Problem
This test was destabilized by
https://github.com/neondatabase/neon/pull/8431. The threshold is
arbitrary & failures are still quite close to it. At a high level the
test is asserting "eviction was approximately fair to these tenants",
which appears to still be the case when the abs diff between ratios is
slightly higher at ~0.06-0.07.
## Summary of changes
- Change threshold from 0.06 to 0.065. Based on the last ~10 failures
that should be sufficient.
## Problem
Currently, tests may have a scrub during teardown if they ask for it,
but most tests don't request it. To detect "unknown unknowns", let's run
it at the end of every test where possible. This is similar to asserting
that there are no errors in the log at the end of tests.
## Summary of changes
- Remove explicit `enable_scrub_on_exit`
- Always scrub if remote storage is an S3Storage.
## Problem
Re-attach blocks the pageserver http server from starting up. Hence, it
can't reply to heartbeats
until that's done. This makes the storage controller mark the node
off-line (not good). We worked
around this by setting the interval after which nodes are marked offline
to 5 minutes. This isn't a
long term solution.
## Summary of changes
* Introduce a new `NodeAvailability` state: `WarmingUp`. This state
models the following time interval:
* From receiving the re-attach request until the pageserver replies to
the first heartbeat post re-attach
* The heartbeat delta generator becomes aware of this state and uses a
separate longer interval
* Flag `max-warming-up-interval` now models the longer timeout and
`max-offline-interval` the shorter one to
match the names of the states
Closes https://github.com/neondatabase/neon/issues/7552
## Problem
The rds-aurora endpoint connection cannot be reached from GitHub action
runners.
Temporarily remove this DBMS from the pgbench comparison runs.
## Summary of changes
On Saturday we normally run Neon in comparison with AWS RDS-Postgres and
AWS RDS-Aurora.
Remove Aurora until we have a working setup
Before this PR:
1. The circuit breaker would trip on `CompactionError::Shutdown`. That's
wrong, we want to ignore those cases.
2. Remote timeline client shutdown would not be mapped to
`CompactionError::Shutdown` in all circumstances.
We observed this in staging, see
https://neondb.slack.com/archives/C033RQ5SPDH/p1721829745384449
This PR fixes (1) with a simple `match` statement, and (2) by switching
a bunch of `anyhow` usage over to distinguished errors that ultimately
get mapped to `CompactionError::Shutdown`.
I removed the implicit `#[from]` conversion from `anyhow::Error` to
`CompactionError::Other` to discover all the places that were mapping
remote timeline client shutdown to `anyhow::Error`.
In my opinion `#[from]` is an antipattern and we should avoid it,
especially for `anyhow::Error`. If some callee is going to return
anyhow, the very least the caller should do is acknowledge, through a
`map_err(MyError::Other)`, that they're conflating different failure
reasons.
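To illustrate the point with simplified stand-in error types: without a blanket `#[from] anyhow::Error`, each call site has to decide whether a failure is genuinely "Other" or actually a shutdown in disguise.
```rust
#[derive(Debug)]
enum UploadQueueError { ShuttingDown, Io(String) }

#[derive(Debug)]
enum CompactionError { ShuttingDown, Other(String) }

/// Hypothetical call site: the explicit map_err forces the shutdown case
/// to be handled instead of silently collapsing into `Other`.
fn schedule_upload(res: Result<(), UploadQueueError>) -> Result<(), CompactionError> {
    res.map_err(|e| match e {
        UploadQueueError::ShuttingDown => CompactionError::ShuttingDown,
        UploadQueueError::Io(msg) => CompactionError::Other(msg),
    })
}
```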
## Problem
Jobs `check-linux-arm-build` and `check-codestyle-rust-arm` (from
`.github/workflows/neon_extra_builds.yml`) duplicate `build-neon` and
`check-codestyle-rust` jobs in the main pipeline.
## Summary of changes
- Move `check-linux-arm-build` and `check-codestyle-rust-arm` from extra
builds to the main pipeline
By default, git does not find a nice hunk header with Rust. Newer
versions ship with a handy xfuncname pattern, so let's enable that for
all developers.
Example of how this should help:
39046172ab
## Problem
PR that modified compaction raced with PR that modified the GcInfo
structure
## Summary of changes
Fix it
Co-authored-by: Vlad Lazar <vlalazar.vlad@gmail.com>
## Problem
The in-memory layer vectored read was very slow in some conditions
(walingest::test_large_rel) test. Upon profiling, I realised that 80% of
the time was spent building up the binary heap of reads. This stage
isn't actually needed.
## Summary of changes
Remove the planning stage as we never took advantage of it in order to
merge reads. There should be no functional change from this patch.
## Problem
- `build-and-test` workflow is pretty big
- jobs that depend on the matrix job don't start before all variations
are done. I.e. `regress-tests` depend on `build-neon`, but we can't
start `regress-tests` on the release configuration until `build-neon` is
done on release **and debug** configurations. This will be more visible
once we add ARM to the matrix.
## Summary of changes
- Move jobs related to building (`build-neon`) and testing
(`regress-tests`) to a separate job
## Problem
The current bucket-based rate limiter is not very intuitive and has some
bad failure cases.
## Summary of changes
Switches from fixed-interval buckets to a leaky-bucket implementation.
There is a single bucket per endpoint, which drains over time: each
check drains tokens en masse based on the time elapsed since the last
check. Garbage collection works similarly to before: it drains a shard
(1/64th of the set) every 2048 checks, and it only removes buckets that
are empty.
To be compatible with the existing config, I've made it take the min and
the max rps as the sustained rps and the max bucket size respectively,
which should be roughly equivalent.
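A minimal sketch of the leaky-bucket idea; field and type names are illustrative, not the proxy's actual types:
```rust
use std::time::Instant;

/// One bucket per endpoint, drained lazily by the elapsed time since the
/// last check.
struct LeakyBucket {
    level: f64,         // tokens currently in the bucket
    last_check: Instant,
}

struct BucketConfig {
    drain_per_sec: f64, // sustained rps
    max_level: f64,     // burst capacity
}

impl LeakyBucket {
    fn check(&mut self, cfg: &BucketConfig, now: Instant) -> bool {
        // Drain en masse based on how long it's been since the last check.
        let elapsed = now.duration_since(self.last_check).as_secs_f64();
        self.level = (self.level - elapsed * cfg.drain_per_sec).max(0.0);
        self.last_check = now;
        if self.level + 1.0 > cfg.max_level {
            return false; // bucket full: reject this request
        }
        self.level += 1.0;
        true
    }
}
```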
## Problem
Previously, Timeline::gc_info was only updated in a batch operation at
the start of GC. That means that timelines didn't generally have
accurate information about who their children were before the first GC,
or between GC cycles.
Knowledge of child branches is important for calculating layer
visibility in #8398
## Summary of changes
- Split out part of refresh_gc_info into initialize_gc_info, which is
now called early in startup
- Include TimelineId in retain_lsns so that we can later add/remove the
LSNs for particular children
- When timelines are added/removed, update their parent's retain_lsns
## Problem
In `test_basebackup_with_high_slru_count`, the pageserver is sometimes
mysteriously hanging on startup, having been started+stopped earlier in
the test setup while populating template tenant data.
- #7586
We can't see why this is hanging in this particular test. The test does
some weird stuff though, like attaching a load of broken tenants and
then doing a SIGQUIT kill of a pageserver.
## Summary of changes
- Attach tenants normally instead of doing a failpoint dance to attach
them as broken
- Shut the pageserver down gracefully during init instead of using
immediate mode
- Remove the "sequential" variant of the unstable test, as this is going
away soon anyway
- Log before trying to acquire the lock file, so that if it hangs we have
a clearer sense of whether that's really where it's hanging. It seems
like it is, but that code does a non-blocking flock, so it's surprising.
Implements the TODO from #8466 about retries: now the user of the stream
returned by `list_streaming` is able to obtain the next item in the
stream as often as they want, and retry it if it is an error.
Also extends the test for paginated listing to include a dedicated
test for `list_streaming`.
Follow-up of #8466, fixes #8457
part of #7547
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
LayerAccessStats contains a lot of detail that we don't use: short
histories of most recent accesses, specifics on what kind of task
accessed a layer, etc. This is all stored inside a Mutex, which is
locked every time something accesses a layer.
## Summary of changes
- Store timestamps at a very low resolution (to the nearest second),
sufficient for use on the timescales of eviction.
- Pack access time and last residence change time into a single u64
- Use the high bits of the u64 for other flags, including the new layer
visibility concept.
- Simplify the external-facing model for access stats to just include
what we now track.
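A sketch of the packing idea; the bit layout shown here is illustrative only and may not match the real assignment:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative layout: low 32 bits = last access time (seconds since some
/// epoch), top bits = flags such as the layer visibility hint.
const VISIBILITY_BIT: u64 = 1 << 63;
const ACCESS_TIME_MASK: u64 = 0xffff_ffff;

struct AccessStats(AtomicU64);

impl AccessStats {
    fn record_access(&self, now_secs: u32) {
        // Accessing a layer also marks it visible.
        let _ = self.0.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| {
            Some((v & !ACCESS_TIME_MASK) | now_secs as u64 | VISIBILITY_BIT)
        });
    }

    fn last_access_secs(&self) -> u32 {
        (self.0.load(Ordering::Relaxed) & ACCESS_TIME_MASK) as u32
    }

    fn is_visible(&self) -> bool {
        self.0.load(Ordering::Relaxed) & VISIBILITY_BIT != 0
    }
}
```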
Note that the `HistoryBufferWithDropCounter` is removed here because it
is no longer used. I do not dislike this type, we just happen not to use
it for anything else at present.
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
While investigating a problem with test_subscriber_restart flakiness, I
found out that this test does not pass at all for PG 14/15 on macOS
(while it works for PG 16).
## Summary of changes
Rewrite the async connect state machine in exactly the same way as in
vanilla Postgres: call `WaitLatchOrSocket` with `WL_SOCKET_WRITEABLE`
before calling `PQconnectPoll`.
Please note that this will most likely not fix the flakiness of
test_subscriber_restart.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
This adds the ability to list many prefixes in a streaming fashion to
both the `RemoteStorage` trait as well as `GenericRemoteStorage`.
* The `list` function of the `RemoteStorage` trait is implemented by
default in terms of `list_streaming`.
* For the production users (S3, Azure), `list_streaming` is implemented
and the default `list` implementation is used.
* For `LocalFs`, we keep the `list` implementation and make
`list_streaming` call it.
The `list_streaming` function is implemented for both S3 and Azure.
A TODO for later is retries, which the scrubber currently has while the
`list_streaming` implementations lack them.
part of #8457 and #7547
part of https://github.com/neondatabase/neon/issues/8002
The main thing in this pull request is the new `generate_key_retention`
function. It decides which deltas to retain and which images to generate
for a given key, based on its history + retain_lsn + horizon.
On top of that, we generate a flat, single level of delta layers over all
deltas included in the compaction. In the future, we can decide whether
to split them over the LSN axis as described in the RFC.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
`test_change_pageserver` stops pageservers in a way that can overlap
with the controller's heartbeats: the controller can get a heartbeat
success and then immediately find the node unavailable. This particular
situation triggers a log that isn't in our current allow-list of
messages for nodes offline
Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8339/10048487700/index.html#testresult/19678f27810231df/retries
## Summary of changes
- Add the message to the allow list
## Problem
Postgres uses the `access()` function in `GetNewRelFileNumber` to check
that the assigned relfilenumber is not used by any other relation. This
check will not work in Neon, because we do not have all files in local
storage.
## Summary of changes
Use `smgrexists()` instead, which checks with the page server whether
such a relfilenode is in use.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
As described in https://github.com/neondatabase/neon/issues/8398, layer
visibility is a new hint that will help us manage disk space more
efficiently.
## Summary of changes
- Introduce LayerVisibilityHint and store it as part of access stats
- Automatically mark a layer visible if it is accessed, or when it is
created.
The impact on the access stats size will be reversed in
https://github.com/neondatabase/neon/pull/8431
This is functionally a no-op change: subsequent PRs will add the logic
that sets layers to Covered, and which uses the layer visibility as an
input to eviction and heatmap generation.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
This test sometimes found that ancestors were getting cleaned up before
it had done any compaction.
Compaction was happening implicitly via Workload.
Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries
## Summary of changes
- Set upload=False when writing data after shard split, to avoid doing a
checkpoint
- Add a checkpoint_period & explicit wait for uploads so that we ensure
data lands in S3 without doing a checkpoint
In general, replace:
* 'lfc_approximate_working_set_size' with
* 'lfc_approximate_working_set_size_windows'
For the "main" metrics that are actually scraped and used internally,
the old one is just marked as deprecated.
For the "autoscaling" metrics, we're not currently using the old one, so
we can get away with just replacing it.
Also, for the user-visible metrics we'll only store & expose a few
different time windows, to avoid making the UI overly busy or bloating
our internal metrics storage.
But for the autoscaling-related scraper, we aren't storing the metrics,
and it's useful to be able to programmatically operate on the trendline
of how WSS increases (or doesn't!) with window size. So there, we can
just output datapoints for each minute.
Part of neondatabase/autoscaling#872
See also https://www.notion.so/neondatabase/cca38138fadd45eaa753d81b859490c6
Backup tools such as `tar` and `restic` recognize this.
More info: https://bford.info/cachedir/
NB: cargo _should_ create the tag file in the `target/` directory
but doesn't if the directory already exists, which happens frequently
if rust-analyzer is launched by your IDE before you can type
`cargo build`. Hence, create the file manually here.
=> https://github.com/rust-lang/cargo/issues/14281
## Motivation & Context
We want to move away from `task_mgr` towards explicit tracking of child
tasks.
This PR is extracted from https://github.com/neondatabase/neon/pull/8339
where I refactor `PageRequestHandler` to not depend on task_mgr anymore.
## Changes
This PR refactors all global tasks but `PageRequestHandler` to use some
combination of `JoinHandle`/`JoinSet` + `CancellationToken`.
The `task_mgr::spawn(.., shutdown_process_on_error)` functionality is
preserved through the new `exit_on_panic_or_error` wrapper.
Some global tasks were not using it before, but as of this PR, they are.
The rationale is that all global tasks are relevant for correct
operation of the overall Neon system in one way or another.
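A sketch of what such a wrapper can look like; the name matches the PR, but the body is illustrative and assumes tokio and anyhow:
```rust
/// Await a global task's JoinHandle and turn an error (or a panic, which
/// tokio surfaces as a JoinError) into a process exit, preserving the old
/// task_mgr "shutdown process on error" behaviour.
async fn exit_on_panic_or_error<T>(
    task_name: &str,
    handle: tokio::task::JoinHandle<anyhow::Result<T>>,
) -> T {
    match handle.await {
        Ok(Ok(value)) => value,
        Ok(Err(err)) => {
            eprintln!("global task {task_name} failed: {err:#}");
            std::process::exit(1);
        }
        Err(join_err) => {
            eprintln!("global task {task_name} panicked: {join_err}");
            std::process::exit(1);
        }
    }
}
```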
## Future Work
After #8339, we can make `task_mgr::spawn` require a `TenantId` instead
of an `Option<TenantId>` which concludes this step of cleanup work and
will help discourage future usage of task_mgr for global tasks.
Part of #8128.
## Problem
The scrubber uses the `scan_metadata` command to flag metadata
inconsistencies. To trust it at scale, we need to make sure the errors
we emit reflect real scenarios. One check performed in the scrubber is
whether the layers listed in the latest `index_part.json` are present in
the object listing. Currently, the scrubber does not robustly handle the
case where objects are uploaded or deleted during the scan.
## Summary of changes
**Condition for success:** An object in the index is (1) in the object
listing we acquire from S3 or (2) found in a HeadObject request (new
object).
- Add in the `HeadObject` requests for the layers missing from the
object listing.
- Keep the order of first getting the object listing and then
downloading the layers.
- Update check to only consider shards with highest shard count.
- Skip analyzing a timeline if `deleted_at` tombstone is marked in
`index_part.json`.
- Add a new test to see if the scrubber actually detects the metadata
inconsistency.
_Misc_
- A timeline with no ancestor should always have some layers.
- Removed experimental histograms
_Caveat_
- Ancestor layers are not cleaned up until #8308 is implemented. If
ancestor layers reference non-existent layers in the index, the scrubber
will emit false positives.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Starts using the `remote_storage` crate in the S3 scrubber for the
`PurgeGarbage` subcommand.
The `remote_storage` crate is generic over various backends and thus
using it gives us the ability to run the scrubber against all supported
backends.
Start with the `PurgeGarbage` subcommand as it doesn't use
`stream_tenants`.
Part of #7547.
## Problem
This test predates the storage controller. It stops pageservers and
reconfigures computes, but that races with the storage controller's node
failure detection, which can result in restarting nodes not getting the
attachments they expect, and the test failing
## Summary of changes
- Configure the storage controller to use a compute notify hook that
does nothing, so that it cannot interfere with the test's configuration
of computes.
- Instead of using the attach hook, just notify the storage controller
that nodes are offline, and reconcile tenants so that they will
automatically be attached to the other node.
## Problem
Deployed pageserver configurations are all like this:
```
disk_usage_based_eviction:
max_usage_pct: 85
min_avail_bytes: 0
period: "10s"
eviction_order:
type: "RelativeAccessed"
args:
highest_layer_count_loses_first: true
```
But we're maintaining this optional absolute order eviction, with test
cases etc.
## Summary of changes
- Remove absolute order eviction. Make the default eviction policy the
same as how we really deploy pageservers.
## Problem
We have two RFCs numbered 34.
## Summary of changes
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
On Azure we need to use username-password authentication in proxy for
regional redis client.
## Summary of changes
This adds `redis_auth_type` to the config with a default value of "irsa".
Not specifying it will require the `regional_redis_client` to be
configured with IRSA redis (as it's done now).
If "plain" is specified, then the regional client is configured with
`redis_notifications`, consuming username:password auth from the URI. We
plan to do that for the Azure cloud.
Configuring `regional_redis_client` is now required; there is no opt-out
from configuring it.
https://github.com/neondatabase/cloud/issues/14462
PR #8299 has switched the storage scrubber to use
`DefaultCredentialsChain`. Now we do this for `remote_storage`, as it
allows us to use `remote_storage` from inside kubernetes. Most of the
diff is due to `GenericRemoteStorage::from_config` becoming `async fn`.
This adds an archival_config endpoint to the pageserver. Currently it
has no effect, and always "works", but later the intent is that it will
make a timeline archived/unarchived.
- [x] add yml spec
- [x] add endpoint handler
Part of https://github.com/neondatabase/neon/issues/8088
## Problem
There are some swagger errors in `pageserver/src/http/openapi_spec.yml`
```
Error 431 15000 Object includes not allowed fields
Error 569 3100401 should always have a 'required'
Error 569 15000 Object includes not allowed fields
Error 1111 10037 properties members must be schemas
```
## Summary of changes
Fixed the above errors.
## Problem
After a shard split, the pageserver leaves the ancestor shard's content
in place. It may be referenced by child shards, but eventually child
shards will de-reference most ancestor layers as they write their own
data and do GC. We would like to eventually clean up those ancestor
layers to reclaim space.
## Summary of changes
- Extend the physical GC command with `--mode=full`, which includes
cleaning up unreferenced ancestor shard layers
- Add test `test_scrubber_physical_gc_ancestors`
- Remove colored log output: in testing this is irritating ANSI code
spam in logs, and in interactive use doesn't add much.
- Refactor storage controller API client code out of storcon_client into
a `storage_controller/client` crate
- During physical GC of ancestors, call into the storage controller to
check that the latest shards seen in S3 reflect the latest state of the
tenant, and there is no shard split in progress.
We're removing the usage of this long-meaningless config field in
https://github.com/neondatabase/aws/pull/1599
Once that PR has been deployed to staging and prod, we can merge this
PR.
## Problem
My prior PR https://github.com/neondatabase/neon/pull/8422
caused leftovers in the GitHub action runner work directory with root
permissions.
As an example, see here
https://github.com/neondatabase/neon/actions/runs/10001857641/job/27646237324#step:3:37
To work around this, we install vanilla postgres as non-root using deb
packages in the /home/nonroot user directory.
## Summary of changes
- since we cannot use root, we install the deb pkgs directly and create
symbolic links for psql, pgbench, and libs in the expected places
- continue jobs on AWS even if Azure jobs fail (because this region is
currently unreliable)
Successor of #8288, just enable zstd in tests. Also adds a test that
creates easily compressible data.
Part of #5431
---------
Co-authored-by: John Spray <john@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
The error means that the manager exited earlier than the
`ResidenceGuard`, which is not unexpected with the current deletion
implementation. This commit changes the log level to reduce noise.
Use the k-merge iterator in the compaction process to reduce memory
footprint.
part of https://github.com/neondatabase/neon/issues/8002
## Summary of changes
* refactor the bottom-most compaction code to use k-merge iterator
* add Send bound on some structs as it is used across the await points
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
We have an issue where some partially uploaded segments can actually be
missing in remote storage. I found this issue when I was looking at the
logs in staging, and it can be triggered by failed uploads:
1. Code tries to upload `SEG_TERM_LSN_LSN_sk5.partial`, but receives
error from S3
2. The failed attempt is saved to `segments` vec
3. After some time, the code tries to upload
`SEG_TERM_LSN_LSN_sk5.partial` again
4. This time the upload is successful and code calls `gc()` to delete
previous uploads
5. Since new object and old object share the same name, uploaded data
gets deleted from remote storage
This commit fixes the issue by patching `gc()` not to delete objects
with the same name as currently uploaded.
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Ahead of enabling eviction in the field, where it will become the
normal/default mode, let's enable it by default throughout our tests in
case any issues become visible there.
## Summary of changes
- Make default `extra_opts` for safekeepers enable offload & deletion
- Set low timeouts in `extra_opts` so that tests running for tens of
seconds have a chance to hit some of these background operations.
## Problem
These tests time out ~1 in 50 runs when in debug mode.
There is no indication of a real issue: they're just wrappers that have
large numbers of individual tests contained within one pytest case.
## Summary of changes
- Bump pg_regress timeout from 600 to 900s
- Bump test_isolation timeout from 300s (default) to 600s
In future it would be nice to break out these tests to run individual
cases (or batches thereof) as separate tests, rather than this monolith.
## Problem
This test would occasionally fail its metric check. This could happen in
the rare case that the nodes had all been restarted before their most
recent eviction.
The metric check was added in
https://github.com/neondatabase/neon/pull/8348
## Summary of changes
- Check metrics before each restart, accumulate into a bool that we
assert on at the end of the test
When `NeonEnv.from_repo_dir` was introduced, the storage controller
stored its state exclusively in `attachments.json`.
Since then, it has moved to using Postgres, which stores its state in
`storage_controller_db`.
But `NeonEnv.from_repo_dir` wasn't adjusted to do this.
This PR rectifies the situation.
Context for this is failures in
`test_pageserver_characterize_throughput_with_n_tenants`
CF:
https://neondb.slack.com/archives/C033RQ5SPDH/p1721035799502239?thread_ts=1720901332.293769&cid=C033RQ5SPDH
Notably, `from_repo_dir` is also used by the backwards- and
forwards-compatibility tests.
Thus, the changes in this PR affect those tests as well.
However, it turns out that the compatibility snapshot already contains
the `storage_controller_db`.
Thus, it should just work and in fact we can remove hacks like
`fixup_storage_controller`.
Follow-ups created as part of this work:
* https://github.com/neondatabase/neon/issues/8399
* https://github.com/neondatabase/neon/issues/8400
## Problem
There is something wrong in the comments of
`control_plane/src/broker.rs` and `control_plane/src/pageserver.rs`.
## Summary of changes
Fixed the comments about the component names and their data paths in
`control_plane/src/broker.rs` and `control_plane/src/pageserver.rs`.
## Problem
We lack insight into:
- How much of a tenant's physical size is image vs. delta layers
- Average sizes of image vs. delta layers
- Total layer counts per timeline, indicating size of index_part object
As well as general observability love, this is motivated by
https://github.com/neondatabase/neon/issues/6738, where we need to
define some sensible thresholds for storage amplification, and using
total physical size may not work well (if someone does a lot of DROPs
then it's legitimate for the physical-synthetic ratio to be huge), but
the ratio between image layer size and delta layer size may be a better
indicator of whether we're generating unreasonable quantities of image
layers.
## Summary of changes
- Add pageserver_layer_bytes and pageserver_layer_count metrics,
labelled by timeline and `kind` (delta or image)
- Add & subtract these with LayerInner's lifetime.
I'm intentionally avoiding a generic metric RAII guard object, to avoid
bloating LayerInner: it already has all the information it needs to
update the metrics on new+drop.
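A sketch of the new+drop bookkeeping with stand-in metric storage; the real code updates the labelled `pageserver_layer_bytes`/`pageserver_layer_count` metrics with per-timeline and per-kind labels:
```rust
use std::sync::atomic::{AtomicI64, Ordering};

// Stand-in for pageserver_layer_bytes{kind="delta"}.
static LAYER_BYTES_DELTA: AtomicI64 = AtomicI64::new(0);

struct LayerInner {
    file_size: u64,
    is_delta: bool,
}

impl LayerInner {
    fn new(file_size: u64, is_delta: bool) -> Self {
        if is_delta {
            LAYER_BYTES_DELTA.fetch_add(file_size as i64, Ordering::Relaxed);
        }
        LayerInner { file_size, is_delta }
    }
}

impl Drop for LayerInner {
    fn drop(&mut self) {
        // Subtract on drop, so the gauge tracks the LayerInner lifetime.
        if self.is_delta {
            LAYER_BYTES_DELTA.fetch_sub(self.file_size as i64, Ordering::Relaxed);
        }
    }
}
```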
This test reproduces the case of a writer creating a deep stack of L0
layers. It uses realistic layer sizes and writes several gigabytes of
data, and therefore runs as a performance test, although it is
validating memory footprint rather than performance per se.
It acts as a regression test for two recent fixes:
- https://github.com/neondatabase/neon/pull/8401
- https://github.com/neondatabase/neon/pull/8391
In future it will demonstrate the larger improvement of using a k-merge
iterator for L0 compaction (#8184)
This test can be extended to enforce limits on the memory consumption of
other housekeeping steps, by restarting the pageserver and then running
other things to do the same "how much did RSS increase" measurement.
Existing tenants and some selections of layers might produce duplicated
keys. Add tests to ensure the k-merge iterator handles this correctly.
We also enforce the ordering of the k-merge iterator to put images
before deltas.
part of https://github.com/neondatabase/neon/issues/8002
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
We want to run performance tests on all supported cloud providers.
We want to run most tests on the postgres version which is default for
new projects in production, currently (July 24) this is postgres version
16
## Summary of changes
- change default postgres version for some (performance) tests to 16
(which is our default for new projects in prod anyhow)
- add azure region to pgbench_compare jobs
- add azure region to pgvector benchmarking jobs
- re-used project `weathered-snowflake-88107345` was prepared with 1
million embeddings running on 7 minCU / 7 maxCU in the Azure region to
compare with the AWS region (pgvector indexing and hnsw queries)
- see job pgbench-pgvector
- Note we now have 11 environment combinations where we run
pgbench-compare; 5 of them are for k8s-pod (deprecated), which we can
remove in the future once the autoscaling team approves.
## Logs
A current run with the changes from this pull request is running here
https://github.com/neondatabase/neon/actions/runs/9972096222
Note that we currently expect some failures due to
- https://github.com/neondatabase/neon/issues/8275
- instability of projects on azure region
## Problem
When a tenant creates a new timeline that they will treat as their
'main' history,
it is awkward to permanently retain an 'old main' timeline as its
ancestor. Currently
this is necessary because it is forbidden to delete a timeline which has
descendants.
## Summary of changes
A new pageserver API is proposed to 'adopt' data from a parent timeline
into
one of its children, such that the link between ancestor and child can
be severed,
leaving the parent in a state where it may then be deleted.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
ValueRef is an unnecessarily large structure, because it carries a
cursor. L0 compaction currently instantiates gigabytes of these under
some circumstances.
## Summary of changes
- Carry a ref to the parent layer instead of a cursor, and construct a
cursor on demand.
This reduces RSS high watermark during L0 compaction by about 20%.
## Problem
The `evictions_with_low_residence_duration` is used as an indicator of
cache thrashing. However, there are situations where it is quite
legitimate to only have a short residence during compaction, where a
delta is downloaded, used to generate an image layer, and then
discarded. This can lead to false positive alerts.
## Summary of changes
- Only track low residence duration for layers that have been accessed
at least once (compaction doesn't count as an access). This will give us
a metric that indicates thrashing on layers that the _user_ is using,
rather than those we're downloading for housekeeping purposes.
Once we add "layer visibility" as an explicit property of layers, this
can also be used as a cleaner condition (residence of non-visible layers
should never be alertable)
## Problem
close https://github.com/neondatabase/neon/issues/8389
## Summary of changes
A quick mitigation for tenants with fast writes. We compact at most 60
delta layers at a time, expecting a memory footprint of 15GB. We will
pick the oldest 60 L0 layers.
This should be a relatively safe change, so no test is added. The
question is whether to make this parameter configurable via the tenant
config.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
- `horizon` is a confusing term; it's not at all obvious that this means
the space-based retention limit, rather than the total GC history limit.
Rename to `GcCutoffs::space`.
- `pitr` is less confusing, but still an unnecessary level of indirection
from what we really mean: a time-based condition. The fact that we use
that time-history for Point In Time Recovery doesn't mean we have
to refer to time as "pitr" everywhere. Rename to `GcCutoffs::time`.
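For reference, the renamed struct looks roughly like this; the `Lsn` alias and the helper method are illustrative:
```rust
/// `Lsn` stands in for the pageserver's LSN type.
type Lsn = u64;

struct GcCutoffs {
    /// Space-based retention limit (formerly "horizon").
    space: Lsn,
    /// Time-based retention condition (formerly "pitr").
    time: Lsn,
}

impl GcCutoffs {
    /// GC must keep everything above the more conservative (lower) cutoff.
    fn select_min(&self) -> Lsn {
        self.space.min(self.time)
    }
}
```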
As described in #8385, the likely source for flakiness in
test_tenant_creation_fails is the following sequence of events:
1. test instructs the storage controller to create the tenant
2. storage controller adds the tenant and persists it to the database.
issues a creation request
3. the pageserver restarts with the failpoint disabled
4. storage controller's background reconciliation still wants to create
the tenant
5. pageserver gets new request to create the tenant from background
reconciliation
This commit just avoids the storage controller entirely. It has its own
set of issues, as the re-attach request will obviously not include the
tenant, but it's still useful to test for non-existence of the tenant.
The generation is also not optional any more during tenant attachment.
If you omit it, the pageserver yields an error. We change the signature
of `tenant_attach` to reflect that.
Alternative to #8385. Fixes #8266.
## Problem
This structure was in an Arc<> unnecessarily, making it harder to reason
about its lifetime (i.e. it was superficially possible for LayerManager
to outlive timeline, even though no code used it that way)
## Summary of changes
- Remove the Arc<>
Right now timeline detach ancestor reports an error (409, "no ancestor")
on a new attempt after successful completion. This makes it troublesome
for storage controller retries. Fix it to respond with `200 OK` as if
the operation had just completed quickly.
Additionally, the returned timeline identifiers in the 200 OK response
are now ordered, so that the storage controller can compare responses
between different nodes (comparison added in #8353).
Design-wise, this PR introduces a new strategy for accessing the latest
uploaded IndexPart:
`RemoteTimelineClient::initialized_upload_queue(&self) ->
Result<UploadQueueAccessor<'_>, NotInitialized>`. It should be a more
scalable way to query the latest uploaded `IndexPart` than to add a
query method for each question directly on `RemoteTimelineClient`.
GC blocking will need to be introduced to make the operation fully
idempotent. However, it is idempotent for the cases demonstrated by
tests.
Cc: #6994
## Problem
Pageserver GC uses a size-based condition (GC "horizon" in addition to
time-based "PITR").
Eventually we plan to retire the size-based condition:
https://github.com/neondatabase/neon/issues/6374
Currently, we always apply the more conservative of the two, meaning
that tenants always retain at least 64MB of history (default horizon),
even after a very long time has passed. This is particularly acute in
cases where someone has dropped tables/databases, and then leaves a
database idle: the horizon can prevent GCing very large quantities of
historical data (we already account for this in synthetic size by
ignoring gc horizon).
We're not entirely removing GC horizon right now because we don't want
to 100% rely on standby_horizon for robustness of physical replication,
but we can tweak our logic to avoid retaining that 64MB LSN length
indefinitely.
## Summary of changes
- Rework `Timeline::find_gc_cutoffs`, with new logic:
- If there is no PITR set, then use `DEFAULT_PITR_INTERVAL` (1 week) to
calculate a time threshold. Retain either the horizon or up to that
threshold, whichever requires less data.
- When there is a PITR set, and we have unambiguously resolved the
timestamp to an LSN, then ignore the GC horizon entirely. For typical
PITRs (1 day, 1 week), this will still easily retain enough data to
avoid stressing read only replicas.
The key property we end up with, whether a PITR is set or not, is that
after enough time has passed, our GC cutoff on an idle timeline will
catch up with the last_record_lsn.
Using `DEFAULT_PITR_INTERVAL` is a bit of an arbitrary hack, but this
feels like it isn't really worth the noise of exposing in TenantConfig.
We could just make it a differently named constant though. The end state
will be that there is no gc_horizon at all, and that tenants with
pitr_interval=0 would truly retain no history, so this constant would go
away.
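A simplified sketch of the reworked decision (the constant and the LSN type are placeholders; the real code resolves timestamps to LSNs via the timeline):

```rust
use std::time::Duration;

/// Placeholder for the real LSN type.
type Lsn = u64;

/// 1 week; only used when no pitr_interval is configured.
const DEFAULT_PITR_INTERVAL: Duration = Duration::from_secs(7 * 24 * 3600);

/// Sketch of the new cutoff selection. `time_cutoff` is the LSN resolved from
/// "now - interval" (the configured interval, or DEFAULT_PITR_INTERVAL if none
/// is set), when that resolution was unambiguous.
fn find_gc_cutoffs(
    pitr_interval: Duration,
    space_cutoff: Lsn,
    time_cutoff: Option<Lsn>,
) -> Lsn {
    if pitr_interval.is_zero() {
        // No PITR configured: retain either the horizon or one week of history,
        // whichever requires *less* data (i.e. the larger cutoff LSN).
        match time_cutoff {
            Some(time_cutoff) => space_cutoff.max(time_cutoff),
            None => space_cutoff,
        }
    } else {
        // PITR configured: once the timestamp resolves unambiguously to an LSN,
        // ignore the space-based horizon entirely.
        time_cutoff.unwrap_or(space_cutoff)
    }
}
```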
Currently storage controller does not support forwarding timeline detach
ancestor requests to pageservers. Add support for forwarding `PUT
.../:tenant_id/timelines/:timeline_id/detach_ancestor`. Implement the
support mostly as is, because the timeline detach ancestor will be made
(mostly) idempotent in future PR.
Cc: #6994
## Problem
Right now, if there are too many running xacts to be restored from CLOG
at replica startup, the replica does not try to restore them and instead
waits for a non-overflown running-xacts WAL record from the primary.
But if the primary is not active, then the replica will not start at all.
Too many running xacts can be caused by transactions with a large number
of subtransactions.
But right now it can also be caused by two other reasons:
- Lack of a shutdown checkpoint which updates `oldestRunningXid` (because
of immediate shutdown)
- nextXid alignment on a 1024 boundary (which causes losing ~1k XIDs on
each restart)
Both problems are somehow addressed now.
But we have existing customers with a "sparse" CLOG and a lack of
checkpoints.
To be able to start RO replicas for such customers, I suggest adding a
GUC which allows the replica to start even in case of subxacts overflow.
## Summary of changes
Add `neon.running_xacts_overflow_policy` with the following values:
- ignore: restore the last N XIDs from CLOG and accept connections
- skip: do not restore any XIDs from CLOG but still accept connections
- wait: wait for a non-overflown running-xacts record from the primary node
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
This test would sometimes violate the min resident size during disk
eviction and fail due to the resulting warning log.
Disk usage candidate collection only takes into account active tenants.
However, the statvfs call takes into account the entire tenants
directory, which includes tenants which haven't become active yet.
After re-starting the pageserver, disk usage eviction may kick in
*before* both tenants have become active. Hence, the logic will try to
satisfy the disk usage requirements by evicting everything belonging to
the active tenant, thereby violating the tenant minimum resident size.
## Summary of changes
Allow the warning
## Problem
New clippy warnings on nightly.
## Summary of changes
Broken up into one commit per warning type:
1. Remove some unnecessary refs.
2. In edition 2024, inference will default to `!` and not `()`.
3. Clippy complains about doc comment indentation.
4. Fix `Trait + ?Sized` where `Trait: Sized`.
5. diesel_derives triggering `non_local_definitions`.
## Problem
We already back off on compaction retries, but the impact of a failing
compaction can be so great that backing off up to 300s isn't enough. The
impact is consuming a lot of I/O+CPU in the case of image layer
generation for large tenants, and potentially also leaking disk space.
Compaction failures are extremely rare and almost always indicate a bug,
frequently a bug that will not let compaction proceed until it is
fixed.
Related: https://github.com/neondatabase/neon/issues/6738
## Summary of changes
- Introduce a CircuitBreaker type
- Add a circuit breaker for compaction, with a policy that after 5
failures, compaction will not be attempted again for 24 hours.
- Add metrics that we can alert on: any >0 value for
`pageserver_circuit_breaker_broken_total` should generate an alert.
- Add a test that checks this works as intended.
A couple of notes for reviewers:
- Circuit breakers are intrinsically a defense-in-depth measure: this is
not the solution to any underlying issues, it is just a general
mitigation for "unknown unknowns" that might be encountered in future.
- This PR isn't primarily about writing a perfect CircuitBreaker type:
the one in this PR is meant to be just enough to mitigate issues in
compaction, and make it easy to monitor/alert on these failures. We can
refine this type in future as/when we want to use it elsewhere.
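To make the mechanism concrete, here is a minimal sketch of such a breaker, assuming a fixed failure threshold and cooldown (the real type also has metrics and logging attached):

```rust
use std::time::{Duration, Instant};

/// Minimal circuit-breaker sketch: after `fail_threshold` consecutive failures
/// the breaker opens and the protected operation (here: compaction) is not
/// attempted again until `cooldown` (24 hours in this PR) has elapsed.
struct CircuitBreaker {
    fail_threshold: usize,
    cooldown: Duration,
    failures: usize,
    broken_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(fail_threshold: usize, cooldown: Duration) -> Self {
        Self { fail_threshold, cooldown, failures: 0, broken_at: None }
    }

    /// May the protected operation be attempted right now?
    fn is_closed(&self) -> bool {
        match self.broken_at {
            Some(at) => at.elapsed() >= self.cooldown,
            None => true,
        }
    }

    fn success(&mut self) {
        self.failures = 0;
        self.broken_at = None;
    }

    fn failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.fail_threshold && self.broken_at.is_none() {
            self.broken_at = Some(Instant::now());
            // The real implementation would also bump the "broken" counter
            // metric here, so that any value > 0 can trigger an alert.
        }
    }
}
```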
Implement decompression of images for vectored reads.
This doesn't implement support for still treating blobs as uncompressed
with the bits we reserved for compression, as we have removed that
functionality in #8300 anyways.
Part of #5431
We need to pass on the configured compression param during image layer
generation.
This was an oversight in #8106, and likely the reason why #8288 didn't
surface any interesting regressions.
Part of https://github.com/neondatabase/neon/issues/5431
## Problem
I need `neon_superuser` to be allowed to create snapshots for
replication tests
## Summary of changes
Adds a migration that grants these functions to neon_superuser
Rewrite streaming vectored read planner to be a separate struct. The API
is designed to produce batches around `max_read_size` instead of exactly
less than that so that `handle_XX` returns one batch at a time.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
In anticipation of later adding a really nice drain+delete API, I
initially only added an intentionally basic `/drop` API that is just
about usable for deleting nodes in a pinch, but requires some ugly
storage controller restarts to persuade it to restart secondaries.
## Summary of changes
I started making a few tiny fixes, and ended up writing the delete
API...
- Quality of life nit: ordering of node + tenant listings in storcon_cli
- Papercut: Fix the attach_hook using the wrong operation type for
reporting slow locks
- Make Service::spawn tolerate `generation_pageserver` columns that
point to nonexistent node IDs. I started out thinking of this as a
general resilience thing, but when implementing the delete API I
realized it was actually a legitimate end state after the delete API is
called (as that API doesn't wait for all reconciles to succeed).
- Add a `DELETE` API for nodes, which does not gracefully drain, but
does reschedule everything. This becomes safe to use when the system is
in any state, but will incur availability gaps for any tenants that
weren't already live-migrated away. If tenants have already been
drained, this becomes a totally clean + safe way to decommission a node.
- Add a test and a storcon_cli wrapper for it
This is meant to be a robust initial API that lets us remove nodes
without doing ugly things like restarting the storage controller -- it's
not quite a totally graceful node-draining routine yet. There's more
work in https://github.com/neondatabase/neon/issues/8333 to get to our
end-end state.
## Problem
Follow up to https://github.com/neondatabase/neon/pull/8335, to improve
observability of how many evict/restores we are doing.
## Summary of changes
- Add `safekeeper_eviction_events_started_total` and
`safekeeper_eviction_events_completed_total`, with a "kind" label of
evict or restore. This gives us rates, and also ability to calculate how
many are in progress.
- Generalize SafekeeperMetrics test type to use the same helpers as
pageserver, and enable querying any metric.
- Read the new metrics at the end of the eviction test.
## Problem
SeqWait::would_wait_for returns Ok in the case when we would not wait
for the sequence number and Err otherwise.
ReconcilerWaiter::get_status uses it the wrong way around. This can
cause the storage controller to go into a busy loop
and make it look unavailable to the k8s controller.
## Summary of changes
Use `SeqWait::would_wait_for` correctly.
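Sketched out (the signatures here are illustrative, not the actual storage controller code), the fix is a matter of reading the result the right way around:

```rust
// Illustrative only: Ok(()) means "would NOT wait", i.e. the reconcile has
// already reached the awaited sequence number; Err means it is still pending.
enum ReconcileStatus {
    Done,
    InProgress,
}

fn interpret_would_wait_for(result: Result<(), ()>) -> ReconcileStatus {
    match result {
        // We would not wait: the sequence number has been reached.
        Ok(()) => ReconcileStatus::Done,
        // We would wait: the reconcile is still in progress (previously this
        // arm was treated as the completed case, causing the busy loop).
        Err(()) => ReconcileStatus::InProgress,
    }
}
```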
`trace_read_requests` is a per `Tenant`-object option.
But the `handle_pagerequests` loop doesn't know which
`Tenant` object (i.e., which shard) the request is for.
The remaining use of the `Tenant` object is to check `tenant.cancel`.
That check is incorrect [if the pageserver hosts multiple
shards](https://github.com/neondatabase/neon/issues/7427#issuecomment-2220577518).
I'll fix that in a future PR where I completely eliminate the holding
of `Tenant/Timeline` objects across requests.
See [my code RFC](https://github.com/neondatabase/neon/pull/8286) for
the
high level idea.
Note that we can always bring back the tracing functionality if we need it.
But since it's actually about logging the `page_service` wire bytes,
it should be a `page_service`-level config option, not per-Tenant.
And for enabling tracing on a single connection, we can implement
a `set pageserver_trace_connection;` option.
Set core rlimit to unlimited in compute_ctl, so that all child processes
inherit it. We could also set the rlimit in the relevant startup script, but
that way we would depend on external setup and might inadvertently
disable it again (core dumping worked in pods, but not in VMs with
inittab-based startup).
## Problem
- The condition for eviction is not time-based: it is possible for a
timeline to be restored in response to a client, that client times out,
and then as soon as the timeline is restored it is immediately evicted
again.
- There is no delay on eviction at startup of the safekeeper, so when it
starts up and sees many idle timelines, it does many evictions which
will likely be immediately restored when someone uses the timeline.
## Summary of changes
- Add `eviction_min_resident` parameter, and use it in
`ready_for_eviction` to avoid evictions if the timeline has been
resident for less than this period.
- This also implicitly delays evictions at startup for
`eviction_min_resident`
- Set this to a very low number for the existing eviction test, which
expects immediate eviction.
The default period is 15 minutes. The general reasoning for that is that
in the worst case where we thrash ~10k timelines on one safekeeper,
downloading 16MB for each one, we should set a period that would not
overwhelm the node's bandwidth.
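In terms of the eviction check, the added condition is roughly this (a sketch; field and parameter names are illustrative):

```rust
use std::time::{Duration, Instant};

struct EvictionConfig {
    /// New knob; defaults to 15 minutes.
    eviction_min_resident: Duration,
}

struct TimelineResidence {
    /// When the timeline last became resident (or safekeeper start, whichever
    /// is later), which also delays evictions right after startup.
    resident_since: Instant,
}

/// Only consider a timeline for eviction once it has been resident for at
/// least `eviction_min_resident`, on top of the existing conditions.
fn ready_for_eviction(
    conf: &EvictionConfig,
    residence: &TimelineResidence,
    other_conditions_met: bool,
) -> bool {
    other_conditions_met
        && residence.resident_since.elapsed() >= conf.eviction_min_resident
}
```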
Part of https://github.com/neondatabase/neon/issues/8002. This pull
request adds a k-merge iterator for bottom-most compaction.
## Summary of changes
* Added back lsn_range / key_range in delta layer inner. This was
removed due to https://github.com/neondatabase/neon/pull/8050, but added
back because iterators need that information to process lazy loading.
* Added lazy-loading k-merge iterator.
* Added iterator wrapper as a unified iterator type for image+delta
iterator.
The current status and test should cover the use case for L0 compaction
so that the L0 compaction process can bypass page cache and have a fixed
amount of memory usage. The next step is to integrate this with the new
bottom-most compaction.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Removes the `ImageCompressionAlgorithm::DisabledNoDecompress` variant.
We now assume any blob with the specific bits set is actually a
compressed blob.
The `ImageCompressionAlgorithm::Disabled` variant still remains and is
the new default.
Reverts large parts of #8238, as originally intended in that PR.
Part of #5431
## Problem
This test incorrectly assumed that a post-split compaction would only
drop content. This was easily destabilized by any changes to image
generation rules.
## Summary of changes
- Before split, do a full image layer generation pass, to guarantee that
post-split compaction should only drop data, never create it.
- Fix the force_image_layer_creation mode of compaction that we use from
tests like this: previously it would try and generate image layers even
if one already existed with the same layer key, which caused compaction
to fail.
## Problem
#7809 - we do not support sslnegotiation=direct
#7810 - we do not support negotiating down the protocol extensions.
## Summary of changes
1. Same as postgres, check the first startup packet byte for the TLS
header `0x16`, and check the ALPN.
2. Tell clients using protocol >3.0 to downgrade
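The sniffing part of (1) is tiny; a sketch (the real proxy code reads from the socket and also validates the ALPN):

```rust
/// First byte of a TLS handshake record ("handshake" content type).
const TLS_HANDSHAKE_RECORD: u8 = 0x16;

/// A direct-TLS client (sslnegotiation=direct) starts with a TLS ClientHello,
/// whereas a normal Postgres client starts with the length word of a
/// StartupMessage / SSLRequest, which never begins with 0x16.
fn looks_like_direct_tls(first_byte: u8) -> bool {
    first_byte == TLS_HANDSHAKE_RECORD
}

#[test]
fn distinguishes_direct_tls() {
    assert!(looks_like_direct_tls(0x16));
    assert!(!looks_like_direct_tls(0x00)); // high byte of a startup packet length
}
```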
I want to fix bugs in `page_service`
([issue](https://github.com/neondatabase/neon/issues/7427)) and the
`import basebackup` / `import wal` stand in the way / make the
refactoring more complicated.
We don't use these methods anyway in practice, but, there have been some
objections to removing the functionality completely.
So, this PR preserves the existing functionality but moves it into the
HTTP management API.
Note that I don't try to fix existing bugs in the code, specifically not
fixing
* it only ever worked correctly for unsharded tenants
* it doesn't clean up on error
All errors are mapped to `ApiError::InternalServerError`.
## Problem
Slack thread:
https://neondb.slack.com/archives/C033RQ5SPDH/p1720511577862519
We're seeing OOMs in staging on a pageserver that has
l0_flush.mode=Direct enabled.
There's a strong correlation between jumps in `maxrss_kb` and
`pageserver_timeline_ephemeral_bytes`, so, it's quite likely that
l0_flush.mode=Direct is the culprit.
Notably, the expected max memory usage on that staging server by the
l0_flush.mode=Direct is ~2GiB but we're seeing as much as 24GiB max RSS
before the OOM kill.
One hypothesis is that we're dropping the semaphore permit before all
the dirtied pages have been flushed to disk. (The flushing to disk
likely happens in the fsync inside the `.finish()` call, because we're
using ext4 in data=ordered mode).
## Summary of changes
Hold the permit until after we're done with `.finish()`.
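In effect, the change is about the permit's scope (a sketch, with the actual writer code elided):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Sketch: the permit that limits concurrent in-memory L0 flushes must be held
// across `.finish()`, because that is where the dirtied pages actually reach
// disk (the fsync in data=ordered mode).
async fn flush_frozen_layer(l0_flush_semaphore: Arc<Semaphore>) {
    let _permit = l0_flush_semaphore
        .acquire_owned()
        .await
        .expect("semaphore is never closed");

    // ... read the ephemeral file into memory and write out the delta layer ...

    // writer.finish().await  <- before the fix, the permit was already dropped
    //                           by the time this ran.

    // `_permit` is released only here, when it goes out of scope.
}
```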
part of https://github.com/neondatabase/cloud/issues/14024, k8s does not
always have a volume available for logging, and I'm running into weird
permission errors... While I could spend time figuring out how to create
temp directories for logging, I think it would be better to just disable
file logging as k8s containers are ephemeral and we cannot retrieve
anything on the fs after the container gets removed.
## Summary of changes
`PAGESERVER_DISABLE_FILE_LOGGING=1` -> file logging disabled
Signed-off-by: Alex Chi Z <chi@neon.tech>
This tweaks the rows-to-JSON rendering logic in order to avoid
allocating 0-sized temporary vectors and later growing them
to insert elements.
As the exact size is known in advance, both vectors can be built
with an exact capacity upfront. This will avoid further vector
growing/reallocation in the rendering hotpath.
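A sketch of the pattern (the rendering types are simplified placeholders):

```rust
// Illustrative only: the column count is known before rendering, so both
// vectors get their exact capacity up front instead of growing repeatedly.
fn render_row(raw_values: &[&str]) -> (Vec<String>, Vec<String>) {
    let ncols = raw_values.len();
    let mut fields = Vec::with_capacity(ncols);
    let mut json_values = Vec::with_capacity(ncols);
    for value in raw_values {
        fields.push(value.to_string());
        json_values.push(format!("\"{value}\""));
    }
    (fields, json_values)
}
```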
Signed-off-by: Luca BRUNO <lucab@lucabruno.net>
## Problem
In https://github.com/neondatabase/neon/pull/8161, we changed the path
to Neon artefacts by adding commit sha to it, but we missed adding these
changes to `promote-compatibility-data` job that we use for
backward/forward- compatibility testing.
## Summary of changes
- Add commit sha to `promote-compatibility-data`
## Summary of changes
Increase the `assert_size_approx_equal` threshold to avoid flakiness of
`test_lsn_lease_size`. Still needs more investigation to fully resolve
#8293.
- Also set `autovacuum=off` for the endpoint we are running in the test.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
`test_timeline_size_quota_on_startup` assumed that writing data beyond
the size limit would always be blocked. This is not so: the limit is
only enforced if feedback makes it back from the pageserver to the
safekeeper + compute.
Closes: https://github.com/neondatabase/neon/issues/6562
## Summary of changes
- Modify the test to wait for the pageserver to catch up. The size limit
was never actually being enforced robustly, the original version of this
test was just writing much more than 30MB and about 98% of the time
getting lucky such that the feedback happened to arrive before the test's
for loop was done.
- If the test fails, log the logical size as seen by the pageserver.
## Problem
Debug-mode runs of test_pg_regress are rather slow since
https://github.com/neondatabase/neon/pull/8105, and occasionally exceed
their 600s timeout.
## Summary of changes
- Use 8MiB layer files, avoiding large ephemeral layers
On a hetzner AX102, this takes the runtime from 230s to 190s. Which
hopefully will be enough to get the runtime on github runners more
reliably below its 600s timeout.
This has the side benefit of exercising more of the pageserver stack
(including compaction) under a workload that exercises a more diverse
set of postgres functionality than most of our tests.
## Problem
We currently use 'immediate' mode in the most commonly used shutdown
path, when the control plane calls a `compute_ctl` API to terminate
Postgres inside compute without waiting for the actual pod / VM
termination. Yet, 'immediate' shutdown doesn't create a shutdown
checkpoint, and RO replicas have a hard time figuring out the list of
running xacts during the next start.
## Summary of changes
Use 'fast' mode, which creates a shutdown checkpoint that is important
for ROs to get a list of running xacts faster instead of going through
the CLOG. On the control plane side, we poll this `compute_ctl`
termination API for 10s, it should be enough as we don't really write
any data at checkpoint time. If it times out, we anyway switch to the
slow k8s-based termination.
See https://www.postgresql.org/docs/current/server-shutdown.html for the
list of modes and signals.
The default VM shutdown hook already uses `fast` mode, see [1]
[1]
c9fd8d7693/vm-image-spec.yaml (L30-L31)
Related to #6211
## Problem
LSN Leases introduced in #8084 is a new API that is made shard-aware
from day 1. To support ephemeral endpoint in #7994 without linking
Postgres C API against `compute_ctl`, part of the sharding needs to
reside in `utils`.
## Summary of changes
- Create a new `shard` module in utils crate.
- Move more interface related part of tenant sharding API to utils and
re-export them in pageserver_api.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
Rarely, a dbdir entry can exist with no `relmap_file_key` data. This
causes compaction to fail, because it assumes that if the database
exists, then so does the relmap file.
Basebackup already handled this using a boolean to record whether such a
key exists, but `collect_keyspace` didn't.
## Summary of changes
- Respect the flag for whether a relfilemap exists in collect_keyspace
- The reproducer for this issue will merge separately in
https://github.com/neondatabase/neon/pull/8232
These tests will help verify that replication, both physical and
logical, works as expected in Neon.
Co-authored-by: Sasha Krassovsky <sasha@neon.tech>
Allows a process to run without blocking program execution, which can be
useful for certain test scenarios.
Co-authored-by: Sasha Krassovsky <sasha@neon.tech>
## Problem
- Resident memory on long running pageserver processes tends to climb:
memory fragmentation is suspected.
- Total resident memory may be a limiting factor for running on smaller
nodes.
## Summary of changes
- As a low-energy experiment, switch the pageserver to use jemalloc (not
a net-new dependency, proxy already uses it)
- Decide at end of week whether to revert before next release.
## Problem
Sparse keyspaces were constructed with ranges out of order: this didn't break things obviously, but meant that users of KeySpace functions that assume ordering would assert out.
Closes https://github.com/neondatabase/neon/issues/8277
## Summary of changes
make sure the sparse keyspace has ordered keyspace parts
The find-large-objects scrubber subcommand is quite fast if you run it
in an environment with low latency to the S3 bucket (say an EC2 instance
in the same region). However, the higher the latency gets, the slower
the command becomes. Therefore, add a concurrency param and make it
parallelized. This doesn't change that general relationship, but at
least lets us do multiple requests in parallel and therefore hopefully
finish faster.
Running with concurrency of 64 (default):
```
2024-07-05T17:30:22.882959Z INFO lazy_load_identity [...]
[...]
2024-07-05T17:30:28.289853Z INFO Scanned 500 shards. [...]
```
With concurrency of 1, simulating state before this PR:
```
2024-07-05T17:31:43.375153Z INFO lazy_load_identity [...]
[...]
2024-07-05T17:33:51.987092Z INFO Scanned 500 shards. [...]
```
In other words, to list 500 shards, speed is increased from 2:08 minutes
to 6 seconds.
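The parallelism could look roughly like this (a sketch using `futures`; the placeholder types stand in for the real scrubber code):

```rust
use futures::stream::{self, StreamExt};

// Placeholders for the real scrubber types.
#[derive(Clone, Copy)]
struct TenantShardId(u32);
struct LargeObject {
    key: String,
    size: u64,
}

async fn list_shard_objects(_shard: TenantShardId) -> Vec<LargeObject> {
    // In the real code, this issues the S3 listing calls for one shard.
    Vec::new()
}

// Sketch: process shards with a bounded number of listings in flight, so that
// per-request latency no longer dominates the total runtime.
async fn scan_shards(shards: Vec<TenantShardId>, concurrency: usize) -> Vec<LargeObject> {
    stream::iter(shards)
        .map(list_shard_objects)
        .buffer_unordered(concurrency) // e.g. 64, the default
        .flat_map(stream::iter)
        .collect()
        .await
}
```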
Follow-up of #8257, part of #5431
Improve parsing of the `ImageCompressionAlgorithm` enum to allow level
customization like `zstd(1)`, as strum only takes `Default::default()`,
i.e. `None` as the level.
Part of #5431
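A sketch of what level-aware parsing amounts to (the enum shape mirrors, but is not copied from, the real config type):

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum ImageCompressionAlgorithm {
    Disabled,
    Zstd { level: Option<i8> },
}

impl FromStr for ImageCompressionAlgorithm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "disabled" => Ok(Self::Disabled),
            "zstd" => Ok(Self::Zstd { level: None }),
            other => {
                // Accept "zstd(N)" with an explicit level.
                let level = other
                    .strip_prefix("zstd(")
                    .and_then(|rest| rest.strip_suffix(')'))
                    .ok_or_else(|| format!("invalid compression algorithm: {other}"))?;
                let level = level.parse::<i8>().map_err(|e| e.to_string())?;
                Ok(Self::Zstd { level: Some(level) })
            }
        }
    }
}

#[test]
fn parses_zstd_with_level() {
    assert_eq!(
        "zstd(1)".parse::<ImageCompressionAlgorithm>().unwrap(),
        ImageCompressionAlgorithm::Zstd { level: Some(1) }
    );
}
```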
## Problem
test_subscriber_restart has quite a large failure rate:
https://neonprod.grafana.net/d/fddp4rvg7k2dcf/regression-test-failures?orgId=1&var-test_name=test_subscriber_restart&var-max_count=100&var-restrict=false
It can be caused by a too-small timeout (5 seconds) for waiting until
changes are propagated.
Related to #8097
## Summary of changes
Increase timeout to 30 seconds.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We want to be able to test how our infrastructure reacts to segfaults in
Postgres (for example, that we collect cores and get the required
logs/metrics, etc.)
## Summary of changes
- Add `trigger_segfauls` function to `neon_test_utils` to trigger a
segfault in Postgres
- Add `trigger_panic` function to `neon_test_utils` to trigger SIGABRT
(by using `elog(PANIC, ...)`)
- Fix cleanup logic in regression tests if the endpoint crashed
## Problem
Assume a timeline with the following workload: very slow ingest of
updates to a small number of keys that fit within the same partition (as decided by
`KeySpace::partition`). These tenants will create small L0 layers due to
time-based rolling, and, consequently, the L1 layers will also be small.
Currently, by default, we need to ingest 512 MiB of WAL before checking
if an image layer is required. This scheme works fine under the assumption that L1s are roughly of
checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads.
## Summary of changes
Check if new image layers are required at least once every checkpoint timeout interval.
## Problem
Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.
Closes: https://github.com/neondatabase/neon/issues/6810
## Summary of changes
- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)
I think this also makes https://github.com/neondatabase/neon/pull/6766
un-needed, as the tombstone is also checked during deletion.
I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
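The tombstone bookkeeping can be pictured like this (a sketch; the id type and TTL handling are simplified):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Placeholder for the real tenant+timeline id type.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct TenantTimelineId(u128);

/// Tombstones older than this are dropped by periodic housekeeping.
const TOMBSTONE_TTL: Duration = Duration::from_secs(24 * 3600);

/// Kept next to (not inside) the timelines map, so lookups of live timelines
/// never have to walk past tombstone entries.
#[derive(Default)]
struct Tombstones {
    deleted_at: HashMap<TenantTimelineId, Instant>,
}

impl Tombstones {
    /// Record when a timeline was deleted.
    fn insert(&mut self, ttid: TenantTimelineId) {
        self.deleted_at.insert(ttid, Instant::now());
    }

    /// Recreation (and repeated deletion) is refused while a tombstone exists.
    fn is_deleted(&self, ttid: &TenantTimelineId) -> bool {
        self.deleted_at.contains_key(ttid)
    }

    /// The housekeeping task bounds memory by expiring old tombstones.
    fn housekeeping(&mut self) {
        self.deleted_at
            .retain(|_, deleted_at| deleted_at.elapsed() < TOMBSTONE_TTL);
    }
}
```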
## Problem
This test directly manages locations on pageservers and configuration of
an endpoint. However, it did not switch off the parts of the storage
controller that attempt to do the same: occasionally, the test would
fail in a strange way such as a compute failing to accept a
reconfiguration request.
## Summary of changes
- Wire up the storage controller's compute notification hook to a no-op
handler
- Configure the tenant's scheduling policy to Stop.
## Problem
See #7466
## Summary of changes
Implement the algorithm described in
https://hal.science/hal-00465313/document
A new GUC is added:
`neon.wss_max_duration`, which specifies the size of the sliding window
(in seconds). The default value is 1 hour.
It is possible to request an estimation of the working set size (within
this window) using the new function
`approximate_working_set_size_seconds`. The old function
`approximate_working_set_size` is preserved for backward compatibility,
but its scope is also limited by `neon.wss_max_duration`.
Version of Neon extension is changed to 1.4
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
Part of #7497, closes #8071. (accidentally closed #8208, reopened here)
## Problem
After the changes in #8084, we need synthetic size to also account for
leased LSNs so that users do not get free retention by running a small
ephemeral endpoint for a long time.
## Summary of changes
This PR integrates LSN leases into the synthetic size calculation. We
model leases as read-only branches started at the leased LSN (except it
does not have a timeline id).
Other changes:
- Add new unit tests testing whether a lease behaves like a read-only
branch.
- Change `/size_debug` response to include lease point in the SVG
visualization.
- Fix `/lsn_lease` HTTP API to do proper parsing for POST.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Adds a find-large-objects subcommand to the scrubber to allow listing
layer objects larger than a specific size.
To be used like:
```
AWS_PROFILE=dev REGION=us-east-2 BUCKET=neon-dev-storage-us-east-2 cargo run -p storage_scrubber -- find-large-objects --min-size 250000000 --ignore-deltas
```
Part of #5431
## Problem
When generations were new, these messages were an important way of
noticing if something unexpected was going on. We found some real issues
when investigating tests that unexpectedly tripped them.
As time has gone on, this code has become pretty battle-tested, and as we
do more live migrations etc., it's fairly normal to see the occasional
message from a node with a stale generation.
At this point the cognitive load on developers to selectively allow-list
these logs outweighs the benefit of having them at warn severity.
Closes: https://github.com/neondatabase/neon/issues/8080
## Summary of changes
- Downgrade "Dropped remote consistent LSN updates" and "Dropping stale
deletions" messages to INFO
- Remove all the allow-list entries for these logs.
## Problem
`pg-clients` workflow looks different from the main `build-and-test`
workflow for historical reasons (it was my very first task at Neon, and
back then I wasn't really familiar with the rest of the CI pipelines).
This PR unifies `pg-clients` workflow with `build-and-test`
## Summary of changes
- Rename `pg_clients.yml` to `pg-clients.yml`
- Run the workflow on changes in relevant files
- Create Allure report for tests
- Send slack notifications to `#on-call-qa-staging-stream` channel
(instead of `#on-call-staging-stream`)
- Update Client libraries once we're here
## Problem
I'd like to keep this in the tree since it might be useful in prod as
well. It's a bit too noisy as is and missing the lsn.
## Summary of changes
Add an lsn field and increase the rate limit duration.
## Problem
Currently, if you need to rename a job and the job is listed in [branch
protection
rules](https://github.com/neondatabase/neon/settings/branch_protection_rules),
the PR won't be allowed to merge.
## Summary of changes
- Add `conclusion` job that fails if any of its dependencies don't
finish successfully
## Problem
If there's a quota error, it makes sense to cache it for a short window
of time. Many clients do not handle database connection errors
gracefully, so just spam retry 🤡
## Summary of changes
Updates the node_info cache to support storing console errors. Store
console errors if they cannot be retried (using our own heuristic; this
should only trigger for quota-exceeded errors).
## Problem
The metrics we have today aren't convenient for planning around the
impact of timeline archival on costs.
Closes: https://github.com/neondatabase/neon/issues/8108
## Summary of changes
- Add metric `pageserver_archive_size`, which indicates the logical
bytes of data which we would expect to write into an archived branch.
- Add metric `pageserver_pitr_history_size`, which indicates the
distance between last_record_lsn and the PITR cutoff.
These metrics are somewhat temporary: when we implement #8088 and
associated consumption metric changes, these will reach a final form.
For now, an "archived" branch is just any branch outside of its parent's
PITR window: later, archival will become an explicit state (which will
_usually_ correspond to falling outside the parent's PITR window).
The overall volume of timeline metrics is something to watch, but we are
removing many more in https://github.com/neondatabase/neon/pull/8245
than this PR is adding.
I'd like to add some constraints to the layer map we generate in tests.
(1) is the layer map that the current compaction algorithm will produce.
There is a property that for every delta layer, all delta layers that
overlap with it on the LSN axis have the same LSN range.
(2) is the layer map that cannot be produced with the legacy compaction
algorithm.
(3) is the layer map that will be produced by the future
tiered-compaction algorithm. The current validator does not allow that
but we can modify the algorithm to allow it in the future.
## Summary of changes
Add a validator to check if the layer map is valid and refactor the test
cases to include delta layer start/end LSN.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
We record detailed histograms for all page_service op types, which
mostly aren't very interesting, but make our prometheus scrapes huge.
Closes: #8223
## Summary of changes
- Only track GetPageAtLsn histograms on a per-timeline granularity. For
all other operation types, rely on existing node-wide histograms.
we want to run some specific pagebench test cases on dedicated hardware
to get reproducible results
run1: 1 client per tenant => characterize throughput with n tenants.
- 500 tenants
- scale 13 (200 MB database)
- 1 hour duration
- ca 380 GB layer snapshot files
run2.singleclient: 1 client per tenant => characterize latencies
run2.manyclient: N clients per tenant => characterize throughput
scalability within one tenant.
- 1 tenant with 1 client for latencies
- 1 tenant with 64 clients because typically for a high number of
connections we recommend the connection pooler
which by default uses 64 connections (for scalability)
- scale 136 (2048 MB database)
- 20 minutes each
PR #8106 was created with the assumption that no blob is larger than
`256 MiB`. Due to #7852 we check for *writes* of blobs larger than that
limit, but we didn't check for *reads* of such large blobs: in theory,
we could be reading such blobs every day, we just don't happen to write
them for some reason.
Therefore, we now add a warning for *reads* of such large blobs as well.
To make deploying compression less dangerous, we therefore only assume a
blob is compressed if the compression setting is present in the config.
This also means that we can't back out of compression once we have
enabled it.
Part of https://github.com/neondatabase/neon/issues/5431
## Problem
test_location_conf_churn fails on log errors when it tries to shutdown a
pageserver immediately after starting a tenant attach, like this:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8224/9761000525/index.html#/testresult/15fb6beca5c7327c
```
shutdown:shutdown{tenant_id=35f5c55eb34e7e5e12288c5d8ab8b909 shard_id=0000}:timeline_shutdown{timeline_id=30936747043353a98661735ad09cbbfe shutdown_mode=FreezeAndFlush}: failed to freeze and flush: cannot flush frozen layers when flush_loop is not running, state is Exited\n')
```
This is happening because Tenant::shutdown fires its cancellation token
early if the tenant is not fully attached by the time shutdown is
called, so the flush loop is shutdown by the time we try and flush.
## Summary of changes
- In the early-cancellation case, also set the shutdown mode to Hard to
skip trying to do a flush that will fail.
## Problem
GitHub Actions complain that we use actions that depend on deprecated
Node 16:
```
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: docker/setup-buildx-action@v2
```
But also, the latest `docker/setup-buildx-action` fails with the following
error:
```
/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:175
throw new Error(`Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.`);
^
Error: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.
at Object.rejected (/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:175:1)
at Generator.next (<anonymous>)
at fulfilled (/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:29:1)
```
We can work this around by setting `cache-binary: false` for `uses:
docker/setup-buildx-action@v3`
## Summary of changes
- Update `docker/setup-buildx-action` from `v2` to `v3`, set
`cache-binary: false`
- Update `docker/login-action` from `v2` to `v3`
- Update `docker/build-push-action` from `v4`/`v5` to `v6`
All the code to ensure the WAL record lands at a page boundary was
unnecessary for reproducing the original problem. In fact, it's a pretty
basic test that checks that outbound replication (= neon as publisher)
still works after restarting the endpoint. It just used to be very
broken before commit 5ceccdc7de, which also added this test.
To verify that:
1. Check out commit f3af5f4660 (because the next commit, 7dd58e1449,
fixed the same bug in a different way, making it infeasible to revert
the bug fix in an easy way)
2. Revert the bug fix from commit 5ceccdc7de with this:
```
diff --git a/pgxn/neon/walproposer_pg.c b/pgxn/neon/walproposer_pg.c
index 7debb6325..9f03bbd99 100644
--- a/pgxn/neon/walproposer_pg.c
+++ b/pgxn/neon/walproposer_pg.c
@@ -1437,8 +1437,10 @@ XLogWalPropWrite(WalProposer *wp, char *buf, Size nbytes, XLogRecPtr recptr)
*
* https://github.com/neondatabase/neon/issues/5749
*/
+#if 0
if (!wp->config->syncSafekeepers)
XLogUpdateWalBuffers(buf, recptr, nbytes);
+#endif
while (nbytes > 0)
{
```
3. Run the test_wal_page_boundary_start regression test. It fails, as
expected
4. Apply this commit to the test, and run it again. It still fails, with
the same error mentioned in issue #5749:
```
PG:2024-06-30 20:49:08.805 GMT [1248196] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"')
PG:2024-06-30 21:37:52.567 GMT [1467972] LOG: starting logical decoding for slot "sub1"
PG:2024-06-30 21:37:52.567 GMT [1467972] DETAIL: Streaming transactions committing after 0/1532330, reading WAL from 0/1531C78.
PG:2024-06-30 21:37:52.567 GMT [1467972] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"')
PG:2024-06-30 21:37:52.567 GMT [1467972] LOG: logical decoding found consistent point at 0/1531C78
PG:2024-06-30 21:37:52.567 GMT [1467972] DETAIL: There are no running transactions.
PG:2024-06-30 21:37:52.567 GMT [1467972] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"')
PG:2024-06-30 21:37:52.568 GMT [1467972] ERROR: could not find record while sending logically-decoded data: invalid contrecord length 312 (expected 6) at 0/1533FD8
```
## Problem
See https://github.com/neondatabase/cloud/issues/14289
and PR #8210
## Summary of changes
Add test for problems fixed in #8210
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Some of the Nightly benchmarks fail with the error
```
+ /tmp/neon/pg_install/v14/bin/pgbench --version
/tmp/neon/pg_install/v14/bin/pgbench: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory
```
Originally, we added the `pgbench --version` call to check that
`pgbench` is installed and to fail earlier if it's not.
The failure happens because we don't have `LD_LIBRARY_PATH` set for
every job, and it also affects the `psql` command.
We can move it to `actions/run-python-test-set` so as not to duplicate
code (as it already has `LD_LIBRARY_PATH` set).
## Summary of changes
- Remove `pgbench --version` call
- Move `psql` commands to common `actions/run-python-test-set`
part of https://github.com/neondatabase/neon/issues/7418
# Motivation
(reproducing #7418)
When we do an `InMemoryLayer::write_to_disk`, there is a tremendous
amount of random read I/O, as deltas from the ephemeral file (written in
LSN order) are written out to the delta layer in key order.
In benchmarks (https://github.com/neondatabase/neon/pull/7409) we can
see that this delta layer writing phase is substantially more expensive
than the initial ingest of data, and that within the delta layer write a
significant amount of the CPU time is spent traversing the page cache.
# High-Level Changes
Add a new mode for L0 flush that works as follows:
* Read the full ephemeral file into memory -- layers are much smaller
than total memory, so this is affordable (see the sketch after this list)
* Do all the random reads directly from this in memory buffer instead of
using blob IO/page cache/disk reads.
* Add a semaphore to limit how many timelines may concurrently do this
(limit peak memory).
* Make the semaphore configurable via PS config.
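Here is a sketch of that shape, assuming a global semaphore sized from the PS config (all names are illustrative, not the actual pageserver symbols):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Sketch of the new flush mode: the whole ephemeral file is read into one
/// buffer, all key-ordered "random" reads are served from that buffer, and a
/// global semaphore bounds how many timelines do this at the same time.
struct L0FlushGlobalState {
    semaphore: Arc<Semaphore>,
}

impl L0FlushGlobalState {
    fn new(max_concurrent_flushes: usize) -> Self {
        Self {
            semaphore: Arc::new(Semaphore::new(max_concurrent_flushes)),
        }
    }

    async fn write_to_disk(&self, ephemeral_file_contents: Vec<u8>) {
        // Peak memory is roughly max_concurrent_flushes * layer size.
        let _permit = self
            .semaphore
            .clone()
            .acquire_owned()
            .await
            .expect("semaphore is never closed");

        // All reads in key order hit this in-memory buffer instead of the
        // blob_io / PageCache / disk path.
        let _buf: &[u8] = &ephemeral_file_contents;

        // ... write the delta layer and finish()/fsync it, still holding `_permit` ...
    }
}
```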
# Implementation Details
The new `BlobReaderRef::Slice` is a temporary hack until we can ditch
`blob_io` for `InMemoryLayer` => Plan for this is laid out in
https://github.com/neondatabase/neon/issues/8183
# Correctness
The correctness of this change is quite obvious to me: we do what we did
before (`blob_io`) but read from memory instead of going to disk.
The highest bug potential is in doing owned-buffers IO. I refactored the
API a bit in preliminary PR
https://github.com/neondatabase/neon/pull/8186 to make it less
error-prone, but still, careful review is requested.
# Performance
I manually measured single-client ingest performance from `pgbench -i
...`.
Full report:
https://neondatabase.notion.site/2024-06-28-benchmarking-l0-flush-performance-e98cff3807f94cb38f2054d8c818fe84?pvs=4
tl;dr:
* no speed improvements during ingest, but
* significantly lower pressure on PS PageCache (eviction rate drops to
1/3)
* (that's why I'm working on this)
* noticeable but modestly lower CPU time
This is good enough for merging this PR because the changes require
opt-in.
We'll do more testing in staging & pre-prod.
# Stability / Monitoring
**memory consumption**: there's no _hard_ limit on max `InMemoryLayer`
size (aka "checkpoint distance"), hence there's no hard limit on the
memory allocation we do for flushing. In practice, we a) [log a
warning](23827c6b0d/pageserver/src/tenant/timeline.rs (L5741-L5743))
when we flush oversized layers, so we'd know which tenant is to blame
and b) if we were to put a hard limit in place, we would have to decide
what to do if there is an InMemoryLayer that exceeds the limit.
It seems like a better option to guarantee a max size for frozen layers,
dependent on `checkpoint_distance`. Then limit concurrency based on
that.
**metrics**: we do have the
[flush_time_histo](23827c6b0d/pageserver/src/tenant/timeline.rs (L3725-L3726)),
but that includes the wait time for the semaphore. We could add a
separate metric for the time spent after acquiring the semaphore, so one
can infer the wait time. Seems unnecessary at this point, though.
Add support for reading and writing zstd-compressed blobs for use in
image layer generation, but maybe one day useful also for delta layers.
The reading of them is unconditional while the writing is controlled by
the `image_compression` config variable allowing for experiments.
For the on-disk format, we re-use some of the bitpatterns we currently
keep reserved for blobs larger than 256 MiB. This assumes that we have
never ever written any such large blobs to image layers.
After the preparation in #7852, we now are unable to read blobs with a
size larger than 256 MiB (or write them).
A non-goal of this PR is to come up with good heuristics of when to
compress a bitpattern. This is left for future work.
Parts of the PR were inspired by #7091.
cc #7879
Part of #5431
## Problem
At high percentiles we see more than 800 layers being visited by the
read path. We need the tenant/timeline to investigate.
## Summary of changes
Add a rate limited log line when the average number of layers visited
per key is in the last specified histogram bucket.
I plan to use this to identify tenants in us-east-2 staging that exhibit
this behaviour. Will revert before next week's release.
Before this PR, during timeline shutdown, we'd occasionally see
log lines like this one:
```
2024-06-26T18:28:11.063402Z INFO initial_size_calculation{tenant_id=$TENANT,shard_id=0000 timeline_id=$TIMELINE}:logical_size_calculation_task:get_or_maybe_download{layer=000000000000000000000000000000000000-000000067F0001A3950001C1630100000000__0000000D88265898}: layer file download failed, and caller has been cancelled: Cancelled, shutting down
Stack backtrace:
0: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1964:27
pageserver::tenant::remote_timeline_client::RemoteTimelineClient::download_layer_file::{{closure}}
at /home/nonroot/pageserver/src/tenant/remote_timeline_client.rs:531:13
pageserver::tenant::storage_layer::layer::LayerInner::download_and_init::{{closure}}
at /home/nonroot/pageserver/src/tenant/storage_layer/layer.rs:1136:14
pageserver::tenant::storage_layer::layer::LayerInner::download_init_and_wait::{{closure}}::{{closure}}
at /home/nonroot/pageserver/src/tenant/storage_layer/layer.rs:1082:74
```
We can eliminate the anyhow backtrace with no loss of information
because the conversion to anyhow::Error happens in exactly one place.
refs #7427
## Problem
Tenant attachment has error paths for failures to write local
configuration, but these types of local storage I/O errors should be
considered fatal for the process. Related thread on an earlier PR that
touched this code:
https://github.com/neondatabase/neon/pull/7947#discussion_r1655134114
## Summary of changes
- Make errors writing tenant config fatal (abort process)
- When reading tenant config, make all I/O errors except ENOENT fatal
- Replace use of bare anyhow errors with `LoadConfigError`
Before this PR, `RemoteStorageConfig::from_toml` would support
deserializing an
empty `{}` TOML inline table to a `None`, otherwise try `Some()`.
We can instead let
* in proxy: let clap derive handle the Option
* in PS & SK: assume that if the field is specified, it must be a valid
RemoteStorageConfig
(This PR started with a much simpler goal of factoring out the
`deserialize_item` function because I need that in another PR).
## Problem
See https://github.com/neondatabase/cloud/issues/14289
## Summary of changes
Check connection status after calling PQconnectStartParams
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
This makes it much more convenient to use in the common case that you
want to flush all the WAL. (Passing pg_current_wal_insert_lsn() as the
argument doesn't work for the same reasons as explained in the comments:
we need to be back off to the beginning of a page if the previous record
ended at page boundary.)
I plan to use this to fix the issue that Arseny Sher called out at
https://github.com/neondatabase/neon/pull/7288#discussion_r1660063852
## Problem
We use `build-tools` image as a base image to build other images, and it
has a pretty old `libpq-dev` installed (v13; it wasn't that old until I
removed system Postgres 14 from `build-tools` image in
https://github.com/neondatabase/neon/pull/6540)
## Summary of changes
- Remove `libpq-dev` from `build-tools` image
- Set `LD_LIBRARY_PATH` for tests (for different Postgres binaries that
we use, like psql and pgbench)
- Set `PQ_LIB_DIR` to build Storage Controller
- Set `LD_LIBRARY_PATH`/`DYLD_LIBRARY_PATH` in the Storage Controller
where it calls Postgres binaries
## Problem
We lack visibility of how much local disk space is used by secondary
tenant locations
Close: https://github.com/neondatabase/neon/issues/8181
## Summary of changes
- Add `pageserver_secondary_resident_physical_size`, tagged by tenant
- Register & de-register label sets from SecondaryTenant
- Add+use wrappers in SecondaryDetail that update metrics when
adding+removing layers/timelines
We have one pretty serious MVCC visibility bug with hot standby
replicas. We incorrectly treat any transactions that are in progress
in the primary, when the standby is started, as aborted. That can
break MVCC for queries running concurrently in the standby. It can
also lead to hint bits being set incorrectly, and that damage can last
until the replica is restarted.
The fundamental bug was that we treated any replica start as starting
from a shut down server. The fix for that is straightforward: we need
to set 'wasShutdown = false' in InitWalRecovery() (see changes in the
postgres repo).
However, that introduces a new problem: with wasShutdown = false, the
standby will not open up for queries until it receives a running-xacts
WAL record from the primary. That's correct, and that's how Postgres
hot standby always works. But it's a problem for Neon, because:
* It changes the historical behavior for existing users. Currently,
the standby immediately opens up for queries, so if they now need to
wait, we can break existing use cases that were working fine
(assuming you don't hit the MVCC issues).
* The problem is much worse for Neon than it is for standalone
PostgreSQL, because in Neon, we can start a replica from an
arbitrary LSN. In standalone PostgreSQL, the replica always starts
WAL replay from a checkpoint record, and the primary arranges things
so that there is always a running-xacts record soon after each
checkpoint record. You can still hit this issue with PostgreSQL if
you have a transaction with lots of subtransactions running in the
primary, but it's pretty rare in practice.
To mitigate that, we introduce another way to collect the
running-xacts information at startup, without waiting for the
running-xacts WAL record: we can scan the CLOG for XIDs that haven't been
marked as committed or aborted. It has limitations with
subtransactions too, but should mitigate the problem for most users.
See https://github.com/neondatabase/neon/issues/7236.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
This makes it much more convenient to use in the common case that you
want to flush all the WAL. (Passing pg_current_wal_insert_lsn() as the
argument doesn't work for the same reasons as explained in the
comments: we need to be back off to the beginning of a page if the
previous record ended at page boundary.)
I plan to use this to fix the issue that Arseny Sher called out at
https://github.com/neondatabase/neon/pull/7288#discussion_r1660063852
The 'running' boolean was replaced with a semaphore in commit
f0e2bb79b2, but this initialization was missed. Remove it so that if a
test tries to access it, you get an error rather than always claiming
that the endpoint is not running.
Spotted by Arseny at
https://github.com/neondatabase/neon/pull/7288#discussion_r1660068657
Whenever we see an XLOG_MULTIXACT_CREATE_ID WAL record, we need to
update the nextMulti and NextMultiOffset fields in the pageserver's
copy of the CheckPoint struct, to cover the new multi-XID. In
PostgreSQL, this is done by updating an in-memory struct during WAL
replay, but because in Neon you can start a compute node at any LSN,
we need to have an up-to-date value pre-calculated in the pageserver
at all times. We do the same for nextXid.
However, we had a bug in WAL ingestion code that does that: the
multi-XIDs will wrap around at 2^32, just like XIDs, so we need to do
the comparisons in a wraparound-aware fashion.
Fix that, and add tests.
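The wraparound-aware comparison itself is small; a sketch of the idea (in Rust here, mirroring how Postgres compares 32-bit XIDs, not the literal ingestion code):

```rust
/// True if multi-XID `a` logically precedes `b`, treating the 32-bit space as
/// circular: the wrapping difference is interpreted as a signed number.
fn multixact_precedes(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}

/// Only advance the tracked nextMulti if the newly seen multi-XID is logically
/// ahead of it, which a plain `>` comparison gets wrong across wraparound.
fn advance_next_multi(next_multi: &mut u32, seen_multi: u32) {
    let candidate = seen_multi.wrapping_add(1);
    if multixact_precedes(*next_multi, candidate) {
        *next_multi = candidate;
    }
}

#[test]
fn advances_across_wraparound() {
    let mut next = u32::MAX - 1;
    // The candidate wraps past u32::MAX back to 0 (the real code also has to
    // respect Postgres' reserved/invalid multi-XID values).
    advance_next_multi(&mut next, u32::MAX);
    assert_eq!(next, 0);
}
```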
Fixes issue #6520
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
At the start of do_tenant_shard_split, we drop any secondary location
for the parent shards. The reconciler uses presence of secondary
locations as a condition for enabling heatmaps.
On the pageserver, child shards inherit their configuration from
parents, but the storage controller assumes the child's ObservedState is
the same as the parent's config from the prepare phase. The result is
that some child shards end up with inaccurate ObservedState, and until
something next migrates or restarts, those tenant shards aren't
uploading heatmaps, so their secondary locations are downloading
everything that was resident at the moment of the split (including
ancestor layers which are often cleaned up shortly after the split).
Closes: https://github.com/neondatabase/neon/issues/8189
## Summary of changes
- Use PlacementPolicy to control enablement of heatmap upload, rather
than the literal presence of secondaries in IntentState: this way we
avoid switching them off during shard split
- test: during tenant split test, assert that the child shards have
heatmap uploads enabled.
## Problem
Very long running downloads can be wasteful, because the heatmap they're
using is outdated after a few minutes.
Closes: https://github.com/neondatabase/neon/issues/8182
## Summary of changes
- Impose a deadline on timeline downloads, using the same period as we
use for scheduling, and returning an UpdateError::Restart when it is
reached. This restart will involve waiting for a scheduling interval,
but that's a good thing: it helps let other tenants proceed.
- Refactor download_timeline so that the part where we update the state
for local layers is done even if we fall out of the layer download loop
with an error: this is important, especially for big tenants, because
only layers in the SecondaryDetail state will be considered for
eviction.
part of https://github.com/neondatabase/neon/issues/7418
I reviewed how the VirtualFile API's `read` methods look like and came
to the conclusion that we've been using `IoBufMut` / `BoundedBufMut` /
`Slice` wrong.
This patch rectifies the situation.
# Change 1: take `tokio_epoll_uring::Slice` in the read APIs
Before, we took an `IoBufMut`, which is too low of a primitive and while
it _seems_ convenient to be able to pass in a `Vec<u8>` without any
fuzz, it's actually very unclear at the callsite that we're going to
fill up that `Vec` up to its `capacity()`, because that's what
`IoBuf::bytes_total()` returns and that's what
`VirtualFile::read_exact_at` fills.
By passing a `Slice` instead, a caller that "just wants to read into a
`Vec`" is forced to be explicit about it, adding either `slice_full()`
or `slice(x..y)`, and these methods panic if the read is outside of the
bounds of the `Vec::capacity()`.
Last, passing slices is more similar to what the `std::io` APIs look
like.
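For illustration, a call site that used to pass a bare `Vec<u8>` now reads roughly like the sketch below; it assumes the pageserver's `VirtualFile` and the slice helpers named above are in scope, so it is shape-only rather than a standalone program.

```rust
// Sketch of the new calling convention: the caller decides explicitly how much
// of the buffer gets filled, and gets the Vec back out afterwards.
async fn read_one_page(file: &VirtualFile, offset: u64) -> std::io::Result<Vec<u8>> {
    let buf = Vec::with_capacity(8192);

    // Fill the whole capacity; `buf.slice(0..1024)` would read only the first
    // 1 KiB, and out-of-capacity ranges panic instead of silently truncating.
    let slice = buf.slice_full();

    let slice = file.read_exact_at(slice, offset).await?;

    // Recover the Vec once the read has completed.
    Ok(slice.into_inner())
}
```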
# Change 2: fix UB in `virtual_file_io_engine=std-fs`
While reviewing call sites, I noticed that the
`io_engine::IoEngine::read_at` method for `StdFs` mode has been
constructing an `&mut[u8]` from raw parts that were uninitialized.
We then used `std::fs::File::read_exact` to initialize that memory, but,
IIUC we must not even be constructing an `&mut[u8]` where some of the
memory isn't initialized.
So, stop doing that and add a helper ext trait on `Slice` to do the
zero-initialization.
# Change 3: eliminate `read_exact_at_n`
The `read_exact_at_n` doesn't make sense because the caller can just
1. `slice = buf.slice()` the exact memory it wants to fill
2. `slice = read_exact_at(slice)`
3. `buf = slice.into_inner()`
Again, the `std::io` APIs specify the length of the read via the Rust
slice length.
We should do the same for the owned buffers IO APIs, i.e., via
`Slice::bytes_total()`.
# Change 4: simplify filling of `PageWriteGuard`
The `PageWriteGuardBuf::init_up_to` was never necessary.
Remove it. See changes to doc comment for more details.
---
Reviewers should probably look at the added test case first, it
illustrates my case a bit.
## Problem
For some time, we have created tenants with calls to location_conf. The
legacy "POST /v1/tenant" path was only used in some tests.
## Summary of changes
- Remove the API
- Relocate TenantCreateRequest to the controller API file (this used to
be used in both pageserver and controller APIs)
- Rewrite tenant_create test helper to use location_config API, as
control plane and storage controller do
- Update docker-compose test script to create tenants with
location_config API (this small commit is also present in
https://github.com/neondatabase/neon/pull/7947)
## Problem
It is possible for database connections to not close in time.
## Summary of changes
Force the closing of connections if the client has hung up.
## Problem
In a recent refactor, we accidentally dropped the cancel session early
## Summary of changes
Hold the cancel session during proxy passthrough
## Problem
Not really a problem, just refactoring.
## Summary of changes
Separate authentication from wake compute.
Do not call wake compute a second time if we managed to connect to
postgres, or if the compute info was not served from the cache.
## Problem
It is hard to see where time is taken during the HTTP flow.
## Summary of changes
Add a lot more detail for query state. Add a conn_id field to the sql-over-http
span.
## Problem
`tokio::io::copy_bidirectional` doesn't close the connection once one of
the sides closes it. It's not really suitable for the postgres protocol.
## Summary of changes
Fork `copy_bidirectional` and initiate a shutdown for both connections.
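A minimal sketch of the intent, not the actual forked `copy_bidirectional` (which works at the poll level): once either direction finishes, shut down both write halves instead of leaving the peer half-open.
```
use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt};

// Illustrative only: stop as soon as one direction completes, then initiate a
// shutdown on both connections, which suits the postgres protocol better than
// tokio::io::copy_bidirectional's "keep the other direction running" behavior.
async fn copy_until_either_closes<A, B>(a: &mut A, b: &mut B) -> std::io::Result<()>
where
    A: AsyncRead + AsyncWrite + Unpin,
    B: AsyncRead + AsyncWrite + Unpin,
{
    let (mut a_read, mut a_write) = tokio::io::split(a);
    let (mut b_read, mut b_write) = tokio::io::split(b);

    tokio::select! {
        res = tokio::io::copy(&mut a_read, &mut b_write) => { res?; }
        res = tokio::io::copy(&mut b_read, &mut a_write) => { res?; }
    }

    // Initiate shutdown for both connections; ignore errors from already-closed peers.
    a_write.shutdown().await.ok();
    b_write.shutdown().await.ok();
    Ok(())
}
```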
---------
Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
There is currently no cleanup done after a delta layer creation error,
so delta layers can accumulate. The problem gets worse as the operation
gets retried and delta layers accumulate on the disk. Therefore, delete
them from disk (if something has been written to disk).
## Problem
When a tenant is in Attaching state, and waiting for the
`concurrent_tenant_warmup` semaphore, it also listens for the tenant
cancellation token. When that token fires, Tenant::attach drops out.
Meanwhile, Tenant::set_stopping waits forever for the tenant to exit
Attaching state.
Fixes: https://github.com/neondatabase/neon/issues/6423
## Summary of changes
- In the absence of a valid state for the tenant, it is set to Broken in
this path. A more elegant solution will require more refactoring, beyond
this minimal fix.
(cherry picked from commit 93572a3e99)
Before this patch, the select! still returned immediately if `futs` was
empty. Must have tested a stale build in my manual testing of #6388.
(cherry picked from commit 15c0df4de7)
Exercise MAX_SEND_SIZE sending from the safekeeper; we've had a bug with WAL
records torn across several XLogData messages. Add a failpoint to the safekeeper
to slow down sending. Also check for corrupted-WAL complaints in the standby log.
Make the test a bit simpler in passing, e.g. we don't need explicit commits as
autocommit is enabled by default.
https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719
https://github.com/neondatabase/cloud/issues/9057
Otherwise they are left orphaned when compute_ctl is terminated with a
signal. It was invisible most of the time because normally neon_local or k8s
kills postgres directly and then compute_ctl finishes gracefully. However, in
some tests compute_ctl gets stuck waiting for sync-safekeepers which
intentionally never ends because safekeepers are offline, and we want to stop
compute_ctl without leaving orphans behind.
This is quite a rough approach which doesn't wait for child termination. A
better way would be to convert compute_ctl to async, which would make waiting
easy.
Release 2023-12-19
We need to do a config change that requires restarting the pageservers.
Slip in two metrics-related commits that didn't make this week's regular release.
Pre-merge `git merge --squash` of
https://github.com/neondatabase/neon/pull/6115
Lowering the tracing level in get_value_reconstruct_data and
get_or_maybe_download from info to debug reduces the overhead
of span creation in non-debug environments.
## Problem
#6112 added some logs and metrics: clean these up a bit:
- Avoid counting startup completions for tenants launched after startup
- exclude no-op cases from timing histograms
- remove a rogue log message
An error indicating request cancellation OR timeline shutdown was deemed
a reason to exit the background worker that calculates synthetic size.
Fix it so that the error is only considered in order to avoid logging such errors.
This conflicted on tenant_shard_id having already replaced tenant_id on
`main`.
```
could not start the compute node: compute is in state "failed": db error: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory Caused by: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory
```
Only applicable change was neondatabase/autoscaling#584, setting
pgbouncer auth_dbname=postgres in order to fix superuser connections
from preventing dropping databases.
Only applicable change was neondatabase/autoscaling#571, removing the
postgres_exporter flags `--auto-discover-databases` and
`--exclude-databases=...`
## Problem
Logical replication requires the new AUX_FILES_KEY, which is definitely
absent in existing databases.
We do not have a function to check if a key exists in our KV storage.
So I have to handle the error in the `list_aux_files` method.
But this key is also included in the key space range and accessed by the
`create_image_layer` method.
## Summary of changes
Check if AUX_FILES_KEY exists before including it in keyspace.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
Fixes an issue we observed on staging that happens when the
autoscaler-agent attempts to immediately downscale the VM after binding,
which is typical for pooled computes.
The issue was occurring because the autoscaler-agent was requesting
downscaling before the vm-monitor had gathered sufficient cgroup memory
stats to be confident in approving it. When the vm-monitor returned an
internal error instead of denying downscaling, the autoscaler-agent
retried the connection and immediately hit the same issue (in part
because cgroup stats are collected per-connection, rather than
globally).
There's currently an issue with the vm-monitor on staging that's not
really feasible to debug because the current display impl gives no
context to the errors (just says "failed to downscale").
Logging the full error should help.
For communications with the autoscaler-agent, it's ok to only provide
the outermost cause, because we can cross-reference with the VM logs.
At some point in the future, we may want to change that.
tl;dr it's really hard to avoid throttling from memory.high, and it
counts tmpfs & page cache usage, so it's also hard to make sense of.
In the interest of fixing things quickly with something that should be
*good enough*, this PR switches to instead periodically fetch memory
statistics from the cgroup's memory.stat and use that data to determine
if and when we should upscale.
This PR fixes #5444, which has a lot more detail on the difficulties
we've hit with memory.high. This PR also supersedes #5488.
Before this PR, when we restarted pageserver, we'd see a rush of
`$number_of_tenants` concurrent eviction tasks starting to do imitate
accesses building up in the period of `[init_order allows activations,
$random_access_delay + EvictionPolicyLayerAccessThreshold::period]`.
We simply cannot handle that degree of concurrent IO.
We already solved the problem for compactions by adding a semaphore.
So, this PR shares that semaphore for use by evictions.
Part of https://github.com/neondatabase/neon/issues/5479
Which is again part of https://github.com/neondatabase/neon/issues/4743
Risks / Changes In System Behavior
==================================
* we don't do evictions as timely as we currently do
* we log a bunch of warnings about eviction taking too long
* imitate accesses and compactions compete for the same concurrency
limit, so they'll slow each other down through this shared semaphore
Changes
=======
- Move the `CONCURRENT_COMPACTIONS` semaphore into `tasks.rs`
- Rename it to `CONCURRENT_BACKGROUND_TASKS`
- Use it also for the eviction imitate accesses:
- Imitate accesses are both per-TIMELINE and per-TENANT
- The per-TENANT is done through coalescing all the per-TIMELINE
tasks via a tokio mutex `eviction_task_tenant_state`.
- We acquire the CONCURRENT_BACKGROUND_TASKS permit early, at the
beginning of the eviction iteration, much before the imitate
accesses start (and they may not even start at all in the given
iteration, as they happen only every $threshold).
- Acquiring early is **sub-optimal** because when the per-timeline
tasks coalesce on the `eviction_task_tenant_state` mutex,
they are already holding a CONCURRENT_BACKGROUND_TASKS permit.
- It's also unfair because tenants with many timelines win
the CONCURRENT_BACKGROUND_TASKS more often.
- I don't think there's another way though, without refactoring
more of the imitate accesses logic, e.g, making it all per-tenant.
- Add metrics for queue depth behind the semaphore.
I found these very useful to understand what work is queued in the
system.
- The metrics are tagged by the new `BackgroundLoopKind`.
- On a green slate, I would have used `TaskKind`, but we already had
pre-existing labels whose names didn't map exactly to task kind.
Also the task kind is kind of a lower-level detail, so, I think
it's fine to have a separate enum to identify background work kinds.
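A rough sketch of the arrangement described in the changes above (all names and the metric sink here are illustrative, not the actual pageserver code): one semaphore shared by both background loop kinds, with a gauge for queue depth behind it.
```
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use tokio::sync::Semaphore;

// Illustrative only: both compaction iterations and eviction/imitate-access
// iterations go through the same concurrency limit, and we track how many
// iterations are currently queued behind the semaphore.
#[derive(Clone, Copy)]
enum BackgroundLoopKind {
    Compaction,
    Eviction,
}

async fn run_one_iteration(
    limit: Arc<Semaphore>,
    kind: BackgroundLoopKind,
    queue_depth: Arc<AtomicU64>,
) {
    queue_depth.fetch_add(1, Ordering::Relaxed);
    let _permit = limit.acquire_owned().await.expect("semaphore never closed");
    queue_depth.fetch_sub(1, Ordering::Relaxed);

    match kind {
        BackgroundLoopKind::Compaction => { /* one compaction iteration */ }
        BackgroundLoopKind::Eviction => { /* imitate accesses, etc. */ }
    }
    // The permit is held for the whole iteration and released on drop.
}
```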
Future Work
===========
I guess I could move the eviction tasks from a ticker to "sleep for
$period".
The benefit would be that the semaphore automatically "smears" the
eviction task scheduling over time, so, we only have the rush on restart
but a smeared-out rush afterward.
The downside is that this perverts the meaning of "$period", as we'd
actually not run the eviction at a fixed period. It also means the
"took too long" warning & metric becomes meaningless.
Then again, that is already the case for the compaction and gc tasks,
which do sleep for `$period` instead of using a ticker.
(cherry picked from commit 9256788273)
## Problem
Folks have re-tagged releases for `pg_jsonschema` and `pg_graphql` (to
increase timeouts on their CI). For us, these are no-op changes,
but unfortunately this will cause our builds to fail due to checksum
mismatches (this might not strike right away because of the build cache).
- 8ba7c7be9d
- aa7509370a
## Summary of changes
- `pg_jsonschema` update checksum
- `pg_graphql` update checksum
When you log more than a few blocks, you need to reserve the space in
advance. We didn't do that, so we got errors. Now we do that, and
shouldn't get errors.
## Problem
See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029 and
https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178
## Summary of changes
Swap old/new block references
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
The sequence that can lead to a deadlock:
1. DELETE request gets all the way to `tenant.shutdown(progress,
false).await.is_err() ` , while holding TENANTS.read()
2. POST request for tenant creation comes in, calls `tenant_map_insert`,
it does `let mut guard = TENANTS.write().await;`
3. Something that `tenant.shutdown()` needs to wait for needs a
`TENANTS.read().await`.
The only case identified in exhaustive manual scanning of the code base
is this one:
Imitate size access does `get_tenant().await`, which does
`TENANTS.read().await` under the hood.
In the above case (1) waits for (3), (3)'s read-lock request is queued
behind (2)'s write-lock, and (2) waits for (1).
Deadlock.
I made a reproducer/proof-that-above-hypothesis-holds in
https://github.com/neondatabase/neon/pull/5281 , but, it's not ready for
merge yet and we want the fix _now_.
fixes https://github.com/neondatabase/neon/issues/5284
## Problem
We were returning Pending when a connection had a notice/notification
(introduced recently in #5020). When returning pending, the runtime
assumes you will call `cx.waker().wake()` in order to continue
processing.
We weren't doing that, so the connection task would get stuck
## Summary of changes
Don't return pending. Loop instead
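A minimal sketch of the fix, assuming the `futures` crate's `Stream` trait and an illustrative `Message` type (this is not the proxy's actual connection code): handle the notice and poll again instead of returning `Poll::Pending` with no wake-up arranged.
```
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Stream;

// Illustrative message type; the real code deals with postgres protocol messages.
enum Message {
    Notice(String),
    Data(Vec<u8>),
}

fn poll_data<S>(mut inner: Pin<&mut S>, cx: &mut Context<'_>) -> Poll<Option<Vec<u8>>>
where
    S: Stream<Item = Message>,
{
    loop {
        match inner.as_mut().poll_next(cx) {
            // Handle the notice and keep polling: if we returned Pending here
            // ourselves, nothing would ever call cx.waker().wake() for us.
            Poll::Ready(Some(Message::Notice(notice))) => {
                eprintln!("notice: {notice}");
            }
            Poll::Ready(Some(Message::Data(bytes))) => return Poll::Ready(Some(bytes)),
            Poll::Ready(None) => return Poll::Ready(None),
            // Only the inner stream's Pending is safe to propagate, because it
            // registered the waker before returning.
            Poll::Pending => return Poll::Pending,
        }
    }
}
```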
## Problem
cargo deny lint broken
Links to the CVEs:
[rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052)
[rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053)
One is fixed, the other one isn't so we allow it (for now), to unbreak
CI. Then later we'll try to get rid of webpki in favour of the rustls
fork.
## Summary of changes
```
+ignore = ["RUSTSEC-2023-0052"]
```
## Problem
When an endpoint is shutting down, it can take a few seconds. Currently
when starting a new compute, this causes an "endpoint is in transition"
error. We need to add delays before retrying to ensure that we allow
time for the endpoint to shutdown properly.
## Summary of changes
Adds a delay before retrying in auth. connect_to_compute already has
this delay
commit 5f8fd640bf
Author: Alek Westover <alek.westover@gmail.com>
Date: Wed Jul 26 08:24:03 2023 -0400
Upload Test Remote Extensions (#4792)
switched to using the release tag instead of `latest`, but,
the `promote-images` job only uploads `latest` to the prod ECR.
The switch to using the release tag was good in principle, but we're
reverting that part to make the release pipeline work.
Note that a proper fix should abandon use of `:latest` tag
at all: currently, if a `main` pipeline runs concurrently
with a `release` pipeline, the `release` pipeline may end
up using the `main` pipeline's images.
## Problem
If we fail to wake up the compute node, a subsequent connect attempt
will definitely fail. However, kubernetes won't fail the connection
immediately; instead it hangs until we time out (10s).
## Summary of changes
Refactor the loop to allow fast retries of compute_wake and to skip a
connect attempt.
## Problem
#4598 compute nodes are not accessible some time after wake up due to
kubernetes DNS not being fully propagated.
## Summary of changes
Update connect retry mechanism to support handling IO errors and
sleeping for 100ms
```
CREATE EXTENSION embedding;
CREATE TABLE t (val real[]);
INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL);
CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3);
INSERT INTO t (val) VALUES (array[1,2,4]);
SELECT * FROM t ORDER BY val <-> array[3,3,3];
val
---------
{1,2,3}
{1,2,4}
{1,1,1}
{0,0,0}
(5 rows)
```
The consumption metrics synthetic size worker does logical size calculation.
Logical size calculation currently does synchronous disk IO.
This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving other futures.
While there's work on the way to move the synchronous disk IO into spawn_blocking,
the quickfix here is to use the BACKGROUND_RUNTIME instead of MGMT_REQUEST_RUNTIME.
Actually it's not just a quickfix. We simply shouldn't be blocking MGMT_REQUEST_RUNTIME
executor threads on CPU or sync disk IO.
That work isn't done yet, as many of the mgmt tasks still _do_ disk IO.
But it's not as intensive as the logical size calculations that we're fixing here.
While we're at it, fix disk-usage-based eviction in a similar way.
It wasn't the culprit here, according to prod logs, but it can theoretically be
a little CPU-intensive.
More context, including graphs from Prod:
https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949
(cherry picked from commit d6e35222ea)
This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON
response structure resembling that of the node-postgres driver. This method,
using HTTP POST, achieves smaller amortized latencies in edge setups due to
fewer round trips and an enhanced open connection reuse by the v8 engine.
This update involves several intricacies:
1. SQL injection protection: We employed the extended query protocol, modifying
the rust-postgres driver to send queries in one roundtrip using a text
protocol rather than binary, bypassing potential issues like those identified
in https://github.com/sfackler/rust-postgres/issues/1030.
2. Postgres type compatibility: As not all postgres types have binary
representations (e.g., acl's in pg_class), we adjusted rust-postgres to
respond with text protocol, simplifying serialization and fixing queries with
text-only types in response.
3. Data type conversion: Considering JSON supports fewer data types than
Postgres, we perform conversions where possible, passing all other types as
strings. Key conversions include:
- postgres int2, int4, float4, float8 -> json number (NaN and Inf remain
text)
- postgres bool, null, text -> json bool, null, string
- postgres array -> json array
- postgres json and jsonb -> json object
4. Alignment with node-postgres: To facilitate integration with js libraries,
we've matched the response structure of node-postgres, returning command tags
and column oids. Command tag capturing was added to the rust-postgres
functionality as part of this change.
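A hedged sketch of the scalar conversion rules from item 3 above, assuming `serde_json` (this is not the proxy's actual code; array handling and SQL NULLs are left out, and the type names are postgres' text names):
```
use serde_json::Value;

// Illustrative only: map a postgres value received in text format to a JSON
// value following the rules above; anything we can't map stays a string.
fn pg_text_to_json(type_name: &str, raw: &str) -> Value {
    match type_name {
        "int2" | "int4" | "float4" | "float8" => raw
            .parse::<f64>()
            .ok()
            .filter(|v| v.is_finite()) // NaN and Inf remain text
            .map(Value::from)
            .unwrap_or_else(|| Value::String(raw.to_owned())),
        "bool" => Value::Bool(raw == "t" || raw == "true"),
        "json" | "jsonb" => {
            serde_json::from_str(raw).unwrap_or_else(|_| Value::String(raw.to_owned()))
        }
        _ => Value::String(raw.to_owned()),
    }
}
```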
## Problem
Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).
Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129
## Summary of changes
Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
This reverts commit 732acc5.
Reverted PR: #3869
As noted in PR #4094, we do in fact try to insert duplicates to the
layer map, if L0->L1 compaction is interrupted. We do not have a proper
fix for that right now, and we are in a hurry to make a release to
production, so revert the changes related to this to the state that we
have in production currently. We know that we have a bug here, but
better to live with the bug that we've had in production for a long
time, than rush a fix to production without testing it in staging first.
Cc: #4094, #4088
Otherwise they get lost. Normally the buffer is empty before proxy pass, but this is
not the case with the pipeline mode of our npm driver; fixes the connection hangup
introduced by b80fe41af3 for it.
fixes https://github.com/neondatabase/neon/issues/3822
## Describe your changes
We have previously changed the neon-proxy to use RollingUpdate. This
should be enabled in legacy proxy too in order to avoid breaking
connections for the clients and allow for example backups to run even
during deployment. (https://github.com/neondatabase/neon/pull/3683)
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
## Describe your changes
Rebase vendored PostgreSQL onto 14.7 and 15.2
## Issue ticket number and link
#3579
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
```
The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL
14 and 15.2 for PostgreSQL 15.
```
Previously we applied the rate limiting only up to receiving the headers
from s3, or somewhere near it. The commit adds an adapter which carries
the permit until the AsyncRead has been disposed.
Fixes #3662.
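A minimal sketch of the adapter idea (names are illustrative, not the actual remote-storage code): keep the rate-limiter permit alongside the download stream so it is only released when the reader is dropped, not when the response headers arrive.
```
use std::pin::Pin;
use std::task::{Context, Poll};

use tokio::io::{AsyncRead, ReadBuf};
use tokio::sync::OwnedSemaphorePermit;

// Illustrative only: the permit lives as long as the wrapped reader, so the
// rate limit covers the whole body download, not just the header exchange.
struct PermitCarrying<R> {
    inner: R,
    _permit: OwnedSemaphorePermit, // released when the reader is dropped
}

impl<R: AsyncRead + Unpin> AsyncRead for PermitCarrying<R> {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        Pin::new(&mut self.get_mut().inner).poll_read(cx, buf)
    }
}
```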
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. Use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file which lead to deadlock.
Fixes: #3657
These are happening in tests because of #3655, but they sure took some
time to appear.
Makes the `Compaction failed, retrying in 2s: Cannot run compaction
iteration on inactive tenant` message into a globally allowed error, because it
has been seen failing on different test cases.
Small changes, but hopefully this will help with the panic detected in
staging, for which we cannot get the debugging information right now
(end-of-branch before branch-point).
Before only the timelines which have passed the `gc_horizon` were
processed which failed with orphans at the tree_sort phase. Example
input in added `test_branched_empty_timeline_size` test case.
The PR changes iteration to happen through all timelines, and in
addition to that, any learned branch points will be calculated as they
would have been in the original implementation if the ancestor branch had
been over the `gc_horizon`.
This also changes how tenants where all timelines are below `gc_horizon`
are handled. Previously tenant_size 0 was returned, but now they will
have approximately `initdb_lsn` worth of tenant_size.
The PR also adds several new tenant size tests that describe various corner
cases of branching structure and `gc_horizon` setting.
They are currently disabled to not consume time during CI.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Previously, we were trying to re-assign owned objects of the already
deleted role. This was causing a crash loop in the case when compute
was restarted with a spec that includes a delta operation for role
deletion. To avoid such cases, check that the role is still present before
calling `reassign_owned_objects`.
Resolves neondatabase/cloud#3553
This reverts commit 826e89b9ce.
The problem with that commit was that it deletes the TempDir while
there are still EphemeralFile instances open.
At first I thought this could be fixed by simply adding
Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None))
to TenantHarness::drop, but it turned out to be insufficient.
So, reverting the commit until we find a proper solution.
refs https://github.com/neondatabase/neon/issues/3385
Refactors Compute::prepare_and_run. It's split into subroutines
differently, to make it easier to attach tracing spans to the
different stages. The high-level logic for waiting for Postgres to
exit is moved to the caller.
Replace 'env_logger' with 'tracing', and add `#instrument` directives
to different stages of the startup process. This is a fairly
mechanical change, except for the changes in 'spec.rs'. 'spec.rs'
contained some complicated formatting, where parts of log messages
were printed directly to stdout with `print`s. That was a bit messed
up because the log normally goes to stderr, but those lines were
printed to stdout. In our docker images, stderr and stdout both go to
the same place so you wouldn't notice, but I don't think it was
intentional.
This changes the log format to the default
'tracing_subscriber::format' format. It's different from the Postgres
log format, however, and because both compute_tools and Postgres print
to the same log, it's now a mix of two different formats. I'm not
sure how the Grafana log parsing pipeline can handle that. If it's a
problem, we can build a custom formatter to change the compute_tools log
format to be the same as Postgres's, like it was before this commit,
or we can change the Postgres log format to match tracing_formatter's,
or we can start printing compute_tool's log output to a different
destination than Postgres
IMDSv2 has limits, and if we query it on every s3 interaction we are
going to go over those limits. Changes the s3_bucket client
configuration to use:
- ChainCredentialsProvider to handle env variables or imds usage
- LazyCachingCredentialsProvider to actually cache any credentials
Related: https://github.com/awslabs/aws-sdk-rust/issues/629
Possibly related: https://github.com/neondatabase/neon/issues/3118
plv8 can only be built with a fairly new gold linker version. We used to install
it via binutils packages from testing, but it also updates libc and that causes
trouble in the resulting image as different extensions were built against
different libc versions. We could either use libc from debian-testing everywhere
or refrain from using testing packages and install the necessary programs manually.
This patch uses the latter approach: gold for plv8 and cmake for h3 are
installed manually.
In passing, declare h3_postgis as a safe extension (a previous omission).
`GRANT CREATE ON SCHEMA public` fails if there is no schema `public`.
Disable it in release for now and make a better fix later (it is
needed for v15 support).
* Check for entire range during sasl validation (#2281)
* Gen2 GH runner (#2128)
* Re-add rustup override
* Try s3 bucket
* Set git version
* Use v4 cache key to prevent problems
* Switch to v5 for key
* Add second rustup fix
* Rebase
* Add kaniko steps
* Fix typo and set compress level
* Disable global run default
* Specify shell for step
* Change approach with kaniko
* Try less verbose shell spec
* Add submodule pull
* Add promote step
* Adjust dependency chain
* Try default swap again
* Use env
* Don't override aws key
* Make kaniko build conditional
* Specify runs on
* Try without dependency link
* Try soft fail
* Use image with git
* Try passing to next step
* Fix duplicate
* Try other approach
* Try other approach
* Fix typo
* Try other syntax
* Set env
* Adjust setup
* Try step 1
* Add link
* Try global env
* Fix mistake
* Debug
* Try other syntax
* Try other approach
* Change order
* Move output one step down
* Put output up one level
* Try other syntax
* Skip build
* Try output
* Re-enable build
* Try other syntax
* Skip middle step
* Update check
* Try first step of dockerhub push
* Update needs dependency
* Try explicit dir
* Add missing package
* Try other approach
* Try other approach
* Specify region
* Use with
* Try other approach
* Add debug
* Try other approach
* Set region
* Follow AWS example
* Try github approach
* Skip Qemu
* Try stdin
* Missing steps
* Add missing close
* Add echo debug
* Try v2 endpoint
* Use v1 endpoint
* Try without quotes
* Revert
* Try crane
* Add debug
* Split steps
* Fix duplicate
* Add shell step
* Conform to options
* Add verbose flag
* Try single step
* Try workaround
* First request fails hunch
* Try bullseye image
* Try other approach
* Adjust verbose level
* Try previous step
* Add more debug
* Remove debug step
* Remove rogue indent
* Try with larger image
* Add build tag step
* Update workflow for testing
* Add tag step for test
* Remove unused
* Update dependency chain
* Add ownership fix
* Use matrix for promote
* Force update
* Force build
* Remove unused
* Add new image
* Add missing argument
* Update dockerfile copy
* Update Dockerfile
* Update clone
* Update dockerfile
* Go to correct folder
* Use correct format
* Update dockerfile
* Remove cd
* Debug find where we are
* Add debug on first step
* Changedir to postgres
* Set workdir
* Use v1 approach
* Use other dependency
* Try other approach
* Try other approach
* Update dockerfile
* Update approach
* Update dockerfile
* Update approach
* Update dockerfile
* Update dockerfile
* Add workspace hack
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Change last step
* Cleanup pull in prep for review
* Force build images
* Add condition for latest tagging
* Use pinned version
* Try without name value
* Remove more names
* Shorten names
* Add kaniko comments
* Pin kaniko
* Pin crane and ecr helper
* Up one level
* Switch to pinned tag for rust image
* Force update for test
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box>
* Add missing step output, revert one deploy step (#2285)
* Add missing step output, revert one deploy step
* Conform to syntax
* Update approach
* Add missing value
* Add missing needs
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
* Error for fatal not git repo (#2286)
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
* Use main, not branch for ref check (#2288)
* Use main, not branch for ref check
* Add more debug
* Count main, not head
* Try new approach
* Conform to syntax
* Update approach
* Get full history
* Skip checkout
* Cleanup debug
* Remove more debug
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
* Fix docker zombie process issue (#2289)
* Fix docker zombie process issue
* Init everywhere
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
* Fix 1.63 clippy lints (#2282)
* split out timeline metrics, track layer map loading and size calculation
* reset rust cache for clippy run to avoid an ICE
additionally remove trailing whitespaces
* Rename pg_control_ffi.h to bindgen_deps.h, for clarity.
The pg_control_ffi.h name implies that it only includes stuff related to
pg_control.h. That's mostly true currently, but really the point of the
file is to include everything that we need to generate Rust definitions
from.
* Make local mypy behave like CI mypy (#2291)
* Fix flaky pageserver restarts in tests (#2261)
* Remove extra type aliases (#2280)
* Update cachepot endpoint (#2290)
* Update cachepot endpoint
* Update dockerfile & remove env
* Update image building process
* Cannot use metadata endpoint for this
* Update workflow
* Conform to kaniko syntax
* Update syntax
* Update approach
* Update dockerfiles
* Force update
* Update dockerfiles
* Update dockerfile
* Cleanup dockerfiles
* Update s3 test location
* Revert s3 experiment
* Add more debug
* Specify aws region
* Remove debug, add prefix
* Remove one more debug
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
* workflows/benchmarking: increase timeout (#2294)
* Rework `init` in pageserver CLI (#2272)
* Do not create initial tenant and timeline (adjust Python tests for that)
* Rework config handling during init, add --update-config to manage local config updates
* Fix: Always build images (#2296)
* Always build images
* Remove unused
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
* Move auto-generated 'bindings' to a separate inner module.
Re-export only things that are used by other modules.
In the future, I'm imagining that we run bindgen twice, for Postgres
v14 and v15. The two sets of bindings would go into separate
'bindings_v14' and 'bindings_v15' modules.
Rearrange postgres_ffi modules.
Move function, to avoid Postgres version dependency in timelines.rs
Move function to generate a logical-message WAL record to postgres_ffi.
* fix cargo test
* Fix walreceiver and safekeeper bugs (#2295)
- There was an issue with zero commit_lsn `reason: LaggingWal { current_commit_lsn: 0/0, new_commit_lsn: 1/6FD90D38, threshold: 10485760 } }`. The problem was in `send_wal.rs`, where we initialized `end_pos = Lsn(0)` and in some cases sent it to the pageserver.
- IDENTIFY_SYSTEM previously returned `flush_lsn` as a physical end of WAL. Now it returns `flush_lsn` (as it was) to walproposer and `commit_lsn` to everyone else including pageserver.
- There was an issue with backoff where connection was cancelled right after initialization: `connected!` -> `safekeeper_handle_db: Connection cancelled` -> `Backoff: waiting 3 seconds`. The problem was in sleeping before establishing the connection. This is fixed by reworking retry logic.
- There was an issue with getting `NoKeepAlives` reason in a loop. The issue is probably the same as the previous.
- There was an issue with filtering safekeepers based on retry attempts, which could filter some safekeepers indefinitely. This is fixed by using retry cooldown duration instead of retry attempts.
- Some `send_wal.rs` connections failed with errors without context. This is fixed by adding a timeline to safekeepers errors.
New retry logic works like this:
- Every candidate has a `next_retry_at` timestamp and is not considered for connection until that moment
- When walreceiver connection is closed, we update `next_retry_at` using exponential backoff, increasing the cooldown on every disconnect.
- When `last_record_lsn` was advanced using the WAL from the safekeeper, we reset the retry cooldown and exponential backoff, allowing walreceiver to reconnect to the same safekeeper instantly.
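A rough sketch of that retry policy (field names and constants are illustrative, not the actual walreceiver code):
```
use std::time::{Duration, Instant};

const MIN_RETRY: Duration = Duration::from_millis(100);
const MAX_RETRY: Duration = Duration::from_secs(10);

// Illustrative only: exponential backoff per safekeeper candidate, reset as
// soon as WAL from that safekeeper actually advanced last_record_lsn.
struct Candidate {
    next_retry_at: Option<Instant>,
    retry_duration: Duration,
}

impl Candidate {
    fn on_disconnect(&mut self) {
        // Grow the cooldown exponentially, capped at MAX_RETRY.
        self.retry_duration = (self.retry_duration * 2).clamp(MIN_RETRY, MAX_RETRY);
        self.next_retry_at = Some(Instant::now() + self.retry_duration);
    }

    fn on_wal_advanced(&mut self) {
        // Reset the backoff so we may reconnect to this safekeeper instantly.
        self.retry_duration = MIN_RETRY;
        self.next_retry_at = None;
    }

    fn eligible(&self, now: Instant) -> bool {
        self.next_retry_at.map_or(true, |t| now >= t)
    }
}
```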
* on safekeeper registration pass availability zone param (#2292)
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Anton Galitsyn <agalitsyn@users.noreply.github.com>
* github/workflows: Fix git dubious ownership (#2223)
* Move relation size cache from WalIngest to DatadirTimeline (#2094)
* Move relation size cache to layered timeline
* Fix obtaining current LSN for relation size cache
* Resolve merge conflicts
* Resolve merge conflicts
* Restore 'lsn' field in DatadirModification
* adjust DatadirModification lsn in ingest_record
* Fix formatting
* Pass lsn to get_relsize
* Fix merge conflict
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* refactor: replace lazy-static with once-cell (#2195)
- Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy`
- fixes #1147
Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
* Add more buckets to pageserver latency metrics (#2225)
* ignore record property warning to fix benchmarks
* increase statement timeout
* use event so it fires only if workload thread successfully finished
* remove debug log
* increase timeout to pass test with real s3
* avoid duplicate parameter, increase timeout
* Major migration script (#2073)
This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview.
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
* Fix etcd typos
* Fix links to safekeeper protocol docs. (#2188)
safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in
commit 0b14fdb078, as part of reorganizing the docs into 'mdbook' format.
Fixes issue #1475. Thanks to @banks for spotting the outdated references.
In addition to fixing the above issue, this patch also fixes other broken links as a result of 0b14fdb078. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
* Update CONTRIBUTING.md
* Update CONTRIBUTING.md
* support node id and remote storage params in docker_entrypoint.sh
* Safe truncate (#2218)
* Move relation size cache to layered timeline
* Fix obtaining current LSN for relation size cache
* Resolve merge conflicts
* Resolve merge conflicts
* Restore 'lsn' field in DatadirModification
* adjust DatadirModification lsn in ingest_record
* Fix formatting
* Pass lsn to get_relsize
* Fix merge conflict
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Check if relation exists before trying to truncate it
refer #1932
* Add test reproducing FSM truncate problem
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Fix exponential backoff values
* Update back `vendor/postgres` back; it was changed accidentally. (#2251)
Commit 4227cfc96e accidentally reverted vendor/postgres to an older
version. Update it back.
* Add pageserver checkpoint_timeout option.
To flush the in-memory layer eventually when no new data arrives, which helps
safekeepers suspend activity (stop pushing to the broker). The default of 10m should
be ok.
* Share exponential backoff code and fix logic for delete task failure (#2252)
* Fix bug when import large (>1GB) relations (#2172)
Resolves #2097
- use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get`
- update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`.
+ `small` is the old test
+ `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build
* Fix timeline physical size flaky tests (#2244)
Resolves #2212.
- use `wait_for_last_flush_lsn` in `test_timeline_physical_size_*` tests
## Context
Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk.
### Note
The `LayerFlushThread` threads are processed **concurrently**, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly
* postgres_ffi/waldecoder: validate more header fields
* postgres_ffi/waldecoder: remove unused startlsn
* postgres_ffi/waldecoder: introduce explicit `enum State`
Previously it was emulated with a combination of nullable fields.
This change should make the logic more readable.
* disable `test_import_from_pageserver_multisegment` (#2258)
This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure.
See: #2255, #2256
* get_binaries uses DOCKER_TAG taken from docker image build step (#2260)
* [proxy] Rework wire format of the password hack and some errors (#2236)
The new format has a few benefits: it's shorter, simpler and
human-readable as well. We don't use base64 anymore, since
url encoding got us covered.
We also show a better error in case we couldn't parse the
payload; the users should know it's all about passing the
correct project name.
* test_runner/pg_clients: collect docker logs (#2259)
* get_binaries script fix (#2263)
* get_binaries uses DOCKER_TAG taken from docker image build step
* remove docker tag discovery at all and fix get_binaries for version variable
* Better storage sync logs (#2268)
* Find end of WAL on safekeepers using WalStreamDecoder.
We could make it inside wal_storage.rs, but taking into account that
- wal_storage.rs reading is async
- we don't need s3 here
- error handling is different; error during decoding is normal
I decided to put it separately.
Test
cargo test test_find_end_of_wal_last_crossing_segment
prepared earlier by @yeputons passes now.
Fixes https://github.com/neondatabase/neon/issues/544 and https://github.com/neondatabase/cloud/issues/2004
Supersedes https://github.com/neondatabase/neon/pull/2066
* Improve walreceiver logic (#2253)
This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers.
- There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237
- Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down.
- Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second.
- `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast.
- `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck.
* increase timeout in wait_for_upload to avoid spurious failures when testing with real s3
* Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. (#2274)
* Set up a workflow to run pgbench against captest (#2077)
Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Co-authored-by: Ankur Srivastava <ansrivas@users.noreply.github.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Egor Suvorov <egor@neon.tech>
Co-authored-by: Andrey Taranik <andrey@cicd.team>
Co-authored-by: Dmitry Ivanov <ivadmi5@gmail.com>
[HOTFIX] Release deploy fix
This PR uses the branch neondatabase/postgres#171 and several required commits from main to use only locally built compute-tools. This should allow us to roll out the safekeepers sync issue fix on prod.