rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 13:32:57 +00:00

Author	SHA1	Message	Date
Vlad Lazar	c43e664ff5	storcon: provide an az id in metadata.json from neon local (#8897 ) ## Problem Neon local set-up does not inject an az id in `metadata.json`. See real change in https://github.com/neondatabase/neon/pull/8852. ## Summary of changes We piggyback on the existing `availability_zone` pageserver configuration in order to avoid making neon local even more complex.	2024-09-03 15:11:30 +01:00
Erik Grinaker	b37da32c6f	pageserver: reuse idempotency keys across metrics sinks (#8876 ) ## Problem Metrics event idempotency keys differ across S3 and Vector. The events should be identical. Resolves #8605. ## Summary of changes Pre-generate the idempotency keys and pass the same set into both metrics sinks. Co-authored-by: John Spray <john@neon.tech>	2024-09-03 09:05:24 +01:00
Christian Schwarz	3b317cae07	page_cache/layer load: correctly classify layer summary block reads (#8885 ) Before this PR, we would classify layer summary block reads as "Unknown" content kind. <img width="1267" alt="image" src="https://github.com/user-attachments/assets/508af034-5c2a-4c89-80db-2899967b337c">	2024-09-02 16:09:26 +01:00
Christian Schwarz	bf0531d107	fixup(#8839 ): `test_forward_compatibility` needs to allow lag warning as well (#8891 ) Found in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8885/10665614629/index.html#suites/0fbaeb107ef328d03993d44a1fb15690/ea10ba1c140fba1d	2024-09-02 15:10:10 +01:00
Christian Schwarz	15e90cc427	bottommost-compaction: remove dead code / rectify cfg!()s (#8884 ) part of https://github.com/neondatabase/neon/issues/8002	2024-09-02 14:45:17 +01:00
Arpad Müller	9746b6ea31	Implement archival_config timeline endpoint in the storage controller (#8680 ) Implement the timeline specific `archival_config` endpoint also in the storage controller. It's mostly a copy-paste of the detach handler: the task is the same: do the same operation on all shards. Part of #8088.	2024-09-02 13:51:45 +02:00
John Spray	516ac0591e	storage controller: eliminate ensure_attached (#8875 ) ## Problem This is a followup to #8783 - The old blocking ensure_attached function had been retained to handle the case where a shard had a None generation_pageserver, but this wasn't really necessary. - There was a subtle `.1` in the code where a struct would have been clearer Closes #8819 ## Summary of changes - Add ShardGenerationState to represent the results of peek_generation - Instead of calling ensure_attached when a tenant has a non-attached shard, check the shard's policy and return 409 if it isn't Attached, else return 503 if the shard's policy is attached but it hasn't been reconciled yet (i.e. has a None generation_pageserver)	2024-09-02 11:36:57 +00:00
Arpad Müller	3ec785f30d	Add safekeeper scrubber test (#8785 ) The test is very rudimentary, it only checks that before and after tenant deletion, we can run `scan_metadata` for the safekeeper node kind. Also, we don't actually expect any uploaded data, for that we don't have enough WAL (needs to create at least one S3-uploaded file, the scrubber doesn't recognize partial files yet). The `scan_metadata` scrubber subcommand is extended to support either specifying a database connection string, which was previously the only way, and required a database to be present, or specifying the timeline information manually via json. This is ideal for testing scenarios because in those, the number of timelines is usually limited, but it is involved to spin up a database just to write the timeline information.	2024-08-31 01:12:25 +02:00
Alex Chi Z.	05caaab850	fix(pageserver): fire layer eviction alert only when it's visible (#8882 ) The pull request https://github.com/neondatabase/neon/pull/8679 explicitly mentioned that it will evict layers earlier than before. Given that the eviction metric is solely based on the eviction threshold (which is 86400s now), we should account for the early eviction and not fire the alert if it's a covered layer. ## Summary of changes Record eviction timer only when the layer is visible + accessed. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-30 17:22:26 -04:00
Yuchen Liang	cacb1ae333	pageserver: set default io_buffer_alignment to 512 bytes (#8878 ) ## Summary of changes - Setting default io_buffer_alignment to 512 bytes. - Fix places that assumed `DEFAULT_IO_BUFFER_ALIGNMENT=0` - Adapt unit tests to handle merge with `chunk size <= 4096`. ## Testing and Performance We have done sufficient performance de-risking. Enabling it by default completes our correctness de-risking before the next release. Context: https://neondb.slack.com/archives/C07BZ38E6SD/p1725026845455259 Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-30 19:53:52 +01:00
Alex Chi Z.	df971f995c	feat(storage-scrubber): check layer map validity (#8867 ) When implementing bottom-most gc-compaction, we analyzed the structure of layer maps that the current compaction algorithm could produce, and decided to only support structures without delta layer overlaps and LSN intersections with the exception of single key layers. ## Summary of changes This patch adds the layer map valid check in the storage scrubber. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-30 14:12:39 -04:00
Alexander Bayandin	e58e045ebb	CI(promote-compatibility-data): fix job (#8871 ) ## Problem `promote-compatibility-data` job got broken and slightly outdated after - https://github.com/neondatabase/neon/pull/8552 -- we don't upload artifacts for ARM64 - https://github.com/neondatabase/neon/pull/8561 -- we don't prepare `debug` artifacts in the release branch anymore ## Summary of changes - Promote artifacts from release PRs to the latest version (but do it from `release` branch) - Upload artifacts for both X64 and ARM64	2024-08-30 13:18:30 +01:00
John Spray	20f82f9169	storage controller: sleep between compute notify retries (#8869 ) ## Problem Live migration retries when it fails to notify the compute of the new location. It should sleep between attempts. Closes: https://github.com/neondatabase/neon/issues/8820 ## Summary of changes - Do an `exponential_backoff` in the retry loop for compute notifications	2024-08-30 11:44:13 +01:00
Conrad Ludgate	72aa6b02da	chore: speed up testing (#8874 ) `safekeeper::random_test test_random_schedules` debug test takes over 2 minutes to run on our arm runners. Running it 6 times with pageserver settings seems redundant.	2024-08-30 11:34:23 +01:00
Conrad Ludgate	022fad65eb	proxy: fix password hash cancellation (#8868 ) In #8863 I replaced the threadpool with tokio tasks, but there was a behaviour I missed regarding cancellation. Adding the JoinHandle wrapper that triggers abort on drop should fix this. Another change, any panics that occur in password hashing will be propagated through the resume_unwind functionality.	2024-08-29 20:16:44 +01:00
Arpad Müller	8eaa8ad358	Remove async_trait usages from safekeeper and neon_local (#8864 ) Removes additional async_trait usages from safekeeper and neon_local. Also removes now redundant dependencies of the `async_trait` crate. cc earlier work: #6305, #6464, #7303, #7342, #7212, #8296	2024-08-29 18:24:25 +02:00
Alex Chi Z.	653a6532a2	fix(pageserver): reject non-i128 key on the write path (#8648 ) It's better to reject invalid keys on the write path than to store them and panic the pageserver. https://github.com/neondatabase/neon/issues/8636 ## Summary of changes If a key cannot be represented using i128, we don't allow writing that key into the pageserver. There are two versions of the check valid function: the normal one that simply rejects non-i128 keys, and the stronger one that rejects all keys that we don't support. The current behavior when a key gets rejected is that safekeeper will keep retrying streaming that key to the pageserver. And once such a key gets written, no new computes can be started. Therefore, there could be a large number of pageserver warnings if a key cannot be ingested. To validate this behavior by yourself, the reviewer can (1) use the stronger version of the valid check (2) run the following SQL. ``` set neon.regress_test_mode = true; CREATE TABLESPACE regress_tblspace LOCATION '/Users/skyzh/Work/neon-test/tablespace'; CREATE SCHEMA testschema; CREATE TABLE testschema.foo (i int) TABLESPACE regress_tblspace; insert into testschema.foo values (1), (2), (3); ``` For now, I'd like to merge the patch with only rejecting non-i128 keys. It's still unknown whether the stronger version covers all the cases that basebackup doesn't support. Furthermore, the behavior of rejecting a key will produce large amounts of warnings due to safekeeper retry. Therefore, I'd like to reject the minimum set of keys that we don't support (non-i128 ones) for now. (well, erroring out is better than panicking on `to_compact_key`) The next step is to fix the safekeeper behavior (i.e., on such key rejections, stop streaming WAL), so that we can properly stop writing. An alternative solution is to simply drop these keys on the write path. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-29 10:07:05 -04:00
Alex Chi Z.	18bfc43fa7	fix(pageserver): add dry-run to force compact API (#8859 ) Add `dry-run` flag to the compact API Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-29 10:01:54 -04:00
Conrad Ludgate	7ce49fe6e3	proxy: improve test performance (#8863 ) Some tests were very slow and some tests occasionally stalled. This PR improves some test performance and replaces the custom threadpool in order to fix the stalling of tests.	2024-08-29 13:20:15 +00:00
Christian Schwarz	a8fbc63be2	tenant background loops: periodic log message if long-running iteration (#8832 ) refs https://github.com/neondatabase/neon/issues/7524 Problem ------- When browsing Pageserver logs, background loop iterations that take a long time are hard to spot / easy to miss because they tend to not produce any log messages unless: - they overrun their period, but that's only one message when the iteration completes late - they do something that produces logs (e.g., create image layers) Further, a slow iteration that is still running will not log nor bump the metrics of `warn_when_period_overrun` until _after_ it has finished. Again, that makes a still-running iteration hard to spot. Solution -------- This PR adds a wrapper around the per-tenant background loops that, while a slow iteration is ongoing, emits a log message every $period.	2024-08-29 15:06:13 +02:00
Arpad Müller	96b5c4d33d	Don't unarchive a timeline if its ancestor is archived (#8853 ) If a timeline unarchival request comes in, give an error if the parent timeline is archived. This prevents us from the situation of having an archived timeline with children that are not archived. Follow up of #8824 Part of #8088 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-08-29 12:54:02 +00:00
Christian Schwarz	c7481402a0	pageserver: default to 4MiB stack size and add env var to control it (#8862 ) # Motivation In https://github.com/neondatabase/neon/pull/8832 I get tokio runtime worker stack overflow errors in debug builds. In a similar vein, I had tokio runtime worker stack overflow when trying to eliminate `async_trait` (https://github.com/neondatabase/neon/pull/8296). The 2MiB default is kind of arbitrary - so this PR bumps it to 4MiB. It also adds an env var to control it. # Risk Assessment With our 4 runtimes, the worst case stack memory usage is `4 (runtimes) * ($num_cpus (executor threads) + 512 (blocking pool threads)) * 4MiB`. On i3en.3xlarge, that's `8384 MiB`. On im4gn.2xlarge, that's `8320 MiB`. Before this change, it was half that. Looking at production metrics, we _do_ have the headroom to accommodate this worst case. # Alternatives The problems only occur with debug builds, so technically we could only raise the stack size for debug builds. However, it would be another configuration where `debug != release`. # Future Work If we ever enable single runtime mode in prod (=> https://github.com/neondatabase/neon/issues/7312 ) then the worst case will drop to 25% of its current value. Eliminating the use of `tokio::spawn_blocking` / `tokio::fs` in favor of `tokio-epoll-uring` (=> https://github.com/neondatabase/neon/issues/7370 ) would reduce the worst case to `4 (runtimes) * $num_cpus (executor threads) * 4 MiB`.	2024-08-29 14:02:27 +02:00
Conrad Ludgate	a644f01b6a	proxy+pageserver: shared leaky bucket impl (#8539 ) In proxy I switched to a leaky-bucket impl using the GCRA algorithm. I figured I could share the code with pageserver and remove the leaky_bucket crate dependency with some very basic tokio timers and queues for fairness. The underlying algorithm should be fairly clear how it works from the comments I have left in the code. --- In benchmarking pageserver, @problame found that the new implementation fixes a getpage throughput discontinuity in pageserver under the `pagebench get-page-latest-lsn` benchmark with the clickbench dataset (`test_perf_olap.py`). The discontinuity is that for any of `--num-clients={2,3,4}`, getpage throughput remains 10k. With `--num-clients=5` and greater, getpage throughput then jumps to the configured 20k rate limit. With the changes in this PR, the discontinuity is gone, and we scale throughput linearly to `--num-clients` until the configured rate limit. More context in https://github.com/neondatabase/cloud/issues/16886#issuecomment-2315257641. closes https://github.com/neondatabase/cloud/issues/16886 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-29 11:26:52 +00:00
Christian Schwarz	c2f8fdccd7	ingest: rate-limited warning if WAL commit timestamps lag for > wait_lsn_timeout (#8839 ) refs https://github.com/neondatabase/cloud/issues/13750 The logging in this commit will make it easier to detect lagging ingest. We're trusting compute timestamps --- ideally we'd use SK timestamps instead. But trusting the compute timestamp is ok for now.	2024-08-29 12:06:00 +01:00
Konstantin Knizhnik	cfa45ff5ee	Undo wallogging replorigin file on checkpoint (#8794 ) ## Problem See #8620 ## Summary of changes Remove wallogging of replorigin file because it is reconstructed by PS ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-08-29 07:45:33 +03:00
Andrew Rudenko	acc075071d	feat(compute_ctl): add periodic `lease lsn` requests for static computes (#7994 ) Part of #7497 ## Problem Static computes pinned at some fixed LSN could be created initially within the PITR interval but eventually go out of it. To make sure that static computes are not affected by GC, we need to start using the LSN lease API (introduced in #8084) in compute_ctl. ## Summary of changes compute_ctl - Spawn a thread when a static compute starts to periodically ping pageserver(s) to make LSN lease requests. - Add `test_readonly_node_gc` to test if static compute can read all pages without error. - (test will fail on main without the code change here) page_service - `wait_or_get_last_lsn` will now allow `request_lsn` less than `latest_gc_cutoff_lsn` to proceed if there is a lease on `request_lsn`. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2024-08-28 19:09:26 +00:00
Christian Schwarz	9627747d35	bypass `PageCache` for `InMemoryLayer` + avoid `Value::deser` on L0 flush (#8537 ) Part of [Epic: Bypass PageCache for user data blocks](https://github.com/neondatabase/neon/issues/7386). # Problem `InMemoryLayer` still uses the `PageCache` for all data stored in the `VirtualFile` that underlies the `EphemeralFile`. # Background Before this PR, `EphemeralFile` is a fancy and (code-bloated) buffered writer around a `VirtualFile` that supports `blob_io`. The `InMemoryLayerInner::index` stores offsets into the `EphemeralFile`. At those offsets, we find a varint length followed by the serialized `Value`. Vectored reads (`get_values_reconstruct_data`) are not in fact vectored - each `Value` that needs to be read is read sequentially. The `will_init` bit of information which we use to early-exit the `get_values_reconstruct_data` for a given key is stored in the serialized `Value`, meaning we have to read & deserialize the `Value` from the `EphemeralFile`. The L0 flushing also needs to re-determine the `will_init` bit of information, by deserializing each value during L0 flush. # Changes 1. Store the value length and `will_init` information in the `InMemoryLayer::index`. The `EphemeralFile` thus only needs to store the values. 2. For `get_values_reconstruct_data`: - Use the in-memory `index` to figure out which values need to be read. Having the `will_init` stored in the index enables us to do that. - View the EphemeralFile as a byte array of "DIO chunks", each 512 bytes in size (adjustable constant). A "DIO chunk" is the minimal unit that we can read under direct IO. - Figure out which chunks need to be read to retrieve the serialized bytes for the values we need to read. - Coalesce chunk reads such that each DIO chunk is only read once to serve all value reads that need data from that chunk. - Merge adjacent chunk reads into larger `EphemeralFile::read_exact_at_eof_ok` of up to 128k (adjustable constant). 3. 
The new `EphemeralFile::read_exact_at_eof_ok` fills the IO buffer from the underlying VirtualFile and/or its in-memory buffer. 4. The L0 flush code is changed to use the `index` directly, `blob_io` 5. We can remove the `ephemeral_file::page_caching` construct now. The `get_values_reconstruct_data` changes seem like a bit of overkill but they are necessary so we issue the equivalent amount of read system calls compared to before this PR where it was highly likely that even if the first PageCache access was a miss, remaining reads within the same `get_values_reconstruct_data` call from the same `EphemeralFile` page were a hit. The "DIO chunk" stuff is truly unnecessary for page cache bypass, but, since we're working on [direct IO](https://github.com/neondatabase/neon/issues/8130) and https://github.com/neondatabase/neon/issues/8719 specifically, we need to do _something_ like this anyways in the near future. # Alternative Design The original plan was to use the `vectored_blob_io` code, but it relies on the invariant of Delta&Image layers that `index order == values order`. Further, `vectored_blob_io` code's strategy for merging IOs is limited to adjacent reads. However, with direct IO, there is another level of merging that should be done, specifically, if multiple reads map to the same "DIO chunk" (=alignment-requirement-sized and -aligned region of the file), then it's "free" to read the chunk into an IO buffer and serve the two reads from that buffer. => https://github.com/neondatabase/neon/issues/8719 # Testing / Performance Correctness of the IO merging code is ensured by unit tests. Additionally, minimal tests are added for the `EphemeralFile` implementation and the bit-packed `InMemoryLayerIndexValue`. Performance testing results are presented below. All perf testing done on my M2 MacBook Pro, running a Linux VM. It's a release build without `--features testing`. 
We see definitive improvement in ingest performance microbenchmark and an ad-hoc microbenchmark for getpage against InMemoryLayer. ``` baseline: commit `7c74112b2a` origin/main HEAD: `ef1c55c52e` ``` <details> ``` cargo bench --bench bench_ingest -- 'ingest 128MB/100b seq, no delta' baseline ingest-small-values/ingest 128MB/100b seq, no delta time: [483.50 ms 498.73 ms 522.53 ms] thrpt: [244.96 MiB/s 256.65 MiB/s 264.73 MiB/s] HEAD ingest-small-values/ingest 128MB/100b seq, no delta time: [479.22 ms 482.92 ms 487.35 ms] thrpt: [262.64 MiB/s 265.06 MiB/s 267.10 MiB/s] ``` </details> We don't have a micro-benchmark for InMemoryLayer and it's quite cumbersome to add one. So, I did manual testing in `neon_local`. <details> ``` ./target/release/neon_local stop rm -rf .neon ./target/release/neon_local init ./target/release/neon_local start ./target/release/neon_local tenant create --set-default ./target/release/neon_local endpoint create foo ./target/release/neon_local endpoint start foo psql 'postgresql://cloud_admin@127.0.0.1:55432/postgres' psql (13.16 (Debian 13.16-0+deb11u1), server 15.7) CREATE TABLE wal_test ( id SERIAL PRIMARY KEY, data TEXT ); DO $$ DECLARE i INTEGER := 1; BEGIN WHILE i <= 500000 LOOP INSERT INTO wal_test (data) VALUES ('data'); i := i + 1; END LOOP; END $$; -- => result is one L0 from initdb and one 137M-sized ephemeral-2 DO $$ DECLARE i INTEGER := 1; random_id INTEGER; random_record wal_test%ROWTYPE; start_time TIMESTAMP := clock_timestamp(); selects_completed INTEGER := 0; min_id INTEGER := 1; -- Minimum ID value max_id INTEGER := 100000; -- Maximum ID value, based on your insert range iters INTEGER := 100000000; -- Number of iterations to run BEGIN WHILE i <= iters LOOP -- Generate a random ID within the known range random_id := min_id + floor(random() * (max_id - min_id + 1))::int; -- Select the row with the generated random ID SELECT * INTO random_record FROM wal_test WHERE id = random_id; -- Increment the select counter selects_completed 
:= selects_completed + 1; -- Check if a second has passed IF EXTRACT(EPOCH FROM clock_timestamp() - start_time) >= 1 THEN -- Print the number of selects completed in the last second RAISE NOTICE 'Selects completed in last second: %', selects_completed; -- Reset counters for the next second selects_completed := 0; start_time := clock_timestamp(); END IF; -- Increment the loop counter i := i + 1; END LOOP; END $$; ./target/release/neon_local stop baseline: commit `7c74112b2a` origin/main NOTICE: Selects completed in last second: 1864 NOTICE: Selects completed in last second: 1850 NOTICE: Selects completed in last second: 1851 NOTICE: Selects completed in last second: 1918 NOTICE: Selects completed in last second: 1911 NOTICE: Selects completed in last second: 1879 NOTICE: Selects completed in last second: 1858 NOTICE: Selects completed in last second: 1827 NOTICE: Selects completed in last second: 1933 ours NOTICE: Selects completed in last second: 1915 NOTICE: Selects completed in last second: 1928 NOTICE: Selects completed in last second: 1913 NOTICE: Selects completed in last second: 1932 NOTICE: Selects completed in last second: 1846 NOTICE: Selects completed in last second: 1955 NOTICE: Selects completed in last second: 1991 NOTICE: Selects completed in last second: 1973 ``` NB: the ephemeral file sizes differ by ca 1MiB, ours being 1MiB smaller. </details> # Rollout This PR changes the code in-place and is not gated by a feature flag.	2024-08-28 18:31:41 +00:00
Alex Chi Z.	63a0d0d039	fix(storage-scrubber): make retry error into warnings (#8851 ) We get many HTTP connect timeout errors from scrubber logs, and it turned out that the scrubber is retrying, and this is not an actual error. In the future, we should revisit all places where we log errors in the storage scrubber, and only error when necessary (i.e., errors that might need manual fixing) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-28 13:39:21 -04:00
Vlad Lazar	793b5061ec	storcon: track pageserver availability zone (#8852 ) ## Problem In order to build AZ aware scheduling, the storage controller needs to know what AZ pageservers are in. Related https://github.com/neondatabase/neon/issues/8848 ## Summary of changes This patch set adds a new nullable column to the `nodes` table: `availability_zone_id`. The node registration request is extended to include the AZ id (pageservers already have this in their `metadata.json` file). If the node is already registered, then we update the persistent and in-memory state with the provided AZ. Otherwise, we add the node with the AZ to begin with. A couple assumptions are made here: 1. Pageserver AZ ids are stable 2. AZ ids do not change over time Once all pageservers have a configured AZ, we can remove the optionals in the code and make the database column not nullable.	2024-08-28 18:23:55 +01:00
Yuchen Liang	a889a49e06	pageserver: do vectored read on each dio-aligned section once (#8763 ) Part of #8130, closes #8719. ## Problem Currently, vectored blob io only coalesces blocks if they are immediately adjacent to each other. When we switch to Direct IO, we need a way to coalesce blobs that are within the dio-aligned boundary but have gaps between them. ## Summary of changes - Introduces a `VectoredReadCoalesceMode` for `VectoredReadPlanner` and `StreamingVectoredReadPlanner` which has two modes: - `AdjacentOnly` (current implementation) - `Chunked(<alignment requirement>)` - New `ChunkedVectorBuilder` that considers batching `dio-align`-sized reads; the start and end of the vectored read will respect `stx_dio_offset_align` / `stx_dio_mem_align` (`vectored_read.start` and `vectored_read.blobs_at.first().start_offset` will be two different values). - Since we break the assumption that blobs within a single `VectoredRead` are next to each other (implicit end offset), we start to store blob end offsets in the `VectoredRead`. - Adapted existing tests to run in both `VectoredReadCoalesceMode`. - The io alignment can also be live configured at runtime. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-28 15:54:42 +01:00
Vlad Lazar	5eb7322d08	docs: rolling storage controller restarts RFC (#8310 ) ## Problem Storage controller upgrades (restarts, more generally) can cause multi-second availability gaps. While the storage controller does not sit on the main data path, it's generally not acceptable to block management requests for extended periods of time (e.g. https://github.com/neondatabase/neon/issues/8034). ## Summary of changes This RFC describes the issues around the current storage controller restart procedure and describes an implementation which reduces downtime to a few milliseconds on the happy path. Related https://github.com/neondatabase/neon/issues/7797	2024-08-28 13:56:14 +00:00
Joonas Koivunen	c0ba18a112	bench: flush before shutting down (#8844 ) while driving by: - remove the extra tenant - remove the extra timelines implement this by turning the pg_compare to a yielding fixture. evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10571779162/index.html#suites/9681106e61a1222669b9d22ab136d07b/3bbe9f007b3ffae1/	2024-08-28 10:20:43 +01:00
John Spray	992a951b5e	.github: direct feature requests to the feedback form (#8849 ) ## Problem When folks open github issues for feature requests, they don't have a clear recipient: engineers usually see them during bug triage, but that doesn't necessarily get the work prioritized. ## Summary of changes Give end users a clearer path to submitting feedback to Neon	2024-08-28 09:22:19 +01:00
Heikki Linnakangas	c5ef779801	tests: Remove unnecessary entries from list of allowed errors (#8199 ) The "manual_gc" context was removed in commit `be0c73f8e7`. The code that generated the other error was removed in commit `9a6c0be823`.	2024-08-27 17:47:05 +01:00
Heikki Linnakangas	2d10306f7a	Remove support for pageserver <-> compute protocol version 1 (#8774 ) Protocol version 2 has been the default for a while now, and we no longer have any computes running in production that used protocol version 1. This completes the migration by removing support for v1 in both the pageserver and the compute. See issue #6211.	2024-08-27 18:36:33 +03:00
Alexey Kondratov	9b9f90c562	fix(walproposer): Do not restart on safekeepers reordering (#8840 ) ## Problem Currently, we compare `neon.safekeepers` values as is, so we unnecessarily restart walproposer even if safekeepers set didn't change. This leads to errors like: ```log FATAL: [WP] restarting walproposer to change safekeeper list from safekeeper-8.us-east-2.aws.neon.tech:6401,safekeeper-11.us-east-2.aws.neon.tech:6401,safekeeper-10.us-east-2.aws.neon.tech:6401 to safekeeper-11.us-east-2.aws.neon.tech:6401,safekeeper-8.us-east-2.aws.neon.tech:6401,safekeeper-10.us-east-2.aws.neon.tech:6401 ``` ## Summary of changes Split the GUC into the list of individual safekeepers and properly compare. We could've done that somewhere on the upper level, e.g., control plane, but I think it's still better when the actual config consumer is smarter and doesn't rely on upper levels.	2024-08-27 15:49:47 +02:00
Folke Behrens	52cb33770b	proxy: Rename backend types and variants as prep for refactor (#8845 ) * AuthBackend enum to AuthBackendType * BackendType enum to Backend * Link variants to Web * Adjust messages, comments, etc.	2024-08-27 14:12:42 +02:00
Conrad Ludgate	12850dd5e9	proxy: remove dead code (#8847 ) By marking everything possible as pub(crate), we find a few dead code candidates.	2024-08-27 12:00:35 +01:00
a-masterov	5d527133a3	Fix the pg_hintplan flakiness (#8834 ) ## Problem The pg_hintplan test seems to be flaky: it usually passes but sometimes fails ## Summary of changes The regression test is changed to filter out the Neon service queries. The expected file is changed as well. ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-08-27 12:39:42 +02:00
Arseny Sher	09362b6363	safekeeper: reorder routes and their handlers. Routes and their handlers were in a bit different order in 1) routes list 2) their implementation 3) python client 4) openapi spec, making addition of new ones intimidating. Make it the same everywhere, roughly lexicographically but preserving some of existing logic. No functional changes.	2024-08-27 07:37:55 +03:00
Alexey Kondratov	7820c572e7	fix(sql-exporter): Remove tenant_id from compute_logical_snapshot_files It turned out that it's already auto-added to all metrics [1] [1]: `3a907c317c/apps/base/ext-vmagent/vmagent.yaml (L43)`	2024-08-27 00:51:23 +02:00
Alexey Kondratov	bf03713fa1	fix(sql-exporter): Fix typo in gauge In `f4b3c317f` there was a typo and I missed that on review	2024-08-27 00:51:23 +02:00
Alex Chi Z.	0f65684263	feat(pageserver): use split layer writer in gc-compaction (#8608 ) Part of #8002, the final big PR in the batch. ## Summary of changes This pull request uses the new split layer writer in the gc-compaction. * It changes how layers are split. Previously, we split layers based on the original split point, but this creates too many layers (test_gc_feedback has one key per layer). * Therefore, we first verify if the layer map can be processed by the current algorithm (See https://github.com/neondatabase/neon/pull/8191, it's basically the same check) * On that, we proceed with the compaction. This way, it creates a large enough layer close to the target layer size. * Added a new set of functions `with_discard` in the split layer writer. This helps us skip layers if we are going to produce the same persistent key. * The delta writer will keep the updates of the same key in a single file. This might create a super large layer, but we can optimize it later. * The split layer writer is used in the gc-compaction algorithm, and it will split layers based on size. * Fix the image layer summary block encoded the wrong key range. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-26 14:19:47 -04:00
Christian Schwarz	97241776aa	pageserver: startup: ensure local disk state is durable (#8835 ) refs https://github.com/neondatabase/neon/issues/6989 Problem ------- After unclean shutdown, we get restarted, start reading the local filesystem, and make decisions based on those reads. However, some of the data might not yet have been fsynced when the unclean shutdown completed. Durability matters even though Pageservers are conceptually just a cache of state in S3. For example: - the cloud control plane is no control loop => pageserver responses to tenant attachment, etc, need to be durable. - the storage controller does not rely on this (as much?) - we don't have layer file checksumming, so, downloaded+renamed but not fsynced layer files are technically not to be trusted - https://github.com/neondatabase/neon/issues/2683 Solution -------- `syncfs` the tenants directory during startup, before we start reading from it. This is a bit overkill because we do remove some temp files (InMemoryLayer!) later during startup. Further, these temp files are particularly likely to be dirty in the kernel page cache. However, we don't want to refactor that cleanup code right now, and the dirty data on pageservers is generally not that high. Last, with [direct IO](https://github.com/neondatabase/neon/issues/8130) we're going to have near-zero kernel page cache anyway quite soon.	2024-08-26 18:07:55 +02:00
Arpad Müller	2dd53e7ae0	Timeline archival test (#8824 ) This PR: * Implements the rule that archived timelines require all of their children to be archived as well, as specified in the RFC. There is no fancy locking mechanism though, so the precondition can still be broken. As a TODO for later, we still allow unarchiving timelines with archived parents. * Adds an `is_archived` flag to `TimelineInfo` * Adds timeline_archival_config to `PageserverHttpClient` * Adds a new `test_timeline_archive` test, loosely based on `test_timeline_delete` Part of #8088	2024-08-26 17:30:19 +02:00
Folke Behrens	d6eede515a	proxy: clippy lints: handle some low hanging fruit (#8829 ) Should be mostly uncontroversial ones.	2024-08-26 15:16:54 +02:00
Alexey Kondratov	d48229f50f	feat(compute): Introduce new compute_subscriptions_count metric (#8796 ) ## Problem We need some metric to sneak peek into how many people use inbound logical replication (Neon is a subscriber). ## Summary of changes This commit adds a new metric `compute_subscriptions_count`, which is number of subscriptions grouped by enabled/disabled state. Resolves: neondatabase/cloud#16146	2024-08-26 14:34:18 +02:00
Jakub Kołodziejczak	cdfdcd3e5d	chore: improve markdown formatting (#8825 ) fixes: ![Screenshot_2024-08-25_16-25-30](https://github.com/user-attachments/assets/c993309b-6c2d-4938-9fd0-ce0953fc63ff) fixes: ![Screenshot_2024-08-25_16-26-29](https://github.com/user-attachments/assets/cf497f4a-d9e3-45a6-a1a5-7e215d96d022)	2024-08-25 16:33:45 +01:00
Conrad Ludgate	06795c6b9a	proxy: new local-proxy application (#8736 ) Add binary for local-proxy that uses the local auth backend. Runs only the http serverless driver support and offers config reload based on a config file and SIGHUP	2024-08-23 22:32:10 +01:00
Conrad Ludgate	701cb61b57	proxy: local auth backend (#8806 ) Adds a Local authentication backend. Updates http to extract JWT bearer tokens and passes them to the local backend to validate.	2024-08-23 18:48:06 +00:00
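
The worst-case stack estimate quoted in commit c7481402a0 (#8862) can be checked with a few lines of arithmetic. The sketch below is not pageserver code: the function and parameter names are hypothetical, and the vCPU counts (12 for i3en.3xlarge, 8 for im4gn.2xlarge) are assumptions chosen because they reproduce the totals given in the commit message.

```rust
/// Worst-case stack memory in MiB, following the formula quoted in #8862:
/// runtimes * (executor threads + blocking pool threads) * stack size.
/// All names here are illustrative and not taken from the pageserver source.
pub fn worst_case_stack_mib(
    runtimes: u64,
    num_cpus: u64,         // one executor thread per CPU (assumption)
    blocking_threads: u64, // 512 in the commit message
    stack_mib: u64,        // 4 after the change, 2 before
) -> u64 {
    runtimes * (num_cpus + blocking_threads) * stack_mib
}

/// Same formula for the "Future Work" case where the blocking pool goes away.
pub fn worst_case_without_blocking_pool_mib(runtimes: u64, num_cpus: u64, stack_mib: u64) -> u64 {
    worst_case_stack_mib(runtimes, num_cpus, 0, stack_mib)
}
```

With 12 CPUs, `worst_case_stack_mib(4, 12, 512, 4)` gives the 8384 MiB figure, the 2 MiB stack gives half of that, and a single runtime gives a quarter, which matches the 25% claim in the commit message.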

5988 Commits
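
The "DIO chunk" read planning described in commit 9627747d35 (#8537) might be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual `InMemoryLayer` code: `ByteRange` and `plan_chunked_reads` are hypothetical names, values are given as `(offset, len)` pairs, and the chunk size and maximum read size are plain parameters (the commit message mentions 512 bytes and 128k as adjustable constants).

```rust
use std::collections::BTreeSet;

/// Half-open byte range [start, end) within the file (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ByteRange {
    pub start: u64,
    pub end: u64,
}

/// Plan chunk-aligned reads so that each chunk is read at most once and
/// adjacent chunks are merged into reads of at most `max_read` bytes.
pub fn plan_chunked_reads(values: &[(u64, u64)], chunk_size: u64, max_read: u64) -> Vec<ByteRange> {
    assert!(chunk_size > 0 && max_read >= chunk_size);

    // Step 1: collect the index of every chunk touched by any value.
    // The set deduplicates chunks shared by several values and keeps them sorted.
    let mut chunks = BTreeSet::new();
    for &(offset, len) in values {
        if len == 0 {
            continue;
        }
        let first = offset / chunk_size;
        let last = (offset + len - 1) / chunk_size;
        chunks.extend(first..=last);
    }

    // Step 2: merge runs of adjacent chunks, starting a new read whenever
    // there is a gap or the merged read would exceed `max_read`.
    let mut reads: Vec<ByteRange> = Vec::new();
    for chunk in chunks {
        let start = chunk * chunk_size;
        let end = start + chunk_size;
        match reads.last_mut() {
            Some(prev) if prev.end == start && end - prev.start <= max_read => prev.end = end,
            _ => reads.push(ByteRange { start, end }),
        }
    }
    reads
}
```

For example, values at `(10, 20)`, `(100, 50)` and `(500, 100)` all fall into chunks 0 and 1 with a 512-byte chunk size, so they would be served from one merged read of bytes 0 to 1024 rather than three separate reads.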