rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 05:00:37 +00:00

Author	SHA1	Message	Date
Arpad Müller	61d2474632	Also check by the planned gc cutoff for lease creation (#10764 ) We don't want to allow new leases below the planned gc cutoff either. Other APIs like branch creation or getpage requests already enforce this.	2025-02-12 19:29:17 +00:00
Erik Grinaker	f62047ae97	pageserver: add separate semaphore for L0 compaction (#10780 ) ## Problem L0 compaction frequently gets starved out by other background tasks and image/GC compaction. L0 compaction must be responsive to keep read amplification under control. Touches #10694. Resolves #10689. ## Summary of changes Use a separate semaphore for the L0-only compaction pass. * Add a `CONCURRENT_L0_COMPACTION_TASKS` semaphore and `BackgroundLoopKind::L0Compaction`. * Add a setting `compaction_l0_semaphore` (default off via `compaction_l0_first`). * Use the L0 semaphore when doing an `OnlyL0Compaction` pass. * Use the background semaphore when doing a regular compaction pass (which includes an initial L0 pass). * While waiting for the background semaphore, yield for L0 compaction if triggered. * Add `CompactFlags::NoYield` to disable L0 yielding, and set it for the HTTP API route. * Remove the old `use_compaction_semaphore` setting and compaction-scoped semaphore. * Remove the warning when waiting for a semaphore; it's noisy and we have metrics.	2025-02-12 16:12:21 +00:00
Erik Grinaker	71c30e52fa	pageserver: properly yield for L0 compaction (#10769 ) ## Problem When image compaction yields for L0 compaction, it may not immediately schedule L0 compaction, because it just goes on to compact the next pending timeline. Touches #10694. Requires #10744. ## Summary of changes Extend `CompactionOutcome` with `YieldForL0` and `Skipped` variants, and immediately schedule an L0 compaction pass in the `YieldForL0` case.	2025-02-11 23:43:58 +00:00
Erik Grinaker	6c83ac3fd2	pageserver: do all L0 compaction before image compaction (#10744 ) ## Problem Image compaction can starve out L0 compaction if a tenant has several timelines with L0 debt. Touches #10694. Requires #10740. ## Summary of changes * Add an initial L0 compaction pass, in order of L0 count. * Add a tenant option `compaction_l0_first` to control the L0 pass (disabled by default). * Add `CompactFlags::OnlyL0Compaction` to run an L0-only compaction pass. * Clean up the compaction iteration logic. A later PR will use separate semaphores for the L0 and image compaction passes to avoid cross-tenant L0 starvation. That PR will also make image compaction yield if _any_ of the tenant's timelines have pending L0 compaction to further avoid starvation.	2025-02-11 22:08:46 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
Arpad Müller	f7b2293317	Hardlink resident layers during detach ancestor (#10729 ) After a detach ancestor operation, we don't want to on-demand download layers that are already resident. This has shown to impede performance, sometimes quite a lot (50 seconds: https://github.com/neondatabase/neon/issues/8828#issuecomment-2643735644) Fixes #8828.	2025-02-11 16:58:34 +00:00
Arpad Müller	be447ba4f8	Change timeline_offloading setting default to true (#10760 ) This changes the default value of the `timeline_offloading` pageserver and tenant configs to true, now that offloading has been rolled out without problems. There is also a small fix in the tenant config merge function, where we applied the `lazy_slru_download` value instead of `timeline_offloading`. Related issue: https://github.com/neondatabase/cloud/issues/21353	2025-02-11 16:36:54 +00:00
Christian Schwarz	9247331c67	fix(page_service / batching): smgr op latency metric of dropped responses include flush time (#10756 ) # Problem Say we have a batch of 10 responses to send out. Then, even with - #10728 we've still only called observe_execution_end_flush_start for the first 3 responses. The remaining 7 response timers are still ticking. When compute now closes the connection, the waiting flush fails with an error and we `drop()` the remaining 7 responses' smgr op timers. The `impl Drop for SmgrOpTimer` will observe an execution time that includes the flush time. In practice, this is suspected to produce the `+Inf` observations in the smgr op latency histogram we've seen since the introduction of pipelining, even after shipping #10728. refs: - fixup of https://github.com/neondatabase/neon/pull/10042 - fixup of https://github.com/neondatabase/neon/pull/10728 - fixes https://github.com/neondatabase/neon/issues/10754	2025-02-11 14:05:59 +00:00
Alex Chi Z.	b0c7ee0175	feat(pageserver): better gc_compaction_split heuristics (#10727 ) ## Problem close https://github.com/neondatabase/neon/issues/10213 `range_search` only returns the top-most layers that may satisfy the search, so it doesn't include all layers that might be accessed (the user needs to recursively call this function). We need to retrieve the full layer map and find overlaps in order to have a correct heuristics of the job split. ## Summary of changes Retrieve all layers and find overlaps instead of doing `range_search`. The patch also reduces the time holding the layer map read guard. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-10 19:33:34 +00:00
Erik Grinaker	8c4e94107d	pageserver: notify compaction loop at threshold (#10740 ) ## Problem The compaction loop currently runs periodically, which can cause it to wait for up to 20 seconds before starting L0 compaction by default. Also, when we later separate the semaphores for L0 compaction and image compaction, we want to give up waiting for the image compaction semaphore if L0 compaction is needed on any timeline. Touches #10694. ## Summary of changes Notify the compaction loop when an L0 flush (on any timeline) exceeds `compaction_threshold`. Also do some opportunistic cleanups in the area.	2025-02-10 17:48:09 +00:00
Alex Chi Z.	b37f52fdf1	feat(pageserver): dump read path on missing key error (#10528 ) ## Problem helps investigate https://github.com/neondatabase/neon/issues/10482 ## Summary of changes In debug mode and testing mode, we will record all files visited by a read operation, and print it out when it errors. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-10 14:25:56 +00:00
Alex Chi Z.	443c8d0b4b	feat(pageserver): repartition on L0-L1 boundary (#10548 ) ## Problem Reduce the read amplification when doing `repartition`. ## Summary of changes Compute the L0-L1 boundary LSN and do repartition here. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-10 14:25:48 +00:00
Erik Grinaker	ac55e2dbe5	pageserver: improve tenant housekeeping task (#10725 ) # Problem walredo shutdown is done in the compaction task. Let's move it to tenant housekeeping. # Summary of changes * Rename "ingest housekeeping" to "tenant housekeeping". * Move walredo shutdown into tenant housekeeping. * Add a constant `WALREDO_IDLE_TIMEOUT` set to 3 minutes (previously 10x compaction threshold).	2025-02-08 12:42:55 +00:00
Erik Grinaker	874accd6ed	pageserver: misc task cleanups (#10723 ) This patch does a bunch of superficial cleanups of `tenant::tasks` to avoid noise in subsequent PRs. There are no functional changes. PS: enable "hide whitespace" when reviewing, due to the unindentation of large async blocks.	2025-02-08 11:02:13 +00:00
Christian Schwarz	6cd3b501ec	fix(page_service / batching): smgr op latency metrics includes the flush time of preceding requests (#10728 ) Before this PR, if a batch contains N responses, the smgr op latency reported for response (N-i) would include the time we spent flushing the preceding requests. refs: - fixup of https://github.com/neondatabase/neon/pull/10042 - fixes https://github.com/neondatabase/neon/issues/10674	2025-02-08 09:28:09 +00:00
Christian Schwarz	bf20d78292	fix(page_service): page reconstruct error log does not include `shard_id` label (#10680 ) # Problem Before this PR, the `shard_id` field was missing when page_service logs a reconstruct error. This was caused by batching-related refactorings. Example from staging: ``` 2025-01-30T07:10:04.346022Z ERROR page_service_conn_main{peer_addr=...}:process_query{tenant_id=... timeline_id=...}:handle_pagerequests:request:handle_get_page_at_lsn_request_batched{req_lsn=FFFFFFFF/FFFFFFFF}: error reading relation or page version: Read error: whole vectored get request failed because one or more of the requested keys were missing: could not find data for key ... ``` # Changes Delay creation of the handler-specific span until after shard routing This also avoids the need for the record() call in the pagestream hot path. # Testing Manual testing with a failpoint that is part of this PR's history but will be squashed away. # Refs - fixes https://github.com/neondatabase/neon/issues/10599	2025-02-07 19:45:39 +00:00
John Spray	9609f7547e	tests: address warnings in timeline shutdown (#10702 ) ## Problem There are a couple of log warnings tripping up `test_timeline_archival_chaos` - `[stopping left-over name="timeline_delete" tenant_shard_id=2d526292b67dac0e6425266d7079c253 timeline_id=Some(44ba36bfdee5023672c93778985facd9) kind=TimelineDeletionWorker\n')](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10672/13161357302/index.html#/testresult/716b997bb1d8a021)` - `ignoring attempt to restart exited flush_loop 503d8f401d8887cfaae873040a6cc193/d5eed0673ba37d8992f7ec411363a7e3\n')` Related: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - Downgrade the 'ignoring attempt to restart' to info -- there's nothing in the design that forbids this happening, i.e. someone calling maybe_spawn_flush_loop concurrently with shutdown() - Prevent timeline deletion tasks outliving tenants by carrying a gateguard. This logically makes sense because the deletion process does call into Tenant to update manifests.	2025-02-07 15:29:34 +00:00
Erik Grinaker	d6e87a3a9c	pageserver: add separate, disabled compaction semaphore (#10716 ) ## Problem L0 compaction can get starved by other background tasks. It needs to be responsive to avoid read amp blowing up during heavy write workloads. Touches #10694. ## Summary of changes Add a separate semaphore for compaction, configurable via `use_compaction_semaphore` (disabled by default). This is primarily for testing in staging; it needs further work (in particular to split image/L0 compaction jobs) before it can be enabled.	2025-02-07 15:11:31 +00:00
John Spray	08f92bb916	pageserver: clean up DeletionQueue push_layers_sync (#10701 ) ## Problem This is tech debt. While we introduced generations for tenants, some legacy situations without generations needed to delete things inline (async operation) instead of enqueuing them (sync operation). ## Summary of changes - Remove the async code, replace calls with the sync variant, and assert that the generation is always set	2025-02-07 13:03:01 +00:00
Erik Grinaker	2943590694	pageserver: use histogram for background job semaphore waits (#10697 ) ## Problem We don't have visibility into how long an individual background job is waiting for a semaphore permit. ## Summary of changes * Make `pageserver_background_loop_semaphore_wait_seconds` a histogram rather than a sum. * Add a paced warning when a task takes more than 10 minutes to get a permit (for now). * Drive-by cleanup of some `EnumMap` usage.	2025-02-06 17:17:47 +00:00
Alex Chi Z.	f22d41eaec	feat(pageserver): num of background job metrics (#10690 ) ## Problem We need a metrics to know what's going on in pageserver's background jobs. ## Summary of changes * Waiting tasks: task still waiting for the semaphore. * Running tasks: tasks doing their actual jobs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-02-06 14:39:37 +00:00
Alexander Lakhin	977781e423	Enable sanitizers for postgres v17 (#10401 ) Add a build with sanitizers (asan, ubsan) to the CI pipeline and run tests on it. See https://github.com/neondatabase/neon/issues/6053 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-02-06 12:53:43 +00:00
Arpad Müller	67b71538d0	Limit returned lsn for timestamp by the planned gc cutoff (#10678 ) Often the output of the timestamp->lsn API is used as input for branch creation, and branch creation takes the planned lsn into account, i.e. rejects lsn's as branch lsns that are before the planned lsn. This patch doesn't fix all race conditions, it's still racy. But at least it is a step in the right direction. For #10639	2025-02-06 11:17:08 +00:00
Erik Grinaker	f4cfa725b8	pageserver: add a few critical errors (#10657 ) ## Problem Following #10641, let's add a few critical errors. Resolves #10094. ## Summary of changes Adds the following critical errors: * WAL sender read/decode failure. * WAL record ingestion failure. * WAL redo failure. * Missing key during compaction. We don't add an error for missing keys during GetPage requests, since we've seen a handful of these in production recently, and the cause is still unclear (most likely a benign race).	2025-02-06 10:30:27 +00:00
Arpad Müller	05326cc247	Skip gc cutoff lsn check at timeline creation if lease exists (#10685 ) Right now, branch creation doesn't care if a lsn lease exists or not, it just fails if the passed lsn is older than either the last or the planned gc cutoff. However, if an lsn lease exists for a given lsn, we can actually create a branch at that point: nothing has been gc'd away. This prevents race conditions that #10678 still leaves around. Related: #10639 https://github.com/neondatabase/cloud/issues/23667	2025-02-06 10:10:11 +00:00
Arpad Müller	b66fbd6176	Warn on basebackups for archived timelines (#10688 ) We don't want any external requests for an archived timeline. This includes basebackup requests, i.e. when a compute is being started up. Therefore, we'd like to forbid such basebackup requests: any attempt to get a basebackup on an archived timeline (or any getpage request really) is a cplane bug. Make this a warning for now so that, if there is potentially a bug, we can detect cases in the wild before they cause stuck operations, but the intention is to return an error eventually. Related: #9548	2025-02-06 10:09:20 +00:00
Christian Schwarz	1686d9e733	perf(page_service): dont `.instrument(span.clone())` the response flush (#10686 ) On my AX102 Hetzner box, removing this line removes about 20us from the `latency_mean` result in `test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant`. If the same 20us can be removed in the nightly benchmark run, this will be a ~10% improvement because there, mean latencies are about ~220us. This span was added during batching refactors, we didn't have it before, and I don't think it's terribly useful. refs - https://github.com/neondatabase/cloud/issues/21759	2025-02-06 08:33:37 +00:00
Erik Grinaker	abcd00181c	pageserver: set a concurrency limit for LocalFS (#10676 ) ## Problem The local filesystem backend for remote storage doesn't set a concurrency limit. While it can't/won't enforce a concurrency limit itself, this also bounds the upload queue concurrency. Some tests create thousands of uploads, which slows down the quadratic scheduling of the upload queue, and there is no point spawning that many Tokio tasks. Resolves #10409. ## Summary of changes Set a concurrency limit of 100 for the LocalFS backend. Before: `test_layer_map[release-pg17].test_query: 68.338 s` After: `test_layer_map[release-pg17].test_query: 5.209 s`	2025-02-06 07:24:36 +00:00
Alex Chi Z.	0ceeec9be3	fix(pageserver): schedule compaction immediately if pending (#10684 ) ## Problem The code is intended to reschedule compaction immediately if there are pending tasks. We set the duration to 0 before if there are pending tasks, but this will go through the `if period == Duration::ZERO {` branch and sleep for another 10 seconds. ## Summary of changes Set duration to 1 so that it doesn't sleep for too long. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 22:11:50 +00:00
Alex Chi Z.	733a57247b	fix(pageserver): disallow gc-compaction produce l0 layer (#10679 ) ## Problem Any compaction should never produce l0 layers. This never happened in my experiments, but would be good to guard it early. ## Summary of changes Disallow gc-compaction to produce l0 layers. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 20:44:28 +00:00
Alex Chi Z.	133b89a83d	feat(pageserver): continue from last incomplete image layer creation (#10660 ) ## Problem close https://github.com/neondatabase/neon/issues/10651 ## Summary of changes * Image layer creation starts from the next partition of the last processed partition if the previous attempt was not complete. * Add tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 17:35:39 +00:00
Erik Grinaker	f07119cca7	pageserver: add `pageserver_wal_ingest_values_committed` metric (#10653 ) ## Problem We don't have visibility into the ratio of image vs. delta pages ingested in Pageservers. This might be useful to determine whether we should compress WAL records before storing them, which in turn might make compaction more efficient. ## Summary of changes Add `pageserver_wal_ingest_values_committed` metric with dimensions `class=metadata\|data` and `kind=image\|delta`.	2025-02-05 14:33:04 +00:00
Vlad Lazar	f9009d6b80	pageserver: write heatmap to disk after uploading it (#10650 ) ## Problem We wish to make heatmap generation additive in https://github.com/neondatabase/neon/pull/10597. However, if the pageserver restarts and has a heatmap on disk from when it was a secondary long ago, we can end up keeping extra layers on the secondary's disk. ## Summary of changes Persist the heatmap after a successful upload.	2025-02-04 17:52:54 +00:00
Erik Grinaker	06090bbccd	pageserver: log critical error on `ClearVmBits` for unknown pages (#10634 ) ## Problem In #9895, we fixed some issues where `ClearVmBits` were broadcast to all shards, even those not owning the VM relation. As part of that, we found some ancient code from #1417, which discarded spurious incorrect `ClearVmBits` records for pages outside of the VM relation. We added observability in #9911 to see how often this actually happens in the wild. After two months, we have not seen this happen once in production or staging. However, out of caution, we don't want a hard error and break WAL ingestion. Resolves #10067. ## Summary of changes Log a critical error when ingesting `ClearVmBits` for unknown VM relations or pages.	2025-02-04 14:55:11 +00:00
Alex Chi Z.	e219d48bfe	refactor(pageserver): clarify compaction return value (#10643 ) ## Problem ## Summary of changes Make the return value of the set of compaction functions less confusing. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-03 21:56:55 +00:00
Alex Chi Z.	c1be84197e	feat(pageserver): preempt image layer generation if L0 piles up (#10572 ) ## Problem Image layer generation could block L0 compactions for a long time. ## Summary of changes * Refactored the return value of `create_image_layers_for_` functions to make it self-explainable. Preempt image layer generation in `Try` mode if L0 piles up. Note that we might potentially run into a state that only the beginning part of the keyspace gets image coverage. In that case, we either need to implement something to prioritize some keyspaces with image coverage, or tune the image_creation_threshold to ensure that the frequency of image creation could keep up with L0 compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-02-03 20:55:47 +00:00
OBBO67	b1e451091a	pageserver: clean up references to timeline delete marker, uninit marker (#5718 ) (#10627 ) ## Problem Since [#5580](https://github.com/neondatabase/neon/pull/5580) the delete and uninit file markers are no longer needed. ## Summary of changes Remove the remaining code for the delete and uninit markers. Additionally removes the `ends_with_suffix` function as it is no longer required. Closes [#5718](https://github.com/neondatabase/neon/issues/5718).	2025-02-03 11:54:07 +00:00
John Spray	aedeb1c7c2	pageserver: revise logging of cancelled request results (#10604 ) ## Problem When a client dropped before a request completed, and a handler returned an ApiError, we would log that at error severity. That was excessive in the case of a request erroring on a shutdown, and could cause test flakes. example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/13067651123/index.html#suites/ad9c266207b45eafe19909d1020dd987/6021ce86a0d72ae7/ ``` Cancelled request finished with an error: ShuttingDown ``` ## Summary of changes - Log a different info-level on ShuttingDown and ResourceUnavailable API errors from cancelled requests	2025-01-31 17:43:54 +00:00
John Spray	a93e9f22fc	pageserver: remove faulty debug assertion in compaction (#10610 ) ## Problem This assertion is incorrect: it is legal to see another shard's data at this point, after a shard split. Closes: https://github.com/neondatabase/neon/issues/10609 ## Summary of changes - Remove faulty assertion	2025-01-31 17:43:31 +00:00
John Spray	f09cfd11cb	pageserver: exclude archived timelines from freeze+flush on shutdown (#10594 ) ## Problem If offloading races with normal shutdown, we get a "failed to freeze and flush: cannot flush frozen layers when flush_loop is not running, state is Exited". This is harmless but points to it being quite strange to try and freeze and flush such a timeline. flushing on shutdown for an archived timeline isn't useful. Related: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - During Timeline::shutdown, ignore ShutdownMode::FreezeAndFlush if the timeline is archived	2025-01-31 10:54:14 +00:00
John Spray	e1273acdb1	pageserver: handle shutdown cleanly in layer download API (#10598 ) ## Problem This API is used in tests and occasionally for support. It cast all errors to 500. That can cause a failure on the log checks: https://neon-github-public-dev.s3.amazonaws.com/reports/main/13056992876/index.html#suites/ad9c266207b45eafe19909d1020dd987/683a7031d877f3db/ ## Summary of changes - Avoid using generic anyhow::Error for layer downloads - Map shutdown cases to 503 in http route	2025-01-30 22:43:36 +00:00
John Spray	6da7c556c2	pageserver: fix race cleaning up timeline files when shut down during bootstrap (#10532 ) ## Problem Timeline bootstrap starts a flush loop, but doesn't reliably shut down the timeline (incl. waiting for flush loop to exit) before destroying UninitializedTimeline, and that destructor tries to clean up local storage. If local storage is still being written to, then this is unsound. Currently the symptom is that we see a "Directory not empty" error log, e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/main/12966756686/index.html#testresult/5523f7d15f46f7f7/retries ## Summary of changes - Move fallible IO part of bootstrap into a function (notably, this is fallible in the case of the tenant being shut down while creation is happening) - When that function returns an error, call shutdown() on the timeline	2025-01-30 20:33:22 +00:00
Alex Chi Z.	cf6dee946e	fix(pageserver): gc-compaction race with read (#10543 ) ## Problem close https://github.com/neondatabase/neon/issues/10482 ## Summary of changes Add an extra lock on the read path to protect against races. The read path has an implication that only certain kind of compactions can be performed. Garbage keys must first have an image layer covering the range, and then being gc-ed -- they cannot be done in one operation. An alternative to fix this is to move the layers read guard to be acquired at the beginning of `get_vectored_reconstruct_data_timeline`, but that was intentionally optimized out and I don't want to regress. The race is not limited to image layers. Gc-compaction will consolidate deltas automatically and produce a flat delta layer (i.e., when we have retain_lsns below the gc-horizon). The same race would also cause behaviors like getting an un-replayable key history as in https://github.com/neondatabase/neon/issues/10049. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-30 15:25:29 +00:00
Arpad Müller	93714c4c7b	secondary downloader: load metadata on loading of timeline (#10539 ) Related to #10308, we might have legitimate changes in file size or generation. Those changes should not cause warn log lines. In order to detect changes of the generation number while the file size stayed the same, load the metadata that we store on disk on loading of the timeline. Still do a comparison with the on-disk layer sizes to find any discrepancies that might occur due to race conditions (new metadata file gets written but layer file has not been updated yet, and PS shuts down). However, as it's possible to hit it in a race condition, downgrade it to a warning. Also fix a mistake in #10529: we want to compare the old with the new metadata, not the old metadata with itself.	2025-01-30 12:03:36 +00:00
Erik Grinaker	6a2afa0c02	pageserver: add per-timeline read amp histogram (#10566 ) ## Problem We don't have per-timeline observability for read amplification. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes Add a per-timeline `pageserver_layers_per_read` histogram. NB: per-timeline histograms are expensive, but probably worth it in this case.	2025-01-30 11:24:49 +00:00
Erik Grinaker	d3db96c211	pageserver: add `pageserver_deltas_per_read_global` metric (#10570 ) ## Problem We suspect that Postgres checkpoints will limit the number of page deltas necessary to reconstruct a page, but don't know for certain. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes Add `pageserver_deltas_per_read_global` metric. This pairs with `pageserver_layers_per_read_global` from #10573.	2025-01-30 10:55:07 +00:00
Erik Grinaker	b24727134c	pageserver: improve read amp metric (#10573 ) ## Problem The current global `pageserver_layers_visited_per_vectored_read_global` metric does not appear to accurately measure read amplification. It divides the layer count by the number of reads in a batch, but this means that e.g. 10 reads with 100 L0 layers will only measure a read amp of 10 per read, while the actual read amp was 100. While the cost of layer visits are amortized across the batch, and some layers may not intersect with a given key, each visited layer contributes directly to the observed latency for every read in the batch, which is what we care about. Touches https://github.com/neondatabase/cloud/issues/23283. Extracted from #10566. ## Summary of changes * Count the number of layers visited towards each read in the batch, instead of the average across the batch. * Rename `pageserver_layers_visited_per_vectored_read_global` to `pageserver_layers_per_read_global`. * Reduce the read amp log warning threshold down from 512 to 100.	2025-01-30 09:27:40 +00:00
Alex Chi Z.	77ea9b16fe	fix(pageserver): use the larger one of upper limit and threshold (#10571 ) ## Problem Follow up of https://github.com/neondatabase/neon/pull/10550 in case the upper limit is set larger than threshold. It does not make sense for someone to enforce the behavior like "if there are >= 50 L0s, only compact 10 of them". ## Summary of changes Use the maximum of compaction threshold and upper limit when selecting L0 files to compact. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-30 00:05:40 +00:00
Alex Chi Z.	9dff6cc2a4	fix(pageserver): skip repartition if we need L0 compaction (#10547 ) ## Problem Repartition is slow, but it's only used in image layer creation. We can skip it if we have a lot of L0 layers to ingest. ## Summary of changes If L0 compaction is not complete, do not repartition and do not create image layers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-29 21:32:50 +00:00
Erik Grinaker	ff298afb97	pageserver: add `level` for timeline layer metrics (#10563 ) ## Problem We don't have good observability for per-timeline compaction debt, specifically the number of delta layers in the frozen, L0, and L1 levels. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes * Add a `level` label for `pageserver_layer_{count,size}` with values `l0`, `l1`, and `frozen`. * Track metrics for frozen layers. There is already a `kind={delta,image}` label. `kind=image` is only possible for `level=l1`. We don't include the currently open ephemeral layer, only frozen layers. There is always exactly 1 ephemeral layer, with a dynamic size which is already tracked in `pageserver_timeline_ephemeral_bytes`.	2025-01-29 21:10:56 +00:00
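
The read amp arithmetic in b24727134c (#10573) can be checked with a small sketch. This is not the pageserver's code: the function names `averaged_layers_per_read` and `per_read_observations` are hypothetical, and the sketch assumes the simplest case from the commit message, where every layer visited for a batch counts toward every read in that batch.

```rust
/// Old-style measurement (as described in #10573): the layer count is
/// divided by the number of reads in the batch.
/// Returns None for an empty batch to avoid dividing by zero.
/// Hypothetical helper, for illustration only.
fn averaged_layers_per_read(layers_visited: u64, reads_in_batch: u64) -> Option<f64> {
    if reads_in_batch == 0 {
        return None;
    }
    Some(layers_visited as f64 / reads_in_batch as f64)
}

/// New-style measurement: one observation per read, each recording the
/// layers visited on behalf of that read. Assumption: all layers visited
/// for the batch count toward every read, so every observation is equal.
/// Hypothetical helper, for illustration only.
fn per_read_observations(layers_visited: u64, reads_in_batch: usize) -> Vec<u64> {
    vec![layers_visited; reads_in_batch]
}
```

With the numbers from the commit message (10 reads, 100 L0 layers), `averaged_layers_per_read(100, 10)` yields 10, while `per_read_observations(100, 10)` yields ten observations of 100, which matches the latency each read actually experiences.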

2627 Commits