rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-27 10:00:38 +00:00

Author	SHA1	Message	Date
Christian Schwarz	140ef67dd8	refactor: replace LD_PRELOAD with a baked-in solution	2023-03-30 15:51:26 +02:00
Christian Schwarz	a698ddb8a4	fix: avoid needless timeline.clone()	2023-03-29 10:58:59 +02:00
Christian Schwarz	57d215e6bb	fix: suggestions committed from GitHub web didn't compile	2023-03-29 10:55:57 +02:00
Christian Schwarz	83813f2cb1	fix: remove unneeded clippy allow	2023-03-29 10:55:45 +02:00
Christian Schwarz	9a55e4f909	fix: structured logging of tenant_id Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-29 10:51:14 +02:00
Christian Schwarz	0b9a44a879	fix: structured logging of tenant_id Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-29 10:50:36 +02:00
Christian Schwarz	a6f9ebf178	fix: repeat tenant_id in debug message	2023-03-29 10:49:06 +02:00
Christian Schwarz	370b3637db	doc: add explainer to debug_assert	2023-03-29 10:47:15 +02:00
Christian Schwarz	d6c2867b46	doc: add debug_assert for self-documenting candidates.sort_unstable_by_key()	2023-03-28 19:16:26 +02:00
Christian Schwarz	386c2d0112	refactor: go back to a single list The MinResidentSizePartition is effectively what `overage` was earlier, but more expressive and outside of EvictionCandidates. So switch the code back to a single list, but use (MinResidentSizePartition, EvictionCandidates) tuples. That eliminates the need for iter_in_eviction_order() altogether. It consumes 8 bytes more memory per candidate, but that doesn't matter for now.	2023-03-28 18:25:44 +02:00
Christian Schwarz	704d4f4640	doc: improve comment on min_resident_size	2023-03-28 18:06:44 +02:00
Christian Schwarz	dc72a9534e	doc: update doc comment for `collect_eviction_candidates` And move the impl of MinResidentSizePartitionedCandidates below it because it makes sense when reading the code top-down.	2023-03-28 18:05:31 +02:00
Christian Schwarz	07c44f9151	doc: hint that usage_assumed is modified in the loop	2023-03-28 17:47:11 +02:00
Christian Schwarz	0c10e6d3e7	feat: demote info logs to debug These would be per tenant, we don't want to emit thousands of log lines when this code runs.	2023-03-28 17:36:59 +02:00
Christian Schwarz	85becb148f	feat: bring back min_resident=max(all layers) behavior	2023-03-28 17:36:59 +02:00
Christian Schwarz	ea3c76a9d6	refactor: instead of 'overage', have two separate lists	2023-03-28 17:36:59 +02:00
Christian Schwarz	799576ab1e	Merge branch 'problame/disk-usage-eviction' into heikki/disk-usage-eviction	2023-03-28 15:24:52 +02:00
Christian Schwarz	0428b6822a	doc: more comment on lru_candidates to address questions from review	2023-03-28 11:53:48 +02:00
Christian Schwarz	42d63270a5	doc: add comment on extend_lru_candidates	2023-03-28 11:44:10 +02:00
Christian Schwarz	0056108c45	doc: remove stray comment	2023-03-28 11:44:10 +02:00
Christian Schwarz	54cc1d5064	doc: sub-headings for mechanics & policy in module comment	2023-03-28 11:44:10 +02:00
Joonas Koivunen	70c837a4b2	refactor: simplify max_layer_size as u64	2023-03-28 12:44:01 +03:00
Christian Schwarz	38d3061143	add comment on not needing to be 100% accurate about max_layer_size	2023-03-28 12:44:01 +03:00
Joonas Koivunen	75759f709f	doc: explain "bug" message, log layers	2023-03-28 12:44:01 +03:00
Joonas Koivunen	03ab5df081	chore: remove dead code	2023-03-28 12:44:01 +03:00
Joonas Koivunen	6e8d7b449f	refactor: combine nested matches	2023-03-28 12:44:01 +03:00
Joonas Koivunen	17b5c8d1c4	refactor: get rid of ApproxAccurate	2023-03-28 12:44:01 +03:00
Joonas Koivunen	0943dd30eb	chore: clippy	2023-03-28 12:44:01 +03:00
Joonas Koivunen	b599755042	refactor: rename DiskUsageEvictionState => State	2023-03-28 12:44:01 +03:00
Joonas Koivunen	244185e6e6	doc: comment changes Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-03-28 12:44:01 +03:00
Joonas Koivunen	0a5043fae5	refactor: less static mutexes	2023-03-28 12:44:01 +03:00
Heikki Linnakangas	041b708dc6	rustfmt	2023-03-28 11:08:35 +03:00
Heikki Linnakangas	b6b8265450	Rewrite to make the algorithm more understandable (I hope). The algorithm is the same (with two small exceptions), but rewrite the way it's implemented to make it easier to follow. The exceptions: 1. 'min_resident_size' now protects at least that much data in the first "respectful" phase of the algorithm. Previously, it would evict layers until the resident size fell below min_resident_size. In other words, we now protect one more layer of each tenant, so that the resident size stays just above min_resident_size, while previously we would evict enough to bring the resident size just under min_resident_size. 2. Previously, the "max layer size" that's used as the default min_resident_size was calculated from all layers in the tenant, including remote layers. Now it's only calculated across all locally-present layers. I don't know if that was a deliberate choice, but this is slightly simpler.	2023-03-28 01:28:27 +03:00
Christian Schwarz	0462563d31	doc: more explanation of the added config knobs	2023-03-27 19:52:48 +02:00
Christian Schwarz	ef39b3f067	rename: dedupe comment about allow(dead_code)	2023-03-27 19:27:41 +02:00
Christian Schwarz	504256a0cc	rename: serde_percent::{Value => Percent}	2023-03-27 18:54:32 +02:00
Joonas Koivunen	e2b7e2962e	fixup: missing .get() as u64	2023-03-27 19:39:24 +03:00
Christian Schwarz	9d78e98467	refactor: move the percentage value deserialization to newtype in `utils`	2023-03-27 18:27:29 +02:00
Christian Schwarz	5aef192bf2	disk-usage-based layer eviction This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: `max_usage_pct` is the relative used space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. `min_avail_bytes` is the absolute available space in bytes. If the actual available space is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. If the above doesn't relieve enough pressure, we fall back to Global LRU. In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now_. Hence, its arguments are documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console. Co-authored-by: Joonas Koivunen <joonas@neon.tech> fixes https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:51:09 +01:00
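
The threshold check and eviction order described in commits 5aef192bf2, b6b8265450 and 386c2d0112 can be illustrated with a small, self-contained Rust sketch. This is not the pageserver's code. The types `Usage`, `Thresholds`, `Layer` and `Partition` and the functions `eviction_order`, `plan_evictions` and `demo` are hypothetical names chosen for illustration. The sketch assumes that usage figures come from something like `statvfs`, that a larger `last_access` means more recent use, and that a tenant's reservation is filled from its most recent layers until it reaches at least `min_resident_size` bytes. Sorting a single list by a (partition, last access) key is one plausible reading of the "(MinResidentSizePartition, EvictionCandidates) tuples" commit, not a confirmed detail.

```rust
use std::collections::HashMap;

/// Hypothetical filesystem snapshot, e.g. derived from `statvfs`.
#[derive(Clone, Copy)]
struct Usage {
    total_bytes: u64,
    avail_bytes: u64,
}

/// Hypothetical config holding the two thresholds from the commit message.
struct Thresholds {
    max_usage_pct: u64,
    min_avail_bytes: u64,
}

impl Usage {
    /// Used space as a percentage of total space, rounded up.
    fn usage_pct(&self) -> u64 {
        if self.total_bytes == 0 {
            return 0;
        }
        let used = self.total_bytes.saturating_sub(self.avail_bytes) as u128;
        let total = self.total_bytes as u128;
        ((used * 100 + total - 1) / total) as u64
    }

    /// True if at least one threshold is exceeded.
    fn has_pressure(&self, t: &Thresholds) -> bool {
        self.usage_pct() > t.max_usage_pct || self.avail_bytes < t.min_avail_bytes
    }
}

#[derive(Debug, Clone)]
struct Layer {
    tenant: &'static str,
    name: &'static str,
    size: u64,
    last_access: u64, // assumption: larger means more recently used
}

/// `Above` sorts before `Below`, so unreserved layers are evicted first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Partition {
    Above, // outside the per-tenant reservation
    Below, // inside the per-tenant reservation
}

/// Tag each layer with its partition and return one list in eviction order.
fn eviction_order(mut layers: Vec<Layer>, min_resident_size: u64) -> Vec<(Partition, Layer)> {
    // Walk from most to least recently used so the newest layers get reserved.
    layers.sort_by(|a, b| b.last_access.cmp(&a.last_access));
    let mut reserved: HashMap<&'static str, u64> = HashMap::new();
    let mut out = Vec::with_capacity(layers.len());
    for layer in layers {
        let sum = reserved.entry(layer.tenant).or_insert(0);
        // Reserve layers until the tenant holds at least min_resident_size.
        let partition = if *sum < min_resident_size {
            *sum += layer.size;
            Partition::Below
        } else {
            Partition::Above
        };
        out.push((partition, layer));
    }
    // Unreserved layers first, least recently used first within a partition.
    out.sort_by_key(|(p, l)| (*p, l.last_access));
    out
}

/// Pick layers until internal accounting says no threshold is exceeded.
fn plan_evictions(order: &[(Partition, Layer)], mut usage: Usage, t: &Thresholds) -> Vec<&'static str> {
    let mut evicted = Vec::new();
    for (_, layer) in order {
        if !usage.has_pressure(t) {
            break;
        }
        usage.avail_bytes += layer.size; // assumed usage, not re-measured
        evicted.push(layer.name);
    }
    evicted
}

/// Example: a 1000-byte filesystem, two tenants, 40 bytes reserved per tenant.
fn demo(avail_bytes: u64) -> Vec<&'static str> {
    let layers = vec![
        Layer { tenant: "A", name: "a1", size: 50, last_access: 1 },
        Layer { tenant: "A", name: "a2", size: 50, last_access: 5 },
        Layer { tenant: "B", name: "b1", size: 30, last_access: 2 },
        Layer { tenant: "B", name: "b2", size: 40, last_access: 3 },
        Layer { tenant: "B", name: "b3", size: 40, last_access: 6 },
    ];
    let thresholds = Thresholds { max_usage_pct: 80, min_avail_bytes: 150 };
    let usage = Usage { total_bytes: 1000, avail_bytes };
    plan_evictions(&eviction_order(layers, 40), usage, &thresholds)
}
```

Calling `demo(100)` selects only unreserved layers, while `demo(50)` needs more space and continues into the reserved layers, which corresponds to the fallback to global LRU mentioned in the commit message. A real implementation would, as the commit message says, cross-check the assumed usage with a second `statvfs` call after evicting.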

39 Commits