Files
neon/pageserver/Cargo.toml
Christian Schwarz 5aef192bf2 disk-usage-based layer eviction
This patch adds a pageserver-global background loop that evicts layers
in response to a shortage of available bytes in the $repo/tenants
directory's filesystem.

The loop runs periodically at a configurable `period`.

Each loop iteration uses `statvfs` to determine filesystem-level space
usage.  It compares the returned usage data against two different types
of thresholds. The iteration tries to evict layers until app-internal
accounting says we should be below the thresholds.  We cross-check this
internal accounting with the real world by making another `statvfs` at
the end of the iteration.  We're good if that second statvfs shows that
we're _actually_ below the configured thresholds.  If we're still above
one or more thresholds, we emit a warning log message, leaving it to the
operator to investigate further.

There are two thresholds: `max_usage_pct` is the relative available
space, expressed in percent of the total filesystem space. If the actual
usage is higher, the threshold is exceeded.  `min_avail_bytes` is the
absolute available space in bytes. If the actual usage is lower, the
threshold is exceeded.

The iteration evicts layers in LRU fashion with a reservation of up to
`min_resident_size` bytes of the most recent layers per tenant.
The layers not part of the per-tenant reservation are evicted
least-recently-used first until we're below all thresholds.
If the above doesn't relieve enough pressure, we fall back to Global LRU.

In addition to the loop, there is also an HTTP endpoint to perform
one loop iteration synchronous to the request.
The endpoint takes an absolute number of bytes that the iteration
needs to evict before pressure is relieved.
The tests use this endpoint, which is a great simplification over
setting up loopback-mounts in the tests, which would be required to
test the statvfs part of the implementation.
We will rely on manual testing in staging to test the statvfs parts.

The HTTP endpoint is also handy in emergencies where an operator wants
the pageserver to evict a given amount of space _now.
Hence, it's arguments documented in openapi_spec.yml.
The response type isn't documented though because we don't consider
it stable. The endpoint should _not_ be used by Console.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>

fixes https://github.com/neondatabase/neon/issues/3728
2023-03-20 16:51:09 +01:00

90 lines
2.4 KiB
TOML

[package]
name = "pageserver"
version = "0.1.0"
edition.workspace = true
license.workspace = true
[features]
default = []
# Enables test-only APIs, incuding failpoints. In particular, enables the `fail_point!` macro,
# which adds some runtime cost to run tests on outage conditions
testing = ["fail/failpoints"]
[dependencies]
anyhow.workspace = true
async-stream.workspace = true
async-trait.workspace = true
byteorder.workspace = true
bytes.workspace = true
chrono = { workspace = true, features = ["serde"] }
clap = { workspace = true, features = ["string"] }
close_fds.workspace = true
const_format.workspace = true
consumption_metrics.workspace = true
crc32c.workspace = true
crossbeam-utils.workspace = true
either.workspace = true
fail.workspace = true
futures.workspace = true
git-version.workspace = true
hex.workspace = true
humantime.workspace = true
humantime-serde.workspace = true
hyper.workspace = true
itertools.workspace = true
nix.workspace = true
num-traits.workspace = true
once_cell.workspace = true
pin-project-lite.workspace = true
postgres.workspace = true
postgres_backend.workspace = true
postgres-protocol.workspace = true
postgres-types.workspace = true
rand.workspace = true
regex.workspace = true
scopeguard.workspace = true
serde.workspace = true
serde_json = { workspace = true, features = ["raw_value"] }
serde_with.workspace = true
signal-hook.workspace = true
svg_fmt.workspace = true
sync_wrapper.workspace = true
tokio-tar.workspace = true
thiserror.workspace = true
tokio = { workspace = true, features = ["process", "sync", "fs", "rt", "io-util", "time"] }
tokio-postgres.workspace = true
tokio-util.workspace = true
toml_edit = { workspace = true, features = [ "serde" ] }
tracing.workspace = true
url.workspace = true
walkdir.workspace = true
metrics.workspace = true
pageserver_api.workspace = true
postgres_connection.workspace = true
postgres_ffi.workspace = true
pq_proto.workspace = true
remote_storage.workspace = true
storage_broker.workspace = true
tenant_size_model.workspace = true
utils.workspace = true
workspace_hack.workspace = true
reqwest.workspace = true
rpds.workspace = true
enum-map.workspace = true
enumset.workspace = true
strum.workspace = true
strum_macros.workspace = true
[dev-dependencies]
criterion.workspace = true
hex-literal.workspace = true
tempfile.workspace = true
[[bench]]
name = "bench_layer_map"
harness = false
[[bench]]
name = "bench_walredo"
harness = false