Files
neon/pageserver
Christian Schwarz 97241776aa pageserver: startup: ensure local disk state is durable (#8835)
refs https://github.com/neondatabase/neon/issues/6989

Problem
-------

After unclean shutdown, we get restarted, start reading the local
filesystem,
and make decisions based on those reads. However, some of the data might
have
not yet been fsynced when the unclean shutdown completed.

Durability matters even though Pageservers are conceptually just a cache
of state in S3. For example:
- the cloud control plane is no control loop => pageserver responses
  to tenant attachmentm, etc, needs to be durable.
  - the storage controller does not rely on this (as much?)
- we don't have layer file checksumming, so, downloaded+renamed but not
  fsynced layer files are technically not to be trusted
  - https://github.com/neondatabase/neon/issues/2683

Solution
--------

`syncfs` the tenants directory during startup, before we start reading
from it.

This is a bit overkill because we do remove some temp files
(InMemoryLayer!)
later during startup. Further, these temp files are particularly likely
to
be dirty in the kernel page cache. However, we don't want to refactor
that
cleanup code right now, and the dirty data on pageservers is generally
not that high. Last, with [direct
IO](https://github.com/neondatabase/neon/issues/8130) we're going to
have near-zero kernel page cache anyway quite soon.
2024-08-26 18:07:55 +02:00
..