neon/postgres_ffi
Heikki Linnakangas 6127b6638b Major storage format rewrite
Major changes and new concepts:

Simplify Repository to a value-store
------------------------------------

Move the responsibility for tracking relation metadata, such as which
relations exist and what their sizes are, from Repository to a new module,
pgdatadir_mapping.rs. The interface to Repository now consists of simple
key-value PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible for handling branching, and every GET operation comes with
an LSN.
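
To make "every GET operation comes with an LSN" concrete, here is a minimal
sketch of a versioned key-value store. It is not the actual Repository
interface: `ToyRepository`, its methods, and the use of plain `u64` for both
keys and LSNs are assumptions made for illustration, and branching is left
out entirely.

```rust
use std::collections::BTreeMap;

/// Log sequence number; a plain u64 in this sketch (assumption).
pub type Lsn = u64;

/// Stand-in for the real key type (assumption).
pub type Key = u64;

/// Hypothetical toy store: each PUT adds a new version under (key, lsn).
#[derive(Default)]
pub struct ToyRepository {
    versions: BTreeMap<(Key, Lsn), Vec<u8>>,
}

impl ToyRepository {
    /// Record a new version of `key` as of `lsn`.
    pub fn put(&mut self, key: Key, lsn: Lsn, value: Vec<u8>) {
        self.versions.insert((key, lsn), value);
    }

    /// Return the newest version of `key` written at or before `lsn`.
    pub fn get(&self, key: Key, lsn: Lsn) -> Option<&[u8]> {
        self.versions
            .range((key, 0)..=(key, lsn))
            .next_back()
            .map(|(_, value)| value.as_slice())
    }
}
```

A GET at an LSN between two PUTs sees the older value, which is the property
that distinguishes this from an ordinary key-value store.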

Key
---

The key to the Repository key-value store is a Key struct, which consists
of a few integer fields. It's wide enough to store a full RelFileNode,
fork and block number, and to distinguish those from metadata keys.

See pgdatadir_mapping.rs for how relation blocks and metadata keys are
mapped to the Key struct.
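
As a rough picture of what "a few integer fields" could look like, the sketch
below packs a RelFileNode, fork and block number into a key and reserves a
tag field to tell relation blocks apart from metadata. The field names, the
tag values and the helper functions are hypothetical; the real layout is
defined by the Key struct and pgdatadir_mapping.rs.

```rust
/// Hypothetical key layout, for illustration only.
/// The derived ordering compares fields in declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Key {
    pub kind: u8,     // assumed tag: relation block vs. metadata
    pub spcnode: u32, // tablespace OID (part of RelFileNode)
    pub dbnode: u32,  // database OID (part of RelFileNode)
    pub relnode: u32, // relation file node (part of RelFileNode)
    pub forknum: u8,  // relation fork
    pub blknum: u32,  // block number within the fork
}

/// Assumed tag values; not taken from the real code.
pub const KIND_REL_BLOCK: u8 = 0;
pub const KIND_METADATA: u8 = 1;

/// Build the key for one block of one relation fork.
pub fn rel_block_to_key(spcnode: u32, dbnode: u32, relnode: u32, forknum: u8, blknum: u32) -> Key {
    Key { kind: KIND_REL_BLOCK, spcnode, dbnode, relnode, forknum, blknum }
}

/// True if the key addresses a relation block rather than metadata.
pub fn is_rel_block_key(key: &Key) -> bool {
    key.kind == KIND_REL_BLOCK
}
```

With the block number as the last field, consecutive blocks of the same
relation fork sort next to each other, which suits range-based storage.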

Store arbitrary key-ranges in the layer files
---------------------------------------------

The concept of a "segment" is gone. Each layer file can store an arbitrary
range of Keys.
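
A layer that covers an arbitrary key range can be pictured as below. This is
only a sketch: `LayerDesc`, `find_layer` and the `u64` stand-ins for keys and
LSNs are hypothetical, and the lookup is deliberately the simplest possible
one, a linear scan over a slice of layers.

```rust
use std::ops::Range;

/// Stand-ins for the real key and LSN types (assumption).
pub type Key = u64;
pub type Lsn = u64;

/// Hypothetical description of one layer file.
pub struct LayerDesc {
    pub key_range: Range<Key>, // keys covered, end exclusive
    pub lsn_range: Range<Lsn>, // LSNs covered, end exclusive
    pub name: &'static str,
}

/// Scan all layers and return the one that covers `key` and starts at the
/// highest LSN not greater than `lsn`.
pub fn find_layer(layers: &[LayerDesc], key: Key, lsn: Lsn) -> Option<&LayerDesc> {
    layers
        .iter()
        .filter(|layer| layer.key_range.contains(&key) && layer.lsn_range.start <= lsn)
        .max_by_key(|layer| layer.lsn_range.start)
}
```

Because nothing ties a layer to a fixed segment, two layers may cover
overlapping or entirely different key ranges; the lookup only checks
containment.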

TODO:

- Deleting keys, to reclaim space. This isn't visible to Postgres: dropping
  or truncating a relation works as you would expect if you look at it from
  the compute node. If you drop a relation, for example, the relation is
  removed from the metadata entry, so that it appears to be gone. However,
  the layered repository implementation never reclaims the storage.

- Tracking "logical database size", for disk space quotas. That ought to
  be reimplemented now in pgdatadir_mapping.rs, or perhaps in walingest.rs.

- LSM compaction. The logic for checkpointing and creating image layers is
  very dumb. AFAIK the *read* code could now deal with a full-fledged LSM
  tree consisting of the delta and image layers. But there's no code to
  take a bunch of delta layers and compact them, and the heuristics for
  when to create image layers are pretty dumb.

- The code to track the layers is inefficient. All layers are just stored in
  a vector, and whenever we need to find a layer, we do a linear search in
  it.
2022-03-09 11:36:39 +02:00

This module contains utilities for working with PostgreSQL file
formats. It's a collection of structs that are auto-generated from the
PostgreSQL header files using bindgen, and Rust functions to read and
manipulate them.

There are also a bunch of constants in `pg_constants.rs` that are copied
from various PostgreSQL headers, rather than auto-generated. Most of them
should be auto-generated too, but that's a TODO.

The PostgreSQL on-disk file format is not portable across different
CPU architectures and operating systems. It is also subject to change
in each major PostgreSQL version. Currently, this module is based on
PostgreSQL v14, but in the future we will probably need a separate
copy for each PostgreSQL version.
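
As a small illustration of the portability point, the sketch below reads the
page LSN from the start of a page. In PostgreSQL's `bufpage.h` the page header
begins with `pd_lsn`, stored as two 32-bit integers (`xlogid`, then
`xrecoff`) in the byte order of the machine that wrote the page. The function
name `page_lsn` is hypothetical and is not part of this module; the sketch
assumes the page was written on a machine with the same byte order as the
reader.

```rust
/// Read pd_lsn from the first 8 bytes of a page, assuming the page was
/// written with this machine's native byte order.
/// Returns None if the buffer is shorter than 8 bytes.
pub fn page_lsn(page: &[u8]) -> Option<u64> {
    let b = page.get(0..8)?;
    // High 32 bits of the LSN.
    let xlogid = u32::from_ne_bytes([b[0], b[1], b[2], b[3]]);
    // Low 32 bits of the LSN.
    let xrecoff = u32::from_ne_bytes([b[4], b[5], b[6], b[7]]);
    Some(((xlogid as u64) << 32) | xrecoff as u64)
}
```

The use of `from_ne_bytes` rather than a fixed-endian conversion is the
point: the same file read on a machine with the opposite byte order would
yield a different, wrong LSN.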

TODO: Currently, there is also some code that deals with WAL records
in pageserver/src/waldecoder.rs. That should be moved into this
module. The rest of the codebase should not have intimate knowledge of
PostgreSQL file formats or WAL layout; that knowledge should be
encapsulated in this module.