mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-13 16:32:56 +00:00
This is a backwards-incompatible change. The new pageserver cannot read repositories created with an old pageserver binary, or vice versa. Simplify Repository to a value-store ------------------------------------ Move the responsibility of tracking relation metadata, like which relations exist and what are their sizes, from Repository to a new module, pgdatadir_mapping.rs. The interface to Repository is now a simple key-value PUT/GET operations. It's still not any old key-value store though. A Repository is still responsible from handling branching, and every GET operation comes with an LSN. Mapping from Postgres data directory to keys/values --------------------------------------------------- All the data is now stored in the key-value store. The 'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects like relation pages and SLRUs, to key-value pairs. The key to the Repository key-value store is a Key struct, which consists of a few integer fields. It's wide enough to store a full RelFileNode, fork and block number, and to distinguish those from metadata keys. 'pgdatadir_mapping.rs' is also responsible for maintaining a "partitioning" of the keyspace. Partitioning means splitting the keyspace so that each partition holds a roughly equal number of keys. The partitioning is used when new image layer files are created, so that each image layer file is roughly the same size. The partitioning is also responsible for reclaiming space used by deleted keys. The Repository implementation doesn't have any explicit support for deleting keys. Instead, the deleted keys are simply omitted from the partitioning, and when a new image layer is created, the omitted keys are not copied over to the new image layer. We might want to implement tombstone keys in the future, to reclaim space faster, but this will work for now. Changes to low-level layer file code ------------------------------------ The concept of a "segment" is gone. Each layer file can now store an arbitrary range of Keys. Checkpointing, compaction ------------------------- The background tasks are somewhat different now. Whenever checkpoint_distance is reached, the WAL receiver thread "freezes" the current in-memory layer, and creates a new one. This is a quick operation and doesn't perform any I/O yet. It then launches a background "layer flushing thread" to write the frozen layer to disk, as a new L0 delta layer. This mechanism takes care of durability. It replaces the checkpointing thread. Compaction is a new background operation that takes a bunch of L0 delta layers, and reshuffles the data in them. It runs in a separate compaction thread. Deployment ---------- This also contains changes to the ansible scripts that enable having multiple different pageservers running at the same time in the staging environment. We will use that to keep an old version of the pageserver running, for clusters created with the old version, at the same time with a new pageserver with the new binary. Author: Heikki Linnakangas Author: Konstantin Knizhnik <knizhnik@zenith.tech> Author: Andrey Taranik <andrey@zenith.tech> Reviewed-by: Matthias Van De Meent <matthias@zenith.tech> Reviewed-by: Bojan Serafimov <bojan@zenith.tech> Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech> Reviewed-by: Anton Shyrabokau <antons@zenith.tech> Reviewed-by: Dhammika Pathirana <dham@zenith.tech> Reviewed-by: Kirill Bulatov <kirill@zenith.tech> Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech> Reviewed-by: Alexey Kondratov <alexey@zenith.tech>
79 lines
2.3 KiB
Python
79 lines
2.3 KiB
Python
import os
|
|
import subprocess
|
|
|
|
from typing import Any, List
|
|
from fixtures.log_helper import log
|
|
|
|
|
|
def get_self_dir() -> str:
|
|
""" Get the path to the directory where this script lives. """
|
|
return os.path.dirname(os.path.abspath(__file__))
|
|
|
|
|
|
def mkdir_if_needed(path: str) -> None:
|
|
""" Create a directory if it doesn't already exist
|
|
|
|
Note this won't try to create intermediate directories.
|
|
"""
|
|
try:
|
|
os.mkdir(path)
|
|
except FileExistsError:
|
|
pass
|
|
assert os.path.isdir(path)
|
|
|
|
|
|
def subprocess_capture(capture_dir: str, cmd: List[str], **kwargs: Any) -> str:
|
|
""" Run a process and capture its output
|
|
|
|
Output will go to files named "cmd_NNN.stdout" and "cmd_NNN.stderr"
|
|
where "cmd" is the name of the program and NNN is an incrementing
|
|
counter.
|
|
|
|
If those files already exist, we will overwrite them.
|
|
Returns basepath for files with captured output.
|
|
"""
|
|
assert type(cmd) is list
|
|
base = os.path.basename(cmd[0]) + '_{}'.format(global_counter())
|
|
basepath = os.path.join(capture_dir, base)
|
|
stdout_filename = basepath + '.stdout'
|
|
stderr_filename = basepath + '.stderr'
|
|
|
|
with open(stdout_filename, 'w') as stdout_f:
|
|
with open(stderr_filename, 'w') as stderr_f:
|
|
log.info('(capturing output to "{}.stdout")'.format(base))
|
|
subprocess.run(cmd, **kwargs, stdout=stdout_f, stderr=stderr_f)
|
|
|
|
return basepath
|
|
|
|
|
|
_global_counter = 0
|
|
|
|
|
|
def global_counter() -> int:
|
|
""" A really dumb global counter.
|
|
|
|
This is useful for giving output files a unique number, so if we run the
|
|
same command multiple times we can keep their output separate.
|
|
"""
|
|
global _global_counter
|
|
_global_counter += 1
|
|
return _global_counter
|
|
|
|
|
|
def lsn_to_hex(num: int) -> str:
|
|
""" Convert lsn from int to standard hex notation. """
|
|
return "{:X}/{:X}".format(num >> 32, num & 0xffffffff)
|
|
|
|
|
|
def lsn_from_hex(lsn_hex: str) -> int:
|
|
""" Convert lsn from hex notation to int. """
|
|
l, r = lsn_hex.split('/')
|
|
return (int(l, 16) << 32) + int(r, 16)
|
|
|
|
|
|
def print_gc_result(row):
|
|
log.info("GC duration {elapsed} ms".format_map(row))
|
|
log.info(
|
|
" total: {layers_total}, needed_by_cutoff {layers_needed_by_cutoff}, needed_by_branches: {layers_needed_by_branches}, not_updated: {layers_not_updated}, removed: {layers_removed}"
|
|
.format_map(row))
|