
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-21 15:10:44 +00:00

Author	SHA1	Message	Date
Kirill Bulatov	673c297949	Download timelines on demand	2021-12-10 17:23:35 +02:00
Dmitry Ivanov	7cec13d1df	Improve shutdown story for code coverage This patch introduces fixes for several problems affecting LLVM-based code coverage: * Daemonizing parent processes should call _exit() to prevent coverage data file corruption (.profraw) due to concurrent writes. Implement proper shutdown handlers in safekeeper.	2021-12-06 13:27:52 +03:00
Kirill Bulatov	670205e17a	Evict excessively failing sync tasks, improve processing for the rest of the tasks	2021-11-30 13:58:49 +02:00
Heikki Linnakangas	7cae265447	Fix dump_layerfile. The VirtualFile machinery panics if it's not initialized	2021-11-29 11:26:54 +02:00
Heikki Linnakangas	d47f610606	Fix pageserver CLI parameter names and document them	2021-11-25 13:31:52 +02:00
Heikki Linnakangas	431d32756b	Add a buffer cache, and use it to store materialized pages. The buffer cache is shared across all tenants, allowing memory to be dynamically allocated where it's needed the most. The cache works on 8 kB pages, and uses the clock algorithm for replacement policy; same as the PostgreSQL buffer cache. One peculiarity is that the materialized page versions can be looked up by an inexact LSN, to find the latest page version with an LSN >= the search key. The code is structured to support caching other kinds of pages in the same cache in the future, but with a different mapping key. Co-authored-by: Patrick Insinger <patrick@zenith.tech>	2021-11-12 11:02:12 -08:00
Heikki Linnakangas	9300107cdf	Cache Book objects, use virtual files to avoid running out of fds. Currently, whenever a page version is needed from an image or delta layer, we open the file and read and parse the bookfile headers. That's pretty expensive. To reduce the overhead, introduce a cache of open file descriptors, and use that to cache the Book objects so that we don't need to read the metadata on every access.	2021-11-10 17:19:37 +02:00
Dmitry Rodionov	987833e0b9	Propagate git SHA to zenith binaries Git commit sha is displayed when --version flag is used and is written to logs during service startup. Uses git_version crate when git is available, and GIT_VERSION environment variable otherwise which is the case for docker builds.	2021-11-04 14:22:29 +03:00
Kirill Bulatov	f36acf00de	Reduce "relish" word usages in remote storage	2021-11-04 12:53:42 +02:00
Heikki Linnakangas	fb524dd973	Put a global limit on memory used by in-memory layers. Adds simple global tracking of memory used by the in-memory layers. It's very approximate, it doesn't take into account allocator, memory fragmentation or many other things, but it's a good first step. After storing a WAL record in the repository, the WAL receiver checks the global memory usage. If it's above a configurable threshold (hard coded at 128 MB at the moment), it evicts a layer. The victim layer is chosen by GClock algorithm, similar to that used in the Postgres buffer cache. This stops the page server from using an unbounded amount of memory. It's pretty crude, the eviction and materializing and writing a layer to disk happens now in the WAL receiver thread. It would be nice to move that to a background thread, and it would be nice to have a smarter policy on when to materialize a new image layer and when to just write out a delta layer, and it would be nice to have more accurate accounting of memory. But this should fix the most pressing OOM issues, and is a step in the right direction. Co-authored-by: Patrick Insinger <patrickinsinger@gmail.com>	2021-11-02 15:49:39 +02:00
Patrick Insinger	b532470792	Set SO_REUSEADDR for all TCP listeners	2021-10-29 12:45:26 -07:00
Kirill Bulatov	e9b5224a8a	Fix toml serde gotchas	2021-10-18 14:14:27 +03:00
Kirill Bulatov	ba557d126b	React on sigint	2021-10-15 21:24:24 +03:00
anastasia	d7c9dd06f4	Implement graceful shutdown at 'pageserver stop': - perform checkpoint for each tenant repository. - wait for the completion of all threads. Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.	2021-10-11 13:35:01 +03:00
Heikki Linnakangas	7216f22609	Use tracing crate to have more context in log messages. Whenever we start processing a request, we now enter a tracing "span" that includes context information like the tenant and timeline ID, and the operation we're performing. That context information gets attached to every log message we create within the span. That way, we don't need to include basic context information like that in every log message, and it also becomes easier to filter the logs programmatically. This removes the explicit timeline and tenant IDs from most log messages, as you get that information from the enclosing span now. Also improve log messages in general, dialing down the level of some messages that are not very useful, and adding information to others. We now obey the RUST_LOG env variable, if it's set. The 'tracing' crate allows for different log formatters, like JSON or bunyan output. The one we use now is human-readable multi-line format, which is nice when reading the log directly, but hard for post-processing. For production, we'll probably want JSON output and some tools for working with it, but that's left as a TODO. The log format is easy to change.	2021-10-11 08:59:06 +03:00
Egor Suvorov	7e190d72a5	Make `pageserver_` prefix for common metric names configurable (#681)	2021-10-05 19:06:44 +03:00
Kirill Bulatov	5719f13cb2	Rework the relish thread model (#689)	2021-10-05 10:15:56 +03:00
Heikki Linnakangas	e474790400	Print more details on errors to log Fixes https://github.com/zenithdb/zenith/issues/661	2021-10-01 17:57:41 +03:00
Kirill Bulatov	287ea2e5e3	Limit concurrent relish storage sync operations	2021-10-01 08:37:09 +03:00
Kirill Bulatov	fb05e4cb0b	Show better error messages on pageserver failures	2021-09-29 01:55:41 +03:00
Egor Suvorov	b0a7234759	pageserver: fix stale default listen addrs * In command line help * In dummy_conf	2021-09-28 20:57:51 +03:00
Egor Suvorov	3065532f15	pageserver: fix mistype in listen-http arg help	2021-09-28 20:57:51 +03:00
Kirill Bulatov	1d5abf1253	Initial version of the relish storage	2021-09-17 15:30:22 +03:00
anastasia	8de41f1d70	Change checkpoint_distance type to u64	2021-09-16 12:33:50 +03:00
anastasia	6984d33b4e	Run GC and checkpointer in separate threads. Add checkpoint_period configuration parameter	2021-09-16 12:33:50 +03:00
anastasia	98d4f9cea5	Add checkpoint_distance config parameter. - Change hardcoded OLDEST_INMEM_DISTANCE value to pageserver config option checkpoint_distance. - Get rid of 'force' flag in checkpoint_internal(). Use checkpoint_distance=0 instead.	2021-09-16 12:33:50 +03:00
Dmitry Rodionov	4ebe643d0c	Support parallel test running for python tests Support is done via pytest-xdist plugin. To use the feature add -n<concurrency> to pytest invocation e.g. pytest -n8 to run 8 tests in parallel. Changes in code are mostly about ports assigning. Previously port for pageserver was hardcoded without the ability to override through zenith cli and ports for started compute nodes were calculated twice, in zenith cli and in test code. Now zenith cli supports port arguments for pageserver and compute nodes to be passed explicitly. Tests are modified in such a way that each worker gets a non overlapping port range which can be configured and now contains 100 ports. These ports are distributed to test services (pageserver, wal acceptors, compute nodes) so they can work independently.	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	84008a2560	factor out common logging initialisation routine This contains a lowest common denominator of pageserver and safekeeper log initialisation routines. It uses daemonize flag to decide where to stream log messages. In case daemonize is true log messages are forwarded to file. Otherwise streaming to stdout is used. Usage of stdout for log output is the default in docker side of things, so make it easier to browse our logs via builtin docker commands.	2021-09-14 18:09:14 +03:00
Patrick Insinger	7c62a57e54	initialize tenant_mgr after daemonizing Ran into problems launching the WAL redo process on OS X after 4b73ad. Launching the `initdb` process was met with "bad file descriptor" errors. Using dtrace, I found shortly after calling `posix_spawn` for `initdb`, `kevent` was returning this error. I haven't dug super deep to see if the daemonization itself is the problem, but this commit fixes it for me. My hunch is that some file descriptors used when the Tokio runtime is initialized become invalid in the daemon process.	2021-09-10 13:00:39 +03:00
Dmitry Rodionov	4b73ada26e	fix connection error appeared on zenith start by binding sockets before daemonization also use less annoying error reporting by not printing full error messages for connect errors in first several connection retries closes #507	2021-09-07 20:50:27 +03:00
Dmitry Rodionov	bc709561b6	fix clippy warnings	2021-09-02 18:54:44 +03:00
Kirill Bulatov	0e4cbe0165	Fix some typos	2021-09-02 17:27:18 +03:00
Heikki Linnakangas	d7bebd8074	Add 'dump_layerfile' utility for debugging. Seems handy for getting a quick idea of what's stored in an image or delta layer file. Example output on a file after running pgbench for a while: % ./target/debug/dump_layerfile pgbench_layers/pg_control_checkpoint_0_00000000016B914A ----- image layer for checkpoint.0 at 0/16B914A ---- non-blocky (88 bytes) % ./target/debug/dump_layerfile pgbench_layers/pg_xact_0000_0_000000000412FD40 ----- image layer for pg_xact/0000.0 at 0/412FD40 ---- (1) blocks % ./target/debug/dump_layerfile pgbench_layers/rel_1663_14236_1247_0_0_00000000016B914A_000000000412FD40 \| head -n 20 ----- delta layer for 1663/14236/1247.0 0/16B914A-0/412FD40 ---- --- relsizes --- 0/16B914A: 14 0/16CA559: 15 --- page versions --- blk 13 at 0/16BB1D2: rec 8162 bytes will_init: true HEAP INSERT blk 14 at 0/16CA559: rec 8241 bytes will_init: true XLOG FPI blk 14 at 0/16CA637: rec 215 bytes will_init: true HEAP INSERT blk 14 at 0/16DF14F: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16DF3A7: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E0637: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E088F: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E5F9F: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E620F: rec 215 bytes will_init: false HEAP INSERT	2021-09-01 12:20:16 -07:00
Heikki Linnakangas	b949127b06	Rename page_cache.rs to tenant_mgr.rs. Once upon a time, 'page_cache.rs' contained an actual page cache, but it hasn't for a very long time. Rename to reflect what it actually does these days.	2021-08-30 15:17:30 +03:00
Heikki Linnakangas	4046530160	Remove remnants of choosing between repository formats. Now that we only have one Repository implementation, no need for the command-line options to choose it either. I'm removing these as a separate commit to show what we will need to do if we add another Repository implementation in the future (even though I don't foresee us doing that any time soon)	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	5998744bcc	Remove rocksdb implementation. The layered storage format is good enough that we don't need the rocksdb implementation anymore. There are a lot of known issues but we'll keep working on them.	2021-08-25 18:37:22 +03:00
Dmitry Rodionov	f2f02a8af0	apply transformation (Arc<Option> -> Option<Arc>) suggested by @funbringer	2021-08-24 19:05:00 +03:00
Dmitry Rodionov	b135723994	review adjustments	2021-08-24 19:05:00 +03:00
Dmitry Rodionov	23b5249512	translate pageserver api to http	2021-08-24 19:05:00 +03:00
Heikki Linnakangas	2450f82de5	Introduce a new "layered" repository implementation. This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes. This makes the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>". The unit tests have been refactored to exercise both implementations. 'layered' is now the default. Push/pull is not implemented. The 'test_history_inmemory' test has been commented out accordingly. It's not clear how we will implement that functionality; probably by copying the snapshot files directly.	2021-08-16 10:06:48 +03:00
Dmitry Rodionov	ce5333656f	Introduce authentication v0.1. Current state with authentication. Page server validates the JWT token passed as a password during the connection phase, and later, when performing an action such as create branch, the tenant parameter of the operation is validated to match the one submitted in the token. To allow access from console there is a dedicated scope: PageServerApi; this scope allows access to all tenants. See code for access validation in: PageServerHandler::check_permission. Because we are in the process of refactoring the communication layer involving the wal proposer protocol and safekeeper<->pageserver, the safekeeper now doesn’t check the token passed from compute, and uses a “hardcoded” token passed via environment variable to communicate with pageserver. Compute postgres now takes the token from an environment variable and passes it as a password field in the pageserver connection. It is not passed through settings because then the user would be able to retrieve it using pg_settings or SHOW .. I’ve added a basic test in test_auth.py. Probably after we add authentication to the remaining network paths we should enable it by default and switch all existing tests to use it.	2021-08-11 20:05:54 +03:00
Dmitry Ivanov	cb1b4a12a6	Add some prometheus metrics to pageserver The metrics are served by an http endpoint, which is meant to be spawned in a new thread. In the future the endpoint will provide more APIs, but for the time being, we won't bother with proper routing.	2021-08-03 21:42:24 +03:00
Heikki Linnakangas	47824c5fca	Remove page server interactive mode. It was pretty cool, but no one used it, and it had gotten badly out of date. The main interesting thing with it was to see some basic metrics on the fly, while the page server is running, but the metrics collection had been broken for a long time, too. Best to just remove it.	2021-07-23 12:21:21 +03:00
Dmitry Rodionov	767590bbd5	support tenants. This patch adds support for tenants. This touches mostly pageserver. Directory layout on disk is changed to contain a new layer of indirection. Now the path to a particular repository has the following structure: <pageserver workdir>/tenants/<tenant id>. Tenant id has the same format as timeline id. Tenant id is included in pageserver commands when needed. Also new commands are available in pageserver: tenant_list, tenant_create. This is also reflected in the CLI. During init a default tenant is created and its id is saved in CLI config, so following commands can use it without extra options. Tenant id is also included in compute postgres configuration, so it can be passed via ServerInfo to safekeeper and in connection string to pageserver. For more info see docs/multitenancy.md.	2021-07-22 20:54:20 +03:00
anastasia	c913404739	Redirect log to pageserver.log during zenith init. Add new module logger.rs that contains shared code to init logging	2021-07-21 18:56:34 +03:00
sharnoff	c4b2bf7ebd	Use 'zenith_admin' as superuser name in `initdb`	2021-07-21 17:22:22 +03:00
Konstantin Knizhnik	9838c71a47	Explicit compact (#341) * Do not perform compaction of RocksDB storage on each GC iteration * Increase GC timeout to let GC tests pass * Add comment to gc_iteration	2021-07-19 16:49:12 +03:00
Stas Kelvich	2b33894e7b	few more review fixes	2021-07-19 14:52:41 +03:00
Dmitry Rodionov	ed0fcfa9b7	replace parse_duration crate because of unpatched known vulnerability resolves #87	2021-07-16 14:30:27 +03:00
Dmitry Rodionov	75e717fe86	allow both domains and ip addresses in connection options for pageserver and wal keeper. Also updated PageServerNode definition in control plane to account for that. resolves #303	2021-07-09 16:46:21 +03:00


145 Commits