rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-25 09:00:37 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	534fd33e3d	Start GC in page cache initialization rather than during branch creation	2021-08-02 19:59:12 +03:00
Konstantin Knizhnik	6b2e7d9499	Implement background GC for layered repo	2021-08-02 18:18:30 +03:00
Heikki Linnakangas	ef5f3eb514	Fix object repository startup	2021-08-02 16:03:04 +03:00
Heikki Linnakangas	be1386a555	Hold list of all layers in memory. Previously, the LayerMap was only used as a cache to hold the snapshot layers that were loaded into memory. As a result, we often had to scan the filesystem to get list of all the other snapshot files that exist on disk, but hadn't been loaded into memory yet. That was very slow, consuming huge amounts of CPU and causing timeouts in any non-trivial tests. Refactor so that on startup, we scan the directory once and keep the list of layers in memory.	2021-08-02 14:59:26 +03:00
Heikki Linnakangas	aad4d1da85	Fix pg_regress tests. It was getting stuck at CREATE DATABASE. It's because freeze() function did the wrong thing if there were any page versions newer than the 'end_lsn' argument. The possibility for this was mentioned in a FIXME comment earlier. Plus misc comment and error message cleanup	2021-08-01 22:34:54 +03:00
Heikki Linnakangas	fbff37a64c	Merge fixes	2021-08-01 13:44:02 +03:00
Heikki Linnakangas	2b080b49c4	Fixes to rebase this over 'main' and 'relishes' branches. This includes some changes that would've been more neat to have as separate commits, but oh well, this 'layered-repo' branch will need to be squashed before it's committed to 'main' anyway: - introduce a new SnaphotFilename struct in snapshot_layer.rs, to make it more convenient to work with snapshot file filenames. - Fix the code in get_layer_for_read() to get the right layer from disk even if some older layer was in cache. There was a FIXME for this, but it apparently didn't cause trouble before. It started to cause regression failures after the rebase, I think because that scenario arose with the Clog in the test_branch_behind test.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	dd63b81539	Rename some functions and variables to talk about "layers". The "snapshot files" term was a leftover from before the code for the inmemory layer was separated from the on-disk layer code. There are a few places left that deal specifically with snapshot layers, and "snapshot file" still makes sense in those, but replace most instances to "layer".	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	f8e533bbdf	Add a README file to explain how the snapshot files are managed. This explains how and when snapshot files are created, how they're used to find the correct page version, and how garbage collection works. I tried to resist the temptation to write how it should work, and purely document how it currently works in this branch.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	8d0086f749	Expand comment on the policy of when we dump in-memory layers to disk.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	d285898c73	Disable GC test specific to the rocksdb implementation	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	61761bf1ce	Implement unlinking relations in layered storage. If a relation is dropped, the last snapshot file for it is given the _DROPPED suffix. The garbage collection knows that it can remove the file when it's old enough, even if there is no newer file.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	0b2ed17f86	Implement garbage collection. The counters returned by garbage collection, in GcResult, don't make much sense with the snapshot files implementation, so I added new counters. That broke the test_gc test, so I made a copy of it as test_snapfiles_gc based on the new counters. Handling relation drops is still not implemented.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	df8e3e1695	Ignore failing test. CI won't run the python tests if 'cargo test' is failing.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	99f3775d68	Bump up the rust version we use. The 'aversion' crate version 0.2.0 needs Rust 1.52. That was relaxed at https://github.com/ericseppanen/aversion/pull/3, but there's no released crate version with that change yet. Maybe we should bump up the rust version anyway, not sure, but this should fix the immediate problem of compiling this branch on CI.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	8f81ac064e	Use the 'bookfile' crate for the snapshot files.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	3b84975ca9	Checkpoint more often, to generate more snapshot files. This should help make bugs more shallow.	2021-08-01 13:17:18 +03:00
Heikki Linnakangas	df3e403967	Introduce a new "layered" repository implementation. This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes. This makes the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' in the "pageserver init" call, but there is no corresponding --repository-format option in 'zenith init', so in practice you have to change the default in pageserver.rs if you want to test different implementations. The unit tests have been refactored to exercise both implementations, though. 'layered' is now the default. TODOs: - Push/pull is not implemented, causing 'test_history_inmemory' test in 'cargo test' to fail. - Garbage collection has not been implemented yet. The 'test_gc' test is failing because of that. - Unlinking relations has not been implemented either. (That has no user visible effect until garbage collection is implemented)	2021-08-01 13:17:16 +03:00
Heikki Linnakangas	4f7b22a8a8	Refactor ObjectTags, introducing a new concept called "relish". This clarifies - I hope - the abstractions between Repository and ObjectRepository. The ObjectTag struct was a mix of objects that could be accessed directly through the public Timeline interface, and also objects that were created and used internally by the ObjectRepository implementation and not supposed to be accessed directly by the callers. With the RelishTag separate from ObjectTag, the distinction is more clear: RelishTag is used in the public interface, and ObjectTag is used internally between object_repository.rs and object_store.rs, and it contains the internal metadata object types. One awkward thing with the ObjectTag struct was that the Repository implementation had to distinguish between ObjectTags for relations, and track the size of the relation, while others were used to store "blobs". With the RelishTags, some relishes are considered "blocky", and the Repository implementation is expected to track their sizes, while others are stored as blobs. I'm not 100% happy with how RelishTag captures that either: it just knows that some relish kinds are blocky and some non-blocky, and there's an is_block() function to check that. But this does enable size-tracking for SLRUs, allowing us to treat them more like relations. This changes the way SLRUs are stored in the repository. Each SLRU segment, e.g. "pg_clog/0000", "pg_clog/0001", is now handled as a separate relish. This removes the need for the SLRU-specific put_slru_truncate() function in the Timeline trait. SLRU truncation is now handled by calling put_unlink() on the segment. This is more in line with how PostgreSQL stores SLRUs and handles their truncation. The SLRUs are "blocky", so they are accessed one 8k page at a time, and repository tracks their size. I considered an alternative design where we would treat each SLRU segment as non-blocky, and just store the whole file as one blob. Each SLRU segment is up to 256 kB in size, which isn't that large, so that might've worked fine, too. One reason I didn't do that is that it seems better to have the WAL redo routines be as close as possible to the PostgreSQL routines. It doesn't matter much in the repository, though; we have to track the size for relations anyway, so there's not much difference in whether we also do it for SLRUs.	2021-08-01 13:16:16 +03:00
Heikki Linnakangas	3a3e48059c	Handle SLRU ZERO records directly by storing an all-zeros page image. It's simpler than storing the original WAL record.	2021-08-01 13:16:16 +03:00
Heikki Linnakangas	acc0f41985	Don't try to launch duplicate WAL redo thread if tenant already exists. The codepath for tenant_create command first launched the WAL redo thread, and then called branches::create_repo() which checked if the tenant's directory already exists. That's problematic, because launching the WAL redo thread will run initdb if the directory doesn't already exist. Race condition: If the tenant already exists, it will have a WAL redo thread already running, and the old and new WAL redo thread might try to run initdb at the same time, causing all kinds of weird failures. The test_pageserver_api test was failing 100% repeatably on my laptop because of this. I'm not sure why this doesn't occur on the CI: Jul 31 18:05:48.877 INFO running initdb in "./tenants/5227e4eb90894775ac6b8a8c76f24b2e/wal-redo-datadir", location: pageserver::walredo, pageserver/src/walredo.rs:483 thread 'WAL redo thread' panicked at 'initdb failed: The files belonging to this database system will be owned by user "heikki". This user must also own the server process. The database cluster will be initialized with locale "C". The default database encoding has accordingly been set to "SQL_ASCII". The default text search configuration will be set to "english". Data page checksums are disabled. creating directory ./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting default time zone ... Europe/Helsinki creating configuration files ... ok running bootstrap script ... 
stderr: 2021-07-31 15:05:48.875 GMT [282569] LOG: could not open configuration file "/home/heikki/git-sandbox/zenith/test_output/test_tenant_list/repo/./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir/postgresql.conf": No such file or directory 2021-07-31 15:05:48.875 GMT [282569] FATAL: configuration file "/home/heikki/git-sandbox/zenith/test_output/test_tenant_list/repo/./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir/postgresql.conf" contains errors child process exited with exit code 1 initdb: removing data directory "./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir"	2021-07-31 18:13:21 +03:00
Alexey Kondratov	bd7d811921	Add libseccomp-dev as a dep to Dockerfile	2021-07-25 17:46:47 +03:00
anastasia	14b6796915	Send pgdata subdirs with basebackup. Fix for `1e6267a`.	2021-07-25 17:46:47 +03:00
Max Sharnoff	3f4815efa2	Correct `LeSer` doc: "Big Endian" -> "Little Endian" (#362 )	2021-07-23 12:38:37 -07:00
anastasia	ec03848d2f	Fix pageserver.log destination for zenith init. The problem was caused by merge conflict in `767590b`	2021-07-23 16:22:01 +03:00
anastasia	1e6267a35f	Get rid of snapshot directory + related code cleanup and refactoring. - Add new subdir postgres_ffi/samples/ for config file samples. - Don't copy wal to the new branch on zenith init or zenith branch. - Import_timeline_wal on zenith init.	2021-07-23 13:21:45 +03:00
Heikki Linnakangas	47824c5fca	Remove page server interactive mode. It was pretty cool, but no one used it, and it had gotten badly out of date. The main interesting thing with it was to see some basic metrics on the fly, while the page server is running, but the metrics collection had been broken for a long time, too. Best to just remove it.	2021-07-23 12:21:21 +03:00
Dmitry Rodionov	767590bbd5	Support tenants. This patch adds support for tenants. This touches mostly pageserver. The directory layout on disk is changed to contain a new layer of indirection. Now the path to a particular repository has the following structure: <pageserver workdir>/tenants/<tenant id>. Tenant id has the same format as timeline id. Tenant id is included in pageserver commands when needed. Also, new commands are available in pageserver: tenant_list, tenant_create. This is also reflected in the CLI. During init, a default tenant is created and its id is saved in the CLI config, so following commands can use it without extra options. Tenant id is also included in compute postgres configuration, so it can be passed via ServerInfo to safekeeper and in connection string to pageserver. For more info see docs/multitenancy.md.	2021-07-22 20:54:20 +03:00
Stas Kelvich	d210ba5fdb	Update README.md	2021-07-22 20:33:34 +03:00
Dmitry Ivanov	8b656bad5f	Add a missing [cfg(test)]. We don't always need to compile tests.	2021-07-22 16:46:27 +03:00
Dmitry Ivanov	97329d4906	Add a test for EOF in walkeeper's background thread. It would be nice to have a proper Timeline mock api, but this time we'll get by with what we have.	2021-07-22 12:12:55 +03:00
Dmitry Ivanov	6a3b9b1d46	Fix accidental busyloop in walkeeper's background thread. It used to be the case that walkeeper's background thread failed to recognize the end of stream (EOF) signaled by the `Ok(None)` result of `FeMessage::read`.	2021-07-22 12:12:55 +03:00
anastasia	c913404739	Redirect log to pageserver.log during zenith init. Add new module logger.rs that contains shared code to init logging	2021-07-21 18:56:34 +03:00
anastasia	8e42af9b1d	Remove unused 'identify_system' pageserver query	2021-07-21 18:55:41 +03:00
Arseny Sher	fe17188464	Alternative way to truncate behind-the-vcl part of log. Which is important to do before bumping epoch.	2021-07-21 17:27:05 +03:00
Arseny Sher	51b50f5cf5	Fix truncating the wal after VCL.	2021-07-21 17:27:05 +03:00
Arseny Sher	9e3fe2b4d4	Truncate not matching part of log. ref #296	2021-07-21 17:27:05 +03:00
Arseny Sher	eb1618f2ed	TLA+ specification of proposer-acceptor consensus protocol. And .cfg file for running TLC. ref #293	2021-07-21 17:27:05 +03:00
Stas Kelvich	791312824d	set superuser name in python tests too	2021-07-21 17:22:22 +03:00
Stas Kelvich	a17b2a4364	reflect postgres superuser changes in pageserver->compute connstring	2021-07-21 17:22:22 +03:00
sharnoff	c4b2bf7ebd	Use 'zenith_admin' as superuser name in `initdb`	2021-07-21 17:22:22 +03:00
Konstantin Knizhnik	0723d49e0b	Object push (#276 ) * Introducing common enum ObjectVal for all values * Rewrite push mechanism to use raw object copy * Fix history unit test * Add skip_nonrel_objects functions for history unit tests	2021-07-21 00:41:57 +03:00
Eric Seppanen	ad79ca05e9	suppress nullptr warnings on auto-generated bindgen unit tests Hopefully, this will be addressed upstream before too long; see rust-bindgen issue #1651.	2021-07-20 20:12:15 +03:00
Heikki Linnakangas	325dd41277	Remove unused constructor function. This was failing to compile with rustc nightly version, because the datatype of 'fullPageWrites' was changed. See discussion at https://github.com/zenithdb/zenith/issues/207#issuecomment-881478570. But since the function is actually unused, let's just remove it.	2021-07-20 16:01:37 +03:00
sharnoff	7c96c638aa	Fix particular typos: `s/cofig/config/g`	2021-07-20 10:32:59 +03:00
Konstantin Knizhnik	9838c71a47	Explicit compact (#341 ) * Do not perform compaction of RocksDB storage on each GC iteration * Increase GC timeout to let GC tests pass * Add comment to gc_iteration	2021-07-19 16:49:12 +03:00
Stas Kelvich	79d9314ba6	terminate socket explicitly	2021-07-19 14:52:41 +03:00
Stas Kelvich	2b33894e7b	few more review fixes	2021-07-19 14:52:41 +03:00
Stas Kelvich	a118557331	review fixes	2021-07-19 14:52:41 +03:00
Stas Kelvich	8ec234ba78	fix tokio features set for proxy standalone build	2021-07-19 14:52:41 +03:00
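
The busy-loop fix in 6a3b9b1d46 above turns on treating an `Ok(None)` read result as end of stream. A minimal Rust sketch of that pattern follows; it is not the walkeeper code. `Message`, `read_message`, and `drain` are hypothetical names, and the one-byte length prefix is an invented framing used only to keep the example self-contained.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical stand-in for a frontend message; not the real `FeMessage`.
#[derive(Debug, PartialEq)]
struct Message(Vec<u8>);

/// Reads one message framed as a single length byte followed by the body
/// (an assumed framing for illustration). Returns `Ok(None)` on a clean EOF.
fn read_message<R: Read>(stream: &mut R) -> io::Result<Option<Message>> {
    let mut len = [0u8; 1];
    if stream.read(&mut len)? == 0 {
        // Zero bytes read: the peer closed the stream.
        return Ok(None);
    }
    let mut body = vec![0u8; len[0] as usize];
    stream.read_exact(&mut body)?;
    Ok(Some(Message(body)))
}

/// Collects messages until EOF. The loop ends on `Ok(None)`; a loop that
/// ignored `None` and kept calling `read_message` would spin forever.
fn drain<R: Read>(stream: &mut R) -> io::Result<Vec<Message>> {
    let mut received = Vec::new();
    while let Some(msg) = read_message(stream)? {
        received.push(msg);
    }
    Ok(received)
}

fn main() -> io::Result<()> {
    // Two framed messages: "hi" and "!".
    let mut input = Cursor::new(vec![2u8, b'h', b'i', 1, b'!']);
    let messages = drain(&mut input)?;
    println!("received {} messages", messages.len());
    Ok(())
}
```

The `while let` form makes the exit condition part of the loop header, so the EOF case cannot be skipped by accident.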

644 Commits