rust/neon - neon - Gitea: Git with a cup of tea


mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 22:12:56 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	45f641cabb	Handle last "open" layer specially in LayerMap. There can be only one "open" layer for each segment: the last one, implemented by InMemoryLayer, which is the only layer that new records can be appended to. Much of the code needed to distinguish between the last open layer and other layers anyway, so make the distinction explicit in LayerMap.	2021-08-17 18:54:51 +03:00
Heikki Linnakangas	48f4a7b886	Refactor get_page_at_lsn() logic to layered_repository.rs. There was a lot of duplicated code between the get_page_at_lsn() implementations in InMemoryLayer and SnapshotLayer. Move the code for requesting WAL redo from the Layer trait into LayeredTimeline. The get-function in Layer now just returns the WAL records and base image to the caller, and the caller is responsible for performing the WAL redo on them.	2021-08-17 18:54:48 +03:00
Heikki Linnakangas	91f72fabc9	Work with smaller segments. Split each relish into fixed-size 10 MB segments. Separate layers are created for each segment. This reduces write amplification if you have a large relation and update only parts of it; the downside is that you have a lot more files. The 10 MB is just a guess; we should do some modeling and testing in the future to figure out the optimal size. Each segment tracks its own size separately. To figure out the total size of a relish, you need to loop through the segments to find the highest segment that's in use. That's a bit inefficient, but will do for now. We might want to add a cache or something later.	2021-08-17 18:54:41 +03:00
anastasia	cbeb67067c	Issue #367. Change the CLI so that 'pg start' always creates the node from scratch. This operation preserves the previously existing config. Add a new flag '--config-only' to 'pg create'. If this flag is passed, don't perform basebackup; just fill in the initial postgresql.conf for the node.	2021-08-17 18:12:31 +03:00
anastasia	921ec390bc	cargo fmt	2021-08-16 19:41:07 +03:00
Heikki Linnakangas	f37cb21305	Update Cargo.lock for addition of 'bincode'. Commit `5eb1738e8b` added a dependency on the 'bincode' crate. 'cargo build' adds it to Cargo.lock automatically, so let's remember it.	2021-08-16 19:24:26 +03:00
Heikki Linnakangas	7ee8de3725	Add metrics to WAL redo. Track the time spent replaying WAL records in the special Postgres process, the time spent waiting for access to the Postgres process (since there is only one per tenant), and the number of records replayed.	2021-08-16 15:49:17 +03:00
Heikki Linnakangas	047a05efb2	Minor formatting and comment fixes.	2021-08-16 15:48:59 +03:00
Dmitry Rodionov	0c4ab80eac	Try to be more intelligent in WalAcceptor.start; add a bunch of typing sugar to WAL acceptor fixtures	2021-08-16 14:27:44 +03:00
Heikki Linnakangas	2450f82de5	Introduce a new "layered" repository implementation. This replaces the RocksDB-based implementation with an approach using "snapshot files" on disk and in-memory btreemaps to hold the recent changes. This makes the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>". The unit tests have been refactored to exercise both implementations. 'layered' is now the default. Push/pull is not implemented. The 'test_history_inmemory' test has been commented out accordingly. It's not clear how we will implement that functionality; probably by copying the snapshot files directly.	2021-08-16 10:06:48 +03:00
Max Sharnoff	5eb1738e8b	Rework walkeeper protocol to use libpq (#366). Most of the work here was done on the postgres side. There's more information in the commit message there (see: `04cfa326a5`). On the WAL acceptor side, we're now expecting 'START_WAL_PUSH' to initialize the WAL keeper protocol. Everything else is mostly the same, the only real difference being that protocol messages are now discrete CopyData messages sent over the postgres protocol. For the sake of documentation, the full set of these messages is: <- recv: START_WAL_PUSH query <- recv: server info from postgres (type `ServerInfo`) -> send: walkeeper info (type `SafeKeeperInfo`) <- recv: vote info (type `RequestVote`) if node id mismatch: -> send: self node id (type `NodeId`); exit -> send: confirm vote (with node id) (type `NodeId`) loop: <- recv: info and maybe WAL block (type `SafeKeeperRequest` + bytes) (break loop if done) -> send: confirm receipt (type `SafeKeeperResponse`)	2021-08-13 11:25:16 -07:00
Heikki Linnakangas	6e22a8f709	Refactor WAL redo to not use a separate thread. My main motivation is to make it easier to attribute time spent in WAL redo to the request that needed the WAL redo. With this patch, the WAL redo is performed by the requester thread, so it shows up in stack traces and in the 'perf' report as part of the requester's call stack. This is also slightly simpler (fewer lines of code) and should be a bit faster too.	2021-08-13 17:23:36 +03:00
Heikki Linnakangas	f8de71eab0	Update vendor/postgres to fix race condition leading to CRC errors. Fixes https://github.com/zenithdb/zenith/issues/413	2021-08-13 14:02:26 +03:00
Heikki Linnakangas	8517d9696d	Move gc_iteration() function to Repository trait. The upcoming layered storage implementation handles GC as a repository-wide operation because it needs to pay attention to the branch points of all timelines.	2021-08-12 23:46:01 +03:00
Heikki Linnakangas	97f9021c88	Fix JWT token encoding issue in test. On my laptop, the server was receiving the token as a string with extra b'...' escaping, e.g. as "b'eyJ0....0ifQA'" instead of just "eyJ0....0ifQA". That was causing the test to fail. I'm using Python 3.9, while the CI is using Python 3.8. I suspect that's why. My version of pyjwt might be different too. See also https://github.com/jpadilla/pyjwt/issues/391.	2021-08-12 20:46:14 +03:00
Heikki Linnakangas	0a92b31496	If a pg_regress test fails in CI, save regression.diffs	2021-08-12 18:39:23 +03:00
anastasia	6c3726913f	Introduce check for physical relishes. They represent files and use RelationSizeEntry to track existing and dropped files. They can be either blocky or non-blocky. The get_relish_size() and get_rel_exists() functions work with all physical relishes, not only with blocky ones.	2021-08-12 14:42:21 +03:00
anastasia	1bfade8adc	Issue #330. Use put_unlink for twophase relishes. Follow PostgreSQL logic: remove Twophase files when a prepared transaction is committed or aborted. Always store Twophase segments as materialized page images (no WAL records).	2021-08-12 14:42:21 +03:00
anastasia	4eebe22fbb	cargo fmt	2021-08-12 14:42:21 +03:00
Heikki Linnakangas	20d5e757ca	Remove now-unused get_next_tag function. The only caller was removed by commit `c99a211b01`.	2021-08-11 22:16:38 +03:00
Heikki Linnakangas	70cb399d59	Add convenience function to create a RowDescriptor message for an int8 col. Makes the code to construct a result set a bit more terse and readable.	2021-08-11 20:17:33 +03:00
Dmitry Rodionov	ce5333656f	Introduce authentication v0.1. Current state of authentication: the page server validates the JWT token passed as a password during the connection phase, and later, when performing an action such as creating a branch, the tenant parameter of the operation is validated to match the one submitted in the token. To allow access from the console there is a dedicated scope, PageServerApi, which allows access to all tenants. See the access validation code in PageServerHandler::check_permission. Because we are in the middle of refactoring the communication layer involving the WAL proposer protocol and safekeeper<->pageserver communication, the safekeeper currently doesn’t check the token passed from compute, and uses a “hardcoded” token passed via an environment variable to communicate with the pageserver. Compute postgres now takes the token from an environment variable and passes it as the password field in the pageserver connection. It is not passed through settings because then the user would be able to retrieve it using pg_settings or SHOW. I’ve added a basic test in test_auth.py. After we add authentication to the remaining network paths, we should probably enable it by default and switch all existing tests to use it.	2021-08-11 20:05:54 +03:00
Arseny Sher	5f0fd093d7	Revert "Walkeeper safe info (#408)". Temporarily revert commit `0ee2e16b17`, as it leads to a safekeeper state deserialization failure. Let's sort that out and get it back.	2021-08-11 16:26:35 +03:00
Konstantin Knizhnik	0ee2e16b17	Walkeeper safe info (#408) * Align prev record CRC on 8-byte boundary * Update safekeeper in-memory status on receiving message from WAL proposer	2021-08-11 09:14:05 +03:00
Konstantin Knizhnik	b607f0fd8e	Align prev record CRC on 8-byte boundary (#407)	2021-08-11 08:56:37 +03:00
anastasia	c99a211b01	Fix CLOG truncate handling in case of wraparound.	2021-08-11 05:49:24 +03:00
anastasia	949ac54401	Add test of clog (pg_xact) truncation	2021-08-11 05:49:24 +03:00
anastasia	e406811375	Fixes for handling SLRU relishes: replace get_tx_status() with self.get_tx_is_in_progress() to handle xacts in truncated SLRU segments correctly	2021-08-11 05:49:24 +03:00
anastasia	590ace104a	Fixes for handling SLRU relishes: don't return ZERO_PAGE from get_page_at_lsn_nowait() for truncated SLRU segments.	2021-08-11 05:49:24 +03:00
anastasia	e475f82ff1	Rename get_rel_size() to get_relish_size(). Don't bail if the relish is not found; just return None and let the caller decide how to handle this.	2021-08-11 05:49:24 +03:00
anastasia	a368642790	cargo fmt	2021-08-10 14:26:52 +03:00
anastasia	8c7983797b	Remove unused SLRUTruncate ObjectValue	2021-08-10 14:26:32 +03:00
anastasia	5dd9a66f9e	Move postgres backend messages to trace level	2021-08-10 14:26:28 +03:00
anastasia	cc877f1980	Add unit test for find_end_of_wal(). Based on a previous attempt to add the same test by @lubennikovaav. Now the WAL files are generated by the initdb command.	2021-08-10 12:30:21 +03:00
anastasia	a5d57ca10b	list_nonrels() returns elements in arbitrary order. Remove incorrect comments that say otherwise.	2021-08-06 15:23:46 +03:00
Konstantin Knizhnik	3ca3394170	[refer #395] Check WAL record CRC in waldecoder (#396)	2021-08-05 16:57:57 +03:00
Heikki Linnakangas	e59e0ae2dc	Clarify the terms "WAL service", "safekeeper", "proposer"	2021-08-05 10:27:56 +03:00
Stas Kelvich	ec07acfb12	fix typo in run_initdb()	2021-08-04 23:57:17 +03:00
Stas Kelvich	fa04096733	cargo fmt pass	2021-08-04 23:51:02 +03:00
Dmitry Ivanov	754892402c	Enable full feature set for hyper in zenith_utils Server functionality requires not only the "server" feature flag, but also either "http1" or "http2" (or both). To make things simpler (and prevent analogous problems), enable all features.	2021-08-04 21:41:17 +03:00
Stas Kelvich	02b9be488b	Disable GC test. Current GC test is flaky and overly strict. Since we are migrating to the layered repo format with different GC implementation let's just silence this test for now.	2021-08-04 18:33:33 +03:00
Arseny Sher	cc3ac2b74c	Allow safekeeper to stream till real end of wal. Otherwise it prematurely terminates, e.g. in test_compute_restart. ref #388	2021-08-04 18:03:43 +03:00
Arseny Sher	1dc2ae6968	Point vendor/postgres to main.	2021-08-04 14:21:01 +03:00
Stas Kelvich	04ae63a5c4	use proper postgres version	2021-08-04 14:15:07 +03:00
Arseny Sher	b77fade7b8	Look up wal directory properly in all find_end_of_wal callers. ref #388	2021-08-04 14:15:07 +03:00
Stas Kelvich	56565c0f58	look up WAL in right directory	2021-08-04 14:15:07 +03:00
Dmitry Ivanov	ed634ec320	Extract message processing function from PostgresBackend's event loop This patch has been extracted from #348, where it became unnecessary after we had decided that we didn't want to measure anything inside PostgresBackend. IMO the change is good enough to make its way into the codebase, even though it brings nothing "new" to the code.	2021-08-04 10:49:02 +03:00
Alexey Kondratov	bcaa59c0b9	Test compute restart with AND without safekeepers	2021-08-04 00:05:19 +03:00
Dmitry Ivanov	cb1b4a12a6	Add some prometheus metrics to pageserver The metrics are served by an http endpoint, which is meant to be spawned in a new thread. In the future the endpoint will provide more APIs, but for the time being, we won't bother with proper routing.	2021-08-03 21:42:24 +03:00
Heikki Linnakangas	9ff122835f	Refactor ObjectTags, introducing a new concept called "relish". This clarifies, I hope, the abstractions between Repository and ObjectRepository. The ObjectTag struct was a mix of objects that could be accessed directly through the public Timeline interface, and objects that were created and used internally by the ObjectRepository implementation and were not supposed to be accessed directly by the callers. With RelishTag separate from ObjectTag, the distinction is clearer: RelishTag is used in the public interface, and ObjectTag is used internally between object_repository.rs and object_store.rs, and it contains the internal metadata object types. One awkward thing with the ObjectTag struct was that the Repository implementation had to distinguish between ObjectTags for relations, and track the size of the relation, while others were used to store "blobs". With the RelishTags, some relishes are considered "blocky", and the Repository implementation is expected to track their sizes, while others are stored as blobs. I'm not 100% happy with how RelishTag captures that either: it just knows that some relish kinds are blocky and some non-blocky, and there's an is_block() function to check that. But this does enable size-tracking for SLRUs, allowing us to treat them more like relations. This changes the way SLRUs are stored in the repository. Each SLRU segment, e.g. "pg_clog/0000" or "pg_clog/0001", is now handled as a separate relish. This removes the need for the SLRU-specific put_slru_truncate() function in the Timeline trait. SLRU truncation is now handled by calling put_unlink() on the segment. This is more in line with how PostgreSQL stores SLRUs and handles their truncation. The SLRUs are "blocky", so they are accessed one 8k page at a time, and the repository tracks their size. I considered an alternative design where we would treat each SLRU segment as non-blocky, and just store the whole file as one blob. Each SLRU segment is up to 256 kB in size, which isn't that large, so that might've worked fine, too. One reason I didn't do that is that it seems better to have the WAL redo routines be as close as possible to the PostgreSQL routines. It doesn't matter much in the repository, though; we have to track the size for relations anyway, so there's not much difference in whether we also do it for SLRUs. While working on this, I noticed that the CLOG and MultiXact redo code did not handle wraparound correctly. We need to fix that, but for now, I just commented them out with a FIXME comment.	2021-08-03 14:01:05 +03:00
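Commit 91f72fabc9 splits each relish into fixed-size 10 MB segments. Each segment tracks its own size, and the total relish size is found by looking for the highest segment in use. The Rust sketch below shows that arithmetic. It is only a sketch under stated assumptions: 8 KB blocks, "10 MB" read as 10 MiB (which gives 1280 blocks per segment), and every segment below the last one in use being full. The names `RELISH_SEG_SIZE`, `block_to_segment` and `relish_size` are hypothetical and are not the repository's actual API. The `segment_sizes` slice is a stand-in for the per-segment size tracking.

```rust
/// Assumed page size: 8 KB, as for PostgreSQL relations.
const BLCKSZ: u32 = 8192;
/// Hypothetical segment size in blocks: 10 MiB / 8 KB = 1280 blocks.
const RELISH_SEG_SIZE: u32 = 10 * 1024 * 1024 / BLCKSZ;

/// Map an absolute block number to (segment number, block offset within that segment).
fn block_to_segment(blknum: u32) -> (u32, u32) {
    (blknum / RELISH_SEG_SIZE, blknum % RELISH_SEG_SIZE)
}

/// Compute a relish's total size in blocks from hypothetical per-segment sizes.
/// `segment_sizes[i]` is the size of segment i in blocks, or None if that
/// segment does not exist. Assumes every segment before the highest one in use is full.
fn relish_size(segment_sizes: &[Option<u32>]) -> Option<u32> {
    // Scan from the highest segment number downwards and stop at the first one in use.
    let (segno, last_size) = segment_sizes
        .iter()
        .copied()
        .enumerate()
        .rev()
        .find_map(|(i, size)| size.map(|sz| (i as u32, sz)))?;
    // Full lower segments plus the partial size of the last segment in use.
    Some(segno * RELISH_SEG_SIZE + last_size)
}

fn main() {
    assert_eq!(RELISH_SEG_SIZE, 1280);
    assert_eq!(block_to_segment(1281), (1, 1));
    // Two full segments plus 17 blocks in the third one.
    let sizes = [Some(1280), Some(1280), Some(17)];
    println!("{:?}", relish_size(&sizes)); // prints "Some(2577)"
}
```

This linear scan matches the inefficiency the commit message admits ("will do for now"). A cache of the highest segment number, as the message suggests, would make the lookup constant-time.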


675 Commits