rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-23 22:29:58 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	55a4cf64a1	Refactor WAL record handling. Introduce the concept of a "ZenithWalRecord", which can be a Postgres WAL record that is replayed with the Postgres WAL redo process, or a built-in type that is handled entirely by pageserver code. Replace the special code to replay Postgres XACT commit/abort records with new Zenith WAL records. A separate zenith WAL record is created for each modified CLOG page. This allows removing the 'main_data_offset' field from stored PostgreSQL WAL records, which saves some memory and some disk space in delta layers. Introduce zenith WAL records for updating bits in the visibility map. Previously, when e.g. a heap insert cleared the VM bit, we duplicated the heap insert WAL record for the affected VM page. That was very wasteful. The heap WAL record could be massive, containing a full page image in the worst case. This addresses github issue #941.	2022-01-04 11:26:37 +02:00
Heikki Linnakangas	c77e30116e	Split waldecoder.rs into two source files. Move the code for decoding a WAL stream into WAL records into 'postgres_ffi', and keep the code to parse the WAL records deeper in 'pageserver' crate, renamed to walrecord.rs. This tidies up the dependencies a bit. 'walkeeper' reuses the same waldecoder routines, and it used to depend on 'pageserver' because of that. Now it only depends on 'postgres_ffi'. (The comment in walkeeper/Cargo.toml that claimed that the dependency was needed for ZTimelineId was obsolete. ZTimelineId is defined in 'zenith_utils', the dependency was actually needed for the waldecoder.)	2021-12-10 15:14:13 +02:00
Arseny Sher	d39608c367	Fix passing start_offset to find_end_of_wal_segment.	2021-12-03 12:43:57 +03:00
Arseny Sher	e7ca8ef5a8	Use PG timelineid 1 everywhere. As changing it doesn't have useful meaning in Zenith. ref #824	2021-11-11 13:53:39 +03:00
Heikki Linnakangas	feae7f39c1	Support read-only nodes Change 'zenith.signal' file to a human-readable format, similar to backup_label. It can contain a "PREV LSN: %X/%X" line, or a special value to indicate that it's OK to start with invalid LSN ('none'), or that it's a read-only node and generating WAL is forbidden ('invalid'). The 'zenith pg create' and 'zenith pg start' commands now take a node name parameter, separate from the branch name. If the node name is not given, it defaults to the branch name, so this doesn't break existing scripts. If you pass "foo@<lsn>" as the branch name, a read-only node anchored at that LSN is created. The anchoring is performed by setting the 'recovery_target_lsn' option in the postgresql.conf file, and putting the server into standby mode with 'standby.signal'. We no longer store the synthetic checkpoint record in the WAL segment. The postgres startup code has been changed to use the copy of the checkpoint record in the pg_control file, when starting in zenith mode.	2021-10-19 09:48:12 +03:00
Heikki Linnakangas	0e026371ec	Optimize WAL decoding slightly. This adds a fast-path for the common case that the record doesn't cross a page boundary. We now split off a new Bytes directly from the original input buffer in that case, instead of copying the record to a new BytesMut. Shaves about 5% of the page server's CPU time on my laptop, in the 'test_bulk_insert' test.	2021-10-14 14:21:23 +03:00
Heikki Linnakangas	934fb8592f	Detect when a checkpoint is modified in a smarter way. Previously, the WAL receiver would make a decoded copy of the current Checkpoint before each WAL record, and compare it with the Checkpoint after the record has been processed. If it has changed, the checkpoint relish is updated in the repository. That's somewhat expensive, the Checkpoint::encode() function is visible in 'perf' profile. Change that so that we set a flag whenever the Checkpoint struct is modified, so that we don't need to compare the whole struct anymore.	2021-10-12 09:09:10 +03:00
Kirill Bulatov	7dda9f2894	Fix clippy lints and enable clippy checking in CI	2021-09-16 15:09:16 +03:00
Max Sharnoff	b11b0bb088	bin_ser: reject trailing bytes by default (#587) Changes `LeSer`/`BeSer::des`. Also adds a new `des_prefix` function to keep a way to allow trailing bytes.	2021-09-15 11:48:19 -07:00
Arseny Sher	a68c23448a	Skip the bootstrap hole in safekeeper's find_end_of_wal. Otherwise restart of safekeeper before the first segment is filled makes it report 0 as flushed LSN. To this end, tweak find_end_of_wal_segment to allow starting from given LSN, not only from the start of the segment. While here, make it less panicky.	2021-09-13 22:46:04 +03:00
Stas Kelvich	ed4eed0a19	Make use of `postgres --sync-safekeepers` in tests and CLI. Change control plane code to call `postgres --sync-safekeepers` before compute node start when safekeepers are enabled. Now `pg create` will create an empty data directory with the proper config file. Subsequent `pg start` will run `sync-safekeepers` and will call basebackup with the resulting LSN. Also change a few tests to accommodate this new behavior.	2021-09-06 13:06:20 +03:00
Konstantin Knizhnik	b227c63edf	Set proper xl_prev in basebackup, when possible. In passing, fix two minor issues with basebackup: * check that we can't create branches with pre-initdb LSNs * normalize branch LSNs that are pointing to the segment boundary patch by @knizhnik closes #506	2021-09-03 14:58:59 +03:00
Dmitry Rodionov	bc709561b6	fix clippy warnings	2021-09-02 18:54:44 +03:00
Kirill Bulatov	0e4cbe0165	Fix some typos	2021-09-02 17:27:18 +03:00
Heikki Linnakangas	d7bebd8074	Add 'dump_layerfile' utility for debugging. Seems handy for getting a quick idea of what's stored in an image or delta layer file. Example output on a file after running pgbench for a while: % ./target/debug/dump_layerfile pgbench_layers/pg_control_checkpoint_0_00000000016B914A ----- image layer for checkpoint.0 at 0/16B914A ---- non-blocky (88 bytes) % ./target/debug/dump_layerfile pgbench_layers/pg_xact_0000_0_000000000412FD40 ----- image layer for pg_xact/0000.0 at 0/412FD40 ---- (1) blocks % ./target/debug/dump_layerfile pgbench_layers/rel_1663_14236_1247_0_0_00000000016B914A_000000000412FD40 \| head -n 20 ----- delta layer for 1663/14236/1247.0 0/16B914A-0/412FD40 ---- --- relsizes --- 0/16B914A: 14 0/16CA559: 15 --- page versions --- blk 13 at 0/16BB1D2: rec 8162 bytes will_init: true HEAP INSERT blk 14 at 0/16CA559: rec 8241 bytes will_init: true XLOG FPI blk 14 at 0/16CA637: rec 215 bytes will_init: true HEAP INSERT blk 14 at 0/16DF14F: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16DF3A7: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E0637: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E088F: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E5F9F: rec 215 bytes will_init: false HEAP INSERT blk 14 at 0/16E620F: rec 215 bytes will_init: false HEAP INSERT	2021-09-01 12:20:16 -07:00
anastasia	8b3a293bb0	Use postgres_ffi bindings instead of custom type definitions. Move several functions to postgres_ffi crate	2021-09-01 16:11:44 +03:00
Dmitry Rodionov	989ab7e883	move several functions which replicate ones from postgresql to postgres_ffi crate	2021-09-01 16:11:44 +03:00
Arseny Sher	8d3450f4c6	Basic safekeeper refactoring and bug fixing. 1) Extract consensus logic to safekeeper.rs. 2) Change the voting flow so that acceptor tells his epoch along with giving the vote, not before it; otherwise it might get immediately stale. #294 3) Process messages from compute atomically and sync state properly. #270 4) Use separate structs for disk and network. ref #315	2021-08-27 15:22:10 +03:00
Patrick Insinger	d265b4cdd3	waldecoder - check for trailing bytes When we parse the main data in a WAL record, ensure we consume all bytes.	2021-08-26 10:24:33 -07:00
Eric Seppanen	41fa02f82b	Replace transmute with serde Upgrade to bindgen 0.59, which has two new abilities: - specify arbitrary #[derive] attributes to attach to generated structs - request explicit padding fields These two features are enough to replace transmute with serde/bincode.	2021-08-24 16:32:37 +03:00
anastasia	1bfade8adc	Issue #330. Use put_unlink for twophase relishes. Follow PostgreSQL logic: remove Twophase files when prepared transaction is committed/aborted. Always store Twophase segments as materialized page images (no wal records).	2021-08-12 14:42:21 +03:00
anastasia	cc877f1980	Add unit test for find_end_of_wal(). Based on a previous attempt to add the same test by @lubennikovaav. Now WAL files are generated by the initdb command.	2021-08-10 12:30:21 +03:00
anastasia	1e6267a35f	Get rid of snapshot directory + related code cleanup and refactoring. - Add new subdir postgres_ffi/samples/ for config file samples. - Don't copy wal to the new branch on zenith init or zenith branch. - Import_timeline_wal on zenith init.	2021-07-23 13:21:45 +03:00
Eric Seppanen	ad79ca05e9	suppress nullptr warnings on auto-generated bindgen unit tests Hopefully, this will be addressed upstream before too long; see rust-bindgen issue #1651.	2021-07-20 20:12:15 +03:00
Heikki Linnakangas	325dd41277	Remove unused constructor function. This was failing to compile with rustc nightly version, because the datatype of 'fullPageWrites' was changed. See discussion at https://github.com/zenithdb/zenith/issues/207#issuecomment-881478570. But since the function is actually unused, let's just remove it.	2021-07-20 16:01:37 +03:00
Konstantin Knizhnik	e74b06d999	Pass prev_record_ptr through zenith.signal file to compute node	2021-07-16 18:43:07 +03:00
Konstantin Knizhnik	f6705b7a7d	Fix TimestampTz type to i64 to be compatible with Postgres	2021-07-16 18:43:07 +03:00
Heikki Linnakangas	46e613f423	Fix typos	2021-07-16 18:43:07 +03:00
Konstantin Knizhnik	3cded20662	Refactoring after Heikki's review	2021-07-16 18:43:07 +03:00
Konstantin Knizhnik	eb0a56eb22	Replay non-relational WAL records on page server	2021-07-16 18:43:07 +03:00
Heikki Linnakangas	befefe8d84	Run 'cargo fmt'. Fixes a few formatting discrepancies that had crept in recently.	2021-07-14 22:03:14 +03:00
Konstantin Knizhnik	ad92b66eed	Fix TimestampTz type to i64 to be compatible with Postgres	2021-07-14 15:55:12 +03:00
Konstantin Knizhnik	3e69c41a47	Add XLOG_HEAP_OPMASK to pg_constants	2021-07-10 10:09:56 +03:00
Heikki Linnakangas	ced338fd20	Handle relation DROPs in page server. Add back code to parse transaction commit and abort records, and in particular the list of dropped relations in them. Add 'put_unlink' function to the Timeline trait and implementation. We had the code to handle dropped relations in the GC code and elsewhere in ObjectRepository already, but there was nothing to create the RelationSizeEntry::Unlink tombstone entries until now. Also add a test to check that GC correctly removes all page versions of a dropped relation. Implements https://github.com/zenithdb/zenith/issues/232, except for the "orphaned" rels. Reviewed-by: Konstantin Knizhnik	2021-06-29 00:27:10 +03:00
Heikki Linnakangas	44c35722d8	Remove a bunch of dead code Some of these were related to handling various WAL records that are not related to any relations, like pg_multixact updates. These should have been removed in the revert commit `6a9c036ac1`, but I missed them. Also, we didn't do anything with commit/abort records. We will start parsing commit/abort records in the next commit, but seems better to add that from a clean slate. Reviewed-by: Konstantin Knizhnik	2021-06-29 00:26:53 +03:00
Heikki Linnakangas	4f1b22a2c8	Use ObjectTag enum instead of special fork number to store metadata objects. Extracted from Konstantin's larger PR: https://github.com/zenithdb/zenith/pull/268	2021-06-22 21:34:31 +03:00
anastasia	0969574d48	Use bindgen for various xlog structures and checkpoint. Implement encode/decode methods for them. Some methods are unused now. This is a preparatory commit for nonrel_wal	2021-06-09 01:00:42 +03:00
anastasia	2b0193e6bf	implement from_bytes for XLogPageHeader structs	2021-06-08 13:08:57 +03:00
anastasia	c31a5e2c8f	move XLogPageHeader structs to xlog_utils	2021-06-08 13:08:57 +03:00
Heikki Linnakangas	434374ebb4	Turn encode/decode into methods Like in PR #208	2021-06-04 23:05:30 +03:00
Heikki Linnakangas	a7ae552851	Use rust memoffset crate to replace C offsetof(). Cherry-picked from Eric's PR #208	2021-06-04 23:05:28 +03:00
Heikki Linnakangas	8b5a061c8e	Add comments on the unsafe use of transmute in encode/decode_pg_control Note the unsafety of the unsafe block, with a link to the ongoing discussion. This doesn't try to solve the problem, but let's at least document the status quo.	2021-06-04 23:05:26 +03:00
Heikki Linnakangas	8147aa7e93	Use u8 slice instead of Bytes in function argument. Bytes is handy, but in decode_pg_control's case it's just complicating things. Also, pass ControlFileData by ref to encode_pg_control().	2021-06-04 23:05:20 +03:00
Heikki Linnakangas	d18cc8a3a8	Update 'postgres_ffi' module's readme file and comments. Explain the purpose of the 'postgres_ffi' module, explain what the PostgreSQL control file is, and some other minor cleanup.	2021-06-04 23:05:11 +03:00
Heikki Linnakangas	762e9859d6	Move functions for reading/writing control file to separate source file. To follow the precedent of xlog_utils.rs and relfile_utils.rs.	2021-06-04 23:05:05 +03:00
Heikki Linnakangas	924261f7db	Remove unused ControlFile::new() constructor. It has never been used, AFAICS.	2021-06-04 23:05:02 +03:00
Heikki Linnakangas	ac60b68d50	Handle VM and FSM truncation WAL records in the page server. Fixes issue #190. Original patch by Konstantin Knizhnik.	2021-05-31 23:36:17 +03:00
Heikki Linnakangas	d5fe515363	Implement "checkpointing" in the page server. - Previously, we checked on first use of a timeline, whether there is a snapshot and WAL for the timeline, and loaded it all into the (rocksdb) repository. That's a waste of effort if we had done that earlier already, and stopped and restarted the server. Track the last LSN that we have loaded into the repository, and only load the recent missing WAL after that. - When you create a new zenith repository with "zenith init", immediately load the initial empty postgres cluster into the rocksdb repository. Previously, we only did that on the first connection. This way, we don't need any "load from filesystem" codepath during normal operation, we can assume that the repository for a timeline is always up to date. (We might still want to use the functionality to import an existing PostgreSQL data directory into the repository in the future, as a separate Import feature, but not today.)	2021-05-24 17:02:05 +03:00
Heikki Linnakangas	6a9c036ac1	Revert all changes related to storing and restoring non-rel data in page server This includes the following commits: `35a1c3d521` Specify right LSN in test_createdb.py `d95e1da742` Fix issue with propagation of CREATE DATABASE to the branch `8465738aa5` [refer #167] Fix handling of pg_filenode.map files in page server `86056abd0e` Fix merge conflict: set initial WAL position to second segment because of pg_resetwal `2bf2dd1d88` Add nonrelfile_utils.rs file `20b6279beb` Fix restoring non-relational data during compute node startup `06f96f9600` Do not transfer WAL to computation nodes: use pg_resetwal for node startup As well as some older changes related to storing CLOG and MultiXact data as "pseudorelation" in the page server. With this revert, we go back to the situation that when you create a new compute node, we ship all the WAL from the beginning of time to the compute node. Obviously we need a better solution, like the code that this reverts. But per discussion with Konstantin and Stas, this stuff was still half-baked, and it's better for it to live in a branch for now, until it's more complete and has gone through some review.	2021-05-24 16:05:45 +03:00
Eric Seppanen	4aabc9a682	easy clippy cleanups Various things that clippy complains about, and are really easy to fix.	2021-05-23 13:17:15 -07:00

83 Commits