Otherwise we can produce holes of corrupted records in the WAL during compute
node restart, in case there was an unfinished record from the old compute,
because these reports advance commit_lsn -- the reliably persisted part of the WAL.
ref #549.
Mostly by @knizhnik. I adjusted it to make sure the proposer always starts
streaming from the beginning of a record, so we don't need special quirks for
decoding in the safekeeper.
There are two main reasons for that:
a) The latest unfinished record may disappear after a compute node restart, so
let's try not to leak the volatile part of the WAL into the repository. Always
use last_valid_record instead.
That change requires different getPage@LSN logic in postgres -- we need to ask
for LSNs that point to some complete record, instead of GetFlushRecPtr(), which
can point into the middle of a record. That was already done by @knizhnik to
deal with the same problem during the work on `postgres --sync-safekeepers`.
Postgres will use LSNs aligned on an 0x8 boundary in get_page requests, so we
also need to be sure that last_valid_record is aligned (see the small
alignment sketch below, after point b).
b) Switch to get_last_record_lsn() in basebackup@no_lsn. When the compute node
is running without safekeepers and streams WAL directly to the pageserver, it
is important that the basebackup LSN matches the LSN at which replication
starts. Before this commit, basebackup@no_lsn waited for last_valid_lsn while
the walreceiver started replication at last_record_lsn, which can be smaller,
so replication failed because the compute node doesn't have the requested WAL.
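To illustrate the alignment point from (a): a minimal sketch, assuming a plain
u64 LSN (the helper name is made up, not from the tree):

    fn align_lsn_down_to_8(lsn: u64) -> u64 {
        // Round down to the nearest 8-byte (MAXALIGN) boundary, so the value
        // matches the 0x8-aligned LSNs that postgres uses in get_page requests.
        lsn & !0x7
    }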
Once upon a time, 'page_cache.rs' contained an actual page cache, but
it hasn't for a very long time. Rename to reflect what it actually does
these days.
Because the t_cid field was missing from the XlHeapDelete struct that
corresponds to the PostgreSQL xl_heap_delete struct, the check for the
XLH_DELETE_ALL_VISIBLE_CLEARED flag did not work correctly.
Decoding of the XlHeapUpdate struct was also missing the t_cid field, but that
didn't cause any immediate problems because in that struct, the t_cid field
comes after all the fields that the page server cares about. But fix that too,
as it was an accident waiting to happen.
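To show why the missing field matters, here is an illustrative decode sketch.
The field names, order and widths below are assumptions for the sketch, not
copied from the real structs; the point is that skipping t_cid shifts every
later field, so the byte read as flags actually belongs to t_cid and the
XLH_DELETE_ALL_VISIBLE_CLEARED check tests garbage:

    struct XlHeapDelete {
        xmax: u32,          // xmax of the deleted tuple
        offnum: u16,        // deleted tuple's offset on the page
        t_cid: u32,         // the field the decoder used to skip
        infobits_set: u8,
        flags: u8,
    }

    fn decode_xl_heap_delete(buf: &[u8]) -> XlHeapDelete {
        XlHeapDelete {
            xmax: u32::from_le_bytes(buf[0..4].try_into().unwrap()),
            offnum: u16::from_le_bytes(buf[4..6].try_into().unwrap()),
            t_cid: u32::from_le_bytes(buf[6..10].try_into().unwrap()),
            infobits_set: buf[10],
            flags: buf[11],
        }
    }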
The bug was mostly hidden by the VM page handling in zenith_wallog_page,
where it forcibly generates a FPW record whenever a VM page is evicted:
    else if (forknum == VISIBILITYMAP_FORKNUM && !RecoveryInProgress())
    {
        /*
         * Always WAL-log vm.
         * We should never miss clearing visibility map bits.
         *
         * TODO Is it too bad for performance?
         * Hopefully we do not evict actively used vm too often.
         */
        XLogRecPtr recptr;
        recptr = log_newpage_copy(&reln->smgr_rnode.node, forknum, blocknum, buffer, false);
        XLogFlush(recptr);
        lsn = recptr;
But that was just hiding the issue: it's still visible if you had a
read-only node relying on the data in the page server, or you killed and
restarted the primary node, or you started a branch. In the included test
case, I used a new branch to expose this.
Fixes https://github.com/zenithdb/zenith/issues/461
This clarifies - I hope - the abstractions between Repository and
ObjectRepository. The ObjectTag struct was a mix of objects that could
be accessed directly through the public Timeline interface, and also
objects that were created and used internally by the ObjectRepository
implementation and not supposed to be accessed directly by the
callers. With the RelishTag separate from ObjectTag, the distinction
is more clear: RelishTag is used in the public interface, and
ObjectTag is used internally between object_repository.rs and
object_store.rs, and it contains the internal metadata object types.
One awkward thing with the ObjectTag struct was that the Repository
implementation had to single out ObjectTags for relations and track the
relation's size, while the other ObjectTags were used to store
"blobs". With the RelishTags, some relishes are considered
"non-blocky", and the Repository implementation is expected to track
their sizes, while others are stored as blobs. I'm not 100% happy with
how RelishTag captures that either: it just knows that some relish
kinds are blocky and some non-blocky, and there's an is_block()
function to check that. But this does enable size-tracking for SLRUs,
allowing us to treat them more like relations.
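As a rough sketch of the resulting shape (variant names and fields here are
illustrative, not the exact definitions in the tree):

    enum SlruKind { Clog, MultiXactMembers, MultiXactOffsets }

    enum RelishTag {
        // blocky: accessed one 8k page at a time, size tracked by the repository
        Relation { spcnode: u32, dbnode: u32, relnode: u32, forknum: u8 },
        Slru { kind: SlruKind, segno: u32 },
        // non-blocky: stored and fetched as one blob
        FileNodeMap { spcnode: u32, dbnode: u32 },
        TwoPhaseFile { xid: u32 },
    }

    impl RelishTag {
        fn is_block(&self) -> bool {
            matches!(self, RelishTag::Relation { .. } | RelishTag::Slru { .. })
        }
    }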
This changes the way SLRUs are stored in the repository. Each SLRU segment,
e.g. "pg_clog/0000" or "pg_clog/0001", is now handled as a separate relish.
This removes the need for the SLRU-specific put_slru_truncate() function in
the Timeline trait. SLRU truncation is now handled by calling put_unlink() on
the segment. This is more in line with how PostgreSQL stores SLRUs and handles
their truncation.
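For reference, the segment math this relies on, with constants matching
PostgreSQL's slru.c (helper names invented): each segment holds 32 pages of
8 kB, i.e. 256 kB, and is named by its segment number in hex.

    const SLRU_PAGES_PER_SEGMENT: u32 = 32; // 32 pages x 8 kB = 256 kB per segment

    fn clog_segment_name(segno: u32) -> String {
        format!("pg_clog/{:04X}", segno) // e.g. "pg_clog/0000", "pg_clog/0001"
    }

    fn segment_containing_page(pageno: u32) -> u32 {
        pageno / SLRU_PAGES_PER_SEGMENT
    }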
The SLRUs are "blocky", so they are accessed one 8k page at a time, and the
repository tracks their size. I considered an alternative design
where we would treat each SLRU segment as non-blocky, and just store
the whole file as one blob. Each SLRU segment is up to 256 kB in size,
which isn't that large, so that might've worked fine, too. One reason
I didn't do that is that it seems better to have the WAL redo
routines be as close as possible to the PostgreSQL routines. It
doesn't matter much in the repository, though; we have to track the
size for relations anyway, so there's not much difference in whether
we also do it for SLRUs.
While working on this, I noticed that the CLOG and MultiXact redo code
did not handle wraparound correctly. We need to fix that, but for now,
I just commented them out with a FIXME comment.
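For context, "handling wraparound" here means the comparisons have to be
circular, in the spirit of PostgreSQL's TransactionIdPrecedes, rather than a
plain less-than; roughly:

    // Circular comparison of 32-bit transaction ids: the signed difference
    // decides the order, so it stays correct across XID wraparound.
    fn xid_precedes(a: u32, b: u32) -> bool {
        (a.wrapping_sub(b) as i32) < 0
    }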
Add back code to parse transaction commit and abort records, and in
particular the list of dropped relations in them. Add 'put_unlink'
function to the Timeline trait and implementation. We had the code to
handle dropped relations in the GC code and elsewhere in ObjectRepository
already, but there was nothing to create the RelationSizeEntry::Unlink
tombstone entries until now. Also add a test to check that GC correctly
removes all page versions of a dropped relation.
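The GC rule the new test exercises is roughly the following (a conceptual
sketch with invented names, not the actual GC code):

    // Once a relish has an unlink tombstone, and that tombstone is older than
    // the GC horizon, every page version of the relish can be dropped.
    fn can_drop_all_versions(unlinked_at_lsn: Option<u64>, gc_horizon_lsn: u64) -> bool {
        matches!(unlinked_at_lsn, Some(lsn) if lsn < gc_horizon_lsn)
    }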
Implements https://github.com/zenithdb/zenith/issues/232, except for the
"orphaned" rels.
Reviewed-by: Konstantin Knizhnik
Some of these were related to handling various WAL records that are not
related to any relations, like pg_multixact updates. These should have
been removed in the revert commit 6a9c036ac1, but I missed them.
Also, we didn't do anything with commit/abort records. We will start parsing
commit/abort records in the next commit, but it seems better to add that from
a clean slate.
Reviewed-by: Konstantin Knizhnik
This includes the following commits:
35a1c3d521 Specify right LSN in test_createdb.py
d95e1da742 Fix issue with propagation of CREATE DATABASE to the branch
8465738aa5 [refer #167] Fix handling of pg_filenode.map files in page server
86056abd0e Fix merge conflict: set initial WAL position to second segment because of pg_resetwal
2bf2dd1d88 Add nonrelfile_utils.rs file
20b6279beb Fix restoring non-relational data during compute node startup
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup
As well as some older changes related to storing CLOG and MultiXact data as
"pseudorelation" in the page server.
With this revert, we go back to the situation that when you create a
new compute node, we ship *all* the WAL from the beginning of time to
the compute node. Obviously we need a better solution, like the code
that this reverts. But per discussion with Konstantin and Stas, this
stuff was still half-baked, and it's better for it to live in a branch
for now, until it's more complete and has gone through some review.
- remove needless return
- remove needless format!
- remove a few more needless clone()
- from_str_radix(_, 10) -> .parse()
- remove needless reference
- remove needless `mut`
Also manually replaced a match statement with map_err() because after
clippy was done with it, there was almost nothing left in the match
expression.
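As a generic illustration of that last rewrite (function and error types are
invented), a match that only rewraps the error:

    fn read_len_verbose(path: &str) -> Result<usize, String> {
        match std::fs::read(path) {
            Ok(bytes) => Ok(bytes.len()),
            Err(e) => Err(e.to_string()),
        }
    }

collapses into a short method chain:

    fn read_len_short(path: &str) -> Result<usize, String> {
        std::fs::read(path).map(|b| b.len()).map_err(|e| e.to_string())
    }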