rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-05 20:42:54 +00:00

Author	SHA1	Message	Date
anastasia	01e2e88135	Fixes to checkpoint.nextXid accounting: - Advance next_xid only when incoming xlog record has one. - Advance next_xid by one. Remove XID_CHECKPOINT_INTERVAL.	2021-09-01 12:26:57 +03:00
anastasia	d64c083dad	Add batch_wraparound/test_xid_wraparound.py	2021-09-01 12:26:57 +03:00
Heikki Linnakangas	b45d5368b0	Don't create an image layer for dropped relations. I noticed that the timeline directory contained files like this: pg_xact_0000_0_000000000169C3C2_00000000016BB399 pg_xact_0000_0_00000000016BB399 pg_xact_0000_0_00000000016BB399_00000000016BDD06 pg_xact_0000_0_00000000016BDD06 pg_xact_0000_0_00000000016BDD06_00000000016C63AA pg_xact_0000_0_00000000016C63AA pg_xact_0000_0_00000000016C63AA_0000000001765226_DROPPED pg_xact_0000_0_0000000001765226 pg_xact_0001_0_00000000016BB77E_00000000016BDD06 pg_xact_0001_0_00000000016BDD06 pg_xact_0001_0_00000000016BDD06_0000000001765226_DROPPED pg_xact_0001_0_0000000001765226 Note how there is an image file after each DROPPED file. It's a waste of time and space to materialize an image of the file at the point where it's dropped, no one is going to request pages on a dropped relation. And it's a correctness issue too: list_rels() and list_nonrels() will not consider the relation as unlinked, unless the latest layer indicates so, and there is no concept of a dropped image layer. That was causing test_clog_truncate test to fail, when I adjusted the checkpointer to force a checkpoint more aggressively. There are a bunch more issues related to dropped rels and branching, see https://github.com/zenithdb/zenith/issues/502. Hence this doesn't completely fix the issue I saw with test_clog_truncate either. But it's a start.	2021-09-01 09:42:18 +03:00
Max Sharnoff	625abf3c52	Bump vendor/postgres for walproposer cleanup ref zenithdb/postgres#69	2021-08-31 13:09:16 -07:00
anastasia	c0ace1efff	Bump vendor/postgres to use relsize cache.	2021-08-31 14:10:50 +03:00
Kirill Bulatov	03a09b7827	Replace old git urls with the current ones	2021-08-30 23:51:47 +03:00
Heikki Linnakangas	63d0a865f4	Update and move comment. The comment talked about the WAL redo thread, but commit `6e22a8f709` refactored that away. The problem the comment describes probably still exists, so keep the comment, but update the wording.	2021-08-30 20:35:08 +03:00
Patrick Insinger	5ac4a27042	image_layer - read images directly from disk Avoid slurping entire image files into memory. For blocky segments, we write the bytes directly to a bookfile chapter. The blocks are a fixed size, which allows for random access.	2021-08-30 10:34:36 -07:00
Patrick Insinger	7c7e89e2ea	layered_repo - atomic last/prev record_lsn Make a new type that stores both Lsns. Use an RwLock for thread safety.	2021-08-30 09:40:13 -07:00
Patrick Insinger	561bf2c510	circleci - fix test summary	2021-08-30 09:18:49 -07:00
Patrick Insinger	98f49671c1	delta_layer - read page versions from disk split the page versions into two chapters: PAGE_VERSION_METAS - a rust BTreeMap from (block #, lsn) -> page & WAL byte ranges in PAGE_VERSIONS_CHAPTER PAGE_VERSIONS_CHAPTER - raw page images and serialized WAL records	2021-08-30 09:12:38 -07:00
anastasia	78963ad104	Issue #411 . Support drop database in pageserver. Use put_unlink for FileNodeMap relishes. Always store FileNodeMap as materialized page images (no wal records).	2021-08-30 17:29:29 +03:00
anastasia	27442c3daa	Add test for DROP DATABASE command	2021-08-30 17:29:29 +03:00
anastasia	e29bfa09b2	Fix list_rels and list_nonrels in layeredRepository - return only visible objects	2021-08-30 17:29:29 +03:00
Heikki Linnakangas	b949127b06	Rename page_cache.rs to tenant_mgr.rs. Once upon a time, 'page_cache.rs' contained an actual page cache, but it hasn't for a very long time. Rename to reflect what it actually does these days.	2021-08-30 15:17:30 +03:00
Heikki Linnakangas	a3f3d46016	Misc doc updates	2021-08-30 14:29:21 +03:00
Heikki Linnakangas	c5fc4e6905	Fix instructions in README.md on how to start psql Commit `c4b2bf7ebd` changed the bootstrap superuser name.	2021-08-30 14:29:21 +03:00
Heikki Linnakangas	9dfee8a3b5	Add Gauge for # of layers Seems like a useful metric	2021-08-30 12:58:15 +03:00
sharnoff	263e03f4b8	Improve code & text formatting in proxy welcome Adds some named formatting variables to make things a little more clear. Also adds some words & commas to the message itself.	2021-08-30 12:41:47 +03:00
Heikki Linnakangas	074bd3bb12	Add basic performance test framework. This provides a pytest fixture to record metrics from pytest tests. The The recorded metrics are printed out at the end of the tests. As a starter, this includes on small test, using pgbench. It prints out three metrics: the initialization time, runtime of 5000 xacts, and the repository size after the tests.	2021-08-27 21:00:45 +03:00
Alexey Kondratov	e1d8f97b9e	Mention `pipenv run` as an option to run pytest	2021-08-27 19:46:51 +03:00
Alexey Kondratov	7e7b31a626	Extract basebackup directly from the CopyOutReader Do not fetch it into the intermediate buffer.	2021-08-27 19:46:51 +03:00
Heikki Linnakangas	787806285d	Remove unused 'update_meta' argument. It was used by the object repository code, but now that that's gone, it's dead.	2021-08-27 15:45:45 +03:00
Arseny Sher	7474cfac08	Rename VCL to epochStartLsn and restart_lsn to truncate_lsn. epochStartLsn is the LSN since which new proposer writes its WAL in its epoch, let's be more explicit here. truncate_lsn is LSN still needed by the most lagging safekeeper. restart_lsn is terminology from pg_replicaton_slots, but here we don't really have 'restart'; hopefully truncate word makes it clearer.	2021-08-27 15:22:10 +03:00
Arseny Sher	6cbc08f1fb	bump pg version	2021-08-27 15:22:10 +03:00
Arseny Sher	8d3450f4c6	Basic safekeeper refactoring and bug fixing. 1) Extract consensus logic to safekeeper.rs. 2) Change the voting flow so that acceptor tells his epoch along with giving the vote, not before it; otherwise it might get immediately stale. #294 3) Process messages from compute atomically and sync state properly. #270 4) Use separate structs for disk and network. ref #315	2021-08-27 15:22:10 +03:00
Heikki Linnakangas	4902d1daa8	Store base images in separate ImageLayers Previously, a SnapshotLayer and corresponding file on disk contained the base image of every page in the segment at the start LSN, and all the changes (= WAL records) in the range between start and end LSN. That was a bit awkward, because we had to keep the base image of every page in memory until we had accumulated enough WAL after the base image to write out the layer. When it's time to write out a layer, we would really want to replay the WAL to reconstruct the most recent version of each page, to save the effort later. That's on the assumption that the client will usually request the most recent version, not some older one. Split the SnapshotLayer into two structs: ImageLayer and DeltaLayer. An image layer contains a "snapshot" of the segment at one specific LSN, and no WAL records, whereas a delta layer contains WAL records in a range of LSNs. In order to reconstruct a page version in the delta layer, by performing WAL redo, you also need the previous image layer. So the delta layers are "incremental" against the previous layer. So where previously we would create snapshot files like this: rel_100_200 rel_200_300 rel_300_400 We now create image and delta files like this: rel_100 # image rel_100_200 # delta rel_200 rel_200_300 rel_300 rel_300_400 rel_400 That's more files, but as discussed above, this allows storing more up-to-date page versions on disk, which should reduce the latency of responding to a GetPage request. It also allows more fine-grained garbage collection. In the above example, after the old page version are no longer needed and if the relation is not modified anymore, we only need to keep the latest image file, 'rel_400', and everything else can be removed. Implements https://github.com/zenithdb/zenith/issues/339	2021-08-27 02:35:16 +03:00
Heikki Linnakangas	40c79988a8	Move code to handle snapshot filenames This isn't very useful yet, but the next commit will add more code related to handling the filenames.	2021-08-27 02:35:16 +03:00
Patrick Insinger	d265b4cdd3	waldecoder - check for trailing bytes When we parse the main data in a WAL record, ensure we consume all bytes.	2021-08-26 10:24:33 -07:00
Konstantin Knizhnik	beaa2cd0a2	Handle COPY error	2021-08-26 13:53:10 +03:00
Arseny Sher	c4450907e5	Don't hide exact error of get_timeline. ref #470	2021-08-25 20:46:31 +03:00
Heikki Linnakangas	de9d5e0aa4	Remove unnecessary dependencies. Found by "cargo udeps"	2021-08-25 18:51:15 +03:00
Heikki Linnakangas	4046530160	Remove remnants of choosing between repository formats. Now that we only have one Repository implementation, no need for the command-line options to choose it either. I'm removing these as a separate commit to show what we will need to do if we add another Repository implementation in the future (even though I don't foresee us doing that any time soon)	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	5998744bcc	Remove rocksdb implementation. The layered storage format is good enough that we don't need the rocksdb implementation anymore. There are a lot of known issues but we'll keep working on them.	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	250ae643a8	Remove 'zenith push' feature. Now that the new storage format is based on immutable files, we want to implement push/pull in terms of these immutable files as well. Similarly to how those files will be transferred between S3 and the page server. The implementation we had was fairly tightly coupled with the object repository implementation, but I'm about to remove the object / rocksdb storage format soon. That would leave the current "zenith push" command completely broken. It seemed like a good idea at the time, but in hindsight, it was premature to implement push/pull yet. It's a nice feature and I'd like to see it reimplemented in the future, but in the meanwhile, let's remove the code we had. We can dig the parts of it that might be useful in the future from the git history.	2021-08-25 18:37:22 +03:00
Dmitry Ivanov	3edad463fb	Adjust docker container for console's CI pipeline	2021-08-25 17:28:42 +03:00
Heikki Linnakangas	19fcea99da	If too much memory is being used for in-memory layers, flush oldest one. The old policy was to flush all in-memory layers to disk every 10 seconds. That was a pretty dumb policy, unnecessarily aggressive. This commit changes the policy so that we only flush layers where the oldest WAL record is older than 16 MB from the last valid LSN on the timeline. That's still pretty aggressive, but it's a step in the right direction. We do need a limit on how old the oldest in-memory layer is allowed to be, because that determines how much WAL the safekeepers need to hold onto, and how much WAL we need to reprocess in case of a page server crash. 16 MB is surely still too aggressive for that, but it's easy to change the setting later. To support that, keep all in-memory layers in a binary heap, so that we can easily find the one with the oldest LSN. This tracks and a new LSN value in the metadata file: 'disk_consistent_lsn'. Before, on page server restart we restarted the WAL processing from the 'last_record_lsn' value, but now that we don't flush everything to disk in one go, the 'last_record_lsn' tracked in memory is usually ahead of the last record that's been flushed to disk. Even though we track that oldest LSN now, the crash recovery story isn't really complete. We don't do fsync()s anywhere, and thing will break if a snapshot file isn't complete, as there's no CRC on them. That's not new, and it's a TODO.	2021-08-25 11:20:47 +03:00
Dmitry Rodionov	f2f02a8af0	apply transformation (Arc<Option> -> Option<Arc>) suggested by @funbringer	2021-08-24 19:05:00 +03:00
Dmitry Rodionov	b135723994	review adjustments	2021-08-24 19:05:00 +03:00
Dmitry Rodionov	23b5249512	translate pageserver api to http	2021-08-24 19:05:00 +03:00
Eric Seppanen	41fa02f82b	Replace transmute with serde Upgrade to bindgen 0.59, which has two new abilities: - specify arbitrary #[derive] attributes to attach to generated structs - request explicit padding fields These two features are enough to replace transmute with serde/bincode.	2021-08-24 16:32:37 +03:00
Heikki Linnakangas	81dd4bc41e	Fix decoding XLOG_HEAP_DELETE and XLOG_HEAP_UPDATE records. Because the t_cid field was missing from the XlHeapDelete struct that corresponds to the PostgreSQL xl_heap_delete struct, the check for the XLH_DELETE_ALL_VISIBLE_CLEARED flag did not work correctly. Decoding XlHeapUpdate struct was also missing the t_cid field, but that didn't cause any immediate problems because in that struct, the t_cid field is after all the fields that the page server cares about. But fix that too, as it was an accident waiting to happen. The bug was mostly hidden by the VM page handling in zenith_wallog_page, where it forcibly generates a FPW record whenever a VM page is evicted: else if (forknum == VISIBILITYMAP_FORKNUM && !RecoveryInProgress()) { /* * Always WAL-log vm. * We should never miss clearing visibility map bits. * * TODO Is it too bad for performance? * Hopefully we do not evict actively used vm too often. */ XLogRecPtr recptr; recptr = log_newpage_copy(&reln->smgr_rnode.node, forknum, blocknum, buffer, false); XLogFlush(recptr); lsn = recptr; But that was just hiding the issue: it's still visible if you had a read-only node relying on the data in the page server, or you killed and restarted the primary node, or you started a branch. In the included test case, I used a new branch to expose this. Fixes https://github.com/zenithdb/zenith/issues/461	2021-08-24 15:59:25 +03:00
anastasia	ad8b5c3845	use updated vendor/postgres	2021-08-23 18:19:59 +03:00
Dmitry Rodionov	dcaa2126f1	fix code format after main rebase	2021-08-23 18:01:59 +03:00
Dmitry Rodionov	b29ca232d6	add ability to disable colors, use argparse for arguments	2021-08-23 17:28:45 +03:00
Dmitry Rodionov	8c62b11bd5	adjust for review	2021-08-23 17:28:45 +03:00
Dmitry Rodionov	35b60d509f	Add support for code format checking using rustfmt in optional pre-commit hook and in ci pipeline. Found issues can be fixed automatically via make fmt.	2021-08-23 17:28:45 +03:00
Dmitry Rodionov	d989580c1c	remove small code duplication involving InMemoryLayer::get_seg_size, and remove redundant Option around new snapshot layer in InMemoryLayer::freeze	2021-08-23 13:00:05 +03:00
anastasia	798160544c	Update zenith readmes: - Move source tree overview into separate docs/sourcetree.md and update it. - Add glossary: docs/glossary.md - Add a draft of Architecture overview to main Readme.md	2021-08-23 10:21:10 +03:00
Max Sharnoff	39bb6fb19c	Marginally improve walkeeper error visibility (#440 ) Adds a warning if a postgres query fails, and some additional context to errors generated inside `ReceiveWalConn::run`	2021-08-19 08:46:18 -07:00

1 2 3 4 5 ...

738 Commits