rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 13:10:37 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	40c79988a8	Move code to handle snapshot filenames This isn't very useful yet, but the next commit will add more code related to handling the filenames.	2021-08-27 02:35:16 +03:00
Patrick Insinger	d265b4cdd3	waldecoder - check for trailing bytes When we parse the main data in a WAL record, ensure we consume all bytes.	2021-08-26 10:24:33 -07:00
Arseny Sher	c4450907e5	Don't hide exact error of get_timeline. ref #470	2021-08-25 20:46:31 +03:00
Heikki Linnakangas	4046530160	Remove remnants of choosing between repository formats. Now that we only have one Repository implementation, no need for the command-line options to choose it either. I'm removing these as a separate commit to show what we will need to do if we add another Repository implementation in the future (even though I don't foresee us doing that any time soon)	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	5998744bcc	Remove rocksdb implementation. The layered storage format is good enough that we don't need the rocksdb implementation anymore. There are a lot of known issues but we'll keep working on them.	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	250ae643a8	Remove 'zenith push' feature. Now that the new storage format is based on immutable files, we want to implement push/pull in terms of these immutable files as well. Similarly to how those files will be transferred between S3 and the page server. The implementation we had was fairly tightly coupled with the object repository implementation, but I'm about to remove the object / rocksdb storage format soon. That would leave the current "zenith push" command completely broken. It seemed like a good idea at the time, but in hindsight, it was premature to implement push/pull yet. It's a nice feature and I'd like to see it reimplemented in the future, but in the meanwhile, let's remove the code we had. We can dig the parts of it that might be useful in the future from the git history.	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	19fcea99da	If too much memory is being used for in-memory layers, flush oldest one. The old policy was to flush all in-memory layers to disk every 10 seconds. That was a pretty dumb policy, unnecessarily aggressive. This commit changes the policy so that we only flush layers where the oldest WAL record is older than 16 MB from the last valid LSN on the timeline. That's still pretty aggressive, but it's a step in the right direction. We do need a limit on how old the oldest in-memory layer is allowed to be, because that determines how much WAL the safekeepers need to hold onto, and how much WAL we need to reprocess in case of a page server crash. 16 MB is surely still too aggressive for that, but it's easy to change the setting later. To support that, keep all in-memory layers in a binary heap, so that we can easily find the one with the oldest LSN. This tracks a new LSN value in the metadata file: 'disk_consistent_lsn'. Before, on page server restart we restarted the WAL processing from the 'last_record_lsn' value, but now that we don't flush everything to disk in one go, the 'last_record_lsn' tracked in memory is usually ahead of the last record that's been flushed to disk. Even though we track that oldest LSN now, the crash recovery story isn't really complete. We don't do fsync()s anywhere, and things will break if a snapshot file isn't complete, as there's no CRC on them. That's not new, and it's a TODO.	2021-08-25 11:20:47 +03:00
Dmitry Rodionov	f2f02a8af0	apply transformation (Arc<Option> -> Option<Arc>) suggested by @funbringer	2021-08-24 19:05:00 +03:00
Dmitry Rodionov	b135723994	review adjustments	2021-08-24 19:05:00 +03:00
Dmitry Rodionov	23b5249512	translate pageserver api to http	2021-08-24 19:05:00 +03:00
Heikki Linnakangas	81dd4bc41e	Fix decoding XLOG_HEAP_DELETE and XLOG_HEAP_UPDATE records. Because the t_cid field was missing from the XlHeapDelete struct that corresponds to the PostgreSQL xl_heap_delete struct, the check for the XLH_DELETE_ALL_VISIBLE_CLEARED flag did not work correctly. Decoding XlHeapUpdate struct was also missing the t_cid field, but that didn't cause any immediate problems because in that struct, the t_cid field is after all the fields that the page server cares about. But fix that too, as it was an accident waiting to happen. The bug was mostly hidden by the VM page handling in zenith_wallog_page, where it forcibly generates a FPW record whenever a VM page is evicted: else if (forknum == VISIBILITYMAP_FORKNUM && !RecoveryInProgress()) { /* * Always WAL-log vm. * We should never miss clearing visibility map bits. * * TODO Is it too bad for performance? * Hopefully we do not evict actively used vm too often. */ XLogRecPtr recptr; recptr = log_newpage_copy(&reln->smgr_rnode.node, forknum, blocknum, buffer, false); XLogFlush(recptr); lsn = recptr; But that was just hiding the issue: it's still visible if you had a read-only node relying on the data in the page server, or you killed and restarted the primary node, or you started a branch. In the included test case, I used a new branch to expose this. Fixes https://github.com/zenithdb/zenith/issues/461	2021-08-24 15:59:25 +03:00
Dmitry Rodionov	dcaa2126f1	fix code format after main rebase	2021-08-23 18:01:59 +03:00
Dmitry Rodionov	d989580c1c	remove small code duplication involving InMemoryLayer::get_seg_size, and remove redundant Option around new snapshot layer in InMemoryLayer::freeze	2021-08-23 13:00:05 +03:00
anastasia	20e6cd7724	Update test_twophase - check that we correctly restore files at compute node start.	2021-08-19 12:15:09 +03:00
Heikki Linnakangas	3319befc30	Revert a bunch of commits that I pushed by accident This reverts commits: `e35a5aa550` `a389c2ed7f` `11ebcb531f` `8d2b61f4d1` `882f549236` `ddb7155bbe` Those were follow-up work on top of PR https://github.com/zenithdb/zenith/pull/430, but they were still very much not ready.	2021-08-17 19:20:27 +03:00
Heikki Linnakangas	ddb7155bbe	WIP Store base images in separate ImageLayers	2021-08-17 18:55:04 +03:00
Heikki Linnakangas	882f549236	WIP: store base images separately	2021-08-17 18:54:53 +03:00
Heikki Linnakangas	8d2b61f4d1	Move code to handle snapshot filenames	2021-08-17 18:54:53 +03:00
Heikki Linnakangas	11ebcb531f	Add Gauge for # of layers	2021-08-17 18:54:53 +03:00
Heikki Linnakangas	a389c2ed7f	WIP: Track oldest open layer	2021-08-17 18:54:53 +03:00
Heikki Linnakangas	e35a5aa550	WIP: track mem usage	2021-08-17 18:54:53 +03:00
Heikki Linnakangas	45f641cabb	Handle last "open" layer specially in LayerMap. There can be only one "open" layer for each segment. That's the last one, implemented by InMemoryLayer. That's the only one where new records can be appended to. Much of the code needed to distinguish between the last open layer and other layers anyway, so make the distinction explicit in LayerMap.	2021-08-17 18:54:51 +03:00
Heikki Linnakangas	48f4a7b886	Refactor get_page_at_lsn() logic to layered_repository.rs There was a lot of duplicated code between the get_page_at_lsn() implementations in InMemoryLayer and SnapshotLayer. Move the code for requesting WAL redo from the Layer trait into LayeredTimeline. The get-function in Layer now just returns the WAL records and base image to the caller, and the caller is responsible for performing the WAL redo on them.	2021-08-17 18:54:48 +03:00
Heikki Linnakangas	91f72fabc9	Work with smaller segments. Split each relish into fixed-sized 10 MB segments. Separate layers are created for each segment. This reduces the write amplification if you have a large relation and update only parts of it; the downside is that you have a lot more files. The 10 MB is just a guess, we should do some modeling and testing in the future to figure out the optimal size. Each segment tracks the size of the segment separately. To figure out the total size of a relish, you need to loop through the segment to find the highest segment that's in use. That's a bit inefficient, but will do for now. We might want to add a cache or something later.	2021-08-17 18:54:41 +03:00
anastasia	921ec390bc	cargo fmt	2021-08-16 19:41:07 +03:00
Heikki Linnakangas	7ee8de3725	Add metrics to WAL redo. Track the time spent on replaying WAL records by the special Postgres process, the time spent waiting for access to the Postgres process (since there is only one per tenant), and the number of records replayed.	2021-08-16 15:49:17 +03:00
Heikki Linnakangas	047a05efb2	Minor formatting and comment fixes.	2021-08-16 15:48:59 +03:00
Heikki Linnakangas	2450f82de5	Introduce a new "layered" repository implementation. This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes. This makes the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>" The unit tests have been refactored to exercise both implementations. 'layered' is now the default. Push/pull is not implemented. The 'test_history_inmemory' test has been commented out accordingly. It's not clear how we will implement that functionality; probably by copying the snapshot files directly.	2021-08-16 10:06:48 +03:00
Heikki Linnakangas	6e22a8f709	Refactor WAL redo to not use a separate thread. My main motivation is to make it easier to attribute time spent in WAL redo to the request that needed the WAL redo. With this patch, the WAL redo is performed by the requester thread, so it shows up in stack traces and in 'perf' report as part of the requester's call stack. This is also slightly simpler (less lines of code) and should be a bit faster too.	2021-08-13 17:23:36 +03:00
Heikki Linnakangas	8517d9696d	Move gc_iteration() function to Repository trait. The upcoming layered storage implementation handles GC as a repository-wide operation because it needs to pay attention to the branch points of all timelines.	2021-08-12 23:46:01 +03:00
anastasia	6c3726913f	Introduce check for physical relishes. They represent files and use RelationSizeEntry to track existing and dropped files. They can be both blocky and non-blocky. get_relish_size() and get_rel_exists() functions work with physical relishes, not only with blocky ones.	2021-08-12 14:42:21 +03:00
anastasia	1bfade8adc	Issue #330. Use put_unlink for twophase relishes. Follow PostgreSQL logic: remove Twophase files when prepared transaction is committed/aborted. Always store Twophase segments as materialized page images (no wal records).	2021-08-12 14:42:21 +03:00
anastasia	4eebe22fbb	cargo fmt	2021-08-12 14:42:21 +03:00
Heikki Linnakangas	20d5e757ca	Remove now-unused get_next_tag function. The only caller was removed by commit `c99a211b01`.	2021-08-11 22:16:38 +03:00
Heikki Linnakangas	70cb399d59	Add convenience function to create a RowDescriptor message for an int8 col. Makes the code to construct a result set a bit more terse and readable.	2021-08-11 20:17:33 +03:00
Dmitry Rodionov	ce5333656f	Introduce authentication v0.1. Current state with authentication. Page server validates JWT token passed as a password during connection phase and later when performing an action such as create branch tenant parameter of an operation is validated to match one submitted in token. To allow access from console there is dedicated scope: PageServerApi, this scope allows access to all tenants. See code for access validation in: PageServerHandler::check_permission. Because we are in progress of refactoring of communication layer involving wal proposer protocol, and safekeeper<->pageserver. Safekeeper now doesn’t check token passed from compute, and uses “hardcoded” token passed via environment variable to communicate with pageserver. Compute postgres now takes token from environment variable and passes it as a password field in pageserver connection. It is not passed through settings because then user will be able to retrieve it using pg_settings or SHOW .. I’ve added basic test in test_auth.py. Probably after we add authentication to remaining network paths we should enable it by default and switch all existing tests to use it.	2021-08-11 20:05:54 +03:00
Konstantin Knizhnik	b607f0fd8e	Align prev record CRC on 8-bytes boundary (#407)	2021-08-11 08:56:37 +03:00
anastasia	c99a211b01	Fix CLOG truncate handling in case of wraparound.	2021-08-11 05:49:24 +03:00
anastasia	949ac54401	Add test of clog (pg_xact) truncation	2021-08-11 05:49:24 +03:00
anastasia	e406811375	Fixes for handling SLRU relishes: replace get_tx_status() with self.get_tx_is_in_progress() to handle xacts in truncated SLRU segments correctly	2021-08-11 05:49:24 +03:00
anastasia	590ace104a	Fixes for handling SLRU relishes: - don't return ZERO_PAGE from get_page_at_lsn_nowait() for truncated SLRU segments;	2021-08-11 05:49:24 +03:00
anastasia	e475f82ff1	Rename get_rel_size() to get_relish_size(). Don't bail if relish is not found, just return None and let the caller decide how to handle this	2021-08-11 05:49:24 +03:00
anastasia	a368642790	cargo fmt	2021-08-10 14:26:52 +03:00
anastasia	8c7983797b	Remove unused SLRUTruncate ObjectValue	2021-08-10 14:26:32 +03:00
anastasia	a5d57ca10b	list_nonrels() returns elements in arbitrary order. Remove incorrect comments that say otherwise.	2021-08-06 15:23:46 +03:00
Konstantin Knizhnik	3ca3394170	[refer #395] Check WAL record CRC in waldecoder (#396)	2021-08-05 16:57:57 +03:00
Stas Kelvich	ec07acfb12	fix typo in run_initdb()	2021-08-04 23:57:17 +03:00
Stas Kelvich	fa04096733	cargo fmt pass	2021-08-04 23:51:02 +03:00
Dmitry Ivanov	ed634ec320	Extract message processing function from PostgresBackend's event loop This patch has been extracted from #348, where it became unnecessary after we had decided that we didn't want to measure anything inside PostgresBackend. IMO the change is good enough to make its way into the codebase, even though it brings nothing "new" to the code.	2021-08-04 10:49:02 +03:00
Dmitry Ivanov	cb1b4a12a6	Add some prometheus metrics to pageserver The metrics are served by an http endpoint, which is meant to be spawned in a new thread. In the future the endpoint will provide more APIs, but for the time being, we won't bother with proper routing.	2021-08-03 21:42:24 +03:00
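
Note on commit 19fcea99da: the flush policy it describes (keep in-memory layers in a binary heap ordered by oldest LSN, and flush those whose oldest WAL record is more than 16 MB behind the last valid LSN) can be sketched roughly as follows. This is an illustration only, not the code from the commit: `OpenLayer`, `OpenLayers`, `layers_to_flush` and `MAX_LSN_DISTANCE` are hypothetical names, and LSNs are modeled as plain `u64` byte positions.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for an in-memory layer; only the fields the
/// policy needs. `oldest_lsn` comes first so the derived ordering
/// compares it before `segment`.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct OpenLayer {
    /// LSN of the oldest WAL record held by this layer.
    pub oldest_lsn: u64,
    /// Illustrative identifier of the segment the layer covers.
    pub segment: u32,
}

/// 16 MB, the distance named in the commit message.
pub const MAX_LSN_DISTANCE: u64 = 16 * 1024 * 1024;

/// All open in-memory layers, kept in a min-heap on `oldest_lsn`.
pub struct OpenLayers {
    heap: BinaryHeap<Reverse<OpenLayer>>,
}

impl OpenLayers {
    pub fn new() -> Self {
        OpenLayers { heap: BinaryHeap::new() }
    }

    pub fn insert(&mut self, layer: OpenLayer) {
        // `Reverse` turns the max-heap into a min-heap.
        self.heap.push(Reverse(layer));
    }

    /// Removes and returns, oldest first, every layer whose oldest record
    /// lags `last_valid_lsn` by more than `MAX_LSN_DISTANCE`.
    pub fn layers_to_flush(&mut self, last_valid_lsn: u64) -> Vec<OpenLayer> {
        let mut to_flush = Vec::new();
        while let Some(Reverse(oldest)) = self.heap.peek() {
            if last_valid_lsn.saturating_sub(oldest.oldest_lsn) <= MAX_LSN_DISTANCE {
                break; // everything else in the heap is newer
            }
            if let Some(Reverse(layer)) = self.heap.pop() {
                to_flush.push(layer);
            }
        }
        to_flush
    }
}
```

Because the heap always exposes the layer with the smallest `oldest_lsn`, the loop can stop at the first layer that is within the limit, which is the reason the commit message gives for choosing a binary heap.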
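
Note on commit 91f72fabc9: the segment scheme it describes (fixed-size 10 MB segments, with the total relish size found by looping over segments to the highest one in use) might look roughly like this. It is a sketch under assumptions, not the repository's code: the 8192-byte block size is the PostgreSQL default and is assumed here, and `locate_block`, `relish_size_in_blocks` and the map of per-segment sizes are hypothetical.

```rust
use std::collections::BTreeMap;

/// Assumed block size: the PostgreSQL default BLCKSZ.
pub const BLOCK_SIZE: u64 = 8192;
/// Segment size named in the commit message.
pub const SEGMENT_SIZE_BYTES: u64 = 10 * 1024 * 1024;
/// Blocks per segment under the assumptions above (1280).
pub const BLOCKS_PER_SEGMENT: u32 = (SEGMENT_SIZE_BYTES / BLOCK_SIZE) as u32;

/// Maps a block number within a relish to (segment number, block within segment).
pub fn locate_block(blknum: u32) -> (u32, u32) {
    (blknum / BLOCKS_PER_SEGMENT, blknum % BLOCKS_PER_SEGMENT)
}

/// Computes the total relish size in blocks from per-segment sizes.
/// `segment_sizes` maps segment number to the number of blocks in use in
/// that segment (hypothetical representation). The loop visits every
/// segment, mirroring the "a bit inefficient" approach the commit describes.
pub fn relish_size_in_blocks(segment_sizes: &BTreeMap<u32, u32>) -> u32 {
    let mut highest: Option<(u32, u32)> = None;
    // BTreeMap iterates in ascending key order, so the last non-empty
    // segment seen is the highest one in use.
    for (&segno, &nblocks) in segment_sizes {
        if nblocks > 0 {
            highest = Some((segno, nblocks));
        }
    }
    match highest {
        Some((segno, nblocks)) => segno * BLOCKS_PER_SEGMENT + nblocks,
        None => 0,
    }
}
```

For example, with segments 0 and 1 full and 5 blocks in segment 2, `relish_size_in_blocks` yields 2 * 1280 + 5 blocks. A cache of the highest segment, as the commit message suggests, would avoid the full scan.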

1609 Commits