
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 12:40:37 +00:00

Author	SHA1	Message	Date
Kirill Bulatov	4ade0bb41c	Refactor upload/download_relish function signatures. This makes them more generic, by taking any Read / Write trait implementation, instead of operating directly on a file.	2021-10-15 11:34:15 +03:00
Arseny Sher	de744a44dd	Add /timeline http request to safekeeper returning its status. Which is mainly generational state (terms) and useful LSNs. Also add /status basic healthcheck request which is now used in tests to determine the safekeeper is up; this fixes #726. ref #115	2021-10-14 19:02:38 +03:00
Heikki Linnakangas	0e026371ec	Optimize WAL decoding slightly. This adds a fast-path for the common case that the record doesn't cross a page boundary. We now split off a new Bytes directly from the original input buffer in that case, instead of copying the record to a new BytesMut. Shaves about 5% of the page server's CPU time on my laptop, in the 'test_bulk_insert' test.	2021-10-14 14:21:23 +03:00
Heikki Linnakangas	8a4f092e82	Skip syncing the temp initdb installation. Doesn't make much difference on my laptop with SSD, but every little helps, and with a slower disk it might be noticeable.	2021-10-13 16:59:00 +03:00
Patrick Insinger	1c29de81de	pageserver - remove `lsn` from `WALRecord`	2021-10-13 00:03:42 -07:00
Patrick Insinger	160c4aff61	pageserver - use write guard for checkpointing	2021-10-12 10:02:15 -07:00
Patrick Insinger	6e5ca5dc5c	pageserver - create TimelineWriter	2021-10-12 10:02:15 -07:00
Heikki Linnakangas	95a85312f5	Simplify code to build walredo messages. No need to use BytesMut in these functions. Plain Vec is simpler. And should be marginally faster too; I saw BytesMut functions previously in 'perf' profile, consuming around 5% of the overall pageserver CPU time. That's gone with this patch, although I don't see any discernible difference in the overall performance test results.	2021-10-12 10:16:26 +03:00
Heikki Linnakangas	934fb8592f	Detect when a checkpoint is modified in a smarter way. Previously, in the WAL receiver, we would make a decoded copy of the current Checkpoint before each WAL record, and compare it with the Checkpoint after the record has been processed. If it has changed, the checkpoint relish is updated in the repository. That's somewhat expensive, the Checkpoint::encode() function is visible in 'perf' profile. Change that so that we set a flag whenever the Checkpoint struct is modified, so that we don't need to compare the whole struct anymore.	2021-10-12 09:09:10 +03:00
anastasia	d7c9dd06f4	Implement graceful shutdown at 'pageserver stop': - perform checkpoint for each tenant repository. - wait for the completion of all threads. Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.	2021-10-11 13:35:01 +03:00
Heikki Linnakangas	7216f22609	Use tracing crate to have more context in log messages. Whenever we start processing a request, we now enter a tracing "span" that includes context information like the tenant and timeline ID, and the operation we're performing. That context information gets attached to every log message we create within the span. That way, we don't need to include basic context information like that in every log message, and it also becomes easier to filter the logs programmatically. This removes the explicit timeline and tenant IDs from most log messages, as you get that information from the enclosing span now. Also improve log messages in general, dialing down the level of some messages that are not very useful, and adding information to others. We now obey the RUST_LOG env variable, if it's set. The 'tracing' crate allows for different log formatters, like JSON or bunyan output. The one we use now is human-readable multi-line format, which is nice when reading the log directly, but hard for post-processing. For production, we'll probably want JSON output and some tools for working with it, but that's left as a TODO. The log format is easy to change.	2021-10-11 08:59:06 +03:00
Kirill Bulatov	bf58f7f649	Expose certain layered repository structs to reuse in relish storage (#688)	2021-10-09 19:23:57 +03:00
Patrick Insinger	3f0ebc6a40	pageserver - move early File::open call	2021-10-09 08:45:52 -07:00
Patrick Insinger	c356030660	pageserver - use VecMap for delta metadata & sizes	2021-10-08 15:05:22 -07:00
Patrick Insinger	c4bb6d78d4	pageserver - use VecMap for in memory segsizes	2021-10-08 14:37:32 -07:00
Patrick Insinger	3b82e806f2	pageserver - use VecMap for in-memory PageVersions	2021-10-08 14:11:07 -07:00
Heikki Linnakangas	960c7d69a8	Remove 'predecessor' reference from in-memory and delta layers. The caller is now responsible for looking up the predecessor layer, instead. This makes the code simpler, as you don't need to update the predecessor reference when a layer is frozen or written to disk. There was a bug in that, as Konstantin noted on discord: Assume that freeze doesn't create new inmem layer (maybe_new_open=None). Then we temporary place in historics frozen layer. Assume that now new put_wal_record request arrives. There is no open in-mem layer, so it has to create new one. It is looking for previous layer for read and set it as new in-mem layer predecessor. But as far as I understand, prev layer should be our temporary frozen layer. Which will be then removed from historics. That leaves the predecessor field of the new in-memory layer pointing at the frozen in-memory layer that has been removed from the layer map, preventing it from being removed from memory. This makes two subtle changes: 1. When the first new layer is created on a branch for a segment that existed on the ancestor branch, the start_lsn of the new layer is now the branch point + 1. We were previously slightly confused on what the branch point LSN meant. It means that all the WAL up to and including the LSN on the old branch is visible to the new branch. If we mark the start LSN of the new layer as equal to the branch point, that's wrong, because if there is a WAL record with that LSN on the predecessor layer, the new layer would hide it. This bug was hidden when the layer on the new branch contained a direct reference to the layer in the old branch, as get_page_reconstruct_data() followed that reference directly when it didn't find the page version in the new layer. But now that the caller performs the lookup, it will look up the new layer that doesn't contain the record, and you get an error. 2. InMemoryLayer now always stores the segment size at the beginning of the layer's LSN range. Previously, get_seg_size() might have recursed into the predecessor layer to get the size, but now we avoid that by always copying over the last size from the previous layer, when a new layer is created.	2021-10-08 00:54:13 +03:00
Heikki Linnakangas	fdb19fdb92	Remove unused function. The caller was removed in commit `acc0f41985`.	2021-10-07 11:24:27 +03:00
Heikki Linnakangas	53b4dc944d	Don't create unused "wal" directory It hasn't been used since commit `ca9af37478`.	2021-10-07 10:36:26 +03:00
Heikki Linnakangas	15f1bcc9c2	Remove obsolete code, now that we don't load WAL from local disk anymore. Commit `ca9af37478` removed the import_timeline_wal() call from here. After that, the info!() message is bogus, as we no longer load the WAL from local disk. Also, the logical size assertion is pointless now.	2021-10-06 15:59:28 +03:00
Heikki Linnakangas	d806c3a47e	pageserver - serialize PageVersion as it is Removes the need for PageVersionMeta struct.	2021-10-05 11:07:50 -07:00
Egor Suvorov	530d3eaf09	Add more details to pageserver and safekeeper docs (#680)	2021-10-05 19:10:50 +03:00
Egor Suvorov	7e190d72a5	Make `pageserver_` prefix for common metric names configurable (#681)	2021-10-05 19:06:44 +03:00
Patrick Insinger	9c936034b6	pageserver - fix newer clippy lints	2021-10-05 00:28:14 -07:00
Kirill Bulatov	5719f13cb2	Rework the relish thread model (#689)	2021-10-05 10:15:56 +03:00
Patrick Insinger	d134a9856e	pageserver - introduce RepoHarness for testing	2021-10-04 08:36:35 -07:00
Patrick Insinger	664b99b5ac	pageserver - use constant TIMELINE_ID for tests	2021-10-04 08:36:35 -07:00
Max Sharnoff	7fab38c51e	Use threadlocal for walreceiver check (#692)	2021-10-01 15:47:45 -07:00
Max Sharnoff	84f7dcd052	Fix clippy errors on nightly (2021-09-29) (#691) Most of the changes are for the new if-then-panic lint added in https://github.com/rust-lang/rust-clippy/pull/7669.	2021-10-01 15:45:42 -07:00
Patrick Insinger	7095a5d551	pageserver - reject and backup future layer files If a layer file is found with LSN after the disk_consistent_lsn, it is renamed (to avoid conflicts with new layer files) and a warning is logged.	2021-10-01 11:41:39 -07:00
Patrick Insinger	538c2a2a3e	pageserver - store timeline metadata durably The metadata file is now always 512 bytes. The last 4 bytes are a crc32c checksum of the previous 508 bytes. Padding zeroes are added between the serde serialization and the start of the checksum. A single write call is used, and the file is fsyncd after. On file creation, the parent directory is fsyncd as well.	2021-10-01 11:41:39 -07:00
Patrick Insinger	62f83869f1	pageserver - fsync image/delta layers Ensure image and delta layer files are durable. Also, fsync the parent directory to ensure the directory entries are durable.	2021-10-01 11:41:39 -07:00
Patrick Insinger	69670b61c4	pageserver - use crashsafe_dir utility Replace usage of std::fs::create_dir/create_dir_all with crashsafe equivalents.	2021-10-01 11:41:39 -07:00
Heikki Linnakangas	e474790400	Print more details on errors to log Fixes https://github.com/zenithdb/zenith/issues/661	2021-10-01 17:57:41 +03:00
Kirill Bulatov	287ea2e5e3	Limit concurrent relish storage sync operations	2021-10-01 08:37:09 +03:00
Kirill Bulatov	fb05e4cb0b	Show better error messages on pageserver failures	2021-09-29 01:55:41 +03:00
Egor Suvorov	b0a7234759	pageserver: fix stale default listen addrs * In command line help * In dummy_conf	2021-09-28 20:57:51 +03:00
Egor Suvorov	ddf4b15ebc	pageserver: use const_format crate to generate default listen addrs	2021-09-28 20:57:51 +03:00
Egor Suvorov	3065532f15	pageserver: fix mistype in listen-http arg help	2021-09-28 20:57:51 +03:00
Heikki Linnakangas	014be8b230	Use Iterator, to avoid making one copy of page_versions BTreeMap Reduces the CPU time spent in checkpointing, in the write_to_disk() function.	2021-09-27 19:28:02 +03:00
Heikki Linnakangas	08978458be	Refactor write_to_disk, handling dropped segment as a special case. Similar to what commit `7fb7f67b` did to 'freeze', dealing with the dropped segment separately from the rest of the logic makes the code easier to follow. It is also needed by the next commit that replaces the code to build new BTreeMap with an iterator; we cannot pass one of two kinds of closures as argument, it has to always be the same one. Having separate DeltaLayer::create() calls for the case of dropped segment and the other cases works around that.	2021-09-27 19:23:32 +03:00
Heikki Linnakangas	2252d9faa8	Switch to RwLock in InMemoryLayer Allows more parallelism basically for free.	2021-09-27 19:15:40 +03:00
Arthur Petukhovsky	22e15844ae	Fix clippy errors (#673)	2021-09-27 18:59:30 +03:00
Konstantin Knizhnik	ca9af37478	Do not write WAL at pageserver (#645) * Do not write WAL at pageserver * Remove import_timeline_wal function	2021-09-27 14:15:55 +03:00
Heikki Linnakangas	b71e3a40e2	Add more details to the log, when an error happens in GetPage request.	2021-09-24 21:44:22 +03:00
Heikki Linnakangas	41dfc117e7	Buffer the writes to the WAL redo process pipe. Reduces the CPU time spent in the write() syscalls. I noticed that we were spending a lot of CPU time in libc::write, coming from request_redo(), in the 'bulk_insert' test. According to some quick profiling with 'perf', this reduces the CPU time spent in request_redo() from about 30% to 15%. For some reason, it doesn't reduce the overall runtime of the 'bulk_insert' test much, maybe by one second if you squint (from about 37s to 36s), so there must be some other bottleneck, like I/O. But this is surely still a good idea, just based on the reduced CPU cycles.	2021-09-24 21:12:38 +03:00
sharnoff	a72707b8cb	Redo #655 with fix: Allow `LeSer`/`BeSer` impls missing either `Serialize` or `Deserialize` Commit message copied below: * Allow LeSer/BeSer impls missing Serialize/Deserialize Currently, using `LeSer` or `BeSer` requires that the type implements both `Serialize` and `DeserializeOwned`, even if we're only using the trait for one of those functionalities. Moving the bounds to the methods gives the convenience of the traits without requiring unnecessary derives. * Remove unused #[derive(Serialize/Deserialize)] This should hopefully reduce compile times - if only by a little bit. Some of these were already unused (we weren't using LeSer/BeSer for the types), but most have become unused with the change to LeSer/BeSer.	2021-09-24 10:58:01 -07:00
Max Sharnoff	0f770967b4	Revert "Allow `LeSer`/`BeSer` impls missing either `Serialize` or `Deserialize` (#655)" This reverts commit `bd9f4794d9`.	2021-09-24 10:18:36 -07:00
Max Sharnoff	bd9f4794d9	Allow `LeSer`/`BeSer` impls missing either `Serialize` or `Deserialize` (#655) * Allow LeSer/BeSer impls missing Serialize/Deserialize Currently, using `LeSer` or `BeSer` requires that the type implements both `Serialize` and `DeserializeOwned`, even if we're only using the trait for one of those functionalities. Moving the bounds to the methods gives the convenience of the traits without requiring unnecessary derives. * Remove unused #[derive(Serialize/Deserialize)] This should hopefully reduce compile times - if only by a little bit. Some of these were already unused (we weren't using LeSer/BeSer for the types), but most have become unused with the change to LeSer/BeSer.	2021-09-24 10:06:03 -07:00
Heikki Linnakangas	ff5cbe2694	Support overlapping and nested Layers in the layer map. This introduces a new tree data structure for holding intervals, and queries of the form "which intervals contain the given point?". It then uses that to store the Layers in the layer map, instead of the BTreeMap. While we don't currently create overlapping layers in the page server, that situation might arise in the future if we start to create extra layers for performance purposes, or as part of some multi-stage garbage collection operation that creates new layers in some interval and then removes old ones. The situation might also arise if you have multiple page servers running on the same timeline, freezing layers at different points, and both uploading them to S3. So even though overlapping layers might not happen currently, let's avoid getting confused if it does happen for some reason. Fixes https://github.com/zenithdb/zenith/issues/517.	2021-09-24 14:10:52 +03:00
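
The fixed-size metadata layout described in commit `538c2a2a3e` (512 bytes, zero padding after the serialized payload, crc32c of the first 508 bytes stored in the last 4) could look roughly like the sketch below. This is not the pageserver's code. The function names, the little-endian byte order of the checksum, and the treatment of the payload as opaque bytes are all assumptions made for illustration, and the CRC-32C is a plain bitwise implementation so that the block has no dependencies.

```rust
// Illustrative sketch only; names and checksum byte order are assumptions.
const METADATA_SIZE: usize = 512;
const CHECKSUM_SIZE: usize = 4;
const MAX_PAYLOAD: usize = METADATA_SIZE - CHECKSUM_SIZE; // 508 bytes

/// Bitwise CRC-32C (Castagnoli), using the reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Build the 512-byte image: payload, zero padding, then the checksum.
/// Returns None if the payload does not fit in the first 508 bytes.
fn encode_metadata(payload: &[u8]) -> Option<[u8; METADATA_SIZE]> {
    if payload.len() > MAX_PAYLOAD {
        return None;
    }
    let mut buf = [0u8; METADATA_SIZE];
    buf[..payload.len()].copy_from_slice(payload);
    let sum = crc32c(&buf[..MAX_PAYLOAD]);
    // Assumption: checksum stored little-endian.
    buf[MAX_PAYLOAD..].copy_from_slice(&sum.to_le_bytes());
    Some(buf)
}

/// Recompute the checksum over the first 508 bytes and compare with the stored one.
fn verify_metadata(buf: &[u8; METADATA_SIZE]) -> bool {
    let mut stored = [0u8; CHECKSUM_SIZE];
    stored.copy_from_slice(&buf[MAX_PAYLOAD..]);
    crc32c(&buf[..MAX_PAYLOAD]) == u32::from_le_bytes(stored)
}
```

Calling `encode_metadata` on a short payload yields a buffer that `verify_metadata` accepts, and flipping any bit in the first 508 bytes makes verification fail. The single write call and the fsync of the file and its parent directory mentioned in the commit message are not shown here.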


464 Commits