This is mainly generational state (terms) and useful LSNs.
Also add a basic /status healthcheck endpoint, which is now used in tests to
determine whether the safekeeper is up; this fixes #726.
ref #115
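As an illustration of what such a healthcheck can look like, here is a minimal standalone sketch using hyper 0.14 and tokio. The `/status` path comes from the description above; the port, response body, and wiring are assumptions, not the safekeeper's actual HTTP setup.

```rust
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use std::convert::Infallible;

async fn router(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    match (req.method(), req.uri().path()) {
        // Tests only need a 200 here to know the process is up and serving.
        (&Method::GET, "/status") => Ok(Response::new(Body::from("OK"))),
        _ => Ok(Response::builder()
            .status(StatusCode::NOT_FOUND)
            .body(Body::empty())
            .unwrap()),
    }
}

#[tokio::main]
async fn main() -> Result<(), hyper::Error> {
    let make_svc = make_service_fn(|_| async { Ok::<_, Infallible>(service_fn(router)) });
    // Arbitrary local port for the sketch.
    Server::bind(&([127, 0, 0, 1], 7676).into()).serve(make_svc).await
}
```

A test can then consider the process up as soon as `GET /status` returns 200.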
This adds a fast path for the common case where the record doesn't
cross a page boundary. We now split off a new Bytes directly from the
original input buffer in that case, instead of copying the record to a
new BytesMut. Shaves about 5% of the page server's CPU time on my
laptop, in the 'test_bulk_insert' test.
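A hedged sketch of the idea, not the actual decoder code; the helper and parameter names below are invented. The point is that `split_to()` hands out a slice sharing the original allocation on the fast path, while the slow path still copies into a fresh buffer.

```rust
use bytes::{Bytes, BytesMut};

/// Illustrative helper: pull one record of `rec_len` bytes out of `buf`.
fn take_record(buf: &mut BytesMut, rec_len: usize, bytes_left_on_page: usize) -> Bytes {
    if rec_len <= bytes_left_on_page {
        // Fast path: the record is contiguous in the input buffer, so
        // split_to() yields a slice of the same allocation, with no copy
        // of the payload.
        buf.split_to(rec_len).freeze()
    } else {
        // Slow path: the record crosses a page boundary, so reassemble it
        // into a fresh buffer (the real code also skips the continuation
        // page headers, omitted here).
        let mut out = BytesMut::with_capacity(rec_len);
        out.extend_from_slice(&buf.split_to(bytes_left_on_page));
        // ...append the continuation data from the following page(s)...
        out.freeze()
    }
}
```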
* Use logging in python tests
* Use f-strings for logs
* Don't log test output while running
* Use only pytest logging handler
* Add more info about pytest logging
* Rename wal_acceptor binary to safekeeper
* Rename wal_acceptor.pid and wal_acceptor.log to safekeeper.pid and safekeeper.log
* Change some mentions of WAL acceptor to safekeeper
* Dockerfile: alias wal_acceptor to safekeeper temporarily until internal scripts are updated
This change brings the following improvements to our build system:
* Now BUILD_TYPE also affects Rust apps.
* From now on, cargo will respect `-jN` passed via `make`. However, note
that `rustc` may spawn multiple threads depending on compile flags.
* Cargo is able to cooperate with make to better schedule parallel jobs,
which leads to better build times (-20s in release mode on my machine).
No need to use BytesMut in these functions. Plain Vec is simpler, and
should be marginally faster too; I previously saw BytesMut functions
in the 'perf' profile, consuming around 5% of the overall pageserver CPU
time. That's gone with this patch, although I don't see any discernible
difference in the overall performance test results.
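For illustration only, a toy encoder in the same spirit; the tag/length framing and names are invented, the point is just that a plain `Vec<u8>` handles this kind of append-only building without BytesMut.

```rust
/// Toy example: build a small framed message with a plain Vec<u8>.
fn encode_message(tag: u8, payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + 4 + payload.len());
    buf.push(tag);
    buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(payload);
    buf
}
```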
Previously, the WAL receiver would make a decoded copy of the current
Checkpoint before each WAL record, and compare it with the Checkpoint
after the record had been processed. If it had changed, the checkpoint
relish was updated in the repository. That's somewhat expensive; the
Checkpoint::encode() function is visible in the 'perf' profile. Change that
so that we set a flag whenever the Checkpoint struct is modified, so that
we don't need to compare the whole struct anymore.
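A rough sketch of the dirty-flag pattern; all type and field names here are illustrative stand-ins, not the WAL receiver's real ones.

```rust
#[derive(Clone)]
struct CheckPoint {
    next_xid: u32,
    // ...other fields tracked from the WAL stream...
}

impl CheckPoint {
    fn encode(&self) -> Vec<u8> {
        // Stand-in for the real (comparatively expensive) serialization.
        self.next_xid.to_be_bytes().to_vec()
    }
}

struct WalIngest {
    checkpoint: CheckPoint,
    checkpoint_modified: bool, // set whenever `checkpoint` changes
}

impl WalIngest {
    fn advance_next_xid(&mut self, xid: u32) {
        if xid > self.checkpoint.next_xid {
            self.checkpoint.next_xid = xid;
            // Mark dirty instead of diffing a decoded copy after every record.
            self.checkpoint_modified = true;
        }
    }

    fn maybe_store_checkpoint(&mut self, store: &mut Vec<Vec<u8>>) {
        // Only encode and write when something actually changed.
        if self.checkpoint_modified {
            store.push(self.checkpoint.encode());
            self.checkpoint_modified = false;
        }
    }
}
```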
Previously, typos like `BUILD_TYPE=rlease` would silently
lead to building debug binaries. The current approach is also
more future-proof, since we might add `profile`, `valgrind`,
or other build types later.
- Perform a checkpoint for each tenant repository.
- Wait for the completion of all threads.

Add a new 'immediate' option to the 'pageserver stop' command to terminate the pageserver immediately.
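A rough sketch of how such a shutdown path can be structured; the tenant registry, repository type, and worker handles below are placeholders, not the pageserver's actual API.

```rust
use std::thread::JoinHandle;

// Placeholder standing in for a tenant's repository.
struct Repository;
impl Repository {
    fn checkpoint(&self) {
        // Flush in-memory state to disk (stubbed out here).
    }
}

fn shutdown_pageserver(
    tenants: &[(u32, Repository)],
    workers: Vec<JoinHandle<()>>,
    immediate: bool,
) {
    if !immediate {
        // Graceful path: checkpoint every tenant repository first...
        for (tenant_id, repo) in tenants {
            println!("checkpointing tenant {}", tenant_id);
            repo.checkpoint();
        }
        // ...then wait for all worker threads to complete.
        for handle in workers {
            handle.join().expect("worker thread panicked");
        }
    }
    // With 'immediate', both steps are skipped and the process just exits.
}
```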
When a WAL record affects multiple pages, we currently duplicate the
record for each affected page. That's a bit wasteful, but not too bad
for b-tree splits and non-hot heap updates that affect two pages. But
a buffered GiST index build WAL-logs the whole relation in 32-page chunks,
with one giant WAL record for each chunk. Currently we duplicate
that giant record for each of the 32 pages, which is really wasteful.
GitHub issue https://github.com/zenithdb/zenith/issues/720 tracks the
problem. This commit adds a test case to demonstrate it.
Whenever we start processing a request, we now enter a tracing "span"
that includes context information like the tenant and timeline ID, and
the operation we're performing. That context information gets attached
to every log message we create within the span. That way, we don't need
to include basic context information like that in every log message, and
it also becomes easier to filter the logs programmatically.
This removes the explicit timeline and tenant IDs from most log messages,
as you now get that information from the enclosing span.
Also improve log messages in general, dialing down the level of some
messages that are not very useful, and adding information to others.
We now obey the RUST_LOG env variable, if it's set.
The 'tracing' crate allows for different log formatters, like JSON or
bunyan output. The one we use now is a human-readable multi-line format,
which is nice when reading the log directly, but hard for
post-processing. For production, we'll probably want JSON output and
some tools for working with it, but that's left as a TODO. The log
format is easy to change.
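A minimal sketch of the span pattern with the `tracing` and `tracing-subscriber` crates; the span name, field names, and handler below are illustrative, not the pageserver's actual request code.

```rust
use tracing::{info, info_span};
use tracing_subscriber::EnvFilter;

fn handle_get_page(tenant_id: &str, timeline_id: &str) {
    // Everything logged while this span is entered carries the tenant,
    // timeline, and operation fields, so individual messages stay short.
    let span = info_span!("request", %tenant_id, %timeline_id, op = "get_page");
    let _enter = span.enter();
    info!("looking up page version"); // printed with the span's fields attached
}

fn main() {
    // Respect RUST_LOG if set, otherwise fall back to the info level.
    tracing_subscriber::fmt()
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();
    handle_get_page("mytenant", "mytimeline");
}
```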
* `wal_acceptor`: add HTTP handler, /metrics endpoint only, no authentication
* Two gauges are currently reported: `flush_lsn` and `commit_lsn`
* Add `DEFAULT_PG_LISTEN_PORT` and `DEFAULT_HTTP_LISTEN_PORT` consts for uniformity
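For reference, a standalone sketch of exporting two such gauges with the `prometheus` crate; the help strings and values are made up, and the real safekeeper serves the output through its HTTP handler rather than printing it.

```rust
use prometheus::{register_int_gauge, Encoder, IntGauge, TextEncoder};

fn main() {
    // Current flush and commit positions, reported as plain gauges.
    let flush_lsn: IntGauge =
        register_int_gauge!("flush_lsn", "WAL flushed to disk (placeholder help)").unwrap();
    let commit_lsn: IntGauge =
        register_int_gauge!("commit_lsn", "WAL known to be committed (placeholder help)").unwrap();

    flush_lsn.set(0x16B_9188); // placeholder values
    commit_lsn.set(0x16B_9188);

    // This is roughly what a GET /metrics handler would return.
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&prometheus::gather(), &mut buf)
        .unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}
```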
The caller is now responsible for looking up the predecessor layer
instead. This makes the code simpler, as you don't need to update the
predecessor reference when a layer is frozen or written to disk.
There was a bug in that, as Konstantin noted on Discord:
Assume that freeze doesn't create a new in-memory layer
(maybe_new_open=None). Then we temporarily place the frozen layer in
historics. Now assume a new put_wal_record request arrives. There is
no open in-memory layer, so it has to create a new one. It looks up
the previous layer for reading and sets it as the new in-memory
layer's predecessor. But as far as I understand, that previous layer
would be our temporarily frozen layer, which will then be removed
from historics.
That leaves the predecessor field of the new in-memory layer pointing
at the frozen in-memory layer that has been removed from the layer map,
preventing it from being removed from memory.
This makes two subtle changes:
1. When the first new layer is created on a branch for a segment that
existed on the ancestor branch, the start_lsn of the new layer is now
the branch point + 1. We were previously slightly confused about what
the branch point LSN means: all the WAL up to and *including* that LSN
on the old branch is visible to the new branch.
If we mark the start LSN of the new layer as equal to the branch point,
that's wrong, because if there is a WAL record with that LSN on the
predecessor layer, the new layer would hide it. This bug was hidden
when the layer on the new branch contained a direct reference to the
layer in the old branch, as get_page_reconstruct_data() followed that
reference directly when it didn't find the page version in the new
layer. But now that the caller performs the lookup, it will look up
the new layer that doesn't contain the record, and you get an error.
2. InMemoryLayer now always stores the segment size at the beginning
of the layer's LSN range. Previously, get_seg_size() might have
recursed into the predecessor layer to get the size, but now we
avoid that by always copying over the last size from the previous
layer, when a new layer is created.
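A toy illustration of point 1, treating LSNs as plain integers rather than the real Lsn type or layer map code.

```rust
fn main() {
    // A record exists at exactly the branch point LSN on the ancestor branch.
    let branch_point: u64 = 100;
    let record_lsn: u64 = 100;

    // Wrong: a new layer starting *at* the branch point claims LSN 100,
    // so a lookup for that record lands in the (empty) new layer.
    let wrong_start = branch_point;
    assert!(record_lsn >= wrong_start);

    // Right: starting at branch point + 1 leaves LSN 100 to the ancestor's
    // layers, so the record stays visible on the new branch.
    let right_start = branch_point + 1;
    assert!(record_lsn < right_start);
}
```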
The .dockerignore and Dockerfile have also been excluded from being copied into
Docker images, which avoids busting the Docker layer cache when only those files change.
Commit ca9af37478 removed the import_timeline_wal() call from here.
After that, the info!() message is bogus, as we no longer load the WAL
from local disk. Also, the logical size assertion is pointless now.
All the changes are on the vendor/postgres side. However, because we now
generate fewer Full Page Writes, the 'branch_behind' test needs to be
modified so that it still generates enough WAL to consume a few WAL
segments.