rust/neon

Mirror of https://github.com/neondatabase/neon.git, last synced 2026-01-03 11:32:56 +00:00

Author	SHA1	Message	Date
Dmitry Ivanov	18d3d078ad	[WIP] [proxy] Migrate to async	2022-02-08 05:43:32 +03:00
Dmitry Ivanov	c2927353a5	Enable async deserialization of FeMessage. Now it's possible to call Fe{Startup,}Message in both sync and async contexts, which is useful for the proxy. Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2022-01-28 19:40:37 +03:00
Arseny Sher	86045ac36c	Prefix per-cluster directory with ztenant_id in safekeeper. Currently ztimelineids are unique, but all APIs accept the tenant ID/timeline ID pair, so let's keep it everywhere for uniformity. Carry around ZTTId containing both ZTenantId and ZTimelineId for simplicity. (Existing clusters on staging ought to be preprocessed for that.)	2022-01-27 17:22:07 +03:00
Konstantin Knizhnik	79f0e44a20	GC cutoff RwLock (#1139) * Reproduce GitHub issue #1047. * Use RwLock to protect gc_cutoff_lsn * Reduce number of updates in test_gc_aggressive * Change test_prohibit_get_page_at_lsn_for_garbage_collected_pages test * Lock latest_gc_cutoff_lsn in all operations accessing storage to prevent race conditions with GC * Remove random sleep between wait_for_lsn and get_page_at_lsn * Initialize latest_gc_cutoff with initdb_lsn and remove separate check that lsn >= initdb_lsn * Update test_prohibit_branch_creation_on_pre_initdb_lsn test Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-01-27 14:41:16 +03:00
anastasia	5abe2129c6	Extend replication protocol with ZenithFeedback message to pass current_timeline_size to compute node. Put standby_status_update fields into ZenithFeedback and send them as one message. Pass value sizes together with keys in ZenithFeedback message.	2022-01-27 11:20:45 +03:00
Dmitry Rodionov	e6f2d70517	Use Rust 2021 edition	2022-01-25 18:48:49 +03:00
Dmitry Ivanov	703716228e	Use `&str` instead of `String` in `BeMessage::ErrorResponse`. There's no need to allocate string literals on the heap.	2022-01-24 18:49:05 +03:00
Dmitry Rodionov	37c440c5d3	Introduce first version of tenant migration between pageservers. This patch adds attach/detach HTTP endpoints to pageservers, some changes to callmemaybe handling inside the safekeeper, and an integration test that checks migration with and without load. There are still some rough edges that will be addressed in follow-up patches.	2022-01-24 17:20:15 +03:00
Dmitry Ivanov	d3542c34f1	Refactoring: use anyhow::Context's methods where possible	2022-01-19 16:33:48 +03:00
Heikki Linnakangas	dab30c27b6	Refactor thread management and shutdown. This introduces a new module to handle thread creation and shutdown. All page server threads are now registered in a global hash map, and there's a function to request individual threads to shut down gracefully. A shutdown request is signalled to the thread with a flag, as well as a Future that can be used to wake up async operations if shutdown is requested. Use that facility to have the libpq listener thread respond to pageserver shutdown, based on Kirill's earlier prototype (https://github.com/zenithdb/zenith/pull/1088). That addresses https://github.com/zenithdb/zenith/issues/1036; previously the libpq listener thread would not exit until one more connection arrived. This also eliminates a resource leak in the accept() loop. Previously, we added the JoinHandle of each new thread to a vector, but old handles for threads that had already exited were never removed.	2022-01-14 18:36:10 +02:00
Heikki Linnakangas	adb0b3dada	Include backtrace in error messages in the log. The 'anyhow' crate can include a backtrace in all errors when the 'backtrace' feature is enabled. Enable it, and change the places that used '{:#}' or '{}' to '{:?}', so that the backtrace is printed.	2022-01-14 10:10:17 +02:00
bojanserafimov	5b9391b51d	Support "query cancel" in proxy (#1052)	2022-01-05 17:27:12 -05:00
bojanserafimov	24eca8d58b	Parse cancel message in pq_proto (#1060)	2021-12-28 16:43:44 -05:00
Bojan Serafimov	1e3ddd43bc	Add struct for key data	2021-12-28 22:40:22 +03:00
Bojan Serafimov	989371493b	Add BeMessage::BackendKeyData variant	2021-12-28 22:40:22 +03:00
Kirill Bulatov	f0afd08667	Fix zenith init defaults	2021-12-28 00:21:48 +02:00
Arseny Sher	a163650a99	Refactor Postgres command parsing in safekeeper. Do the parsing separately, producing a SafekeeperPostgresCommand enum. Since the query is always a C string, switch the postgres_backend process_query argument from Bytes to &str. Make passing the ztli/ztenant id in the safekeeper connection string optional; this is needed for the upcoming intra-safekeeper heartbeat command, which is not bound to any timeline.	2021-12-24 15:48:13 +03:00
Kirill Bulatov	114a757d1c	Use generic config parameters in pageserver cli Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2021-12-23 18:58:28 +02:00
anastasia	3b61f364f7	Stop WAL streaming threads when the compute node is shut down. The WAL stream uses two connections: 1. Compute node (walproposer) -> Safekeeper (ReceiveWalConn module). When the compute node is shut down, the safekeeper needs to stop the respective receiving thread. Prior to this PR that didn't work, because PostgresBackend didn't handle disconnection properly. 2. Safekeeper (ReplicationConn module) -> pageserver (walreceiver thread). When the incoming WAL stream is gone, the safekeeper can stop streaming WAL and cancel the connection as soon as the replica is caught up. Note that the WAL can be streamed to multiple replicas simultaneously; only the ones that are caught up to last_recieved_lsn are disconnected.	2021-12-20 12:34:28 +03:00
Kirill Bulatov	673c297949	Download timelines on demand	2021-12-10 17:23:35 +02:00
Dmitry Ivanov	7cec13d1df	Improve shutdown story for code coverage. This patch introduces fixes for several problems affecting LLVM-based code coverage: * Daemonizing parent processes should call _exit() to prevent coverage data file corruption (.profraw) due to concurrent writes. * Implement proper shutdown handlers in safekeeper.	2021-12-06 13:27:52 +03:00
Dmitry Rodionov	2669d140f8	Use full commit SHA for version info in Docker builds. This is not needed, since the environment variable with the commit SHA already contains the full version.	2021-12-01 17:35:57 +03:00
Dmitry Rodionov	130184fee9	Prohibit branch creation and basebackup at out-of-scope LSNs. Out-of-scope LSNs include pre-initdb LSNs and LSNs prior to latest_gc_cutoff. Getting there also required two cleanups: * Fix error handling in the Execute message handler. This fixes the behaviour when basebackup returned an error; previously the pageserver thread just died. * Remove the "ancestor" file, which previously contained the ancestor ID and branch LSN. The same data can now be obtained from the metadata file. The way we handled the ancestor file also meant that when branching failed, a timeline directory was left behind containing nothing but the ancestor file, which confused GC because it scans directories. So it is better to remove the ancestor file and move timeline directory creation so that it happens after all validity checks have passed.	2021-11-25 15:27:16 +03:00
Dmitry Ivanov	0ccfc62e88	[proxy] Pass PostgreSQL version to client Fixes #779	2021-11-17 16:28:44 +03:00
Dmitry Ivanov	43ded1c54b	[proxy] Minor cleanup	2021-11-17 16:28:44 +03:00
Dmitry Rodionov	44111e3ba3	Prohibit branch creation at an LSN that was already garbage collected. This introduces a new timeline field, latest_gc_cutoff, which is updated before each GC iteration. A new check in branch_timelines prevents creating a branch with a start point below latest_gc_cutoff. This also adds a check to get_page_at_lsn asserting that the LSN at which the page is requested has not been garbage collected. This check is currently triggered for read-only nodes that are pinned to a specific LSN. Because the pageserver does not track those nodes, garbage collection can remove data that they may still reference. This is a bug and will be fixed separately.	2021-11-15 20:03:16 +03:00
Patrick Insinger	1ce4976e36	pageserver - track size of VecMaps	2021-11-10 11:09:34 -08:00
Dmitry Rodionov	987833e0b9	Propagate git SHA to zenith binaries. The Git commit SHA is displayed when the --version flag is used and is written to the logs during service startup. Uses the git_version crate when git is available, and the GIT_VERSION environment variable otherwise, which is the case for Docker builds.	2021-11-04 14:22:29 +03:00
Heikki Linnakangas	b38e841f2d	Use poll() in communication with WAL redo process. The tokio futures added some overhead, so switch to plain non-blocking I/O with poll(). In a simple pgbench test on my laptop (select-only queries, scale-factor 1 `pgbench -P1 -T50 -S`), this gives about 10% improvement, from about 4300 TPS to 4800 TPS.	2021-11-04 10:39:04 +02:00
Patrick Insinger	b532470792	Set SO_REUSEADDR for all TCP listeners	2021-10-29 12:45:26 -07:00
Heikki Linnakangas	af429fb401	Improve 'zenith' CLI utility for safekeepers and a config file. The 'zenith' CLI utility can now be used to launch safekeepers. By default, one safekeeper is configured. There are new 'safekeeper start/stop' subcommands to manage the safekeepers. Each safekeeper is given a name that can be used to identify the safekeeper to start/stop with the 'zenith start/stop' commands. The safekeeper data is stored in '.zenith/safekeepers/<name>'. The 'zenith start' command now starts the pageserver and also all safekeepers. 'zenith stop' stops pageserver, all safekeepers, and all postgres nodes. Introduce new 'zenith pageserver start/stop' subcommands for starting/stopping just the page server. The biggest change here is to the 'zenith init' command. This adds a new 'zenith init --config=<path to toml file>' option. It takes a toml config file that describes the environment. In the config file, you can specify options for the pageserver, like the pg and http ports, and authentication. For each safekeeper, you can define a name and the pg and http ports. If you don't use the --config option, you get a default configuration with a pageserver and one safekeeper. Note that that's different from the previous default of no safekeepers. Any fields that are omitted in the configuration file are filled with defaults. You can also specify the initial tenant ID in the config file. A couple of sample config files are added in the control_plane/ directory. The --pageserver-pg-port, --pageserver-http-port, and --pageserver-auth options to 'zenith init' are removed. Use a config file instead. Finally, change the python test fixtures to use the new 'zenith' commands and the config file to describe the environment.	2021-10-27 10:49:38 +03:00
Kirill Bulatov	d88377f9f0	Remove log from zenith_utils	2021-10-26 23:24:11 +03:00
Kirill Bulatov	ecd577c934	Simplify tracing declarations	2021-10-26 23:24:11 +03:00
Konstantin Knizhnik	c310932121	Implement backpressure for compute node to avoid WAL overflow Co-authored-by: Arseny Sher <sher-ars@yandex.ru> Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2021-10-21 18:15:50 +03:00
Dmitry Ivanov	85116a8375	[proxy] Prevent TLS stream from hanging. This change causes writer halves of a TLS stream to always flush after a portion of bytes has been written by `std::io::copy`. Furthermore, some cosmetic and minor functional changes are made to facilitate debugging.	2021-10-20 14:15:49 +03:00
Arseny Sher	de744a44dd	Add /timeline HTTP request to safekeeper returning its status, which is mainly the generational state (terms) and useful LSNs. Also add a basic /status healthcheck request, which is now used in tests to determine that the safekeeper is up; this fixes #726. Ref #115.	2021-10-14 19:02:38 +03:00
Arseny Sher	96f1175a80	Cleanup hardcoded oids.	2021-10-13 10:52:47 +03:00
anastasia	d7c9dd06f4	Implement graceful shutdown at 'pageserver stop': - perform checkpoint for each tenant repository. - wait for the completion of all threads. Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.	2021-10-11 13:35:01 +03:00
Heikki Linnakangas	7216f22609	Use the tracing crate to have more context in log messages. Whenever we start processing a request, we now enter a tracing "span" that includes context information like the tenant and timeline ID, and the operation we're performing. That context information gets attached to every log message we create within the span. That way, we don't need to include basic context information like that in every log message, and it also becomes easier to filter the logs programmatically. This removes the explicit timeline and tenant IDs from most log messages, as you get that information from the enclosing span now. Also improve log messages in general, dialing down the level of some messages that are not very useful, and adding information to others. We now obey the RUST_LOG env variable, if it's set. The 'tracing' crate allows for different log formatters, like JSON or bunyan output. The one we use now is a human-readable multi-line format, which is nice when reading the log directly, but hard for post-processing. For production, we'll probably want JSON output and some tools for working with it, but that's left as a TODO. The log format is easy to change.	2021-10-11 08:59:06 +03:00
Patrick Insinger	0baf4bc796	fix `cargo doc` complaints	2021-10-09 08:45:46 -07:00
Patrick Insinger	c356030660	pageserver - use VecMap for delta metadata & sizes	2021-10-08 15:05:22 -07:00
Patrick Insinger	c4bb6d78d4	pageserver - use VecMap for in memory segsizes	2021-10-08 14:37:32 -07:00
Patrick Insinger	3b82e806f2	pageserver - use VecMap for in-memory PageVersions	2021-10-08 14:11:07 -07:00
Egor Suvorov	7e190d72a5	Make `pageserver_` prefix for common metric names configurable (#681 )	2021-10-05 19:06:44 +03:00
Patrick Insinger	664b99b5ac	pageserver - use constant TIMELINE_ID for tests	2021-10-04 08:36:35 -07:00
Max Sharnoff	84f7dcd052	Fix clippy errors on nightly (2021-09-29) (#691) Most of the changes are for the new if-then-panic lint added in https://github.com/rust-lang/rust-clippy/pull/7669.	2021-10-01 15:45:42 -07:00
Patrick Insinger	0a8aaa2c24	zenith_utils - add crashsafe_dir. Utility for creating directories and directory trees in a crash-safe manner. Minimizes calls to fsync for trees.	2021-10-01 11:41:39 -07:00
Stas Kelvich	3bac4d485d	Fix EncryptionResponse message in pq_proto.rs. A positive EncryptionResponse should send the 'S' byte, not 'Y'. With that fix it is possible to connect to the proxy with SSL enabled and read the deciphered notice text, but after the first query everything gets stuck.	2021-09-27 11:56:43 +03:00
sharnoff	a72707b8cb	Redo #655 with fix: Allow `LeSer`/`BeSer` impls missing either `Serialize` or `Deserialize`. Commit message copied below: * Allow LeSer/BeSer impls missing Serialize/Deserialize. Currently, using `LeSer` or `BeSer` requires that the type implements both `Serialize` and `DeserializeOwned`, even if we're only using the trait for one of those functionalities. Moving the bounds to the methods gives the convenience of the traits without requiring unnecessary derives. * Remove unused #[derive(Serialize/Deserialize)]. This should hopefully reduce compile times, if only by a little bit. Some of these were already unused (we weren't using LeSer/BeSer for the types), but most have become unused with the change to LeSer/BeSer.	2021-09-24 10:58:01 -07:00
Max Sharnoff	0f770967b4	Revert "Allow `LeSer`/`BeSer` impls missing either `Serialize` or `Deserialize` (#655)". This reverts commit `bd9f4794d9`.	2021-09-24 10:18:36 -07:00
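Several commits above touch PostgreSQL wire messages in pq_proto. Commit 3bac4d485d fixes the SSL reply byte. Commits 989371493b and 24eca8d58b add BackendKeyData and cancel-message parsing. The sketch below shows the byte layout of those three messages, based on the PostgreSQL frontend/backend protocol documentation. The function names are hypothetical and do not reflect pq_proto's actual API.

```rust
// Byte-level sketch of three PostgreSQL protocol messages. All integers are big-endian.
// Function names are hypothetical; they do not mirror pq_proto's API.

/// CancelRequest code: 1234 in the most significant 16 bits, 5678 in the least.
const CANCEL_REQUEST_CODE: u32 = (1234 << 16) | 5678;

/// Single-byte reply to an SSLRequest: 'S' to proceed with TLS, 'N' to refuse.
fn ssl_response(accept: bool) -> u8 {
    if accept {
        b'S'
    } else {
        b'N'
    }
}

/// BackendKeyData: tag 'K', Int32 length (12, counting itself),
/// Int32 process ID, Int32 secret key.
fn backend_key_data(pid: i32, secret: i32) -> Vec<u8> {
    let mut buf = Vec::with_capacity(13);
    buf.push(b'K');
    buf.extend_from_slice(&12i32.to_be_bytes());
    buf.extend_from_slice(&pid.to_be_bytes());
    buf.extend_from_slice(&secret.to_be_bytes());
    buf
}

/// Reads a big-endian u32 at `at`; the caller guarantees that 4 bytes are available.
fn be_u32(msg: &[u8], at: usize) -> u32 {
    u32::from_be_bytes([msg[at], msg[at + 1], msg[at + 2], msg[at + 3]])
}

/// Parses an untagged CancelRequest: Int32 length (16), Int32 cancel code,
/// Int32 process ID, Int32 secret key. Returns None if the framing does not match.
fn parse_cancel_request(msg: &[u8]) -> Option<(i32, i32)> {
    if msg.len() != 16 || be_u32(msg, 0) != 16 || be_u32(msg, 4) != CANCEL_REQUEST_CODE {
        return None;
    }
    Some((be_u32(msg, 8) as i32, be_u32(msg, 12) as i32))
}
```

To cancel a query, a client opens a new connection. It then sends a CancelRequest that echoes the process ID and secret key it received earlier in BackendKeyData. The server uses that pair to decide which backend to cancel.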
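Commits 44111e3ba3, 130184fee9 and 79f0e44a20 guard page reads and branch creation with a GC cutoff LSN. The check starts as a plain comparison, and 79f0e44a20 later puts the cutoff behind an RwLock, initialized from initdb_lsn. Below is a minimal sketch of that locking pattern. It assumes a simplified Timeline with `u64` LSNs and `String` errors; all names are hypothetical, not the pageserver's actual types.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the pageserver's LSN type.
type Lsn = u64;

/// Simplified timeline that holds only the GC cutoff (hypothetical, not the real struct).
struct Timeline {
    latest_gc_cutoff_lsn: RwLock<Lsn>,
}

impl Timeline {
    /// Starting the cutoff at initdb_lsn means a single `lsn >= cutoff`
    /// check also rejects pre-initdb LSNs.
    fn new(initdb_lsn: Lsn) -> Self {
        Timeline { latest_gc_cutoff_lsn: RwLock::new(initdb_lsn) }
    }

    fn check_lsn(lsn: Lsn, cutoff: Lsn) -> Result<(), String> {
        if lsn < cutoff {
            Err(format!("LSN {} is below latest GC cutoff {}", lsn, cutoff))
        } else {
            Ok(())
        }
    }

    /// Readers hold the read guard for the whole operation, so GC cannot
    /// advance the cutoff (and delete data) while the page is being read.
    fn get_page_at_lsn(&self, lsn: Lsn) -> Result<String, String> {
        let cutoff = self.latest_gc_cutoff_lsn.read().unwrap();
        Self::check_lsn(lsn, *cutoff)?;
        Ok(format!("page@{}", lsn)) // placeholder for the actual storage read
    }

    /// Branch creation applies the same check under the same lock.
    fn branch_at(&self, lsn: Lsn) -> Result<(), String> {
        let cutoff = self.latest_gc_cutoff_lsn.read().unwrap();
        Self::check_lsn(lsn, *cutoff)
    }

    /// GC takes the write lock, so it waits for in-flight readers to finish.
    /// The cutoff only ever moves forward.
    fn gc(&self, new_cutoff: Lsn) {
        let mut cutoff = self.latest_gc_cutoff_lsn.write().unwrap();
        if new_cutoff > *cutoff {
            *cutoff = new_cutoff;
        }
        // Data older than *cutoff would be removed here while the lock is held.
    }
}
```

Holding the read guard across the whole read is what closes the race. With only a snapshot check, GC could advance the cutoff and delete data between the check and the storage access.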
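Commit a72707b8cb moves the `Serialize`/`DeserializeOwned` bounds from the `LeSer`/`BeSer` traits onto their methods. The sketch below illustrates that technique with std-only stand-in traits. `Encode`, `Decode` and `BeSerLike` are hypothetical; the real traits use serde bounds. The point is that a type implementing only one direction can still implement the convenience trait.

```rust
// Hypothetical std-only stand-ins for serde's Serialize / DeserializeOwned.
trait Encode {
    fn encode_into(&self, out: &mut Vec<u8>);
}

trait Decode: Sized {
    fn decode_from(bytes: &[u8]) -> Option<Self>;
}

/// Convenience trait in the spirit of BeSer. The bounds sit on the methods,
/// not on the trait, so a type can implement it while supporting one direction only.
trait BeSerLike {
    fn ser(&self) -> Vec<u8>
    where
        Self: Encode,
    {
        let mut out = Vec::new();
        self.encode_into(&mut out);
        out
    }

    fn des(bytes: &[u8]) -> Option<Self>
    where
        Self: Decode + Sized,
    {
        Self::decode_from(bytes)
    }
}

/// Write-only type: implements Encode but not Decode.
struct Header(u32);

impl Encode for Header {
    fn encode_into(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_be_bytes());
    }
}

impl BeSerLike for Header {}

/// Read-only type: implements Decode but not Encode.
struct Ack(u8);

impl Decode for Ack {
    fn decode_from(bytes: &[u8]) -> Option<Self> {
        bytes.first().map(|&b| Ack(b))
    }
}

impl BeSerLike for Ack {}
```

`Header(7).ser()` compiles, but `Header::des(...)` would not, because `Header` does not implement `Decode`. With the bounds on the trait itself instead, both impls would need the unused direction too.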

126 Commits