rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Patrick Insinger	24c8dab86f	pageserver - parallelize checkpoint fsyncs	2022-01-04 20:40:57 -08:00
Dmitry Rodionov	c910132d4b	Fix wal receiver shutdown This patch allows to shutdown wal receiver when there are no messages and wal receiver is blocked inside tokio-postgres. In this case it cannot check the shutdown flag. This patch switches to use async interface of tokio-postgres directly without sync wrappers. It opens the possibility to use tokio::select! between the phsycal_stream.next() and a shutdown channel readiness to interrupt replication process. Also this allows to shutdown only particular wal receiver without using global shutdown_requested flag.	2021-12-29 14:42:29 +03:00
bojanserafimov	b807570f46	Use parking_lot::Mutex instead of std::Mutex in walreceiver (#1045 )	2021-12-23 14:25:44 -05:00
Kirill Bulatov	114a757d1c	Use generic config parameters in pageserver cli Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2021-12-23 18:58:28 +02:00
Kirill Bulatov	5cff7d1de9	Use proper download order	2021-12-14 15:32:22 +02:00
Kirill Bulatov	e61732ca7c	Compress checkpoint files before streaming into S3	2021-12-10 17:23:35 +02:00
Heikki Linnakangas	cb4a8396fb	Use rustls rather than native-tls in all dependencies. We depends on rustls in postgres_backend anyway, so might as well use it for all TLS stuff. Seems better to depend on only one library both from a security point of view, and because fewer dependencies means less code to compile. With this commit, we no longer depend on OpenSSL.	2021-12-10 15:14:27 +02:00
Heikki Linnakangas	9d369f158c	Update rust-s3 to version 0.28.0 0.28.0 includes two changes I submitted to upstream: - Add support for older ListObjects API, needed to use rust-s3 with Google Cloud Storage: https://github.com/durch/rust-s3/pull/229 - If file is smaller than one chunk, don't initiate multi-part upload. https://github.com/durch/rust-s3/pull/228 These are not critical for Zenith right now, but let's stay up-to-date.	2021-12-10 14:52:08 +02:00
Dmitry Ivanov	7cec13d1df	Improve shutdown story for code coverage This patch introduces fixes for several problems affecting LLVM-based code coverage: * Daemonizing parent processes should call _exit() to prevent coverage data file corruption (.profraw) due to concurrent writes. Implement proper shutdown handlers in safekeeper.	2021-12-06 13:27:52 +03:00
Kirill Bulatov	670205e17a	Evict excessively failing sync tasks, improve processing for the rest of the tasks	2021-11-30 13:58:49 +02:00
Heikki Linnakangas	9300107cdf	Cache Book objects, use virtual files to avoid running out of fds. Currently, whenever a page version is needed from an image or delta layer, we open the file and read and parse the bookfile headers. That's pretty expensive. To reduce the overhead, introduce a cache of open file descriptors, and use that to cache the Book objects so that we don't need to read the metadata on every access.	2021-11-10 17:19:37 +02:00
Heikki Linnakangas	b38e841f2d	Use poll() in communication with WAL redo process. The tokio futures added some overhead, so switch to plain non-blocking I/O with poll(). In a simple pgbench test on my laptop (select-only queries, scale-factor 1 `pgbench -P1 -T50 -S`), this gives about 10% improvement, from about 4300 TPS to 4800 TPS.	2021-11-04 10:39:04 +02:00
Dmitry Rodionov	5bc09074ea	add a flag to avoid non incremental size calculation in pageserver http api This calculation is not that heavy but it is needed only in tests, and in case the number of tenants/timelines is high the calculation can take noticeable time. Resolves https://github.com/zenithdb/zenith/issues/804	2021-10-27 13:30:34 +03:00
Kirill Bulatov	04fb0a0342	Add core relish backup and restore functionality	2021-10-22 22:22:38 +03:00
anastasia	d7c9dd06f4	Implement graceful shutdown at 'pageserver stop': - perform checkpoint for each tenant repository. - wait for the completion of all threads. Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.	2021-10-11 13:35:01 +03:00
Heikki Linnakangas	7216f22609	Use tracing crate to have more context in log messages. Whenever we start processing a request, we now enter a tracing "span" that includes context information like the tenant and timeline ID, and the operation we're performing. That context information gets attached to every log message we create within the span. That way, we don't need to include basic context information like that in every log message, and it also becomes easier to filter the logs programmatically. This removes the eplicit timeline and tenant IDs from most log messages, as you get that information from the enclosing span now. Also improve log messages in general, dialing down the level of some messages that are not very useful, and adding information to others. We now obey the RUST_LOG env variable, if it's set. The 'tracing' crate allows for different log formatters, like JSON or bunyan output. The one we use now is human-readable multi-line format, which is nice when reading the log directly, but hard for post-processing. For production, we'll probably want JSON output and some tools for working with it, but that's left as a TODO. The log format is easy to change.	2021-10-11 08:59:06 +03:00
Patrick Insinger	664b99b5ac	pageserver - use constant TIMELINE_ID for tests	2021-10-04 08:36:35 -07:00
Egor Suvorov	ddf4b15ebc	pageserver: use const_format crate to generate default listen addrs	2021-09-28 20:57:51 +03:00
Kirill Bulatov	1d5abf1253	Initial version of the relish storage	2021-09-17 15:30:22 +03:00
Kirill Bulatov	3ab60ce76f	Unify tokio deps and bump cargo resolver version	2021-09-15 16:00:08 +03:00
Dmitry Rodionov	84008a2560	factor out common logging initialisation routine This contains a lowest common denominator of pageserver and safekeeper log initialisation routines. It uses daemonize flag to decide where to stream log messages. In case daemonize is true log messages are forwarded to file. Otherwise streaming to stdout is used. Usage of stdout for log output is the default in docker side of things, so make it easier to browse our logs via builtin docker commands.	2021-09-14 18:09:14 +03:00
Heikki Linnakangas	de9d5e0aa4	Remove unnecessary dependencies. Found by "cargo udeps"	2021-08-25 18:51:15 +03:00
Heikki Linnakangas	5998744bcc	Remove rocksdb implementation. The layered storage format is good enough that we don't need the rocksdb implementation anymore. There are a lot of known issues but we'll keep working on them.	2021-08-25 18:37:22 +03:00
Dmitry Rodionov	23b5249512	translate pageserver api to http	2021-08-24 19:05:00 +03:00
Heikki Linnakangas	2450f82de5	Introduce a new "layered" repository implementation. This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes. This make the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>" The unit tests have been refactored to exercise both implementations. 'layered' is now the default. Push/pull is not implemented. The 'test_history_inmemory' test has been commented out accordingly. It's not clear how we will implement that functionality; probably by copying the snapshot files directly.	2021-08-16 10:06:48 +03:00
Dmitry Ivanov	cb1b4a12a6	Add some prometheus metrics to pageserver The metrics are served by an http endpoint, which is meant to be spawned in a new thread. In the future the endpoint will provide more APIs, but for the time being, we won't bother with proper routing.	2021-08-03 21:42:24 +03:00
Heikki Linnakangas	47824c5fca	Remove page server interactive mode. It was pretty cool, but no one used it, and it had gotten badly out of date. The main interesting thing with it was to see some basic metrics on the fly, while the page server is running, but the metrics collection had been broken for a long time, too. Best to just remove it.	2021-07-23 12:21:21 +03:00
Dmitry Rodionov	ed0fcfa9b7	replace parse_duration crate because of unpatched known vulnerability resolves #87	2021-07-16 14:30:27 +03:00
Patrick Insinger	cc169a6896	pageserver - config file To simplify cloud ops, allow configuration via file. toml is used as the config format, and the file is stored in the working directory. Arguments used at initialization are saved in the config file. Config file params may be overridden by CLI arguments.	2021-06-14 09:40:22 -07:00
Stas Kelvich	002cd8ed5b	Dockerfile for pageserver.	2021-06-01 16:08:32 +03:00
Heikki Linnakangas	34f4207501	Refactoring of the Repository/Timeline stuff - All timelines are now stored in the same rocksdb repository. The GET functions have been taught to follow the ancestors. - Change the way relation size is stored. Instead of inserting "tombstone" entries for blocks that are truncated away, store relation size as separate key-value entry for each relation - Add an abstraction for the key-value store: ObjectStore. It allows swapping RocksDB with some other key-value store easily. Perhaps we will write our own storage implementation using that interface, or perhaps we'll need a different abstraction, but this is a small improvement over status quo in any case. - Garbage Collection is broken and commented out. It's not clear where and how it should be implemented.	2021-05-27 20:07:50 +03:00
Stas Kelvich	746f667311	Refactor CLI and CLI<->pageserver interfaces to support remote pageserver This patch started as an effort to support CLI working against remote pageserver, but turned into a pretty big refactoring. * CLI now does not look into repository files directly. New commands 'branch_create' and 'identify_system' were introduced into page_service to support that. * Branch management that was scattered between local_env and zenith/main.rs is moved into pageserver/branches.rs. That code could better fit in Repository/Timeline impl, but I'll leave that for a different patch. * All tests-related code from local_env went into integration_tests/src/lib.rs as an extension to PostgresNode trait. * Paths-generating functions were concentrated around corresponding config types (LocalEnv and PageserverConf).	2021-05-17 19:17:51 +03:00
Patrick Insinger	99d80aba52	use pageserver for pg list command	2021-05-12 12:34:03 +03:00
Eric Seppanen	0cbb3798da	try using serde to do all the serialization in wal_service This version validates on every call that our result is exactly the same as the previous result. NodeId is a strange corner case: one field is serialized little-endian and one field is serialized big-endian. Hopefully we can fix that in the future.	2021-05-10 16:21:05 -07:00
Eric Seppanen	1767208563	remove tokio-postgres from dependencies	2021-05-10 15:24:55 -07:00
Eric Seppanen	4b46693c81	adapt to new upstream tokio-postgres replication interface Switch over to a newer version of rust-postgres PR752. A few minor changes are required: - PgLsn::UNDEFINED -> PgLsn::from(0) - PgTimestamp -> SystemTime	2021-05-10 15:24:55 -07:00
Eric Seppanen	df5a55c445	add workspace_hack crate Our builds can be a little inconsistent, because Cargo doesn't deal well with workspaces where there are multiple crates which have different dependencies that select different features. As a workaround, copy what other big rust projects do: add a workspace_hack crate. This crate just pins down a set of dependencies and features that satisfies all of the workspace crates. The benefits are: - running `cargo build` from one of the workspace subdirectories now works without rebuilding anything. - running `cargo install` works (without rebuilding anything). - making small dependency changes is much less likely to trigger large dependency rebuilds.	2021-05-07 13:08:31 -07:00
Eric Seppanen	2e0d45d092	Switch to upstream rust-s3 The local fork of rust-s3 has some code to support Google Cloud, but that PR no longer applies upstream, and will need significant changes before it can be re-submitted. In the meantime, we might as well just use the most similar upstream release. The benefit of switching is that it fixes a feature-resolution bug that was causing us to build 24 more crates than needed (mostly async-std and its dependencies).	2021-05-04 12:02:00 -07:00
Eric Seppanen	a3818dee58	pin dependencies to versions If there isn't any version specified for a dependency crate, Cargo may choose a newer version. This could happen when Cargo.lock is updated ("cargo update") but can also happen unexpectedly when adding or changing other dependencies. This can allow API-breaking changes to be picked up, breaking the build. To prevent this, specify versions for all dependencies. Cargo is still allowed to pick newer versions that are (hopefully) non-breaking, by analyzing the semver version number. There are two special cases here: 1. serde_derive::{Serialize, Deserialize} isn't really used any more. It was only a separate crate in the past because of compiler limitations. Nowadays, people turn on the "derive" feature of the serde crate and use serde::{Serialize, Deserialize}. 2. parse_duration is unmaintained and has an open security issue. (gh iss. 87) That issue probably isn't critical for us because of where we use that crate, but it's probably still better to pin the version so we can't get hit with an API-breaking change at an awkward time.	2021-05-03 14:02:10 -07:00
Heikki Linnakangas	93d7d2ae2a	Refactor pagecache <-> Wal redo communication After the rocksdb patch (commit `6aa38d3f7d`), the CacheEntry struct was used only momentarily in the communication between the page_cache and the walredo modules. It was in fact not stored in any cache anymore. For clarity, refactor the communication. There is now a WalRedoManager struct, with `request_redo` function, that can be used to request WAL replay of a particular page. It sends a request to a queue like before, but the queue has been replaced with tokio::sync::mpsc. Previously, the resulting page image was stored directly in the CacheEntry, and the requestor was notified using a condition variable. Now, the requestor includes a 'oneshot' channel in the request, and the WAL redo manager sends the response there.	2021-04-24 12:24:04 +03:00
Konstantin Knizhnik	ed30f2096c	Disable GC by default	2021-04-22 11:30:27 +03:00
Konstantin Knizhnik	da9508716d	Address issues from Eric's review	2021-04-22 10:37:52 +03:00
Konstantin Knizhnik	9e7c45cb72	Merge with master	2021-04-22 09:45:13 +03:00
Eric Seppanen	2cd730d31f	page_cache: replace long mutex sleep with SeqWait When calling into the page cache, it was possible to wait on a blocking mutex, which can stall the async executor. Replace that sleep with a SeqWait::wait_for(lsn).await so that the executor can go on with other work while we wait. Change walreceiver_works to an AtomicBool to avoid the awkwardness of taking the lock, then dropping it while we call wait_for and then acquiring it again to do real work.	2021-04-21 18:02:13 -07:00
Konstantin Knizhnik	4f3f0304c2	Merge branch 'main' into rocksdb_pageserver	2021-04-21 19:05:02 +03:00
Konstantin Knizhnik	c981f4ad66	Implement garbage collection of unused versions	2021-04-21 19:04:30 +03:00
Heikki Linnakangas	e911427872	Remove some unnecessary dependencies	2021-04-21 16:42:12 +03:00
Konstantin Knizhnik	07507274c0	Merge branch 'main' into rocksdb_pageserver	2021-04-21 16:06:31 +03:00
Heikki Linnakangas	3600b33f1c	Implement "timelines" in page server This replaces the page server's "datadir" concept. The Page Server now always works with a "Zenith Repository". When you initialize a new repository with "zenith init", it runs initdb and loads an initial basebackup of the freshly-created cluster into the repository, on "main" branch. Repository can hold multiple "timelines", which can be given human-friendly names, making them "branches". One page server simultaneously serves all timelines stored in the repository, and you can have multiple Postgres compute nodes connected to the page server, as long they all operate on a different timeline. There is a new command "zenith branch", which can be used to fork off new branches from existing branches. The repository uses the directory layout desribed as Repository format v1 in https://github.com/zenithdb/rfcs/pull/5. It it highly inefficient: - we never create new snapshots. So in practice, it's really just a base backup of the initial empty cluster, and everything else is reconstructed by redoing all WAL - when you create a new timeline, the base snapshot and all WAL is copied from the new timeline to the new one. There is no smarts about referencing the old snapshots/wal from the ancestor timeline. To support all this, this commit includes a bunch of other changes: - Implement "basebackup" funtionality in page server. When you initialize a new compute node with "zenith pg create", it connects to the page server, and requests a base backup of the Postgres data directory on that timeline. (the base backup excludes user tables, so it's not as bad as it sounds). - Have page server's WAL receiver write the WAL into timeline dir. This allows running a Page Server and Compute Nodes without a WAL safekeeper, until we get around to integrate that properly into the system. (Even after we integrate WAL safekeeper, this is perhaps how this will operate when you want to run the system on your laptop.) - restore_datadir.rs was renamed to restore_local_repo.rs, and heavily modified to use the new format. It now also restores all WAL. - Page server no longer scans and restores everything into memory at startup. Instead, when the first request is made for a timeline, the timeline is slurped into memory at that point. - The responsibility for telling page server to "callmemaybe" was moved into Postgres libpqpagestore code. Also, WAL producer connstring cannot be specified in the pageserver's command line anymore. - Having multiple "system identifiers" in the same page server is no longer supported. I repurposed much of that code to support multiple timelines, instead. - Implemented very basic, incomplete, support for PostgreSQL's Extended Query Protocol in page_service.rs. Turns out that rust-postgres' copy_out() function always uses the extended query protocol to send out the command, and I'm using that to stream the base backup from the page server. TODO: I haven't fixed the WAL safekeeper for this scheme, so all the integration tests involving safekeepers are failing. My plan is to modify the safekeeper to know about Zenith timelines, too, and modify it to work with the same Zenith repository format. It only needs to care about the '.zenith/timelines/<timeline>/wal' directories.	2021-04-20 19:11:27 +03:00
Konstantin Knizhnik	95160dee6d	Merge with main branch	2021-04-19 17:00:30 +03:00

1 2

57 Commits