rust/neon - neon - Gitea: Git with a cup of tea

mirror of https://github.com/neondatabase/neon.git synced 2026-01-05 04:22:56 +00:00

Author	SHA1	Message	Date
anastasia	cbeb67067c	Issue #367. Change CLI so that we always create the node from scratch at 'pg start'. This operation preserves the previously existing config. Add new flag '--config-only' to 'pg create'. If this flag is passed, don't perform basebackup; just fill in the initial postgresql.conf for the node.	2021-08-17 18:12:31 +03:00
Heikki Linnakangas	f37cb21305	Update Cargo.lock for addition of 'bincode' Commit `5eb1738e8b` added a dependency to the 'bincode' crate. 'cargo build' adds it to Cargo.lock automatically, so let's remember it.	2021-08-16 19:24:26 +03:00
Heikki Linnakangas	2450f82de5	Introduce a new "layered" repository implementation. This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes. This makes the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>". The unit tests have been refactored to exercise both implementations. 'layered' is now the default. Push/pull is not implemented. The 'test_history_inmemory' test has been commented out accordingly. It's not clear how we will implement that functionality; probably by copying the snapshot files directly.	2021-08-16 10:06:48 +03:00
Dmitry Rodionov	ce5333656f	Introduce authentication v0.1. Current state of authentication: the page server validates the JWT token passed as a password during the connection phase. Later, when performing an action such as creating a branch, it validates that the operation's tenant parameter matches the one in the token. To allow access from the console there is a dedicated scope, PageServerApi, which allows access to all tenants. See PageServerHandler::check_permission for the access validation code. We are in the middle of refactoring the communication layer, including the WAL proposer protocol and safekeeper<->pageserver communication. Because of that, the safekeeper currently doesn't check the token passed from compute, and uses a "hardcoded" token passed via an environment variable to communicate with the pageserver. Compute postgres now takes the token from an environment variable and passes it as the password field in the pageserver connection. It is not passed through settings because then the user would be able to retrieve it using pg_settings or SHOW. I've added a basic test in test_auth.py. After we add authentication to the remaining network paths, we should probably enable it by default and switch all existing tests to use it.	2021-08-11 20:05:54 +03:00
anastasia	cc877f1980	Add unit test for find_end_of_wal(). Based on a previous attempt to add the same test by @lubennikovaav. Now WAL files are generated by the initdb command.	2021-08-10 12:30:21 +03:00
Dmitry Ivanov	cb1b4a12a6	Add some prometheus metrics to pageserver The metrics are served by an http endpoint, which is meant to be spawned in a new thread. In the future the endpoint will provide more APIs, but for the time being, we won't bother with proper routing.	2021-08-03 21:42:24 +03:00
Heikki Linnakangas	9ff122835f	Refactor ObjectTags, introducing a new concept called "relish". This clarifies (I hope) the abstractions between Repository and ObjectRepository. The ObjectTag struct was a mix of objects that could be accessed directly through the public Timeline interface, and also objects that were created and used internally by the ObjectRepository implementation and not supposed to be accessed directly by the callers. With the RelishTag separate from ObjectTag, the distinction is clearer: RelishTag is used in the public interface, and ObjectTag is used internally between object_repository.rs and object_store.rs, and it contains the internal metadata object types. One awkward thing with the ObjectTag struct was that the Repository implementation had to distinguish between ObjectTags for relations, and track the size of the relation, while others were used to store "blobs". With the RelishTags, some relishes are considered "blocky", and the Repository implementation is expected to track their sizes, while others are stored as blobs. I'm not 100% happy with how RelishTag captures that either: it just knows that some relish kinds are blocky and some non-blocky, and there's an is_block() function to check that. But this does enable size-tracking for SLRUs, allowing us to treat them more like relations. This changes the way SLRUs are stored in the repository. Each SLRU segment, e.g. "pg_clog/0000", "pg_clog/0001", is now handled as a separate relish. This removes the need for the SLRU-specific put_slru_truncate() function in the Timeline trait. SLRU truncation is now handled by calling put_unlink() on the segment. This is more in line with how PostgreSQL stores SLRUs and handles their truncation. The SLRUs are "blocky", so they are accessed one 8k page at a time, and the repository tracks their size. I considered an alternative design where we would treat each SLRU segment as non-blocky, and just store the whole file as one blob. Each SLRU segment is up to 256 kB in size, which isn't that large, so that might've worked fine, too. One reason I didn't do that is that it seems better to have the WAL redo routines be as close as possible to the PostgreSQL routines. It doesn't matter much in the repository, though; we have to track the size for relations anyway, so there's not much difference in whether we also do it for SLRUs. While working on this, I noticed that the CLOG and MultiXact redo code did not handle wraparound correctly. We need to fix that, but for now, I just commented them out with a FIXME comment.	2021-08-03 14:01:05 +03:00
anastasia	14b6796915	Send pgdata subdirs with basebackup. Fix for `1e6267a`.	2021-07-25 17:46:47 +03:00
Heikki Linnakangas	47824c5fca	Remove page server interactive mode. It was pretty cool, but no one used it, and it had gotten badly out of date. The main interesting thing with it was to see some basic metrics on the fly, while the page server is running, but the metrics collection had been broken for a long time, too. Best to just remove it.	2021-07-23 12:21:21 +03:00
Dmitry Rodionov	767590bbd5	Support tenants. This patch adds support for tenants. This mostly touches the pageserver. The on-disk directory layout is changed to contain a new layer of indirection. Now the path to a particular repository has the following structure: <pageserver workdir>/tenants/<tenant id>. Tenant id has the same format as timeline id. Tenant id is included in pageserver commands when needed. Also, new commands are available in the pageserver: tenant_list, tenant_create. This is also reflected in the CLI. During init, a default tenant is created and its id is saved in the CLI config, so subsequent commands can use it without extra options. Tenant id is also included in the compute postgres configuration, so it can be passed via ServerInfo to the safekeeper and in the connection string to the pageserver. For more info see docs/multitenancy.md.	2021-07-22 20:54:20 +03:00
Stas Kelvich	aa404b60fe	change mgmt json format; add cli flags	2021-07-19 14:52:41 +03:00
Stas Kelvich	1b6d99db7c	unfreeze client session upon callback	2021-07-19 14:52:41 +03:00
Stas Kelvich	605b90c6c7	do an actual proxy pass	2021-07-19 14:52:41 +03:00
Stas Kelvich	bf45bef284	md5 auth for postgres_backend.rs	2021-07-19 14:52:41 +03:00
Dmitry Rodionov	8a541147e2	run cargo generate-lockfile It removes remaining issues with running cargo audit. There was one error and one warning: Crate: tokio Version: 1.5.0 Title: Task dropped in wrong thread when aborting `LocalSet` task Date: 2021-07-07 ID: RUSTSEC-2021-0072 URL: https://rustsec.org/advisories/RUSTSEC-2021-0072 Solution: Upgrade to >=1.5.1, <1.6.0 OR >=1.6.3, <1.7.0 OR >=1.7.2, <1.8.0 OR >=1.8.1 Crate: cpuid-bool Version: 0.1.2 Warning: unmaintained Title: `cpuid-bool` has been renamed to `cpufeatures` Date: 2021-05-06 ID: RUSTSEC-2021-0064 URL: https://rustsec.org/advisories/RUSTSEC-2021-0064	2021-07-16 15:04:56 +03:00
Dmitry Rodionov	ed0fcfa9b7	replace parse_duration crate because of an unpatched known vulnerability; resolves #87	2021-07-16 14:30:27 +03:00
Dmitry Rodionov	75e717fe86	allow both domains and ip addresses in connection options for pageserver and wal keeper. Also updated PageServerNode definition in control plane to account for that. resolves #303	2021-07-09 16:46:21 +03:00
Arseny Sher	37b0236e9a	Move wal acceptor tests to python. Includes fixtures for wal acceptors and associated setup. Nothing really new here, but surprisingly this caught some issues in walproposer. ref #182	2021-06-15 15:14:27 +03:00
Patrick Insinger	cc169a6896	pageserver - config file To simplify cloud ops, allow configuration via file. toml is used as the config format, and the file is stored in the working directory. Arguments used at initialization are saved in the config file. Config file params may be overridden by CLI arguments.	2021-06-14 09:40:22 -07:00
Arseny Sher	b2f51026aa	Consolidate PG proto parsing-deparsing and backend code. Now postgres_backend communicates with the client, passing queries to the provided handler; we have two currently, for wal_acceptor and pageserver. Now BytesMut is again used for writing data to avoid manual message length calculation. ref #118	2021-06-08 17:31:40 +03:00
Heikki Linnakangas	a7ae552851	Use rust memoffset crate to replace C offsetof(). Cherry-picked from Eric's PR #208	2021-06-04 23:05:28 +03:00
Heikki Linnakangas	34f4207501	Refactoring of the Repository/Timeline stuff - All timelines are now stored in the same rocksdb repository. The GET functions have been taught to follow the ancestors. - Change the way relation size is stored. Instead of inserting "tombstone" entries for blocks that are truncated away, store relation size as separate key-value entry for each relation - Add an abstraction for the key-value store: ObjectStore. It allows swapping RocksDB with some other key-value store easily. Perhaps we will write our own storage implementation using that interface, or perhaps we'll need a different abstraction, but this is a small improvement over status quo in any case. - Garbage Collection is broken and commented out. It's not clear where and how it should be implemented.	2021-05-27 20:07:50 +03:00
Heikki Linnakangas	538f903861	Optimize parse_relfilename() function. Compiling a Regex is very expensive, so let's not do it on every invocation. This was consuming a big fraction of the time in creating a new base backup at "zenith pg create". This commit brings down the time to run "zenith pg create" on a freshly created repository from about 2 seconds to 1 second. It's not worth spending much effort on optimizing things at this stage in general, but might as well pick low-hanging fruit like this.	2021-05-19 14:08:37 +03:00
Stas Kelvich	746f667311	Refactor CLI and CLI<->pageserver interfaces to support remote pageserver This patch started as an effort to support CLI working against remote pageserver, but turned into a pretty big refactoring. * CLI now does not look into repository files directly. New commands 'branch_create' and 'identify_system' were introduced into page_service to support that. * Branch management that was scattered between local_env and zenith/main.rs is moved into pageserver/branches.rs. That code could better fit in Repository/Timeline impl, but I'll leave that for a different patch. * All tests-related code from local_env went into integration_tests/src/lib.rs as an extension to PostgresNode trait. * Paths-generating functions were concentrated around corresponding config types (LocalEnv and PageserverConf).	2021-05-17 19:17:51 +03:00
Heikki Linnakangas	b266c28345	Use common Lsn datatype in a few more places This isn't just cosmetic, this also fixes one bug: the code in parse_point_in_time() function used str::parse::<u64>() to parse the parts of the LSN string (e.g. 0/1A2B3C4D). That's wrong, because the LSN consists of hex digits, not base-10.	2021-05-17 10:07:42 +03:00
Patrick Insinger	99d80aba52	use pageserver for pg list command	2021-05-12 12:34:03 +03:00
Eric Seppanen	e5df42feef	add workspace_hack dependency to zenith_utils I didn't think this mattered, but it does: if you add a dependency to zenith_utils, but forget to request a feature you need, the crate will build from the workspace root, but not by itself. It's probably better to pull in the whole dependency tree. This leaves one problem unsolved: the missing feature above will now be a latent bug. If that feature gets removed later by other crates, and then the workspace_hack Cargo.toml is updated, this missing feature will become a build failure.	2021-05-10 18:21:45 -07:00
Eric Seppanen	8d8bc304c1	work around NodeId endian issues Instead of playing games during serialize/deserialize, just treat NodeId::term as an 8-byte array instead of a u64.	2021-05-10 16:21:05 -07:00
Eric Seppanen	0cbb3798da	try using serde to do all the serialization in wal_service This version validates on every call that our result is exactly the same as the previous result. NodeId is a strange corner case: one field is serialized little-endian and one field is serialized big-endian. Hopefully we can fix that in the future.	2021-05-10 16:21:05 -07:00
Eric Seppanen	36c12247b9	add bin_ser module This module adds two traits that implement bincode-based serialization. BeSer implements methods for big-endian encoding/decoding. LeSer implements methods for little-endian encoding/decoding. Right now, the BeSer and LeSer methods have the same names, meaning you can't `use` them both at the same time. This is intended to be a safety mechanism: mixing big-endian and little-endian encoding in the same file is error-prone. There are ways around this, but the easiest fix is to put the big-endian code and little-endian code in different files or submodules.	2021-05-10 16:21:05 -07:00
Eric Seppanen	1767208563	remove tokio-postgres from dependencies	2021-05-10 15:24:55 -07:00
Eric Seppanen	4b46693c81	adapt to new upstream tokio-postgres replication interface Switch over to a newer version of rust-postgres PR752. A few minor changes are required: - PgLsn::UNDEFINED -> PgLsn::from(0) - PgTimestamp -> SystemTime	2021-05-10 15:24:55 -07:00
Eric Seppanen	df5a55c445	add workspace_hack crate Our builds can be a little inconsistent, because Cargo doesn't deal well with workspaces where there are multiple crates which have different dependencies that select different features. As a workaround, copy what other big rust projects do: add a workspace_hack crate. This crate just pins down a set of dependencies and features that satisfies all of the workspace crates. The benefits are: - running `cargo build` from one of the workspace subdirectories now works without rebuilding anything. - running `cargo install` works (without rebuilding anything). - making small dependency changes is much less likely to trigger large dependency rebuilds.	2021-05-07 13:08:31 -07:00
Heikki Linnakangas	e5e5c3e067	Tidy up the `parse_relfilename` function. A few things that Eric commented on at PR #96: - Use thiserror to simplify the implementation of FilePathError - Add unit tests - Fix a few complaints from clippy	2021-05-07 11:01:34 +03:00
Heikki Linnakangas	61af9bb889	Move a few functions that have been copy-pasted around to shared module.	2021-05-06 21:57:10 +03:00
Eric Seppanen	2e0d45d092	Switch to upstream rust-s3 The local fork of rust-s3 has some code to support Google Cloud, but that PR no longer applies upstream, and will need significant changes before it can be re-submitted. In the meantime, we might as well just use the most similar upstream release. The benefit of switching is that it fixes a feature-resolution bug that was causing us to build 24 more crates than needed (mostly async-std and its dependencies).	2021-05-04 12:02:00 -07:00
Eric Seppanen	aac913f9dc	use nix kill instead of spawning a process Since we are now calling the syscall directly, read_pidfile can now parse an integer. We also verify the pid is >= 1, because calling kill on 0 or negative values goes straight to crazytown.	2021-05-03 23:20:51 -07:00
Eric Seppanen	a3818dee58	pin dependencies to versions If there isn't any version specified for a dependency crate, Cargo may choose a newer version. This could happen when Cargo.lock is updated ("cargo update") but can also happen unexpectedly when adding or changing other dependencies. This can allow API-breaking changes to be picked up, breaking the build. To prevent this, specify versions for all dependencies. Cargo is still allowed to pick newer versions that are (hopefully) non-breaking, by analyzing the semver version number. There are two special cases here: 1. serde_derive::{Serialize, Deserialize} isn't really used any more. It was only a separate crate in the past because of compiler limitations. Nowadays, people turn on the "derive" feature of the serde crate and use serde::{Serialize, Deserialize}. 2. parse_duration is unmaintained and has an open security issue. (gh iss. 87) That issue probably isn't critical for us because of where we use that crate, but it's probably still better to pin the version so we can't get hit with an API-breaking change at an awkward time.	2021-05-03 14:02:10 -07:00
Eric Seppanen	7d104e5660	update dependencies Running 'cargo update' happens to synchronize a few transitive dependencies, allowing us to build slightly fewer crates.	2021-05-02 16:01:18 -07:00
Konstantin Knizhnik	3b09a74f58	Implement offloading of old WAL files to S3 in walkeeper	2021-04-26 16:23:00 +03:00
Heikki Linnakangas	3b9e7fc5e6	Use explicit threads. Remove 'async' usage as much as feasible. Async code is harder to debug, and mixing async and non-async code is a recipe for confusion and bugs. There are a couple of exceptions: - The code in walredo.rs, which needs to read and write to the child process simultaneously, still uses async. It's more convenient there. The 'async' usage is carefully limited to just the functions that communicate with the child process. - Code in walreceiver.rs that uses tokio-postgres to do streaming replication. We have to use async there, because tokio-postgres is async. Most rust-postgres functionality has non-async wrappers, but not the new replication client code. The async usage is very limited here, too: we use just block_on to call the tokio-postgres functions. The code in 'page_service.rs' now launches a dedicated thread for each connection. This replaces tokio::sync::watch::channel with std::sync::mpsc in 'seqwait.rs', to make that non-async. It's not a drop-in replacement, though: std::sync::mpsc doesn't support multiple consumers, so we cannot share a channel between multiple waiters. So this removes the code to check if an existing channel can be reused, and creates a new one for each waiter. That created another problem: BTreeMap cannot hold duplicates, so I replaced that with BinaryHeap. Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo manager and PageCache are replaced with std::sync::mpsc. (There is no separate 'oneshot' channel in the standard library.) Fixes github issue #58, and coincidentally also issue #66.	2021-04-26 13:07:51 +03:00
Heikki Linnakangas	93d7d2ae2a	Refactor pagecache <-> Wal redo communication After the rocksdb patch (commit `6aa38d3f7d`), the CacheEntry struct was used only momentarily in the communication between the page_cache and the walredo modules. It was in fact not stored in any cache anymore. For clarity, refactor the communication. There is now a WalRedoManager struct, with `request_redo` function, that can be used to request WAL replay of a particular page. It sends a request to a queue like before, but the queue has been replaced with tokio::sync::mpsc. Previously, the resulting page image was stored directly in the CacheEntry, and the requestor was notified using a condition variable. Now, the requestor includes a 'oneshot' channel in the request, and the WAL redo manager sends the response there.	2021-04-24 12:24:04 +03:00
Konstantin Knizhnik	28f2800275	Merge branch 'main' into rocksdb_pageserver	2021-04-22 14:00:57 +03:00
Heikki Linnakangas	8af5cbedb1	Move xlog_utils.rs to postgres_ffi module. I had copy-pasted these functions to a few other places. Clean that up, move them to a common module, and add some comments.	2021-04-22 13:22:34 +03:00
Konstantin Knizhnik	ed30f2096c	Disable GC by default	2021-04-22 11:30:27 +03:00
Konstantin Knizhnik	da9508716d	Address issues from Eric's review	2021-04-22 10:37:52 +03:00
Konstantin Knizhnik	9e7c45cb72	Merge with master	2021-04-22 09:45:13 +03:00
Eric Seppanen	2cd730d31f	page_cache: replace long mutex sleep with SeqWait When calling into the page cache, it was possible to wait on a blocking mutex, which can stall the async executor. Replace that sleep with a SeqWait::wait_for(lsn).await so that the executor can go on with other work while we wait. Change walreceiver_works to an AtomicBool to avoid the awkwardness of taking the lock, then dropping it while we call wait_for and then acquiring it again to do real work.	2021-04-21 18:02:13 -07:00
Eric Seppanen	8060e17b50	add SeqWait SeqWait adds a way to .await the arrival of some sequence number. It provides wait_for(num) which is an async fn, and advance(num) which is synchronous. This should be useful in solving the page cache deadlocks, and may be useful in other areas too. This implementation still uses a Mutex internally, but only for a brief critical section. If we find this code broadly useful and start to care more about executor stalls due to unfair thread scheduling, there might be ways to make it lock-free.	2021-04-21 18:02:13 -07:00
Konstantin Knizhnik	4f3f0304c2	Merge branch 'main' into rocksdb_pageserver	2021-04-21 19:05:02 +03:00

81 Commits
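Commit 9ff122835f stores each SLRU segment (e.g. pg_clog/0001) as its own blocky relish. With PostgreSQL's default of 32 pages per SLRU segment and 8 kB pages, a segment is 256 kB. A page number then maps to a segment relish and a block inside it. The sketch below assumes those defaults. The `RelishTag` variants and both helper functions are a hypothetical simplification, not the project's definitions.

```rust
/// PostgreSQL defaults (SLRU_PAGES_PER_SEGMENT and BLCKSZ); assumed here.
const SLRU_PAGES_PER_SEGMENT: u32 = 32;
const BLCKSZ: u32 = 8192;

/// Hypothetical, simplified relish tag: blocky relations and SLRU segments,
/// plus one non-blocky kind that would be stored as a single blob.
#[derive(Debug, PartialEq, Eq)]
enum RelishTag {
    Relation { relnode: u32 },
    Slru { dir: &'static str, segno: u32 },
    FileNodeMap,
}

impl RelishTag {
    /// Blocky relishes are read one 8k page at a time and have a tracked size.
    fn is_block(&self) -> bool {
        !matches!(self, RelishTag::FileNodeMap)
    }
}

/// Map an SLRU page number to (segment relish, block number within that segment).
fn slru_page_location(dir: &'static str, pageno: u32) -> (RelishTag, u32) {
    let segno = pageno / SLRU_PAGES_PER_SEGMENT;
    (RelishTag::Slru { dir, segno }, pageno % SLRU_PAGES_PER_SEGMENT)
}

/// SLRU segment files are named by segment number in 4-digit uppercase hex.
fn slru_segment_name(dir: &str, segno: u32) -> String {
    format!("{}/{:04X}", dir, segno)
}
```

For example, pg_clog page 40 lives in segment 1 ("pg_clog/0001") at block 8. Truncating the SLRU then amounts to unlinking whole segment relishes.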
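Commit cc169a6896 saves initialization arguments to a TOML config file in the working directory and lets CLI arguments override file values. The precedence rule can be sketched with `Option` fields, where a value given on the command line wins over the file's value. `PageServerConf` is the config type named in commit 746f667311. The two fields and the `overridden_by` method are hypothetical, and TOML parsing is left out.

```rust
/// Hypothetical subset of pageserver settings; None means "not specified".
#[derive(Debug, Clone, Default, PartialEq)]
struct PageServerConf {
    listen_addr: Option<String>,
    gc_horizon: Option<u64>,
}

impl PageServerConf {
    /// Values present in `cli` take precedence; otherwise keep the file's value.
    fn overridden_by(self, cli: PageServerConf) -> PageServerConf {
        PageServerConf {
            listen_addr: cli.listen_addr.or(self.listen_addr),
            gc_horizon: cli.gc_horizon.or(self.gc_horizon),
        }
    }
}
```

Under this scheme, a CLI run that specifies nothing leaves the file's configuration unchanged.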
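Commit 538f903861 avoids recompiling a Regex on every parse_relfilename() call, and e5e5c3e067 adds FilePathError and unit tests. A hand-written parser is an alternative that needs no regex at all. The sketch below assumes PostgreSQL's relation file naming `<relfilenode>[_<fork>][.<segno>]`. Here fork is fsm, vm or init, and the main fork and segment 0 carry no suffix. The return type, the single `FilePathError` variant and the `parse_digits` helper are hypothetical, not the project's code.

```rust
#[derive(Debug, PartialEq)]
enum FilePathError {
    InvalidFileName,
}

/// Parse a non-empty run of ASCII digits (rejects "+5", " 5", overflow, etc.).
fn parse_digits(s: &str) -> Result<u32, FilePathError> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return Err(FilePathError::InvalidFileName);
    }
    s.parse().map_err(|_| FilePathError::InvalidFileName)
}

/// Hypothetical sketch: "16384" -> (16384, None, 0), "16384_vm.1" -> (16384, Some("vm"), 1).
fn parse_relfilename(fname: &str) -> Result<(u32, Option<&str>, u32), FilePathError> {
    // Optional ".<segno>" suffix; the first segment has none.
    let (base, segno) = match fname.split_once('.') {
        Some((base, seg)) => (base, parse_digits(seg)?),
        None => (fname, 0),
    };
    // Optional "_<fork>" suffix; the main fork has none.
    let (relnode, fork) = match base.split_once('_') {
        Some((rel, fork)) if matches!(fork, "fsm" | "vm" | "init") => (rel, Some(fork)),
        Some(_) => return Err(FilePathError::InvalidFileName),
        None => (base, None),
    };
    Ok((parse_digits(relnode)?, fork, segno))
}
```

Keeping a regex and compiling it once is equally valid. This version simply makes the accepted grammar explicit in code.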
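Commit b266c28345 fixes parse_point_in_time(), which parsed the parts of an LSN string such as 0/1A2B3C4D as base-10 integers. Below is a minimal sketch of correct parsing. It assumes the usual PostgreSQL text form `<high 32 bits in hex>/<low 32 bits in hex>`. The functions `parse_lsn` and `format_lsn` are hypothetical, not the project's `Lsn` API.

```rust
/// Hypothetical helper: parse a PostgreSQL-style LSN string such as "0/1A2B3C4D".
/// Both halves are hexadecimal; the part before '/' holds the high 32 bits.
fn parse_lsn(s: &str) -> Option<u64> {
    let (hi, lo) = s.split_once('/')?;
    // Radix 16 is the fix: str::parse::<u64>() would read the digits as base-10.
    let hi = u32::from_str_radix(hi, 16).ok()?;
    let lo = u32::from_str_radix(lo, 16).ok()?;
    Some((u64::from(hi) << 32) | u64::from(lo))
}

/// Hypothetical inverse: format a 64-bit LSN back into "HI/LO" hex form.
fn format_lsn(lsn: u64) -> String {
    format!("{:X}/{:X}", lsn >> 32, lsn & 0xFFFF_FFFF)
}
```

Base-10 parsing fails outright on halves containing hex letters. Worse, it silently returns a wrong value for all-digit halves: "10" is read as ten instead of sixteen.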
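Commit aac913f9dc makes read_pidfile parse an integer and reject pids below 1 before signalling. The guard follows from POSIX kill(2) semantics:

- pid 0 signals every process in the caller's process group.
- pid -1 signals every process the caller is permitted to signal.
- Other negative values signal an entire process group.

Below is a minimal std-only sketch of the validation step. The name `parse_pidfile` is hypothetical. In the commit, the validated pid is then passed to the nix crate's kill, which is not shown here.

```rust
/// Hypothetical sketch: validate pidfile contents before signalling the pid.
fn parse_pidfile(contents: &str) -> Result<i32, String> {
    let pid: i32 = contents
        .trim()
        .parse()
        .map_err(|e| format!("invalid pid {:?}: {}", contents.trim(), e))?;
    // kill() with 0 or a negative pid targets groups of processes, never one process.
    if pid < 1 {
        return Err(format!("refusing to signal pid {}", pid));
    }
    Ok(pid)
}
```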
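Commit 93d7d2ae2a has each requester include a reply channel in its WAL redo request. Commit 3b9e7fc5e6 then swaps the tokio channels for std::sync::mpsc. Since the standard library has no oneshot channel, a single-use mpsc channel takes that role. Below is a minimal sketch of the request/response pattern. `WalRedoManager` and `request_redo` are named in the commits. `RedoRequest`, `start`, the page-number key and the fabricated page contents are hypothetical stand-ins for real WAL replay.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical request type: the page to reconstruct plus a one-use reply channel.
struct RedoRequest {
    page_no: u32,
    reply: Sender<Vec<u8>>,
}

/// Sketch of the manager: the sending side of a queue served by one worker thread.
struct WalRedoManager {
    requests: Sender<RedoRequest>,
}

impl WalRedoManager {
    /// Hypothetical constructor that spawns the worker thread.
    fn start() -> WalRedoManager {
        let (tx, rx) = channel::<RedoRequest>();
        thread::spawn(move || {
            // Serve requests until every Sender for the queue has been dropped.
            for req in rx {
                // Placeholder for real WAL replay: an 8 kB page stamped with the page number.
                let mut page = vec![0u8; 8192];
                page[..4].copy_from_slice(&req.page_no.to_le_bytes());
                // The requester may have given up waiting; ignore send errors.
                let _ = req.reply.send(page);
            }
        });
        WalRedoManager { requests: tx }
    }

    /// Send one request and block until its reply arrives.
    fn request_redo(&self, page_no: u32) -> Option<Vec<u8>> {
        let (reply_tx, reply_rx) = channel();
        self.requests.send(RedoRequest { page_no, reply: reply_tx }).ok()?;
        reply_rx.recv().ok()
    }
}
```

Giving each request its own reply channel means the worker never has to know who is waiting. It answers on whatever sender arrives with the request.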
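SeqWait (commit 8060e17b50) provides wait_for(num) and advance(num) for waiting until a sequence number arrives. Commit 3b9e7fc5e6 later made it synchronous. Each waiter gets its own std::sync::mpsc channel, because std channels have a single consumer. Waiters are kept in a BinaryHeap rather than a BTreeMap, so duplicate sequence numbers are allowed. Below is a minimal sketch of that design, assuming u64 sequence numbers. The `Waiter` and `State` types and the `load` method are hypothetical.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;

/// One blocked caller: the number it waits for and a channel to wake it.
struct Waiter {
    seq: u64,
    wake: Sender<()>,
}

// Compare by `seq` only, reversed, so the max-heap yields the smallest seq first.
impl PartialEq for Waiter {
    fn eq(&self, other: &Self) -> bool {
        self.seq == other.seq
    }
}
impl Eq for Waiter {}
impl PartialOrd for Waiter {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Waiter {
    fn cmp(&self, other: &Self) -> Ordering {
        other.seq.cmp(&self.seq)
    }
}

struct State {
    current: u64,
    waiters: BinaryHeap<Waiter>,
}

pub struct SeqWait {
    state: Mutex<State>,
}

impl SeqWait {
    pub fn new(start: u64) -> Self {
        SeqWait { state: Mutex::new(State { current: start, waiters: BinaryHeap::new() }) }
    }

    /// Block until the sequence number reaches at least `num`.
    pub fn wait_for(&self, num: u64) {
        let rx = {
            let mut st = self.state.lock().unwrap();
            if st.current >= num {
                return;
            }
            // A fresh channel per waiter, since std mpsc has a single consumer.
            let (tx, rx) = channel();
            st.waiters.push(Waiter { seq: num, wake: tx });
            rx
        }; // The mutex guard is dropped here, before blocking.
        // recv() errs only if the Waiter is dropped without being woken.
        let _ = rx.recv();
    }

    /// Publish a new sequence number and wake every waiter it satisfies.
    pub fn advance(&self, num: u64) {
        let mut st = self.state.lock().unwrap();
        if num <= st.current {
            return;
        }
        st.current = num;
        while st.waiters.peek().map_or(false, |w| w.seq <= num) {
            let w = st.waiters.pop().unwrap();
            // The waiter may have gone away; ignore send errors.
            let _ = w.wake.send(());
        }
    }

    pub fn load(&self) -> u64 {
        self.state.lock().unwrap().current
    }
}
```

Because the check in wait_for and the update in advance happen under the same mutex, a waiter cannot miss a wakeup that races with its registration.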