rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
anastasia	6984d33b4e	Run GC and checkpointer in separate threads. Add checkpoint_period configuration parameter	2021-09-16 12:33:50 +03:00
anastasia	98d4f9cea5	Add checkpoint_distance config parameter. - Change hardcoded OLDEST_INMEM_DISTANCE value to pageserver config option checkpoint_distance. - Get rid of 'force' flag in checkpoint_internal(). Use checkpoint_distance=0 instead.	2021-09-16 12:33:50 +03:00
Dmitry Rodionov	4ebe643d0c	Support parallel test running for python tests Support is done via pytest-xdist plugin. To use the feature add -n<concurrency> to pytest invocation e.g. pytest -n8 to run 8 tests in parallel. Changes in code are mostly about ports assigning. Previously port for pageserver was hardcoded without the ability to override through zenith cli and ports for started compute nodes were calculated twice, in zenith cli and in test code. Now zenith cli supports port arguments for pageserver and compute nodes to be passed explicitly. Tests are modified in such a way that each worker gets a non overlapping port range which can be configured and now contains 100 ports. These ports are distributed to test services (pageserver, wal acceptors, compute nodes) so they can work independently.	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	84008a2560	factor out common logging initialisation routine This contains a lowest common denominator of pageserver and safekeeper log initialisation routines. It uses daemonize flag to decide where to stream log messages. In case daemonize is true log messages are forwarded to file. Otherwise streaming to stdout is used. Usage of stdout for log output is the default in docker side of things, so make it easier to browse our logs via builtin docker commands.	2021-09-14 18:09:14 +03:00
Heikki Linnakangas	04ee1d5977	Add test for managing old open segments in binary heap. I thought this test would trigger the bug fixed in the previous commit, but it did not. More tests are nice in any case.	2021-09-07 18:10:07 +03:00
Heikki Linnakangas	b949127b06	Rename page_cache.rs to tenant_mgr.rs. Once upon a time, 'page_cache.rs' contained an actual page cache, but it hasn't for a very long time. Rename to reflect what it actually does these days.	2021-08-30 15:17:30 +03:00
Heikki Linnakangas	4046530160	Remove remnants of choosing between repository formats. Now that we only have one Repository implementation, no need for the command-line options to choose it either. I'm removing these as a separate commit to show what we will need to do if we add another Repository implementation in the future (even though I don't foresee us doing that any time soon)	2021-08-25 18:37:22 +03:00
Heikki Linnakangas	5998744bcc	Remove rocksdb implementation. The layered storage format is good enough that we don't need the rocksdb implementation anymore. There are a lot of known issues but we'll keep working on them.	2021-08-25 18:37:22 +03:00
Dmitry Rodionov	23b5249512	translate pageserver api to http	2021-08-24 19:05:00 +03:00
Heikki Linnakangas	2450f82de5	Introduce a new "layered" repository implementation. This replaces the RocksDB based implementation with an approach using "snapshot files" on disk, and in-memory btreemaps to hold the recent changes. This makes the repository implementation a configuration option. You can choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>" The unit tests have been refactored to exercise both implementations. 'layered' is now the default. Push/pull is not implemented. The 'test_history_inmemory' test has been commented out accordingly. It's not clear how we will implement that functionality; probably by copying the snapshot files directly.	2021-08-16 10:06:48 +03:00
Dmitry Rodionov	ce5333656f	Introduce authentication v0.1. Current state of authentication: the page server validates the JWT token passed as a password during the connection phase. Later, when performing an action such as creating a branch, the tenant parameter of the operation is validated to match the one submitted in the token. To allow access from the console there is a dedicated scope, PageServerApi, which allows access to all tenants. See the access validation code in PageServerHandler::check_permission. Because we are in the middle of refactoring the communication layer (the wal proposer protocol and safekeeper<->pageserver), the safekeeper currently doesn’t check the token passed from compute, and uses a “hardcoded” token passed via an environment variable to communicate with the pageserver. Compute postgres now takes the token from an environment variable and passes it as the password field in the pageserver connection. It is not passed through settings because then the user would be able to retrieve it using pg_settings or SHOW .. I’ve added a basic test in test_auth.py. Probably after we add authentication to the remaining network paths we should enable it by default and switch all existing tests to use it.	2021-08-11 20:05:54 +03:00
Dmitry Ivanov	cb1b4a12a6	Add some prometheus metrics to pageserver The metrics are served by an http endpoint, which is meant to be spawned in a new thread. In the future the endpoint will provide more APIs, but for the time being, we won't bother with proper routing.	2021-08-03 21:42:24 +03:00
Heikki Linnakangas	9ff122835f	Refactor ObjectTags, introducing a new concept called "relish" This clarifies - I hope - the abstractions between Repository and ObjectRepository. The ObjectTag struct was a mix of objects that could be accessed directly through the public Timeline interface, and also objects that were created and used internally by the ObjectRepository implementation and not supposed to be accessed directly by the callers. With the RelishTag separate from ObjectTag, the distinction is more clear: RelishTag is used in the public interface, and ObjectTag is used internally between object_repository.rs and object_store.rs, and it contains the internal metadata object types. One awkward thing with the ObjectTag struct was that the Repository implementation had to distinguish between ObjectTags for relations, and track the size of the relation, while others were used to store "blobs". With the RelishTags, some relishes are considered "blocky", and the Repository implementation is expected to track their sizes, while others are stored as blobs. I'm not 100% happy with how RelishTag captures that either: it just knows that some relish kinds are blocky and some non-blocky, and there's an is_block() function to check that. But this does enable size-tracking for SLRUs, allowing us to treat them more like relations. This changes the way SLRUs are stored in the repository. Each SLRU segment, e.g. "pg_clog/0000", "pg_clog/0001", is now handled as a separate relish. This removes the need for the SLRU-specific put_slru_truncate() function in the Timeline trait. SLRU truncation is now handled by calling put_unlink() on the segment. This is more in line with how PostgreSQL stores SLRUs and handles their truncation. The SLRUs are "blocky", so they are accessed one 8k page at a time, and repository tracks their size. I considered an alternative design where we would treat each SLRU segment as non-blocky, and just store the whole file as one blob. Each SLRU segment is up to 256 kB in size, which isn't that large, so that might've worked fine, too. One reason I didn't do that is that it seems better to have the WAL redo routines be as close as possible to the PostgreSQL routines. It doesn't matter much in the repository, though; we have to track the size for relations anyway, so there's not much difference in whether we also do it for SLRUs. While working on this, I noticed that the CLOG and MultiXact redo code did not handle wraparound correctly. We need to fix that, but for now, I just commented them out with a FIXME comment.	2021-08-03 14:01:05 +03:00
anastasia	1e6267a35f	Get rid of snapshot directory + related code cleanup and refactoring. - Add new subdir postgres_ffi/samples/ for config file samples. - Don't copy wal to the new branch on zenith init or zenith branch. - Import_timeline_wal on zenith init.	2021-07-23 13:21:45 +03:00
Heikki Linnakangas	47824c5fca	Remove page server interactive mode. It was pretty cool, but no one used it, and it had gotten badly out of date. The main interesting thing with it was to see some basic metrics on the fly, while the page server is running, but the metrics collection had been broken for a long time, too. Best to just remove it.	2021-07-23 12:21:21 +03:00
Dmitry Rodionov	767590bbd5	support tenants this patch adds support for tenants. This touches mostly pageserver. Directory layout on disk is changed to contain new layer of indirection. Now path to particular repository has the following structure: <pageserver workdir>/tenants/<tenant id>. Tenant id has the same format as timeline id. Tenant id is included in pageserver commands when needed. Also new commands are available in pageserver: tenant_list, tenant_create. This is also reflected in the CLI. During init default tenant is created and its id is saved in CLI config, so following commands can use it without extra options. Tenant id is also included in compute postgres configuration, so it can be passed via ServerInfo to safekeeper and in connection string to pageserver. For more info see docs/multitenancy.md.	2021-07-22 20:54:20 +03:00
anastasia	c913404739	Redirect log to pageserver.log during zenith init. Add new module logger.rs that contains shared code to init logging	2021-07-21 18:56:34 +03:00
sharnoff	c4b2bf7ebd	Use 'zenith_admin' as superuser name in `initdb`	2021-07-21 17:22:22 +03:00
Dmitry Rodionov	75e717fe86	allow both domains and ip addresses in connection options for pageserver and wal keeper. Also updated PageServerNode definition in control plane to account for that. resolves #303	2021-07-09 16:46:21 +03:00
Heikki Linnakangas	4f1b22a2c8	Use ObjectTag enum instead of special fork number to store metadata objects. Extracted from Konstantin's larger PR: https://github.com/zenithdb/zenith/pull/268	2021-06-22 21:34:31 +03:00
Heikki Linnakangas	34f4207501	Refactoring of the Repository/Timeline stuff - All timelines are now stored in the same rocksdb repository. The GET functions have been taught to follow the ancestors. - Change the way relation size is stored. Instead of inserting "tombstone" entries for blocks that are truncated away, store relation size as separate key-value entry for each relation - Add an abstraction for the key-value store: ObjectStore. It allows swapping RocksDB with some other key-value store easily. Perhaps we will write our own storage implementation using that interface, or perhaps we'll need a different abstraction, but this is a small improvement over status quo in any case. - Garbage Collection is broken and commented out. It's not clear where and how it should be implemented.	2021-05-27 20:07:50 +03:00
Alexey Kondratov	b5f60f3874	Issue #144 : Refactor errors handling during branches tree printing	2021-05-20 12:49:04 +03:00
Heikki Linnakangas	1912546e52	Change the meaning of PageServerConf.workdir Commit `746f667311` added the 'workdir' field and the get__path() functions, with the idea that we cd into the directory at page server startup, so that the get__path() functions can always return paths relative to '.', but 'workdir' shows the original path to it. Change it so that 'conf.workdir' is always set to '.', too, and the get__path() functions include 'workdir' in the returned paths. Why? Because that allows writing unit tests without changing the current directory. When I was working on commit `97992226d3`, I initially wrote the test so that it changed the current working directory, just like commit `746f667311` did. But that was problematic, when I tried to add another unit test that also wants to change the current working dir, because they could then not run concurrently. In fact, they could not even run serially, unless the current directory was carefully reset after the test. So it is better to avoid changing the current directory in tests.	2021-05-19 08:49:16 +03:00
Eric Seppanen	398d522d88	cargo fmt	2021-05-17 09:29:58 -07:00
Stas Kelvich	746f667311	Refactor CLI and CLI<->pageserver interfaces to support remote pageserver This patch started as an effort to support CLI working against remote pageserver, but turned into a pretty big refactoring. * CLI now does not look into repository files directly. New commands 'branch_create' and 'identify_system' were introduced into page_service to support that. * Branch management that was scattered between local_env and zenith/main.rs is moved into pageserver/branches.rs. That code could better fit in Repository/Timeline impl, but I'll leave that for a different patch. * All tests-related code from local_env went into integration_tests/src/lib.rs as an extension to PostgresNode trait. * Paths-generating functions were concentrated around corresponding config types (LocalEnv and PageserverConf).	2021-05-17 19:17:51 +03:00
Patrick Insinger	99d80aba52	use pageserver for pg list command	2021-05-12 12:34:03 +03:00
Eric Seppanen	0cbb3798da	try using serde to do all the serialization in wal_service This version validates on every call that our result is exactly the same as the previous result. NodeId is a strange corner case: one field is serialized little-endian and one field is serialized big-endian. Hopefully we can fix that in the future.	2021-05-10 16:21:05 -07:00
Heikki Linnakangas	b484b896b6	Refactor the functionality in page_cache.rs. This moves things around: - The PageCache is split into two structs: Repository and Timeline. A Repository holds multiple Timelines. In order to get a page version, you must first get a reference to the Repository, then the Timeline in the repository, and finally call the get_page_at_lsn() function on the Timeline object. This sounds complicated, but because each connection from a compute node, and each WAL receiver, only deals with one timeline at a time, the callers can get the reference to the Timeline object once and hold onto it. The Timeline corresponds most closely to the old PageCache object. - Repository and Timeline are now abstract traits, so that we can support multiple implementations. I don't actually expect us to have multiple implementations for long. We have the RocksDB implementation now, but as soon as we have a different implementation that's usable, I expect that we will retire the RocksDB implementation. But I think this abstraction works as good documentation in any case: it's now easier to see what the interface for storing and loading pages from the repository is, by looking at the Repository/Timeline traits. The abstract traits are in repository.rs, and the RocksDB implementation of them is in repository/rocksdb.rs. - page_cache.rs is now a "switchboard" to get a handle to the repository. Currently, the page server can only handle one repository at a time, so there isn't much there, but in the future we might do multi-tenancy there.	2021-05-05 10:37:36 +03:00
anastasia	e7b112aacc	Refactor pg_constants. Move them to postgres_ffi/	2021-04-29 18:41:42 +03:00
Konstantin Knizhnik	f3192ee415	Merge branch 'main' into rocksdb_pageserver	2021-04-22 09:45:42 +03:00
Konstantin Knizhnik	9e7c45cb72	Merge with master	2021-04-22 09:45:13 +03:00
Heikki Linnakangas	18ba16aaac	Fix and improve comment on ZTimelineId. The comment was incorrect, claiming that ZTimelineId is a 32-byte value. It is actually 16 bytes wide. While we're at it, improve the comment, explaining what a zenith timeline is, and why it's different from PostgreSQL timelines.	2021-04-22 09:25:53 +03:00
Konstantin Knizhnik	c981f4ad66	Implement garbage collection of unused versions	2021-04-21 19:04:30 +03:00
Konstantin Knizhnik	d8fa2ec367	Merge with main branch	2021-04-21 16:10:05 +03:00
Eric Seppanen	92e4f4b3b6	cargo fmt	2021-04-20 17:59:56 -07:00
Heikki Linnakangas	d047a3abf7	Fixes, per Eric's and Konstantin's comments	2021-04-20 19:11:29 +03:00
Heikki Linnakangas	f69db17409	Make WAL safekeeper work with zenith timelines	2021-04-20 19:11:29 +03:00
Heikki Linnakangas	3600b33f1c	Implement "timelines" in page server This replaces the page server's "datadir" concept. The Page Server now always works with a "Zenith Repository". When you initialize a new repository with "zenith init", it runs initdb and loads an initial basebackup of the freshly-created cluster into the repository, on "main" branch. Repository can hold multiple "timelines", which can be given human-friendly names, making them "branches". One page server simultaneously serves all timelines stored in the repository, and you can have multiple Postgres compute nodes connected to the page server, as long as they all operate on a different timeline. There is a new command "zenith branch", which can be used to fork off new branches from existing branches. The repository uses the directory layout described as Repository format v1 in https://github.com/zenithdb/rfcs/pull/5. It is highly inefficient: - we never create new snapshots. So in practice, it's really just a base backup of the initial empty cluster, and everything else is reconstructed by redoing all WAL - when you create a new timeline, the base snapshot and all WAL is copied from the old timeline to the new one. There is no smarts about referencing the old snapshots/wal from the ancestor timeline. To support all this, this commit includes a bunch of other changes: - Implement "basebackup" functionality in page server. When you initialize a new compute node with "zenith pg create", it connects to the page server, and requests a base backup of the Postgres data directory on that timeline. (the base backup excludes user tables, so it's not as bad as it sounds). - Have page server's WAL receiver write the WAL into timeline dir. This allows running a Page Server and Compute Nodes without a WAL safekeeper, until we get around to integrate that properly into the system. (Even after we integrate WAL safekeeper, this is perhaps how this will operate when you want to run the system on your laptop.) - restore_datadir.rs was renamed to restore_local_repo.rs, and heavily modified to use the new format. It now also restores all WAL. - Page server no longer scans and restores everything into memory at startup. Instead, when the first request is made for a timeline, the timeline is slurped into memory at that point. - The responsibility for telling page server to "callmemaybe" was moved into Postgres libpqpagestore code. Also, WAL producer connstring cannot be specified in the pageserver's command line anymore. - Having multiple "system identifiers" in the same page server is no longer supported. I repurposed much of that code to support multiple timelines, instead. - Implemented very basic, incomplete, support for PostgreSQL's Extended Query Protocol in page_service.rs. Turns out that rust-postgres' copy_out() function always uses the extended query protocol to send out the command, and I'm using that to stream the base backup from the page server. TODO: I haven't fixed the WAL safekeeper for this scheme, so all the integration tests involving safekeepers are failing. My plan is to modify the safekeeper to know about Zenith timelines, too, and modify it to work with the same Zenith repository format. It only needs to care about the '.zenith/timelines/<timeline>/wal' directories.	2021-04-20 19:11:27 +03:00
Eric Seppanen	52d6275812	drop nonfunctional attributes allow(dead_code) These had no effect, so remove them.	2021-04-16 15:59:32 -07:00
anastasia	1190030872	handle SLRU in restore_datadir	2021-04-15 16:43:03 +03:00
lubennikovaav	82dc1e82ba	Restore pageserver from s3 or local datadir (#9) * change pageserver --skip-recovery option to --restore-from=[s3|local] * implement restore from local pgdata * add simple test for local restore	2021-04-14 21:14:10 +03:00
Stas Kelvich	c0fcbbbe0c	Cargo fmt pass over a codebase	2021-04-06 14:42:13 +03:00
Heikki Linnakangas	1367332447	Separate walkeeper and pageserver sources into different directories. The integration tests, which depend on both walkeeper and pageserver, are moved into yet another directory, 'integration_tests'.	2021-04-06 13:15:26 +03:00

43 Commits