
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-13 08:22:55 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	2c94775ca8	Use big integer arithmetic to avoid multiplication overflow	2022-03-31 12:53:41 +03:00
Konstantin Knizhnik	a31bc88e46	Fix race condition in image layer refer #1439	2022-03-30 18:18:33 +03:00
Konstantin Knizhnik	8b137dfcd2	Check for valid LSN range in count_deltas	2022-03-30 12:55:01 +03:00
Konstantin Knizhnik	258b1d4935	Add num-traits crate	2022-03-30 11:35:30 +03:00
Konstantin Knizhnik	a1f6bbb076	Use wrapping multiplication in R-Tree	2022-03-30 08:16:20 +03:00
Konstantin Knizhnik	b1a9f292b2	Fix tests	2022-03-28 20:19:49 +03:00
Konstantin Knizhnik	22614f74b1	Remove redundant LayerEnveloper::ne method	2022-03-28 16:39:04 +03:00
Konstantin Knizhnik	cd0fdada82	Use R-Tree for layer map	2022-03-28 16:15:28 +03:00
Heikki Linnakangas	07342f7519	Major storage format rewrite. This is a backwards-incompatible change. The new pageserver cannot read repositories created with an old pageserver binary, or vice versa. Simplify Repository to a value-store ------------------------------------ Move the responsibility of tracking relation metadata, like which relations exist and what are their sizes, from Repository to a new module, pgdatadir_mapping.rs. The interface to Repository is now simple key-value PUT/GET operations. It's still not any old key-value store though. A Repository is still responsible for handling branching, and every GET operation comes with an LSN. Mapping from Postgres data directory to keys/values --------------------------------------------------- All the data is now stored in the key-value store. The 'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects like relation pages and SLRUs, to key-value pairs. The key to the Repository key-value store is a Key struct, which consists of a few integer fields. It's wide enough to store a full RelFileNode, fork and block number, and to distinguish those from metadata keys. 'pgdatadir_mapping.rs' is also responsible for maintaining a "partitioning" of the keyspace. Partitioning means splitting the keyspace so that each partition holds a roughly equal number of keys. The partitioning is used when new image layer files are created, so that each image layer file is roughly the same size. The partitioning is also responsible for reclaiming space used by deleted keys. The Repository implementation doesn't have any explicit support for deleting keys. Instead, the deleted keys are simply omitted from the partitioning, and when a new image layer is created, the omitted keys are not copied over to the new image layer. We might want to implement tombstone keys in the future, to reclaim space faster, but this will work for now.
Changes to low-level layer file code ------------------------------------ The concept of a "segment" is gone. Each layer file can now store an arbitrary range of Keys. Checkpointing, compaction ------------------------- The background tasks are somewhat different now. Whenever checkpoint_distance is reached, the WAL receiver thread "freezes" the current in-memory layer, and creates a new one. This is a quick operation and doesn't perform any I/O yet. It then launches a background "layer flushing thread" to write the frozen layer to disk, as a new L0 delta layer. This mechanism takes care of durability. It replaces the checkpointing thread. Compaction is a new background operation that takes a bunch of L0 delta layers, and reshuffles the data in them. It runs in a separate compaction thread. Deployment ---------- This also contains changes to the ansible scripts that enable having multiple different pageservers running at the same time in the staging environment. We will use that to keep an old version of the pageserver running, for clusters created with the old version, at the same time with a new pageserver with the new binary. Author: Heikki Linnakangas Author: Konstantin Knizhnik <knizhnik@zenith.tech> Author: Andrey Taranik <andrey@zenith.tech> Reviewed-by: Matthias Van De Meent <matthias@zenith.tech> Reviewed-by: Bojan Serafimov <bojan@zenith.tech> Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech> Reviewed-by: Anton Shyrabokau <antons@zenith.tech> Reviewed-by: Dhammika Pathirana <dham@zenith.tech> Reviewed-by: Kirill Bulatov <kirill@zenith.tech> Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech> Reviewed-by: Alexey Kondratov <alexey@zenith.tech>	2022-03-28 05:41:15 -05:00
Kirill Bulatov	55de0b88f5	Hide remote timeline index access details	2022-03-28 12:36:01 +03:00
Heikki Linnakangas	e3fa00972e	Use RwLocks in image and delta layers for more concurrency. With a Mutex, only one thread could read from the layer at a time. I did some ad hoc profiling with pgbench and saw that a fair amount of time was spent blocked on these Mutexes.	2022-03-25 15:34:38 +02:00
Kirill Bulatov	b39d1b1717	Exit only on important thread failures	2022-03-25 11:58:54 +02:00
Kirill Bulatov	28bc8e3f5c	Log pageserver threads better and shut down on errors in them	2022-03-25 11:58:54 +02:00
Kirill Bulatov	f6b1d76c30	Replace assert! with ensure! for anyhow::Result functions	2022-03-25 11:58:54 +02:00
Kirill Bulatov	edc7bebcb5	Remove obvious panic sources	2022-03-25 11:58:54 +02:00
Heikki Linnakangas	825d363170	Remove some unnecessary Ord etc. trait implementations. It doesn't make much sense to compare TimelineMetadata structs with < or >. But we depended on that in the remote storage upload code, so replace BTreeSets with Vecs there.	2022-03-24 12:20:06 +02:00
Heikki Linnakangas	c718870517	Tiny refactoring of page_cache::init function. The init function only needs the 'page_cache_size' from the config, so seems slightly nicer to pass just that.	2022-03-24 09:46:07 +02:00
Dmitry Rodionov	8437fc056e	some follow ups after s3 integration was enabled on staging * do not error out when upload file list is empty * ignore ephemeral files during sync initialization	2022-03-23 23:35:36 +04:00
Dmitry Rodionov	8b8d78a3a0	use main branch of our bookfile crate	2022-03-23 22:05:43 +04:00
Dmitry Rodionov	8a86276a6e	add more context to error	2022-03-23 18:38:15 +04:00
Dmitry Rodionov	0be7ed0cb5	decrease log message severity for timeline checkpoint internals	2022-03-23 18:20:43 +04:00
Dmitry Rodionov	e80ae4306a	change log level from info to debug for timeline gc messages	2022-03-23 18:20:43 +04:00
Kirill Bulatov	bd6bef468c	Provide single list timelines HTTP API handle	2022-03-21 13:42:21 +02:00
Kirill Bulatov	063f9ba81d	Use serde_with to (de)serialize ZId and Lsn to hex	2022-03-21 12:46:07 +02:00
Heikki Linnakangas	3b069f5aef	Fix name of directory used in unit test. There's another test called 'timeline_load'. If the two tests run in parallel, they would conflict and fail.	2022-03-18 21:27:48 +02:00
Dmitry Rodionov	b19870cd88	guard against partial uploads to local storage	2022-03-18 18:14:57 +03:00
Dmitry Rodionov	7738254f83	refactor timeline memory state management	2022-03-18 18:14:57 +03:00
Dhammika Pathirana	5d7bd8643a	Fix page reconstruct time histo Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>	2022-03-10 14:42:28 -08:00
Dhammika Pathirana	27dadba52c	Fix retain references to layer histograms Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>	2022-03-10 14:42:28 -08:00
Dhammika Pathirana	f67d010d1b	Add ps smgr/storage metrics tenant tags Signed-off-by: Dhammika Pathirana <dhammika@gmail.com> Add tenant_id,timeline_id in smgr/storage metrics (#1234)	2022-03-10 14:42:28 -08:00
Kirill Bulatov	093ad8ab59	Send 409 HTTP responses on timeline and tenant creation for existing entity	2022-03-10 19:38:58 +02:00
Kirill Bulatov	c51d545fd9	Serialize Lsn as strings in http api	2022-03-10 19:38:58 +02:00
Kirill Bulatov	fe6fccfdae	Allow already existing repo when creating a tenant	2022-03-10 19:38:58 +02:00
Kirill Bulatov	dd74c66ef0	Do not create timeline along with tenant	2022-03-10 19:38:58 +02:00
Kirill Bulatov	a5e10c4f64	Tidy up pageserver's endpoints	2022-03-10 19:38:58 +02:00
Kirill Bulatov	7b5482bac0	Properly store the branch name mappings	2022-03-10 19:38:58 +02:00
Kirill Bulatov	c7569dce47	Allow passing initial timeline id into zenith CLI commands	2022-03-10 19:38:58 +02:00
Kirill Bulatov	4d0f7fd1e4	Update Zenith CLI config between runs	2022-03-10 19:38:58 +02:00
Kirill Bulatov	f49990ed43	Allow creating timelines by branching off ancestors	2022-03-10 19:38:58 +02:00
Kirill Bulatov	0c91091c63	Avoid point in time concept on pageserver level	2022-03-10 19:38:58 +02:00
Kirill Bulatov	10f811e886	Use `timeline` instead of `branch` in pageserver's API	2022-03-10 19:38:58 +02:00
Kirill Bulatov	9424bfae22	Use a separate newtype for ZId that (de)serialize as hex strings	2022-03-04 10:58:40 +02:00
Dmitry Rodionov	1d90b1b205	add node id to pageserver (#1310) * Add --id argument to safekeeper setting its unique u64 id. In preparation for storage node messaging. IDs are supposed to be monotonically assigned by the console. In tests it is issued by ZenithEnv; at the zenith cli level and fixtures, string name is completely replaced by integer id. Example TOML configs are adjusted accordingly. Sequential ids are chosen over Zid mainly because they are compact and easy to type/remember. * add node id to pageserver This adds node id parameter to pageserver configuration. Also I use a simple builder to construct pageserver config struct to avoid setting node id to some temporary invalid value. Some of the changes in test fixtures are needed to split init and start operations for environment. Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2022-03-04 01:10:42 +03:00
Kirill Bulatov	949f8b4633	Fix 1.59 rustc clippy warnings	2022-03-02 21:35:34 +02:00
Heikki Linnakangas	5120ba4b5f	Refactor the interface for using cached page image. Instead of passing it as a separate argument to get_page_reconstruct_data, the caller can fill it in the PageReconstructData struct.	2022-02-24 10:37:12 +02:00
Heikki Linnakangas	e4670a5f1e	Remove the PageVersions abstraction. Since commit `fdd987c3ad`, it was only used in InMemoryLayers. Let's just "inline" the code into InMemoryLayer itself. I originally did this as part of a bigger PR (#1267). With that PR, one in-memory layer, and one ephemeral file, would hold page versions belonging to multiple segments. Currently, PageVersions can only hold versions for a single segment, so that would need to be changed. Rather than modify PageVersions to support that, just remove it altogether.	2022-02-23 21:04:39 +02:00
Heikki Linnakangas	7fae894648	Move a few unit tests specific to layered file format. These tests have intimate knowledge of the directory layout and layer file names used by the LayeredRepository implementation of the Repository trait. Move them, so that all the tests that remain in repository.rs are expected to work without changes with any implementation of Repository. Not that we have any plans to create another Repository implementation any time soon, but as long as we have the Repository interface, let's try to maintain that abstraction in the tests too.	2022-02-23 20:32:06 +02:00
anastasia	87edbd38c7	Add 'wait_lsn_timeout' and 'wal_redo_timeout' pageserver config options instead of hardcoded defaults	2022-02-23 19:59:35 +03:00
Heikki Linnakangas	468366a28f	Fix wrong 'lsn' stored in test page image The test creates a page version with a string like "foo 123 at 0/10" as the content. But the LSN stored in that string was wrong: the page version stored at LSN 0/20 would say "foo <blk> at 0/10".	2022-02-23 11:33:17 +02:00
anastasia	1a4682a04a	Add 'walreceiver-after-ingest' failpoint. Use sleep at this point to imitate slow walreceiver.	2022-02-22 13:56:21 +03:00
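
Commit 07342f7519 above describes the Repository as a key-value store in which "every GET operation comes with an LSN". A minimal sketch of that idea follows, not the pageserver's actual code: the `Key` fields, the `Lsn` alias, and `VersionedStore` are hypothetical names chosen for illustration, and branching, layers and partitioning are left out entirely.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the Key struct: "a few integer fields".
/// The real layout is not shown in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Key {
    rel: u32,
    fork: u8,
    block: u32,
}

/// Log sequence number; a plain u64 here for illustration only.
type Lsn = u64;

/// Toy versioned store: every value is filed under (key, lsn).
#[derive(Default)]
struct VersionedStore {
    versions: BTreeMap<(Key, Lsn), Vec<u8>>,
}

impl VersionedStore {
    /// PUT: record a new version of `key` at `lsn`.
    fn put(&mut self, key: Key, lsn: Lsn, value: Vec<u8>) {
        self.versions.insert((key, lsn), value);
    }

    /// GET: return the newest version of `key` written at or before `lsn`.
    fn get(&self, key: Key, lsn: Lsn) -> Option<&[u8]> {
        self.versions
            .range((key, 0)..=(key, lsn)) // all versions of `key` up to `lsn`
            .next_back()                  // the latest of those
            .map(|(_, v)| v.as_slice())
    }
}
```

Because the map is ordered by `(key, lsn)`, all versions of one key sit next to each other, so a read at a given LSN is a single range lookup. A read at an LSN older than the first write returns `None`.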


643 Commits
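
Commits a1f6bbb076 and 2c94775ca8 in the list above mention wrapping multiplication and then "big integer arithmetic to avoid multiplication overflow". The sketch below shows the general difference between the two approaches using only the standard library. It assumes 64-bit operands widened to `u128`; the function names are hypothetical, and the actual commits may use different types or a big-integer crate.

```rust
/// Product with wrapping multiplication: never panics, but the result is
/// reduced modulo 2^64 whenever the true product does not fit in a u64.
fn area_wrapping(width: u64, height: u64) -> u64 {
    width.wrapping_mul(height)
}

/// Product computed in a wider type: the product of two u64 values always
/// fits in a u128, so the result is exact.
fn area_wide(width: u64, height: u64) -> u128 {
    u128::from(width) * u128::from(height)
}
```

For small inputs both functions agree. For `width = height = 1 << 32` the true product is 2^64, which wraps to 0 in the first function and is represented exactly in the second.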