rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 14:02:55 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	263d60f12d	Add prometheus metric for time spent waiting for WAL to arrive	2022-04-21 16:37:32 +03:00
Kirill Bulatov	81cad6277a	Move and library crates into a dedicated directory and rename them	2022-04-21 13:30:33 +03:00
Heikki Linnakangas	9d3779c124	Add a counter for materialized page cache hits.	2022-04-20 21:26:03 +03:00
Heikki Linnakangas	334a1d6b5d	Fix materialized page caching with delta layers. We only checked the cache page version when collecting WAL records in an in-memory layer, not in a delta layer. Refactor the code so that we always stop collecting WAL records when we reach a cached materialized page. Fix the assertion on the LSN range in InMemoryLayer::get_value_reconstruct_data. It was supposed to check that the requested LSN range is within the layer's LSN range, but the inequality was backwards. That went unnoticed before, because the caller always passed the layer's start LSN as the requested LSN range's start LSN, but now we might stop the search earlier, if we have a cached page version. Co-authored-by: Konstantin Knizhnik <knizhnik@zenith.tech>	2022-04-20 21:25:59 +03:00
Dmitry Rodionov	e41ad3be0f	add more context to writeback error	2022-04-20 17:07:07 +03:00
Heikki Linnakangas	e113c6fa8d	Print a warning if unlinking an ephemeral file fails. Unlink failure isn't serious on its own, we were about to remove the file anyway, but it shouldn't happen and could be a symptom of something more serious. We just saw "No such file or directory" errors happening from ephemeral file writeback in staging, and I suspect if we had this warning in place, we would have seen these warnings too, if the problem was that the ephemeral file was removed before dropping the EphemeralFile struct. Next time it happens, we'll have more information.	2022-04-20 16:23:16 +03:00
Kirill Bulatov	91fb21225a	Show more logs during S3 sync	2022-04-20 02:57:03 +03:00
Kirill Bulatov	3e6087a12f	Remove S3 archiving	2022-04-19 23:13:52 +03:00
Kirill Bulatov	44bfc529f6	Require specifying the upload size in remote storage	2022-04-19 23:13:52 +03:00
Kirill Bulatov	0ca2bd929b	Remove log crate from pageserver	2022-04-18 00:00:36 +03:00
Heikki Linnakangas	93e0ac2b7a	Remove a couple of unused dependencies. Found by "cargo-udeps"	2022-04-14 17:38:26 +03:00
bojanserafimov	d5ae9db997	Add s3 cost estimate to tests (#1478 )	2022-04-14 10:09:03 -04:00
Heikki Linnakangas	9e4de6bed0	Use RwLock instad of Mutex for layer map lock. For more concurrency	2022-04-14 13:34:01 +03:00
Dhammika Pathirana	a0781f229c	Add ps compact command Signed-off-by: Dhammika Pathirana <dhammika@gmail.com> Add ps compact command to api (#707) (#1484)	2022-04-13 22:47:13 -07:00
Dmitry Rodionov	49da76237b	remove noisy debug log message	2022-04-13 19:50:31 +03:00
Dhammika Pathirana	1fd08107ca	Add ps compaction_threshold config Signed-off-by: Dhammika Pathirana <dhammika@gmail.com> Add ps compaction_threadhold knob for (#707) (#1484)	2022-04-13 07:42:58 -07:00
Dmitry Rodionov	20414c4b16	defuse possible deadlock in download_timeline too	2022-04-13 10:05:19 +03:00
Dmitry Rodionov	9b7a8e67a4	fix deadlock in upload_timeline_checkpoint It originated from the fact that we were calling to fetch_full_index without releasing the read guard, and fetch_full_index tries to acquire read again. For plain mutex it is already a deeadlock, for RW lock deadlock was achieved by an attempt to acquire write access later in the code while still having active read guard up in the stack This is sort of a bandaid because Kirill plans to change this code during removal of an archiving mechanism	2022-04-13 10:05:19 +03:00
Kirill Bulatov	dc7e3ff05a	Fix rustc 1.60 clippy warnings	2022-04-11 21:34:04 +03:00
Kirill Bulatov	4f172e7612	Replicate S3 blob metadata in the remote storage	2022-04-11 21:34:04 +03:00
Kirill Bulatov	db63fa64ae	Use rusoto lib for S3 relish_storage impl	2022-04-11 21:34:04 +03:00
Heikki Linnakangas	214567bf8f	Use B-tree for the index in image and delta layers. We now use a page cache for those, instead of slurping the whole index into memory. Fixes https://github.com/zenithdb/zenith/issues/1356 This is a backwards-incompatible change to the storage format, so bump STORAGE_FORMAT_VERSION.	2022-04-07 20:58:55 +03:00
Heikki Linnakangas	c4b57e4b8f	Move BlobRef It's not needed in image layers anymore, so move it into delta_layer.rs	2022-04-07 20:58:55 +03:00
Heikki Linnakangas	5d9851f5d1	Refactor the I/O functions. This introduces two new abstraction layers for I/O: - Block I/O, and - Blob I/O. The BlockReader trait abstracts a file or something else that can be read in 8kB pages. It is implemented by EphemeralFiles, and by a new FileBlockReader struct that allows reading arbitrary VirtualFiles in that manner, utilizing the page cache. There is also a new BlockCursor struct that works as a cursor over a BlockReader. When you create a BlockCursor and read the first page using it, it keeps the reference to the page. If you access the same page again, it avoids going to page cache and quickly returns the same page again. That can save a lot of lookups in the page cache if you perform multiple reads. The Blob-oriented API allows reading and writing "blobs" of arbitrary length. It is a layer on top of the block-oriented API. When you write a blob with the write_blob() function, it writes a length field followed by the actual data to the underlying block storage, and returns the offset where the blob was stored. The blob can be retrieved later using the offset. Finally, this replaces the I/O code in image-, delta-, and in-memory layers to use the new abstractions. These replace the 'bookfile' crate. This is a backwards-incompatible change to the storage format.	2022-04-07 20:58:54 +03:00
Alexey Kondratov	d0c246ac3c	Update pageserver OpenAPI spec with missing attach/detach methods (#1463 ) We have these methods for some time in the API, so mentioning them in the spec could be useful for console (see zenithdb/console#867), as we generate pageserver HTTP API golang client there.	2022-04-05 20:01:57 +03:00
Heikki Linnakangas	2f784144fe	Avoid deadlock when locking two buffers. It happened in unit tests. If a thread tries to read a buffer while already holding a lock on one buffer, the code to find a victim buffer to evict could try to evict the buffer that's already locked. To fix, skip locked buffers.	2022-04-04 20:12:31 +03:00
Heikki Linnakangas	222b723354	Handle read errors when dumping a delta layer file. If a file is corrupt, let's not stop on first read error, but continue dumping.	2022-04-04 20:12:28 +03:00
Heikki Linnakangas	089ba6abfe	Clean up some comments that still referred to 'segments'	2022-04-04 20:12:25 +03:00
Konstantin Knizhnik	fcf613b6e3	Fix unit tests build	2022-04-04 10:43:27 +03:00
Konstantin Knizhnik	572b3f48cf	Add compaction_target_size parameter	2022-04-04 10:43:27 +03:00
Konstantin Knizhnik	bef9b837f1	Replace rwlock with mutex in repartition	2022-04-04 10:43:27 +03:00
Konstantin Knizhnik	232fe14297	Refactor partitioning	2022-04-04 10:43:27 +03:00
Konstantin Knizhnik	92031d376a	Fix unit tests	2022-04-04 10:43:27 +03:00
Konstantin Knizhnik	1f0b406b63	Perform repartitioning in compaction thread refer #1441	2022-04-04 10:43:27 +03:00
Kirill Bulatov	4c9447589a	Place an info span into gc loop step	2022-04-03 19:30:36 +03:00
Kirill Bulatov	43c16c5145	Don't log ZIds in the timeline load span	2022-04-03 19:30:36 +03:00
bojanserafimov	af712798e7	Fix pageserver readme formatting I put the diagram in a fixed-width block, since it wasn't rendering correctly on github.	2022-04-02 00:36:54 +03:00
Dmitry Ivanov	f5da652388	[proxy] Enable keepalives for all tcp connections (#1448 )	2022-03-31 20:44:57 +03:00
Anastasia Lubennikova	8745b022a9	Extend LayerMap dump() function to print also open_layers and frozen_layers. Add verbose option to chose if we need to print all layer's keys or not.	2022-03-31 17:26:24 +03:00
Konstantin Knizhnik	1aa8fe43cf	Fix race condition in image layer (#1440 ) * Fix race condition in image layer refer #1439 * Add explicit drop(inner) in layer load method * Add explicit drop(inner) in layer load method	2022-03-31 15:47:59 +03:00
Dmitry Rodionov	649f324fe3	make logging in basebackup more consistent	2022-03-30 17:58:51 +03:00
Dmitry Rodionov	8609234204	decrease the log level to debug because it is too noisy	2022-03-30 10:13:38 +03:00
Kirill Bulatov	277e41f4b7	Show s3 spans in logs and improve the log messages	2022-03-29 19:21:31 +03:00
Arthur Petukhovsky	ce0243bc12	Add metric for last_record_lsn (#1430 )	2022-03-29 18:54:24 +03:00
Dmitry Rodionov	eee0f51e0c	use cargo-hakari to manage workspace_hack crate workspace_hack is needed to avoid recompilation when different crates inside the workspace depend on the same packages but with different features being enabled. Problem occurs when you build crates separately one by one. So this is irrelevant to our CI setup because there we build all binaries at once, but it may be relevant for local development. this also changes cargo's resolver version to 2	2022-03-29 10:42:04 +03:00
Kirill Bulatov	0e44887929	Show more S3 logs and less verbove WAL logs	2022-03-29 00:36:06 +03:00
Dhammika Pathirana	1aa57fc262	Fix tone down compact log chatter Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>	2022-03-28 13:24:13 -07:00
Heikki Linnakangas	07342f7519	Major storage format rewrite. This is a backwards-incompatible change. The new pageserver cannot read repositories created with an old pageserver binary, or vice versa. Simplify Repository to a value-store ------------------------------------ Move the responsibility of tracking relation metadata, like which relations exist and what are their sizes, from Repository to a new module, pgdatadir_mapping.rs. The interface to Repository is now a simple key-value PUT/GET operations. It's still not any old key-value store though. A Repository is still responsible from handling branching, and every GET operation comes with an LSN. Mapping from Postgres data directory to keys/values --------------------------------------------------- All the data is now stored in the key-value store. The 'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects like relation pages and SLRUs, to key-value pairs. The key to the Repository key-value store is a Key struct, which consists of a few integer fields. It's wide enough to store a full RelFileNode, fork and block number, and to distinguish those from metadata keys. 'pgdatadir_mapping.rs' is also responsible for maintaining a "partitioning" of the keyspace. Partitioning means splitting the keyspace so that each partition holds a roughly equal number of keys. The partitioning is used when new image layer files are created, so that each image layer file is roughly the same size. The partitioning is also responsible for reclaiming space used by deleted keys. The Repository implementation doesn't have any explicit support for deleting keys. Instead, the deleted keys are simply omitted from the partitioning, and when a new image layer is created, the omitted keys are not copied over to the new image layer. We might want to implement tombstone keys in the future, to reclaim space faster, but this will work for now. Changes to low-level layer file code ------------------------------------ The concept of a "segment" is gone. Each layer file can now store an arbitrary range of Keys. Checkpointing, compaction ------------------------- The background tasks are somewhat different now. Whenever checkpoint_distance is reached, the WAL receiver thread "freezes" the current in-memory layer, and creates a new one. This is a quick operation and doesn't perform any I/O yet. It then launches a background "layer flushing thread" to write the frozen layer to disk, as a new L0 delta layer. This mechanism takes care of durability. It replaces the checkpointing thread. Compaction is a new background operation that takes a bunch of L0 delta layers, and reshuffles the data in them. It runs in a separate compaction thread. Deployment ---------- This also contains changes to the ansible scripts that enable having multiple different pageservers running at the same time in the staging environment. We will use that to keep an old version of the pageserver running, for clusters created with the old version, at the same time with a new pageserver with the new binary. Author: Heikki Linnakangas Author: Konstantin Knizhnik <knizhnik@zenith.tech> Author: Andrey Taranik <andrey@zenith.tech> Reviewed-by: Matthias Van De Meent <matthias@zenith.tech> Reviewed-by: Bojan Serafimov <bojan@zenith.tech> Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech> Reviewed-by: Anton Shyrabokau <antons@zenith.tech> Reviewed-by: Dhammika Pathirana <dham@zenith.tech> Reviewed-by: Kirill Bulatov <kirill@zenith.tech> Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech> Reviewed-by: Alexey Kondratov <alexey@zenith.tech>	2022-03-28 05:41:15 -05:00
Kirill Bulatov	55de0b88f5	Hide remote timeline index access details	2022-03-28 12:36:01 +03:00
Heikki Linnakangas	e3fa00972e	Use RwLocks in image and delta layers for more concurrency. With a Mutex, only one thread could read from the layer at a time. I did some ad hoc profiling with pgbench and saw that a fair amout of time was spent blocked on these Mutexes.	2022-03-25 15:34:38 +02:00

1 2 3 4 5 ...

782 Commits