rust/neon - neon - Gitea: Git with a cup of tea

mirror of https://github.com/neondatabase/neon.git synced 2026-01-14 08:52:56 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	d07101d317	Make clippy happy	2023-03-10 09:52:12 +02:00
Konstantin Knizhnik	89e4fc3c63	Copy block content in block cursor cache	2023-03-10 08:19:20 +02:00
Arseny Sher	b80fe41af3	Refactor postgres protocol parsing. 1) Remove allocation and data copy during each message read. Instead, parsing functions now accept a BytesMut from whose data they form messages, with pointers (e.g. in CopyData) pointing directly into the BytesMut buffer. Accordingly, move ConnectionError, which contains the IO error subtype, into framed.rs, which performs the IO, and leave only ProtocolError in pq_proto. 2) Remove anyhow from pq_proto. 3) Move FeStartupPacket out of FeMessage. Now FeStartupPacket::parse returns it directly, eliminating dead code where the caller wants a startup packet but has to match on the other variants. proxy stream.rs is adapted to framed.rs with minimal changes. It also benefits from the framed.rs improvements described above.	2023-03-09 20:45:56 +03:00
Arseny Sher	0d8ced8534	Remove sync postgres_backend, tidy up its split usage. - Add support for splitting async postgres_backend into read and write halves. Safekeeper needs this for bidirectional streams. To this end, encapsulate reading and writing of postgres messages in framed.rs with split support, without any additional changes (relying on BufRead for reading and a BytesMut out buffer for writing). - Use async postgres_backend throughout safekeeper (and in the proxy auth link part). - In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling. - Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL: join the split parts back together, catching errors from them before returning. Initially I hoped to do that read-write without split at all, through polling IO: https://github.com/neondatabase/neon/pull/3522 However, that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anonymous Future types. 1) required Rc<RefCell<...>>, a !Send construct, just to satisfy the checker; 2) can be worked around with transmute. But this is so messy that I decided to keep the split.	2023-03-09 20:45:56 +03:00
Arseny Sher	7627d85345	Move async postgres_backend to its own crate. To break the cyclic dependency between the sync and async versions of postgres_backend, copy QueryError and some logging/error routines to postgres_backend.rs. This is temporary glue to keep commits smaller; the sync version will be dropped completely by the upcoming commit.	2023-03-09 20:45:56 +03:00
Arseny Sher	3f11a647c0	Rename write_message to write_message_noflush in postgres_backend_async.rs to make it uniform across the project; proxy stream.rs and the older postgres_backend use write_message_noflush.	2023-03-09 20:45:56 +03:00
Kirill Bulatov	03a2ce9d13	Add tracing spans with request_id into pageserver management API handlers (#3755) Adds a newtype that creates a span with request_id from https://github.com/neondatabase/neon/pull/3708 for every HTTP request served. Moves request logging and error handlers under the new wrapper, so every request-related event is now logged under the request span. For compatibility reasons, the error handler is left on the general router, since not every service uses the new handler wrappers yet.	2023-03-09 09:24:01 +02:00
Heikki Linnakangas	ccf92df4da	Remove deprecated support for handling ZENITH_AUTH_TOKEN. It's not used anywhere anymore.	2023-03-09 00:53:13 +02:00
Heikki Linnakangas	fb1581d0b9	Fix the "image_creation_threshold" setting in tenant config. (#3762) We have a few tests that try to set image_creation_threshold, but it didn't actually have any effect because we were missing some critical code to load the setting from the config file into memory. The two modified tests in `test_remote_storage.py` perform compaction and GC, and assert that GC removes some layers. That only happens if new image layers are created by the compaction. The tests explicitly disabled image layer creation by setting image_creation_threshold to a high value, but that didn't take effect because reading image_creation_threshold from the config file was broken, which is why the tests worked. Fix the tests to set image_creation_threshold low instead, so that GC has work to do. Change 'test_tenant_conf.py' so that it exercises the added code. This might explain why we're apparently missing test coverage for GC (issue #3415), although I didn't try to address that here, nor did I check whether this improves it.	2023-03-08 11:39:30 +02:00
Sasha Krassovsky	02b8e0e5af	Add OpenAPI spec for do_gc (#3756) ## Describe your changes Adds a field to the OpenAPI spec for the page server which describes the `do_gc` command. ## Issue ticket number and link #3669 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-07 09:08:46 -08:00
Stas Kelvich	069b5b0a06	Make `postgres --wal-redo` more embeddable. * Stop allocating and maintaining the 128MB hash table for the last written LSN cache, as it is not needed in wal-redo. * Do not require access to an initialized data directory. That saves a few dozen megabytes of empty but initialized data directory. Currently such directories occupy about 10% of the disk space on the pageservers, as most tenants are empty. * Move shmem-initialization code to the extension instead of postgres	2023-03-07 15:01:14 +02:00
Shany Pozin	7b9057ad01	Add timeout to download copy (#3675) ## Describe your changes Adds timeout handling of 120 seconds to each remote layer download operation. Note that these downloads are retried N times. ## Issue ticket number and link Fixes: #3672 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-06 18:52:59 +02:00
Konstantin Knizhnik	96f65fad68	Handle crash of walredo process and retry applying wal records (#3739) ## Describe your changes Restart the walredo process and retry applying WAL records in case of abnormal walredo process termination. ## Issue ticket number and link See #1700 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-06 10:10:58 +02:00
Christian Schwarz	66a5159511	fix: compaction: no index upload scheduled if no on-demand downloads Commit `0cf7fd0fb8` Compaction with on-demand download (#3598) introduced a subtle bug: if we don't have to do on-demand downloads, we only take one ROUND in fn compact() and exit early. Thereby, we miss scheduling the index part upload for any layers created by fn compact_inner(). Before that commit, we didn't have this problem. So, this patch fixes it. Since no regression test caught this, I went ahead and extended the timeline size tests to assert that, if remote storage is configured, 1. pageserver_remote_physical_size matches the other physical sizes; 2. file sizes reported by the layer map info endpoint match the other physical size metrics. Without the pageserver code fix, the regression test would fail at the physical size assertion, complaining that the resident physical size != the remote physical size metric: 50790400.0 != 18399232.0. I figured out what the problem was by comparing the remote storage and local directories, and noticed that the image layer in the local directory wasn't present on the remote side. Its size was exactly the difference: 50790400.0 - 18399232.0 = 32391168.0. fixes https://github.com/neondatabase/neon/issues/3738	2023-03-03 16:11:54 +01:00
Christian Schwarz	1b780fa752	timeline_checkpoint_handler: add span with tenant and timeline id Before this patch, the logs written by freeze_and_flush() and compact() didn't have any span, which made the test logs annoying to read.	2023-03-03 12:10:24 +01:00
Christian Schwarz	38022ff11c	gc: only decrement resident size if GC'd layer is resident Before this patch, GC would call PersistentLayer::delete() on every GC'ed layer. RemoteLayer::delete() returned Ok(()) unconditionally. GC would then proceed by decrementing the resident size metric, even though the layer is a RemoteLayer. This patch makes the following changes: - Rename PersistentLayer::delete() to delete_resident_layer_file(). That name is unambiguous. - Make RemoteLayer::delete_resident_layer_file return an Err(). We would have uncovered this bug if we had done that from the start. - Change GC / Timeline::delete_historic_layer to check whether the layer is remote or not, and only call delete_resident_layer_file() if it's not remote. This brings us in line with how eviction does it. - Add a regression test. fixes https://github.com/neondatabase/neon/issues/3722	2023-03-03 12:10:24 +01:00
Christian Schwarz	1b9b9d60d4	eviction: add comment explaining resident size decrement on error https://github.com/neondatabase/neon/issues/3722	2023-03-03 12:10:24 +01:00
Christian Schwarz	68141a924d	eviction: remove needless if-let around resident size decrement The branch was always taken at runtime, so this should not constitute a behavioral change. refs https://github.com/neondatabase/neon/issues/3722	2023-03-03 12:10:24 +01:00
Christian Schwarz	764d27f696	fix checkpoint_timeout serialization in TenantConf Without this change, when actually setting this config option, the tenant would become Broken the next time we load it. Why? The serde_toml representation that persist_tenant_conf would write out would be a TOML inline table of `secs` and `nsecs`. But our hand-rolled TenantConf parser expects a TOML string. I checked that all other `Duration` values in TenantConfOpt use the humantime serialization. Issues like this would likely be systematically prevented by https://github.com/neondatabase/neon/issues/3682	2023-03-03 12:10:24 +01:00
Heikki Linnakangas	f51b48fa49	Fix UNLOGGED tables. Instead of trying to create missing files on the way, send init fork contents as main fork from pageserver during basebackup. Add test for that. Call put_rel_drop for init forks; previously they weren't removed. Bump vendor/postgres to revert previous approach on Postgres side. Co-authored-by: Arseny Sher <sher-ars@yandex.ru> ref https://github.com/neondatabase/postgres/pull/264 ref https://github.com/neondatabase/postgres/pull/259 ref https://github.com/neondatabase/neon/issues/1222	2023-02-24 23:30:02 +04:00
Konstantin Knizhnik	412e0aa985	Skip largest N holes during compaction (#3597) ## Describe your changes This is yet another attempt to address the problem of storage size ballooning #2948. The previous PR #3348 tried to address this problem by maintaining a list of holes for each layer. The problem with that approach is that we have to load all layers on pageserver start, so lazy loading of layers is no longer possible. This PR instead collects information about the N largest holes at compaction time and excludes these holes from the produced layers. This can generate a larger number of layers (up to 2 times as many) and produce small layers. But it requires minimal changes in code and doesn't affect the storage format. For a graphical explanation please see the thread: https://github.com/neondatabase/neon/pull/3597#discussion_r1112704451 ## Issue ticket number and link #2948 #3348 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-02-22 18:28:01 +02:00
Joonas Koivunen	225add041f	calculate_logical_size: no longer use spawn_blocking (#3664) Calculation of logical size is now async because of layer downloads, so we shouldn't use spawn_blocking for it. Use of `spawn_blocking` exhausted resources which are needed by `tokio::io::copy` when copying from a stream to a file, which led to a deadlock. Fixes: #3657	2023-02-21 21:09:31 +02:00
Joonas Koivunen	fe462de85b	fix: log download failed error (#3661) Fixes #3659	2023-02-21 19:31:53 +02:00
Joonas Koivunen	b220ba6cd1	add random init delay for background tasks (#3655) Fixes #3649.	2023-02-21 12:42:11 +01:00
Joonas Koivunen	7de373210d	Warn when background tasks exceed their configured period (#3654) Fixes #3648.	2023-02-21 13:02:19 +02:00
Joonas Koivunen	d7d3f451f0	Use tracing panic hook in all binaries (#3634) Enables the tracing panic hook, already used by the pageserver since #3475, in: - proxy - safekeeper - storage_broker For proxy, a drop guard which resets the original std panic hook was added in the first commit. Other binaries don't need it, so they never reset anything, by `disarm`ing the drop guard. The aim of the change is to make sure all panics a) have span information and b) are logged similarly to other messages, not interleaved with other messages as happens right now. Interleaving happens right now because std prints panics to stderr, while other logging happens in stdout. Even if this were handled gracefully by some utility, the log message splitter would treat panics as belonging to the previous message because it expects a message to start with a timestamp. Cc: #3468	2023-02-21 10:03:55 +02:00
Christian Schwarz	485b269674	eviction: tone down logs to debug!() level if there were no evictions fixes #3647	2023-02-20 18:01:59 +01:00
Christian Schwarz	ee1eda9921	eviction: remove EvictionStats::not_considered_due_to_clock_skew Rationale: see the block comment added in this patch. fixes #3641	2023-02-20 18:01:59 +01:00
Christian Schwarz	e363911c85	timeline: propagate span to download_remote_layer (#3644) fixes #3643 refs #3604	2023-02-20 17:18:13 +02:00
Shany Pozin	af210c8b42	Allow running do_gc in non-testing env (#3639) ## Describe your changes Since the current default GC period is set to 1 hour, whenever there is an immediate need to reduce PITR and run GC, the user has to wait 1 hour for the PITR change to take effect. By enabling this API, the user can configure PITR and immediately call the do_gc API to trigger GC. ## Issue ticket number and link #3590 ## Checklist before requesting a review - [X] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-02-20 13:23:13 +02:00
Anastasia Lubennikova	40799d8ae7	Add debug messages to catch abnormal consumption metric values	2023-02-17 17:57:45 +02:00
Joonas Koivunen	8e6b27bf7c	fix: avoid busy loop on replacement failure (#3613) Add an AtomicBool per RemoteLayer and use it, together with the closed semaphore, to mark that the RemoteLayer is unusable until restart or ignore+load. https://github.com/neondatabase/neon/issues/3533#issuecomment-1431481554	2023-02-17 14:15:29 +02:00
Joonas Koivunen	ae3eff1ad2	Tracing panic hook (#3475) Fixes #3468. This changes how panics look and, most importantly, makes sure they are not interleaved with other messages. Adds a `GET /v1/panic` endpoint for panic testing (useful for sentry dedup and for testing this hook). Panics are now logged within a new error-level span called `panic`, which separates them from other error-level events. The panic info is unpacked into span fields: - thread=mgmt request worker - location="pageserver/src/http/routes.rs:898:9" Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-02-17 13:56:00 +02:00
Anastasia Lubennikova	0d3aefb274	Only use active timelines in synthetic_size calculation	2023-02-16 17:58:53 +02:00
Anastasia Lubennikova	6139e8e426	Revert "Add debug messages around sending cached metrics" This reverts commit `a839860c2e`.	2023-02-16 17:46:15 +02:00
Anastasia Lubennikova	d9ba3c5f5e	Revert "Add debug messages around timeline.get_current_logical_size" This reverts commit `a5ce2b5330`.	2023-02-16 17:46:15 +02:00
Joonas Koivunen	0cf7fd0fb8	Compaction with on-demand download (#3598) Repeatedly (twice) try to download the layers targeted by compaction before the actual compaction. Adds tests for both L0 compaction downloading layers and image creation downloading layers. Image creation support existed already. Fixes #3591 Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-02-16 15:36:13 +02:00
Anastasia Lubennikova	7991bd3b69	Fix periodic metric sending: don't reset timer on every iteration (#3617) Previously the timer was reset on every collect_metrics_iteration, so sending of cached metrics was never triggered. This is a follow-up for `a69da4a7`.	2023-02-16 10:56:42 +02:00
Heikki Linnakangas	ddbdcdddd7	Tenant size calculation: refactor, rewrite, and add SVG (#2817) Refactor the tenant_size_model code. Segment now contains just the minimum amount of information needed to calculate the size. Other information that is useful for building up the segment tree, and for display purposes, is now kept elsewhere. The code in 'main.rs' has a new ScenarioBuilder struct for that. Calculating which Segments are "needed" is now the responsibility of the caller of tenant_size_model, not part of the calculation itself. So it's up to the caller to make all the decisions about retention periods for each branch. The output of the sizing calculation is now a Vec of SizeResults, rather than a tree. It uses a tree representation internally, when doing the calculation, but it's not exposed to the caller anymore. Refactor the way the recursive calculation is performed. Rewrite the code in size.rs that builds the Segment model. Get rid of the intermediate representation with Update structs. Build the Segments directly, with some local HashMaps and Vecs to track branch points to help with that. retention_period is now an input to gather_inputs(), rather than an output. Update pageserver HTTP API: rename the /size endpoint to /synthetic_size with the following parameters: - /synthetic_size?inputs_only to get debug info; - /synthetic_size?retention_period=0 to override the cutoff that is used to calculate the size; pass the header -H "Accept: text/html" to get HTML output, otherwise JSON is returned. Update python tests and openapi spec. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-02-16 10:53:46 +02:00
Anastasia Lubennikova	a839860c2e	Add debug messages around sending cached metrics	2023-02-15 16:02:02 +02:00
Anastasia Lubennikova	a5ce2b5330	Add debug messages around timeline.get_current_logical_size	2023-02-15 16:02:02 +02:00
Joonas Koivunen	e6618f1cc0	Update current logical size gauge (#3592) Alternative to #3586. Introduces usage of current_logical_size.current_size as a boundary after which we start to update the metric gauge on ingested WAL. Previously any incremented value (ingested WAL) would have updated the gauge, but this would have left the metric at zero for timelines which never receive any WAL, even if the size had been calculated. Now the gauge is updated right away as the calculation completes, without requiring any WAL to be received.	2023-02-14 13:17:34 +02:00
Christian Schwarz	a4256b3250	allow on-demand downloads in walreceiver connection handler Without this patch, basebackup fails if we evict all layers before that. This slipped in as part of commit `01b4b0c2f3` Author: Christian Schwarz <christian@neon.tech> Date: Fri Jan 13 17:02:22 2023 +0100 Introduce RequestContext	2023-02-09 13:39:04 +01:00
Christian Schwarz	175a577ad4	automatic layer eviction This patch adds a per-timeline periodic task that executes an eviction policy. The eviction policy is configurable per tenant. Two policies exist: - NoEviction (the default one) - LayerAccessThreshold The LayerAccessThreshold policy examines the last access timestamp per layer in the layer map and evicts the layer if that last access is further in the past than a configurable threshold value. This policy kind is evaluated periodically at a configurable period. It logs a summary statistic at `info!()` or `warn!()` level, depending on whether any evictions failed. This feature has no explicit killswitch since it's off by default.	2023-02-09 13:33:55 +01:00
Joonas Koivunen	1fdf01e3bc	fix: readable Debug for Layers (#3575) #3536 added the custom Debug implementations, but their use of the derived Debug on Key led to overly verbose output. Instead of making `Key`'s `Debug` unconditionally or conditionally use the `Display` variant (for table space'd keys), I opted to build a newtype that provides `Debug` for `Range<Key>` via `Display`, which seemed to work unconditionally. Also orders Key to have: 1. comment, 2. derive, 3. `struct Key`.	2023-02-09 13:55:37 +02:00
Christian Schwarz	446a39e969	make LayerAccesStatFullDetails Copy Method to_api_model renamed to as_api_model because of a Clippy complaint: https://rust-lang.github.io/rust-clippy/master/index.html#wrong_self_convention	2023-02-09 12:35:45 +01:00
Joonas Koivunen	f07d6433b6	fix: one leftover Arc::ptr_eq (#3573) @knizhnik noticed that one instance of `Arc::<dyn PersistentLayer>::ptr_eq` was missed in #3558. Now all remaining `ptr_eq` uses are in comments.	2023-02-09 13:02:07 +02:00
Christian Schwarz	7ed93fff06	refactor: allow for eviction of layers in a batch The auto-eviction PR (#3552) operates in two phases: 1. find candidate layers 2. evict them. For (2), a batch API like the one added in this commit is useful. Note that this PR requires #3558 to be merged first. Otherwise, the tests won't pass.	2023-02-08 14:40:47 +01:00
Joonas Koivunen	a6dffb6ef9	fix: stop using Arc::ptr_eq with dyn Trait (#3558) This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's `LayerMap` to no longer use `Arc::ptr_eq`, which was witnessed during development of #3557 to yield wrong results. It gives wrong results because it compares fat pointers, which are `(object, vtable)` tuples for `dyn Trait`, and there is no guarantee that the `vtable`s are unique. In #3557 there were multiple vtables for `RemoteLayer`, which is why the comparison failed. This is a known issue in Rust; clippy warns against it, and Rust std might be moving to the solution reproduced in this PR: compare only object pointers by "casting out" the vtable pointer.	2023-02-08 12:25:25 +00:00
Joonas Koivunen	fcb905f519	Use LayerMap::replace in eviction (#3544) Follow-up to #3536, to actually use the new `Debug` when replacing layers, and to use replacement in the manual eviction endpoint. It turns out the two paths share a lot of handling of `Replacement`, but I didn't unify the two (need 3). There are also upcoming refactorings from other PRs to this.	2023-02-07 11:08:55 +02:00
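
Commit a6dffb6ef9 above replaces `Arc::ptr_eq` on `Arc<dyn PersistentLayer>` with a comparison of object pointers only. The following is a minimal std-only sketch of that idea. The `Layer` trait, the `RemoteLayer` struct and the `same_layer` helper are hypothetical stand-ins, not neon's actual types or code.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a layer trait such as PersistentLayer.
trait Layer {
    fn name(&self) -> &str;
}

// Hypothetical concrete layer type (not neon's RemoteLayer).
struct RemoteLayer {
    name: String,
}

impl Layer for RemoteLayer {
    fn name(&self) -> &str {
        &self.name
    }
}

/// Compare two trait-object Arcs by the address of the shared allocation only.
/// `Arc::as_ptr` yields a fat `*const dyn Layer` (data pointer + vtable pointer);
/// casting it to the thin `*const ()` drops the vtable half, so two handles to
/// the same object compare equal even if they carry different vtable pointers.
fn same_layer(a: &Arc<dyn Layer>, b: &Arc<dyn Layer>) -> bool {
    std::ptr::eq(Arc::as_ptr(a) as *const (), Arc::as_ptr(b) as *const ())
}

fn main() {
    let a: Arc<dyn Layer> = Arc::new(RemoteLayer { name: "layer-a".to_string() });
    let a2 = Arc::clone(&a);
    let b: Arc<dyn Layer> = Arc::new(RemoteLayer { name: "layer-a".to_string() });

    // Same allocation: equal. Separate allocation with equal contents: not equal.
    assert!(same_layer(&a, &a2));
    assert!(!same_layer(&a, &b));
    println!("identity checks passed for {}", a.name());
}
```

Casting the fat `*const dyn Layer` to a thin `*const ()` is what "casting out" the vtable pointer amounts to here. Because the check never reads the vtable half, it does not matter whether the compiler emitted one vtable or several for the concrete type.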
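
Commit 412e0aa985 describes collecting the N largest holes at compaction time and excluding them from the produced layers. The sketch below shows only a plausible selection step, under several assumptions:

- Keys are plain `u64` values sorted in ascending order.
- A hole is any gap between consecutive keys that is larger than a hypothetical `min_hole_size`.
- `Hole` and `largest_holes` are illustrative names, not neon's API.

The sketch does not show how the real code cuts layers at the holes.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical gap in the key space: keys strictly between `start` and `end` are absent.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hole {
    start: u64, // last key present before the gap
    end: u64,   // first key present after the gap
}

/// Return up to `n` largest holes between consecutive keys of `sorted_keys`
/// (assumed ascending), ignoring gaps of `min_hole_size` or less.
/// The result is ordered by `start`.
fn largest_holes(sorted_keys: &[u64], n: usize, min_hole_size: u64) -> Vec<Hole> {
    // Min-heap keyed by hole size: the smallest retained hole sits on top,
    // so it is the one dropped once more than `n` holes are held.
    let mut heap: BinaryHeap<Reverse<(u64, Hole)>> = BinaryHeap::new();
    for pair in sorted_keys.windows(2) {
        let size = pair[1].saturating_sub(pair[0]);
        if size <= min_hole_size {
            continue;
        }
        heap.push(Reverse((size, Hole { start: pair[0], end: pair[1] })));
        if heap.len() > n {
            heap.pop(); // discard the smallest hole kept so far
        }
    }
    // BinaryHeap iteration order is arbitrary, so sort by position afterwards.
    let mut holes: Vec<Hole> = heap.into_iter().map(|Reverse((_, h))| h).collect();
    holes.sort_by_key(|h| h.start);
    holes
}

fn main() {
    let keys = [1u64, 2, 3, 100, 101, 500, 502];
    // Gaps above 10 are 3..100 (97) and 101..500 (399); keep the top 2.
    println!("{:?}", largest_holes(&keys, 2, 10));
}
```

The bounded min-heap holds at most `n + 1` entries however many keys are scanned. That would suit a single pass over the keys being compacted.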


1214 Commits