rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 05:00:37 +00:00

Author	SHA1	Message	Date
Alexander Bayandin	105b8bb9d3	test_runner: automatically rerun flaky tests (#3880 ) This PR adds a plugin that automatically reruns (up to 3 times) flaky tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and `pytest-rerunfailures` plugin. As the first approximation we consider the test flaky if it has failed on the main branch in the last 10 days. Flaky tests are fetched by `scripts/flaky_tests.py` script (it's possible to use it in a standalone mode to learn which tests are flaky), stored to a JSON file, and then the file is passed to the pytest plugin.	2023-04-04 12:21:54 +01:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Shany Pozin	0f7de84785	Allow calling detach on ignored tenant (#3834 ) ## Describe your changes Added a query param to detach API Allow to remove local state of a tenant even if its not in the memory (following ignore API) ## Issue ticket number and link #3828 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2023-03-22 07:17:00 +00:00
Christian Schwarz	881356c417	add metrics to detect eviction-induced thrashing (#3837 ) This patch adds two metrics that will enable us to detect thrashing of layers, i.e., repetitions of `eviction, on-demand-download, eviction, ... ` for a given layer. The first metric counts all layer evictions per timeline. It requires no further explanation. The second metric counts the layer evictions where the layer was resident for less than a given threshold. We can alert on increments to the second metric. The first metric will serve as a baseline, and further, it's generally interesting, outside of thrashing. The second metric's threshold is configurable in PageServerConf and defaults to 24h. The threshold value is reproduced as a label in the metric because the counter's value is semantically tied to that threshold. Since changes to the config and hence the label value are infrequent, this will have low storage overhead in the metrics storage. The data source to determine the time that the layer was resident is the file's `mtime`. Using `mtime` is more of a crutch. It would be better if Pageserver did its own persistent bookkeeping of residence change events instead of relying on the filesystem. We had some discussion about this: https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900 My position is that `mtime` is good enough for now. It can theoretically jump forward if someone copies files without resetting `mtime`. But that shouldn't happen in practice. Note that moving files back and forth doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`, which is typically used for filesystem-level backup / restore, correctly syncs `mtime`. I've added a label that identifies the data source to keep options open for a future, better data source than `mtime`. Since this value will stay the same for the time being, it's not a problem for metrics storage. refs https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:11:36 +01:00
Heikki Linnakangas	fea4b5f551	Switch to EdDSA algorithm for the storage JWT authentication tokens. The control plane currently only supports EdDSA. We need to either teach the storage to use EdDSA, or the control plane to use RSA. EdDSA is more modern, so let's use that. We could support both, but it would require a little more code and tests, and we don't really need the flexibility since we control both sides.	2023-03-20 16:28:01 +02:00
Joonas Koivunen	0c1228c37a	feat: store initial timeline in env fixture (#3839 ) minor change, but will allow more use in future for the default tenants. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-20 11:57:27 +02:00
Shany Pozin	93f3f4ab5f	Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818 ) ## Describe your changes Add Error enum for tenant state response to allow better error handling in mgmt api ## Issue ticket number and link #2238 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-19 10:44:42 +02:00
Heikki Linnakangas	10a5d36af8	Separate mgmt and libpq authentication configs in pageserver. (#3773 ) This makes it possible to enable authentication only for the mgmt HTTP API or the compute API. The HTTP API doesn't need to be directly accessible from compute nodes, and it can be secured through network policies. This also allows rolling out authentication in a piecemeal fashion.	2023-03-15 13:52:29 +02:00
Joonas Koivunen	15b692ccc9	test: more strict finding of WARN, ERROR lines (#3798 ) this prevents flakyness when `WARN\|ERROR` appears in some other part of the line, for example in a random filename.	2023-03-14 15:27:52 +02:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Arseny Sher	0d8ced8534	Remove sync postgres_backend, tidy up its split usage. - Add support for splitting async postgres_backend into read and write halfes. Safekeeper needs this for bidirectional streams. To this end, encapsulate reading-writing postgres messages to framed.rs with split support without any additional changes (relying on BufRead for reading and BytesMut out buffer for writing). - Use async postgres_backend throughout safekeeper (and in proxy auth link part). - In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling. - Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL -- join split parts back catching errors from them before returning. Initially I hoped to do that read-write without split at all, through polling IO: https://github.com/neondatabase/neon/pull/3522 However that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>> which is Send construct just to satisfy the checker; 2) can be workaround with transmute. But this is so messy that I decided to leave split.	2023-03-09 20:45:56 +03:00
Christian Schwarz	9cada8b59d	fix benchmarks, broken by PR #3737 Benchmarks only run on `main` branch, so, the pre-commit tests didn't catch these.	2023-03-03 18:47:57 +01:00
Christian Schwarz	66a5159511	fix: compaction: no index upload scheduled if no on-demand downloads Commit `0cf7fd0fb8` Compaction with on-demand download (#3598) introduced a subtle bug: if we don't have to do on-demand downloads, we only take one ROUND in fn compact() and exit early. Thereby, we miss scheduling the index part upload for any layers created by fn compact_inner(). Before that commit, we didn't have this problem. So, this patch fixes it. Since no regression test caught this, I went ahead and extended the timeline size tests to assert that, if remote storage is configured, 1. pageserver_remote_physical_size matches the other physical sizes 2. file sizes reported by the layer map info endpoint match the other physical size metrics Without the pageserver code fix, the regression test would fail at the physical size assertion, complaining that any of the resident physical size != remote physical size metric 50790400.0 != 18399232.0 I figured out what the problem is by comparing the remote storage and local directories like so, and noticed that the image layer in the local directory wasn't present on the remote side. It's size was exactly the difference 50790400.0 - 18399232.0 =32391168.0 fixes https://github.com/neondatabase/neon/issues/3738	2023-03-03 16:11:54 +01:00
Christian Schwarz	d1a0a907ff	tests: use `parse_metrics` everywhere (#3737 ) - use parse_metrics() in all places where we parse Prometheus metrics - query_all: make `filter` argument optional - encourage using properly parsed, typed metrics by changing get_metrics() to return already-parsed metrics. The new get_metric_str() method, like in the Safekeeper type, returns the raw text response.	2023-03-03 14:53:27 +01:00
Christian Schwarz	38022ff11c	gc: only decrement resident size if GC'd layer is resident Before this patch, GC would call PersistentLayer::delete() on every GC'ed layer. RemoteLayer::delete() returned Ok(()) unconditionally. GC would then proceed by decrementing the resident size metric, even though the layer is a RemoteLayer. This patch makes the following changes: - Rename PersistentLayer::delete() to delete_resident_layer_file(). That name is unambiguous. - Make RemoteLayer::delete_resident_layer_file return an Err(). We would have uncovered this bug if we had done that from the start. - Change GC / Timeline::delete_historic_layer check whether the layer is remote or not, and only call delete_resident_layer_file() if it's not remote. This brings us in line with how eviction does it. - Add a regression test. fixes https://github.com/neondatabase/neon/issues/3722	2023-03-03 12:10:24 +01:00
Arthur Petukhovsky	b23742e09c	Create `/v1/debug_dump` safekeepers endpoint (#3710 ) Add HTTP endpoint to get full safekeeper state of all existing timelines (all in-memory values and info about all files stored on disk). Example: https://gist.github.com/petuhovskiy/3cbb8f870401e9f486731d145161c286	2023-03-03 14:01:05 +03:00
MMeent	20a4d817ce	Update vendored PostgreSQL versions to 14.7 and 15.2 (#3581 ) ## Describe your changes Rebase vendored PostgreSQL onto 14.7 and 15.2 ## Issue ticket number and link #3579 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ``` The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL 14 and 15.2 for PostgreSQL 15. ```	2023-02-23 16:10:22 +02:00
Joonas Koivunen	5d001b1e5a	chore: ignore all compaction inactive tenant errors (#3665 ) these are happening in tests because of #3655 but they sure took some time to appear. makes the `Compaction failed, retrying in 2s: Cannot run compaction iteration on inactive tenant` into a globally allowed error, because it has been seen failing on different test cases.	2023-02-21 20:20:13 +02:00
Joonas Koivunen	7de373210d	Warn when background tasks exceed their configured period (#3654 ) Fixes #3648.	2023-02-21 13:02:19 +02:00
Dmitry Ivanov	956b6f17ca	[proxy] Handle some unix signals. On the surface, this doesn't add much, but there are some benefits: * We can do graceful shutdowns and thus record more code coverage data. * We now have a foundation for the more interesting behaviors, e.g. "stop accepting new connections after SIGTERM but keep serving the existing ones". * We give the otel machinery a chance to flush trace events before finally shutting down.	2023-02-17 15:32:14 +03:00
Joonas Koivunen	8e6b27bf7c	fix: avoid busy loop on replacement failure (#3613 ) Add an AtomicBool per RemoteLayer, use it to mark together with closed semaphore that remotelayer is unusable until restart or ignore+load. https://github.com/neondatabase/neon/issues/3533#issuecomment-1431481554	2023-02-17 14:15:29 +02:00
Heikki Linnakangas	ddbdcdddd7	Tenant size calculation: refactor, rewrite, and add SVG (#2817 ) Refactor the tenant_size_model code. Segment now contains just the minimum amount of information needed to calculate the size. Other information that is useful for building up the segment tree, and for display purposes, is now kept elsewhere. The code in 'main.rs' has a new ScenarioBuilder struct for that. Calculating which Segments are "needed" is now the responsibility of the caller of tenant_size_mode, not part of the calculation itself. So it's up to the caller to make all the decisions with retention periods for each branch. The output of the sizing calculation is now a Vec of SizeResults, rather than a tree. It uses a tree representation internally, when doing the calculation, but it's not exposed to the caller anymore. Refactor the way the recursive calculation is performed. Rewrite the code in size.rs that builds the Segment model. Get rid of the intermediate representation with Update structs. Build the Segments directly, with some local HashMaps and Vecs to track branch points to help with that. retention_period is now an input to gather_inputs(), rather than an output. Update pageserver http API: rename /size endpoint to /synthetic_size with following parameters: - /synthetic_size?inputs_only to get debug info; - /synthetic_size?retention_period=0 to override cutoff that is used to calculate the size; pass header -H "Accept: text/html" to get HTML output, otherwise JSON is returned Update python tests and openapi spec. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-02-16 10:53:46 +02:00
Kirill Bulatov	ec3a3aed37	Dump current tenant config (#3534 ) The PR adds an endpoint to show tenant's current config: `GET /v1/tenant/:tenant_id/config` Tenant's config consists of two parts: tenant overrides (could be changed via other management API requests) and the default part, substituting all missing overrides (constant, hardcoded in pageserver). The API returns the custom overrides and the final tenant config, after applying all the defaults. Along the way, it had to fix two things in the config: * allow to shorten the json version and omit all `null`'s (same as toml serializer behaves by default), and to understand such shortened format when deserialized. A unit test is added * fix a bug, when `PUT /v1/tenant/config` endpoint rewritten the local file with what had came in the request, but updating (not rewriting the old values) the in-memory state instead. That got uncovered during adjusting the e2e test and fixed to do the replacement everywhere, otherwise there's no way to revert existing overrides. Fixes #3471 (commit `dc688affe8`) * fixes https://github.com/neondatabase/neon/issues/3472 by reordering the config saving operations	2023-02-04 01:32:29 +02:00
Christian Schwarz	87cd2bae77	introduce LaunchTimestamp to identify process restarts This patch adds a LaunchTimestamp type to the `metrics` crate, along with a `libmetric_` Prometheus metric. The initial user is pageserver. In addition to exposing the Prometheus metric, it also reproduces the launch timestamp as a header in the API responses. The motivation for this is that we plan to scrape the pageserver's /v1/tenant/:tenant_id/timeline/:timeline_id/layer HTTP endpoint over time. It will soon expose access metrics (#3496) which reset upon process restart. We will use the pageserver's launch ID to identify a restart between two scrape points. However, there are other potential uses. For example, we could use the Prometheus metric to annotate Grafana plots whenever the launch timestamp changes.	2023-02-03 18:12:17 +01:00
bojanserafimov	be81db21b9	Revert accidental change (#3538 )	2023-02-03 17:54:12 +02:00
bojanserafimov	ada933eb42	Pageserver read trace utils (#2795 ) List, dump, and analyze read traces.	2023-02-02 15:33:40 -05:00
Kirill Bulatov	2759f1a22e	Evict layers on demand (#3486 ) Closes https://github.com/neondatabase/neon/issues/3439 Adds a set of commands to manipulate the layer map: * dump the layer map contents * evict the layer form the layer map (remove the local file, put the remote layer instead in the layer map) * download the layer (operation, reversing the eviction) The commands will change later, when the statistics is added on top, so the swagger schema is not adjusted. The commands might have issues with big amount of layers: no pagination is done for the dump command, eviction and download commands look for the layer to evict/download by iterating all layers sequentially and comparing the layer names. For now, that seems to be tolerable ("big" number of layers is ~2_000) and further experiments are needed. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-02-02 12:14:44 +02:00
Christian Schwarz	590695e845	improve query param parsing - add parse_query_param() - use Cow<> where possible - move param parsing code to utils::http::request This was originally PR https://github.com/neondatabase/neon/pull/3502 which targeted a different branch. closes #3510	2023-02-01 14:11:12 +01:00
Lassi Pölönen	20b38acff0	Replace per timeline `pageserver_storage_operations_seconds` with a global one (#3409 ) Related to: https://github.com/neondatabase/neon/issues/2848 `pageserver_storage_operations_seconds` is the most expensive metric we have, as there are a lot of tenants/timelines and the histogram had 42 buckets. These are quite sparse too, so instead of having a histogram per timeline, create a new histogram `pageserver_storage_operations_seconds_global` without tenant and timeline dimensions and replace `pageserver_storage_operations_seconds` with sum and counter. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-01-30 17:10:29 +02:00
Shany Pozin	ddb9c2fe94	Add metrics for tenants state (#3448 ) ## Describe your changes Added a metric that allow to monitor tenants state ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3161 ## Checklist before requesting a review - [X] I have performed a self-review of my code. - [X] I have added an e2e test for it. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-01-29 14:04:06 +02:00
Anastasia Lubennikova	36f048d6b0	Fix tenant size orphans (#3377 ) Before only the timelines which have passed the `gc_horizon` were processed which failed with orphans at the tree_sort phase. Example input in added `test_branched_empty_timeline_size` test case. The PR changes iteration to happen through all timelines, and in addition to that, any learned branch points will be calculated as they would had been in the original implementation if the ancestor branch had been over the `gc_horizon`. This also changes how tenants where all timelines are below `gc_horizon` are handled. Previously tenant_size 0 was returned, but now they will have approximately `initdb_lsn` worth of tenant_size. The PR also adds several new tenant size tests that describe various corner cases of branching structure and `gc_horizon` setting. They are currently disabled to not consume time during CI. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-01-20 20:21:36 +02:00
Christian Schwarz	58c8c1076c	download_all_remote_layers API: require client to specify max_concurrent_downloads Before this patch, we would start all layer downloads simultaneously. There is at most one download_all_remote_layers task per timeline. Hence, the specified limit is per timeline. There is still no global concurrency limit for layer downloads. We'll have to revisit that at some point and also prioritize on-demand initiated downloads over download_all_remote_layers downloads. But that's for another day.	2023-01-16 19:29:06 +01:00
Anastasia Lubennikova	2cbe84b78f	Proxy metrics (#3290 ) Implement proxy metrics collection. Only collect metric for outbound traffic. Add proxy CLI parameters: - metric-collection-endpoint - metric-collection-interval. Add test_proxy_metric_collection test. Move shared consumption metrics code to libs/consumption_metrics. Refactor the code.	2023-01-16 15:17:28 +00:00
Kirill Bulatov	a457256fef	Fix log message matching (#3291 ) Spotted https://neon-github-public-dev.s3.amazonaws.com/reports/main/debug/3871991071/index.html#suites/158be07438eb5188d40b466b6acfaeb3/22966d740e33b677/ failing on `main`, fixes that by using a proper regex match string. Also removes one clippy lint suppression.	2023-01-09 14:25:12 +02:00
Kirill Bulatov	b6237474d2	Fix README and basic startup example (#3275 ) Follow-up of https://github.com/neondatabase/neon/pull/3270 which made an example from main README.md not working. Fixes that, by adding a way to specify a default tenant now and modifies the basic neon_local test to start postgres and check branching. Not all neon_local commands are implemented, so not all README.md contents is tested yet.	2023-01-06 12:26:14 +02:00
Kirill Bulatov	8712e1899e	Move initial timeline creation into pytest (#3270 ) For every Python test, we start the storage first, and expect that later, in the test, when we start a compute, it will work without specific timeline and tenant creation or their IDs specified. For that, we have a concept of "default" branch that was created on the control plane level first, but that's not needed at all, given that it's only Python tests that need it: let them create the initial timeline during set-up. Before, control plane started and stopped pageserver for timeline creation, now Python harness runs an extra tenant creation request on test env init. I had to adjust the metrics test, turns out it registered the metrics from the default tenant after an extra pageserver restart. New model does not sent the metrics before the collection time happens, and that was 30s before.	2023-01-05 17:48:27 +02:00
Christian Schwarz	d7f1e30112	remote_timeline_client: more metrics & metrics-related cleanups - Clean up redundant metric removal in TimelineMetrics::drop. RemoteTimelineClientMetrics is responsible for cleaning up REMOTE_OPERATION_TIME andREMOTE_UPLOAD_QUEUE_UNFINISHED_TASKS. - Rename `pageserver_remote_upload_queue_unfinished_tasks` to `pageserver_remote_timeline_client_calls_unfinished`. The new name reflects that the metric is with respect to the entire call to remote timeline client. This includes wait time in the upload queue and hence it's a longer span than what `pageserver_remote_OPERATION_seconds` measures. - Add the `pageserver_remote_timeline_client_calls_started` histogram. See the metric description for why we need it. - Add helper functions `call_begin` etc to `RemoteTimelineClientMetrics` to centralize the logic for updating the metrics above (they relate to each other, see comments in code). - Use these constructs to track ongoing downloads in `pageserver_remote_timeline_client_calls_unfinished` refs https://github.com/neondatabase/neon/issues/2029 fixes https://github.com/neondatabase/neon/issues/3249 closes https://github.com/neondatabase/neon/pull/3250	2023-01-05 11:50:17 +01:00
Kirill Bulatov	efad64bc7f	Expect compute shutdown test log error (#3262 ) https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3261/debug/3833043374/index.html#suites/ffbb7f9930a77115316b58ff32b7c719/1f6ebaedc0a113a1/ Spotted a flacky test that appeared after https://github.com/neondatabase/neon/pull/3227 changes	2023-01-04 10:45:11 +00:00
Kirill Bulatov	10dae79c6d	Tone down safekeeper and pageserver walreceiver errors (#3227 ) Closes https://github.com/neondatabase/neon/issues/3114 Adds more typization into errors that appear during protocol messages (`FeMessage`), postgres and walreceiver connections. Socket IO errors are now better detected and logged with lesser (INFO, DEBUG) error level, without traces that they were logged before, when they were wrapped in anyhow context.	2023-01-03 20:42:04 +00:00
Heikki Linnakangas	e9583db73b	Remove code and test to generate flamegraph on GetPage requests. (#3257 ) It was nice to have and useful at the time, but unfortunately the method used to gather the profiling data doesn't play nicely with 'async'. PR #3228 will turn 'get_page_at_lsn' function async, which will break the profiling support. Let's remove it, and re-introduce some kind of profiling later, using some different method, if we feel like we need it again.	2023-01-03 20:11:32 +02:00
Heikki Linnakangas	7ff591ffbf	On-Demand Download The code in this change was extracted from #2595 (Heikki’s on-demand download draft PR). High-Level Changes - New RemoteLayer Type - On-Demand Download As An Effect Of Page Reconstruction - Breaking Semantics For Physical Size Metrics There are several follow-up work items planned. Refer to the Epic issue on GitHub: https://github.com/neondatabase/neon/issues/2029 closes https://github.com/neondatabase/neon/pull/3013 Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> New RemoteLayer Type ==================== Instead of downloading all layers during tenant attach, we create RemoteLayer instances for each of them and add them to the layer map. On-Demand Download As An Effect Of Page Reconstruction ====================================================== At the heart of pageserver is Timeline::get_reconstruct_data(). It traverses the layer map until it has collected all the data it needs to produce the page image. Most code in the code base uses it, though many layers of indirection. Before this patch, the function would use synchronous filesystem IO to load data from disk-resident layer files if the data was not cached. That is not possible with RemoteLayer, because the layer file has not been downloaded yet. So, we do the download when get_reconstruct_data gets there, i.e., “on demand”. The mechanics of how the download is done are rather involved, because of the infamous async-sync-async sandwich problem that plagues the async Rust world. We use the new PageReconstructResult type to work around this. Its introduction is the cause for a good amount of code churn in this patch. Refer to the block comment on `with_ondemand_download()` for details. Breaking Semantics For Physical Size Metrics ============================================ We rename prometheus metric pageserver_{current,resident}_physical_size to reflect what this metric actually represents with on-demand download. This intentionally BREAKS existing grafana dashboard and the cost model data pipeline. Breaking is desirable because the meaning of this metrics has changed with on-demand download. See https://docs.google.com/document/d/12AFpvKY-7FZdR5a4CaD6Ir_rI3QokdCLSPJ6upHxJBo/edit# for how we will handle this breakage. Likewise, we rename the new billing_metrics’s PhysicalSize => ResidentSize. This is not yet used anywhere, so, this is not a breaking change. There is still a field called TimelineInfo::current_physical_size. It is now the sum of the layer sizes in layer map, regardless of whether local or remote. To compute that sum, we added a new trait method PersistentLayer::file_size(). When updating the Python tests, we got rid of current_physical_size_non_incremental. An earlier commit removed it from the OpenAPI spec already, so this is not a breaking change. test_timeline_size.py has grown additional assertions on the resident_physical_size metric.	2022-12-21 19:16:39 +01:00
Alexander Bayandin	486a985629	mypy: enable check_untyped_defs (#3142 ) Enable `check_untyped_defs` and fix warnings.	2022-12-21 09:38:42 +00:00
Heikki Linnakangas	8e2edfcf39	Retry remote downloads. Remote operations fail sometimes due to network failures or other external reasons. Add retry logic to all the remote downloads, so that a transient failure at pageserver startup or tenant attach doesn't cause the whole tenant to be marked as Broken. Like in the uploads retry logic, we print the failure to the log as a WARNing after three retries, but keep retrying. We will retry up to 10 times now, before returning the error to the caller. To test the retries, I created a new RemoteStorage wrapper that simulates failures, by returning an error for the first N times that a remote operation is performed. It can be enabled by setting a new "test_remote_failures" option in the pageserver config file. Fixes #3112	2022-12-20 14:27:24 +02:00
Kirill Bulatov	2c11f1fa95	Use separate broker per Python test (#3158 ) And add its logs to Allure reports per test	2022-12-20 11:06:21 +00:00
Kirill Bulatov	56d8c25dc8	Revert "Use local brokers" This reverts commit `f9f57e211a`.	2022-12-20 01:57:36 +02:00
Kirill Bulatov	f9f57e211a	Use local brokers	2022-12-20 01:55:59 +02:00
Kirill Bulatov	49a211c98a	Add neon_local test	2022-12-19 21:43:36 +02:00
Alexander Bayandin	12e6f443da	test_perf_pgbench: switch to server-side data generation (#3058 ) To offload the network and reduce its impact, I suggest switching to server-side data generation for the pgbench initialize workflow.	2022-12-18 00:02:04 +00:00
Alexander Bayandin	64775a0a75	test_runner/performance: fix flush for NeonCompare (#3135 ) Fix performance tests: ``` AttributeError: 'NeonCompare' object has no attribute 'pageserver_http' ```	2022-12-16 17:45:38 +00:00
Heikki Linnakangas	6dec85b19d	Redefine the timeline_gc API to not perform a forced compaction Previously, the /v1/tenant/:tenant_id/timeline/:timeline_id/do_gc API call performed a flush and compaction on the timeline before GC. Change it not to do that, and change all the tests that used that API to perform compaction explicitly. The compaction happens at a slightly different point now. Previously, the code performed the `refresh_gc_info_internal` step first, and only then did compaction on all the timelines. I don't think that was what was originally intended here. Presumably the idea with compaction was to make some old layer files available for GC. But if we're going to flush the current in-memory layer to disk, surely you would want to include the newly-written layer in the compaction too. I guess this didn't make any difference to the tests in practice, but in any case, the tests now perform the flush and compaction before any of the GC steps. Some of the tests might not need the compaction at all, but I didn't try hard to determine which ones might need it. I left it out from a few tests that intentionally tested calling do_gc with an invalid tenant or timeline ID, though.	2022-12-16 11:05:55 +02:00

1 2 3 4 5 ...

310 Commits