rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 05:30:38 +00:00

Author	SHA1	Message	Date
Arpad Müller	ee7bbdda0e	Create new metric for directory counts (#6736 ) There is O(n^2) issues due to how we store these directories (#6626), so it's good to keep an eye on them and ensure the numbers stay low. The new per-timeline metric `pageserver_directory_entries_count` isn't perfect, namely we don't calculate it every time we attach the timeline, but only if there is an actual change. Also, it is a collective metric over multiple scalars. Lastly, we only emit the metric if it is above a certain threshold. However, the metric still give a feel for the general size of the timeline. We care less for small values as the metric is mainly there to detect and track tenants with large directory counts. We also expose the directory counts in `TimelineInfo` so that one can get the detailed size distribution directly via the pageserver's API. Related: #6642 , https://github.com/neondatabase/cloud/issues/10273	2024-02-14 02:12:00 +01:00
Christian Schwarz	7fa732c96c	refactor(virtual_file): take owned buffer in VirtualFile::write_all (#6664 ) Building atop #6660 , this PR converts VirtualFile::write_all to owned buffers. Part of https://github.com/neondatabase/neon/issues/6663	2024-02-13 18:46:25 +01:00
Arthur Petukhovsky	4be2223a4c	Discrete event simulation for safekeepers (#5804 ) This PR contains the first version of a [FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc) simulation testing for safekeeper and walproposer. ### desim This is a core "framework" for running determenistic simulation. It operates on threads, allowing to test syncronous code (like walproposer). `libs/desim/src/executor.rs` contains implementation of a determenistic thread execution. This is achieved by blocking all threads, and each time allowing only a single thread to make an execution step. All executor's threads are blocked using `yield_me(after_ms)` function. This function is called when a thread wants to sleep or wait for an external notification (like blocking on a channel until it has a ready message). `libs/desim/src/chan.rs` contains implementation of a channel (basic sync primitive). It has unlimited capacity and any thread can push or read messages to/from it. `libs/desim/src/network.rs` has a very naive implementation of a network (only reliable TCP-like connections are supported for now), that can have arbitrary delays for each package and failure injections for breaking connections with some probability. `libs/desim/src/world.rs` ties everything together, to have a concept of virtual nodes that can have network connections between them. ### walproposer_sim Has everything to run walproposer and safekeepers in a simulation. `safekeeper.rs` reimplements all necesary stuff from `receive_wal.rs`, `send_wal.rs` and `timelines_global_map.rs`. `walproposer_api.rs` implements all walproposer callback to use simulation library. `simulation.rs` defines a schedule – a set of events like `restart <sk>` or `write_wal` that should happen at time `<ts>`. It also has code to spawn walproposer/safekeeper threads and provide config to them. ### tests `simple_test.rs` has tests that just start walproposer and 3 safekeepers together in a simulation, and tests that they are not crashing right away. `misc_test.rs` has tests checking more advanced simulation cases, like crashing or restarting threads, testing memory deallocation, etc. `random_test.rs` is the main test, it checks thousands of random seeds (schedules) for correctness. It roughly corresponds to running a real python integration test in an environment with very unstable network and cpu, but in a determenistic way (each seed results in the same execution log) and much much faster. Closes #547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-02-12 20:29:57 +00:00
Joonas Koivunen	7ea593db22	refactor(LayerManager): resident layers query (#6634 ) Refactor out layer accesses so that we can have easy access to resident layers, which are needed for number of cases instead of layers for eviction. Simplifies the heatmap building by only using Layers, not RemoteTimelineClient. Cc: #5331	2024-02-12 17:13:35 +02:00
Christian Schwarz	242dd8398c	refactor(blob_io): use owned buffers (#6660 ) This PR refactors the `blob_io` code away from using slices towards taking owned buffers and return them after use. Using owned buffers will eventually allow us to use io_uring for writes. part of https://github.com/neondatabase/neon/issues/6663 Depends on https://github.com/neondatabase/tokio-epoll-uring/pull/43 The high level scheme is as follows: - call writing functions with the `BoundedBuf` - return the underlying `BoundedBuf::Buf` for potential reuse in the caller NB: Invoking `BoundedBuf::slice(..)` will return a slice that _includes the uninitialized portion of `BoundedBuf`_. I.e., the portion between `bytes_init()` and `bytes_total()`. It's a safe API that actually permits access to uninitialized memory. Not great. Another wrinkle is that it panics if the range has length 0. However, I don't want to switch away from the `BoundedBuf` API, since it's what tokio-uring uses. We can always weed this out later by replacing `BoundedBuf` with our own type. Created an issue so we don't forget: https://github.com/neondatabase/tokio-epoll-uring/issues/46	2024-02-12 15:58:55 +01:00
Joonas Koivunen	c77411e903	cleanup around `attach` (#6621 ) The smaller changes I found while looking around #6584. - rustfmt was not able to format handle_timeline_create - fix Generation::get_suffix always allocating - Generation was missing a `#[track_caller]` for panicky method - attach has a lot of issues, but even with this PR it cannot be formatted by rustfmt - moved the `preload` span to be on top of `attach` -- it is awaited inline - make disconnected panic! or unreachable! into expect, expect_err	2024-02-12 14:52:20 +02:00
Heikki Linnakangas	0fd3cd27cb	Tighten up the check for garbage after end-of-tar. Turn the warning into an error, if there is garbage after the end of imported tar file. However, it's normal for 'tar' to append extra empty blocks to the end, so tolerate those without warnings or errors.	2024-02-10 12:05:02 +02:00
Christian Schwarz	5779c7908a	revert two recent `heavier_once_cell` changes (#6704 ) This PR reverts - https://github.com/neondatabase/neon/pull/6589 - https://github.com/neondatabase/neon/pull/6652 because there's a performance regression that's particularly visible at high layer counts. Most likely it's because the switch to RwLock inflates the ``` inner: heavier_once_cell::OnceCell<ResidentOrWantedEvicted>, ``` size from 48 to 88 bytes, which, by itself is almost a doubling of the cache footprint, and probably the fact that it's now larger than a cache line also doesn't help. See this chat on the Neon discord for more context: https://discord.com/channels/1176467419317940276/1204714372295958548/1205541184634617906 I'm reverting 6652 as well because it might also have perf implications, and we're getting close to the next release. We should re-do its changes after the next release, though. cc @koivunej cc @ivaxer	2024-02-09 22:22:40 +00:00
Arseny Sher	1bb9abebf2	Remove WAL segments from s3 in batches. Do list-delete operations in batches instead of doing full list first, to ensure deletion makes progress even if there are a lot of files to remove. To this end, add max_keys limit to remote storage list_files.	2024-02-09 22:11:53 +04:00
Joonas Koivunen	eb919cab88	prepare to move timeouts and cancellation handling to remote_storage (#6696 ) This PR is preliminary cleanups and refactoring around `remote_storage` for next PR which will move the timeouts and cancellation into `remote_storage`. Summary: - smaller drive-by fixes - code simplification - refactor common parts like `DownloadError::is_permanent` - align error types with `RemoteStorage::list_*` to use more `download_retry` helper Cc: #6096	2024-02-09 12:52:58 +00:00
Joonas Koivunen	c09993396e	fix: secondary tenant relative order eviction (#6491 ) Calculate the `relative_last_activity` using the total evicted and resident layers similar to what we originally planned. Cc: #5331	2024-02-09 00:37:57 +02:00
John Spray	e8d2843df6	storage controller: improved handling of node availability on restart (#6658 ) - Automatically set a node's availability to Active if it is responsive in startup_reconcile - Impose a 5s timeout of HTTP request to list location conf, so that an unresponsive node can't hang it for minutes - Do several retries if the request fails with a retryable error, to be tolerant of concurrent pageserver & storage controller restarts - Add a readiness hook for use with k8s so that we can tell when the startup reconciliaton is done and the service is fully ready to do work. - Add /metrics to the list of un-authenticated endpoints (this is unrelated but we're touching the line in this PR already, and it fixes auth error spam in deployed container.) - A test for the above. Closes: #6670	2024-02-08 18:00:53 +00:00
John Spray	af91a28936	pageserver: shard splitting (#6379 ) ## Problem One doesn't know at tenant creation time how large the tenant will grow. We need to be able to dynamically adjust the shard count at runtime. This is implemented as "splitting" of shards into smaller child shards, which cover a subset of the keyspace that the parent covered. Refer to RFC: https://github.com/neondatabase/neon/pull/6358 Part of epic: #6278 ## Summary of changes This PR implements the happy path (does not cleanly recover from a crash mid-split, although won't lose any data), without any optimizations (e.g. child shards re-download their own copies of layers that the parent shard already had on local disk) - Add `/v1/tenant/:tenant_shard_id/shard_split` API to pageserver: this copies the shard's index to the child shards' paths, instantiates child `Tenant` object, and tears down parent `Tenant` object. - Add `splitting` column to `tenant_shards` table. This is written into an existing migration because we haven't deployed yet, so don't need to cleanly upgrade. - Add `/control/v1/tenant/:tenant_id/shard_split` API to attachment_service, - Add `test_sharding_split_smoke` test. This covers the happy path: future PRs will add tests that exercise failure cases.	2024-02-08 15:35:13 +00:00
Christian Schwarz	c52495774d	tokio-epoll-uring: expose its metrics in pageserver's `/metrics` (#6672 ) context: https://github.com/neondatabase/neon/issues/6667	2024-02-07 23:58:54 +00:00
Christian Schwarz	c561ad4e2e	feat: expose locked memory in pageserver `/metrics` (#6669 ) context: https://github.com/neondatabase/neon/issues/6667	2024-02-07 19:39:52 +00:00
Christian Schwarz	51f9385b1b	live-reconfigurable virtual_file::IoEngine (#6552 ) This PR adds an API to live-reconfigure the VirtualFile io engine. It also adds a flag to `pagebench get-page-latest-lsn`, which is where I found this functionality to be useful: it helps compare the io engines in a benchmark without re-compiling a release build, which took ~50s on the i3en.3xlarge where I was doing the benchmark. Switching the IO engine is completely safe at runtime.	2024-02-07 17:47:55 +00:00
Konstantin Knizhnik	f3d7d23805	Some small WAL records can write a lot of data to KV storage, so perform checkpoint check more frequently (#6639 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1707149618314539?thread_ts=1707081520.140049&cid=C04DGM6SMTM ## Summary of changes Perform checkpoint check after processing `ingest_batch_size` (default 100) WAL records. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-07 08:47:19 +02:00
Christian Schwarz	d7b29aace7	refactor(walredo): don't create WalRedoManager for broken tenants (#6597 ) When we'll later introduce a global pool of pre-spawned walredo processes (https://github.com/neondatabase/neon/issues/6581), this refactoring avoids plumbing through the reference to the pool to all the places where we create a broken tenant. Builds atop the refactoring in #6583	2024-02-06 16:20:02 +01:00
Christian Schwarz	53a3ed0a7e	debug_assert presence of `shard_id` tracing field (#6572 ) also: fixes https://github.com/neondatabase/neon/issues/6638	2024-02-06 14:43:33 +00:00
John Spray	6297843317	tests: flakiness fixes in pageserver tests (#6632 ) Fix several test flakes: - test_sharding_service_smoke had log failures on "Dropped LSN updates" - test_emergency_mode had log failures on a deletion queue shutdown check, where the check was incorrect because it was expecting channel receiver to stay alive after cancellation token was fired. - test_secondary_mode_eviction had racing heatmap uploads because the test was using a live migration hook to set up locations, where that migration was itself uploading heatmaps and generally making the situation more complex than it needed to be. These are the failure modes that I saw when spot checking the last few failures of each test. This will mostly/completely address #6511, but I'll leave that ticket open for a couple days and then check if either of the tests named in that ticket are flaky. Related #6511	2024-02-06 12:49:41 +00:00
Christian Schwarz	0de46fd6f2	heavier_once_cell: switch to tokio::sync::RwLock (#6589 ) Using the RwLock reduces contention on the hot path. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-06 14:04:15 +02:00
Joonas Koivunen	53743991de	uploader: avoid cloning vecs just to get Bytes (#6645 ) Fix cloning the serialized heatmap on every attempt by just turning it into `bytes::Bytes` before clone so it will be a refcounted instead of refcounting a vec clone later on. Also fixes one cancellation token cloning I had missed in #6618. Cc: #6096	2024-02-06 11:34:13 +00:00
Christian Schwarz	edcde05c1c	refactor(walredo): split up the massive `walredo.rs` (#6583 ) Part of https://github.com/neondatabase/neon/issues/6581	2024-02-06 09:44:49 +00:00
Christian Schwarz	e196d974cc	pagebench: actually implement `--num_clients` (#6640 ) Will need this to validate per-tenant throttling in https://github.com/neondatabase/neon/issues/5899	2024-02-06 10:34:16 +01:00
Joonas Koivunen	947165788d	refactor: needless cancellation token cloning (#6618 ) The solution we ended up for `backoff::retry` requires always cloning of cancellation tokens even though there is just `.await`. Fix that, and also turn the return type into `Option<Result<T, E>>` avoiding the need for the `E::cancelled()` fn passed in. Cc: #6096	2024-02-06 09:39:06 +02:00
Joonas Koivunen	5e8deca268	metrics: remove broken tenants (#6586 ) Before tenant migration it made sense to leak broken tenants in the metrics until restart. Nowdays it makes less sense because on cancellations we set the tenant broken. The set metric still allows filterable alerting. Fixes: #6507	2024-02-05 14:49:35 +02:00
Joonas Koivunen	db89b13aaa	fix: use the shared constant download buffer size (#6620 ) Noticed that we had forgotten to use `remote_timeline_client.rs::BUFFER_SIZE` in one instance.	2024-02-05 13:10:08 +01:00
Arpad Müller	56cf360439	Don't preserve temp files on creation errors of delta layers (#6612 ) There is currently no cleanup done after a delta layer creation error, so delta layers can accumulate. The problem gets worse as the operation gets retried and delta layers accumulate on the disk. Therefore, delete them from disk (if something has been written to disk).	2024-02-05 09:53:37 +00:00
Joonas Koivunen	70f646ffe2	More logging fixes (#6584 ) I was on-call this week, these would had made me understand more/faster of the system: - move stray attaching start logging inside the span it starts, add generation - log ancestor timeline_id or bootstrapping in the beginning of timeline creation	2024-02-05 09:34:03 +02:00
Arpad Müller	aac8eb2c36	Minor logging improvements (#6593 ) * log when `lsn_by_timestamp` finished together with its result * add back logging of the layer name as suggested in https://github.com/neondatabase/neon/pull/6549#discussion_r1475756808	2024-02-03 02:16:20 +01:00
John Spray	2e5eab69c6	tests: remove test_gc_cutoff (#6587 ) This test became flaky when postgres retry handling was fixed to use backoff delays -- each iteration in this test's loop was taking much longer because pgbench doesn't fail until postgres has given up on retrying to the pageserver. We are just removing it, because the condition it tests is no longer risky: we reload all metadata from remote storage on restart, so crashing directly between making local changes and doing remote uploads isn't interesting any more. Closes: https://github.com/neondatabase/neon/issues/2856 Closes: https://github.com/neondatabase/neon/issues/5329	2024-02-02 18:20:18 +00:00
John Spray	46fb1a90ce	pageserver: avoid calculating/sending logical sizes on shard !=0 (#6567 ) ## Problem Sharded tenants only maintain accurate relation sizes on shard 0. Therefore logical size can only be calculated on shard 0. Fortunately it is also only _needed_ on shard 0, to provide Safekeeper feedback and to send consumption metrics. Closes: #6307 ## Summary of changes - Send 0 for logical size to safekeepers on shards !=0 - Skip logical size warmup task on shards !=0 - Skip imitate_layer_accesses on shards !=0	2024-02-02 15:52:03 +00:00
John Spray	56171cbe8c	pageserver: more permissive activation timeout when testing (#6564 ) ## Problem The 5 second activation timeout is appropriate for production environments, where we want to give a prompt response to the cloud control plane, and if we fail it will retry the call. In tests however, we don't want every call to e.g. timeline create to have to come with a retry wrapper. This issue has always been there, but it is more apparent in sharding tests that concurrently attach several tenant shards. Closes: https://github.com/neondatabase/neon/issues/6563 ## Summary of changes When `testing` feature is enabled, make `ACTIVE_TENANT_TIMEOUT` 30 seconds instead of 5 seconds.	2024-02-02 15:14:42 +01:00
Arpad Müller	48b05b7c50	Add a time_travel_remote_storage http endpoint (#6533 ) Adds an endpoint to the pageserver to S3-recover an entire tenant to a specific given timestamp. Required input parameters: * `travel_to`: the target timestamp to recover the S3 state to * `done_if_after`: a timestamp that marks the beginning of the recovery process. retries of the query should keep this value constant. it must be after `travel_to`, and also after any changes we want to revert, and must represent a point in time before the endpoint is being called, all of these time points in terms of the time source used by S3. these criteria need to hold even in the face of clock differences, so I recommend waiting a specific amount of time, then taking `done_if_after`, then waiting some amount of time again, and only then issuing the request. Also important to note: the timestamps in S3 work at second accuracy, so one needs to add generous waits before and after for the process to work smoothly (at least 2-3 seconds). We ignore the added test for the mocked S3 for now due to a limitation in moto: https://github.com/getmoto/moto/issues/7300 . Part of https://github.com/neondatabase/cloud/issues/8233	2024-02-02 14:52:12 +01:00
John Spray	24e916d37f	pageserver: fix a syntax error in swagger (#6566 ) A description was written as a follow-on to a section line, rather than in the proper `description:` part. This caused swagger parsers to rightly reject it.	2024-02-02 10:35:09 +00:00
Heikki Linnakangas	350865392c	Print checkpoint key contents with "pagectl print-layer-file" (#6541 ) This was very useful in debugging the bugs fixed in #6410 and #6502. There's a lot more we could do. This only adds the printing to delta layers, not image layers, for example, and it might be useful to print details of more record types. But this is a good start.	2024-02-02 01:35:31 +02:00
Christian Schwarz	1be5e564ce	feat(walredo): use posix_spawn by moving close_fds() work to walredo C code (#6574 ) The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually using `gdb` that we're now forking out the walredo process using `posix_spawn`. refs https://github.com/neondatabase/neon/issues/6565	2024-02-01 22:38:34 +01:00
Christian Schwarz	7a70ef991f	feat(walredo): various observability improvements (#6573 ) - log when we start walredo process - include tenant shard id in walredo argv - dump some basic walredo state in tenant details api - more suitable walredo process launch histogram buckets - avoid duplicate tracing labels in walredo launch spans	2024-02-01 21:59:40 +01:00
Vlad Lazar	221531c9db	pageserver: lift ancestor timeline logic from read path (#6543 ) When the read path needs to follow a key into the ancestor timeline, it needs to wait for said ancestor to become active and aware of it's branching lsn. The logic is lifted into a separate function with it's own new error type. This is done because the vectored read path needs the same logic. It's also the reason for the newly introduced error type. When we'll switch the read path to proxy into `get_vectored`, we can remove the duplicated variants from `PageReconstructError`.	2024-02-01 10:35:18 +00:00
Christian Schwarz	4c173456dc	pagebench: fix percentiles reporting (#6547 ) Before this patch, pagebench was always showing the same value. refs https://github.com/neondatabase/neon/issues/6509	2024-01-31 23:29:48 +00:00
Christian Schwarz	e82625b77d	refactor(pageserver main): signal handling (#6554 ) This refactoring makes it easier to experimentally replace BACKGROUND_RUNTIME with a single-threaded runtime. Found this useful [during benchmarking](https://github.com/neondatabase/neon/pull/6555).	2024-01-31 23:25:57 +00:00
Joonas Koivunen	3d5fab127a	rewrite Gate impl for better observability (#6542 ) changes: - two messages instead of message every second when gate was closing - replace the gate name string by using a pointer - slow GateGuards are likely to log who they were (see example) example found in regress tests: <https://github.com/neondatabase/neon/pull/6542#issuecomment-1919009256>	2024-01-31 22:15:58 +00:00
Joonas Koivunen	66719d7eaf	logging: fix span usage (#6549 ) Fixes some duplication due to extra or misconfigured `#[instrument]`, while filling in the `timeline_id` to delete timeline flow calls.	2024-01-31 20:52:00 +00:00
Konstantin Knizhnik	9a9d9beaee	Download SLRU segments on demand (#6151 ) ## Problem See https://github.com/neondatabase/cloud/issues/8673 ## Summary of changes Download missed SLRU segments from page server ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-31 21:39:18 +02:00
Arpad Müller	47380be12d	Remove version param from get_lsn_by_timestamp (#6551 ) This removes the last remnants of the version param added by #5608 , concluding the transition plan laid out in https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911 . It follows PR https://github.com/neondatabase/cloud/pull/9202, which we now assume has been deployed to all environments. Full history: * https://github.com/neondatabase/neon/pull/5608 * https://github.com/neondatabase/cloud/pull/7553 * https://github.com/neondatabase/neon/pull/6178 * https://github.com/neondatabase/cloud/pull/9202	2024-01-31 15:30:19 +01:00
John Spray	4010adf653	control_plane/attachment_service: complete APIs (#6394 ) Depends on: https://github.com/neondatabase/neon/pull/6468 ## Problem The sharding service will be used as a "virtual pageserver" by the control plane -- so it needs the set of pageserver APIs that the control plane uses, and to present them under identical URLs, including prefix (/v1). ## Summary of changes - Add missing APIs: - Tenant deletion - Timeline deletion - Node list (used in test now, later in tools) - `/location_config` API (for migrating tenants into the sharding service) - Rework attachment service URLs: - `/v1` prefix is used for pageserver-compatible APIs - `/upcall/v1` prefix is used for APIs that are called by the pageserver (re-attach and validate) - `/debug/v1` prefix is used for endpoints that are for testing - `/control/v1` prefix is used for new sharding service APIs that do not mimic a pageserver API, such as registering and configuring nodes. - Add test_sharding_service. The sharding service already had some collateral coverage from its use in general tests, but this is the first dedicated testing for it.	2024-01-31 12:23:06 +00:00
Christian Schwarz	79137a089f	fix(#6366 ): pageserver: incorrect log level for Tenant not found during basebackup (#6400 ) Before this patch, when requesting basebackup for a not-found tenant or timeline, we'd emit an ERROR-level log entry with a huge stack trace. See #6366 "Details" section for an example With this patch, we log at INFO level and only a single line. Example: ``` 2024-01-19T14:16:11.479800Z INFO page_service_conn_main{peer_addr=127.0.0.1:43448}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4b 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Tenant d69a536d529a68fcf85bc070030cdf4b not found 2024-01-19T14:19:35.807819Z INFO page_service_conn_main{peer_addr=127.0.0.1:48862}: query handler for 'basebackup d69a536d529a68fcf85bc070030cdf4a 035484e9c28d8d0138a492caadd03ffd 0/2204340 --gzip' entity not found: Timeline d69a536d529a68fcf85bc070030cdf4a/035484e9c28d8d0138a492caadd03ffd was not found ``` fixes https://github.com/neondatabase/neon/issues/6366 Changes ------- - Change `handle_basebackup_request` to return a `QueryError` - The new `impl From<WaitLsnError> for QueryError` is needed so the `?` at `wait_lsn()` call in `handle_basebackup_request` works again. It's duplicating `impl From<WaitLsnError> for PageStreamError`. - Remove hard-to-spot conversion of `handle_basebackup_request` return value to anyhow::Result (the place where I replaced `anyhow::Ok` with `Result::<(), QueryError>::Ok(())` - Add forgotten distinguished handling for "Tenant not found" case in `impl From<GetActiveTenantError> for QueryError` This was not at all pleasant, and I find it very hard to follow the various error conversions. It took me a while to spot the hard-to-spot `anyhow::Ok` thing above. It would have been caught by the compiler if we weren't auto-converting `anyhow::Error` into `QueryError::Other`. We should move away from that, in my opinion, instead forcing each `.context()` site to become `.context().map_err(QueryError::Other)`. But that's for a future PR.	2024-01-30 13:10:48 +00:00
Joonas Koivunen	e3cb715e8a	fix: capture initdb stderr, discard others (#6524 ) When using spawn + wait_with_output instead of std::process::Command::output or tokio::process::Command::output we must configure the redirection. Fixes: #6523 by discarding the stdout completely, we only care about stderr if any.	2024-01-30 14:07:58 +01:00
Vlad Lazar	0c7b89235c	pageserver: add range layer map search implementation (#6469 ) ## Problem There's no efficient way of querying the layer map for a range. ## Summary of changes Introduce a range query for the layer map (`LayerMap::range_search`). There's two broad steps to it: 1. Find all coverage changes for layers that intersect the queried range (see `LayerCoverage::range_overlaps`). The slightly tricky part is dealing with the start of the range. We can either be aligned with a layer or not and we need to treat these cases differently. 2. Iterate over the coverage changes and collect the result. For this we use a two pointer approach: the trailing pointer tracks the start of the current range (current location in the key space) and the forward pointer tracks the next coverage change. Plugging the range search into the read path is deferred to a future PR. ## Performance I adapted the layer map benchmarks on a local branch. Range searches are between 2x and 2.5x slower than point searches. That's in line with what I expected since we query thelayer map twice. Since `Timeline::get` will proxy to `Timeline::get_vectored` we can special case the one element layer map range search at that point.	2024-01-29 09:47:12 +00:00
Joonas Koivunen	1e9a50bca8	disk_usage_eviction_task: cleanup summaries (#6490 ) This is the "partial revert" of #6384. The summaries turned out to be expensive due to naive vec usage, but also inconclusive because of the additional context required. In addition to removing summary traces, small refactoring is done.	2024-01-29 10:38:40 +02:00

1 2 3 4 5 ...

1860 Commits