rust/neon - neon - Gitea: Git with a cup of tea


mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 04:30:38 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	e4898a6e60	Don't pass InvalidTransactionId to update_next_xid. (#6410 ) update_next_xid() doesn't have any special treatment for the invalid or other special XIDs, so it will treat InvalidTransactionId (0) as a regular XID. If the old nextXid is smaller than 2^31, 0 will look like a very old XID, and nothing happens. But if nextXid is greater than 2^31, 0 will look like a very new XID, and update_next_xid() will incorrectly bump up nextXid.	2024-01-20 18:04:16 +02:00
Christian Schwarz	760a48207d	fixup(#6037 ): page_service hangs up within 10ms if there's no message (#6388 ) From #6037 on, until this patch, if the client opens the connection but doesn't send a `PagestreamFeMessage` within the first 10ms, we'd close the connection because `self.timeline_cancelled()` returns. It returns because `self.shard_timelines` is still empty at that point: it gets filled lazily within the handlers for the incoming messages. Changes ------- The question is: if we can't check for timeline cancellation, what else do we need to be cancellable for? `tenant.cancel` is also a bad choice because the `tenant` (shard) we pick at the top of handle_pagerequests might indeed go away over the course of the connection lifetime, but other shards may still be there. The correct solution, I think, is to be responsive to task_mgr cancellation, because the connection handler runs in a task_mgr task and it is already the current canonical way we shut down a tenant's / timeline's page_service connections (see `Tenant::shutdown` / `Timeline::shutdown`). So, rename the function and make it sensitive to task_mgr cancellation.	2024-01-19 19:16:01 +00:00
Christian Schwarz	e8f773387d	pagebench: avoid noise about `CopyFail` in PS logs (#6392 ) Before this patch, pagebench get-page-latest-lsn would sometimes cause noisy errors in pageserver log about `CopyFail` protocol message. refs https://github.com/neondatabase/neon/issues/6390	2024-01-18 18:50:42 +00:00
Christian Schwarz	00936d19e1	pagebench: use tracing panic hook (#6393 )	2024-01-18 18:39:38 +00:00
Joonas Koivunen	57155ada77	temp: human readable summaries for relative access time compared to absolute (#6384 ) When testing the new eviction order, the problem is that the (currently rare) disk usage based evictions are all unique; this PR adds a human readable summary of what the absolute order would have done and what the relative order does. The assumption is that these log lines will make the few eviction runs in staging more useful. Cc: #5304 for allowing testing in staging	2024-01-18 17:21:08 +02:00
John Spray	bd19290d9f	pageserver: add shard_id to metric labels (#6308 ) ## Problem tenant_id/timeline_id is no longer a full identifier for metrics from a `Tenant` or `Timeline` object. Closes: https://github.com/neondatabase/neon/issues/5953 ## Summary of changes Include `shard_id` label everywhere we have `tenant_id`/`timeline_id` label.	2024-01-18 10:52:18 +00:00
John Spray	b6ec11ad78	control_plane: generalize attachment_service to handle sharding (#6251 ) ## Problem To test sharding, we need something to control it. We could write python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with a large number of shards/tenants, Rust is a better fit for efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high-level operations such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs.
These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-01-17 18:01:08 +00:00
John Spray	4cec95ba13	pageserver: add list API for LocationConf (#6329 ) ## Problem The `/v1/tenant` listing API only applies to attached tenants. For an external service to implement a global reconciliation of its list of shards vs. what's on the pageserver, we need a full view of what's in TenantManager, including secondary tenant locations, and InProgress locations. Dependency of https://github.com/neondatabase/neon/pull/6251 ## Summary of changes - Add methods to Tenant and SecondaryTenant to reconstruct the LocationConf used to create them. - Add `GET /v1/location_config` API	2024-01-17 13:34:51 +00:00
Arpad Müller	ab86060d97	Copy initdb if loading from different timeline ID (#6363 ) Previously, if we: 1. created a new timeline B from the initdb of a different timeline A, then 2. deleted timeline A, the initdb for timeline B would be gone, at least in a world where we are deleting initdbs upon timeline deletion. This world is imminent (#6226). Therefore, if the pageserver is instructed to load the initdb from a different timeline ID, copy it to the newly created timeline's directory in S3. This ensures that we can disaster recover the new timeline as well, regardless of whether the original timeline was deleted or not. Part of https://github.com/neondatabase/neon/issues/5282.	2024-01-17 12:42:42 +01:00
John Spray	bf4e708646	pageserver: eviction for secondary mode tenants (#6225 ) Follows #6123 Closes: https://github.com/neondatabase/neon/issues/5342 The approach here is to avoid using `Layer` from secondary tenants, and instead make the eviction types (e.g. `EvictionCandidate`) have a variant that carries a Layer for attached tenants, and a different variant for secondary tenants. Other changes: - EvictionCandidate no longer carries a `Timeline`: this was only used for providing a witness reference to remote timeline client. - The types for returning eviction candidates are all in disk_usage_eviction_task.rs now, whereas some of them were in timeline.rs before. - The EvictionCandidate type replaces LocalLayerInfoForDiskUsageEviction type, which was basically the same thing.	2024-01-16 10:29:26 +00:00
John Spray	887e94d7da	page_service: more efficient page_service -> shard lookup (#6037 ) ## Problem In #5980 the page service connection handler gets a simple piece of logic for finding the right Timeline: at connection time, it picks an arbitrary Timeline, and then when handling individual page requests it checks if the original timeline is the correct shard, and if not looks one up. This is pretty slow in the case where we have to go look up the other timeline, because we take the big tenants manager lock. ## Summary of changes - Add a `shard_timelines` map of ShardIndex to Timeline on the page service connection handler - When looking up a Timeline for a particular ShardIndex, consult `shard_timelines` to avoid hitting the TenantsManager unless we really need to. - Re-work the CancellationToken handling, because the handler now holds gateguards on multiple timelines, and so must respect cancellation of _any_ timeline it has in its cache, not just the timeline related to the request it is currently servicing. --------- Co-authored-by: Vlad Lazar <vlad@neon.tech>	2024-01-16 09:39:19 +00:00
John Spray	df9e9de541	pageserver: API updates for sharding (#6330 ) The theme of the changes in this PR is that they're enablers for #6251 which are superficial struct/api changes. This is a spinoff from #6251: - Various APIs + clients thereof take TenantShardId rather than TenantId - The creation API gets a ShardParameters member, which may be used to configure shard count and stripe size. This enables the attachment service to present a "virtual pageserver" creation endpoint that creates multiple shards. - The attachment service will use tenant size information to drive shard splitting. Make a version of `TenantHistorySize` that is usable for decoding these API responses. - ComputeSpec includes a shard stripe size.	2024-01-16 09:21:00 +00:00
John Khvatov	2a3cfc9665	Remove PAGE_CACHE_ACQUIRE_PINNED_SLOT_TIME histogram. (#6356 ) Fixes #6343. ## Problem PAGE_CACHE_ACQUIRE_PINNED_SLOT_TIME is used on hot path and it adds noticeable latency to GetPage@LSN. ## Refs https://discordapp.com/channels/1176467419317940276/1195022264115151001/1196370689268125716	2024-01-15 17:19:19 +01:00
Christian Schwarz	0e1ef3713e	fix(pagebench): #6325 broke running without `--runtime` (#6351 ) After PR #6325, when running without --runtime, we wouldn't wait for start_work_barrier, causing the benchmark to not start at all.	2024-01-15 08:54:19 +00:00
Arpad Müller	60ced06586	Fix timeline creation and tenant deletion race (#6310 ) Fixes the race condition between timeline creation and tenant deletion outlined in #6255. Related: #5914, which is a similar race condition about the uninit marker file. Fixes #6255	2024-01-13 09:15:58 +01:00
Christian Schwarz	cd48ea784f	TenantInfo: expose generation number (#6348 ) Generally useful when debugging / troubleshooting. I found this useful when manually duplicating a tenant from a script[^1] where I can't use `neon_fixtures.Pageserver.tenant_attach`'s automatic integration with the neon_local's attachment_service. [^1]: https://github.com/neondatabase/neon/pull/6349	2024-01-12 18:27:11 +01:00
Vlad Lazar	02c6abadf0	pageserver: remove dependency of pagebench on pageserver (#6334 ) To achieve this I had to lift the BlockNumber and key_to_rel_block definitions to pageserver_api (similar to a change in #5980). Closes #6299	2024-01-12 17:11:19 +00:00
John Spray	7af4c676c0	pageserver: only upload initdb from shard 0 (#6331 ) ## Problem When creating a timeline on a sharded tenant, we call into each shard. We don't need to upload the initdb from every shard: only do it on shard zero. ## Summary of changes - Move the initdb upload into a function, and only call it on shard zero.	2024-01-12 15:32:27 +01:00
John Spray	aafe79873c	page_service: handle GetActiveTenantError::Cancelled (#6344 ) ## Problem Occasional test failures with QueryError::Other errors saying "cancelled" that get logged at error severity. ## Summary of changes Avoid casting GetActiveTenantError::Cancelled into QueryError::Other -- it should be QueryError::Shutdown, which is not logged as an error.	2024-01-12 12:43:14 +00:00
Christian Schwarz	eae74383c1	pageserver client: mgmt_api: expose reset API (#6326 ) By-product of some hack work that will be thrown away.	2024-01-12 11:07:16 +00:00
Christian Schwarz	8b657a1481	pagebench: getpage: cancellation & better logging (#6325 ) Needed these while working on https://github.com/neondatabase/neon/issues/5479	2024-01-12 11:53:18 +01:00
Christian Schwarz	915fba146d	pagebench: getpage: optional keyspace cache file (#6324 ) Proved useful when benchmarking 20k tenant setup when validating https://github.com/neondatabase/neon/issues/5479	2024-01-11 17:42:11 +00:00
Vlad Lazar	da7a7c867e	pageserver: do not bump priority of background task for timeline status requests (#6301 ) ## Problem Previously, `GET /v1/tenant/:tenant_id/timeline` and `GET /v1/tenant/:tenant_id/timeline/:timeline_id` would bump the priority of the background task which computes the initial logical size by cancelling the wait on the synchronisation semaphore. However, the request would still return an approximate logical size. It's undesirable to force background work for a status request. ## Summary of changes This PR updates the priority used by the timeline status requests such that they don't do priority boosting by default anymore. An optional query parameter, `force-await-initial-logical-size`, is added for both mentioned endpoints. When set to true, it will skip the concurrency limiting semaphore and wait for the background task to complete before returning the exact logical size. In order to exercise this behaviour in a test I had to add an extra failpoint. If you think it's too intrusive, it can be removed. Also fixed a small bug where the cancellation of a download is reported as an opaque download failure upstream. This caused `test_location_conf_churn` to fail at teardown due to a WARN log line. Closes https://github.com/neondatabase/neon/issues/6168	2024-01-11 15:55:32 +00:00
Christian Schwarz	3ee981889f	compaction: avoid no-op timeline dir fsync (#6311 ) Random find while looking at an idle 20k tenant pageserver where each tenant has 9 tiny L0 layers and compaction produces no new L1s / image layers. The aggregate CPU cost of running this every 20s for 20k tenants is actually substantial, due to the use of `spawn_blocking`.	2024-01-11 10:32:39 +00:00
Christian Schwarz	fc66ba43c4	Revert "revert recent VirtualFile asyncification changes (#5291 )" (#6309 ) This reverts commit `ab1f37e908`. Thereby fixes #5479 Updated Analysis ================ The problem with the original patch was that it, for the first time, exposed the `VirtualFile` code to tokio task concurrency instead of just thread-based concurrency. That caused the VirtualFile file descriptor cache to start thrashing, effectively grinding the system to a halt. Details ------- At the time of the original patch, we had a _lot_ of runnable tasks in the pageserver. The symptom that prompted the revert (now being reverted in this PR) is that our production systems fell into a valley of zero goodput, high CPU, and zero disk IOPS shortly after PS restart. We lay out the root cause for that behavior in this subsection. At the time, there was no concurrency limit on the number of concurrent initial logical size calculations. Initial size calculation was initiated for all timelines within the first 10 minutes as part of consumption metrics collection. On a PS with 20k timelines, we'd thus have 20k runnable tasks. Before the original patch, the `VirtualFile` code never returned `Poll::Pending`. That meant that once we entered it, the calling tokio task would not yield to the tokio executor until we were done performing the VirtualFile operation, i.e., doing a blocking IO system call. The original patch switched the VirtualFile file descriptor cache's synchronization primitives to those from `tokio::sync`. It did not change that we were doing synchronous IO system calls. And the cache had more slots than we have tokio executor threads. So, these primitives never actually needed to return `Poll::Pending`. But, the tokio scheduler makes tokio sync primitives return `Pending` artificially, as a mechanism for the scheduler to get back into control more often ([example](https://docs.rs/tokio/1.35.1/src/tokio/sync/batch_semaphore.rs.html#570)). 
So, the new reality was that VirtualFile calls could now yield to the tokio executor. Tokio would pick one of the other 19999 runnable tasks to run. These tasks were also using VirtualFile. So, we now had a lot more concurrency in that area of the code. The problem with more concurrency was that caches started thrashing, most notably the VirtualFile file descriptor cache: each time a task would be rescheduled, it would want to do its next VirtualFile operation. For that, it would first need to evict another (task's) VirtualFile fd from the cache to make room for its own fd. It would then do one VirtualFile operation before hitting an await point and yielding to the executor again. The executor would run the other 19999 tasks for fairness before circling back to the first task, which would find its fd evicted. The other cache that would theoretically be impacted in a similar way is the pageserver's `PageCache`. However, for initial logical size calculation, it seems much less relevant in experiments, likely because of the random access nature of initial logical size calculation. 
Fixes ===== We fixed the above problems by - raising VirtualFile cache sizes - https://github.com/neondatabase/cloud/issues/8351 - changing code to ensure forward-progress once cache slots have been acquired - https://github.com/neondatabase/neon/pull/5480 - https://github.com/neondatabase/neon/pull/5482 - tbd: https://github.com/neondatabase/neon/issues/6065 - reducing the number of runnable tokio tasks - https://github.com/neondatabase/neon/pull/5578 - https://github.com/neondatabase/neon/pull/6000 - fixing bugs that caused unnecessary concurrency induced by connection handlers - https://github.com/neondatabase/neon/issues/5993 I manually verified that this PR doesn't negatively affect startup performance as follows: create a pageserver in production configuration, with 20k tenants/timelines, 9 tiny L0 layer files each; start it, and observe ``` INFO Startup complete (368.009s since start) elapsed_ms=368009 ``` I further verified in that same setup that, when using `pagebench`'s getpage benchmark at as-fast-as-possible request rate against 5k of the 20k tenants, the achieved throughput is identical. The VirtualFile cache isn't thrashing in that case. Future Work =========== We will still be exposed to the cache thrashing risk from outside factors, e.g., request concurrency is unbounded, and initial size calculation skips the concurrency limiter when we establish a walreceiver connection. Once we start thrashing, we will degrade non-gracefully, i.e., encounter a valley as was seen with the original patch. However, we have sufficient means to deal with that unlikely situation: 1. we have dashboards & metrics to monitor & alert on cache thrashing 2.
we can react by scaling the bottleneck resources (cache size) or by manually shedding load through tenant relocation Potential systematic solutions are future work: * global concurrency limiting * per-tenant rate limiting => #5899 * pageserver-initiated load shedding Related Issues ============== This PR unblocks the introduction of tokio-epoll-uring for asynchronous disk IO ([Epic](#4744)).	2024-01-11 11:29:14 +01:00
Christian Schwarz	4e1b0b84eb	pagebench: fixup after is_rel_block_key changes in #6266 (#6303 ) PR #6266 broke the getpage_latest_lsn benchmark. Before this patch, we'd fail with ``` not implemented: split up range ``` because `r.start = rel size key` and `r.end = rel size key + 1`. The filtering of the key ranges in that loop is a bit ugly, but I measured: * setup with 180k layer files (20k tenants * 9 layers). * total physical size is 463GiB * 5k tenants, the range filtering takes `0.6 seconds` on an i3en.3xlarge. That's a tiny fraction of the overall time it takes for pagebench to get ready to send requests. So, this is good enough for now / there are other bottlenecks that are bigger.	2024-01-09 19:00:37 +01:00
John Spray	f94abbab95	pageserver: clean up a redundant tenant_id attribute (#6280 ) This was a small TODO(sharding) thing in TenantHarness.	2024-01-09 12:10:15 +00:00
John Spray	4b9b4c2c36	pageserver: cleanup redundant create/attach code, fix detach while attaching (#6277 ) ## Problem The code for tenant create and tenant attach was just a special case of what upsert_location does. ## Summary of changes - Use `upsert_location` for create and attach APIs - Clean up error handling in upsert_location so that it can generate appropriate HTTP response codes - Update tests that asserted the old non-idempotent behavior of attach - Rework the `test_ignore_while_attaching` test, and fix tenant shutdown during activation, which this test was supposed to cover, but it was actually just waiting for activation to complete.	2024-01-09 10:37:54 +00:00
Arpad Müller	8186f6b6f9	Drop async_trait usage from three internal traits (#6305 ) This uses the [newly stable](https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html) async trait feature for three internal traits. One requires `Send` bounds to be present so uses `impl Future<...> + Send` instead. Advantages: * less macro usage * no extra boxing Disadvantages: * impl syntax needed for `Send` bounds is a bit more verbose (but only required in one place)	2024-01-09 11:20:08 +01:00
Arpad Müller	d5e3434371	Also allow unnecessary_fallible_conversions lint (#6294 ) This fixes the clippy lint firing on macOS on the conversion which is needed for portability. For some reason, the logic in https://github.com/rust-lang/rust-clippy/pull/11669 to avoid an overlap is not working.	2024-01-09 04:22:36 +00:00
John Spray	b5ed6f22ae	pageserver: clean up a TODO comment (#6282 ) These functions don't need updating for sharding: it's fine for them to remain shard-naive, as they're only used in the context of dumping a layer file. The sharding metadata doesn't live in the layer file, it lives in the index.	2024-01-08 09:19:00 +00:00
John Spray	d1c0232e21	pageserver: use `pub(crate)` in metrics.rs, and clean up unused items (#6275 ) ## Problem Noticed while making other changes that there were `pub` items that were unused. ## Summary of changes - Make everything `pub(crate)` in metrics.rs, apart from items used from `bin/` - Fix the timelines eviction metric: it was never being incremented - Remove an unused ephemeral_bytes counter.	2024-01-08 03:53:15 +00:00
John Spray	3c560d27a8	pageserver: implement secondary-mode downloads (#6123 ) Follows on from #6050 , in which we upload heatmaps. Secondary locations will now poll those heatmaps and download layers mentioned in the heatmap. TODO: - [X] ~Unify/reconcile stats for behind-schedule execution with warn_when_period_overrun (https://github.com/neondatabase/neon/pull/6050#discussion_r1426560695)~ - [x] Give downloads their own concurrency config independent of uploads Deferred optimizations: - https://github.com/neondatabase/neon/issues/6199 - https://github.com/neondatabase/neon/issues/6200 Eviction will be the next PR: - #5342	2024-01-05 12:29:20 +00:00
John Spray	18e9208158	pageserver: improved error handling for shard routing error, timeline not found (#6262 ) ## Problem - When a client requests a key that isn't found in any shard on the node (edge case that only happens if a compute's config is out of date), we should prompt them to reconnect (as this includes a backoff), since they will not be able to complete the request until they eventually get a correct pageserver connection string. - QueryError::Other is used excessively: this contains a type-ambiguous anyhow::Error and is logged very verbosely (including backtrace). ## Summary of changes - Introduce PageStreamError to replace use of anyhow::Error in request handlers for getpage, etc. - Introduce Reconnect and NotFound variants to QueryError - Map the "shard routing error" case to PageStreamError::Reconnect -> QueryError::Reconnect - Update type conversions for LSN timeouts and tenant/timeline not found errors to use PageStreamError::NotFound->QueryError::NotFound	2024-01-04 10:40:03 +00:00
John Spray	c119af8ddd	pageserver: run at least 2 background task threads Otherwise an assertion in CONCURRENT_BACKGROUND_TASKS will trip if you try to run the pageserver on a single core.	2024-01-03 14:22:40 +00:00
John Spray	a2e083ebe0	pageserver: make walredo shard-aware This does not have a functional impact, but enables all the logging in this code to include the shard_id label.	2024-01-03 14:22:40 +00:00
John Spray	73a944205b	pageserver: log details on shard routing error	2024-01-03 14:22:40 +00:00
John Spray	34ebfbdd6f	pageserver: fix handling getpage with multiple shards on one node Previously, we would wait for the LSN to be visible on whichever timeline we happened to load at the start of the connection, then proceed to look up the correct timeline for the key and do the read. If the timeline holding the key was behind the timeline we used for the LSN wait, then we might serve an apparently-successful read result that actually contains data from behind the requested lsn.	2024-01-03 14:22:40 +00:00
John Spray	ef7c9c2ccc	pageserver: fix active tenant lookup hitting secondaries with sharding If there is some secondary shard for a tenant on the same node as an attached shard, the secondary shard could trip up this code and cause page_service to incorrectly get an error instead of finding the attached shard.	2024-01-03 14:22:40 +00:00
John Spray	6c79e12630	pageserver: drop unwanted keys during compaction after split	2024-01-03 14:22:40 +00:00
John Spray	753d97bd77	pageserver: don't delete ancestor shard layers	2024-01-03 14:22:40 +00:00
Cuong Nguyen	fb518aea0d	Add batch ingestion mechanism to avoid high contention (#5886 ) ## Problem For context, this problem was observed in a research project where we try to make neon run in multiple regions and I was asked by @hlinnaka to make this PR. In our project, we use the pageserver in a non-conventional way such that we would send a larger number of requests to the pageserver than normal (imagine postgres without the buffer pool). I measured the time from the moment a WAL record left the safekeeper to when it reached the pageserver ([code](`e593db1f5a/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs (L282-L287)`)) and observed that when the number of get_page_at_lsn requests was high, the WAL receiving time increased significantly (see the left side of the graphs below). Upon further investigation, I found that the delay was caused by this line: `d2ca410919/pageserver/src/tenant/timeline.rs (L2348)`. The `get_layer_for_write` method is called for every value during WAL ingestion and it tries to acquire the layers write lock every time, which results in high contention when the read lock is acquired more frequently. ![Untitled](https://github.com/neondatabase/neon/assets/6244849/85460f4d-ead1-4532-bc64-736d0bfd7f16) ![Untitled2](https://github.com/neondatabase/neon/assets/6244849/84199ab7-5f0e-413b-a42b-f728f2225218) ## Summary of changes It is unnecessary to call `get_layer_for_write` repeatedly for all values in a WAL message since they would end up in the same memory layer anyway, so I created the batched versions of `InMemoryLayer::put_value`, `InMemoryLayer::put_tombstone`, `Timeline::put_value`, and `Timeline::put_tombstone`, which acquire the locks once for a batch of values. Additionally, `DatadirModification` is changed to store multiple versions of uncommitted values, and `WalIngest::ingest_record()` can now ingest records without immediately committing them.
With these new APIs, the new ingestion loop can be changed to commit once every `ingest_batch_size` records. The `ingest_batch_size` variable is exposed as a config. If it is set to 1, we get the same behavior as before this change. I found that setting this value to 100 seems to work the best, and you can see its effect on the right side of the above graphs. --------- Co-authored-by: John Spray <john@neon.tech>	2024-01-03 10:41:58 +00:00
Christian Schwarz	aa9f1d4b69	pagebench get-page: default to latest=true, make configurable via flag (#6252 ) fixes https://github.com/neondatabase/neon/issues/6209	2024-01-02 16:57:29 +00:00
Arseny Sher	dbd36e40dc	Move failpoint support code to utils. To enable them in safekeeper as well.	2024-01-02 10:50:20 +04:00
John Spray	e68ae2888a	pageserver: expedite tenant activation on delete (#6190 ) ## Problem During startup, a tenant delete request might have to retry for many minutes waiting for a tenant to enter Active state. ## Summary of changes - Refactor delete_tenant into TenantManager: this is not a functional change, but will avoid merge conflicts with https://github.com/neondatabase/neon/pull/6105 later - Add 412 responses to the swagger definition of this endpoint. - Use Tenant::wait_to_become_active in `TenantManager::delete_tenant` --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-12-22 10:22:22 +00:00
Arpad Müller	7d6fc3c826	Use pre-generated initdb.tar.zst in test_ingest_real_wal (#6206 ) This implements the TODO mentioned in the test added by #5892.	2023-12-21 14:23:09 +00:00
Christian Schwarz	5385791ca6	add pageserver component-level benchmark (`pagebench`) (#6174 ) This PR adds a component-level benchmarking utility for pageserver. Its name is `pagebench`. The problem solved by `pagebench` is that we want to put Pageserver under high load. This isn't easily achieved with `pgbench` because it needs to go through a compute, which has significant performance overhead compared to accessing Pageserver directly. Further, compute has its own performance optimizations (most importantly: caches). Instead of designing a compute-facing workload that defeats those internal optimizations, `pagebench` simply bypasses them by accessing pageserver directly. Supported benchmarks: * getpage@latest_lsn * basebackup * triggering logical size calculation This code has no automated users yet. A performance regression test for getpage@latest_lsn will be added in a later PR. part of https://github.com/neondatabase/neon/issues/5771	2023-12-21 13:07:23 +01:00
Arpad Müller	48890d206e	Simplify inject_index_part test function (#6207 ) Instead of manually constructing the directory's path, we can just use the `parent()` function. This is a drive-by improvement from #6206	2023-12-21 12:52:38 +01:00
Joonas Koivunen	48f156b8a2	feat: relative last activity based eviction (#6136 ) Adds a new disk usage based eviction option, EvictionOrder, which selects whether to use the current `AbsoluteAccessed` or this new proposed but not yet tested `RelativeAccessed`. Additionally a fudge factor was noticed while implementing this, which might help sparing smaller tenants at the expense of targeting larger tenants. Cc: #5304 Co-authored-by: Arpad Müller <arpad@neon.tech>	2023-12-20 18:44:19 +00:00
John Spray	f260f1565e	pageserver: fixes + test updates for sharding (#6186 ) This is a precursor to: - https://github.com/neondatabase/neon/pull/6185 While that PR contains big changes to neon_local and attachment_service, this PR contains a few unrelated standalone changes generated while working on that branch: - Fix restarting a pageserver when it contains multiple shards for the same tenant - When using location_config api to attach a tenant, create its timelines dir - Update test paths where generations were previously optional to make them always-on: this avoids tests having to spuriously assert that attachment_service is not None in order to make the linter happy. - Add a TenantShardId python implementation for subsequent use in test helpers that will be made shard-aware - Teach scrubber to read across shards when checking for layer existence: this is a refactor to track the list of existent layers at tenant-level rather than locally to each timeline. This is a precursor to testing shard splitting.	2023-12-20 12:26:20 +00:00
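Commit e4898a6e60 above hinges on how PostgreSQL compares 32-bit XIDs: modulo 2^32, so whether XID 0 looks "old" or "new" depends on where nextXid currently sits. The following is a minimal sketch of that effect, assuming a simplified, hypothetical `update_next_xid_unguarded` that advances nextXid whenever the incoming XID does not precede it. It is not the pageserver's actual code; the guarded variant only illustrates the commit's idea of never passing InvalidTransactionId (or the other special XIDs below `FirstNormalTransactionId`) into the update.

```rust
/// PostgreSQL's special transaction IDs: 0 = invalid, 1 = bootstrap, 2 = frozen.
const INVALID_TRANSACTION_ID: u32 = 0;
const FIRST_NORMAL_TRANSACTION_ID: u32 = 3;

/// Circular comparison: does `a` logically precede `b` in modulo-2^32 XID space?
/// Reinterpreting the wrapped difference as i32 makes anything up to 2^31 "behind"
/// `b` count as older, and anything up to 2^31 "ahead" count as newer.
fn xid_precedes(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}

/// Hypothetical, simplified update with no special-XID handling:
/// advance `next_xid` past `xid` unless `xid` already precedes it.
fn update_next_xid_unguarded(next_xid: &mut u32, xid: u32) {
    if !xid_precedes(xid, *next_xid) {
        *next_xid = xid.wrapping_add(1);
    }
}

/// Hypothetical caller-side guard: never feed special XIDs into the update.
fn update_next_xid_guarded(next_xid: &mut u32, xid: u32) {
    if xid < FIRST_NORMAL_TRANSACTION_ID {
        return;
    }
    update_next_xid_unguarded(next_xid, xid);
}
```

With nextXid = 100, XID 0 precedes it and nothing changes. With nextXid = 3_000_000_000 (above 2^31), XID 0 compares as newer, so the unguarded version moves nextXid to 1, while the guarded version leaves it untouched.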

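Commit 8186f6b6f9 above replaces `async_trait` with native async functions in traits (stable since Rust 1.75). The following hedged sketch shows the two forms it mentions: a plain `async fn` in a trait, and a method declared as returning `impl Future<Output = ...> + Send`, which is how a trait can promise generic callers that the future is `Send`. The trait and type names are hypothetical, and the tiny `block_on` helper exists only so the example runs without tokio.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical trait using a native `async fn`: no macro, no boxed future.
trait LayerCounter {
    async fn layer_count(&self) -> usize;
}

/// Hypothetical trait whose returned future must be `Send`; a bare `async fn`
/// in a trait gives generic callers no such guarantee, so the bound is spelled out.
trait SendableFetcher {
    fn fetch(&self, key: u32) -> impl Future<Output = u32> + Send;
}

struct Timeline {
    layers: usize,
}

impl LayerCounter for Timeline {
    async fn layer_count(&self) -> usize {
        self.layers
    }
}

impl SendableFetcher for Timeline {
    fn fetch(&self, key: u32) -> impl Future<Output = u32> + Send {
        // The async block captures only plain integers, so it is Send.
        let layers = self.layers as u32;
        async move { key * 10 + layers }
    }
}

/// Compile-time check that a value is Send (e.g. before spawning it on a multi-threaded runtime).
fn require_send<T: Send>(value: T) -> T {
    value
}

/// Minimal executor for futures that never actually wait: polls until Ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

For example, `block_on(Timeline { layers: 3 }.fetch(4))` yields 43. The trade-off matches the commit message: the `Send` form is more verbose, but neither form needs a macro or a boxed future.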

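Commit fb518aea0d above reduces lock contention by grouping values and committing once every `ingest_batch_size` records instead of once per value. The following is a simplified, hypothetical model, not the pageserver's code: a `Timeline` holding its layers behind a `std::sync::RwLock`, a per-value `put_value`, a batched `put_values`, and an ingestion loop that flushes once per batch. It assumes one value per record (real WAL records can carry several) and counts how often the write lock is taken.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for an in-memory layer: a list of (key, value) pairs.
#[derive(Default)]
struct Layers {
    values: Vec<(u64, Vec<u8>)>,
}

#[derive(Default)]
struct Timeline {
    layers: RwLock<Layers>,
    write_lock_acquisitions: AtomicUsize,
}

impl Timeline {
    /// Per-value path: every call takes the write lock once.
    fn put_value(&self, key: u64, value: Vec<u8>) {
        self.put_values(vec![(key, value)]);
    }

    /// Batched path: takes the write lock once for a whole batch of values.
    fn put_values(&self, batch: Vec<(u64, Vec<u8>)>) {
        let mut layers = self.layers.write().unwrap();
        self.write_lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        layers.values.extend(batch);
    }
}

/// Buffer records and commit them once every `ingest_batch_size` records.
/// With `ingest_batch_size == 1` this degenerates to one commit (and one lock) per record.
fn ingest<I>(timeline: &Timeline, records: I, ingest_batch_size: usize)
where
    I: IntoIterator<Item = (u64, Vec<u8>)>,
{
    let batch_size = ingest_batch_size.max(1);
    let mut pending = Vec::with_capacity(batch_size);
    for record in records {
        pending.push(record);
        if pending.len() == batch_size {
            timeline.put_values(std::mem::take(&mut pending));
        }
    }
    // Flush the final, partially filled batch.
    if !pending.is_empty() {
        timeline.put_values(pending);
    }
}
```

In this model, ingesting 250 records with a batch size of 100 takes the write lock 3 times instead of 250, which is the effect the commit measures on real WAL traffic.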
1791 Commits