rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 04:30:38 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	57155ada77	temp: human readable summaries for relative access time compared to absolute (#6384) When testing the new eviction order, the problem is that disk usage based evictions are currently rare and each one is unique; this PR adds a human readable summary of what the absolute order would have done and what the relative order does. The assumption is that this logging will make the few eviction runs in staging more useful. Cc: #5304 for allowing testing in the staging	2024-01-18 17:21:08 +02:00
Konstantin Knizhnik	02b916d3c9	Use [NEON_SMGR] tag for all messages in neon extension (#6313) ## Problem Use [NEON_SMGR] for all log messages produced by neon extension. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-18 17:08:34 +02:00
Anastasia Lubennikova	e6e013b3b7	Fix pgbouncer settings update: - Start pgbouncer in VM from postgres user, to allow connection to pgbouncer admin console. - Remove unused compute_ctl options --pgbouncer-connstr and --pgbouncer-ini-path. - Fix and cleanup code of connection to pgbouncer, add retries because pgbouncer may not be instantly ready when compute_ctl starts.	2024-01-18 11:27:12 +00:00
John Spray	bd19290d9f	pageserver: add shard_id to metric labels (#6308 ) ## Problem tenant_id/timeline_id is no longer a full identifier for metrics from a `Tenant` or `Timeline` object. Closes: https://github.com/neondatabase/neon/issues/5953 ## Summary of changes Include `shard_id` label everywhere we have `tenant_id`/`timeline_id` label.	2024-01-18 10:52:18 +00:00
Joonas Koivunen	a584e300d1	test: figure out the relative eviction order assertions (#6375 ) I just failed to see this earlier on #6136. layer counts are used as an abstraction, and each of the two tenants lose proportionally about the same amount of layers. sadly there is no difference in between `relative_spare` and `relative_equal` as both of these end up evicting the exact same amount of layers, but I'll try to add later another test for those. Cc: #5304	2024-01-18 12:39:45 +02:00
Joonas Koivunen	e247ddbddc	build: update h2 (#6383 ) Notes: https://github.com/hyperium/h2/releases/tag/v0.3.24 Related: https://rustsec.org/advisories/RUSTSEC-2024-0003	2024-01-18 09:54:15 +00:00
Konstantin Knizhnik	0dc4c9b0b8	Relsize hash lru eviction (#6353) ## Problem Currently the relation hash size is limited by the "neon.relsize_hash_size" GUC, with a default value of 64k. 64k relations is not a small number, but creating 376 databases is enough to exhaust it. ## Summary of changes Use LRU replacement algorithm to prevent hash overflow --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-17 20:34:30 +02:00
John Spray	b6ec11ad78	control_plane: generalize attachment_service to handle sharding (#6251) ## Problem To test sharding, we need something to control it. We could write Python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with a large number of shards/tenants, Rust is a better fit for efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high-level operations such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs. 
These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-01-17 18:01:08 +00:00
John Spray	4cec95ba13	pageserver: add list API for LocationConf (#6329 ) ## Problem The `/v1/tenant` listing API only applies to attached tenants. For an external service to implement a global reconciliation of its list of shards vs. what's on the pageserver, we need a full view of what's in TenantManager, including secondary tenant locations, and InProgress locations. Dependency of https://github.com/neondatabase/neon/pull/6251 ## Summary of changes - Add methods to Tenant and SecondaryTenant to reconstruct the LocationConf used to create them. - Add `GET /v1/location_config` API	2024-01-17 13:34:51 +00:00
Arpad Müller	ab86060d97	Copy initdb if loading from different timeline ID (#6363 ) Previously, if we: 1. created a new timeline B from a different timeline's A initdb 2. deleted timeline A the initdb for timeline B would be gone, at least in a world where we are deleting initdbs upon timeline deletion. This world is imminent (#6226). Therefore, if the pageserver is instructed to load the initdb from a different timeline ID, copy it to the newly created timeline's directory in S3. This ensures that we can disaster recover the new timeline as well, regardless of whether the original timeline was deleted or not. Part of https://github.com/neondatabase/neon/issues/5282.	2024-01-17 12:42:42 +01:00
Arpad Müller	6ffdcfe6a4	remote_storage: unify azure and S3 tests (#6364 ) The remote_storage crate contains two copies of each test, one for azure and one for S3. The repetition is not necessary and makes the tests more prone to drift, so we remove it by moving the tests into a shared module. The module has a different name depending on where it is included, so that each test still has "s3" or "azure" in its full path, allowing you to just test the S3 test or just the azure tests. Earlier PR that removed some duplication already: #6176 Fixes #6146.	2024-01-16 18:45:19 +01:00
Arpad Müller	4b0204ede5	Add copy operation tests and implement them for azure blobs (#6362 ) This implements the `copy` operation for azure blobs, added to S3 by #6091, and adds tests both to s3 and azure ensuring that the copy operation works.	2024-01-16 12:07:20 +00:00
John Spray	bf4e708646	pageserver: eviction for secondary mode tenants (#6225 ) Follows #6123 Closes: https://github.com/neondatabase/neon/issues/5342 The approach here is to avoid using `Layer` from secondary tenants, and instead make the eviction types (e.g. `EvictionCandidate`) have a variant that carries a Layer for attached tenants, and a different variant for secondary tenants. Other changes: - EvictionCandidate no longer carries a `Timeline`: this was only used for providing a witness reference to remote timeline client. - The types for returning eviction candidates are all in disk_usage_eviction_task.rs now, whereas some of them were in timeline.rs before. - The EvictionCandidate type replaces LocalLayerInfoForDiskUsageEviction type, which was basically the same thing.	2024-01-16 10:29:26 +00:00
John Spray	887e94d7da	page_service: more efficient page_service -> shard lookup (#6037 ) ## Problem In #5980 the page service connection handler gets a simple piece of logic for finding the right Timeline: at connection time, it picks an arbitrary Timeline, and then when handling individual page requests it checks if the original timeline is the correct shard, and if not looks one up. This is pretty slow in the case where we have to go look up the other timeline, because we take the big tenants manager lock. ## Summary of changes - Add a `shard_timelines` map of ShardIndex to Timeline on the page service connection handler - When looking up a Timeline for a particular ShardIndex, consult `shard_timelines` to avoid hitting the TenantsManager unless we really need to. - Re-work the CancellationToken handling, because the handler now holds gateguards on multiple timelines, and so must respect cancellation of _any_ timeline it has in its cache, not just the timeline related to the request it is currently servicing. --------- Co-authored-by: Vlad Lazar <vlad@neon.tech>	2024-01-16 09:39:19 +00:00
John Spray	df9e9de541	pageserver: API updates for sharding (#6330 ) The theme of the changes in this PR is that they're enablers for #6251 which are superficial struct/api changes. This is a spinoff from #6251: - Various APIs + clients thereof take TenantShardId rather than TenantId - The creation API gets a ShardParameters member, which may be used to configure shard count and stripe size. This enables the attachment service to present a "virtual pageserver" creation endpoint that creates multiple shards. - The attachment service will use tenant size information to drive shard splitting. Make a version of `TenantHistorySize` that is usable for decoding these API responses. - ComputeSpec includes a shard stripe size.	2024-01-16 09:21:00 +00:00
Anna Khanova	3f2187eb92	Proxy relax sni check (#6323 ) ## Problem Using the same domain name () for serverless driver can help with connection caching. https://github.com/neondatabase/neon/issues/6290 ## Summary of changes Relax SNI check.	2024-01-16 08:42:13 +00:00
John Khvatov	2a3cfc9665	Remove PAGE_CACHE_ACQUIRE_PINNED_SLOT_TIME histogram. (#6356 ) Fixes #6343. ## Problem PAGE_CACHE_ACQUIRE_PINNED_SLOT_TIME is used on hot path and it adds noticeable latency to GetPage@LSN. ## Refs https://discordapp.com/channels/1176467419317940276/1195022264115151001/1196370689268125716	2024-01-15 17:19:19 +01:00
Cihan Demirci	d34adf46b4	do not provide disclaimer input for the deploy-prod workflow (#6360 ) We've removed this input from the deploy-prod workflow.	2024-01-15 16:15:34 +00:00
Conrad Ludgate	0bac8ddd76	proxy: fix serverless error message info (#6279 ) ## Problem https://github.com/neondatabase/serverless/issues/51#issuecomment-1878677318 ## Summary of changes 1. When we have a db_error, use db_error.message() as the message. 2. include error position. 3. line should be a string (weird?) 4. `datatype` -> `dataType`	2024-01-15 16:43:19 +01:00
Christian Schwarz	0e1ef3713e	fix(pagebench): #6325 broke running without `--runtime` (#6351 ) After PR #6325, when running without --runtime, we wouldn't wait for start_work_barrier, causing the benchmark to not start at all.	2024-01-15 08:54:19 +00:00
Konstantin Knizhnik	31a4eb40b2	Do not suspend compute if autovacuum is active (#6322) ## Problem See https://github.com/orgs/neondatabase/projects/49/views/13?pane=issue&itemId=48282912 ## Summary of changes Do not suspend compute if there are active autovacuum workers --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-01-14 09:33:57 +02:00
Arpad Müller	60ced06586	Fix timeline creation and tenant deletion race (#6310 ) Fixes the race condition between timeline creation and tenant deletion outlined in #6255. Related: #5914, which is a similar race condition about the uninit marker file. Fixes #6255	2024-01-13 09:15:58 +01:00
Christian Schwarz	b76454ae41	add script to set up EC2 storage-optimized instance store for benchmarking (#6350 ) Been using this all the time in https://github.com/neondatabase/neon/pull/6214 Part of https://github.com/neondatabase/neon/issues/5771 Should consider this in https://github.com/neondatabase/neon/issues/6297	2024-01-12 19:25:17 +00:00
Arthur Petukhovsky	97b48c23f8	Compact some compute_ctl logs (#6346 ) Print postgres roles in a single line and add some info.	2024-01-12 18:24:22 +00:00
Christian Schwarz	cd48ea784f	TenantInfo: expose generation number (#6348 ) Generally useful when debugging / troubleshooting. I found this useful when manually duplicating a tenant from a script[^1] where I can't use `neon_fixtures.Pageserver.tenant_attach`'s automatic integration with the neon_local's attachment_service. [^1]: https://github.com/neondatabase/neon/pull/6349	2024-01-12 18:27:11 +01:00
Alexey Kondratov	1c432d5492	[compute_ctl] Do not miss short-living connections (#6008) ## Problem Currently, the activity monitor in `compute_ctl` has a 500 ms polling interval. It also looks at the list of current client backends for an active one or one with the most recent state change. This means we can miss short-living connections. Yet, while testing this PR I realized that it's usually not a problem with pooled connections, as pgbouncer maintains connections to Postgres even though client connections are short-living. We can still miss direct connections. ## Summary of changes This commit introduces another way to detect user activity on the compute. It polls a sum of `active_time` and a sum of `sessions` from all non-system databases in `pg_stat_database` [1]. If the user runs some queries or just opens a direct connection, it will rise; if the user drops a database, it can go down, but it's still a change and will be detected as activity. The new statistics-based logic seems to be working fine. Yet, after having it running for a couple of hours I've seen several odd cases with connections via pgbouncer: 1. Sometimes, if you run just `psql pooler_connstr -c 'select 1;'`, `active_time` might not be updated immediately, and it may take a couple of dozen seconds. This doesn't seem critical, though. 2. Same query with pooler, `active_time` can be bumped a bit, then pgbouncer keeps the connection to Postgres open for ~10 minutes, then it disconnects, and `active_time` could be bumped a bit again. 'Could be' because I've seen it once, but it didn't reproduce on a second try. I think this can create false positives (hopefully rare), when we will not suspend some computes because of lagging statistics updates OR because some non-user processes try to connect to user databases. Currently, we don't touch them outside of startup and `postgres_exporter` is configured not to discover other databases, but this can change in the future. 
New behavior is covered by feature flag `activity_monitor_experimental`, which should be provided by control plane via neondatabase/cloud#9171 [1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW Related to neondatabase/cloud#7966, neondatabase/cloud#7198	2024-01-12 18:15:41 +01:00
Vlad Lazar	02c6abadf0	pageserver: remove dependency of pagebench on pageserver (#6334) To achieve this I had to lift the BlockNumber and key_to_rel_block definitions to pageserver_api (similar to a change in #5980). Closes #6299	2024-01-12 17:11:19 +00:00
John Spray	7af4c676c0	pageserver: only upload initdb from shard 0 (#6331 ) ## Problem When creating a timeline on a sharded tenant, we call into each shard. We don't need to upload the initdb from every shard: only do it on shard zero. ## Summary of changes - Move the initdb upload into a function, and only call it on shard zero.	2024-01-12 15:32:27 +01:00
John Spray	aafe79873c	page_service: handle GetActiveTenantError::Cancelled (#6344 ) ## Problem Occasional test failures with QueryError::Other errors saying "cancelled" that get logged at error severity. ## Summary of changes Avoid casting GetActiveTenantError::Cancelled into QueryError::Other -- it should be QueryError::Shutdown, which is not logged as an error.	2024-01-12 12:43:14 +00:00
Christian Schwarz	eae74383c1	pageserver client: mgmt_api: expose reset API (#6326 ) By-product of some hack work that will be thrown away.	2024-01-12 11:07:16 +00:00
Christian Schwarz	8b657a1481	pagebench: getpage: cancellation & better logging (#6325 ) Needed these while working on https://github.com/neondatabase/neon/issues/5479	2024-01-12 11:53:18 +01:00
Christian Schwarz	42613d4c30	refactor(NeonEnv): shutdown of child processes (#6327 ) Also shuts down `Broker`, which, before this PR, we did start in `start()` but relied on the fixture to stop. Do it a bit earlier so that, after `NeonEnv.stop()` returns, there are no child processes using `repo_dir`. Also, drive-by-fixes inverted logic around `ps_assert_metric_no_errors`, missed during https://github.com/neondatabase/neon/pull/6295 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-01-12 10:23:21 +01:00
Arseny Sher	7f828890cf	Extract safekeeper per timeline state from safekeeper.rs safekeeper.rs is mostly about consensus, but state is wider. Also form SafekeeperState which encapsulates persistent part + in memory layer with API for atomic updates. Moves remote_consistent_lsn back to SafekeeperMemState, fixes its absence from memory dump. Also renames SafekeeperState to TimelinePersistentState, as TimelineMemState and TimelinePersistent state are created.	2024-01-12 10:58:22 +04:00
Sasha Krassovsky	1eb30b40af	Bump postgres version to support CREATE PUBLICATION FOR ALL TABLES	2024-01-11 15:30:33 -08:00
dependabot[bot]	8551a61014	build(deps): bump jinja2 from 3.1.2 to 3.1.3 (#6333 )	2024-01-11 19:49:28 +00:00
Christian Schwarz	087526b81b	neon_local init: add `--force` mode that allows an empty dir (#6328 ) Need this in https://github.com/neondatabase/neon/pull/6214 refs https://github.com/neondatabase/neon/issues/5771	2024-01-11 18:11:44 +00:00
Christian Schwarz	915fba146d	pagebench: getpage: optional keyspace cache file (#6324 ) Proved useful when benchmarking 20k tenant setup when validating https://github.com/neondatabase/neon/issues/5479	2024-01-11 17:42:11 +00:00
Vlad Lazar	da7a7c867e	pageserver: do not bump priority of background task for timeline status requests (#6301) ## Problem Previously, `GET /v1/tenant/:tenant_id/timeline` and `GET /v1/tenant/:tenant_id/timeline/:timeline_id` would bump the priority of the background task which computes the initial logical size by cancelling the wait on the synchronisation semaphore. However, the request would still return an approximate logical size. It's undesirable to force background work for a status request. ## Summary of changes This PR updates the priority used by the timeline status request such that they don't do priority boosting by default anymore. An optional query parameter, `force-await-initial-logical-size`, is added for both mentioned endpoints. When set to true, it will skip the concurrency limiting semaphore and wait for the background task to complete before returning the exact logical size. In order to exercise this behaviour in a test I had to add an extra failpoint. If you think it's too intrusive, it can be removed. Also fixed a small bug where the cancellation of a download is reported as an opaque download failure upstream. This caused `test_location_conf_churn` to fail at teardown due to a WARN log line. Closes https://github.com/neondatabase/neon/issues/6168	2024-01-11 15:55:32 +00:00
Conrad Ludgate	551f0cc097	proxy: refactor how neon-options are handled (#6306 ) ## Problem HTTP connection pool was not respecting the PitR options. ## Summary of changes 1. refactor neon_options a bit to allow easier access to cache_key 2. make HTTP not go through `StartupMessageParams` 3. expose SNI processing to replace what was removed in step 2.	2024-01-11 14:58:31 +00:00
Anna Khanova	a84935d266	Extend unsupported startup parameter error message (#6318 ) ## Problem Unsupported startup parameter error happens with pooled connection. However the reason of this error might not be obvious to the user. ## Summary of changes Send more descriptive message with the link to our troubleshooting page: https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter. Resolves: https://github.com/neondatabase/neon/issues/6291	2024-01-11 12:09:26 +00:00
Christian Schwarz	3ee981889f	compaction: avoid no-op timeline dir fsync (#6311 ) Random find while looking at an idle 20k tenant pageserver where each tenant has 9 tiny L0 layers and compaction produces no new L1s / image layers. The aggregate CPU cost of running this every 20s for 20k tenants is actually substantial, due to the use of `spawn_blocking`.	2024-01-11 10:32:39 +00:00
Christian Schwarz	fc66ba43c4	Revert "revert recent VirtualFile asyncification changes (#5291 )" (#6309 ) This reverts commit `ab1f37e908`. Thereby fixes #5479 Updated Analysis ================ The problem with the original patch was that it, for the first time, exposed the `VirtualFile` code to tokio task concurrency instead of just thread-based concurrency. That caused the VirtualFile file descriptor cache to start thrashing, effectively grinding the system to a halt. Details ------- At the time of the original patch, we had a _lot_ of runnable tasks in the pageserver. The symptom that prompted the revert (now being reverted in this PR) is that our production systems fell into a valley of zero goodput, high CPU, and zero disk IOPS shortly after PS restart. We lay out the root cause for that behavior in this subsection. At the time, there was no concurrency limit on the number of concurrent initial logical size calculations. Initial size calculation was initiated for all timelines within the first 10 minutes as part of consumption metrics collection. On a PS with 20k timelines, we'd thus have 20k runnable tasks. Before the original patch, the `VirtualFile` code never returned `Poll::Pending`. That meant that once we entered it, the calling tokio task would not yield to the tokio executor until we were done performing the VirtualFile operation, i.e., doing a blocking IO system call. The original patch switched the VirtualFile file descriptor cache's synchronization primitives to those from `tokio::sync`. It did not change that we were doing synchronous IO system calls. And the cache had more slots than we have tokio executor threads. So, these primitives never actually needed to return `Poll::Pending`. But, the tokio scheduler makes tokio sync primitives return `Pending` artificially, as a mechanism for the scheduler to get back into control more often ([example](https://docs.rs/tokio/1.35.1/src/tokio/sync/batch_semaphore.rs.html#570)). 
So, the new reality was that VirtualFile calls could now yield to the tokio executor. Tokio would pick one of the other 19999 runnable tasks to run. These tasks were also using VirtualFile. So, we now had a lot more concurrency in that area of the code. The problem with more concurrency was that caches started thrashing, most notably the VirtualFile file descriptor cache: each time a task would be rescheduled, it would want to do its next VirtualFile operation. For that, it would first need to evict another (task's) VirtualFile fd from the cache to make room for its own fd. It would then do one VirtualFile operation before hitting an await point and yielding to the executor again. The executor would run the other 19999 tasks for fairness before circling back to the first task, which would find its fd evicted. The other cache that would theoretically be impacted in a similar way is the pageserver's `PageCache`. However, for initial logical size calculation, it seems much less relevant in experiments, likely because of the random access nature of initial logical size calculation. 
Fixes ===== We fixed the above problems by - raising VirtualFile cache sizes - https://github.com/neondatabase/cloud/issues/8351 - changing code to ensure forward-progress once cache slots have been acquired - https://github.com/neondatabase/neon/pull/5480 - https://github.com/neondatabase/neon/pull/5482 - tbd: https://github.com/neondatabase/neon/issues/6065 - reducing the amount of runnable tokio tasks - https://github.com/neondatabase/neon/pull/5578 - https://github.com/neondatabase/neon/pull/6000 - fixing bugs that caused unnecessary concurrency induced by connection handlers - https://github.com/neondatabase/neon/issues/5993 I manually verified that this PR doesn't negatively affect startup performance as follows: create a pageserver in production configuration, with 20k tenants/timelines, 9 tiny L0 layer files each; start it, and observe ``` INFO Startup complete (368.009s since start) elapsed_ms=368009 ``` I further verified in that same setup that, when using `pagebench`'s getpage benchmark at as-fast-as-possible request rate against 5k of the 20k tenants, the achieved throughput is identical. The VirtualFile cache isn't thrashing in that case. Future Work =========== We will still be exposed to the cache thrashing risk from outside factors, e.g., request concurrency is unbounded, and initial size calculation skips the concurrency limiter when we establish a walreceiver connection. Once we start thrashing, we will degrade non-gracefully, i.e., encounter a valley as was seen with the original patch. However, we have sufficient means to deal with that unlikely situation: 1. we have dashboards & metrics to monitor & alert on cache thrashing 2. 
we can react by scaling the bottleneck resources (cache size) or by manually shedding load through tenant relocation Potential systematic solutions are future work: * global concurrency limiting * per-tenant rate limiting => #5899 * pageserver-initiated load shedding Related Issues ============== This PR unblocks the introduction of tokio-epoll-uring for asynchronous disk IO ([Epic](#4744)).	2024-01-11 11:29:14 +01:00
Arthur Petukhovsky	544284cce0	Collapse multiline queries in compute_ctl (#6316 )	2024-01-10 22:25:28 +04:00
Arthur Petukhovsky	71beabf82d	Join multiline postgres logs in compute_ctl (#5903 ) Postgres can write multiline logs, and they are difficult to handle after they are mixed with other logs. This PR combines multiline logs from postgres into a single line, where previous line breaks are replaced with unicode zero-width spaces. Then postgres logs are written to stderr with `PG:` prefix. It makes it easy to distinguish postgres logs from all other compute logs with a simple grep, e.g. `\|= "PG:"`	2024-01-10 15:11:43 +00:00
Anna Khanova	76372ce002	Added auth info cache with notifications to redis. (#6208) ## Problem Current cache doesn't support any updates from the cplane. ## Summary of changes * Added redis notifier listener. * Added cache which can be invalidated with the notifier. If the notifier is not available, it's just a normal ttl cache. * Updated cplane api. The motivation behind this organization of the data is the following: * In the Neon data model there are projects. Projects could have multiple branches and each branch could have more than one endpoint. * Also there is one special `main` branch. * Password reset works per branch. * Allowed IPs are the same for every branch in the project (except, maybe, the main one). * The main branch can be changed to the other branch. * The endpoint can be moved between branches. Every event described above requires some special processing on the proxy (or cplane) side. The idea of invalidating for the project is that whenever one of the events above is happening with the project, proxy can invalidate all entries for the entire project. This approach also requires some additional API change (returning project_id inside the auth info).	2024-01-10 11:51:05 +00:00
Christian Schwarz	4e1b0b84eb	pagebench: fixup after is_rel_block_key changes in #6266 (#6303 ) PR #6266 broke the getpage_latest_lsn benchmark. Before this patch, we'd fail with ``` not implemented: split up range ``` because `r.start = rel size key` and `r.end = rel size key + 1`. The filtering of the key ranges in that loop is a bit ugly, but, I measured: * setup with 180k layer files (20k tenants * 9 layers). * total physical size is 463GiB * 5k tenants, the range filtering takes `0.6 seconds` on an i3en.3xlarge. That's a tiny fraction of the overall time it takes for pagebench to get ready to send requests. So, this is good enough for now / there are other bottlenecks that are bigger.	2024-01-09 19:00:37 +01:00
John Spray	f94abbab95	pageserver: clean up a redundant tenant_id attribute (#6280 ) This was a small TODO(sharding) thing in TenantHarness.	2024-01-09 12:10:15 +00:00
John Spray	4b9b4c2c36	pageserver: cleanup redundant create/attach code, fix detach while attaching (#6277 ) ## Problem The code for tenant create and tenant attach was just a special case of what upsert_location does. ## Summary of changes - Use `upsert_location` for create and attach APIs - Clean up error handling in upsert_location so that it can generate appropriate HTTP response codes - Update tests that asserted the old non-idempotent behavior of attach - Rework the `test_ignore_while_attaching` test, and fix tenant shutdown during activation, which this test was supposed to cover, but it was actually just waiting for activation to complete.	2024-01-09 10:37:54 +00:00
Arpad Müller	8186f6b6f9	Drop async_trait usage from three internal traits (#6305 ) This uses the [newly stable](https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html) async trait feature for three internal traits. One requires `Send` bounds to be present so uses `impl Future<...> + Send` instead. Advantages: * less macro usage * no extra boxing Disadvantages: * impl syntax needed for `Send` bounds is a bit more verbose (but only required in one place)	2024-01-09 11:20:08 +01:00
Christian Schwarz	90e0219b29	python tests: support overlayfs for NeonEnvBuilder.from_repo_dir (#6295) Part of #5771 Extracted from https://github.com/neondatabase/neon/pull/6214 This PR makes the test suite sensitive to the new env var `NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS`. If it is set, `NeonEnvBuilder.from_repo_dir` uses overlayfs to duplicate the snapshot repo dir contents. Since mounting requires root privileges, we use sudo to perform the mounts. That, and macOS support, is also why copytree remains the default. If we ever run on a filesystem with copy reflink support, we should consider that as an alternative. This PR can be tried on a Linux machine on the `test_backward_compatiblity` test, which uses `from_repo_dir`.	2024-01-09 10:15:46 +00:00
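
The trade-off described in commit 8186f6b6f9 above (plain `async fn` in a trait versus `impl Future<...> + Send` where a `Send` bound is required) can be sketched roughly as follows. This is an illustration only, not code from the repository: the traits `LayerCounter` and `RemoteStore`, the `InMemory` type, and the `require_send` and `block_on` helpers are hypothetical names, and the sketch assumes Rust 1.75 or later.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical trait using a plain `async fn`: no macro and no boxing, but
// generic callers get no `Send` guarantee on the returned future.
trait LayerCounter {
    async fn layer_count(&self) -> usize;
}

// Hypothetical trait whose callers need a `Send` future, so the return type
// is written out as `impl Future<...> + Send` instead of `async fn`.
trait RemoteStore {
    fn fetch_len(&self, key: &str) -> impl Future<Output = Option<usize>> + Send;
}

// Hypothetical in-memory implementation used only for this sketch.
struct InMemory {
    entries: Vec<(String, Vec<u8>)>,
}

impl LayerCounter for InMemory {
    async fn layer_count(&self) -> usize {
        self.entries.len()
    }
}

impl RemoteStore for InMemory {
    // The impl may still use `async fn`, provided its future is `Send`.
    async fn fetch_len(&self, key: &str) -> Option<usize> {
        self.entries
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.len())
    }
}

// Compiles only when the argument is `Send`.
fn require_send<T: Send>(value: T) -> T {
    value
}

// Waker that does nothing; sufficient for futures that never wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Tiny busy-polling executor so the sketch needs no async runtime.
// Only suitable for futures that complete without waiting on I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With an `InMemory` value `store`, `block_on(require_send(store.fetch_len("key")))` type-checks because the trait promises a `Send` future. For generic code over `T: LayerCounter`, the same `require_send` call would be rejected, which is the situation where the more verbose form is needed.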

4375 Commits