rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 21:20:37 +00:00

Author	SHA1	Message	Date
John Spray	775c0c8892	tests: adjust threshold in test_partial_evict_tenant (#8509 ) ## Problem This test was destabilized by https://github.com/neondatabase/neon/pull/8431. The threshold is arbitrary & failures are still quite close to it. At a high level the test is asserting "eviction was approximately fair to these tenants", which appears to still be the case when the abs diff between ratios is slightly higher at ~0.6-0.7. ## Summary of changes - Change threshold from 0.06 to 0.065. Based on the last ~10 failures that should be sufficient.	2024-07-25 15:00:42 +01:00
John Spray	24ea9f9f60	tests: always scrub on test exit when using S3Storage (#8437 ) ## Problem Currently, tests may have a scrub during teardown if they ask for it, but most tests don't request it. To detect "unknown unknowns", let's run it at the end of every test where possible. This is similar to asserting that there are no errors in the log at the end of tests. ## Summary of changes - Remove explicit `enable_scrub_on_exit` - Always scrub if remote storage is an S3Storage.	2024-07-25 14:19:38 +01:00
Vlad Lazar	9c5ad21341	storcon: make heartbeats restart aware (#8222 ) ## Problem Re-attach blocks the pageserver http server from starting up. Hence, it can't reply to heartbeats until that's done. This makes the storage controller mark the node off-line (not good). We worked around this by setting the interval after which nodes are marked offline to 5 minutes. This isn't a long term solution. ## Summary of changes * Introduce a new `NodeAvailability` state: `WarmingUp`. This state models the following time interval: * From receiving the re-attach request until the pageserver replies to the first heartbeat post re-attach * The heartbeat delta generator becomes aware of this state and uses a separate longer interval * Flag `max-warming-up-interval` now models the longer timeout and `max-offline-interval` the shorter one to match the names of the states Closes https://github.com/neondatabase/neon/issues/7552	2024-07-25 14:09:12 +01:00
John Spray	f5db655447	pageserver: simplify LayerAccessStats (#8431 ) ## Problem LayerAccessStats contains a lot of detail that we don't use: short histories of most recent accesses, specifics on what kind of task accessed a layer, etc. This is all stored inside a Mutex, which is locked every time something accesses a layer. ## Summary of changes - Store timestamps at a very low resolution (to the nearest second), sufficient for use on the timescales of eviction. - Pack access time and last residence change time into a single u64 - Use the high bits of the u64 for other flags, including the new layer visibility concept. - Simplify the external-facing model for access stats to just include what we now track. Note that the `HistoryBufferWithDropCounter` is removed here because it is no longer used. I do not dislike this type, we just happen not to use it for anything else at present. Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-24 08:17:28 +01:00
Konstantin Knizhnik	563d73d923	Use smgrexists() instead of access() to enforce uniqueness of generated relfilenumber (#7992 ) ## Problem Postgres is using `access()` function in `GetNewRelFileNumber` to check if assigned relfilenumber is not used for any other relation. This check will not work in Neon, because we do not have all files in local storage. ## Summary of changes Use smgrexists() instead which will check at page server if such relfilenode is used. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-23 18:41:55 +03:00
John Spray	80c8ceacbc	tests: make `test_scrubber_physical_gc_ancestors` more stable (#8453 ) ## Problem This test sometimes found that ancestors were getting cleaned up before it had done any compaction. Compaction was happening implicitly via Workload. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries ## Summary of changes - Set upload=False when writing data after shard split, to avoid doing a checkpoint - Add a checkpoint_period & explicit wait for uploads so that we ensure data lands in S3 without doing a checkpoint	2024-07-23 12:57:57 +01:00
Vlad Lazar	35854928d9	pageserver: use identity file as node id authority and remove init command and config-override flags (#7766 ) Ansible will soon write the node id to `identity.toml` in the work dir for new pageservers. On the pageserver side, we read the node id from the identity file if it is present and use that as the source of truth. If the identity file is missing, cannot be read, or does not deserialise, start-up is aborted. This PR also removes the `--init` mode and the `--config-override` flag from the `pageserver` binary. The neon_local is already not using these flags anymore. Ansible still uses them until the linked change is merged & deployed, so, this PR has to land simultaneously or after the Ansible change due to that. Related Ansible change: https://github.com/neondatabase/aws/pull/1322 Cplane change to remove config-override usages: https://github.com/neondatabase/cloud/pull/13417 Closes: https://github.com/neondatabase/neon/issues/7736 Overall plan: https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-23 11:41:12 +01:00
Konstantin Knizhnik	a868e342d4	Change default version of Neon extensio to 1.4	2024-07-22 17:58:07 +01:00
Yuchen Liang	595c450036	fix(scrubber): more robust metadata consistency checks (#8344 ) Part of #8128. ## Problem Scrubber uses `scan_metadata` command to flag metadata inconsistencies. To trust it at scale, we need to make sure the errors we emit is a reflection of real scenario. One check performed in the scrubber is to see whether layers listed in the latest `index_part.json` is present in object listing. Currently, the scrubber does not robustly handle the case where objects are uploaded/deleted during the scan. ## Summary of changes Condition for success: An object in the index is (1) in the object listing we acquire from S3 or (2) found in a HeadObject request (new object). - Add in the `HeadObject` requests for the layers missing from the object listing. - Keep the order of first getting the object listing and then downloading the layers. - Update check to only consider shards with highest shard count. - Skip analyzing a timeline if `deleted_at` tombstone is marked in `index_part.json`. - Add new test to see if scrubber actually detect the metadata inconsistency. _Misc_ - A timeline with no ancestor should always have some layers. - Removed experimental histograms _Caveat_ - Ancestor layer is not cleaned until #8308 is implemented. If ancestor layers reference non-existing layers in the index, the scrubber will emit false positives. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-22 14:53:33 +01:00
John Spray	8d948f2e07	tests: make test_change_pageserver more robust (#8442 ) ## Problem This test predates the storage controller. It stops pageservers and reconfigures computes, but that races with the storage controller's node failure detection, which can result in restarting nodes not getting the attachments they expect, and the test failing ## Summary of changes - Configure the storage controller to use a compute notify hook that does nothing, so that it cannot interfere with the test's configuration of computes. - Instead of using the attach hook, just notify the storage controller that nodes are offline, and reconcile tenants so that they will automatically be attached to the other node.	2024-07-22 14:17:02 +01:00
John Spray	98af1e365b	pageserver: remove absolute-order disk usage eviction (#8454 ) ## Problem Deployed pageserver configurations are all like this: ``` disk_usage_based_eviction: max_usage_pct: 85 min_avail_bytes: 0 period: "10s" eviction_order: type: "RelativeAccessed" args: highest_layer_count_loses_first: true ``` But we're maintaining this optional absolute order eviction, with test cases etc. ## Summary of changes - Remove absolute order eviction. Make the default eviction policy the same as how we really deploy pageservers.	2024-07-22 13:15:55 +01:00
John Spray	a4fa250c92	tests: longer timeouts in test_timeline_deletion_with_files_stuck_in_upload_queue (#8438 ) ## Problem This test had two locations with 2 second timeouts, which is rather low when we run on a highly contended test machine running lots of tests in parallel. It usually passes, but today I've seen both of these locations time out on separate PRs. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8432/10007868041/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/6c6a092be083d27c ## Summary of changes - Change 2 second timeouts to 20 second timeouts	2024-07-19 19:30:28 +02:00
John Spray	44781518d0	storage scrubber: GC ancestor shard layers (#8196 ) ## Problem After a shard split, the pageserver leaves the ancestor shard's content in place. It may be referenced by child shards, but eventually child shards will de-reference most ancestor layers as they write their own data and do GC. We would like to eventually clean up those ancestor layers to reclaim space. ## Summary of changes - Extend the physical GC command with `--mode=full`, which includes cleaning up unreferenced ancestor shard layers - Add test `test_scrubber_physical_gc_ancestors` - Remove colored log output: in testing this is irritating ANSI code spam in logs, and in interactive use doesn't add much. - Refactor storage controller API client code out of storcon_client into a `storage_controller/client` crate - During physical GC of ancestors, call into the storage controller to check that the latest shards seen in S3 reflect the latest state of the tenant, and there is no shard split in progress.	2024-07-19 19:07:59 +03:00
Christian Schwarz	16071e57c6	pageserver: remove obsolete cached_metric_collection_interval (#8370 ) We're removing the usage of this long-meaningless config field in https://github.com/neondatabase/aws/pull/1599 Once that PR has been deployed to staging and prod, we can merge this PR.	2024-07-19 17:01:02 +01:00
Arpad Müller	c96e8012ce	Enable zstd in tests (#8368 ) Successor of #8288 , just enable zstd in tests. Also adds a test that creates easily compressable data. Part of #5431 --------- Co-authored-by: John Spray <john@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-18 19:09:57 +01:00
John Spray	9ded2556df	tests: increase test_pg_regress and test_isolation timeouts (#8418 ) ## Problem These tests time out ~1 in 50 runs when in debug mode. There is no indication of a real issue: they're just wrappers that have large numbers of individual tests contained within on pytest case. ## Summary of changes - Bump pg_regress timeout from 600 to 900s - Bump test_isolation timeout from 300s (default) to 600s In future it would be nice to break out these tests to run individual cases (or batches thereof) as separate tests, rather than this monolith.	2024-07-18 10:23:17 +01:00
John Spray	7672e49ab5	tests: fix metrics check in test_s3_eviction (#8419 ) ## Problem This test would occasionally fail its metric check. This could happen in the rare case that the nodes had all been restarted before their most recent eviction. The metric check was added in https://github.com/neondatabase/neon/pull/8348 ## Summary of changes - Check metrics before each restart, accumulate into a bool that we assert on at the end of the test	2024-07-18 10:14:56 +01:00
Christian Schwarz	a2d170b6d0	NeonEnv.from_repo_dir: use storage_controller_db instead of `attachments.json` (#8382 ) When `NeonEnv.from_repo_dir` was introduced, storage controller stored its state exclusively `attachments.json`. Since then, it has moved to using Postgres, which stores its state in `storage_controller_db`. But `NeonEnv.from_repo_dir` wasn't adjusted to do this. This PR rectifies the situation. Context for this is failures in `test_pageserver_characterize_throughput_with_n_tenants` CF: https://neondb.slack.com/archives/C033RQ5SPDH/p1721035799502239?thread_ts=1720901332.293769&cid=C033RQ5SPDH Notably, `from_repo_dir` is also used by the backwards- and forwards-compatibility. Thus, the changes in this PR affect those tests as well. However, it turns out that the compatibility snapshot already contains the `storage_controller_db`. Thus, it should just work and in fact we can remove hacks like `fixup_storage_controller`. Follow-ups created as part of this work: * https://github.com/neondatabase/neon/issues/8399 * https://github.com/neondatabase/neon/issues/8400	2024-07-18 10:56:07 +02:00
Joonas Koivunen	e250b9e063	test: allow requests to any pageserver get cancelled (#8413 ) Fix flakyness on `test_sharded_timeline_detach_ancestor` which does not reproduce on a fast enough runner by allowing cancelled request before completing on all pageservers. It was only allowed on half of the pageservers. Failure evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8352/9972357040/index.html#suites/a1c2be32556270764423c495fad75d47/7cca3e3d94fe12f2	2024-07-17 22:03:02 +01:00
John Spray	f4f0869dc8	pageserver: exclude un-read layers from short residence statistic (#8396 ) ## Problem The `evictions_with_low_residence_duration` is used as an indicator of cache thrashing. However, there are situations where it is quite legitimate to only have a short residence during compaction, where a delta is downloaded, used to generate an image layer, and then discarded. This can lead to false positive alerts. ## Summary of changes - Only track low residence duration for layers that have been accessed at least once (compaction doesn't count as an access). This will give us a metric that indicates thrashing on layers that the _user_ is using, rather than those we're downloading for housekeeping purposes. Once we add "layer visibility" as an explicit property of layers, this can also be used as a cleaner condition (residence of non-visible layers should never be alertable)	2024-07-16 20:55:29 +01:00
Tristan Partin	b197cc20fc	Hide import behind TYPE_CHECKING	2024-07-16 12:12:29 -05:00
Tristan Partin	ba17025a57	Run each migration in its own transaction Previously, every migration was run in the same transaction. This is preparatory work for fixing CVE-2024-4317.	2024-07-16 12:12:29 -05:00
Arpad Müller	66337097de	Avoid the storage controller in test_tenant_creation_fails (#8392 ) As described in #8385, the likely source for flakiness in test_tenant_creation_fails is the following sequence of events: 1. test instructs the storage controller to create the tenant 2. storage controller adds the tenant and persists it to the database. issues a creation request 3. the pageserver restarts with the failpoint disabled 4. storage controller's background reconciliation still wants to create the tenant 5. pageserver gets new request to create the tenant from background reconciliation This commit just avoids the storage controller entirely. It has its own set of issues, as the re-attach request will obviously not include the tenant, but it's still useful to test for non-existence of the tenant. The generation is also not optional any more during tenant attachment. If you omit it, the pageserver yields an error. We change the signature of `tenant_attach` to reflect that. Alternative to #8385 Fixes #8266	2024-07-16 12:19:28 +02:00
Joonas Koivunen	730db859c7	feat(timeline_detach_ancestor): success idempotency (#8354 ) Right now timeline detach ancestor reports an error (409, "no ancestor") on a new attempt after successful completion. This makes it troublesome for storage controller retries. Fix it to respond with `200 OK` as if the operation had just completed quickly. Additionally, the returned timeline identifiers in the 200 OK response are now ordered so that responses between different nodes for error comparison are done by the storage controller added in #8353. Design-wise, this PR introduces a new strategy for accessing the latest uploaded IndexPart: `RemoteTimelineClient::initialized_upload_queue(&self) -> Result<UploadQueueAccessor<'_>, NotInitialized>`. It should be a more scalable way to query the latest uploaded `IndexPart` than to add a query method for each question directly on `RemoteTimelineClient`. GC blocking will need to be introduced to make the operation fully idempotent. However, it is idempotent for the cases demonstrated by tests. Cc: #6994	2024-07-15 17:47:53 +00:00
John Spray	04448ac323	pageserver: use PITR GC cutoffs as authoritative (#8365 ) ## Problem Pageserver GC uses a size-based condition (GC "horizon" in addition to time-based "PITR"). Eventually we plan to retire the size-based condition: https://github.com/neondatabase/neon/issues/6374 Currently, we always apply the more conservative of the two, meaning that tenants always retain at least 64MB of history (default horizon), even after a very long time has passed. This is particularly acute in cases where someone has dropped tables/databases, and then leaves a database idle: the horizon can prevent GCing very large quantities of historical data (we already account for this in synthetic size by ignoring gc horizon). We're not entirely removing GC horizon right now because we don't want to 100% rely on standby_horizon for robustness of physical replication, but we can tweak our logic to avoid retaining that 64MB LSN length indefinitely. ## Summary of changes - Rework `Timeline::find_gc_cutoffs`, with new logic: - If there is no PITR set, then use `DEFAULT_PITR_INTERVAL` (1 week) to calculate a time threshold. Retain either the horizon or up to that thresholds, whichever requires less data. - When there is a PITR set, and we have unambiguously resolved the timestamp to an LSN, then ignore the GC horizon entirely. For typical PITRs (1 day, 1 week), this will still easily retain enough data to avoid stressing read only replicas. The key property we end up with, whether a PITR is set or not, is that after enough time has passed, our GC cutoff on an idle timeline will catch up with the last_record_lsn. Using `DEFAULT_PITR_INTERVAL` is a bit of an arbitrary hack, but this feels like it isn't really worth the noise of exposing in TenantConfig. We could just make it a different named constant though. The end-end state will be that there is no gc_horizon at all, and that tenants with pitr_interval=0 would truly retain no history, so this constant would go away.	2024-07-15 17:43:05 +01:00
Joonas Koivunen	324e4e008f	feat(storcon): timeline detach ancestor passthrough (#8353 ) Currently storage controller does not support forwarding timeline detach ancestor requests to pageservers. Add support for forwarding `PUT .../:tenant_id/timelines/:timeline_id/detach_ancestor`. Implement the support mostly as is, because the timeline detach ancestor will be made (mostly) idempotent in future PR. Cc: #6994	2024-07-15 18:08:24 +03:00
Christian Schwarz	b49b450dc4	remove page_service `show <tenant_id>` (#8372 ) This operation isn't used in practice, so let's remove it. Context: in https://github.com/neondatabase/neon/pull/8339	2024-07-15 15:33:56 +01:00
Konstantin Knizhnik	8a8b83df27	Add neon.running_xacts_overflow_policy to make it possible for RO replica to startup without primary even in case running xacts overflow (#8323 ) ## Problem Right now if there are too many running xacts to be restored from CLOG at replica startup, then replica is not trying to restore them and wait for non-overflown running-xacs WAL record from primary. But if primary is not active, then replica will not start at all. Too many running xacts can be caused by transactions with large number of subtractions. But right now it can be also cause by two reasons: - Lack of shutdown checkpoint which updates `oldestRunningXid` (because of immediate shutdown) - nextXid alignment on 1024 boundary (which cause loosing ~1k XIDs on each restart) Both problems are somehow addressed now. But we have existed customers with "sparse" CLOG and lack of checkpoints. To be able to start RO replicas for such customers I suggest to add GUC which allows replica to start even in case of subxacts overflow. ## Summary of changes Add `neon.running_xacts_overflow_policy` with the following values: - ignore: restore from CLOG last N XIDs and accept connections - skip: do not restore any XIDs from CXLOGbut still accept connections - wait: wait non-overflown running xacts record from primary node ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-15 15:52:00 +03:00
Arseny Sher	4bdfb96078	Fix test_timeline_copy flakiness. fixes https://github.com/neondatabase/neon/issues/8355	2024-07-15 15:21:16 +03:00
Vlad Lazar	b329b1c610	tests: allow list breaching min resident size in statvfs test (#8358 ) ## Problem This test would sometimes violate the min resident size during disk eviction and fail due to the generate warning log. Disk usage candidate collection only takes into account active tenants. However, the statvfs call takes into account the entire tenants directory, which includes tenants which haven't become active yet. After re-starting the pageserver, disk usage eviction may kick in before both tenants have become active. Hence, the logic will try to satisfy thedisk usage requirements by evicting everything belonging to the active tenant, and hence violating the tenant minimum resident size. ## Summary of changes Allow the warning	2024-07-12 17:31:17 +01:00
John Spray	0645ae318e	pageserver: circuit breaker on compaction (#8359 ) ## Problem We already back off on compaction retries, but the impact of a failing compaction can be so great that backing off up to 300s isn't enough. The impact is consuming a lot of I/O+CPU in the case of image layer generation for large tenants, and potentially also leaking disk space. Compaction failures are extremely rare and almost always indicate a bug, frequently a bug that will not let compaction to proceed until it is fixed. Related: https://github.com/neondatabase/neon/issues/6738 ## Summary of changes - Introduce a CircuitBreaker type - Add a circuit breaker for compaction, with a policy that after 5 failures, compaction will not be attempted again for 24 hours. - Add metrics that we can alert on: any >0 value for `pageserver_circuit_breaker_broken_total` should generate an alert. - Add a test that checks this works as intended. Couple notes to reviewers: - Circuit breakers are intrinsically a defense-in-depth measure: this is not the solution to any underlying issues, it is just a general mitigation for "unknown unknowns" that might be encountered in future. - This PR isn't primarily about writing a perfect CircuitBreaker type: the one in this PR is meant to be just enough to mitigate issues in compaction, and make it easy to monitor/alert on these failures. We can refine this type in future as/when we want to use it elsewhere.	2024-07-12 12:04:02 +01:00
Sasha Krassovsky	82b9a44ab4	Grant execute on snapshot functions to neon_superuser (#8346 ) ## Problem I need `neon_superuser` to be allowed to create snapshots for replication tests ## Summary of changes Adds a migration that grants these functions to neon_superuser	2024-07-11 20:29:35 +00:00
Joonas Koivunen	4a87bac036	test: limit `test_layer_download_timeouted` to MOCK_S3 (#8331 ) Requests against REAL_S3 on CI can consistently take longer than 1s; testing the short timeouts against it made no sense in hindsight, as MOCK_S3 works just as well. evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8229/9857994025/index.html#suites/b97efae3a617afb71cb8142f5afa5224/6828a50921660a32	2024-07-11 15:03:35 -04:00
Arseny Sher	cd29156927	Fix memory context of NeonWALReader allocation. Allocating it in short living context is wrong because it is reused during backend lifetime.	2024-07-11 20:31:15 +03:00
John Spray	814c8e8f68	storage controller: add node deletion API (#8226 ) ## Problem In anticipation of later adding a really nice drain+delete API, I initially only added an intentionally basic `/drop` API that is just about usable for deleting nodes in a pinch, but requires some ugly storage controller restarts to persuade it to restart secondaries. ## Summary of changes I started making a few tiny fixes, and ended up writing the delete API... - Quality of life nit: ordering of node + tenant listings in storcon_cli - Papercut: Fix the attach_hook using the wrong operation type for reporting slow locks - Make Service::spawn tolerate `generation_pageserver` columns that point to nonexistent node IDs. I started out thinking of this as a general resilience thing, but when implementing the delete API I realized it was actually a legitimate end state after the delete API is called (as that API doesn't wait for all reconciles to succeed). - Add a `DELETE` API for nodes, which does not gracefully drain, but does reschedule everything. This becomes safe to use when the system is in any state, but will incur availability gaps for any tenants that weren't already live-migrated away. If tenants have already been drained, this becomes a totally clean + safe way to decom a node. - Add a test and a storcon_cli wrapper for it This is meant to be a robust initial API that lets us remove nodes without doing ugly things like restarting the storage controller -- it's not quite a totally graceful node-draining routine yet. There's more work in https://github.com/neondatabase/neon/issues/8333 to get to our end-end state.	2024-07-11 17:05:47 +01:00
John Spray	0159ae9536	safekeeper: eviction metrics (#8348 ) ## Problem Follow up to https://github.com/neondatabase/neon/pull/8335, to improve observability of how many evict/restores we are doing. ## Summary of changes - Add `safekeeper_eviction_events_started_total` and `safekeeper_eviction_events_completed_total`, with a "kind" label of evict or restore. This gives us rates, and also ability to calculate how many are in progress. - Generalize SafekeeperMetrics test type to use the same helpers as pageserver, and enable querying any metric. - Read the new metrics at the end of the eviction test.	2024-07-11 17:05:35 +01:00
Christian Schwarz	e26ef640c1	pageserver: remove `trace_read_requests` (#8338 ) `trace_read_requests` is a per `Tenant`-object option. But the `handle_pagerequests` loop doesn't know which `Tenant` object (i.e., which shard) the request is for. The remaining use of the `Tenant` object is to check `tenant.cancel`. That check is incorrect [if the pageserver hosts multiple shards](https://github.com/neondatabase/neon/issues/7427#issuecomment-2220577518). I'll fix that in a future PR where I completely eliminate the holding of `Tenant/Timeline` objects across requests. See [my code RFC](https://github.com/neondatabase/neon/pull/8286) for the high level idea. Note that we can always bring the tracing functionality if we need it. But since it's actually about logging the `page_service` wire bytes, it should be a `page_service`-level config option, not per-Tenant. And for enabling tracing on a single connection, we can implement a `set pageserver_trace_connection;` option.	2024-07-11 15:17:07 +02:00
John Spray	24f8133e89	safekeeper: add eviction_min_resident to stop evictions thrashing (#8335 ) ## Problem - The condition for eviction is not time-based: it is possible for a timeline to be restored in response to a client, that client times out, and then as soon as the timeline is restored it is immediately evicted again. - There is no delay on eviction at startup of the safekeeper, so when it starts up and sees many idle timelines, it does many evictions which will likely be immediately restored when someone uses the timeline. ## Summary of changes - Add `eviction_min_resident` parameter, and use it in `ready_for_eviction` to avoid evictions if the timeline has been resident for less than this period. - This also implicitly delays evictions at startup for `eviction_min_resident` - Set this to a very low number for the existing eviction test, which expects immediate eviction. The default period is 15 minutes. The general reasoning for that is that in the worst case where we thrash ~10k timelines on one safekeeper, downloading 16MB for each one, we should set a period that would not overwhelm the node's bandwidth.	2024-07-10 19:38:14 +01:00
John Spray	e89ec55ea5	tests: stabilize test_sharding_split_compaction (#8318 ) ## Problem This test incorrectly assumed that a post-split compaction would only drop content. This was easily destabilized by any changes to image generation rules. ## Summary of changes - Before split, do a full image layer generation pass, to guarantee that post-split compaction should only drop data, never create it. - Fix the force_image_layer_creation mode of compaction that we use from tests like this: previously it would try and generate image layers even if one already existed with the same layer key, which caused compaction to fail.	2024-07-10 14:14:10 +01:00
Christian Schwarz	1a49f1c15c	pageserver: move `page_service`'s `import basebackup` / `import wal` to mgmt API (#8292 ) I want to fix bugs in `page_service` ([issue](https://github.com/neondatabase/neon/issues/7427)) and the `import basebackup` / `import wal` stand in the way / make the refactoring more complicated. We don't use these methods anyway in practice, but, there have been some objections to removing the functionality completely. So, this PR preserves the existing functionality but moves it into the HTTP management API. Note that I don't try to fix existing bugs in the code, specifically not fixing * it only ever worked correctly for unsharded tenants * it doesn't clean up on error All errors are mapped to `ApiError::InternalServerError`.	2024-07-09 23:17:42 +02:00
Yuchen Liang	d9c1068cf4	tests: increase approx size equal threshold to avoid `test_lsn_lease_size` flakiness (#8282 ) ## Summary of changes Increase the `assert_size_approx_equal` threshold to avoid flakiness of `test_lsn_lease_size`. Still needs more investigation to fully resolve #8293. - Also set `autovacuum=off` for the endpoint we are running in the test. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-08 21:50:13 +01:00
John Spray	811eb88b89	tests: stabilize test_timeline_size_quota_on_startup (#8255 ) ## Problem `test_timeline_size_quota_on_startup` assumed that writing data beyond the size limit would always be blocked. This is not so: the limit is only enforced if feedback makes it back from the pageserver to the safekeeper + compute. Closes: https://github.com/neondatabase/neon/issues/6562 ## Summary of changes - Modify the test to wait for the pageserver to catch up. The size limit was never actually being enforced robustly, the original version of this test was just writing much more than 30MB and about 98% of the time getting lucky such that the feedback happened to arrive before the tests for loop was done. - If the test fails, log the logical size as seen by the pageserver.	2024-07-08 20:06:34 +00:00
John Spray	daea26a22f	tests: use smaller layers in test_pg_regress (#8232 ) ## Problem Debug-mode runs of test_pg_regress are rather slow since https://github.com/neondatabase/neon/pull/8105, and occasionally exceed their 600s timeout. ## Summary of changes - Use 8MiB layer files, avoiding large ephemeral layers On a hetzner AX102, this takes the runtime from 230s to 190s. Which hopefully will be enough to get the runtime on github runners more reliably below its 600s timeout. This has the side benefit of exercising more of the pageserver stack (including compaction) under a workload that exercises a more diverse set of postgres functionality than most of our tests.	2024-07-08 19:05:35 +00:00
Tristan Partin	2a3410d1c3	Hide import behind TYPE_CHECKING No need to import it if we aren't type checking anything.	2024-07-08 07:30:22 -07:00
Konstantin Knizhnik	13522fb722	Increase timeout for wating subscriber caught-up (#8118 ) ## Problem test_subscriber_restart has quit large failure rate' https://neonprod.grafana.net/d/fddp4rvg7k2dcf/regression-test-failures?orgId=1&var-test_name=test_subscriber_restart&var-max_count=100&var-restrict=false I can be caused by too small timeout (5 seconds) to wait until changes are propagated. Related to #8097 ## Summary of changes Increase timeout to 30 seconds. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-05 20:39:10 +03:00
Alexander Bayandin	c9fd8d7693	SELECT 💣(); (#8270 ) ## Problem We want to be able to test how our infrastructure reacts on segfaults in Postgres (for example, we collect cores, and get some required logs/metrics, etc) ## Summary of changes - Add `trigger_segfauls` function to `neon_test_utils` to trigger a segfault in Postgres - Add `trigger_panic` function to `neon_test_utils` to trigger SIGABRT (by using `elog(PANIC, ...)) - Fix cleanup logic in regression tests in endpoint crashed	2024-07-05 15:12:01 +01:00
John Spray	5aae80640b	tests: make location_conf_churn more robust (#8271 ) ## Problem This test directly manages locations on pageservers and configuration of an endpoint. However, it did not switch off the parts of the storage controller that attempt to do the same: occasionally, the test would fail in a strange way such as a compute failing to accept a reconfiguration request. ## Summary of changes - Wire up the storage controller's compute notification hook to a no-op handler - Configure the tenant's scheduling policy to Stop.	2024-07-05 10:34:16 +01:00
John Spray	e25ac31fc9	tests: extend allow list in deletion test (#8268 ) ## Problem `1ea5d8b132` tolerated this as an error message, but it can show up in logs as well. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8201/9780147712/index.html#testresult/263422f5f5f292ea/retries ## Summary of changes - Tolerate "failed to delete 1 objects" in pageserver logs, this occurs occasionally when injected failures exhaust deletion's retries.	2024-07-05 10:09:15 +01:00
Konstantin Knizhnik	88b13d4552	implement rolling hyper-log-log algorithm (#8068 ) ## Problem See #7466 ## Summary of changes Implement algorithm descried in https://hal.science/hal-00465313/document Now new GUC is added: `neon.wss_max_duration` which specifies size of sliding window (in seconds). Default value is 1 hour. It is possible to request estimation of working set sizes (within this window using new function `approximate_working_set_size_seconds`. Old function `approximate_working_set_size` is preserved for backward compatibility. But its scope is also limited by `neon.wss_max_duration`. Version of Neon extension is changed to 1.4 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Matthias van de Meent <matthias@neon.tech>	2024-07-04 22:03:58 +03:00
Yuchen Liang	19accfee4e	feat(pageserver): integrate lsn lease into synthetic size (#8220 ) Part of #7497, closes #8071. (accidentally closed #8208, reopened here) ## Problem After the changes in #8084, we need synthetic size to also account for leased LSNs so that users do not get free retention by running a small ephemeral endpoint for a long time. ## Summary of changes This PR integrates LSN leases into the synthetic size calculation. We model leases as read-only branches started at the leased LSN (except it does not have a timeline id). Other changes: - Add new unit tests testing whether a lease behaves like a read-only branch. - Change `/size_debug` response to include lease point in the SVG visualization. - Fix `/lsn_lease` HTTP API to do proper parsing for POST. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-04 15:09:05 +00:00

1 2 3 4 5 ...

829 Commits