rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 13:10:37 +00:00

Author	SHA1	Message	Date
Arseny Sher	a4eea5025c	Fix logical apply worker reporting of flush_lsn wrt sync replication. It should take syncrep flush_lsn into account because WAL before it on endpoint restart is lost, which makes replication miss some data if slot had already been advanced too far. This commit adds test reproducing the issue and bumps vendor/postgres to commit with the actual fix.	2024-08-12 13:14:02 +03:00
Arpad Müller	507f1a5bdd	Also pass HOME env var in access_env_vars (#8685 ) Noticed this while debugging a test failure in #8673 which only occurs with real S3 instead of mock S3: if you authenticate to S3 via `AWS_PROFILE`, then it requires the `HOME` env var to be set so that it can read inside the `~/.aws` directory. The scrubber abstraction `StorageScrubber::scrubber_cli` in `neon_fixtures.py` would otherwise not work. My earlier PR #6556 has done similar things for the `neon_local` wrapper. You can try: ``` aws sso login --profile dev export ENABLE_REAL_S3_REMOTE_STORAGE=y REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests REMOTE_STORAGE_S3_REGION=eu-central-1 AWS_PROFILE=dev RUST_BACKTRACE=1 BUILD_TYPE=debug DEFAULT_PG_VERSION=16 ./scripts/pytest -vv --tb=short -k test_scrubber_tenant_snapshot ``` before and after this patch: this patch fixes it.	2024-08-10 12:04:47 +00:00
Vlad Lazar	f5cef7bf7f	storcon: skip draining shard if it's secondary is lagging too much (#8644 ) ## Problem Migrations of tenant shards with cold secondaries are holding up drains in during production deployments. ## Summary of changes If a secondary locations is lagging by more than 256MiB (configurable, but that's the default), then skip cutting it over to the secondary as part of the node drain.	2024-08-09 15:45:07 +01:00
Alex Chi Z.	a155914c1c	fix(neon): disable create tablespace stmt (#8657 ) part of https://github.com/neondatabase/neon/issues/8653 Disable create tablespace stmt. It turns out it requires much less effort to do the regress test mode flag than patching the test cases, and given that we might need to support tablespaces in the future, I decided to add a new flag `regress_test_mode` to change the behavior of create tablespace. Tested manually that without setting regress_test_mode, create tablespace will be rejected. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-09 09:18:55 +01:00
Conrad Ludgate	7e08fbd1b9	Revert "proxy: update tokio-postgres to allow arbitrary config params (#8076 )" (#8654 ) This reverts #8076 - which was already reverted from the release branch since forever (it would have been a breaking change to release for all users who currently set TimeZone options). It's causing conflicts now so we should revert it here as well.	2024-08-09 09:09:29 +01:00
John Spray	4431688dc6	tests: don't require kafka client for regular tests (#8662 ) ## Problem We're adding more third party dependencies to support more diverse + realistic test cases in `test_runner/logical_repl`. I ❤️ these tests, they are a good thing. The slight glitch is that python packaging is hard, and some third party python packages have issues. For example the current kafka dependency doesn't work on latest python. We can mitigate that by only importing these more specialized dependencies in the tests that use them. ## Summary of changes - Move the `kafka` import into a test body, so that folks running the regular `test_runner/regress` tests don't have to have a working kafka client package.	2024-08-08 19:24:21 +01:00
Joonas Koivunen	8561b2c628	fix: stop leaking BackgroundPurges (#8650 ) avoid "leaking" the completions of BackgroundPurges by: 1. switching it to TaskTracker for provided close+wait 2. stop using tokio::fs::remove_dir_all which will consume two units of memory instead of one blocking task Additionally, use more graceful shutdown in tests which do actually some background cleanup.	2024-08-08 12:02:53 +01:00
Konstantin Knizhnik	cbe8c77997	Use sycnhronous commit for logical replicaiton worker (#8645 ) ## Problem See https://neondb.slack.com/archives/C03QLRH7PPD/p1723038557449239?thread_ts=1722868375.476789&cid=C03QLRH7PPD Logical replication subscription by default use `synchronous_commit=off` which cause problems with safekeeper ## Summary of changes Set `synchronous_commit=on` for logical replication subscription in test_subscriber_restart.py ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-08 10:23:57 +03:00
Joonas Koivunen	05dd1ae9e0	fix: drain completed page_service connections (#8632 ) We've noticed increased memory usage with the latest release. Drain the joinset of `page_service` connection handlers to avoid leaking them until shutdown. An alternative would be to use a TaskTracker. TaskTracker was not discussed in original PR #8339 review, so not hot fixing it in here either.	2024-08-07 17:14:45 +00:00
Joonas Koivunen	a81fab4826	refactor(timeline_detach_ancestor): replace ordered reparented with a hashset (#8629 ) Earlier I was thinking we'd need a (ancestor_lsn, timeline_id) ordered list of reparented. Turns out we did not need it at all. Replace it with an unordered hashset. Additionally refactor the reparented direct children query out, it will later be used from more places. Split off from #8430. Cc: #6994	2024-08-07 18:19:00 +02:00
Yuchen Liang	ed5724d79d	scrubber: clean up `scan_metadata` before prod (#8565 ) Part of #8128. ## Problem Currently, scrubber `scan_metadata` command will return with an error code if the metadata on remote storage is corrupted with fatal errors. To safely deploy this command in a cronjob, we want to differentiate between failures while running scrubber command and the erroneous metadata. At the same time, we also want our regression tests to catch corrupted metadata using the scrubber command. ## Summary of changes - Return with error code only when the scrubber command fails - Uses explicit checks on errors and warnings to determine metadata health in regression tests. Resolve conflict with `tenant-snapshot` command (after shard split): [`test_scrubber_tenant_snapshot`](https://github.com/neondatabase/neon/blob/yuchen/scrubber-scan-cleanup-before-prod/test_runner/regress/test_storage_scrubber.py#L23) failed before applying `422a8443dd` - When taking a snapshot, the old `index_part.json` in the unsharded tenant directory is not kept. - The current `list_timeline_blobs` implementation consider no `index_part.json` as a parse error. - During the scan, we are only analyzing shards with highest shard count, so we will not get a parse error. but we do need to add the layers to tenant object listing, otherwise we will get index is referencing a layer that is not in remote storage error. - Action: Add s3_layers from `list_timeline_blobs` regardless of parsing error Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-06 18:55:42 +01:00
John Spray	3727c6fbbe	pageserver: use layer visibility when composing heatmap (#8616 ) ## Problem Sometimes, a layer is Covered by hasn't yet been evicted from local disk (e.g. shortly after image layer generation). It is not good use of resources to download these to a secondary location, as there's a good chance they will never be read. This follows the previous change that added layer visibility: - #8511 Part of epic: - https://github.com/neondatabase/neon/issues/8398 ## Summary of changes - When generating heatmaps, only include Visible layers - Update test_secondary_downloads to filter to visible layers when listing layers from an attached location	2024-08-06 17:15:40 +01:00
John Spray	b7beaa0fd7	tests: improve stability of `test_storage_controller_many_tenants` (#8607 ) ## Problem The controller scale test does random migrations. These mutate secondary locations, and therefore can cause secondary optimizations to happen in the background, violating the test's expectation that consistency_check will work as there are no reconciliations running. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10247161379/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/6316beacd3fb3060/ ## Summary of changes - Only migrate to existing secondary locations, not randomly picked nodes, so that we can do a fast reconcile_until_idle (otherwise reconcile_until_idle is takes a long time to create new secondary locations). - Do a reconcile_until_idle before consistency_check.	2024-08-06 12:58:33 +01:00
a-masterov	078f941dc8	Add a test using Debezium as a client for the logical replication (#8568 ) ## Problem We need to test the logical replication with some external consumers. ## Summary of changes A test of the logical replication with Debezium as a consumer was added. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-08-06 13:08:55 +02:00
Joonas Koivunen	138f008bab	feat: persistent gc blocking (#8600 ) Currently, we do not have facilities to persistently block GC on a tenant for whatever reason. We could do a tenant configuration update, but that is risky for generation numbers and would also be transient. Introduce a `gc_block` facility in the tenant, which manages per timeline blocking reasons. Additionally, add HTTP endpoints for enabling/disabling manual gc blocking for a specific timeline. For debugging, individual tenant status now includes a similar string representation logged when GC is skipped. Cc: #6994	2024-08-06 10:09:56 +01:00
Alexander Bayandin	e6e578821b	CI(benchmarking): set pub/sub projects for LR tests (#8483 ) ## Problem > Currently, long-running LR tests recreate endpoints every night. We'd like to have along-running buildup of history to exercise the pageserver in this case (instead of "unit-testing" the same behavior everynight). Closes #8317 ## Summary of changes - Update Postgres version for replication tests - Set `BENCHMARK_PROJECT_ID_PUB`/`BENCHMARK_PROJECT_ID_SUB` env vars to projects that were created for this purpose --------- Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>	2024-08-05 22:06:47 +00:00
Joonas Koivunen	c32807ac19	fix: allow awaiting logical size for root timelines (#8604 ) Currently if `GET /v1/tenant/x/timeline/y?force-await-initial-logical-size=true` is requested for a root timeline created within the current pageserver session, the request handler panics hitting the debug assertion. These timelines will always have an accurate (at initdb import) calculated logical size. Fix is to never attempt prioritizing timeline size calculation if we already have an exact value. Split off from #8528.	2024-08-05 21:21:33 +01:00
John Spray	0a667bc8ef	tests: add test_historic_storage_formats (#8423 ) ## Problem Currently, our backward compatibility tests only look one release back. That means, for example, that when we switch on image layer compression by default, we'll test reading of uncompressed layers for one release, and then stop doing it. When we make an index_part.json format change, we'll test against the old format for a week, then stop (unless we write separate unit tests for each old format). The reality in the field is that data in old formats will continue to exist for weeks/months/years. When we make major format changes, we should retain examples of the old format data, and continuously verify that the latest code can still read them. This test uses contents from a new path in the public S3 bucket, `compatibility-data-snapshots/`. It is populated by hand. The first important artifact is one from before we switch on compression, so that we will keep testing reads of uncompressed data. We will generate more artifacts ahead of other key changes, like when we update remote storage format for archival timelines. Closes: https://github.com/neondatabase/cloud/issues/15576	2024-08-02 18:28:23 +01:00
Arpad Müller	8c828c586e	Wait for completion of the upload queue in flush_frozen_layer (#8550 ) Makes `flush_frozen_layer` add a barrier to the upload queue and makes it wait for that barrier to be reached until it lets the flushing be completed. This gives us backpressure and ensures that writes can't build up in an unbounded fashion. Fixes #7317	2024-08-02 13:07:12 +02:00
John Spray	c53799044d	pageserver: refine how we delete timelines after shard split (#8436 ) ## Problem Previously, when we do a timeline deletion, shards will delete layers that belong to an ancestor. That is not a correctness issue, because when we delete a timeline, we're always deleting it from all shards, and destroying data for that timeline is clearly fine. However, there exists a race where one shard might start doing this deletion while another shard has not yet received the deletion request, and might try to access an ancestral layer. This creates ambiguity over the "all layers referenced by my index should always exist" invariant, which is important to detecting and reporting corruption. Now that we have a GC mode for clearing up ancestral layers, we can rely on that to clean up such layers, and avoid deleting them right away. This makes things easier to reason about: there are now no cases where a shard will delete a layer that belongs to a ShardIndex other than itself. ## Summary of changes - Modify behavior of RemoteTimelineClient::delete_all - Add `test_scrubber_physical_gc_timeline_deletion` to exercise this case - Tweak AWS SDK config in the scrubber to enable retries. Motivated by seeing the test for this feature encounter some transient "service error" S3 errors (which are probably nothing to do with the changes in this PR)	2024-08-02 08:00:46 +01:00
Alexander Bayandin	e7477855b7	test_runner: don't create artifacts if Allure is not enabled (#8580 ) ## Problem `allure_attach_from_dir` method might create `tar.zst` archives even if `--alluredir` is not set (i.e. Allure results collection is disabled) ## Summary of changes - Don't run `allure_attach_from_dir` if `--alluredir` is not set	2024-08-01 15:55:43 +00:00
John Spray	1678dea20f	pageserver: add layer visibility calculation (#8511 ) ## Problem We recently added a "visibility" state to layers, but nothing initializes it. Part of: - #8398 ## Summary of changes - Add a dependency on `range-set-blaze`, which is used as a fast incrementally updated alternative to KeySpace. We could also use this to replace the internals of KeySpaceRandomAccum if we wanted to. Writing a type that does this kind of "BtreeMap & merge overlapping entries" thing isn't super complicated, but no reason to write this ourselves when there's a third party impl available. - Add a function to layermap to calculate visibilities for each layer - Add a function to Timeline to call into layermap and then apply these visibilities to the Layer objects. - Invoke the calculation during startup, after image layer creations, and when removing branches. Branch removal and image layer creation are the two ways that a layer can go from Visible to Covered. - Add unit test & benchmark for the visibility calculation - Expose `pageserver_visible_physical_size` metric, which should always be <= `pageserver_remote_physical_size`. - This metric will feed into the /v1/utilization endpoint later: the visible size indicates how much space we would like to use on this pageserver for this tenant. - When `pageserver_visible_physical_size` is greater than `pageserver_resident_physical_size`, this is a sign that the tenant has long-idle branches, which result in layers that are visible in principle, but not used in practice. This does not keep visibility hints up to date in all cases: particularly, when creating a child timeline, any previously covered layers will not get marked Visible until they are accessed. Updates after image layer creation could be implemented as more of a special case, but this would require more new code: the existing depth calculation code doesn't maintain+yield the list of deltas that would be covered by an image layer. ## Performance This operation is done rarely (at startup and at timeline deletion), so needs to be efficient but not ultra-fast. There is a new `visibility` bench that measures runtime for a synthetic 100k layers case (`sequential`) and a real layer map (`real_map`) with ~26k layers. The benchmark shows runtimes of single digit milliseconds (on a ryzen 7950). This confirms that the runtime shouldn't be a problem at startup (as we already incur S3-level latencies there), but that it's slow enough that we definitely shouldn't call it more often than necessary, and it may be worthwhile to optimize further later (things like: when removing a branch, only bother scanning layers below the branchpoint) ``` visibility/sequential time: [4.5087 ms 4.5894 ms 4.6775 ms] change: [+2.0826% +3.9097% +5.8995%] (p = 0.00 < 0.05) Performance has regressed. Found 24 outliers among 100 measurements (24.00%) 2 (2.00%) high mild 22 (22.00%) high severe min: 0/1696070, max: 93/1C0887F0 visibility/real_map time: [7.0796 ms 7.0832 ms 7.0871 ms] change: [+0.3900% +0.4505% +0.5164%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe min: 0/1696070, max: 93/1C0887F0 visibility/real_map_many_branches time: [4.5285 ms 4.5355 ms 4.5434 ms] change: [-1.0012% -0.8004% -0.5969%] (p = 0.00 < 0.05) Change within noise threshold. ```	2024-08-01 09:25:35 +00:00
Alex Chi Z.	d6c79b77df	test(pageserver): add test_gc_feedback_with_snapshots (#8474 ) should be working after https://github.com/neondatabase/neon/pull/8328 gets merged. Part of https://github.com/neondatabase/neon/issues/8002 adds a new perf benchmark case that ensures garbages can be collected with branches --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-31 17:55:19 -04:00
Yuchen Liang	85bef9f05d	feat(scrubber): post `scan_metadata` results to storage controller (#8502 ) Part of #8128, followup to #8480. closes #8421. Enable scrubber to optionally post metadata scan health results to storage controller. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-30 16:07:34 +01:00
Yuchen Liang	e374d6778e	feat(storcon): store scrubber metadata scan result (#8480 ) Part of #8128, followed by #8502. ## Problem Currently we lack mechanism to alert unhealthy `scan_metadata` status if we start running this scrubber command as part of a cronjob. With the storage controller client introduced to storage scrubber in #8196, it is viable to set up alert by storing health status in the storage controller database. We intentionally do not store the full output to the database as the json blobs potentially makes the table really huge. Instead, only a health status and a timestamp recording the last time metadata health status is posted on a tenant shard. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-30 14:32:00 +01:00
Joonas Koivunen	bdfc9ca7e9	test: deflake test_duplicate_creation (#8536 ) By including comparison of `remote_consistent_lsn_visible` we risk flakyness coming from outside of timeline creation. Mask out the `remote_consistent_lsn_visible` for the comparison. Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8489/10142336315/index.html#suites/ffbb7f9930a77115316b58ff32b7c719/89ff0270bf58577a	2024-07-29 13:41:06 +01:00
a-masterov	1d8cf5b3a9	Add a test for clickhouse as a logical replication consumer (#8408 ) ## Problem We need to test logical replication with 3rd-party tools regularly. ## Summary of changes Added a test using ClickHouse as a client Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-07-29 14:35:12 +02:00
Alexander Bayandin	da6bdff893	test_runner: fix user_property usage in benchmarks (#8531 ) ## Problem After https://github.com/neondatabase/neon/pull/7990 `regress_test` job started to fail with an error: ``` ... File "/__w/neon/neon/test_runner/fixtures/benchmark_fixture.py", line 485, in pytest_terminal_summary terminalreporter.write(f"{test_report.head_line}.{recorded_property['name']}: ") TypeError: 'bool' object is not subscriptable ``` https://github.com/neondatabase/neon/actions/runs/10125750938/job/28002582582 It happens because the current implementation doesn't expect pytest's `user_properties` can be used for anything else but benchmarks (and https://github.com/neondatabase/neon/pull/7990 started to use it for tracking `preserve_database_files` parameter) ## Summary of changes - Make NeonBenchmarker use only records with`neon_benchmarker_` prefix	2024-07-29 11:00:33 +01:00
Alexander Bayandin	6cad0455b0	CI(test_runner): Upload all test artifacts if preserve_database_files is enabled (#7990 ) ## Problem There's a `NeonEnvBuilder#preserve_database_files` parameter that allows you to keep database files for debugging purposes (by default, files get cleaned up), but there's no way to get these files from a CI run. This PR adds handling of `NeonEnvBuilder#preserve_database_files` and adds the compressed test output directory to Allure reports (for tests with this parameter enabled). Ref https://github.com/neondatabase/neon/issues/6967 ## Summary of changes - Compress and add the whole test output directory to Allure reports - Currently works only with `neon_env_builder` fixture - Remove `preserve_database_files = True` from sharding tests as unneeded --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-27 20:01:10 +01:00
Christian Schwarz	8154e88732	refactor(layer load API): all errors are permanent (#8527 ) I am not aware of a case of "transient" VirtualFile errors as mentioned in https://github.com/neondatabase/neon/pull/5880 Private DM with Joonas discussing this: https://neondb.slack.com/archives/D049K7HJ9JM/p1721836424615799	2024-07-26 15:48:44 +01:00
Vlad Lazar	7a796a9963	storcon: introduce step down primitive (#8512 ) ## Problem We are missing the step-down primitive required to implement rolling restarts of the storage controller. ## Summary of changes Add `/control/v1/step_down` endpoint which puts the storage controller into a state where it rejects all API requests apart from `/control/v1/step_down`, `/status` and `/metrics`. When receiving the request, storage controller cancels all pending reconciles and waits for them to exit gracefully. The response contains a snapshot of the in-memory observed state. Related: * https://github.com/neondatabase/cloud/issues/14701 * https://github.com/neondatabase/neon/issues/7797 * https://github.com/neondatabase/neon/pull/8310	2024-07-26 14:54:09 +01:00
Vlad Lazar	cdaa2816e7	pageserver: make vectored get the default read path for the pageserver (#8384 ) ## Problem Vectored get is already enabled in all prod regions without validation. The pageserver defaults are out of sync however. ## Summary of changes Update the pageserver defaults to match the prod config. Also means that when running tests locally, people don't have to use the env vars to get the prod config.	2024-07-26 14:19:52 +01:00
John Spray	65868258d2	tests: checkpoint instead of compact in test_sharding_split_compaction (#8473 ) ## Problem This test relies on writing image layers before the split. It can fail to do so durably if the image layers are written ahead of the remote consistent LSN, so we should have been doing a checkpoint rather than just a compaction	2024-07-26 11:03:44 +01:00
John Spray	775c0c8892	tests: adjust threshold in test_partial_evict_tenant (#8509 ) ## Problem This test was destabilized by https://github.com/neondatabase/neon/pull/8431. The threshold is arbitrary & failures are still quite close to it. At a high level the test is asserting "eviction was approximately fair to these tenants", which appears to still be the case when the abs diff between ratios is slightly higher at ~0.6-0.7. ## Summary of changes - Change threshold from 0.06 to 0.065. Based on the last ~10 failures that should be sufficient.	2024-07-25 15:00:42 +01:00
John Spray	24ea9f9f60	tests: always scrub on test exit when using S3Storage (#8437 ) ## Problem Currently, tests may have a scrub during teardown if they ask for it, but most tests don't request it. To detect "unknown unknowns", let's run it at the end of every test where possible. This is similar to asserting that there are no errors in the log at the end of tests. ## Summary of changes - Remove explicit `enable_scrub_on_exit` - Always scrub if remote storage is an S3Storage.	2024-07-25 14:19:38 +01:00
Vlad Lazar	9c5ad21341	storcon: make heartbeats restart aware (#8222 ) ## Problem Re-attach blocks the pageserver http server from starting up. Hence, it can't reply to heartbeats until that's done. This makes the storage controller mark the node off-line (not good). We worked around this by setting the interval after which nodes are marked offline to 5 minutes. This isn't a long term solution. ## Summary of changes * Introduce a new `NodeAvailability` state: `WarmingUp`. This state models the following time interval: * From receiving the re-attach request until the pageserver replies to the first heartbeat post re-attach * The heartbeat delta generator becomes aware of this state and uses a separate longer interval * Flag `max-warming-up-interval` now models the longer timeout and `max-offline-interval` the shorter one to match the names of the states Closes https://github.com/neondatabase/neon/issues/7552	2024-07-25 14:09:12 +01:00
Christian Schwarz	d57412aaab	followup(#8359 ): pre-initialize circuitbreaker metrics (#8491 )	2024-07-25 10:24:28 +02:00
John Spray	842c3d8c10	tests: simplify code around unstable `test_basebackup_with_high_slru_count` (#8477 ) ## Problem In `test_basebackup_with_high_slru_count`, the pageserver is sometimes mysteriously hanging on startup, having been started+stopped earlier in the test setup while populating template tenant data. - #7586 We can't see why this is hanging in this particular test. The test does some weird stuff though, like attaching a load of broken tenants and then doing a SIGQUIT kill of a pageserver. ## Summary of changes - Attach tenants normally instead of doing a failpoint dance to attach them as broken - Shut the pageserver down gracefully during init instead of using immediate mode - Remove the "sequential" variant of the unstable test, as this is going away soon anyway - Log before trying to acquire lock file, so that if it hangs we have a clearer sense of if that's really where it's hanging. It seems like it is, but that code does a non-blocking flock so it's surprising.	2024-07-24 11:26:24 +01:00
John Spray	f5db655447	pageserver: simplify LayerAccessStats (#8431 ) ## Problem LayerAccessStats contains a lot of detail that we don't use: short histories of most recent accesses, specifics on what kind of task accessed a layer, etc. This is all stored inside a Mutex, which is locked every time something accesses a layer. ## Summary of changes - Store timestamps at a very low resolution (to the nearest second), sufficient for use on the timescales of eviction. - Pack access time and last residence change time into a single u64 - Use the high bits of the u64 for other flags, including the new layer visibility concept. - Simplify the external-facing model for access stats to just include what we now track. Note that the `HistoryBufferWithDropCounter` is removed here because it is no longer used. I do not dislike this type, we just happen not to use it for anything else at present. Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-24 08:17:28 +01:00
John Spray	9e23410074	tests: allow-list a controller heartbeat error (#8471 ) ## Problem `test_change_pageserver` stops pageservers in a way that can overlap with the controller's heartbeats: the controller can get a heartbeat success and then immediately find the node unavailable. This particular situation triggers a log that isn't in our current allow-list of messages for nodes offline Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8339/10048487700/index.html#testresult/19678f27810231df/retries ## Summary of changes - Add the message to the allow list	2024-07-23 16:09:05 -04:00
Konstantin Knizhnik	563d73d923	Use smgrexists() instead of access() to enforce uniqueness of generated relfilenumber (#7992 ) ## Problem Postgres is using `access()` function in `GetNewRelFileNumber` to check if assigned relfilenumber is not used for any other relation. This check will not work in Neon, because we do not have all files in local storage. ## Summary of changes Use smgrexists() instead which will check at page server if such relfilenode is used. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-23 18:41:55 +03:00
dependabot[bot]	129f348aae	build(deps): bump openssl from 0.10.64 to 0.10.66 in /test_runner/pg_clients/rust/tokio-postgres (#8464 )	2024-07-23 14:05:07 +00:00
John Spray	80c8ceacbc	tests: make `test_scrubber_physical_gc_ancestors` more stable (#8453 ) ## Problem This test sometimes found that ancestors were getting cleaned up before it had done any compaction. Compaction was happening implicitly via Workload. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries ## Summary of changes - Set upload=False when writing data after shard split, to avoid doing a checkpoint - Add a checkpoint_period & explicit wait for uploads so that we ensure data lands in S3 without doing a checkpoint	2024-07-23 12:57:57 +01:00
Vlad Lazar	35854928d9	pageserver: use identity file as node id authority and remove init command and config-override flags (#7766 ) Ansible will soon write the node id to `identity.toml` in the work dir for new pageservers. On the pageserver side, we read the node id from the identity file if it is present and use that as the source of truth. If the identity file is missing, cannot be read, or does not deserialise, start-up is aborted. This PR also removes the `--init` mode and the `--config-override` flag from the `pageserver` binary. The neon_local is already not using these flags anymore. Ansible still uses them until the linked change is merged & deployed, so, this PR has to land simultaneously or after the Ansible change due to that. Related Ansible change: https://github.com/neondatabase/aws/pull/1322 Cplane change to remove config-override usages: https://github.com/neondatabase/cloud/pull/13417 Closes: https://github.com/neondatabase/neon/issues/7736 Overall plan: https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-23 11:41:12 +01:00
Konstantin Knizhnik	a868e342d4	Change default version of Neon extensio to 1.4	2024-07-22 17:58:07 +01:00
Yuchen Liang	595c450036	fix(scrubber): more robust metadata consistency checks (#8344 ) Part of #8128. ## Problem Scrubber uses `scan_metadata` command to flag metadata inconsistencies. To trust it at scale, we need to make sure the errors we emit is a reflection of real scenario. One check performed in the scrubber is to see whether layers listed in the latest `index_part.json` is present in object listing. Currently, the scrubber does not robustly handle the case where objects are uploaded/deleted during the scan. ## Summary of changes Condition for success: An object in the index is (1) in the object listing we acquire from S3 or (2) found in a HeadObject request (new object). - Add in the `HeadObject` requests for the layers missing from the object listing. - Keep the order of first getting the object listing and then downloading the layers. - Update check to only consider shards with highest shard count. - Skip analyzing a timeline if `deleted_at` tombstone is marked in `index_part.json`. - Add new test to see if scrubber actually detect the metadata inconsistency. _Misc_ - A timeline with no ancestor should always have some layers. - Removed experimental histograms _Caveat_ - Ancestor layer is not cleaned until #8308 is implemented. If ancestor layers reference non-existing layers in the index, the scrubber will emit false positives. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-22 14:53:33 +01:00
John Spray	8d948f2e07	tests: make test_change_pageserver more robust (#8442 ) ## Problem This test predates the storage controller. It stops pageservers and reconfigures computes, but that races with the storage controller's node failure detection, which can result in restarting nodes not getting the attachments they expect, and the test failing ## Summary of changes - Configure the storage controller to use a compute notify hook that does nothing, so that it cannot interfere with the test's configuration of computes. - Instead of using the attach hook, just notify the storage controller that nodes are offline, and reconcile tenants so that they will automatically be attached to the other node.	2024-07-22 14:17:02 +01:00
John Spray	98af1e365b	pageserver: remove absolute-order disk usage eviction (#8454 ) ## Problem Deployed pageserver configurations are all like this: ``` disk_usage_based_eviction: max_usage_pct: 85 min_avail_bytes: 0 period: "10s" eviction_order: type: "RelativeAccessed" args: highest_layer_count_loses_first: true ``` But we're maintaining this optional absolute order eviction, with test cases etc. ## Summary of changes - Remove absolute order eviction. Make the default eviction policy the same as how we really deploy pageservers.	2024-07-22 13:15:55 +01:00
John Spray	ebda667ef8	tests: more generous memory allowance in test_compaction_l0_memory (#8446 ) ## Problem This test is new, the limit was set experimentally and it turns out the memory consumption in CI runs varies more than expected. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10010912745/index.html#suites/9eebd1154fe19f9311ca7613f38156a1/82e40cf86a243ad5/	2024-07-22 11:50:30 +01:00
John Spray	a4fa250c92	tests: longer timeouts in test_timeline_deletion_with_files_stuck_in_upload_queue (#8438 ) ## Problem This test had two locations with 2 second timeouts, which is rather low when we run on a highly contended test machine running lots of tests in parallel. It usually passes, but today I've seen both of these locations time out on separate PRs. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8432/10007868041/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/6c6a092be083d27c ## Summary of changes - Change 2 second timeouts to 20 second timeouts	2024-07-19 19:30:28 +02:00

1 2 3 4 5 ...

1491 Commits