rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 05:00:37 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	34f450c05a	test: allow no vectored gets happening (#7939 ) when running the regress tests locally without any environment variables we use on CI, `test_pageserver_compaction_smoke` fails with division by zero. fix it temporarily by allowing no vectored read happening. to be cleaned when vectored get validation gets removed and the default value can be changed. Cc: https://github.com/neondatabase/neon/issues/7381	2024-06-03 09:37:11 -04:00
John Spray	7e60563910	pageserver: add GcError type (#7917 ) ## Problem - Because GC exposes all errors as an anyhow::Error, we have intermittent issues with spurious log errors during shutdown, e.g. in this failure of a performance test https://neon-github-public-dev.s3.amazonaws.com/reports/main/9300804302/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/214a2154f6f0217a/ ``` Gc failed 1 times, retrying in 2s: shutting down ``` GC really doesn't do a lot of complicated IO: it doesn't benefit from the backtrace capabilities of anyhow::Error, and can be expressed more robustly as an enum. ## Summary of changes - Add GcError type and use it instead of anyhow::Error in GC functions - In `gc_iteration_internal`, return GcError::Cancelled on shutdown rather than Ok(()) (we only used Ok before because we didn't have a clear cancellation error variant to use). - In `gc_iteration_internal`, skip past timelines that are shutting down, to avoid having to go through another GC iteration if we happen to see a deleting timeline during a GC run. - In `refresh_gc_info_internal`, avoid an error case where a timeline might not be found after being looked up, by carrying an Arc<Timeline> instead of a TimelineId between the first loop and second loop in the function. - In HTTP request handler, handle Cancelled variants as 503 instead of turning all GC errors into 500s.	2024-05-31 22:20:06 +01:00
John Spray	9fda85b486	pageserver: remove AncestorStopping error variants (#7916 ) ## Problem In all cases, AncestorStopping is equivalent to Cancelled. This became more obvious in https://github.com/neondatabase/neon/pull/7912#discussion_r1620582309 when updating these error types. ## Summary of changes - Remove AncestorStopping, always use Cancelled instead	2024-05-31 17:02:10 +01:00
Arthur Petukhovsky	16b2e74037	Add FullAccessTimeline guard in safekeepers (#7887 ) This is a preparation for https://github.com/neondatabase/neon/issues/6337. The idea is to add FullAccessTimeline, which will act as a guard for tasks requiring access to WAL files. Eviction will be blocked on these tasks and WAL won't be deleted from disk until there is at least one active FullAccessTimeline. To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`. After eviction is implemented, this function will be responsible for downloading missing WAL file and waiting until the download finishes. This commit also contains other small refactorings: - Separate `get_tenant_dir` and `get_timeline_dir` functions for building a local path. This is useful for looking at usages and finding tasks requiring access to local filesystem. - `timeline_manager` is now responsible for spawning all background tasks - WAL removal task is now spawned instantly after horizon is updated	2024-05-31 13:19:45 +00:00
Arseny Sher	1fcc2b37eb	Add test checking term change during pull_timeline.	2024-05-31 12:58:59 +03:00
Arseny Sher	af40bf3c2e	Fix term/epoch confusion in python tests. Call epoch last_log_term and add separate term field.	2024-05-31 12:58:59 +03:00
John Spray	98dadf8543	pageserver: quieten some shutdown logs around logical size and flush (#7907 ) ## Problem Looking at several noisy shutdown logs: - In https://github.com/neondatabase/neon/issues/7861 we're hitting a log error with `InternalServerError(timeline shutting down\n'` on the checkpoint API handler. - In the field, we see initial_logical_size_calculation errors on shutdown, via DownloadError - In the field, we see errors logged from layer download code (independent of the error propagated) during shutdown Closes: https://github.com/neondatabase/neon/issues/7861 ## Summary of changes The theme of these changes is to avoid propagating anyhow::Errors for cases that aren't really unexpected error cases that we might want a stacktrace for, and avoid "Other" error variants unless we really do have unexpected error cases to propagate. - On the flush_frozen_layers path, use the `FlushLayerError` type throughout, rather than munging it into an anyhow::Error. Give FlushLayerError an explicit from_anyhow helper that checks for timeline cancellation, and uses it to give a Cancelled error instead of an Other error when the timeline is shutting down. - In logical size calculation, remove BackgroundCalculationError (this type was just a Cancelled variant and an Other variant), and instead use CalculateLogicalSizeError throughout. This can express a PageReconstructError, and has a From impl that translates cancel-like page reconstruct errors to Cancelled. - Replace CalculateLogicalSizeError's Other(anyhow::Error) variant case with a Decode(DeserializeError) variant, as this was the only kind of error we actually used in the Other case. - During layer download, drop out early if the timeline is shutting down, so that we don't do an `error!()` log of the shutdown error in this case.	2024-05-31 09:18:58 +01:00
Konstantin Knizhnik	7ac11d3942	Do not produce error if gin page is not restored in redo (#7876 ) ## Problem See https://github.com/neondatabase/cloud/issues/10845 ## Summary of changes Do not report error if GIN page is not restored ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-29 22:18:09 +03:00
Peter Bendel	f9f69a2ee7	clarify how to load the dbpedia vector embeddings into a postgres database (#7894 ) ## Problem Improve the readme for the data load step in the pgvector performance test.	2024-05-28 17:21:09 +03:00
Peter Bendel	fabeff822f	Performance test for pgvector HNSW index build and queries (#7873 ) ## Problem We want to regularly verify the performance of pgvector HNSW parallel index builds and parallel similarity search using HNSW indexes. The first release that considerably improved the index-build parallelism was pgvector 0.7.0 and we want to make sure that we do not regress by our neon compute VM settings (swap, memory over commit, pg conf etc.) ## Summary of changes Prepare a Neon project with 1 million openAI vector embeddings (vector size 1536). Run HNSW indexing operations in the regression test for the various distance metrics. Run similarity queries using pgbench with 100 concurrent clients. I have also added the relevant metrics to the grafana dashboards pgbench and olape --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-05-28 11:05:33 +00:00
Arseny Sher	4a0ce9512b	Add safekeeper test truncating WAL. We do it as a part of more complicated tests like test_compute_restarts, but let's have a simple test as well.	2024-05-28 11:08:29 +03:00
Arseny Sher	b2d34a82b9	Make python Safekeeper datadir Path instead of str.	2024-05-25 06:06:32 +03:00
Arseny Sher	3797566c36	safekeeper: test pull_timeline with WAL gc. Do pull_timeline while WAL is being removed. To this end - extract pausable_failpoint to utils, sprinkle pull_timeline with it - add 'checkpoint' sk http endpoint to force WAL removal. After fixing checking for pull file status code test fails so far which is expected.	2024-05-25 06:06:32 +03:00
John Spray	3860bc9c6c	pageserver: post-shard-split layer rewrites (2/2) (#7531 ) ## Problem - After a shard split of a large existing tenant, child tenants can end up with oversized historic layers indefinitely, if those layers are prevented from being GC'd by branchpoints. This PR follows https://github.com/neondatabase/neon/pull/7531, and adds rewriting of layers that contain a mixture of needed & un-needed contents, in addition to dropping un-needed layers. Closes: https://github.com/neondatabase/neon/issues/7504 ## Summary of changes - Add methods to ImageLayer for reading back existing layers - Extend `compact_shard_ancestors` to rewrite layer files that contain a mixture of keys that we want and keys we do not, if unwanted keys are the majority of those in the file. - Amend initialization code to handle multiple layers with the same LayerName properly - Get rid of of renaming bad layer files to `.old` since that's now expected on restarts during rewrites.	2024-05-24 08:33:19 +00:00
MMeent	0e4f182680	Rework PageStream connection state handling: (#7611 ) * Make PS connection startup use async APIs This allows for improved query cancellation when we start connections * Make PS connections have per-shard connection retry state. Previously they shared global backoff state, which is bad for quickly getting all connections started and/or back online. * Make sure we clean up most connection state on failed connections. Previously, we could technically leak some resources that we'd otherwise clean up. Now, the resources are correctly cleaned up. * pagestore_smgr.c now PANICs on unexpected response message types. Unexpected responses are likely a symptom of having a desynchronized view of the connection state. As a desynchronized connection state can cause corruption, we PANIC, as we don't know what data may have been written to buffers: the only solution is to fail fast & hope we didn't write wrong data. * Catch errors in sync pagestream request handling. Previously, if a query was cancelled after a message was sent to the pageserver, but before the data was received, the backend could forget that it sent the synchronous request, and let others deal with the repercussions. This could then lead to incorrect responses, or errors such as "unexpected response from page server with tag 0x68"	2024-05-23 23:26:42 +02:00
Joonas Koivunen	49d7f9b5a4	test_import_from_pageserver_small: try to make less flaky (#7843 ) With #7828 and proper fullbackup testing the test became flaky ([evidence]). - produce better assertion messages in `assert_pageserver_backups_equal` - use read only endpoint to confirm the row count [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7839/9192447962/index.html#suites/89cfa994d71769e01e3fc4f475a1f3fa/49009214d0f8b8ce	2024-05-23 14:44:08 +03:00
John Spray	545f7e8cd7	tests: fix an allow list entry (#7856 ) https://github.com/neondatabase/neon/pull/7844 typo'd one of the expressions: https://neon-github-public-dev.s3.amazonaws.com/reports/main/9196993886/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/e420fbfdb193bf80/	2024-05-23 10:50:21 +01:00
Joonas Koivunen	58e31fe098	test_attach_tenant_config: add allowed error (#7839 ) [evidence] of quite rare flaky. the detach can cause this with the right timing. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7650/9191613501/index.html#suites/7745dadbd815ab87f5798aa881796f47/2190222925001078	2024-05-23 11:25:38 +03:00
Alex Chi Z	ff560a1113	chore(pageserver): use kebab case for compaction algorithms (#7845 ) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 21:28:47 +00:00
John Spray	f98fdd20e3	tests: add a couple of allow lists for shutdown cases (#7844 ) ## Problem Failures on some of our uglier shutdown log messages: https://neon-github-public-dev.s3.amazonaws.com/reports/main/9192662995/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/51b365408678c66f/ ## Summary of changes - Allow-list these errors.	2024-05-22 18:38:22 +00:00
John Spray	014f822a78	tests: refine test_secondary_background_downloads (#7829 ) ## Problem This test relied on some sleeps, and was failing ~5% of the time. ## Summary of changes Use log-watching rather than straight waits, and make timeouts more generous for the CI environment.	2024-05-22 19:17:47 +01:00
Alex Chi Z	ddd8ebd253	chore(pageserver): use kebab case for aux file flag (#7840 ) part of https://github.com/neondatabase/neon/issues/7462 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 17:06:00 +00:00
Heikki Linnakangas	9217564026	Fix issues with determining request LSN in read replica (#7795 ) Don't set last-written LSN of a page when the record is replayed, only when the page is evicted from cache. For comparison, we don't update the last-written LSN on every page modification on the primary either, only when the page is evicted. Do update the last-written LSN when the page update is skipped in WAL redo, however. In neon_get_request_lsns(), don't be surprised if the last-written LSN is equal to the record being replayed. Use the LSN of the record being replayed as the request LSN in that case. Add a long comment explaining how that can happen. In neon_wallog_page, update last-written LSN also when Shutdown has been requested. We might still fetch and evict pages for a while, after shutdown has been requested, so we better continue to do that correctly. Enable the check that we don't evict a page with zero LSN also in standby, but make it a LOG message instead of PANIC Fixes issue https://github.com/neondatabase/neon/issues/7791	2024-05-22 18:24:21 +03:00
Joonas Koivunen	ce44dfe353	openapi: document timeline ancestor detach (#7650 ) The openapi description with the error descriptions: - 200 is used for "detached or has been detached previously" - 400 is used for "cannot be detached right now" -- it's an odd thing, but good enough - 404 is used for tenant or timeline not found - 409 is used for "can never be detached" (root timeline) - 500 is used for transient errors (basically ill-defined shutdown errors) - 503 is used for busy (other tenant ancestor detach underway, pageserver shutdown) Cc: #6994	2024-05-22 13:55:34 +00:00
Joonas Koivunen	df9ab1b5e3	refactor(test): duplication with fullbackup, tar content hashing (#7828 ) "taking a fullbackup" is an ugly multi-liner copypasted in multiple places, most recently with timeline ancestor detach tests. move it under `PgBin` which is not a great place, but better than yet another utility function. Additionally: - cleanup `psql_env` repetition (PgBin already configures that) - move the backup tar comparison as a yet another free utility function - use backup tar comparison in `test_import.py` where a size check was done previously - cleanup extra timeline creation from test Cc: #7715	2024-05-22 15:43:21 +03:00
Joonas Koivunen	a8a88ba7bc	test(detach_ancestor): ensure L0 compaction in history is ok (#7813 ) detaching a timeline from its ancestor can leave the resulting timeline with more L0 layers than the compaction threshold. most of the time, the detached timeline has made progress, and next time the L0 -> L1 compaction happens near the original branch point and not near the last_record_lsn. add a test to ensure that inheriting the historical L0s does not change fullbackup. additionally: - add `wait_until_completed` to test-only timeline checkpoint and compact HTTP endpoints. with `?wait_until_completed=true` the endpoints will wait until the remote client has completed uploads. - for delta layers, describe L0-ness with the `/layer` endpoint Cc: #6994	2024-05-21 20:08:43 +03:00
Tristan Partin	1988ad8db7	Extend test_unlogged to include a sequence Unlogged sequences were added in v15, so let's just test to make sure they work on Neon.	2024-05-21 09:18:11 -05:00
Tristan Partin	8030b8e4c5	Fix test_pg_regress for unlogged relations Previously we worked around file comparison issues by dropping unlogged relations in the pg_regress tests, but this would lead to an unnecessary diff when compared to upstream in our Postgres fork. Instead, we can precompute the files that we know will be different, and ignore them.	2024-05-21 09:18:11 -05:00
Tristan Partin	9a4b896636	Use a constant for database name in test_pg_regress	2024-05-21 09:18:11 -05:00
Tristan Partin	e8b8ebfa1d	Allow check_restored_datadir_content to ignore certain files Some files may have known differences that we are okay with.	2024-05-21 09:18:11 -05:00
Tristan Partin	d9d471e3c4	Add some Python typing in a few test files	2024-05-21 09:18:11 -05:00
Arseny Sher	f2771a99b7	Add metric for pageserver standby horizon. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Arseny Sher	f54c3b96e0	Fix bugs in hot standby feedback propagation and add test for it. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Arseny Sher	478cc37a70	Propagate standby apply LSN to pageserver to hold off GC. To avoid pageserver gc'ing data needed by standby, propagate standby apply LSN through standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Iteration of GC resets the value to remove the horizon when standby goes away -- pushes are assumed to happen at least once between gc iterations. As a safety guard max allowed lag compared to normal GC horizon is hardcoded as 10GB. Add test for the feature. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
John Spray	291fcb9e4f	pageserver: use the heatmap upload interval to set the secondary download interval (#7793 ) ## Problem The heatmap upload period is configurable, but secondary mode downloads were using a fixed download period. Closes: #6200 ## Summary of changes - Use the upload period in the heatmap to adjust the download period. In practice, this will reduce the frequency of downloads from its current 60 second period to what heatmaps use, which is 5-10m depending on environment. This is an improvement rather than being optimal: we could be smarter about periods, and schedule downloads to occur around the time we expect the next upload, rather than just using the same period, but that's something we can address in future if it comes up.	2024-05-20 09:25:25 +01:00
Alex Chi Z	aaf60819fa	feat(pageserver): persist aux file policy in index part (#7668 ) Part of https://github.com/neondatabase/neon/issues/7462 ## Summary of changes Tenant config is not persisted unless it's attached on the storage controller. In this pull request, we persist the aux file policy flag in the `index_part.json`. Admins can set `switch_aux_file_policy` in the storage controller or using the page server API. Upon the first aux file gets written, the write path will compare the aux file policy target with the current policy. If it is switch-able, we will do the switch. Otherwise, the original policy will be used. The test cases show what the admins can do / cannot do. The `last_aux_file_policy` is stored in `IndexPart`. Updates to the persisted policy are done via `schedule_index_upload_for_aux_file_policy_update`. On the write path, the writer will update the field. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-17 19:22:49 +00:00
John Spray	c84656a53e	pageserver: implement auto-splitting (#7681 ) ## Problem Currently tenants are only split into multiple shards if a human being calls the API to do it. Issue: #7388 ## Summary of changes - Add a pageserver API for returning the top tenants by size - Add a step to the controller's background loop where if there is no reconciliation or optimization to be done, it looks for things to split. - Add a test that runs pgbench on many tenants concurrently, and checks that splitting happens as expected as tenants grow, without interrupting the client I/O. This PR is quite basic: there is a tasklist in https://github.com/neondatabase/neon/issues/7388 for further work. This PR is meant to be safe (off by default), and sufficient to enable our staging environment to run lots of sharded tenants without a human having to set them up.	2024-05-17 16:01:24 +00:00
John Spray	a8e6d259cb	pageserver: fixes for layer path changes (#7786 ) ## Problem - When a layer with legacy local path format is evicted and then re-downloaded, a panic happened because the path downloaded by remote storage didn't match the path stored in Layer. - While investigating, I also realized that secondary locations would have a similar issue with evictions. Closes: #7783 ## Summary of changes - Make remote timeline client take local paths as an input: it should not have its own ideas about local paths, instead it just uses the layer path that the Layer has. - Make secondary state store an explicit local path, populated on scan of local disk at startup. This provides the same behavior as for Layer, that our local_layer_path is a _default_, but the layer path can actually be anything (e.g. an old style one). - Add tests for both cases.	2024-05-17 13:24:03 +01:00
Christian Schwarz	6d951e69d6	test_suite: patch, don't replace, the `tenant_config` field, where appropriate (#7771 ) Before this PR, the changed tests would overwrite the entire `tenant_config` because `pageserver_config_override` is merged non-recursively into the `ps_cfg`. This meant they would override the `PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM`, impacting our matrix build for `compaction_algorithm=Tiered\|Legacy` in https://github.com/neondatabase/neon/pull/7748. I found the tests fixed in this PR using the `NEON_PAGESERVER_PANIC_ON_UNSPECIFIED_COMPACTION_ALGORITHM` env var that I added in #7748. Therefore, I think this is an exhaustive fix. This is better than just searching the code base for `tenant_config`, which is what I had sketched in #7747. refs #7749	2024-05-17 12:24:02 +02:00
Conrad Ludgate	790c05d675	proxy: swap tungstenite for a simpler impl (#7353 ) ## Problem I wanted to do a deep dive of the tungstenite codebase. tokio-tungstenite is incredibly convoluted... In my searching I found [fastwebsockets by deno](https://github.com/denoland/fastwebsockets), but it wasn't quite sufficient. This also removes the default 16MB/64MB frame/message size limitation. framed-websockets solves this by inserting continuation frames for partially received messages, so the whole message does not need to be entirely read into memory. ## Summary of changes I took the fastwebsockets code as a starting off point and rewrote it to be simpler, server-only, and be poll-based to support our Read/Write wrappers. I have replaced our tungstenite code with my framed-websockets fork. <https://github.com/neondatabase/framed-websockets>	2024-05-16 13:05:50 +02:00
Andrew Rudenko	923cf91aa4	compute_ctl: catalog API endpoints (#7575 ) ## Problem There are two cloud's features that require extra compute endpoints. 1. We are running pg_dump to get DB schemas. Currently, we are using a special service for this. But it would be great to execute pg_dump in an isolated environment. And we already have such an environment, it's our compute! And likely enough pg_dump already exists there too! (see https://github.com/neondatabase/cloud/issues/11644#issuecomment-2084617832) 2. We need to have a way to get databases and roles from compute after time travel (see https://github.com/neondatabase/cloud/issues/12109) ## Summary of changes It adds two API endpoints to compute_ctl HTTP API that target both of the aforementioned cases. --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-05-16 12:04:16 +02:00
Alex Chi Z	c6d5ff944d	fix(test): ensure fixtures are correctly used for pageserver_aux_file_policy (#7769 ) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-15 18:29:12 +00:00
Jure Bajic	affc18f912	Add performance regress `test_ondemand_download_churn.py` (#7242 ) Add performance regress test for on-demand download throughput. Closes https://github.com/neondatabase/neon/issues/7146 Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-05-15 18:41:12 +02:00
Christian Schwarz	3ef6e21211	fixup #7747 : actually use the fixture for neon_env_builder (#7767 ) The `= None` makes it not use the fixture. This slipped due to last-minute changes.	2024-05-15 18:17:55 +02:00
Arpad Müller	1075386d77	Add test_uploads_and_deletions test (#7758 ) Adds a test that is a reproducer for many tiered compaction bugs, both ones that have since been fixed as well as still unfxied ones: * (now fixed) #7296 * #7707 * #7759 * Likely also #7244 but I haven't tried that. The key ordering bug can be reproduced by switching to `merge_delta_keys` instead of `merge_delta_keys_buffered`, so reverting a big part of #7661, although it only sometimes reproduces (30-50% of cases). part of https://github.com/neondatabase/neon/issues/7554	2024-05-15 15:32:47 +02:00
Christian Schwarz	c3dd646ab3	chore!: always use async walredo, warn if sync is configured (#7754 ) refs https://github.com/neondatabase/neon/issues/7753 This PR is step (1) of removing sync walredo from Pageserver. Changes: * Remove the sync impl * If sync is configured, warn! and use async instead * Remove the metric that exposes `kind` * Remove the tenant status API that exposes `kind` Future Work ----------- After we've released this change to prod and are sure we won't roll back, we will 1. update the prod Ansible to remove the config flag from the prod pageserver.toml. 2. remove the remaining `kind` code in pageserver These two changes need no release inbetween. See https://github.com/neondatabase/neon/issues/7753 for details.	2024-05-15 15:04:52 +02:00
John Spray	f342b87f30	pageserver: remove Option<> around remote storage, clean up metadata file refs (#7752 ) ## Problem This is historical baggage from when the pageserver could be run with local disk only: we had a bunch of places where we had to treat remote storage as optional. Closes: https://github.com/neondatabase/neon/issues/6890 ## Changes - Remove Option<> around remote storage (in https://github.com/neondatabase/neon/pull/7722 we made remote storage clearly mandatory) - Remove code for deleting old metadata files: they're all gone now. - Remove other references to metadata files when loading directories, as none exist. I checked last 14 days of logs for "found legacy metadata", there are no instances.	2024-05-15 12:05:24 +00:00
Christian Schwarz	4eedb3b6f1	test suite: allow overriding default compaction algorithm via env var (#7747 ) This PR allows setting the `PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM` env var to override the `tenant_config.compaction_algorithm` field in the initial `pageserver.toml` for all tests. I tested manually that this works by halting a test using pdb and inspecting the `effective_config` in the tenant status managment API. If the env var is set, the tests are parametrized by the `kind` tag field, allowing to do a matrix build in CI and let Allure summarize everything in a nice report. If the env var is not set, the tests are not parametrized. So, merging this PR doesn't cause problems for flaky test detection. In fact, it doesn't cause any runtime change if the env var is not set. There are some tests in the test suite that set used to override the entire tenant_config using `NeonEnvBuilder.pageserver_config_override`. Since config overrides are merged non-recursively, such overrides that don't specify `kind = ` cause a fallback to pageserver's built-in `DEFAULT_COMPACTION_ALGORITHM`. Such cases can be found using ``` ["']tenant_config\s*[='"] ``` We'll deal with these tests in a future PR. closes https://github.com/neondatabase/neon/issues/7555	2024-05-14 18:03:08 +02:00
Alex Chi Z	30d15ad403	chore(test): add version check for forward compat test (#7685 ) A test for https://github.com/neondatabase/neon/pull/7684. This pull request checks if the pageserver version we specified is the one actually running by comparing the git hash in forward compatibility tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-14 10:36:48 -04:00
John Spray	df0f1e359b	pageserver: switch on new-style local layer paths (#7660 ) We recently added support for local layer paths that contain a generation number: - https://github.com/neondatabase/neon/pull/7609 - https://github.com/neondatabase/neon/pull/7640 Now that we've cut a [release](https://github.com/neondatabase/neon/pull/7735) that includes those changes, we can proceed to enable writing the new format without breaking forward compatibility.	2024-05-14 09:37:48 +01:00

... 5 6 7 8 9 ...

1621 Commits