rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 21:20:37 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	324e4e008f	feat(storcon): timeline detach ancestor passthrough (#8353 ) Currently storage controller does not support forwarding timeline detach ancestor requests to pageservers. Add support for forwarding `PUT .../:tenant_id/timelines/:timeline_id/detach_ancestor`. Implement the support mostly as is, because the timeline detach ancestor will be made (mostly) idempotent in future PR. Cc: #6994	2024-07-15 18:08:24 +03:00
Christian Schwarz	b49b450dc4	remove page_service `show <tenant_id>` (#8372 ) This operation isn't used in practice, so let's remove it. Context: in https://github.com/neondatabase/neon/pull/8339	2024-07-15 15:33:56 +01:00
Konstantin Knizhnik	8a8b83df27	Add neon.running_xacts_overflow_policy to make it possible for RO replica to start up without primary even in case running xacts overflow (#8323 ) ## Problem Right now if there are too many running xacts to be restored from CLOG at replica startup, then the replica does not try to restore them and waits for a non-overflown running-xacts WAL record from the primary. But if the primary is not active, then the replica will not start at all. Too many running xacts can be caused by transactions with a large number of subtransactions. But right now it can also be caused by two reasons: - Lack of shutdown checkpoint which updates `oldestRunningXid` (because of immediate shutdown) - nextXid alignment on 1024 boundary (which causes losing ~1k XIDs on each restart) Both problems are somehow addressed now. But we have existing customers with "sparse" CLOG and lack of checkpoints. To be able to start RO replicas for such customers I suggest adding a GUC which allows the replica to start even in case of subxacts overflow. ## Summary of changes Add `neon.running_xacts_overflow_policy` with the following values: - ignore: restore from CLOG last N XIDs and accept connections - skip: do not restore any XIDs from CLOG but still accept connections - wait: wait for a non-overflown running xacts record from primary node ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-15 15:52:00 +03:00
Arseny Sher	4bdfb96078	Fix test_timeline_copy flakiness. fixes https://github.com/neondatabase/neon/issues/8355	2024-07-15 15:21:16 +03:00
Luca Bruno	8da3b547f8	proxy/http: switch to typed_json (#8377 ) ## Summary of changes This switches JSON rendering logic to `typed_json` in order to reduce the number of allocations in the HTTP responder path. Followup from https://github.com/neondatabase/neon/pull/8319#issuecomment-2216991760. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-07-15 12:38:52 +01:00
Vlad Lazar	b329b1c610	tests: allow list breaching min resident size in statvfs test (#8358 ) ## Problem This test would sometimes violate the min resident size during disk eviction and fail due to the generated warning log. Disk usage candidate collection only takes into account active tenants. However, the statvfs call takes into account the entire tenants directory, which includes tenants which haven't become active yet. After re-starting the pageserver, disk usage eviction may kick in before both tenants have become active. Hence, the logic will try to satisfy the disk usage requirements by evicting everything belonging to the active tenant, and hence violating the tenant minimum resident size. ## Summary of changes Allow the warning	2024-07-12 17:31:17 +01:00
Alex Chi Z	4184685721	fix(pageserver): unique test harness name for merge_in_between (#8366 ) As title, there should be a way to detect duplicated harness names in the future :( Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-12 14:28:13 +01:00
Conrad Ludgate	411a130675	Fix nightly warnings 2024 june (#8151 ) ## Problem new clippy warnings on nightly. ## Summary of changes broken up each commit by warning type. 1. Remove some unnecessary refs. 2. In edition 2024, inference will default to `!` and not `()`. 3. Clippy complains about doc comment indentation 4. Fix `Trait + ?Sized` where `Trait: Sized`. 5. diesel_derives triggering `non_local_definitions`	2024-07-12 13:58:04 +01:00
John Spray	0645ae318e	pageserver: circuit breaker on compaction (#8359 ) ## Problem We already back off on compaction retries, but the impact of a failing compaction can be so great that backing off up to 300s isn't enough. The impact is consuming a lot of I/O+CPU in the case of image layer generation for large tenants, and potentially also leaking disk space. Compaction failures are extremely rare and almost always indicate a bug, frequently a bug that will not let compaction proceed until it is fixed. Related: https://github.com/neondatabase/neon/issues/6738 ## Summary of changes - Introduce a CircuitBreaker type - Add a circuit breaker for compaction, with a policy that after 5 failures, compaction will not be attempted again for 24 hours. - Add metrics that we can alert on: any >0 value for `pageserver_circuit_breaker_broken_total` should generate an alert. - Add a test that checks this works as intended. Couple notes to reviewers: - Circuit breakers are intrinsically a defense-in-depth measure: this is not the solution to any underlying issues, it is just a general mitigation for "unknown unknowns" that might be encountered in future. - This PR isn't primarily about writing a perfect CircuitBreaker type: the one in this PR is meant to be just enough to mitigate issues in compaction, and make it easy to monitor/alert on these failures. We can refine this type in future as/when we want to use it elsewhere.	2024-07-12 12:04:02 +01:00
Japin Li	86d6ef305a	Remove fs2 dependency (#8350 ) The fs2 dependency is not needed anymore after commit `d42700280`.	2024-07-12 12:56:06 +03:00
Arpad Müller	2e37aa3fe8	Implement decompression for vectored reads (#8302 ) Implement decompression of images for vectored reads. This doesn't implement support for still treating blobs as uncompressed with the bits we reserved for compression, as we have removed that functionality in #8300 anyways. Part of #5431	2024-07-12 04:32:34 +02:00
Arpad Müller	30bbfde50d	Pass configured compression param to image generation (#8363 ) We need to pass on the configured compression param during image layer generation. This was an oversight of #8106, and the likely cause why #8288 didn't bring any interesting regressions. Part of https://github.com/neondatabase/neon/issues/5431	2024-07-12 01:43:44 +02:00
Sasha Krassovsky	82b9a44ab4	Grant execute on snapshot functions to neon_superuser (#8346 ) ## Problem I need `neon_superuser` to be allowed to create snapshots for replication tests ## Summary of changes Adds a migration that grants these functions to neon_superuser	2024-07-11 20:29:35 +00:00
Joonas Koivunen	4a87bac036	test: limit `test_layer_download_timeouted` to MOCK_S3 (#8331 ) Requests against REAL_S3 on CI can consistently take longer than 1s; testing the short timeouts against it made no sense in hindsight, as MOCK_S3 works just as well. evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8229/9857994025/index.html#suites/b97efae3a617afb71cb8142f5afa5224/6828a50921660a32	2024-07-11 15:03:35 -04:00
Alex Chi Z	38b4ed297e	feat(pageserver): rewrite streaming vectored read planner (#8242 ) Rewrite streaming vectored read planner to be a separate struct. The API is designed to produce batches around `max_read_size` instead of exactly less than that so that `handle_XX` returns one batch a time. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-11 18:28:16 +00:00
Arseny Sher	cd29156927	Fix memory context of NeonWALReader allocation. Allocating it in short living context is wrong because it is reused during backend lifetime.	2024-07-11 20:31:15 +03:00
John Spray	814c8e8f68	storage controller: add node deletion API (#8226 ) ## Problem In anticipation of later adding a really nice drain+delete API, I initially only added an intentionally basic `/drop` API that is just about usable for deleting nodes in a pinch, but requires some ugly storage controller restarts to persuade it to restart secondaries. ## Summary of changes I started making a few tiny fixes, and ended up writing the delete API... - Quality of life nit: ordering of node + tenant listings in storcon_cli - Papercut: Fix the attach_hook using the wrong operation type for reporting slow locks - Make Service::spawn tolerate `generation_pageserver` columns that point to nonexistent node IDs. I started out thinking of this as a general resilience thing, but when implementing the delete API I realized it was actually a legitimate end state after the delete API is called (as that API doesn't wait for all reconciles to succeed). - Add a `DELETE` API for nodes, which does not gracefully drain, but does reschedule everything. This becomes safe to use when the system is in any state, but will incur availability gaps for any tenants that weren't already live-migrated away. If tenants have already been drained, this becomes a totally clean + safe way to decom a node. - Add a test and a storcon_cli wrapper for it This is meant to be a robust initial API that lets us remove nodes without doing ugly things like restarting the storage controller -- it's not quite a totally graceful node-draining routine yet. There's more work in https://github.com/neondatabase/neon/issues/8333 to get to our end-end state.	2024-07-11 17:05:47 +01:00
John Spray	0159ae9536	safekeeper: eviction metrics (#8348 ) ## Problem Follow up to https://github.com/neondatabase/neon/pull/8335, to improve observability of how many evict/restores we are doing. ## Summary of changes - Add `safekeeper_eviction_events_started_total` and `safekeeper_eviction_events_completed_total`, with a "kind" label of evict or restore. This gives us rates, and also ability to calculate how many are in progress. - Generalize SafekeeperMetrics test type to use the same helpers as pageserver, and enable querying any metric. - Read the new metrics at the end of the eviction test.	2024-07-11 17:05:35 +01:00
Vlad Lazar	d9a82468e2	storage_controller: fix ReconcilerWaiter::get_status (#8341 ) ## Problem SeqWait::would_wait_for returns Ok in the case when we would not wait for the sequence number and Err otherwise. ReconcilerWaiter::get_status uses it the wrong way around. This can cause the storage controller to go into a busy loop and make it look unavailable to the k8s controller. ## Summary of changes Use `SeqWait::would_wait_for` correctly.	2024-07-11 15:43:28 +01:00
Christian Schwarz	e26ef640c1	pageserver: remove `trace_read_requests` (#8338 ) `trace_read_requests` is a per `Tenant`-object option. But the `handle_pagerequests` loop doesn't know which `Tenant` object (i.e., which shard) the request is for. The remaining use of the `Tenant` object is to check `tenant.cancel`. That check is incorrect [if the pageserver hosts multiple shards](https://github.com/neondatabase/neon/issues/7427#issuecomment-2220577518). I'll fix that in a future PR where I completely eliminate the holding of `Tenant/Timeline` objects across requests. See [my code RFC](https://github.com/neondatabase/neon/pull/8286) for the high level idea. Note that we can always bring the tracing functionality if we need it. But since it's actually about logging the `page_service` wire bytes, it should be a `page_service`-level config option, not per-Tenant. And for enabling tracing on a single connection, we can implement a `set pageserver_trace_connection;` option.	2024-07-11 15:17:07 +02:00
Peter Bendel	c11b9cb43d	Run Performance bench on more platforms (#8312 ) ## Problem https://github.com/neondatabase/cloud/issues/14721 ## Summary of changes add one more platform to benchmarking job `57535c039c/.github/workflows/benchmarking.yml (L57C3-L126)` Run with pg 16, provisioner k8-neonvm by default on the new platform. Adjust some test cases to - not depend on database client <-> database server latency by pushing loops into server side pl/pgSQL functions - increase statement and test timeouts First successful run of these job steps https://github.com/neondatabase/neon/actions/runs/9869817756/job/27254280428	2024-07-11 10:07:12 +01:00
John Spray	69b6675da0	rfcs: add RFC for timeline archival (#8221 ) A design for a cheap low-resource state for idle timelines: - #8088	2024-07-11 08:23:51 +01:00
Stas Kelvich	6bbd34a216	Enable core dumps for postgres (#8272 ) Set core rlimit to unlimited in compute_ctl, so that all child processes inherit it. We could also set rlimit in relevant startup script, but that way we would depend on external setup and might inadvertently disable it again (core dumping worked in pods, but not in VMs with inittab-based startup).	2024-07-11 10:20:14 +03:00
John Spray	24f8133e89	safekeeper: add eviction_min_resident to stop evictions thrashing (#8335 ) ## Problem - The condition for eviction is not time-based: it is possible for a timeline to be restored in response to a client, that client times out, and then as soon as the timeline is restored it is immediately evicted again. - There is no delay on eviction at startup of the safekeeper, so when it starts up and sees many idle timelines, it does many evictions which will likely be immediately restored when someone uses the timeline. ## Summary of changes - Add `eviction_min_resident` parameter, and use it in `ready_for_eviction` to avoid evictions if the timeline has been resident for less than this period. - This also implicitly delays evictions at startup for `eviction_min_resident` - Set this to a very low number for the existing eviction test, which expects immediate eviction. The default period is 15 minutes. The general reasoning for that is that in the worst case where we thrash ~10k timelines on one safekeeper, downloading 16MB for each one, we should set a period that would not overwhelm the node's bandwidth.	2024-07-10 19:38:14 +01:00
Alex Chi Z	9f4511c554	feat(pageserver): add k-merge layer iterator with lazy loading (#8053 ) Part of https://github.com/neondatabase/neon/issues/8002. This pull request adds a k-merge iterator for bottom-most compaction. ## Summary of changes * Added back lsn_range / key_range in delta layer inner. This was removed due to https://github.com/neondatabase/neon/pull/8050, but added back because iterators need that information to process lazy loading. * Added lazy-loading k-merge iterator. * Added iterator wrapper as a unified iterator type for image+delta iterator. The current status and test should cover the use case for L0 compaction so that the L0 compaction process can bypass page cache and have a fixed amount of memory usage. The next step is to integrate this with the new bottom-most compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-10 14:11:27 -04:00
Arpad Müller	e78341e1c2	Remove ImageCompressionAlgorithm::DisabledNoDecompress (#8300 ) Removes the `ImageCompressionAlgorithm::DisabledNoDecompress` variant. We now assume any blob with the specific bits set is actually a compressed blob. The `ImageCompressionAlgorithm::Disabled` variant still remains and is the new default. Reverts large parts of #8238 , as originally intended in that PR. Part of #5431	2024-07-10 18:09:19 +02:00
dependabot[bot]	98387d6fb1	build(deps-dev): bump zipp from 3.8.1 to 3.19.1 Bumps [zipp](https://github.com/jaraco/zipp) from 3.8.1 to 3.19.1. - [Release notes](https://github.com/jaraco/zipp/releases) - [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst) - [Commits](https://github.com/jaraco/zipp/compare/v3.8.1...v3.19.1) --- updated-dependencies: - dependency-name: zipp dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2024-07-10 17:08:18 +01:00
Conrad Ludgate	1afab13ccb	proxy: remove some trace logs (#8334 )	2024-07-10 15:05:25 +01:00
John Spray	e89ec55ea5	tests: stabilize test_sharding_split_compaction (#8318 ) ## Problem This test incorrectly assumed that a post-split compaction would only drop content. This was easily destabilized by any changes to image generation rules. ## Summary of changes - Before split, do a full image layer generation pass, to guarantee that post-split compaction should only drop data, never create it. - Fix the force_image_layer_creation mode of compaction that we use from tests like this: previously it would try and generate image layers even if one already existed with the same layer key, which caused compaction to fail.	2024-07-10 14:14:10 +01:00
Conrad Ludgate	fe13fccdc2	proxy: pg17 fixes (#8321 ) ## Problem #7809 - we do not support sslnegotiation=direct #7810 - we do not support negotiating down the protocol extensions. ## Summary of changes 1. Same as postgres, check the first startup packet byte for tls header `0x16`, and check the ALPN. 2. Tell clients using protocol >3.0 to downgrade	2024-07-10 09:10:29 +01:00
Christian Schwarz	1a49f1c15c	pageserver: move `page_service`'s `import basebackup` / `import wal` to mgmt API (#8292 ) I want to fix bugs in `page_service` ([issue](https://github.com/neondatabase/neon/issues/7427)) and the `import basebackup` / `import wal` stand in the way / make the refactoring more complicated. We don't use these methods anyway in practice, but, there have been some objections to removing the functionality completely. So, this PR preserves the existing functionality but moves it into the HTTP management API. Note that I don't try to fix existing bugs in the code, specifically not fixing * it only ever worked correctly for unsharded tenants * it doesn't clean up on error All errors are mapped to `ApiError::InternalServerError`.	2024-07-09 23:17:42 +02:00
Christian Schwarz	9bb16c8780	fix(l0_flush): drops permit before fsync, potential cause for OOMs (#8327 ) ## Problem Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1720511577862519 We're seeing OOMs in staging on a pageserver that has l0_flush.mode=Direct enabled. There's a strong correlation between jumps in `maxrss_kb` and `pageserver_timeline_ephemeral_bytes`, so, it's quite likely that l0_flush.mode=Direct is the culprit. Notably, the expected max memory usage on that staging server by the l0_flush.mode=Direct is ~2GiB but we're seeing as much as 24GiB max RSS before the OOM kill. One hypothesis is that we're dropping the semaphore permit before all the dirtied pages have been flushed to disk. (The flushing to disk likely happens in the fsync inside the `.finish()` call, because we're using ext4 in data=ordered mode). ## Summary of changes Hold the permit until after we're done with `.finish()`.	2024-07-09 19:58:48 +01:00
Christian Schwarz	3f7aebb01c	refactor: postgres_backend: replace abstract shutdown_watcher with CancellationToken (#8295 ) Preliminary refactoring while working on https://github.com/neondatabase/neon/issues/7427 and specifically https://github.com/neondatabase/neon/pull/8286	2024-07-09 21:11:11 +03:00
Tristan Partin	abc330e095	Add an application_name to more Neon connections Helps identify connections in the logs.	2024-07-09 12:42:09 -05:00
Tristan Partin	6d3cb222ee	Refactor how migrations are run Just a small improvement I noticed while looking at fixing CVE-2024-4317 in Neon.	2024-07-09 12:42:09 -05:00
Alex Chi Z	b1fe8259b4	fix(storage-scrubber): use default AWS authentication (#8299 ) part of https://github.com/neondatabase/cloud/issues/14024 close https://github.com/neondatabase/neon/issues/7665 Things running in k8s container use this authentication: https://docs.aws.amazon.com/sdkref/latest/guide/feature-container-credentials.html while we did not configure the client to use it. This pull request simply uses the default s3 client credential chain for storage scrubber. It might break compatibility with minio. ## Summary of changes * Use default AWS credential provider chain. * Improvements for s3 errors, we now have detailed errors and correct backtrace on last trial of the operation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-09 18:41:37 +01:00
Conrad Ludgate	4a5b55c834	chore: fix nightly build (#8142 ) ## Problem `cargo +nightly check` fails ## Summary of changes Updates `measured`, `time`, and `crc32c`. * `measured`: updated to fix https://github.com/rust-lang/rust/issues/125763. * `time`: updated to fix https://github.com/rust-lang/rust/issues/125319 * `crc32c`: updated to remove some nightly feature detection with a removed nightly feature	2024-07-09 18:25:49 +01:00
Alex Chi Z	73fa3c014b	chore(storage-scrubber): allow disable file logging (#8297 ) part of https://github.com/neondatabase/cloud/issues/14024, k8s does not always have a volume available for logging, and I'm running into weird permission errors... While I could spend time figuring out how to create temp directories for logging, I think it would be better to just disable file logging as k8s containers are ephemeral and we cannot retrieve anything on the fs after the container gets removed. ## Summary of changes `PAGESERVER_DISABLE_FILE_LOGGING=1` -> file logging disabled Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-09 17:11:37 +01:00
Luca BRUNO	c196cf6ac1	proxy/http: avoid spurious vector reallocations This tweaks the rows-to-JSON rendering logic in order to avoid allocating 0-sized temporary vectors and later growing them to insert elements. As the exact size is known in advance, both vectors can be built with an exact capacity upfront. This will avoid further vector growing/reallocation in the rendering hotpath. Signed-off-by: Luca BRUNO <lucab@lucabruno.net>	2024-07-09 15:20:00 +01:00
Alexander Bayandin	8b15864f59	CI(promote-compatibility-data): take into account commit sha (#8283 ) ## Problem In https://github.com/neondatabase/neon/pull/8161, we changed the path to Neon artefacts by adding commit sha to it, but we missed adding these changes to `promote-compatibility-data` job that we use for backward/forward- compatibility testing. ## Summary of changes - Add commit sha to `promote-compatibility-data`	2024-07-09 08:39:10 +00:00
Yuchen Liang	d9c1068cf4	tests: increase approx size equal threshold to avoid `test_lsn_lease_size` flakiness (#8282 ) ## Summary of changes Increase the `assert_size_approx_equal` threshold to avoid flakiness of `test_lsn_lease_size`. Still needs more investigation to fully resolve #8293. - Also set `autovacuum=off` for the endpoint we are running in the test. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-08 21:50:13 +01:00
John Spray	811eb88b89	tests: stabilize test_timeline_size_quota_on_startup (#8255 ) ## Problem `test_timeline_size_quota_on_startup` assumed that writing data beyond the size limit would always be blocked. This is not so: the limit is only enforced if feedback makes it back from the pageserver to the safekeeper + compute. Closes: https://github.com/neondatabase/neon/issues/6562 ## Summary of changes - Modify the test to wait for the pageserver to catch up. The size limit was never actually being enforced robustly; the original version of this test was just writing much more than 30MB and about 98% of the time getting lucky such that the feedback happened to arrive before the test's for loop was done. - If the test fails, log the logical size as seen by the pageserver.	2024-07-08 20:06:34 +00:00
Alex Chi Z	df3dc6e4c1	fix(pageserver): write to both v1+v2 for aux tenant import (#8316 ) close https://github.com/neondatabase/neon/issues/8202 ref https://github.com/neondatabase/neon/pull/6560 For tenant imports, we now write the aux files into both v1+v2 storage, so that the test case can pick either one for testing. Given the API is only used for testing, this looks like a safe change. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-08 20:05:59 +01:00
John Spray	daea26a22f	tests: use smaller layers in test_pg_regress (#8232 ) ## Problem Debug-mode runs of test_pg_regress are rather slow since https://github.com/neondatabase/neon/pull/8105, and occasionally exceed their 600s timeout. ## Summary of changes - Use 8MiB layer files, avoiding large ephemeral layers On a hetzner AX102, this takes the runtime from 230s to 190s. Which hopefully will be enough to get the runtime on github runners more reliably below its 600s timeout. This has the side benefit of exercising more of the pageserver stack (including compaction) under a workload that exercises a more diverse set of postgres functionality than most of our tests.	2024-07-08 19:05:35 +00:00
Alexey Kondratov	84b039e615	compute_ctl: Use 'fast' shutdown for Postgres termination (#8289 ) ## Problem We currently use 'immediate' mode in the most commonly used shutdown path, when the control plane calls a `compute_ctl` API to terminate Postgres inside compute without waiting for the actual pod / VM termination. Yet, 'immediate' shutdown doesn't create a shutdown checkpoint and ROs have bad times figuring out the list of running xacts during next start. ## Summary of changes Use 'fast' mode, which creates a shutdown checkpoint that is important for ROs to get a list of running xacts faster instead of going through the CLOG. On the control plane side, we poll this `compute_ctl` termination API for 10s, it should be enough as we don't really write any data at checkpoint time. If it times out, we anyway switch to the slow k8s-based termination. See https://www.postgresql.org/docs/current/server-shutdown.html for the list of modes and signals. The default VM shutdown hook already uses `fast` mode, see [1] [1] `c9fd8d7693/vm-image-spec.yaml (L30-L31)` Related to #6211	2024-07-08 19:54:02 +02:00
Yuchen Liang	a68edad913	refactor: move part of sharding API from `pageserver_api` to `utils` (#8254 ) ## Problem LSN Leases introduced in #8084 is a new API that is made shard-aware from day 1. To support ephemeral endpoint in #7994 without linking Postgres C API against `compute_ctl`, part of the sharding needs to reside in `utils`. ## Summary of changes - Create a new `shard` module in utils crate. - Move more interface related part of tenant sharding API to utils and re-export them in pageserver_api. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-08 15:43:10 +01:00
John Spray	fcdf060816	pageserver: respect has_relmap_file in collect_keyspace (#8276 ) ## Problem Rarely, a dbdir entry can exist with no `relmap_file_key` data. This causes compaction to fail, because it assumes that if the database exists, then so does the relmap file. Basebackup already handled this using a boolean to record whether such a key exists, but `collect_keyspace` didn't. ## Summary of changes - Respect the flag for whether a relfilemap exists in collect_keyspace - The reproducer for this issue will merge separately in https://github.com/neondatabase/neon/pull/8232	2024-07-08 15:39:41 +01:00
Tristan Partin	1c57f6bac3	Add long running replication tests These tests will help verify that replication, both physical and logical, works as expected in Neon. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-08 07:30:22 -07:00
Tristan Partin	b54dd9af15	Add PgBin.run_nonblocking() Allows a process to run without blocking program execution, which can be useful for certain test scenarios. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-08 07:30:22 -07:00
Tristan Partin	118847cd41	Log PG environment variables when a PgBin runs Useful for debugging situations like connecting to databases. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-08 07:30:22 -07:00
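
The circuit-breaker policy described in commit 0645ae318e (after 5 failures, compaction is not attempted again for 24 hours) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the `CircuitBreaker` type added in that PR: the field and method names (`fail`, `success`, `is_broken`) are hypothetical, and the current time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical minimal circuit breaker (names are illustrative only).
pub struct CircuitBreaker {
    /// Number of consecutive failures that trips the breaker.
    fail_threshold: u32,
    /// How long the breaker stays broken once tripped.
    reset_after: Duration,
    fail_count: u32,
    broken_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(fail_threshold: u32, reset_after: Duration) -> Self {
        Self {
            fail_threshold,
            reset_after,
            fail_count: 0,
            broken_at: None,
        }
    }

    /// Record a failure; trips the breaker once the threshold is reached.
    pub fn fail(&mut self, now: Instant) {
        self.fail_count += 1;
        if self.broken_at.is_none() && self.fail_count >= self.fail_threshold {
            self.broken_at = Some(now);
        }
    }

    /// Record a success; clears the failure count and any broken state.
    pub fn success(&mut self) {
        self.fail_count = 0;
        self.broken_at = None;
    }

    /// Returns true if the guarded operation should be skipped at `now`.
    /// Once `reset_after` has elapsed, the breaker resets itself.
    pub fn is_broken(&mut self, now: Instant) -> bool {
        match self.broken_at {
            Some(tripped) if now.duration_since(tripped) >= self.reset_after => {
                self.fail_count = 0;
                self.broken_at = None;
                false
            }
            Some(_) => true,
            None => false,
        }
    }
}
```

A caller would check `is_broken(Instant::now())` before each compaction attempt, then call `fail` or `success` depending on the outcome. A metric such as the `pageserver_circuit_breaker_broken_total` counter named in the commit would be incremented at the point where the breaker trips.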

5633 Commits
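
Commit fe13fccdc2 above mentions checking the first byte of the startup packet for the TLS header `0x16` to support `sslnegotiation=direct`. The idea can be sketched as follows. This is only an illustration, not the proxy's code: the `FirstBytes` enum and `classify_first_byte` function are hypothetical names, and the ALPN check the commit also mentions is not shown. `0x16` is the TLS record content type for a handshake message.

```rust
/// Hypothetical classification of the first bytes received on a new connection.
#[derive(Debug, PartialEq, Eq)]
pub enum FirstBytes {
    /// First byte is 0x16: a TLS handshake record, i.e. direct TLS negotiation.
    DirectTls,
    /// Anything else: treat as a regular Postgres startup packet.
    PostgresStartup,
    /// No data available yet.
    Empty,
}

/// Inspect the first byte without consuming the buffer.
pub fn classify_first_byte(buf: &[u8]) -> FirstBytes {
    match buf.first().copied() {
        None => FirstBytes::Empty,
        Some(0x16) => FirstBytes::DirectTls,
        Some(_) => FirstBytes::PostgresStartup,
    }
}
```

A regular startup packet begins with a 4-byte big-endian length, so its first byte is `0x00` for any realistically sized packet and does not collide with `0x16`.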