rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 21:50:37 +00:00

Author	SHA1	Message	Date
Arpad Müller	5446e08891	Move remote_storage config related code into dedicated module (#8132 ) Moves `RemoteStorageConfig` and related structs and functions into a dedicated module. Also implements `Serialize` for the config structs (requested in #8126). Follow-up of #8126	2024-06-24 12:29:54 +02:00
Conrad Ludgate	78d9059fc7	proxy: update tokio-postgres to allow arbitrary config params (#8076 ) ## Problem Fixes https://github.com/neondatabase/neon/issues/1287 ## Summary of changes tokio-postgres now supports arbitrary server params through the `param(key, value)` method. Some keys are special so we explicitly filter them out.	2024-06-24 10:20:27 +00:00
Arpad Müller	75747cdbff	Use serde for RemoteStorageConfig parsing (#8126 ) Adds a `Deserialize` impl to `RemoteStorageConfig`. We thus achieve the same as #7743 but with less repetitive code, by deriving `Deserialize` impls on `S3Config`, `AzureConfig`, and `RemoteStorageConfig`. The disadvantage is less useful error messages. The git history of this PR contains a state where we go via an intermediate representation, leveraging the `serde_json` crate, without it ever being actual json though. Also, the PR adds deserialization tests. Alternative to #7743 .	2024-06-22 17:57:09 +00:00
John Spray	59f949b4a8	pageserver: remove unused load/ignore APIs (#8122 ) ## Problem These APIs have be unused for some time. They were superseded by /location_conf: the equivalent of ignoring a tenant is now to put it in secondary mode. ## Summary of changes - Remove APIs - Remove tests & helpers that used them - Remove error variants that are no longer needed.	2024-06-21 10:02:15 +00:00
Vlad Lazar	5778d714f0	storcon: add drain and fill background operations for graceful cluster restarts (#8014 ) ## Problem Pageserver restarts cause read availablity downtime for tenants. See `Motivation` section in the [RFC](https://github.com/neondatabase/neon/pull/7704). ## Summary of changes * Introduce a new `NodeSchedulingPolicy`: `PauseForRestart` * Implement the first take of drain and fill algorithms * Add a node status endpoint which can be polled to figure out when an operation is done The implementation follows the RFC, so it might be useful to peek at it as you're reviewing. Since the PR is rather chunky, I've made sure all commits build (with warnings), so you can review by commit if you prefer that. RFC: https://github.com/neondatabase/neon/pull/7704 Related https://github.com/neondatabase/neon/issues/7387	2024-06-19 11:55:30 +01:00
Yuchen Liang	30b890e378	feat(pageserver): use leases to temporarily block gc (#8084 ) Part of #7497, extracts from #7996, closes #8063. ## Problem With the LSN lease API introduced in https://github.com/neondatabase/neon/issues/7808, we want to implement the real lease logic so that GC will keep all the layers needed to reconstruct all pages at all the leased LSNs with valid leases at a given time. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-18 17:37:06 +00:00
Arpad Müller	68a2298973	Add support to specifying storage account in AzureConfig (#8090 ) We want to be able to specify the storage account via the toml configuration, so that we can connect to multiple storage accounts in the same process. https://neondb.slack.com/archives/C06SJG60FRB/p1718702144270139	2024-06-18 16:03:23 +02:00
MMeent	e729f28205	Fix log rates (#8035 ) ## Summary of changes - Stop logging HealthCheck message passing at INFO level (moved to DEBUG) - Stop logging /status accesses at INFO (moved to DEBUG) - Stop logging most occurances of `missing config file "compute_ctl_temp_override.conf"` - Log memory usage only when the data has changed significantly, or if we've not recently logged the data, rather than always every 2 seconds.	2024-06-17 18:57:49 +00:00
Alex Chi Z	81892199f6	chore(pageserver): vectored get target_keyspace directly accums (#8055 ) follow up on https://github.com/neondatabase/neon/pull/7904 avoid a layer of indirection introduced by `Vec<Range<Key>>` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-14 11:57:58 -04:00
Conrad Ludgate	e6eb0020a1	update rust to 1.79.0 (#8048 ) ## Problem rust 1.79 new enabled by default lints ## Summary of changes * update to rust 1.79 * `s/default_features/default-features/` * fix proxy dead code. * fix pageserver dead code.	2024-06-14 13:23:52 +02:00
Arpad Müller	75bca9bb19	Perform retries on azure bulk deletion (#7964 ) This adds retries to the bulk deletion, because if there is a certain chance n that a request fails, the chance that at least one of the requests in a chain of requests fails increases exponentially. We've had similar issues with the S3 DR tests, which in the end yielded in adding retries at the remote_storage level. Retries at the top level are not sufficient when one remote_storage "operation" is multiple network requests in a trench coat, especially when there is no notion of saving the progress: even if prior deletions had been successful, we'd still need to get a 404 in order to continue the loop and get to the point where we failed in the last iteration. Maybe we'll fail again but before we've even reached it. Retries at the bottom level avoid this issue because they have the notion of progress and also when one network operation fails, only that operation is retried. First part of #7931.	2024-06-06 14:21:27 +00:00
Yuchen Liang	630cfbe420	refactor(pageserver): designated api error type for cancelled request (#7949 ) Closes #7406. ## Problem When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is exposed to handle that case, which verbosely logs the error. However, we don't benefit from having the full backtrace provided by anyhow in this case. ## Summary of changes This PR introduces a new `ApiError` type to handle errors caused by cancelled request more robustly. - A new enum variant `ApiError::Cancelled` - Currently the cancelled request is mapped to status code 500. - Need to handle this error in proxy's `http_util` as well. - Added a failpoint test to simulate cancelled `get_lsn_by_timestamp` request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-06 14:00:14 +00:00
Christian Schwarz	17116f2ea9	fix(pageserver): abort on duplicate layers, before doing damage (#7799 ) fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most of the issue description here for posterity) # Background From the time before always-authoritative `index_part.json`, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen: `a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)` As of #5198 , we should not be exposed to that problem anymore. # Problem 1 We still have 1. [code in Pageserver](`82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521)`) than handles duplicate layers 2. [tests in the test suite](`d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15)`) that demonstrates the problem using a failpoint However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production. What is does instead is to return early with an `Ok()`, so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised. That "return early" would be a bug in the routine if it happened in production. So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior. # Problem 2 Further, if production code _did_ (it nowawdays doesn't!) create a duplicate layer, the code in Pageserver that handles the condition (item 1 above) is too little and too late: * the code handles it by discarding the newer `struct Layer`; that's good. * however, on disk, we have already overwritten the old with the new layer file * the fact that we do it atomically doesn't matter because ... * if the new layer file is not bit-identical, then we have a cache coherency problem * PS PageCache block cache: caches old bit battern * blob_io offsets stored in variables, based on pre-overwrite bit pattern / offsets * => reading based on these offsets from the new file might yield different data than before # Solution - Remove the test suite code pertaining to Problem 1 - Move & rename test suite code that actually tests RFC-27 crash-consistent layer map. - Remove the Pageserver code that handles duplicate layers too late (Problem 1) - Use `RENAME_NOREPLACE` to prevent over-rename the file during `.finish()`, bail with an error if it happens (Problem 2) - This bailing prevents the caller from even trying to insert into the layer map, as they don't even get a `struct Layer` at hand. - Add `abort`s in the place where we have the layer map lock and check for duplicates (Problem 2) - Note again, we can't reach there because we bail from `.finish()` much earlier in the code. - Share the logic to clean up after failed `.finish()` between image layers and delta layers (drive-by cleanup) - This exposed that test `image_layer_rewrite` was overwriting layer files in place. Fix the test. # Future Work This PR adds a new failure scenario that was previously "papered over" by the overwriting of layers: 1. Start a compaction that will produce 3 layers: A, B, C 2. Layer A is `finish()`ed successfully. 3. Layer B fails mid-way at some `put_value()`. 4. Compaction bails out, sleeps 20s. 5. Some disk space gets freed in the meantime. 6. Compaction wakes from sleep, another iteration starts, it attempts to write Layer A again. But the `.finish()` fails because A already exists on disk. The failure in step 5 is new with this PR, and it causes the compaction to get stuck. Before, it would silently overwrite the file and "successfully" complete the second iteration. The mitigation for this is to `/reset` the tenant.	2024-06-04 16:16:23 +00:00
Joonas Koivunen	0acb604fa3	test: no missed wakeups, cancellation and timeout flow to downloads (#7863 ) I suspected a wakeup could be lost with `remote_storage::support::DownloadStream` if the cancellation and inner stream wakeups happen simultaneously. The next poll would only return the cancellation error without setting the wakeup. There is no lost wakeup because the single future for getting the cancellation error is consumed when the value is ready, and a new future is created for the next value. The new future is always polled. Similarly, if only the `Stream::poll_next` is being used after a `Some(_)` value has been yielded, it makes no sense to have an expectation of a wakeup for the (N+1)th stream value already set because when a value is wanted, `Stream::poll_next` will be called. A test is added to show that the above is true. Additionally, there was a question of these cancellations and timeouts flowing to attached or secondary tenant downloads. A test is added to show that this, in fact, happens. Lastly, a warning message is logged when a download stream is polled after a timeout or cancellation error (currently unexpected) so we can rule it out while troubleshooting.	2024-06-04 14:19:36 +03:00
Konstantin Knizhnik	7006caf3a1	Store logical replication origin in KV storage (#7099 ) Store logical replication origin in KV storage ## Problem See #6977 ## Summary of changes * Extract origin_lsn from commit WAl record * Add ReplOrigin key to KV storage and store origin_lsn * In basebackup replace snapshot origin_lsn with last committed origin_lsn at basebackup LSN ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Alex Chi Z <chi@neon.tech>	2024-06-03 19:37:33 +03:00
Arpad Müller	acf0a11fea	Move keyspace utils to inherent impls (#7929 ) The keyspace utils like `is_rel_size_key` or `is_rel_fsm_block_key` and many others are free functions and have to be either imported separately or specified with the full path starting in `pageserver_api:🔑:`. This is less convenient than if these functions were just inherent impls. Follow-up of #7890 Fixes #6438	2024-06-03 16:18:07 +02:00
Arpad Müller	db477c0b8c	Add metrics for Azure blob storage (#7933 ) In issue #5590 it was proposed to implement metrics for Azure blob storage. This PR implements them except for the part that performs the rename, which is left for a followup. Closes #5590	2024-06-02 14:10:56 +00:00
Joonas Koivunen	ef83f31e77	pagectl: key command for dumping what we know about the key (#7890 ) What we know about the key via added `pagectl key $key` command: - debug formatting - shard placement when `--shard-count` is specified - different boolean queries in `key.rs` - aux files v2 Example: ``` $ cargo run -qp pagectl -- key 000000063F00004005000060270000100E2C parsed from hex: 000000063F00004005000060270000100E2C: Key { field1: 0, field2: 1599, field3: 16389, field4: 24615, field5: 0, field6: 1052204 } rel_block: true rel_vm_block: false rel_fsm_block: false slru_block: false inherited: true rel_size: false slru_segment_size: false recognized kind: None ```	2024-05-31 18:19:41 +00:00
Arthur Petukhovsky	16b2e74037	Add FullAccessTimeline guard in safekeepers (#7887 ) This is a preparation for https://github.com/neondatabase/neon/issues/6337. The idea is to add FullAccessTimeline, which will act as a guard for tasks requiring access to WAL files. Eviction will be blocked on these tasks and WAL won't be deleted from disk until there is at least one active FullAccessTimeline. To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`. After eviction is implemented, this function will be responsible for downloading missing WAL file and waiting until the download finishes. This commit also contains other small refactorings: - Separate `get_tenant_dir` and `get_timeline_dir` functions for building a local path. This is useful for looking at usages and finding tasks requiring access to local filesystem. - `timeline_manager` is now responsible for spawning all background tasks - WAL removal task is now spawned instantly after horizon is updated	2024-05-31 13:19:45 +00:00
Alex Chi Z	1eca8b8a6b	fix(pageserver): ensure to_i128 works for metadata keys (#7895 ) field2 of metadata keys can be 0xFFFF because of the mapping. Allow 0xFFFF for `to_i128`. An alternative is to encode 0xFFFF as 0xFFFFFFFF (which is allowed in the original `to_i128`). But checking the places where field2 is referenced, the rest part of the system does not seem to depend on this assertion. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-30 10:03:17 -04:00
Conrad Ludgate	9a081c230f	proxy: lazily parse startup pg params (#7905 ) ## Problem proxy params being a `HashMap<String,String>` when it contains just ``` application_name: psql database: neondb user: neondb_owner ``` is quite wasteful allocation wise. ## Summary of changes Keep the params in the wire protocol form, eg: ``` application_name\0psql\0database\0neondb\0user\0neondb_owner\0 ``` Using a linear search for the map is fast enough at small sizes, which is the normal case.	2024-05-30 11:02:38 +00:00
Conrad Ludgate	fddd11dd1a	proxy: upload postgres connection options as json in the parquet upload (#7903 ) ## Problem https://github.com/neondatabase/cloud/issues/9943 ## Summary of changes Captures the postgres options, converts them to json, uploads them in parquet.	2024-05-30 11:10:27 +01:00
Arseny Sher	3797566c36	safekeeper: test pull_timeline with WAL gc. Do pull_timeline while WAL is being removed. To this end - extract pausable_failpoint to utils, sprinkle pull_timeline with it - add 'checkpoint' sk http endpoint to force WAL removal. After fixing checking for pull file status code test fails so far which is expected.	2024-05-25 06:06:32 +03:00
Joonas Koivunen	a3f5b83677	chore: lower gate guard drop logging threshold to 100ms (#7862 ) We have some 1001ms cases, which do not yield gate guard context.	2024-05-24 14:07:58 +01:00
John Spray	1455f5a261	pageserver: revert concurrent secondary downloads, make DownloadStream always yield Err after cancel (#7866 ) ## Problem Ongoing hunt for secondary location shutdown hang issues. ## Summary of changes - Revert the functional changes from #7675 - Tweak a log in secondary downloads to make it more apparent when we drop out on cancellation - Modify DownloadStream's behavior to always return an Err after it has been cancelled. This _should_ not impact anything, but it makes the behavior simpler to reason about (e.g. even if the poll function somehow got called again, it could never end up in an un-cancellable state) Related #https://github.com/neondatabase/cloud/issues/13576	2024-05-24 11:45:34 +03:00
Alex Chi Z	e28e46f20b	fix(pageserver): make wal connstr a connstr (#7846 ) The list timeline API gives something like `"wal_source_connstr":"PgConnectionConfig { host: Domain(\"safekeeper-5.us-east-2.aws.neon.build\"), port: 6500, password: Some(REDACTED-STRING) }"`, which is weird. This pull request makes it somehow like a connection string. This field is not used at least in the neon database, so I assume no one is reading or parsing it. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-23 09:45:29 -04:00
Alex Chi Z	ff560a1113	chore(pageserver): use kebab case for compaction algorithms (#7845 ) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 21:28:47 +00:00
Alex Chi Z	ddd8ebd253	chore(pageserver): use kebab case for aux file flag (#7840 ) part of https://github.com/neondatabase/neon/issues/7462 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-22 17:06:00 +00:00
Arseny Sher	b43f6daa48	One more iteration on making walcraft test more robust. Some WAL might be inserted on the page boundary before XLOG_SWITCH lands there, repeat construction in this case.	2024-05-22 14:23:49 +03:00
Arpad Müller	679e031cf6	Add dummy lsn lease http and page service APIs (#7815 ) We want to introduce a concept of temporary and expiring LSN leases. This adds both a http API as well as one for the page service to obtain temporary LSN leases. This adds a dummy implementation to unblock integration work of this API. A functional implementation of the lease feature is deferred to a later step. Fixes #7808 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-21 23:31:20 +02:00
Joonas Koivunen	a8a88ba7bc	test(detach_ancestor): ensure L0 compaction in history is ok (#7813 ) detaching a timeline from its ancestor can leave the resulting timeline with more L0 layers than the compaction threshold. most of the time, the detached timeline has made progress, and next time the L0 -> L1 compaction happens near the original branch point and not near the last_record_lsn. add a test to ensure that inheriting the historical L0s does not change fullbackup. additionally: - add `wait_until_completed` to test-only timeline checkpoint and compact HTTP endpoints. with `?wait_until_completed=true` the endpoints will wait until the remote client has completed uploads. - for delta layers, describe L0-ness with the `/layer` endpoint Cc: #6994	2024-05-21 20:08:43 +03:00
Tristan Partin	e3415706b7	Upgrade Postgres v16 to 16.3	2024-05-21 09:18:11 -05:00
Arseny Sher	478cc37a70	Propagate standby apply LSN to pageserver to hold off GC. To avoid pageserver gc'ing data needed by standby, propagate standby apply LSN through standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Iteration of GC resets the value to remove the horizon when standby goes away -- pushes are assumed to happen at least once between gc iterations. As a safety guard max allowed lag compared to normal GC horizon is hardcoded as 10GB. Add test for the feature. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-21 16:21:29 +03:00
Alex Chi Z	6810d2aa53	feat(pageserver): do not read past image layers for vectored get (#7773 ) ## Problem Part of https://github.com/neondatabase/neon/issues/7462 On metadata keyspace, vectored get will not stop if a key is not found, and will read past the image layer. However, the semantics is different from single get, because if a key does not exist in the image layer, it means that the key does not exist in the past, or have been deleted. This pull request fixed it by recording image layer coverage during the vectored get process and stop when the full keyspace is covered by an image layer. A corresponding test case is added to ensure generating image layer reduces the number of delta layers. This optimization (or bug fix) also applies to rel block keyspaces. If a key is missing, we can know it's missing once the first image layer is reached. Page server will not attempt to read lower layers, which potentially incurs layer downloads + evictions. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-20 14:24:18 -04:00
Arseny Sher	e3f51abadf	safekeeper: close connection when COPY stream ends. We can't gracefully exit COPY mode (and don't need that), so close connection to prevent further attempts to use it.	2024-05-20 15:43:10 +03:00
Heikki Linnakangas	5caee4ca54	Fix calculation in test The comment says that this checks if there's enough space on the page for logical message and an XLOG_SWITCH. So the sizes of the logical message and the XLOG_SWITCH record should be added together, not subtracted. I saw a panic in the test that led me to investigate and notice this (https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7803/9142396223/index.html): RuntimeError: Run ['/tmp/neon/bin/wal_craft', 'in-existing', 'last_wal_record_xlog_switch_ends_on_page_boundary', "host=localhost port=16165 user=cloud_admin dbname=postgres options='-cstatement_timeout=120s '"] failed: stdout: stderr: thread 'main' panicked at libs/postgres_ffi/wal_craft/src/lib.rs:370:27: attempt to subtract with overflow stack backtrace: 0: rust_begin_unwind at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5 1: core::panicking::panic_fmt at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14 2: core::panicking::panic at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5 3: <wal_craft::LastWalRecordXlogSwitchEndsOnPageBoundary as wal_craft::Crafter>::craft::<postgres::client::Client> at libs/postgres_ffi/wal_craft/src/lib.rs:370:27 4: wal_craft::main::{closure#0} at libs/postgres_ffi/wal_craft/src/bin/wal_craft.rs:21:17 5: wal_craft::main at libs/postgres_ffi/wal_craft/src/bin/wal_craft.rs:66:47 6: <fn() -> core::result::Result<(), anyhow::Error> as core::ops::function::FnOnce<()>>::call_once at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.	2024-05-19 21:49:51 +03:00
Alex Chi Z	e1a9669d05	feat(pagebench): add aux file bench (#7746 ) part of https://github.com/neondatabase/neon/issues/7462 ## Summary of changes This pull request adds two APIs to the pageserver management API: list_aux_files and ingest_aux_files. The aux file pagebench is intended to be used on an empty timeline because the data do not go through the safekeeper. LSNs are advanced by 8 for each ingestion, to avoid invariant checks inside the pageserver. For now, I only care about space amplification / read amplification, so the bench is designed in a very simple way: ingest 10000 files, and I will manually dump the layer map to analyze. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-17 20:04:02 +00:00
Alex Chi Z	aaf60819fa	feat(pageserver): persist aux file policy in index part (#7668 ) Part of https://github.com/neondatabase/neon/issues/7462 ## Summary of changes Tenant config is not persisted unless it's attached on the storage controller. In this pull request, we persist the aux file policy flag in the `index_part.json`. Admins can set `switch_aux_file_policy` in the storage controller or using the page server API. Upon the first aux file gets written, the write path will compare the aux file policy target with the current policy. If it is switch-able, we will do the switch. Otherwise, the original policy will be used. The test cases show what the admins can do / cannot do. The `last_aux_file_policy` is stored in `IndexPart`. Updates to the persisted policy are done via `schedule_index_upload_for_aux_file_policy_update`. On the write path, the writer will update the field. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-17 19:22:49 +00:00
John Spray	c84656a53e	pageserver: implement auto-splitting (#7681 ) ## Problem Currently tenants are only split into multiple shards if a human being calls the API to do it. Issue: #7388 ## Summary of changes - Add a pageserver API for returning the top tenants by size - Add a step to the controller's background loop where if there is no reconciliation or optimization to be done, it looks for things to split. - Add a test that runs pgbench on many tenants concurrently, and checks that splitting happens as expected as tenants grow, without interrupting the client I/O. This PR is quite basic: there is a tasklist in https://github.com/neondatabase/neon/issues/7388 for further work. This PR is meant to be safe (off by default), and sufficient to enable our staging environment to run lots of sharded tenants without a human having to set them up.	2024-05-17 16:01:24 +00:00
Andrew Rudenko	923cf91aa4	compute_ctl: catalog API endpoints (#7575 ) ## Problem There are two cloud's features that require extra compute endpoints. 1. We are running pg_dump to get DB schemas. Currently, we are using a special service for this. But it would be great to execute pg_dump in an isolated environment. And we already have such an environment, it's our compute! And likely enough pg_dump already exists there too! (see https://github.com/neondatabase/cloud/issues/11644#issuecomment-2084617832) 2. We need to have a way to get databases and roles from compute after time travel (see https://github.com/neondatabase/cloud/issues/12109) ## Summary of changes It adds two API endpoints to compute_ctl HTTP API that target both of the aforementioned cases. --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-05-16 12:04:16 +02:00
Jure Bajic	affc18f912	Add performance regress `test_ondemand_download_churn.py` (#7242 ) Add performance regress test for on-demand download throughput. Closes https://github.com/neondatabase/neon/issues/7146 Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-05-15 18:41:12 +02:00
Christian Schwarz	c3dd646ab3	chore!: always use async walredo, warn if sync is configured (#7754 ) refs https://github.com/neondatabase/neon/issues/7753 This PR is step (1) of removing sync walredo from Pageserver. Changes: * Remove the sync impl * If sync is configured, warn! and use async instead * Remove the metric that exposes `kind` * Remove the tenant status API that exposes `kind` Future Work ----------- After we've released this change to prod and are sure we won't roll back, we will 1. update the prod Ansible to remove the config flag from the prod pageserver.toml. 2. remove the remaining `kind` code in pageserver These two changes need no release inbetween. See https://github.com/neondatabase/neon/issues/7753 for details.	2024-05-15 15:04:52 +02:00
John Spray	972470b174	pageserver: use adaptive concurrency in secondary layer downloads (#7675 ) ## Problem Secondary downloads are a low priority task, and intentionally do not try to max out download speeds. This is almost always fine when they are used through the life of a tenant shard as a continuous "trickle" of background downloads. However, there are sometimes circumstances where we would like to populate a secondary location as fast as we can, within the constraint that we don't want to impact the activity of attached tenants: - During node removal, where we will need to create replacements for secondary locations on the node being removed - After a shard split, we need new secondary locations for the new shards to populate before the shards can be migrated to their final location. ## Summary of changes - Add an activity() function to the remote storage interface, enabling callers to query how busy the remote storage backend is - In the secondary download code, use a very modest amount of concurrency, driven by the remote storage's state: we only use concurrency if the remote storage semaphore is 75% free, and scale the amount of concurrency used within that range. This is not a super clever form of prioritization, but it should accomplish the key goals: - Enable secondary downloads to happen faster when the system is idle - Make secondary downloads a much lower priority than attached tenants when the remote storage is busy. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-05-13 17:38:30 +00:00
Joonas Koivunen	6206f76419	build: run doctests (#7697 ) While switching to use nextest with the repository in `f28bdb6`, we had not noticed that it doesn't yet support running doctests. Run the doc tests before other tests.	2024-05-10 16:46:50 +02:00
Christian Schwarz	ab10523cc1	remote_storage: AWS_PROFILE with endpoint overrides in ~/.aws/config (updates AWS SDKs) (#7664 ) Before this PR, using the AWS SDK profile feature for running against minio didn't work because * our SDK versions were too old and didn't include https://github.com/awslabs/aws-sdk-rust/issues/1060 and * we didn't massage the s3 client config builder correctly. This PR * udpates all the AWS SDKs we use to, respectively, the latest version I could find on crates.io (Is there a better process?) * changes the way remote_storage constructs the S3 client, and * documents how to run the test suite against real S3 & local minio. Regarding the changes to `remote_storage`: if one reads the SDK docs, it is clear that the recommended way is to use `aws_config::from_env`, then customize. What we were doing instead is to use the `aws_sdk_s3` builder directly. To get the `local-minio` in the added docs working, I needed to update both the SDKs and make the changes to the `remote_storage`. See the commit history in this PR for details. Refs: * byproduct: https://github.com/smithy-lang/smithy-rs/pull/3633 * follow-up on deprecation: https://github.com/neondatabase/neon/issues/7665 * follow-up for scrubber S3 setup: https://github.com/neondatabase/neon/issues/7667	2024-05-09 10:58:38 +02:00
Conrad Ludgate	0c99e5ec6d	proxy: cull http connections (#7632 ) ## Problem Some HTTP client connections can stay open for quite a long time. ## Summary of changes When there are too many HTTP client connections, pick a random connection and gracefully cancel it.	2024-05-07 18:15:06 +01:00
Alex Chi Z	017c34b773	feat(pageserver): generate basebackup from aux file v2 storage (#7517 ) This pull request adds the new basebackup read path + aux file write path. In the regression test, all logical replication tests are run with matrix aux_file_v2=false/true. Also fixed the vectored get code path to correctly return missing key error when being called from the unified sequential get code path. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-07 16:30:18 +00:00
Joonas Koivunen	3c9b484c4d	feat: Timeline detach ancestor (#7456 ) ## Problem Timelines cannot be deleted if they have children. In many production cases, a branch or a timeline has been created off the main branch for various reasons to the effect of having now a "new main" branch. This feature will make it possible to detach a timeline from its ancestor by inheriting all of the data before the branchpoint to the detached timeline and by also reparenting all of the ancestor's earlier branches to the detached timeline. ## Summary of changes - Earlier added copy_lsn_prefix functionality is used - RemoteTimelineClient learns to adopt layers by copying them from another timeline - LayerManager adds support for adding adopted layers - `timeline::Timeline::{prepare_to_detach,complete_detaching}_from_ancestor` and `timeline::detach_ancestor` are added - HTTP PUT handler Cc: #6994 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-05-07 13:47:57 +03:00
John Spray	af849a1f61	pageserver: post-shard-split layer trimming (1/2) (#7572 ) ## Problem After a shard split of a large existing tenant, child tenants can end up with oversized historic layers indefinitely, if those layers are prevented from being GC'd by branchpoints. This PR is followed by https://github.com/neondatabase/neon/pull/7531 Related issue: https://github.com/neondatabase/neon/issues/7504 ## Summary of changes - Add a new compaction phase `compact_shard_ancestors`, which identifies layers that are no longer needed after a shard split. - Add a Timeline->LayerMap code path called `rewrite_layers` , which is currently only used to drop layers, but will later be used to rewrite them as well in https://github.com/neondatabase/neon/pull/7531 - Add a new test that compacts after a split, and checks that something is deleted. Note that this doesn't have much impact on a tenant's resident size (since unused layers would end up evicted anyway), but it: - Makes index_part.json much smaller - Makes the system easier to reason about: avoid having tenants which are like "my physical size is 4TiB but don't worry I'll never actually download it", instead have tenants report the real physical size of what they might download. Why do we remove these layers in compaction rather than during the split? Because we have existing split tenants that need cleaning up. We can add it to the split operation in future as an optimization.	2024-05-07 11:15:58 +01:00
Sasha Krassovsky	7dd58e1449	On-demand WAL download for walsender (#6872 ) ## Problem There's allegedly a bug where if we connect a subscriber before WAL is downloaded from the safekeeper, it creates an error. ## Summary of changes Adds support for pausing safekeepers from sending WAL to computes, and then creates a compute and attaches a subscriber while it's in this paused state. Fails to reproduce the issue, but probably a good test to have --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-05-06 10:54:07 -07:00

1 2 3 4 5 ...

660 Commits