rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-26 01:20:38 +00:00

Author	SHA1	Message	Date
a-masterov	e27ce38619	Add testing for extensions (#7818 ) ## Problem We need automated tests of extensions shipped with Neon to detect possible problems. ## Summary of changes A new image neon-test-extensions is added. Workflow changes to test the shipped extensions are added as well. Currently, the regression tests, shipped with extensions are in use. Some extensions, i.e. rum, timescaledb, rdkit, postgis, pgx_ulid, pgtap, pg_tiktoken, pg_jsonschema, pg_graphql, kq_imcx, wal2json_2_5 are excluded due to problems or absence of internal tests. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-06-11 13:07:51 +02:00
Joonas Koivunen	e46692788e	refactor: Timeline layer flushing (#7993 ) The new features have deteriorated layer flushing, most recently with #7927. Changes: - inline `Timeline::freeze_inmem_layer` to the only caller - carry the TimelineWriterState guard to the actual point of freezing the layer - this allows us to `#[cfg(feature = "testing")]` the assertion added in #7927 - remove duplicate `flush_frozen_layer` in favor of splitting the `flush_frozen_layers_and_wait` - this requires starting the flush loop earlier for `checkpoint_distance < initdb size` tests	2024-06-10 19:34:34 +03:00
Alex Chi Z	a8ca7a1a1d	docs: highlight neon env comes with an initial timeline (#7995 ) Quite a few existing test cases create their own timelines instead of using the default one. This pull request highlights that and hopefully people can write simpler tests in the future. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>	2024-06-10 12:08:16 -04:00
Joonas Koivunen	b52e31c1a4	fix: allow layer flushes more often (#7927 ) As seen with the pgvector 0.7.0 index builds, we can receive large batches of images, leading to very large L0 layers in the range of 1GB. These large layers are produced because we are only able to roll the layer after we have witnessed two different Lsns in a single `DataDirModification::commit`. As the single Lsn batches of images can span over multiple `DataDirModification` lifespans, we will rarely get to write two different Lsns in a single `put_batch` currently. The solution is to remember the TimelineWriterState instead of eagerly forgetting it until we really open the next layer or someone else flushes (while holding the write_guard). Additional changes are test fixes to avoid "initdb image layer optimization" or ignoring initdb layers for assertion. Cc: #7197 because small `checkpoint_distance` will now trigger the "initdb image layer optimization"	2024-06-10 13:50:17 +00:00
Heikki Linnakangas	5a7e285c2c	Simplify scanning compute logs in tests (#7997 ) Implement LogUtils in the Endpoint fixture class, so that the "log_contains" function can be used on compute logs too. Per discussion at: https://github.com/neondatabase/neon/pull/7288#discussion_r1623633803	2024-06-10 12:52:49 +00:00
Christian Schwarz	ae5badd375	Revert "Include openssl and ICU statically linked" (#8003 ) Reverts neondatabase/neon#7956 Rationale: compute incompatibilties Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH Relevant quotes from @hlinnaka > If we go through with the current release candidate, but the compute is pinned, people who create new projects will get that warning, which is silly. To them, it looks like the ICU version was downgraded, because initdb was run with newer version. > We should upgrade the ICU version eventually. And when we do that, users with old projects that use ICU will start to see that warning. I think that's acceptable, as long as we do homework, notify users, and communicate that properly. > When do that, we should to try to upgrade the storage and compute versions at roughly the same time.	2024-06-10 13:20:20 +02:00
Alex Chi Z	3e63d0f9e0	test(pageserver): quantify compaction outcome (#7867 ) A simple API to collect some statistics after compaction to easily understand the result. The tool reads the layer map, and analyze range by range instead of doing single-key operations, which is more efficient than doing a benchmark to collect the result. It currently computes two key metrics: * Latest data access efficiency, which finds how many delta layers / image layers the system needs to iterate before returning any key in a key range. * (Approximate) PiTR efficiency, as in https://github.com/neondatabase/neon/issues/7770, which is simply the number of delta files in the range. The reason behind that is, assume no image layer is created, PiTR efficiency is simply the cost of collect records from the delta layers, and the replay time. Number of delta files (or in the future, estimated size of reads) is a simple yet efficient way of estimating how much effort the page server needs to reconstruct a page. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-10 10:42:13 +02:00
Rahul Patil	3b647cd55d	Include openssl and ICU statically linked (#7956 ) ## Problem Due to the upcoming End of Life (EOL) for Debian 11, we need to upgrade the base OS for Pageservers from Debian 11 to Debian 12 for security reasons. When deploying a new Pageserver on Debian 12 with the same binary built on Debian 11, we encountered the following errors: ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v16/bin/initdb: error while loading shared libraries: libicuuc.so.67: cannot open shared object file: No such file or directory ``` and ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v14/bin/initdb: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory ``` These issues occur when creating new projects. ## Summary of changes - To address these issues, we configured PostgreSQL build to use statically linked OpenSSL and ICU libraries. - This resolves the missing shared library errors when running the binaries on Debian 12. Closes: https://github.com/neondatabase/cloud/issues/12648 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [x] Do not forget to reformat commit message to not include the above checklist	2024-06-07 17:28:10 +00:00
Tristan Partin	26c68f91f3	Move SQL migrations out of line It makes them much easier to reason about, and allows other SQL tooling to operate on them like language servers, formatters, etc. I also brought back the removed migrations such that we can more easily understand what they were. I included a "-- SKIP" comment describing why those migrations are now skipped. We no longer skip migrations by checking if it is empty, but instead check to see if the migration starts with "-- SKIP".	2024-06-07 08:35:55 -07:00
a-masterov	2078dc827b	CI: copy run-* labels from external contributors' PRs (#7915 ) ## Problem We don't carry run-* labels from external contributors' PRs to ci-run/pr-* PRs. This is not really convenient. Need to sync labels in approved-for-ci-run workflow. ## Summary of changes Added the procedure of transition of labels from the original PR ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-07 10:04:59 +02:00
Joonas Koivunen	8ee191c271	test_local_only_layers_after_crash: various fixes (#7986 ) In #7927 I needed to fix this test case, but the fixes should be possible to land irrespective of the layer ingestion code change. The most important fix is the behavior if an image layer is found: the assertion message formatting raises a runtime error, which obscures the fact that we found an image layer.	2024-06-07 10:18:05 +03:00
Anastasia Lubennikova	66c6b270f1	Downgrade No response from reading prefetch entry WARNING to LOG	2024-06-06 20:56:19 +01:00
Arthur Petukhovsky	e4e444f59f	Remove random sleep in partial backup (#7982 ) We had a random sleep in the beginning of partial backup task, which was needed for the first partial backup deploy. It helped with gradual upload of segments without causing network overload. Now partial backup is deployed everywhere, so we don't need this random sleep anymore. We also had an issue related to this, in which manager task was not shut down for a long time. The cause of the issue is this random sleep that didn't take timeline cancellation into account, meanwhile manager task waited for partial backup to complete. Fixes https://github.com/neondatabase/neon/issues/7967	2024-06-06 17:54:44 +00:00
Joonas Koivunen	d46d19456d	raise the warning for oversized L0 to 2target (#7985 ) currently we warn even by going over a single byte. even that will be hit much more rarely once #7927 lands, but get this in earlier. rationale for 2checkpoint_distance: anything smaller is not really worth a warn. we have an global allowed_error for this warning, which still cannot be removed nor can it be removed with #7927 because of many tests with very small `checkpoint_distance`.	2024-06-06 20:18:39 +03:00
Alex Chi Z	5d05013857	fix(pageserver): skip metadata compaction is LSN is not accumulated enough (#7962 ) close https://github.com/neondatabase/neon/issues/7937 Only trigger metadata image layer creation if enough delta layers are accumulated. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 11:34:44 -04:00
Alex Chi Z	014509987d	fix(pageserver): more flexible layer size test (#7945 ) M-series macOS has different alignments/size for some fields (which I did not investigate in detail) and therefore this test cannot pass on macOS. Fixed by using `<=` for the comparison so that we do not test for an exact match. observed by @yliang412 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 14:40:58 +00:00
Arpad Müller	75bca9bb19	Perform retries on azure bulk deletion (#7964 ) This adds retries to the bulk deletion, because if there is a certain chance n that a request fails, the chance that at least one of the requests in a chain of requests fails increases exponentially. We've had similar issues with the S3 DR tests, which in the end yielded in adding retries at the remote_storage level. Retries at the top level are not sufficient when one remote_storage "operation" is multiple network requests in a trench coat, especially when there is no notion of saving the progress: even if prior deletions had been successful, we'd still need to get a 404 in order to continue the loop and get to the point where we failed in the last iteration. Maybe we'll fail again but before we've even reached it. Retries at the bottom level avoid this issue because they have the notion of progress and also when one network operation fails, only that operation is retried. First part of #7931.	2024-06-06 14:21:27 +00:00
Joonas Koivunen	a8be07785e	fix: do TimelineMetrics::shutdown only once (#7983 ) Related to #7341 tenant deletion will end up shutting down timelines twice, once before actually starting and the second time when per timeline deletion is requested. Shutting down TimelineMetrics causes underflows. Add an atomic boolean and only do the shutdown once.	2024-06-06 14:20:54 +00:00
Yuchen Liang	630cfbe420	refactor(pageserver): designated api error type for cancelled request (#7949 ) Closes #7406. ## Problem When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is exposed to handle that case, which verbosely logs the error. However, we don't benefit from having the full backtrace provided by anyhow in this case. ## Summary of changes This PR introduces a new `ApiError` type to handle errors caused by cancelled request more robustly. - A new enum variant `ApiError::Cancelled` - Currently the cancelled request is mapped to status code 500. - Need to handle this error in proxy's `http_util` as well. - Added a failpoint test to simulate cancelled `get_lsn_by_timestamp` request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-06 14:00:14 +00:00
Christian Schwarz	0a65333fff	chore(walredo): avoid duplicate tenant_id and shard_slug fields (#7977 ) spotted during reviews of async walredo work in #6628	2024-06-06 15:10:16 +02:00
John Spray	91dd99038e	pageserver/controller: enable tenant deletion without attachment (#7957 ) ## Problem As described in #7952, the controller's attempt to reconcile a tenant before finally deleting it can get hung up waiting for the compute notification hook to accept updates. The fact that we try and reconcile a tenant at all during deletion is part of a more general design issue (#5080), where deletion was implemented as an operation on attached tenant, requiring the tenant to be attached in order to delete it, which is not in principle necessary. Closes: #7952 ## Summary of changes - In the pageserver deletion API, only do the traditional deletion path if the tenant is attached. If it's secondary, then tear down the secondary location, and then do a remote delete. If it's not attached at all, just do the remote delete. - In the storage controller, instead of ensuring a tenant is attached before deletion, do a best-effort detach of the tenant, and then call into some arbitrary pageserver to issue a deletion of remote content. The pageserver retains its existing delete behavior when invoked on attached locations. We can remove this later when all users of the API are updated to either do a detach-before-delete. This will enable removing the "weird" code paths during startup that sometimes load a tenant and then immediately delete it, and removing the deletion markers on tenants.	2024-06-05 20:22:54 +00:00
Christian Schwarz	83ab14e271	chore!: remove walredo_process_kind config option & kind type (#7756 ) refs https://github.com/neondatabase/neon/issues/7753 Preceding PR https://github.com/neondatabase/neon/pull/7754 laid out the plan, this one wraps it up.	2024-06-05 14:21:10 +02:00
Peter Bendel	85ef6b1645	upgrade pgvector from 0.7.0 to 0.7.1 (#7954 ) ## Problem ## Summary of changes performance improvements in pgvector 0.7.1 for hnsw index builds, see https://github.com/pgvector/pgvector/issues/570	2024-06-05 10:32:03 +02:00
Alex Chi Z	1a8d53ab9d	feat(pageserver): compute aux file size on initial logical size calculation (#7958 ) close https://github.com/neondatabase/neon/issues/7822 close https://github.com/neondatabase/neon/issues/7443 Aux file metrics is computed incrementally. If the size is not initialized, the metrics will never show up. This pull request adds the functionality to compute the aux file size on initial logical size calculation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-04 13:47:48 -04:00
Joonas Koivunen	3d6e389aa2	feat: support changing IndexPart::metadata_bytes to json in future release (#7693 ) ## Problem Currently we serialize the `TimelineMetadata` into bytes to put it into `index_part.json`. This `Vec<u8>` (hopefully `[u8; 512]`) representation was chosen because of problems serializing TimelineId and Lsn between different serializers (bincode, json). After #5335, the serialization of those types became serialization format aware or format agnostic. We've removed the pageserver local `metadata` file writing in #6769. ## Summary of changes Allow switching from the current serialization format to plain JSON for the legacy TimelineMetadata format in the future by adding a competitive serialization method to the current one (`crate::tenant::metadata::modern_serde`), which accepts both old bytes and new plain JSON. The benefits of this are that dumping the index_part.json with pretty printing no longer produces more than 500 lines of output, but after enabling it produces lines only proportional to the layer count, like: ```json { "version": ???, "layer_metadata": { ... }, "disk_consistent_lsn": "0/15FD5D8", "legacy_metadata": { "disk_consistent_lsn": "0/15FD5D8", "prev_record_lsn": "0/15FD5A0", "ancestor_timeline": null, "ancestor_lsn": "0/0", "latest_gc_cutoff_lsn": "0/149FD18", "initdb_lsn": "0/149FD18", "pg_version": 15 } } ``` In the future, I propose we completely stop using this legacy metadata type and wasting time trying to come up with another version numbering scheme in addition to the informative-only one already found in `index_part.json`, and go ahead with storing metadata or feature flags on the `index_part.json` itself. #7699 is the "one release after" changes which starts to produce metadata in the index_part.json as json.	2024-06-04 19:36:22 +03:00
Christian Schwarz	17116f2ea9	fix(pageserver): abort on duplicate layers, before doing damage (#7799 ) fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most of the issue description here for posterity) # Background From the time before always-authoritative `index_part.json`, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen: `a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)` As of #5198 , we should not be exposed to that problem anymore. # Problem 1 We still have 1. [code in Pageserver](`82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521)`) than handles duplicate layers 2. [tests in the test suite](`d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15)`) that demonstrates the problem using a failpoint However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production. What is does instead is to return early with an `Ok()`, so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised. That "return early" would be a bug in the routine if it happened in production. So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior. # Problem 2 Further, if production code _did_ (it nowawdays doesn't!) create a duplicate layer, the code in Pageserver that handles the condition (item 1 above) is too little and too late: * the code handles it by discarding the newer `struct Layer`; that's good. * however, on disk, we have already overwritten the old with the new layer file * the fact that we do it atomically doesn't matter because ... * if the new layer file is not bit-identical, then we have a cache coherency problem * PS PageCache block cache: caches old bit battern * blob_io offsets stored in variables, based on pre-overwrite bit pattern / offsets * => reading based on these offsets from the new file might yield different data than before # Solution - Remove the test suite code pertaining to Problem 1 - Move & rename test suite code that actually tests RFC-27 crash-consistent layer map. - Remove the Pageserver code that handles duplicate layers too late (Problem 1) - Use `RENAME_NOREPLACE` to prevent over-rename the file during `.finish()`, bail with an error if it happens (Problem 2) - This bailing prevents the caller from even trying to insert into the layer map, as they don't even get a `struct Layer` at hand. - Add `abort`s in the place where we have the layer map lock and check for duplicates (Problem 2) - Note again, we can't reach there because we bail from `.finish()` much earlier in the code. - Share the logic to clean up after failed `.finish()` between image layers and delta layers (drive-by cleanup) - This exposed that test `image_layer_rewrite` was overwriting layer files in place. Fix the test. # Future Work This PR adds a new failure scenario that was previously "papered over" by the overwriting of layers: 1. Start a compaction that will produce 3 layers: A, B, C 2. Layer A is `finish()`ed successfully. 3. Layer B fails mid-way at some `put_value()`. 4. Compaction bails out, sleeps 20s. 5. Some disk space gets freed in the meantime. 6. Compaction wakes from sleep, another iteration starts, it attempts to write Layer A again. But the `.finish()` fails because A already exists on disk. The failure in step 5 is new with this PR, and it causes the compaction to get stuck. Before, it would silently overwrite the file and "successfully" complete the second iteration. The mitigation for this is to `/reset` the tenant.	2024-06-04 16:16:23 +00:00
John Spray	fd22fc5b7d	pageserver: include heatmap in tenant deletion (#7928 ) ## Problem This was an oversight when adding heatmaps: because they are at the top level of the tenant, they aren't included in the catch-all list & delete that happens for timeline paths. This doesn't break anything, but it leaves behind a few kilobytes of garbage in the S3 bucket after a tenant is deleted, generating work for the scrubber. ## Summary of changes - During deletion, explicitly remove the heatmap file - In test_tenant_delete_smoke, upload a heatmap so that the test would fail its "remote storage empty after delete" check if we didn't delete it.	2024-06-04 16:16:50 +01:00
Joonas Koivunen	0112097e13	feat(rtc): maintain dirty and uploaded IndexPart (#7833 ) RemoteTimelineClient maintains a copy of "next IndexPart" as a number of fields which are like an IndexPart but this is not immediately obvious. Instead of multiple fields, maintain a `dirty` ("next IndexPart") and `clean` ("uploaded IndexPart") fields. Additional cleanup: - rename `IndexPart::disk_consistent_lsn` accessor `duplicated_disk_consistent_lsn` - no one except scrubber should be looking at it, even scrubber is a stretch - remove usage elsewhere (pagectl used by tests, metadata scan endpoint) - serialize index part before the index upload operation - avoid upload operation being retried because of serialization error - serialization error is fatal anyway for timeline -- it can only make transient local progress after that, at least the error is bubbled up now - gather exploded IndexPart fields into single actual `UploadQueueInitialized::dirty` of which the uploaded snapshot is serialized - implement the long wished monotonicity check with the `clean` IndexPart with an assertion which is not expected to fire Continued work from #7860 towards next step of #6994.	2024-06-04 17:27:08 +03:00
Joonas Koivunen	9d4c113f9b	build(Dockerfile.compute-node): do not log tar contents (#7953 ) in build logs we get a lot of lines for building the compute node images because of verbose tar unpack. we know the sha256 so we don't need to log the contents. my hope is that this will allow us more reliably use the github live updating log view.	2024-06-04 12:42:57 +01:00
Joonas Koivunen	0acb604fa3	test: no missed wakeups, cancellation and timeout flow to downloads (#7863 ) I suspected a wakeup could be lost with `remote_storage::support::DownloadStream` if the cancellation and inner stream wakeups happen simultaneously. The next poll would only return the cancellation error without setting the wakeup. There is no lost wakeup because the single future for getting the cancellation error is consumed when the value is ready, and a new future is created for the next value. The new future is always polled. Similarly, if only the `Stream::poll_next` is being used after a `Some(_)` value has been yielded, it makes no sense to have an expectation of a wakeup for the (N+1)th stream value already set because when a value is wanted, `Stream::poll_next` will be called. A test is added to show that the above is true. Additionally, there was a question of these cancellations and timeouts flowing to attached or secondary tenant downloads. A test is added to show that this, in fact, happens. Lastly, a warning message is logged when a download stream is polled after a timeout or cancellation error (currently unexpected) so we can rule it out while troubleshooting.	2024-06-04 14:19:36 +03:00
Konstantin Knizhnik	387a36874c	Set page LSN when reconstructing VM in page server (#7935 ) ## Problem Page LSN is not set while VM update. May be reason of test_vm_bits flukyness. Buit more serious issues can be also caused by wrong LSN. Related: https://github.com/neondatabase/neon/pull/7935 ## Summary of changes - In `apply_in_neon`, set the LSN bytes when applying records of type `ClearVisibilityMapFlags`	2024-06-04 09:56:03 +01:00
Anna Khanova	00032c9d9f	[proxy] Fix dynamic rate limiter (#7950 ) ## Problem There was a bug in dynamic rate limiter, which exhausted CPU in proxy and proxy wasn't able to accept any connections. ## Summary of changes 1. `if self.available > 1` -> `if self.available >= 1` 2. remove `timeout_at` to use just timeout 3. remove potential infinite loops which can exhaust CPUs.	2024-06-04 05:07:54 +01:00
John Spray	11bb265de1	pageserver: don't squash all image layer generation errors into anyhow::Error (#7943 ) ## Problem CreateImageLayersError and CompactionError had proper From implementations, but compact_legacy was explicitly squashing all image layer errors into an anyhow::Error anyway. This led to errors like: ``` Error processing HTTP request: InternalServerError(timeline shutting down Stack backtrace: 0: <<anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from as core::ops::function::FnOnce<(pageserver::tenant::timeline::CreateImageLayersError,)>>::call_once at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5 1: <core::result::Result<alloc::vec::Vec<pageserver::tenant::storage_layer::layer::ResidentLayer>, pageserver::tenant::timeline::CreateImageLayersError>>::map_err::<anyhow::Error, <anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from> at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:829:27 2: <pageserver::tenant::timeline::Timeline>::compact_legacy::{closure#0} at pageserver/src/tenant/timeline/compaction.rs:125:36 3: <pageserver::tenant::timeline::Timeline>::compact::{closure#0} at pageserver/src/tenant/timeline.rs:1719:84 4: pageserver::http::routes::timeline_checkpoint_handler::{closure#0}::{closure#0} ``` Closes: https://github.com/neondatabase/neon/issues/7861	2024-06-03 22:10:13 +02:00
John Spray	69026a9a36	storcon_cli: add 'drop' and eviction interval utilities (#7938 ) The storage controller has 'drop' APIs for tenants and nodes, for use in situations where something weird has happened: - node-drop is useful until we implement proper node decom, or if we have a partially provisioned node that somehow gets registered with the storage controller but is then dead. - tenant-drop is useful if we accidentally add a tenant that shouldn't be there at all, or if we want to make the controller forget about a tenant without deleting its data. For example, if one uses the tenant-warmup command with a bad tenant ID and needs to clean that up. The drop commands require an `--unsafe` parameter, to reduce the chance that someone incorrectly assumes these are the normal/clean ways to delete things. This PR also adds a convenience command for setting the time based eviction parameters on a tenant. This is useful when onboarding an existing tenant that has high resident size due to storage amplification in compaction: setting a lower time based eviction threshold brings down the resident size ahead of doing a shard split.	2024-06-03 18:13:01 +00:00
Konstantin Knizhnik	7006caf3a1	Store logical replication origin in KV storage (#7099 ) Store logical replication origin in KV storage ## Problem See #6977 ## Summary of changes * Extract origin_lsn from commit WAl record * Add ReplOrigin key to KV storage and store origin_lsn * In basebackup replace snapshot origin_lsn with last committed origin_lsn at basebackup LSN ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Alex Chi Z <chi@neon.tech>	2024-06-03 19:37:33 +03:00
John Spray	69d18d6429	s3_scrubber: add `pageserver-physical-gc` (#7925 ) ## Problem Currently, we leave `index_part.json` objects from old generations behind each time a pageserver restarts or a tenant is migrated. This doesn't break anything, but it's annoying when a tenant has been around for a long time and starts to accumulate 10s-100s of these. Partially implements: #7043 ## Summary of changes - Add a new `pageserver-physical-gc` command to `s3_scrubber` The name is a bit of a mouthful, but I think it makes sense: - GC is the accurate term for what we are doing here: removing data that takes up storage but can never be accessed. - "physical" is a necessary distinction from the "normal" GC that we do online in the pageserver, which operates at a higher level in terms of LSNs+layers, whereas this type of GC is purely about S3 objects. - "pageserver" makes clear that this command deals exclusively with pageserver data, not safekeeper.	2024-06-03 17:16:23 +01:00
Arpad Müller	acf0a11fea	Move keyspace utils to inherent impls (#7929 ) The keyspace utils like `is_rel_size_key` or `is_rel_fsm_block_key` and many others are free functions and have to be either imported separately or specified with the full path starting in `pageserver_api:🔑:`. This is less convenient than if these functions were just inherent impls. Follow-up of #7890 Fixes #6438	2024-06-03 16:18:07 +02:00
Alex Chi Z	c1f55c1525	feat(pageserver): collect aux file tombstones (#7900 ) close https://github.com/neondatabase/neon/issues/7800 This is a small change to enable the tombstone -> exclude from image layer path. Most of the pull request is unit tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-03 09:56:36 -04:00
Joonas Koivunen	34f450c05a	test: allow no vectored gets happening (#7939 ) when running the regress tests locally without any environment variables we use on CI, `test_pageserver_compaction_smoke` fails with division by zero. fix it temporarily by allowing no vectored read happening. to be cleaned when vectored get validation gets removed and the default value can be changed. Cc: https://github.com/neondatabase/neon/issues/7381	2024-06-03 09:37:11 -04:00
Arpad Müller	db477c0b8c	Add metrics for Azure blob storage (#7933 ) In issue #5590 it was proposed to implement metrics for Azure blob storage. This PR implements them except for the part that performs the rename, which is left for a followup. Closes #5590	2024-06-02 14:10:56 +00:00
Arthur Petukhovsky	a345cf3fc6	Fix span for WAL removal task (#7930 ) During refactoring in https://github.com/neondatabase/neon/pull/7887 I forgot to add "WAL removal" span with ttid. This commit fixes it.	2024-06-01 12:23:59 +01:00
Arthur Petukhovsky	e98bc4fd2b	Run gc on too many partial backup segments (#7700 ) The general partial backup idea is that each safekeeper keeps only one partial segment in remote storage at a time. Sometimes this is not true, for example if we uploaded object to S3 but got an error when tried to remove the previous upload. In this case we still keep a list of all potentially uploaded objects in safekeeper state. This commit prints a warning to logs if there is too many objects in safekeeper state. This is not expected and we should try to fix this state, we can do this by running gc. I haven't seen this being an issue anywhere, but printing a warning is something that I wanted to do and forgot in initial PR.	2024-06-01 00:18:56 +01:00
John Spray	7e60563910	pageserver: add GcError type (#7917 ) ## Problem - Because GC exposes all errors as an anyhow::Error, we have intermittent issues with spurious log errors during shutdown, e.g. in this failure of a performance test https://neon-github-public-dev.s3.amazonaws.com/reports/main/9300804302/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/214a2154f6f0217a/ ``` Gc failed 1 times, retrying in 2s: shutting down ``` GC really doesn't do a lot of complicated IO: it doesn't benefit from the backtrace capabilities of anyhow::Error, and can be expressed more robustly as an enum. ## Summary of changes - Add GcError type and use it instead of anyhow::Error in GC functions - In `gc_iteration_internal`, return GcError::Cancelled on shutdown rather than Ok(()) (we only used Ok before because we didn't have a clear cancellation error variant to use). - In `gc_iteration_internal`, skip past timelines that are shutting down, to avoid having to go through another GC iteration if we happen to see a deleting timeline during a GC run. - In `refresh_gc_info_internal`, avoid an error case where a timeline might not be found after being looked up, by carrying an Arc<Timeline> instead of a TimelineId between the first loop and second loop in the function. - In HTTP request handler, handle Cancelled variants as 503 instead of turning all GC errors into 500s.	2024-05-31 22:20:06 +01:00
Joonas Koivunen	ef83f31e77	pagectl: key command for dumping what we know about the key (#7890 ) What we know about the key via added `pagectl key $key` command: - debug formatting - shard placement when `--shard-count` is specified - different boolean queries in `key.rs` - aux files v2 Example: ``` $ cargo run -qp pagectl -- key 000000063F00004005000060270000100E2C parsed from hex: 000000063F00004005000060270000100E2C: Key { field1: 0, field2: 1599, field3: 16389, field4: 24615, field5: 0, field6: 1052204 } rel_block: true rel_vm_block: false rel_fsm_block: false slru_block: false inherited: true rel_size: false slru_segment_size: false recognized kind: None ```	2024-05-31 18:19:41 +00:00
John Spray	9fda85b486	pageserver: remove AncestorStopping error variants (#7916 ) ## Problem In all cases, AncestorStopping is equivalent to Cancelled. This became more obvious in https://github.com/neondatabase/neon/pull/7912#discussion_r1620582309 when updating these error types. ## Summary of changes - Remove AncestorStopping, always use Cancelled instead	2024-05-31 17:02:10 +01:00
Alex Chi Z	87afbf6b24	test(pageserver): add test interface to create artificial layers (#7899 ) This pull request adds necessary interfaces to deterministically create scenarios we want to test. Simplify some test cases to use this interface to make it stable + reproducible. Compaction test will be able to use this interface. Also the upcoming delete tombstone tests will use this interface to make test reproducible. ## Summary of changes * `force_create_image_layer` * `force_create_delta_layer` * `force_advance_lsn` * `create_test_timeline_with_states` * `branch_timeline_test_with_states` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-05-31 12:00:40 -04:00
Arthur Petukhovsky	16b2e74037	Add FullAccessTimeline guard in safekeepers (#7887 ) This is a preparation for https://github.com/neondatabase/neon/issues/6337. The idea is to add FullAccessTimeline, which will act as a guard for tasks requiring access to WAL files. Eviction will be blocked on these tasks and WAL won't be deleted from disk until there is at least one active FullAccessTimeline. To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`. After eviction is implemented, this function will be responsible for downloading missing WAL file and waiting until the download finishes. This commit also contains other small refactorings: - Separate `get_tenant_dir` and `get_timeline_dir` functions for building a local path. This is useful for looking at usages and finding tasks requiring access to local filesystem. - `timeline_manager` is now responsible for spawning all background tasks - WAL removal task is now spawned instantly after horizon is updated	2024-05-31 13:19:45 +00:00
John Spray	5a394fde56	pageserver: avoid spurious "bad state" logs/errors during shutdown (#7912 ) ## Problem - Initial size calculations tend to fail with `Bad state (not active)` Closes: https://github.com/neondatabase/neon/issues/7911 ## Summary of changes - In `wait_lsn`, return WaitLsnError::Cancelled rather than BadState when the state is Stopping - Replace PageReconstructError's `Other` variant with a specific `BadState` variant - Avoid returning anyhow::Error from get_ready_ancestor_timeline -- this was only used for the case where there was no ancestor. All callers of this function had implicitly checked that the ancestor timeline exists before calling it, so they can pass in the ancestor instead of handling an error.	2024-05-31 13:31:42 +01:00
Arseny Sher	7ec70b5eff	safekeeper: rename epoch to last_log_term. epoch is a historical and potentially confusing name. It semantically means lastLogTerm from the raft paper, so let's use it. This commit changes only internal namings, not public interface (http).	2024-05-31 12:59:13 +03:00
Arseny Sher	1fcc2b37eb	Add test checking term change during pull_timeline.	2024-05-31 12:58:59 +03:00

1 2 3 4 5 ...

5392 Commits