rust/neon - neon - Gitea: Git with a cup of tea

mirror of https://github.com/neondatabase/neon.git synced 2026-01-09 06:22:57 +00:00

Author	SHA1	Message	Date
a-masterov	9dda13ecce	Add the image version to the neon-test-extensions image (#8032) ## Problem The version was missing from the image name, causing an error during the workflow ## Summary of changes Added the version to the image name	2024-06-12 18:15:20 +02:00
Peter Bendel	9ba9f32dfe	Reactivate page bench test in CI after ignoring CopyFail error in pageserver (#8023) ## Problem The page bench test case test_pageserver_max_throughput_getpage_at_latest_lsn had been deactivated because it was flaky. We now ignore copy fail error messages like in `270d3be507/test_runner/regress/test_pageserver_getpage_throttle.py (L17-L20)` and want to reactivate it to see if it is still flaky ## Summary of changes - reactivate the test in CI - ignore CopyFail error message during page bench test cases	2024-06-12 16:10:57 +02:00
Vlad Lazar	3099e1a787	storcon_cli: do not drain to undesirable nodes (#8027) ## Problem The previous code would attempt to drain to unavailable or unschedulable nodes. ## Summary of Changes Remove such nodes from the list of nodes to fill.	2024-06-12 12:33:54 +01:00
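A minimal sketch of the filtering step, assuming a simplified `Node` type with `available` and `schedulable` flags. These names are hypothetical; the real storcon_cli types are not shown in this log.

```rust
/// Simplified, hypothetical view of a pageserver node as seen by the drain command.
#[derive(Debug, Clone)]
pub struct Node {
    pub id: u64,
    pub available: bool,
    pub schedulable: bool,
}

/// Returns the ids of nodes that may receive attachments during a drain:
/// nodes outside the draining set that are both available and schedulable.
pub fn fill_candidates(nodes: &[Node], draining: &[u64]) -> Vec<u64> {
    nodes
        .iter()
        .filter(|n| !draining.contains(&n.id))
        .filter(|n| n.available && n.schedulable)
        .map(|n| n.id)
        .collect()
}
```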
a-masterov	f749437cec	Resolve the problem the docker compose caused by the extensions tests (#8024) ## Problem Merging #7818 broke the docker-compose file: running docker compose became impossible because the neon-test-extensions:latest image was unavailable ## Summary of changes Fix the problem: add the latest tag to the neon-test-extensions image and use the profiles feature of the docker-compose file to avoid loading the neon-test-extensions container if it is not needed.	2024-06-12 12:25:13 +02:00
Heikki Linnakangas	0a256148b0	Update documentation on running locally with Docker (#8020) - Fix the Docker Hub URLs - The `neondatabase/compute-node` image has been replaced with Postgres version-specific images like `neondatabase/compute-node-v16` - Use TAG=latest in the example, rather than some old tag. That's a sensible default for people to copy-paste - For convenience, use a Postgres connection URL in the `psql` example that also includes the password. That way, there's no need to set up .pgpass - Update the image names in the `docker ps` example to match what you get when you follow the example	2024-06-12 07:06:00 +00:00
Heikki Linnakangas	69aa1aca35	Update default Postgres version in docker-compose.yml (#8019) Let's be modern.	2024-06-12 09:19:24 +03:00
Heikki Linnakangas	9983ae291b	Another attempt at making test_vm_bits less flaky (#7989) - Split the first and second parts of the test into two separate tests - In the first test, disable the aggressive GC, compaction, and autovacuum. They are only needed by the second test. I'd like to get the first test to a point where the VM page is never all-zeros. Disabling autovacuum in the first test is hopefully enough to accomplish that. - Compare the full page images, don't skip the page header. After fixing the previous point, there should be no discrepancy. The LSN still won't match, though, because of commit `387a36874c`. Fixes issue https://github.com/neondatabase/neon/issues/7984	2024-06-12 09:18:52 +03:00
Sasha Krassovsky	b7a0c2b614	Add On-demand WAL Download to logicalfuncs (#7960) We implemented on-demand WAL download for walsender, but other things that may want to read the WAL from safekeepers don't do that yet. This PR adds the same set of hooks to logicalfuncs so that it downloads WAL on demand too. Addresses https://github.com/neondatabase/neon/issues/7959 Also relies on: https://github.com/neondatabase/postgres/pull/438 https://github.com/neondatabase/postgres/pull/437 https://github.com/neondatabase/postgres/pull/436	2024-06-11 17:59:32 -07:00
Arpad Müller	27518676d7	Rename S3 scrubber to storage scrubber (#8013) The S3 scrubber contains "S3" in its name, but we want to make it generic in terms of which storage is used (#7547). Therefore, rename it to "storage scrubber", following the naming scheme of already existing components "storage broker" and "storage controller". Part of #7547	2024-06-11 22:45:22 +00:00
Heikki Linnakangas	78a59b94f5	Copy editor config for the neon extension from PostgreSQL (#8009) This makes IDEs and GitHub diffs format the code the same way as the PostgreSQL sources, which is the style we try to maintain.	2024-06-11 23:19:18 +03:00
Vlad Lazar	7121db3669	storcon_cli: add 'drain' command (#8007) ## Problem We need the ability to prepare a subset of storage controller managed pageservers for decommissioning. The storage controller cannot currently express this in terms of scheduling constraints (it's a pretty special case, so I'm not sure it even should). ## Summary of Changes A new `drain` command is added to `storcon_cli`. It takes a set of nodes to drain and migrates primary attachments outside of said set. Simple round-robin assignment is used under the assumption that nodes outside of the draining set are evenly balanced. Note that secondary locations are not migrated. This is fine for staging, but the migration API will have to be extended for prod in order to allow migration of secondaries as well. I've tested this out against a neon local cluster. The immediate use for this command will be to migrate staging to ARM (AArch64) pageservers. Related https://github.com/neondatabase/cloud/issues/14029	2024-06-11 16:39:38 +00:00
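The round-robin assignment described above can be sketched as follows. Shard and node ids are plain `u64` values and `plan_drain` is a hypothetical helper, not the storcon_cli code; like the PR, it assumes the target nodes are evenly balanced and ignores secondary locations.

```rust
use std::collections::HashMap;

/// Plans a drain: every shard whose attachment lives on a draining node is
/// assigned to one of `targets`, cycling through them in round-robin order.
/// `attachments` holds (shard_id, node_id) pairs. Returns `None` if there is
/// something to move but no target node to move it to.
pub fn plan_drain(
    attachments: &[(u64, u64)],
    draining: &[u64],
    targets: &[u64],
) -> Option<HashMap<u64, u64>> {
    let mut next_target = targets.iter().cycle();
    let mut plan = HashMap::new();
    for &(shard, node) in attachments {
        if draining.contains(&node) {
            // `cycle()` over an empty slice yields nothing, so `?` returns None.
            plan.insert(shard, *next_target.next()?);
        }
    }
    Some(plan)
}
```

Round-robin only produces a balanced result if the targets start out balanced, which is exactly the assumption the PR states.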
Vlad Lazar	126bcc3794	storcon: track number of attached shards for each node (#8011) ## Problem The storage controller does not track the number of shards attached to a given pageserver. This is a requirement for various scheduling operations (e.g. draining and filling will use this to figure out if the cluster is balanced) ## Summary of Changes Track the number of shards attached to each node. Related https://github.com/neondatabase/neon/issues/7387	2024-06-11 16:03:25 +01:00
Alex Chi Z	4c2100794b	feat(pageserver): initial code sketch & test case for combined gc+compaction at gc_horizon (#7948) A demo for a building block for compaction. The GC-compaction operation iterates over all layers below or intersecting the GC horizon and does a full rewrite of all of them. The end result will be an image layer covering the full keyspace at the GC horizon, and a bunch of delta layers above the GC horizon. This helps us collect the garbage of the test_gc_feedback test case to reduce space amplification. This operation can be manually triggered using an HTTP API or be triggered based on some metrics. Actual method TBD. The test is very basic and it's very likely that most of the algorithm will be rewritten. I would like to get this merged so that I can have a basic skeleton for the algorithm and then make incremental changes. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-11 14:14:51 +00:00
Joonas Koivunen	d3b892e9ad	test: fix duplicated harness name (#8010) We need unique tenant harness names in case you want to inspect the results of the last failing run. We are not using any proc macros to get the test name as there is no stable way of doing that, and there will not be one in the future, so we need to fix these duplicates. Also, clean up the duplicated tests to not mix `?` and `unwrap/assert`.	2024-06-11 10:10:05 -04:00
Joonas Koivunen	7515d0f368	fix: stop storing TimelineMetadata in index_part.json as bytes (#7699) We've stored metadata as bytes within the `index_part.json` for reasons that have long since been fixed. #7693 added support for reading out normal json serialization of the `TimelineMetadata`. Change the serialization to only write `TimelineMetadata` as json going forward, while keeping backward compatibility for reading the metadata as bytes. Because we failed to include `alias = "metadata"` in #7693, one more follow-up is required to make the switch from the old name to `"metadata": <json>`, but that affects only the field name in the serialized format. In documentation and naming, an effort is made to add enough warning signs around TimelineMetadata so that it will receive no changes in the future. We can add those fields to `IndexPart` directly instead. Additionally, the path to cleaning up `metadata.rs` is documented in the `metadata.rs` module comment. If we must extend `TimelineMetadata` before that, the duplication suggested in [review comment] is the way to go. [review comment]: https://github.com/neondatabase/neon/pull/7699#pullrequestreview-2107081558	2024-06-11 15:38:54 +03:00
a-masterov	e27ce38619	Add testing for extensions (#7818) ## Problem We need automated tests of extensions shipped with Neon to detect possible problems. ## Summary of changes A new image neon-test-extensions is added. Workflow changes to test the shipped extensions are added as well. Currently, the regression tests shipped with the extensions are used. Some extensions, namely rum, timescaledb, rdkit, postgis, pgx_ulid, pgtap, pg_tiktoken, pg_jsonschema, pg_graphql, kq_imcx, and wal2json_2_5, are excluded due to problems or absence of internal tests. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-06-11 13:07:51 +02:00
Joonas Koivunen	e46692788e	refactor: Timeline layer flushing (#7993) The new features have deteriorated layer flushing, most recently with #7927. Changes: - inline `Timeline::freeze_inmem_layer` into the only caller - carry the TimelineWriterState guard to the actual point of freezing the layer - this allows us to `#[cfg(feature = "testing")]` the assertion added in #7927 - remove duplicate `flush_frozen_layer` in favor of splitting the `flush_frozen_layers_and_wait` - this requires starting the flush loop earlier for `checkpoint_distance < initdb size` tests	2024-06-10 19:34:34 +03:00
Alex Chi Z	a8ca7a1a1d	docs: highlight neon env comes with an initial timeline (#7995) Quite a few existing test cases create their own timelines instead of using the default one. This pull request highlights that and hopefully people can write simpler tests in the future. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>	2024-06-10 12:08:16 -04:00
Joonas Koivunen	b52e31c1a4	fix: allow layer flushes more often (#7927) As seen with the pgvector 0.7.0 index builds, we can receive large batches of images, leading to very large L0 layers in the range of 1GB. These large layers are produced because we are only able to roll the layer after we have witnessed two different Lsns in a single `DataDirModification::commit`. As the single Lsn batches of images can span over multiple `DataDirModification` lifespans, we will rarely get to write two different Lsns in a single `put_batch` currently. The solution is to remember the TimelineWriterState instead of eagerly forgetting it until we really open the next layer or someone else flushes (while holding the write_guard). Additional changes are test fixes to avoid "initdb image layer optimization" or ignoring initdb layers for assertion. Cc: #7197 because small `checkpoint_distance` will now trigger the "initdb image layer optimization"	2024-06-10 13:50:17 +00:00
Heikki Linnakangas	5a7e285c2c	Simplify scanning compute logs in tests (#7997) Implement LogUtils in the Endpoint fixture class, so that the "log_contains" function can be used on compute logs too. Per discussion at: https://github.com/neondatabase/neon/pull/7288#discussion_r1623633803	2024-06-10 12:52:49 +00:00
Christian Schwarz	ae5badd375	Revert "Include openssl and ICU statically linked" (#8003) Reverts neondatabase/neon#7956 Rationale: compute incompatibilities Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH Relevant quotes from @hlinnaka > If we go through with the current release candidate, but the compute is pinned, people who create new projects will get that warning, which is silly. To them, it looks like the ICU version was downgraded, because initdb was run with newer version. > We should upgrade the ICU version eventually. And when we do that, users with old projects that use ICU will start to see that warning. I think that's acceptable, as long as we do homework, notify users, and communicate that properly. > When do that, we should to try to upgrade the storage and compute versions at roughly the same time.	2024-06-10 13:20:20 +02:00
Alex Chi Z	3e63d0f9e0	test(pageserver): quantify compaction outcome (#7867) A simple API to collect some statistics after compaction to easily understand the result. The tool reads the layer map and analyzes it range by range instead of doing single-key operations, which is more efficient than running a benchmark to collect the result. It currently computes two key metrics: * Latest data access efficiency, which finds how many delta layers / image layers the system needs to iterate before returning any key in a key range. * (Approximate) PiTR efficiency, as in https://github.com/neondatabase/neon/issues/7770, which is simply the number of delta files in the range. The reasoning behind that is: assuming no image layer is created, PiTR efficiency is simply the cost of collecting records from the delta layers, plus the replay time. The number of delta files (or in the future, the estimated size of reads) is a simple yet efficient way of estimating how much effort the page server needs to reconstruct a page. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-10 10:42:13 +02:00
Rahul Patil	3b647cd55d	Include openssl and ICU statically linked (#7956) ## Problem Due to the upcoming End of Life (EOL) for Debian 11, we need to upgrade the base OS for Pageservers from Debian 11 to Debian 12 for security reasons. When deploying a new Pageserver on Debian 12 with the same binary built on Debian 11, we encountered the following errors: ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v16/bin/initdb: error while loading shared libraries: libicuuc.so.67: cannot open shared object file: No such file or directory ``` and ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v14/bin/initdb: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory ``` These issues occur when creating new projects. ## Summary of changes - To address these issues, we configured the PostgreSQL build to use statically linked OpenSSL and ICU libraries. - This resolves the missing shared library errors when running the binaries on Debian 12. Closes: https://github.com/neondatabase/cloud/issues/12648	2024-06-07 17:28:10 +00:00
Tristan Partin	26c68f91f3	Move SQL migrations out of line It makes them much easier to reason about, and allows other SQL tooling to operate on them like language servers, formatters, etc. I also brought back the removed migrations such that we can more easily understand what they were. I included a "-- SKIP" comment describing why those migrations are now skipped. We no longer skip migrations by checking if it is empty, but instead check to see if the migration starts with "-- SKIP".	2024-06-07 08:35:55 -07:00
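A sketch of the skip convention, assuming each migration is available as a `(name, sql)` pair. `is_skipped` and `migrations_to_run` are hypothetical helpers; only the `-- SKIP` prefix rule comes from the commit message.

```rust
/// Follows the convention described above: a migration whose text starts
/// with "-- SKIP" is kept in the tree for reference but never applied.
pub fn is_skipped(migration_sql: &str) -> bool {
    migration_sql.starts_with("-- SKIP")
}

/// Given (name, sql) pairs in order, returns the names of migrations to apply.
pub fn migrations_to_run<'a>(migrations: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    migrations
        .iter()
        .filter(|m| !is_skipped(m.1))
        .map(|m| m.0)
        .collect()
}
```

Checking an explicit marker, rather than an empty file, lets a skipped migration keep its original SQL or an explanation alongside the marker.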
a-masterov	2078dc827b	CI: copy run-* labels from external contributors' PRs (#7915) ## Problem We don't carry run-* labels from external contributors' PRs over to ci-run/pr-* PRs. This is not really convenient. We need to sync labels in the approved-for-ci-run workflow. ## Summary of changes Added a step that copies the labels from the original PR --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-07 10:04:59 +02:00
Joonas Koivunen	8ee191c271	test_local_only_layers_after_crash: various fixes (#7986) In #7927 I needed to fix this test case, but the fixes should be possible to land irrespective of the layer ingestion code change. The most important fix is the behavior if an image layer is found: the assertion message formatting raises a runtime error, which obscures the fact that we found an image layer.	2024-06-07 10:18:05 +03:00
Anastasia Lubennikova	66c6b270f1	Downgrade No response from reading prefetch entry WARNING to LOG	2024-06-06 20:56:19 +01:00
Arthur Petukhovsky	e4e444f59f	Remove random sleep in partial backup (#7982) We had a random sleep at the beginning of the partial backup task, which was needed for the first partial backup deploy. It helped with gradual upload of segments without causing network overload. Now partial backup is deployed everywhere, so we don't need this random sleep anymore. We also had an issue related to this, in which the manager task was not shut down for a long time. The cause of the issue is this random sleep, which didn't take timeline cancellation into account, while the manager task waited for partial backup to complete. Fixes https://github.com/neondatabase/neon/issues/7967	2024-06-06 17:54:44 +00:00
Joonas Koivunen	d46d19456d	raise the warning for oversized L0 to 2*target (#7985) currently we warn even when going over by a single byte. even that will be hit much more rarely once #7927 lands, but get this in earlier. rationale for 2*checkpoint_distance: anything smaller is not really worth a warn. we have a global allowed_error for this warning, which still cannot be removed, not even with #7927, because of many tests with very small `checkpoint_distance`.	2024-06-06 20:18:39 +03:00
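The new threshold amounts to a single comparison against twice `checkpoint_distance`. This sketch assumes sizes in bytes and a strict `>` comparison; `should_warn_oversized_l0` is a hypothetical name, not the pageserver function.

```rust
/// Warn only when a freshly written L0 layer exceeds twice the configured
/// checkpoint distance; smaller overshoots are expected and not worth a warning.
pub fn should_warn_oversized_l0(layer_size: u64, checkpoint_distance: u64) -> bool {
    // saturating_mul avoids overflow for absurdly large configured distances.
    layer_size > checkpoint_distance.saturating_mul(2)
}
```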
Alex Chi Z	5d05013857	fix(pageserver): skip metadata compaction if LSN is not accumulated enough (#7962) close https://github.com/neondatabase/neon/issues/7937 Only trigger metadata image layer creation if enough delta layers are accumulated. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 11:34:44 -04:00
Alex Chi Z	014509987d	fix(pageserver): more flexible layer size test (#7945) M-series macOS has different alignments/sizes for some fields (which I did not investigate in detail) and therefore this test cannot pass on macOS. Fixed by using `<=` for the comparison so that we do not test for an exact match. observed by @yliang412 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 14:40:58 +00:00
Arpad Müller	75bca9bb19	Perform retries on azure bulk deletion (#7964) This adds retries to the bulk deletion, because if there is a certain chance n that a request fails, the chance that every request in a chain of requests succeeds decreases exponentially with the length of the chain. We've had similar issues with the S3 DR tests, which in the end resulted in adding retries at the remote_storage level. Retries at the top level are not sufficient when one remote_storage "operation" is multiple network requests in a trench coat, especially when there is no notion of saving the progress: even if prior deletions had been successful, we'd still need to get a 404 in order to continue the loop and get to the point where we failed in the last iteration. We might fail again before we've even reached that point. Retries at the bottom level avoid this issue because they have the notion of progress, and when one network operation fails, only that operation is retried. First part of #7931.	2024-06-06 14:21:27 +00:00
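If each request fails independently with probability p, a chain of k requests succeeds only with probability (1 - p)^k, which is why per-request retries matter. Below is a sketch of "retries at the bottom level": each batch is retried on its own and completed batches are never re-sent. `delete_batch` is a hypothetical callback standing in for one network request, not the remote_storage API, and backoff is omitted.

```rust
/// Deletes `keys` in batches of `batch_size`, retrying each batch on its own
/// up to `max_attempts` times. Batches that already succeeded are never
/// re-sent, so progress survives transient failures.
pub fn delete_all<E>(
    keys: &[String],
    batch_size: usize,
    max_attempts: u32,
    mut delete_batch: impl FnMut(&[String]) -> Result<(), E>,
) -> Result<(), E> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    for batch in keys.chunks(batch_size) {
        let mut attempt = 1;
        loop {
            match delete_batch(batch) {
                Ok(()) => break,
                // Out of attempts: surface the last error for this batch.
                Err(e) if attempt >= max_attempts => return Err(e),
                // A real implementation would back off (and check cancellation) here.
                Err(_) => attempt += 1,
            }
        }
    }
    Ok(())
}
```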
Joonas Koivunen	a8be07785e	fix: do TimelineMetrics::shutdown only once (#7983) Related to #7341: tenant deletion will end up shutting down timelines twice, once before actually starting and a second time when per-timeline deletion is requested. Shutting down TimelineMetrics twice causes underflows. Add an atomic boolean and only do the shutdown once.	2024-06-06 14:20:54 +00:00
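A sketch of the once-only shutdown with an `AtomicBool`. `Metrics` is a simplified stand-in for `TimelineMetrics`, and the `teardowns` counter stands in for the real metric cleanup.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Simplified stand-in for TimelineMetrics. `shutdown` may be called more than
/// once (tenant deletion, then per-timeline deletion), but the teardown body
/// runs only on the first call, so gauges are not decremented twice.
#[derive(Default)]
pub struct Metrics {
    shut_down: AtomicBool,
    /// Counts how often the teardown body actually ran (for illustration).
    pub teardowns: AtomicUsize,
}

impl Metrics {
    pub fn shutdown(&self) {
        // `swap` returns the previous value, so only the first caller sees `false`.
        if self.shut_down.swap(true, Ordering::Relaxed) {
            return;
        }
        // A real implementation would decrement gauges / remove label values here.
        self.teardowns.fetch_add(1, Ordering::Relaxed);
    }
}
```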
Yuchen Liang	630cfbe420	refactor(pageserver): designated api error type for cancelled request (#7949) Closes #7406. ## Problem When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is exposed to handle that case, which verbosely logs the error. However, we don't benefit from having the full backtrace provided by anyhow in this case. ## Summary of changes This PR introduces a new `ApiError` type to handle errors caused by cancelled requests more robustly. - A new enum variant `ApiError::Cancelled` - Currently the cancelled request is mapped to status code 500. - Need to handle this error in proxy's `http_util` as well. - Added a failpoint test to simulate a cancelled `get_lsn_by_timestamp` request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-06 14:00:14 +00:00
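A trimmed-down sketch of the error mapping. Apart from `ApiError::Cancelled` and its 500 status, the variants, messages, and the `status_code` method are hypothetical and do not reflect the real pageserver `ApiError`.

```rust
use std::fmt;

/// Hypothetical, trimmed-down error enum for illustration.
#[derive(Debug)]
pub enum ApiError {
    NotFound(String),
    Cancelled,
    InternalServerError(String),
}

impl ApiError {
    pub fn status_code(&self) -> u16 {
        match self {
            ApiError::NotFound(_) => 404,
            // Per the PR description, cancellation currently maps to 500.
            ApiError::Cancelled => 500,
            ApiError::InternalServerError(_) => 500,
        }
    }
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::NotFound(msg) => write!(f, "not found: {msg}"),
            // A short, fixed message instead of a full anyhow backtrace.
            ApiError::Cancelled => write!(f, "request cancelled"),
            ApiError::InternalServerError(msg) => write!(f, "internal error: {msg}"),
        }
    }
}
```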
Christian Schwarz	0a65333fff	chore(walredo): avoid duplicate tenant_id and shard_slug fields (#7977) spotted during reviews of async walredo work in #6628	2024-06-06 15:10:16 +02:00
John Spray	91dd99038e	pageserver/controller: enable tenant deletion without attachment (#7957) ## Problem As described in #7952, the controller's attempt to reconcile a tenant before finally deleting it can get hung up waiting for the compute notification hook to accept updates. The fact that we try and reconcile a tenant at all during deletion is part of a more general design issue (#5080), where deletion was implemented as an operation on an attached tenant, requiring the tenant to be attached in order to delete it, which is not in principle necessary. Closes: #7952 ## Summary of changes - In the pageserver deletion API, only do the traditional deletion path if the tenant is attached. If it's secondary, then tear down the secondary location, and then do a remote delete. If it's not attached at all, just do the remote delete. - In the storage controller, instead of ensuring a tenant is attached before deletion, do a best-effort detach of the tenant, and then call into some arbitrary pageserver to issue a deletion of remote content. The pageserver retains its existing delete behavior when invoked on attached locations. We can remove this later when all users of the API are updated to either do a detach-before-delete. This will enable removing the "weird" code paths during startup that sometimes load a tenant and then immediately delete it, and removing the deletion markers on tenants.	2024-06-05 20:22:54 +00:00
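The per-location behavior of the pageserver deletion API can be summarized as a match. `Location` and `Step` are hypothetical types used only to illustrate the three cases listed above.

```rust
/// Hypothetical model of the tenant's state on the pageserver handling the request.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Location {
    Attached,
    Secondary,
    Detached,
}

#[derive(Debug, PartialEq)]
pub enum Step {
    /// The pre-existing deletion path for attached tenants.
    TraditionalDelete,
    TearDownSecondary,
    RemoteDelete,
}

/// Steps taken for each location state, following the summary above.
pub fn deletion_steps(location: Location) -> Vec<Step> {
    match location {
        Location::Attached => vec![Step::TraditionalDelete],
        Location::Secondary => vec![Step::TearDownSecondary, Step::RemoteDelete],
        Location::Detached => vec![Step::RemoteDelete],
    }
}
```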
Christian Schwarz	83ab14e271	chore!: remove walredo_process_kind config option & kind type (#7756) refs https://github.com/neondatabase/neon/issues/7753 Preceding PR https://github.com/neondatabase/neon/pull/7754 laid out the plan, this one wraps it up.	2024-06-05 14:21:10 +02:00
Peter Bendel	85ef6b1645	upgrade pgvector from 0.7.0 to 0.7.1 (#7954) ## Problem ## Summary of changes performance improvements in pgvector 0.7.1 for hnsw index builds, see https://github.com/pgvector/pgvector/issues/570	2024-06-05 10:32:03 +02:00
Alex Chi Z	1a8d53ab9d	feat(pageserver): compute aux file size on initial logical size calculation (#7958) close https://github.com/neondatabase/neon/issues/7822 close https://github.com/neondatabase/neon/issues/7443 Aux file metrics are computed incrementally. If the size is not initialized, the metrics will never show up. This pull request adds the functionality to compute the aux file size on initial logical size calculation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-04 13:47:48 -04:00
Joonas Koivunen	3d6e389aa2	feat: support changing IndexPart::metadata_bytes to json in future release (#7693) ## Problem Currently we serialize the `TimelineMetadata` into bytes to put it into `index_part.json`. This `Vec<u8>` (hopefully `[u8; 512]`) representation was chosen because of problems serializing TimelineId and Lsn between different serializers (bincode, json). After #5335, the serialization of those types became serialization format aware or format agnostic. We've removed the pageserver local `metadata` file writing in #6769. ## Summary of changes Allow switching from the current serialization format to plain JSON for the legacy TimelineMetadata format in the future by adding a competing serialization method to the current one (`crate::tenant::metadata::modern_serde`), which accepts both old bytes and new plain JSON. The benefit of this is that dumping the index_part.json with pretty printing no longer produces more than 500 lines of output; after enabling it, the output is only proportional to the layer count, like: ```json { "version": ???, "layer_metadata": { ... }, "disk_consistent_lsn": "0/15FD5D8", "legacy_metadata": { "disk_consistent_lsn": "0/15FD5D8", "prev_record_lsn": "0/15FD5A0", "ancestor_timeline": null, "ancestor_lsn": "0/0", "latest_gc_cutoff_lsn": "0/149FD18", "initdb_lsn": "0/149FD18", "pg_version": 15 } } ``` In the future, I propose we completely stop using this legacy metadata type and wasting time trying to come up with another version numbering scheme in addition to the informative-only one already found in `index_part.json`, and go ahead with storing metadata or feature flags on the `index_part.json` itself. #7699 is the "one release after" change which starts to produce metadata in the index_part.json as json.	2024-06-04 19:36:22 +03:00
Christian Schwarz	17116f2ea9	fix(pageserver): abort on duplicate layers, before doing damage (#7799) fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most of the issue description here for posterity) # Background From the time before always-authoritative `index_part.json`, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen: `a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)` As of #5198, we should not be exposed to that problem anymore. # Problem 1 We still have 1. [code in Pageserver](`82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521)`) that handles duplicate layers 2. [tests in the test suite](`d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15)`) that demonstrate the problem using a failpoint However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production. What it does instead is to return early with an `Ok()`, so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised. That "return early" would be a bug in the routine if it happened in production. So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior. # Problem 2 Further, if production code _did_ (it nowadays doesn't!) create a duplicate layer, the code in Pageserver that handles the condition (item 1 above) is too little and too late: * the code handles it by discarding the newer `struct Layer`; that's good. * however, on disk, we have already overwritten the old with the new layer file * the fact that we do it atomically doesn't matter because ... * if the new layer file is not bit-identical, then we have a cache coherency problem * PS PageCache block cache: caches old bit pattern * blob_io offsets stored in variables, based on pre-overwrite bit pattern / offsets * => reading based on these offsets from the new file might yield different data than before # Solution - Remove the test suite code pertaining to Problem 1 - Move & rename test suite code that actually tests RFC-27 crash-consistent layer map. - Remove the Pageserver code that handles duplicate layers too late (Problem 1) - Use `RENAME_NOREPLACE` to prevent the rename during `.finish()` from overwriting an existing file; bail with an error if it happens (Problem 2) - This bailing prevents the caller from even trying to insert into the layer map, as they don't even get a `struct Layer` at hand. - Add `abort`s in the place where we have the layer map lock and check for duplicates (Problem 2) - Note again, we can't reach there because we bail from `.finish()` much earlier in the code. - Share the logic to clean up after failed `.finish()` between image layers and delta layers (drive-by cleanup) - This exposed that test `image_layer_rewrite` was overwriting layer files in place. Fix the test. # Future Work This PR adds a new failure scenario that was previously "papered over" by the overwriting of layers: 1. Start a compaction that will produce 3 layers: A, B, C 2. Layer A is `finish()`ed successfully. 3. Layer B fails mid-way at some `put_value()`. 4. Compaction bails out, sleeps 20s. 5. Some disk space gets freed in the meantime. 6. Compaction wakes from sleep, another iteration starts, it attempts to write Layer A again. But the `.finish()` fails because A already exists on disk. The failure in step 6 is new with this PR, and it causes the compaction to get stuck. Before, it would silently overwrite the file and "successfully" complete the second iteration. The mitigation for this is to `/reset` the tenant.	2024-06-04 16:16:23 +00:00
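The PR uses `RENAME_NOREPLACE` (Linux `renameat2`). A portable approximation using only the Rust standard library is to hard-link the temporary file to its final name, which fails with `AlreadyExists` if that name is taken, and then remove the temporary name. This is a swapped-in technique, not the PR's code: it takes two system calls, so a crash in between can leave both names pointing at the same file. `rename_noreplace` is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Moves `from` to `to`, refusing to replace an existing `to`.
/// `hard_link` fails with ErrorKind::AlreadyExists if `to` exists, so an
/// existing layer file is never overwritten (and cached offsets stay valid).
pub fn rename_noreplace(from: &Path, to: &Path) -> io::Result<()> {
    fs::hard_link(from, to)?;
    // Not atomic with the link above: a crash here leaves both names in place.
    fs::remove_file(from)
}
```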
John Spray	fd22fc5b7d	pageserver: include heatmap in tenant deletion (#7928) ## Problem This was an oversight when adding heatmaps: because they are at the top level of the tenant, they aren't included in the catch-all list & delete that happens for timeline paths. This doesn't break anything, but it leaves behind a few kilobytes of garbage in the S3 bucket after a tenant is deleted, generating work for the scrubber. ## Summary of changes - During deletion, explicitly remove the heatmap file - In test_tenant_delete_smoke, upload a heatmap so that the test would fail its "remote storage empty after delete" check if we didn't delete it.	2024-06-04 16:16:50 +01:00
Joonas Koivunen	0112097e13	feat(rtc): maintain dirty and uploaded IndexPart (#7833) RemoteTimelineClient maintains a copy of "next IndexPart" as a number of fields which are like an IndexPart, but this is not immediately obvious. Instead of multiple fields, maintain `dirty` ("next IndexPart") and `clean` ("uploaded IndexPart") fields. Additional cleanup: - rename the `IndexPart::disk_consistent_lsn` accessor to `duplicated_disk_consistent_lsn` - no one except scrubber should be looking at it, even scrubber is a stretch - remove usage elsewhere (pagectl used by tests, metadata scan endpoint) - serialize index part before the index upload operation - avoid upload operation being retried because of serialization error - serialization error is fatal anyway for timeline -- it can only make transient local progress after that, at least the error is bubbled up now - gather exploded IndexPart fields into single actual `UploadQueueInitialized::dirty` of which the uploaded snapshot is serialized - implement the long-wished-for monotonicity check with the `clean` IndexPart with an assertion which is not expected to fire Continued work from #7860 towards next step of #6994.	2024-06-04 17:27:08 +03:00
Joonas Koivunen	9d4c113f9b	build(Dockerfile.compute-node): do not log tar contents (#7953) in build logs we get a lot of lines for building the compute node images because of verbose tar unpack. we know the sha256 so we don't need to log the contents. my hope is that this will allow us to more reliably use the github live updating log view.	2024-06-04 12:42:57 +01:00
Joonas Koivunen	0acb604fa3	test: no missed wakeups, cancellation and timeout flow to downloads (#7863 ) I suspected a wakeup could be lost with `remote_storage::support::DownloadStream` if the cancellation and inner stream wakeups happen simultaneously. The next poll would only return the cancellation error without setting the wakeup. There is no lost wakeup because the single future for getting the cancellation error is consumed when the value is ready, and a new future is created for the next value. The new future is always polled. Similarly, if only the `Stream::poll_next` is being used after a `Some(_)` value has been yielded, it makes no sense to have an expectation of a wakeup for the (N+1)th stream value already set because when a value is wanted, `Stream::poll_next` will be called. A test is added to show that the above is true. Additionally, there was a question of these cancellations and timeouts flowing to attached or secondary tenant downloads. A test is added to show that this, in fact, happens. Lastly, a warning message is logged when a download stream is polled after a timeout or cancellation error (currently unexpected) so we can rule it out while troubleshooting.	2024-06-04 14:19:36 +03:00
Konstantin Knizhnik	387a36874c	Set page LSN when reconstructing VM in page server (#7935 ) ## Problem The page LSN is not set during a VM update. This may be the reason for the test_vm_bits flakiness, but more serious issues can also be caused by a wrong LSN. Related: https://github.com/neondatabase/neon/pull/7935 ## Summary of changes - In `apply_in_neon`, set the LSN bytes when applying records of type `ClearVisibilityMapFlags`	2024-06-04 09:56:03 +01:00
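For illustration, here is a sketch of what "set the LSN bytes" means at the page level. It assumes PostgreSQL's `PageHeaderData` layout, where `pd_lsn` occupies the first 8 bytes as two 32-bit halves (high `xlogid`, then low `xrecoff`) in host byte order. Little-endian is assumed here. `page_set_lsn`, `page_get_lsn`, and `apply_clear_vm_bits` are hypothetical helpers, and the single-byte mask simplifies the real visibility-map bit layout.

```rust
/// Default PostgreSQL page size (BLCKSZ).
const BLCKSZ: usize = 8192;

/// Write `lsn` into `pd_lsn` at the start of the page header
/// (high 32 bits first, then low 32 bits; little-endian host assumed).
fn page_set_lsn(page: &mut [u8], lsn: u64) {
    page[0..4].copy_from_slice(&((lsn >> 32) as u32).to_le_bytes());
    page[4..8].copy_from_slice(&(lsn as u32).to_le_bytes());
}

/// Read `pd_lsn` back, using the same layout assumption.
fn page_get_lsn(page: &[u8]) -> u64 {
    let hi = u32::from_le_bytes([page[0], page[1], page[2], page[3]]) as u64;
    let lo = u32::from_le_bytes([page[4], page[5], page[6], page[7]]) as u64;
    (hi << 32) | lo
}

/// Hypothetical redo step for clearing visibility-map bits: clear the bits
/// *and* advance the page LSN to the LSN of the record being applied.
fn apply_clear_vm_bits(page: &mut [u8], byte_offset: usize, mask: u8, record_lsn: u64) {
    page[byte_offset] &= !mask;
    page_set_lsn(page, record_lsn);
}
```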
Anna Khanova	00032c9d9f	[proxy] Fix dynamic rate limiter (#7950 ) ## Problem There was a bug in dynamic rate limiter, which exhausted CPU in proxy and proxy wasn't able to accept any connections. ## Summary of changes 1. `if self.available > 1` -> `if self.available >= 1` 2. remove `timeout_at` to use just timeout 3. remove potential infinite loops which can exhaust CPUs.	2024-06-04 05:07:54 +01:00
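The first fix corrects an off-by-one in the permit check. A hypothetical, much-simplified `Limiter` (not the proxy's actual type) shows its effect. With `> 1`, the last available permit is never handed out, so a caller retrying in a loop while exactly one permit remains never makes progress.

```rust
/// Hypothetical, simplified permit pool; not the proxy's real limiter.
struct Limiter {
    available: usize,
}

impl Limiter {
    /// Buggy check: refuses even though one permit is still available.
    fn try_acquire_buggy(&mut self) -> bool {
        if self.available > 1 {
            self.available -= 1;
            true
        } else {
            false
        }
    }

    /// Fixed check: a single remaining permit is enough to proceed.
    fn try_acquire(&mut self) -> bool {
        if self.available >= 1 {
            self.available -= 1;
            true
        } else {
            false
        }
    }
}
```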
John Spray	11bb265de1	pageserver: don't squash all image layer generation errors into anyhow::Error (#7943 ) ## Problem CreateImageLayersError and CompactionError had proper From implementations, but compact_legacy was explicitly squashing all image layer errors into an anyhow::Error anyway. This led to errors like: ``` Error processing HTTP request: InternalServerError(timeline shutting down Stack backtrace: 0: <<anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from as core::ops::function::FnOnce<(pageserver::tenant::timeline::CreateImageLayersError,)>>::call_once at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5 1: <core::result::Result<alloc::vec::Vec<pageserver::tenant::storage_layer::layer::ResidentLayer>, pageserver::tenant::timeline::CreateImageLayersError>>::map_err::<anyhow::Error, <anyhow::Error as core::convert::From<pageserver::tenant::timeline::CreateImageLayersError>>::from> at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:829:27 2: <pageserver::tenant::timeline::Timeline>::compact_legacy::{closure#0} at pageserver/src/tenant/timeline/compaction.rs:125:36 3: <pageserver::tenant::timeline::Timeline>::compact::{closure#0} at pageserver/src/tenant/timeline.rs:1719:84 4: pageserver::http::routes::timeline_checkpoint_handler::{closure#0}::{closure#0} ``` Closes: https://github.com/neondatabase/neon/issues/7861	2024-06-03 22:10:13 +02:00
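Here is a sketch of the difference. It uses hypothetical std-only enums in place of `CreateImageLayersError` and `CompactionError`, plus a made-up status mapping. Letting `?` use a typed `From` conversion keeps the shutdown case distinguishable. Converting into an opaque catch-all error (`anyhow::Error` in the commit) leaves the HTTP layer only a generic internal server error to report.

```rust
/// Hypothetical stand-ins for the pageserver's error types.
#[derive(Debug)]
enum CreateImageLayersError {
    Cancelled,
    Other(String),
}

#[derive(Debug)]
enum CompactionError {
    ShuttingDown,
    Other(String),
}

/// A typed conversion keeps the "shutting down" case visible to callers.
impl From<CreateImageLayersError> for CompactionError {
    fn from(e: CreateImageLayersError) -> Self {
        match e {
            CreateImageLayersError::Cancelled => CompactionError::ShuttingDown,
            CreateImageLayersError::Other(msg) => CompactionError::Other(msg),
        }
    }
}

fn create_image_layers(shutting_down: bool) -> Result<u32, CreateImageLayersError> {
    if shutting_down { Err(CreateImageLayersError::Cancelled) } else { Ok(3) }
}

/// `?` applies the `From` impl above instead of squashing the error.
fn compact(shutting_down: bool) -> Result<u32, CompactionError> {
    let n = create_image_layers(shutting_down)?;
    Ok(n)
}

/// Hypothetical status mapping: shutdown is no longer reported as a 500.
fn http_status(r: &Result<u32, CompactionError>) -> u16 {
    match r {
        Ok(_) => 200,
        Err(CompactionError::ShuttingDown) => 503,
        Err(CompactionError::Other(_)) => 500,
    }
}
```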
John Spray	69026a9a36	storcon_cli: add 'drop' and eviction interval utilities (#7938 ) The storage controller has 'drop' APIs for tenants and nodes, for use in situations where something weird has happened: - node-drop is useful until we implement proper node decom, or if we have a partially provisioned node that somehow gets registered with the storage controller but is then dead. - tenant-drop is useful if we accidentally add a tenant that shouldn't be there at all, or if we want to make the controller forget about a tenant without deleting its data. For example, if one uses the tenant-warmup command with a bad tenant ID and needs to clean that up. The drop commands require an `--unsafe` parameter, to reduce the chance that someone incorrectly assumes these are the normal/clean ways to delete things. This PR also adds a convenience command for setting the time based eviction parameters on a tenant. This is useful when onboarding an existing tenant that has high resident size due to storage amplification in compaction: setting a lower time based eviction threshold brings down the resident size ahead of doing a shard split.	2024-06-03 18:13:01 +00:00
Konstantin Knizhnik	7006caf3a1	Store logical replication origin in KV storage (#7099 ) Store logical replication origin in KV storage ## Problem See #6977 ## Summary of changes * Extract origin_lsn from commit WAL record * Add ReplOrigin key to KV storage and store origin_lsn * In basebackup replace snapshot origin_lsn with last committed origin_lsn at basebackup LSN --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Alex Chi Z <chi@neon.tech>	2024-06-03 19:37:33 +03:00
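The basebackup step relies on reading a value "as of" an LSN. Here is a minimal sketch that uses a hypothetical versioned map, `VersionedKv` (origin id to write LSN to `origin_lsn`), rather than the pageserver's real key space. Each commit record writes the origin's `origin_lsn` at the commit LSN, and the basebackup reads the latest value at or before its own LSN.

```rust
use std::collections::BTreeMap;

/// Hypothetical versioned store: origin id -> (commit LSN -> origin_lsn).
#[derive(Default)]
struct VersionedKv {
    data: BTreeMap<u16, BTreeMap<u64, u64>>,
}

impl VersionedKv {
    /// Record `origin_lsn` for `origin_id`, as written by the commit record at `commit_lsn`.
    fn put_origin(&mut self, origin_id: u16, commit_lsn: u64, origin_lsn: u64) {
        self.data.entry(origin_id).or_default().insert(commit_lsn, origin_lsn);
    }

    /// Latest committed `origin_lsn` for `origin_id` as of `lsn` (e.g. the basebackup LSN).
    fn get_origin_at(&self, origin_id: u16, lsn: u64) -> Option<u64> {
        self.data
            .get(&origin_id)?
            .range(..=lsn)
            .next_back()
            .map(|(_, v)| *v)
    }
}
```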


5557 Commits