rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 09:22:55 +00:00

Author	SHA1	Message	Date
Vlad Lazar	fce602ed30	review: remove Controller entity	2024-06-17 16:23:14 +01:00
Vlad Lazar	fc24ba5233	review: add drain/fill_node comments	2024-06-17 16:23:14 +01:00
Vlad Lazar	e200a2b01e	review: lift fill plan to a separate function	2024-06-17 16:23:11 +01:00
Vlad Lazar	2a7f224306	review: remove superflous sequence number bump	2024-06-17 11:46:39 +01:00
Vlad Lazar	d86ddf2b76	review: get_node_status -> get_node	2024-06-17 11:45:49 +01:00
Vlad Lazar	86d5f4ada9	review: kick_waiters -> await_waiters_remainder	2024-06-17 11:44:39 +01:00
Vlad Lazar	089edb55e8	tests: add storcon graceful restarts smoke test	2024-06-13 18:51:02 +01:00
Vlad Lazar	1302f9442a	storcon: add node status endpoint	2024-06-13 18:51:02 +01:00
Vlad Lazar	80612d2688	storcon: reset transient node policies on re-attach	2024-06-13 18:51:02 +01:00
Vlad Lazar	7f96ac3435	storcon: change default scheduling policy to Active	2024-06-13 18:51:02 +01:00
Vlad Lazar	999fbbb2a3	storcon: disallow attachment optimisations for nodes in filling state	2024-06-13 18:51:02 +01:00
Vlad Lazar	d22e0b5398	storcon: plug drain and fill operations to the controller	2024-06-13 18:51:02 +01:00
Vlad Lazar	58340f9dbf	storcon: add node fill algorithm	2024-06-13 18:50:58 +01:00
Vlad Lazar	fcbac527b0	storcon: add node drain algorithm	2024-06-13 18:42:52 +01:00
Vlad Lazar	a5154cf990	storcon: add util for kicking a set of waiters repeatedly	2024-06-11 16:03:52 +01:00
Vlad Lazar	bfe5df8c4e	storcon: add PauseForRestart node scheduling policy	2024-06-11 16:03:52 +01:00
Vlad Lazar	46927bc228	storcon: expose node scheduling policy	2024-06-11 16:03:52 +01:00
Vlad Lazar	bb9c792813	storcon: add background node operations controller skeleton	2024-06-11 16:03:52 +01:00
Vlad Lazar	126bcc3794	storcon: track number of attached shards for each node (#8011 ) ## Problem The storage controller does not track the number of shards attached to a given pageserver. This is a requirement for various scheduling operations (e.g. draining and filling will use this to figure out if the cluster is balanced) ## Summary of Changes Track the number of shards attached to each node. Related https://github.com/neondatabase/neon/issues/7387	2024-06-11 16:03:25 +01:00
Alex Chi Z	4c2100794b	feat(pageserver): initial code sketch & test case for combined gc+compaction at gc_horizon (#7948 ) A demo for a building block for compaction. The GC-compaction operation iterates all layers below/intersect with the GC horizon, and do a full layer rewrite of all of them. The end result will be image layer covering the full keyspace at GC-horizon, and a bunch of delta layers above the GC-horizon. This helps us collect the garbages of the test_gc_feedback test case to reduce space amplification. This operation can be manually triggered using an HTTP API or be triggered based on some metrics. Actual method TBD. The test is very basic and it's very likely that most part of the algorithm will be rewritten. I would like to get this merged so that I can have a basic skeleton for the algorithm and then make incremental changes. <img width="924" alt="image" src="https://github.com/neondatabase/neon/assets/4198311/f3d49f4e-634f-4f56-986d-bfefc6ae6ee2"> --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-11 14:14:51 +00:00
Joonas Koivunen	d3b892e9ad	test: fix duplicated harness name (#8010 ) We need unique tenant harness names in case you want to inspect the results of the last failing run. We are not using any proc macros to get the test name as there is no stable way of doing that, and there will not be one in the future, so we need to fix these duplicates. Also, clean up the duplicated tests to not mix `?` and `unwrap/assert`.	2024-06-11 10:10:05 -04:00
Joonas Koivunen	7515d0f368	fix: stop storing TimelineMetadata in index_part.json as bytes (#7699 ) We've stored metadata as bytes within the `index_part.json` for long fixed reasons. #7693 added support for reading out normal json serialization of the `TimelineMetadata`. Change the serialization to only write `TimelineMetadata` as json for going forward, keeping the backward compatibility to reading the metadata as bytes. Because of failure to include `alias = "metadata"` in #7693, one more follow-up is required to make the switch from the old name to `"metadata": <json>`, but that affects only the field name in serialized format. In documentation and naming, an effort is made to add enough warning signs around TimelineMetadata so that it will receive no changes in the future. We can add those fields to `IndexPart` directly instead. Additionally, the path to cleaning up `metadata.rs` is documented in the `metadata.rs` module comment. If we must extend `TimelineMetadata` before that, the duplication suggested in [review comment] is the way to go. [review comment]: https://github.com/neondatabase/neon/pull/7699#pullrequestreview-2107081558	2024-06-11 15:38:54 +03:00
a-masterov	e27ce38619	Add testing for extensions (#7818 ) ## Problem We need automated tests of extensions shipped with Neon to detect possible problems. ## Summary of changes A new image neon-test-extensions is added. Workflow changes to test the shipped extensions are added as well. Currently, the regression tests, shipped with extensions are in use. Some extensions, i.e. rum, timescaledb, rdkit, postgis, pgx_ulid, pgtap, pg_tiktoken, pg_jsonschema, pg_graphql, kq_imcx, wal2json_2_5 are excluded due to problems or absence of internal tests. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-06-11 13:07:51 +02:00
Joonas Koivunen	e46692788e	refactor: Timeline layer flushing (#7993 ) The new features have deteriorated layer flushing, most recently with #7927. Changes: - inline `Timeline::freeze_inmem_layer` to the only caller - carry the TimelineWriterState guard to the actual point of freezing the layer - this allows us to `#[cfg(feature = "testing")]` the assertion added in #7927 - remove duplicate `flush_frozen_layer` in favor of splitting the `flush_frozen_layers_and_wait` - this requires starting the flush loop earlier for `checkpoint_distance < initdb size` tests	2024-06-10 19:34:34 +03:00
Alex Chi Z	a8ca7a1a1d	docs: highlight neon env comes with an initial timeline (#7995 ) Quite a few existing test cases create their own timelines instead of using the default one. This pull request highlights that and hopefully people can write simpler tests in the future. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>	2024-06-10 12:08:16 -04:00
Joonas Koivunen	b52e31c1a4	fix: allow layer flushes more often (#7927 ) As seen with the pgvector 0.7.0 index builds, we can receive large batches of images, leading to very large L0 layers in the range of 1GB. These large layers are produced because we are only able to roll the layer after we have witnessed two different Lsns in a single `DataDirModification::commit`. As the single Lsn batches of images can span over multiple `DataDirModification` lifespans, we will rarely get to write two different Lsns in a single `put_batch` currently. The solution is to remember the TimelineWriterState instead of eagerly forgetting it until we really open the next layer or someone else flushes (while holding the write_guard). Additional changes are test fixes to avoid "initdb image layer optimization" or ignoring initdb layers for assertion. Cc: #7197 because small `checkpoint_distance` will now trigger the "initdb image layer optimization"	2024-06-10 13:50:17 +00:00
Heikki Linnakangas	5a7e285c2c	Simplify scanning compute logs in tests (#7997 ) Implement LogUtils in the Endpoint fixture class, so that the "log_contains" function can be used on compute logs too. Per discussion at: https://github.com/neondatabase/neon/pull/7288#discussion_r1623633803	2024-06-10 12:52:49 +00:00
Christian Schwarz	ae5badd375	Revert "Include openssl and ICU statically linked" (#8003 ) Reverts neondatabase/neon#7956 Rationale: compute incompatibilties Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH Relevant quotes from @hlinnaka > If we go through with the current release candidate, but the compute is pinned, people who create new projects will get that warning, which is silly. To them, it looks like the ICU version was downgraded, because initdb was run with newer version. > We should upgrade the ICU version eventually. And when we do that, users with old projects that use ICU will start to see that warning. I think that's acceptable, as long as we do homework, notify users, and communicate that properly. > When do that, we should to try to upgrade the storage and compute versions at roughly the same time.	2024-06-10 13:20:20 +02:00
Alex Chi Z	3e63d0f9e0	test(pageserver): quantify compaction outcome (#7867 ) A simple API to collect some statistics after compaction to easily understand the result. The tool reads the layer map, and analyze range by range instead of doing single-key operations, which is more efficient than doing a benchmark to collect the result. It currently computes two key metrics: * Latest data access efficiency, which finds how many delta layers / image layers the system needs to iterate before returning any key in a key range. * (Approximate) PiTR efficiency, as in https://github.com/neondatabase/neon/issues/7770, which is simply the number of delta files in the range. The reason behind that is, assume no image layer is created, PiTR efficiency is simply the cost of collect records from the delta layers, and the replay time. Number of delta files (or in the future, estimated size of reads) is a simple yet efficient way of estimating how much effort the page server needs to reconstruct a page. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-10 10:42:13 +02:00
Rahul Patil	3b647cd55d	Include openssl and ICU statically linked (#7956 ) ## Problem Due to the upcoming End of Life (EOL) for Debian 11, we need to upgrade the base OS for Pageservers from Debian 11 to Debian 12 for security reasons. When deploying a new Pageserver on Debian 12 with the same binary built on Debian 11, we encountered the following errors: ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v16/bin/initdb: error while loading shared libraries: libicuuc.so.67: cannot open shared object file: No such file or directory ``` and ``` could not execute operation: pageserver error, status: 500, msg: Command failed with status ExitStatus(unix_wait_status(32512)): /usr/local/neon/v14/bin/initdb: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory ``` These issues occur when creating new projects. ## Summary of changes - To address these issues, we configured PostgreSQL build to use statically linked OpenSSL and ICU libraries. - This resolves the missing shared library errors when running the binaries on Debian 12. Closes: https://github.com/neondatabase/cloud/issues/12648 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [x] Do not forget to reformat commit message to not include the above checklist	2024-06-07 17:28:10 +00:00
Tristan Partin	26c68f91f3	Move SQL migrations out of line It makes them much easier to reason about, and allows other SQL tooling to operate on them like language servers, formatters, etc. I also brought back the removed migrations such that we can more easily understand what they were. I included a "-- SKIP" comment describing why those migrations are now skipped. We no longer skip migrations by checking if it is empty, but instead check to see if the migration starts with "-- SKIP".	2024-06-07 08:35:55 -07:00
a-masterov	2078dc827b	CI: copy run-* labels from external contributors' PRs (#7915 ) ## Problem We don't carry run-* labels from external contributors' PRs to ci-run/pr-* PRs. This is not really convenient. Need to sync labels in approved-for-ci-run workflow. ## Summary of changes Added the procedure of transition of labels from the original PR ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-07 10:04:59 +02:00
Joonas Koivunen	8ee191c271	test_local_only_layers_after_crash: various fixes (#7986 ) In #7927 I needed to fix this test case, but the fixes should be possible to land irrespective of the layer ingestion code change. The most important fix is the behavior if an image layer is found: the assertion message formatting raises a runtime error, which obscures the fact that we found an image layer.	2024-06-07 10:18:05 +03:00
Anastasia Lubennikova	66c6b270f1	Downgrade No response from reading prefetch entry WARNING to LOG	2024-06-06 20:56:19 +01:00
Arthur Petukhovsky	e4e444f59f	Remove random sleep in partial backup (#7982 ) We had a random sleep in the beginning of partial backup task, which was needed for the first partial backup deploy. It helped with gradual upload of segments without causing network overload. Now partial backup is deployed everywhere, so we don't need this random sleep anymore. We also had an issue related to this, in which manager task was not shut down for a long time. The cause of the issue is this random sleep that didn't take timeline cancellation into account, meanwhile manager task waited for partial backup to complete. Fixes https://github.com/neondatabase/neon/issues/7967	2024-06-06 17:54:44 +00:00
Joonas Koivunen	d46d19456d	raise the warning for oversized L0 to 2target (#7985 ) currently we warn even by going over a single byte. even that will be hit much more rarely once #7927 lands, but get this in earlier. rationale for 2checkpoint_distance: anything smaller is not really worth a warn. we have an global allowed_error for this warning, which still cannot be removed nor can it be removed with #7927 because of many tests with very small `checkpoint_distance`.	2024-06-06 20:18:39 +03:00
Alex Chi Z	5d05013857	fix(pageserver): skip metadata compaction is LSN is not accumulated enough (#7962 ) close https://github.com/neondatabase/neon/issues/7937 Only trigger metadata image layer creation if enough delta layers are accumulated. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 11:34:44 -04:00
Alex Chi Z	014509987d	fix(pageserver): more flexible layer size test (#7945 ) M-series macOS has different alignments/size for some fields (which I did not investigate in detail) and therefore this test cannot pass on macOS. Fixed by using `<=` for the comparison so that we do not test for an exact match. observed by @yliang412 Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-06 14:40:58 +00:00
Arpad Müller	75bca9bb19	Perform retries on azure bulk deletion (#7964 ) This adds retries to the bulk deletion, because if there is a certain chance n that a request fails, the chance that at least one of the requests in a chain of requests fails increases exponentially. We've had similar issues with the S3 DR tests, which in the end yielded in adding retries at the remote_storage level. Retries at the top level are not sufficient when one remote_storage "operation" is multiple network requests in a trench coat, especially when there is no notion of saving the progress: even if prior deletions had been successful, we'd still need to get a 404 in order to continue the loop and get to the point where we failed in the last iteration. Maybe we'll fail again but before we've even reached it. Retries at the bottom level avoid this issue because they have the notion of progress and also when one network operation fails, only that operation is retried. First part of #7931.	2024-06-06 14:21:27 +00:00
Joonas Koivunen	a8be07785e	fix: do TimelineMetrics::shutdown only once (#7983 ) Related to #7341 tenant deletion will end up shutting down timelines twice, once before actually starting and the second time when per timeline deletion is requested. Shutting down TimelineMetrics causes underflows. Add an atomic boolean and only do the shutdown once.	2024-06-06 14:20:54 +00:00
Yuchen Liang	630cfbe420	refactor(pageserver): designated api error type for cancelled request (#7949 ) Closes #7406. ## Problem When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is exposed to handle that case, which verbosely logs the error. However, we don't benefit from having the full backtrace provided by anyhow in this case. ## Summary of changes This PR introduces a new `ApiError` type to handle errors caused by cancelled request more robustly. - A new enum variant `ApiError::Cancelled` - Currently the cancelled request is mapped to status code 500. - Need to handle this error in proxy's `http_util` as well. - Added a failpoint test to simulate cancelled `get_lsn_by_timestamp` request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-06 14:00:14 +00:00
Christian Schwarz	0a65333fff	chore(walredo): avoid duplicate tenant_id and shard_slug fields (#7977 ) spotted during reviews of async walredo work in #6628	2024-06-06 15:10:16 +02:00
John Spray	91dd99038e	pageserver/controller: enable tenant deletion without attachment (#7957 ) ## Problem As described in #7952, the controller's attempt to reconcile a tenant before finally deleting it can get hung up waiting for the compute notification hook to accept updates. The fact that we try and reconcile a tenant at all during deletion is part of a more general design issue (#5080), where deletion was implemented as an operation on attached tenant, requiring the tenant to be attached in order to delete it, which is not in principle necessary. Closes: #7952 ## Summary of changes - In the pageserver deletion API, only do the traditional deletion path if the tenant is attached. If it's secondary, then tear down the secondary location, and then do a remote delete. If it's not attached at all, just do the remote delete. - In the storage controller, instead of ensuring a tenant is attached before deletion, do a best-effort detach of the tenant, and then call into some arbitrary pageserver to issue a deletion of remote content. The pageserver retains its existing delete behavior when invoked on attached locations. We can remove this later when all users of the API are updated to either do a detach-before-delete. This will enable removing the "weird" code paths during startup that sometimes load a tenant and then immediately delete it, and removing the deletion markers on tenants.	2024-06-05 20:22:54 +00:00
Christian Schwarz	83ab14e271	chore!: remove walredo_process_kind config option & kind type (#7756 ) refs https://github.com/neondatabase/neon/issues/7753 Preceding PR https://github.com/neondatabase/neon/pull/7754 laid out the plan, this one wraps it up.	2024-06-05 14:21:10 +02:00
Peter Bendel	85ef6b1645	upgrade pgvector from 0.7.0 to 0.7.1 (#7954 ) ## Problem ## Summary of changes performance improvements in pgvector 0.7.1 for hnsw index builds, see https://github.com/pgvector/pgvector/issues/570	2024-06-05 10:32:03 +02:00
Alex Chi Z	1a8d53ab9d	feat(pageserver): compute aux file size on initial logical size calculation (#7958 ) close https://github.com/neondatabase/neon/issues/7822 close https://github.com/neondatabase/neon/issues/7443 Aux file metrics is computed incrementally. If the size is not initialized, the metrics will never show up. This pull request adds the functionality to compute the aux file size on initial logical size calculation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-04 13:47:48 -04:00
Joonas Koivunen	3d6e389aa2	feat: support changing IndexPart::metadata_bytes to json in future release (#7693 ) ## Problem Currently we serialize the `TimelineMetadata` into bytes to put it into `index_part.json`. This `Vec<u8>` (hopefully `[u8; 512]`) representation was chosen because of problems serializing TimelineId and Lsn between different serializers (bincode, json). After #5335, the serialization of those types became serialization format aware or format agnostic. We've removed the pageserver local `metadata` file writing in #6769. ## Summary of changes Allow switching from the current serialization format to plain JSON for the legacy TimelineMetadata format in the future by adding a competitive serialization method to the current one (`crate::tenant::metadata::modern_serde`), which accepts both old bytes and new plain JSON. The benefits of this are that dumping the index_part.json with pretty printing no longer produces more than 500 lines of output, but after enabling it produces lines only proportional to the layer count, like: ```json { "version": ???, "layer_metadata": { ... }, "disk_consistent_lsn": "0/15FD5D8", "legacy_metadata": { "disk_consistent_lsn": "0/15FD5D8", "prev_record_lsn": "0/15FD5A0", "ancestor_timeline": null, "ancestor_lsn": "0/0", "latest_gc_cutoff_lsn": "0/149FD18", "initdb_lsn": "0/149FD18", "pg_version": 15 } } ``` In the future, I propose we completely stop using this legacy metadata type and wasting time trying to come up with another version numbering scheme in addition to the informative-only one already found in `index_part.json`, and go ahead with storing metadata or feature flags on the `index_part.json` itself. #7699 is the "one release after" changes which starts to produce metadata in the index_part.json as json.	2024-06-04 19:36:22 +03:00
Christian Schwarz	17116f2ea9	fix(pageserver): abort on duplicate layers, before doing damage (#7799 ) fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most of the issue description here for posterity) # Background From the time before always-authoritative `index_part.json`, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen: `a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)` As of #5198 , we should not be exposed to that problem anymore. # Problem 1 We still have 1. [code in Pageserver](`82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521)`) than handles duplicate layers 2. [tests in the test suite](`d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15)`) that demonstrates the problem using a failpoint However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production. What is does instead is to return early with an `Ok()`, so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised. That "return early" would be a bug in the routine if it happened in production. So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior. # Problem 2 Further, if production code _did_ (it nowawdays doesn't!) create a duplicate layer, the code in Pageserver that handles the condition (item 1 above) is too little and too late: * the code handles it by discarding the newer `struct Layer`; that's good. * however, on disk, we have already overwritten the old with the new layer file * the fact that we do it atomically doesn't matter because ... * if the new layer file is not bit-identical, then we have a cache coherency problem * PS PageCache block cache: caches old bit battern * blob_io offsets stored in variables, based on pre-overwrite bit pattern / offsets * => reading based on these offsets from the new file might yield different data than before # Solution - Remove the test suite code pertaining to Problem 1 - Move & rename test suite code that actually tests RFC-27 crash-consistent layer map. - Remove the Pageserver code that handles duplicate layers too late (Problem 1) - Use `RENAME_NOREPLACE` to prevent over-rename the file during `.finish()`, bail with an error if it happens (Problem 2) - This bailing prevents the caller from even trying to insert into the layer map, as they don't even get a `struct Layer` at hand. - Add `abort`s in the place where we have the layer map lock and check for duplicates (Problem 2) - Note again, we can't reach there because we bail from `.finish()` much earlier in the code. - Share the logic to clean up after failed `.finish()` between image layers and delta layers (drive-by cleanup) - This exposed that test `image_layer_rewrite` was overwriting layer files in place. Fix the test. # Future Work This PR adds a new failure scenario that was previously "papered over" by the overwriting of layers: 1. Start a compaction that will produce 3 layers: A, B, C 2. Layer A is `finish()`ed successfully. 3. Layer B fails mid-way at some `put_value()`. 4. Compaction bails out, sleeps 20s. 5. Some disk space gets freed in the meantime. 6. Compaction wakes from sleep, another iteration starts, it attempts to write Layer A again. But the `.finish()` fails because A already exists on disk. The failure in step 5 is new with this PR, and it causes the compaction to get stuck. Before, it would silently overwrite the file and "successfully" complete the second iteration. The mitigation for this is to `/reset` the tenant.	2024-06-04 16:16:23 +00:00
John Spray	fd22fc5b7d	pageserver: include heatmap in tenant deletion (#7928 ) ## Problem This was an oversight when adding heatmaps: because they are at the top level of the tenant, they aren't included in the catch-all list & delete that happens for timeline paths. This doesn't break anything, but it leaves behind a few kilobytes of garbage in the S3 bucket after a tenant is deleted, generating work for the scrubber. ## Summary of changes - During deletion, explicitly remove the heatmap file - In test_tenant_delete_smoke, upload a heatmap so that the test would fail its "remote storage empty after delete" check if we didn't delete it.	2024-06-04 16:16:50 +01:00
Joonas Koivunen	0112097e13	feat(rtc): maintain dirty and uploaded IndexPart (#7833 ) RemoteTimelineClient maintains a copy of "next IndexPart" as a number of fields which are like an IndexPart but this is not immediately obvious. Instead of multiple fields, maintain a `dirty` ("next IndexPart") and `clean` ("uploaded IndexPart") fields. Additional cleanup: - rename `IndexPart::disk_consistent_lsn` accessor `duplicated_disk_consistent_lsn` - no one except scrubber should be looking at it, even scrubber is a stretch - remove usage elsewhere (pagectl used by tests, metadata scan endpoint) - serialize index part before the index upload operation - avoid upload operation being retried because of serialization error - serialization error is fatal anyway for timeline -- it can only make transient local progress after that, at least the error is bubbled up now - gather exploded IndexPart fields into single actual `UploadQueueInitialized::dirty` of which the uploaded snapshot is serialized - implement the long wished monotonicity check with the `clean` IndexPart with an assertion which is not expected to fire Continued work from #7860 towards next step of #6994.	2024-06-04 17:27:08 +03:00

1 2 3 4 5 ...

5414 Commits