This adds an archival_config endpoint to the pageserver. Currently it
has no effect and always "succeeds", but the intent is that it will
later archive or unarchive a timeline.
- [x] add yml spec
- [x] add endpoint handler
Part of https://github.com/neondatabase/neon/issues/8088
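As a sketch of the intended request shape (purely illustrative, not the final API), the endpoint might accept a body like:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical request body for the archival_config endpoint; the
// variant and field names here are illustrative only.
#[derive(Serialize, Deserialize, Debug)]
pub enum TimelineArchivalState {
    Archived,
    Unarchived,
}

#[derive(Serialize, Deserialize, Debug)]
pub struct TimelineArchivalConfigRequest {
    pub state: TimelineArchivalState,
}
```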
## Problem
After a shard split, the pageserver leaves the ancestor shard's content
in place. It may be referenced by child shards, but eventually child
shards will de-reference most ancestor layers as they write their own
data and do GC. We would like to eventually clean up those ancestor
layers to reclaim space.
## Summary of changes
- Extend the physical GC command with `--mode=full`, which includes
cleaning up unreferenced ancestor shard layers
- Add test `test_scrubber_physical_gc_ancestors`
- Remove colored log output: in testing this is irritating ANSI code
spam in logs, and in interactive use doesn't add much.
- Refactor storage controller API client code out of storcon_client into
a `storage_controller/client` crate
- During physical GC of ancestors, call into the storage controller to
check that the latest shards seen in S3 reflect the latest state of the
tenant, and that there is no shard split in progress.
Currently the storage controller does not support forwarding timeline
detach ancestor requests to pageservers. Add support for forwarding `PUT
.../:tenant_id/timelines/:timeline_id/detach_ancestor`. Implement the
support mostly as is, because timeline detach ancestor will be made
(mostly) idempotent in a future PR.
Cc: #6994
`trace_read_requests` is a per `Tenant`-object option.
But the `handle_pagerequests` loop doesn't know which
`Tenant` object (i.e., which shard) the request is for.
The remaining use of the `Tenant` object is to check `tenant.cancel`.
That check is incorrect [if the pageserver hosts multiple
shards](https://github.com/neondatabase/neon/issues/7427#issuecomment-2220577518).
I'll fix that in a future PR where I completely eliminate the holding
of `Tenant/Timeline` objects across requests.
See [my code RFC](https://github.com/neondatabase/neon/pull/8286) for
the high-level idea.
Note that we can always bring back the tracing functionality if we need it.
But since it's actually about logging the `page_service` wire bytes,
it should be a `page_service`-level config option, not per-Tenant.
And for enabling tracing on a single connection, we can implement
a `set pageserver_trace_connection;` option.
Removes the `ImageCompressionAlgorithm::DisabledNoDecompress` variant.
We now assume any blob with the specific bits set is actually a
compressed blob.
The `ImageCompressionAlgorithm::Disabled` variant still remains and is
the new default.
Reverts large parts of #8238 , as originally intended in that PR.
Part of #5431
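For orientation, a minimal sketch of the remaining enum shape, assuming the zstd variant carries an optional level (as described later in this log):

```rust
// Sketch of the remaining variants; `DisabledNoDecompress` is gone, so
// decompression is always attempted when the compression bits are set.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum ImageCompressionAlgorithm {
    #[default]
    Disabled, // still the default: never compress on write
    Zstd { level: Option<i8> },
}
```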
## Problem
LSN leases, introduced in #8084, are a new API that is made shard-aware
from day 1. To support ephemeral endpoints in #7994 without linking the
Postgres C API against `compute_ctl`, part of the sharding code needs to
reside in `utils`.
## Summary of changes
- Create a new `shard` module in utils crate.
- Move more interface related part of tenant sharding API to utils and
re-export them in pageserver_api.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Improve parsing of the `ImageCompressionAlgorithm` enum to allow level
customization like `zstd(1)`, as strum only supplies `Default::default()`,
i.e. `None`, as the level; see the parsing sketch below.
Part of #5431
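A minimal, self-contained sketch of such a parser, assuming an enum shape like `Zstd { level: Option<i8> }` (illustrative, not the exact production code):

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
pub enum ImageCompressionAlgorithm {
    Disabled,
    Zstd { level: Option<i8> },
}

impl FromStr for ImageCompressionAlgorithm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept "disabled", "zstd", or "zstd(N)" with an explicit level.
        if s == "disabled" {
            return Ok(Self::Disabled);
        }
        if s == "zstd" {
            return Ok(Self::Zstd { level: None });
        }
        if let Some(level) = s.strip_prefix("zstd(").and_then(|r| r.strip_suffix(')')) {
            let level = level.parse::<i8>().map_err(|e| e.to_string())?;
            return Ok(Self::Zstd { level: Some(level) });
        }
        Err(format!("invalid image compression algorithm: {s}"))
    }
}

// e.g. "zstd(1)".parse::<ImageCompressionAlgorithm>()
//        == Ok(ImageCompressionAlgorithm::Zstd { level: Some(1) })
```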
Part of #7497, closes #8071. (accidentally closed #8208, reopened here)
## Problem
After the changes in #8084, we need synthetic size to also account for
leased LSNs so that users do not get free retention by running a small
ephemeral endpoint for a long time.
## Summary of changes
This PR integrates LSN leases into the synthetic size calculation. We
model leases as read-only branches started at the leased LSN (except
that a lease does not have a timeline id).
Other changes:
- Add new unit tests testing whether a lease behaves like a read-only
branch.
- Change `/size_debug` response to include lease point in the SVG
visualization.
- Fix `/lsn_lease` HTTP API to do proper parsing for POST.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
The metrics we have today aren't convenient for planning around the
impact of timeline archival on costs.
Closes: https://github.com/neondatabase/neon/issues/8108
## Summary of changes
- Add metric `pageserver_archive_size`, which indicates the logical
bytes of data which we would expect to write into an archived branch.
- Add metric `pageserver_pitr_history_size`, which indicates the
distance between last_record_lsn and the PITR cutoff.
These metrics are somewhat temporary: when we implement #8088 and
associated consumption metric changes, these will reach a final form.
For now, an "archived" branch is just any branch outside of its parent's
PITR window: later, archival will become an explicit state (which will
_usually_ correspond to falling outside the parent's PITR window).
The overall volume of timeline metrics is something to watch, but we are
removing many more in https://github.com/neondatabase/neon/pull/8245
than this PR is adding.
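As a rough sketch of what registering these gauges could look like with the prometheus crate (the label set here is an assumption):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge_vec, IntGaugeVec};

// Per-timeline gauges; the label names are assumptions for illustration.
static ARCHIVE_SIZE: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "pageserver_archive_size",
        "Logical bytes we would expect to write into an archived branch",
        &["tenant_id", "shard_id", "timeline_id"]
    )
    .expect("failed to register metric")
});

static PITR_HISTORY_SIZE: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "pageserver_pitr_history_size",
        "Distance between last_record_lsn and the PITR cutoff, in bytes",
        &["tenant_id", "shard_id", "timeline_id"]
    )
    .expect("failed to register metric")
});
```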
Add support for reading and writing zstd-compressed blobs for use in
image layer generation; it may one day also be useful for delta layers.
Reading is unconditional, while writing is controlled by the
`image_compression` config variable, allowing for experiments.
For the on-disk format, we re-use some of the bitpatterns we currently
keep reserved for blobs larger than 256 MiB. This assumes that we have
never ever written any such large blobs to image layers.
After the preparation in #7852, we are now unable to read (or write)
blobs with a size larger than 256 MiB.
A non-goal of this PR is to come up with good heuristics for when to
compress a blob. This is left for future work.
Parts of the PR were inspired by #7091.
cc #7879
Part of #5431
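A decoding sketch of the header scheme as I read the description above (the constants and exact layout are assumptions): the first byte of a long-blob header has its top bit set, the top four bits act as a tag, and the remaining 28 bits carry the length, which is what caps blobs at 256 MiB.

```rust
// Illustrative constants only; the real ones live in the blob I/O code.
const TAG_MASK: u8 = 0xf0;
const TAG_UNCOMPRESSED: u8 = 0x80;
const TAG_ZSTD: u8 = 0x90; // one of the bitpatterns formerly reserved

/// Decode a 4-byte big-endian "long blob" header into (compressed, len).
fn decode_long_blob_header(header: [u8; 4]) -> Result<(bool, usize), &'static str> {
    let len = (u32::from_be_bytes(header) & 0x0fff_ffff) as usize; // 28 bits
    match header[0] & TAG_MASK {
        TAG_UNCOMPRESSED => Ok((false, len)),
        TAG_ZSTD => Ok((true, len)),
        _ => Err("unknown blob header tag"),
    }
}
```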
## Problem
At high percentiles we see more than 800 layers being visited by the
read path. We need to know the tenant/timeline in order to investigate.
## Summary of changes
Add a rate limited log line when the average number of layers visited
per key is in the last specified histogram bucket.
I plan to use this to identify tenants in us-east-2 staging that exhibit
this behaviour. Will revert before next week's release.
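A minimal sketch of the rate-limiting shape, assuming a simple time-based limiter (the actual helper in the codebase may differ):

```rust
use std::time::{Duration, Instant};

// Emit at most one log line per period; callers hold one per timeline.
struct RateLimit {
    period: Duration,
    last: Option<Instant>,
}

impl RateLimit {
    fn call(&mut self, log_fn: impl FnOnce()) {
        let now = Instant::now();
        if self.last.map_or(true, |t| now.duration_since(t) >= self.period) {
            self.last = Some(now);
            log_fn();
        }
    }
}

// usage sketch: if avg layers visited per key lands in the last bucket,
// limiter.call(|| eprintln!("high read amplification on this timeline"));
```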
## Problem
For some time, we have created tenants with calls to location_conf. The
legacy "POST /v1/tenant" path was only used in some tests.
## Summary of changes
- Remove the API
- Relocate TenantCreateRequest to the controller API file (this used to
be used in both pageserver and controller APIs)
- Rewrite tenant_create test helper to use location_config API, as
control plane and storage controller do
- Update docker-compose test script to create tenants with
location_config API (this small commit is also present in
https://github.com/neondatabase/neon/pull/7947)
## Problem
In https://github.com/neondatabase/neon/pull/5299, the new config-v1
tenant config file was added to hold the LocationConf type. We left the
old config file in place for forward compat, and because running without
generations (and therefore without LocationConf) was still useful before
the storage controller was ready for prime time.
Closes: https://github.com/neondatabase/neon/issues/5388
## Summary of changes
- Remove code for reading and writing the legacy config file
- Remove Generation::Broken: it was unused.
- Treat missing config file on disk as an error loading a tenant, rather
than defaulting it. We can now remove LocationConf::default, and thereby
guarantee that we never construct a tenant with a None generation.
- Update some comments + add some assertions to clarify that
Generation::None is only used in layer metadata, not in the state of a
running tenant.
- Update docker compose test to create tenants with a generation
part of https://github.com/neondatabase/neon/issues/8002
## Summary of changes
This pull request adds the image layer iterator. It buffers a fixed
amount of key-value pairs in memory and gives the developer an iterator
abstraction over the image layer. Once the buffer is exhausted, it
issues one I/O to fetch the next batch.
Due to the Rust lifetime mysteries, the `get_stream_from` API has been
refactored to `into_stream`, which consumes `self`.
The delta layer iterator implementation will be similar, so I'll add
it after this pull request gets merged.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
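The buffering pattern, as a self-contained sketch; the names and types are stand-ins, not the actual image layer reader:

```rust
use std::collections::VecDeque;

// Hold a batch of decoded key-value pairs; when the buffer runs dry,
// perform one fetch (standing in for one I/O) to get the next batch.
struct BatchedIter<K, V, F>
where
    F: FnMut() -> Vec<(K, V)>, // returns an empty batch at end of layer
{
    buf: VecDeque<(K, V)>,
    fetch_next_batch: F,
    exhausted: bool,
}

impl<K, V, F: FnMut() -> Vec<(K, V)>> Iterator for BatchedIter<K, V, F> {
    type Item = (K, V);

    fn next(&mut self) -> Option<(K, V)> {
        if self.buf.is_empty() && !self.exhausted {
            let batch = (self.fetch_next_batch)();
            self.exhausted = batch.is_empty();
            self.buf.extend(batch);
        }
        self.buf.pop_front()
    }
}
```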
## Problem
These APIs have been deprecated for some time, but were still used from
test code.
Closes: https://github.com/neondatabase/neon/issues/4282
## Summary of changes
- It is still convenient to do a "tenant_attach" from a test without
having to write out a location_conf body, so those test methods have
been retained with implementations that call through to their
location_conf equivalent.
## Problem
These APIs have been unused for some time. They were superseded by
/location_conf: the equivalent of ignoring a tenant is now to put it in
secondary mode.
## Summary of changes
- Remove APIs
- Remove tests & helpers that used them
- Remove error variants that are no longer needed.
## Problem
Pageserver restarts cause read availability downtime for tenants. See
`Motivation` section in the
[RFC](https://github.com/neondatabase/neon/pull/7704).
## Summary of changes
* Introduce a new `NodeSchedulingPolicy`: `PauseForRestart`
* Implement the first take of drain and fill algorithms
* Add a node status endpoint which can be polled to figure out when an
operation is done
The implementation follows the RFC, so it might be useful to peek at it
as you're reviewing.
Since the PR is rather chunky, I've made sure all commits build (with
warnings), so you can review by commit if you prefer that.
RFC: https://github.com/neondatabase/neon/pull/7704
Related https://github.com/neondatabase/neon/issues/7387
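For context, a sketch of where the new variant sits; the sibling variants are listed from memory and may not match the controller exactly:

```rust
// `PauseForRestart` marks a node that has been drained ahead of a
// restart; pollers of the node status endpoint watch for it to clear.
enum NodeSchedulingPolicy {
    Active,
    Draining,
    Filling,
    Pause,
    PauseForRestart, // new in this PR
}
```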
Part of #7497, extracts from #7996, closes #8063.
## Problem
With the LSN lease API introduced in
https://github.com/neondatabase/neon/issues/7808, we want to implement
the real lease logic, so that GC keeps all the layers needed to
reconstruct all pages at all leased LSNs with valid leases at a given
time.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
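A conceptual sketch of the invariant, with illustrative types and names: GC's planned cutoff must be clamped to the smallest LSN that still holds a valid lease.

```rust
use std::collections::BTreeMap;
use std::time::Instant;

type Lsn = u64;

struct LsnLeases {
    // leased lsn -> lease expiration time
    leases: BTreeMap<Lsn, Instant>,
}

impl LsnLeases {
    /// Drop expired leases, then clamp the planned GC cutoff so that all
    /// pages at validly leased LSNs remain reconstructable.
    fn clamp_gc_cutoff(&mut self, now: Instant, planned_cutoff: Lsn) -> Lsn {
        self.leases.retain(|_, valid_until| *valid_until > now);
        match self.leases.keys().next() {
            Some(&min_leased_lsn) => planned_cutoff.min(min_leased_lsn),
            None => planned_cutoff,
        }
    }
}
```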
Store logical replication origin in KV storage
## Problem
See #6977
## Summary of changes
* Extract origin_lsn from the commit WAL record
* Add ReplOrigin key to KV storage and store origin_lsn
* In basebackup replace snapshot origin_lsn with last committed
origin_lsn at basebackup LSN
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
The keyspace utils like `is_rel_size_key` or `is_rel_fsm_block_key` and
many others are free functions that have to be either imported separately
or specified with the full path starting at `pageserver_api::key::`.
This is less convenient than if these functions were inherent impls;
see the before/after sketch below.
Follow-up of #7890. Fixes #6438.
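A before/after sketch of the ergonomics; the `Key` stub and the predicate body are illustrative only:

```rust
// Stub standing in for pageserver_api::key::Key.
struct Key {
    field1: u8,
}

impl Key {
    // Before: a free function `is_rel_size_key(key: &Key) -> bool` that
    // had to be imported from `pageserver_api::key`. After: an inherent
    // method, so call sites just write `key.is_rel_size_key()`.
    fn is_rel_size_key(&self) -> bool {
        self.field1 == 0x00 // illustrative check, not the real predicate
    }
}
```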
field2 of metadata keys can be 0xFFFF because of the mapping. Allow
0xFFFF in `to_i128`. An alternative is to encode 0xFFFF as 0xFFFFFFFF
(which is allowed in the original `to_i128`), but checking the places
where field2 is referenced, the rest of the system does not seem to
depend on this assertion.
Signed-off-by: Alex Chi Z <chi@neon.tech>
We want to introduce a concept of temporary, expiring LSN leases.
This adds both an HTTP API and a page service API to obtain
temporary LSN leases.
This adds a dummy implementation to unblock integration work of this
API. A functional implementation of the lease feature is deferred to a
later step.
Fixes #7808
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Detaching a timeline from its ancestor can leave the resulting timeline
with more L0 layers than the compaction threshold. Most of the time, the
detached timeline has made progress, so the next L0 -> L1 compaction
happens near the original branch point and not near the
last_record_lsn.
Add a test to ensure that inheriting the historical L0s does not change
fullbackup. Additionally:
- add `wait_until_completed` to test-only timeline checkpoint and
compact HTTP endpoints. with `?wait_until_completed=true` the endpoints
will wait until the remote client has completed uploads.
- for delta layers, describe L0-ness with the `/layer` endpoint
Cc: #6994
## Problem
Part of https://github.com/neondatabase/neon/issues/7462
On the metadata keyspace, vectored get will not stop if a key is not
found, and will read past the image layer. However, the semantics differ
from single get: if a key does not exist in the image layer, it means
that the key does not exist in the past, or has been deleted.
This pull request fixes that by recording image layer coverage during
the vectored get process and stopping when the full keyspace is covered
by an image layer. A corresponding test case is added to ensure that
generating an image layer reduces the number of delta layers.
This optimization (or bug fix) also applies to rel block keyspaces. If a
key is missing, we know it's missing once the first image layer is
reached. The pageserver will not attempt to read lower layers, which
could otherwise incur layer downloads + evictions. A conceptual sketch
of the stopping rule follows below.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
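A conceptual sketch of the stopping rule, with keys simplified to u64 ranges rather than the pageserver's actual types:

```rust
use std::ops::Range;

/// Subtract an image layer's coverage from the still-unresolved keyspace;
/// if nothing remains, the vectored get can stop descending: any key not
/// found by now does not exist (or was deleted).
fn remaining_after_image_layer(pending: Vec<Range<u64>>, img: &Range<u64>) -> Vec<Range<u64>> {
    pending
        .into_iter()
        .flat_map(|r| {
            let left = r.start..r.end.min(img.start);
            let right = r.start.max(img.end)..r.end;
            [left, right]
        })
        .filter(|r| r.start < r.end)
        .collect()
}

// stop condition inside the layer traversal:
// pending = remaining_after_image_layer(pending, &image_layer_key_range);
// if pending.is_empty() { /* fully covered: stop, report misses as absent */ }
```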
part of https://github.com/neondatabase/neon/issues/7462
## Summary of changes
This pull request adds two APIs to the pageserver management API:
list_aux_files and ingest_aux_files. The aux file pagebench is intended
to be used on an empty timeline because the data does not go through the
safekeeper. LSNs are advanced by 8 for each ingestion, to avoid
invariant checks inside the pageserver.
For now, I only care about space amplification / read amplification, so
the bench is designed in a very simple way: ingest 10000 files, and I
will manually dump the layer map to analyze.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Part of https://github.com/neondatabase/neon/issues/7462
## Summary of changes
Tenant config is not persisted unless it's attached on the storage
controller. In this pull request, we persist the aux file policy flag in
the `index_part.json`.
Admins can set `switch_aux_file_policy` in the storage controller or
using the pageserver API. When the first aux file gets written, the
write path will compare the aux file policy target with the current
policy. If it is switchable, we will do the switch; otherwise, the
original policy will be used. The test cases show what admins can do
/ cannot do.
The `last_aux_file_policy` is stored in `IndexPart`. Updates to the
persisted policy are done via
`schedule_index_upload_for_aux_file_policy_update`. On the write path,
the writer will update the field.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Currently tenants are only split into multiple shards if a human being
calls the API to do it.
Issue: #7388
## Summary of changes
- Add a pageserver API for returning the top tenants by size
- Add a step to the controller's background loop where if there is no
reconciliation or optimization to be done, it looks for things to split.
- Add a test that runs pgbench on many tenants concurrently, and checks
that splitting happens as expected as tenants grow, without interrupting
the client I/O.
This PR is quite basic: there is a tasklist in
https://github.com/neondatabase/neon/issues/7388 for further work. This
PR is meant to be safe (off by default), and sufficient to enable our
staging environment to run lots of sharded tenants without a human
having to set them up.
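An illustrative sketch of the background-loop check; the threshold parameter and the single-shard restriction are assumptions based on the description above, not the controller's actual code:

```rust
struct TopTenant {
    tenant_id: String,
    resident_size: u64,
    shard_count: u8,
}

/// When there is nothing to reconcile or optimize, look for a tenant
/// that has grown past the split threshold.
fn pick_split_candidate(top_by_size: &[TopTenant], split_threshold: u64) -> Option<&TopTenant> {
    top_by_size
        .iter()
        .filter(|t| t.shard_count == 1) // keep the first version conservative
        .find(|t| t.resident_size > split_threshold)
}
```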
refs https://github.com/neondatabase/neon/issues/7753
This PR is step (1) of removing sync walredo from Pageserver.
Changes:
* Remove the sync impl
* If sync is configured, warn! and use async instead
* Remove the metric that exposes `kind`
* Remove the tenant status API that exposes `kind`
Future Work
-----------
After we've released this change to prod and are sure we won't
roll back, we will
1. update the prod Ansible to remove the config flag from the prod
pageserver.toml.
2. remove the remaining `kind` code in pageserver.
These two changes need no release in between.
See https://github.com/neondatabase/neon/issues/7753 for details.
This pull request adds the new basebackup read path + aux file write
path. In the regression test, all logical replication tests are run with
matrix aux_file_v2=false/true.
Also fixed the vectored get code path to correctly return a missing key
error when called from the unified sequential get code path.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Timelines cannot be deleted if they have children. In many production
cases, a branch or a timeline has been created off the main branch, for
various reasons, to the effect of now having a "new main" branch. This
feature will make it possible to detach a timeline from its ancestor, by
inheriting all of the data before the branch point into the detached
timeline, and by reparenting all of the ancestor's earlier branches onto
the detached timeline.
## Summary of changes
- Earlier added copy_lsn_prefix functionality is used
- RemoteTimelineClient learns to adopt layers by copying them from
another timeline
- LayerManager adds support for adding adopted layers
- `timeline::Timeline::{prepare_to_detach,complete_detaching}_from_ancestor` and `timeline::detach_ancestor` are added
- HTTP PUT handler
Cc: #6994
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
After a shard split of a large existing tenant, child tenants can end up
with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.
This PR is followed by https://github.com/neondatabase/neon/pull/7531
Related issue: https://github.com/neondatabase/neon/issues/7504
## Summary of changes
- Add a new compaction phase `compact_shard_ancestors`, which identifies
layers that are no longer needed after a shard split.
- Add a Timeline->LayerMap code path called `rewrite_layers`, which is
currently only used to drop layers, but will later be used to rewrite
them as well in https://github.com/neondatabase/neon/pull/7531
- Add a new test that compacts after a split, and checks that something
is deleted.
Note that this doesn't have much impact on a tenant's resident size
(since unused layers would end up evicted anyway), but it:
- Makes index_part.json much smaller
- Makes the system easier to reason about: it avoids having tenants which
are like "my physical size is 4TiB but don't worry, I'll never actually
download it"; instead, tenants report the real physical size of what
they might download.
Why do we remove these layers in compaction rather than during the
split? Because we have existing split tenants that need cleaning up. We
can add it to the split operation in the future as an optimization.
This is the first step towards representing all of Pageserver
configuration as clean `serde::Serialize`able Rust structs in
`pageserver_api`.
The `neon_local` code will then use those structs instead of the crude
`toml_edit` / string concatenation that it does today.
refs https://github.com/neondatabase/neon/issues/7555
---------
Co-authored-by: Alex Chi Z <iskyzh@gmail.com>
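A minimal sketch of the direction, with an illustrative (not actual) field set:

```rust
use serde::{Deserialize, Serialize};

// Pageserver config modeled as a plain serde-able struct...
#[derive(Serialize, Deserialize)]
struct PageserverConfig {
    listen_pg_addr: String,
    listen_http_addr: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    image_compression: Option<String>,
}

// ...so neon_local can emit pageserver.toml by serialization rather than
// toml_edit/string concatenation:
// let rendered = toml::to_string(&config)?;
```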
This pull request adds the scan interface. Scan operates on a sparse
keyspace and retrieves all the key-value pairs within it.
Currently, scan only supports the metadata keyspace, and by default it
does not retrieve anything from the ancestor branch. This should be
fixed in the future if we need keyspaces that inherit from the
parent.
The scan interface reuses the vectored get code path by disabling the
missing key errors.
This pull request also changes the behavior of vectored get on aux file
v1/v2 key/keyspace: if the key is not found, it is simply not included in the
result, instead of throwing a missing key error.
TODOs in future pull requests: limit memory consumption, ensure the
search stops when all keys are covered by the image layer, remove
`#[allow(dead_code)]` once the code path is used in basebackups / aux
files, remove unnecessary fine-grained keyspace tracking in vectored get
(or have another code path for scan) to improve performance.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.
Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html
Prior update was in #7198
extracted (and tested) from
https://github.com/neondatabase/neon/pull/7468, part of
https://github.com/neondatabase/neon/issues/7462.
The current codebase assumes the keyspace is dense -- which means that
if we have a keyspace of 0x00-0x100, we assume every key (e.g., 0x00,
0x01, 0x02, ...) exists in the storage engine. However, this assumption
no longer holds in the metadata keyspace. The metadata keyspace is
sparse, so it is impossible to do a per-key check.
Ideally, we should not have the dense-keyspace assumption at all, but
removing it would incur a lot of refactoring. Therefore, we split our
keyspaces into dense and sparse ones, and handle them differently in the
code for now. At some point in the future, we should assume all
keyspaces are sparse.
## Summary of changes
* Split collect_keyspace to return dense+sparse keyspace.
* Do not allow generating image layers for sparse keyspace (for now --
will fix this next week, we need image layers anyways).
* Generate delta layers for sparse keyspace.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
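A sketch of the new return shape with stand-in types; the 0x60-0x7F metadata range is taken from elsewhere in this log:

```rust
use std::ops::Range;

// Dense: every key inside a range is assumed to exist.
#[derive(Default)]
struct KeySpace {
    ranges: Vec<Range<u128>>,
}

// Sparse: ranges may contain gaps, so per-key existence checks are out.
struct SparseKeySpace(KeySpace);

fn collect_keyspace() -> (KeySpace, SparseKeySpace) {
    let dense = KeySpace { ranges: vec![0x00..0x60] }; // e.g. rel block data
    let sparse = SparseKeySpace(KeySpace { ranges: vec![0x60..0x80] }); // metadata
    (dense, sparse)
}
```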
## Problem
Followup to https://github.com/neondatabase/neon/pull/6776
While #6776 makes compaction safe on sharded tenants, the logic for
keyspace partitioning remains inefficient: it assumes that the size of
data on a pageserver can be calculated simply as the distance between
the start and end of a Range -- this is not the case in sharded tenants,
where data within a range belongs to a variety of shards.
Closes: https://github.com/neondatabase/neon/issues/6774
## Summary of changes
I experimented with using a sharding-aware range type in KeySpace to
replace all the Range<Key> uses, but the impact on other code was quite
large (many places use the ranges), and not all of them need this
property of being able to approximate the physical size of data within a
key range.
So I compromised on expressing this as a ShardedRange type, and only
using that type selectively: during keyspace repartitioning, and in
tiered compaction when accumulating key ranges.
- keyspace partitioning methods take sharding parameters as an input
- new `ShardedRange` type wraps a Range<Key> and a shard identity
- ShardedRange::page_count is the shard-aware replacement for
key_range_size
- Callers that don't need to be shard-aware (e.g. vectored get code that
just wants to count the number of keys in a keyspace) can use
ShardedRange::raw_size to get the faster, shard-naive code (same as old
`key_range_size`)
- Compaction code is updated to carry a shard identity so that it can
use shard-aware calculations
- Unit tests for the new fragmentation logic.
- Add a test for compaction on sharded tenants, that validates that we
generate appropriately sized image layers (this fails before fixing
keyspace partitioning)
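A conceptual sketch of the raw vs. shard-aware size calculation, with keys flattened to u64 and stripes assumed to be contiguous blocks of keys (the real calculation is arithmetic, not a scan):

```rust
use std::ops::Range;

struct ShardIdentity {
    shard_number: u64,
    shard_count: u64,
    stripe_size: u64, // keys per stripe
}

/// Shard-naive width of the range (the old `key_range_size` behavior).
fn raw_size(range: &Range<u64>) -> u64 {
    range.end - range.start
}

/// Shard-aware estimate: count only keys whose stripe maps to this shard.
fn page_count(range: &Range<u64>, shard: &ShardIdentity) -> u64 {
    if shard.shard_count <= 1 {
        return raw_size(range);
    }
    range
        .clone()
        .filter(|key| (key / shard.stripe_size) % shard.shard_count == shard.shard_number)
        .count() as u64
}
```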
## Problem
Sometimes we have test data in the form of S3 contents that we would
like to run live in a neon_local environment.
## Summary of changes
- Add a storage controller API that imports an existing tenant.
Currently this is equivalent to doing a create with a high generation
number, but in the future this would be something smarter: probing S3 to
find the shards in a tenant and their generation numbers.
- Add a `neon_local` command that invokes the import API, and then
inspects timelines in the newly attached tenant to create matching
branches.
extracted from https://github.com/neondatabase/neon/pull/7468, part of
https://github.com/neondatabase/neon/issues/7462.
In the pageserver, we use i128 (instead of u128) for the integer
representation of the key, which means that the highest bit of the
key must not be 1. This constrains our keyspace to <= 0x7F.
Also fix the bug in `to_i128` that dropped the highest 4 bits: we now
keep 3 of them, dropping only the sign bit.
And on that note, we shrink the metadata keyspace to 0x60-0x7F for now;
once we add support for u128, we can have a larger metadata keyspace.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
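An illustrative sketch of the fix; the exact bit positions are assumptions based on the description:

```rust
// field1 occupies the top byte of the packed key. i128 leaves only 127
// usable bits, so the would-be sign bit is dropped -- but the other 3 of
// field1's top 4 bits are now preserved instead of being masked away.
fn pack_key(field1: u8, low_120_bits: u128) -> i128 {
    // before (buggy): ((field1 & 0x0f) as i128) << 120
    (((field1 & 0x7f) as i128) << 120) | ((low_120_bits as i128) & ((1i128 << 120) - 1))
}
```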
Changing the metadata format is not easy. This pull request adds a
tenant-level flag on whether to enable aux file v2. As long as we don't
roll this out to users, and our staging projects can persist tenant
config correctly, we can test the aux file v2 change by setting this
flag. Previous discussion at
https://github.com/neondatabase/neon/pull/7424.
Signed-off-by: Alex Chi Z <chi@neon.tech>
In the old protocol version, the client sent with each request:
- latest: bool. If true, the client requested the latest page
version, and the 'lsn' was just a hint of when the page was last
modified
- lsn: Lsn, the page version to return
This protocol didn't allow requesting a page at a particular
non-latest LSN and *also* sending a hint on when the page was last
modified. That put a read-only compute into an awkward position where
it had to request each page either at the replay-LSN, which could be
very close to the last LSN written on the primary and therefore
require the pageserver to wait for it to arrive, or at an older LSN,
which could already be garbage-collected in the pageserver, resulting
in an error. The new protocol version fixes that by allowing a
read-only compute to send both LSNs.
To use the new protocol version, use "pagestream_v2" command instead
of just "pagestream". The old protocol version is still supported, for
compatibility with old computes (and in fact there is no client
support yet, it is added by the next commit).
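A sketch of the difference in request shape (the field names are illustrative):

```rust
type Lsn = u64;

// Old protocol: one LSN plus a flag.
struct PagestreamRequestV1 {
    latest: bool, // if true, `lsn` is only a hint of last modification
    lsn: Lsn,     // the page version to return
}

// New protocol: both LSNs, so a read-only compute can ask for the page
// as of its replay LSN while still telling the pageserver the page has
// not changed since some older LSN (no waiting, no GC'd-LSN errors).
struct PagestreamRequestV2 {
    request_lsn: Lsn,
    not_modified_since: Lsn,
}
```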
## Problem
We are currently supporting two read paths. No bueno.
## Summary of changes
High level: use vectored read path to serve get page requests - gated by
`get_impl` config
Low level:
1. Add ps config, `get_impl` to specify which read path to use when
serving get page requests
2. Fix base cached image handling for the vectored read path. This was
subtly broken: previously we would not mark keys that went past their
cached LSN as complete. This is a self-standing change which could be
its own PR, but I've included it here because writing separate tests
for it is tricky.
3. Fork get page to use either the legacy or vectored implementation
4. Validate the use of vectored read path when serving get page requests
against the legacy implementation.
Controlled by `validate_vectored_get` ps config.
5. Use the vectored read path to serve get page requests in tests (with
validation).
## Note
Since the vectored read path does not go through the page cache to read
buffers, this change also amounts to a removal of the buffer page cache.
The materialized page cache is still used.