rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-24 16:40:38 +00:00

Author	SHA1	Message	Date
Arseny Sher	6f20a18e8e	Allow to change compute safekeeper list without restart. - Add --safekeepers option to neon_local reconfigure - Add it to python Endpoint reconfigure - Implement config reload in walproposer by restarting the whole bgw when safekeeper list changes. ref https://github.com/neondatabase/neon/issues/6341	2024-06-27 15:08:35 +03:00
Vlad Lazar	d557002675	strocon: don't overcommit when making node fill plan (#8171 ) ## Problem The fill requirement was not taken into account when looking through the shards of a given node to fill from. ## Summary of Changes Ensure that we do not fill a node past the recommendation from `Scheduler::compute_fill_requirement`.	2024-06-27 11:56:57 +01:00
Cihan Demirci	32b75e7c73	CI: additional trigger on merge to main (#8176 ) Before we consolidate workflows we want to be triggered by merges to main. https://github.com/neondatabase/cloud/issues/14862	2024-06-26 22:36:41 +00:00
Heikki Linnakangas	d2753719e3	test: Add helper function for importing a Postgres cluster (#8025 ) Also, modify the "neon_local timeline import" command so that it doesn't create the endpoint any more. I don't see any reason to bundle that in the same command, the "timeline create" and "timeline branch" commands don't do that either. I plan to add more tests similar to 'test_import_at_2bil', this will help to reduce the copy-pasting.	2024-06-26 21:54:29 +00:00
Alex Chi Z	04b2ac3fed	test: use aux file v2 policy in benchmarks (#8174 ) Use aux file v2 in benchmarks. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-26 20:33:15 +00:00
John Spray	c39d5b03e8	pageserver: remove legacy tenant config code, clean up redundant generation none/broken usages (#7947 ) ## Problem In https://github.com/neondatabase/neon/pull/5299, the new config-v1 tenant config file was added to hold the LocationConf type. We left the old config file in place for forward compat, and because running without generations (therefore without LocationConf) as still useful before the storage controller was ready for prime-time. Closes: https://github.com/neondatabase/neon/issues/5388 ## Summary of changes - Remove code for reading and writing the legacy config file - Remove Generation::Broken: it was unused. - Treat missing config file on disk as an error loading a tenant, rather than defaulting it. We can now remove LocationConf::default, and thereby guarantee that we never construct a tenant with a None generation. - Update some comments + add some assertions to clarify that Generation::None is only used in layer metadata, not in the state of a running tenant. - Update docker compose test to create tenants with a generation	2024-06-26 19:53:59 +00:00
Arthur Petukhovsky	76fc3d4aa1	Evict WAL files from disk (#8022 ) Fixes https://github.com/neondatabase/neon/issues/6337 Add safekeeper support to switch between `Present` and `Offloaded(flush_lsn)` states. The offloading is disabled by default, but can be controlled using new cmdline arguments: ``` --enable-offload Enable automatic switching to offloaded state --delete-offloaded-wal Delete local WAL files after offloading. When disabled, they will be left on disk --control-file-save-interval <CONTROL_FILE_SAVE_INTERVAL> Pending updates to control file will be automatically saved after this interval [default: 300s] ``` Manager watches state updates and detects when there are no actvity on the timeline and actual partial backup upload in remote storage. When all conditions are met, the state can be switched to offloaded. In `timeline.rs` there is `StateSK` enum to support switching between states. When offloaded, code can access only control file structure and cannot use `SafeKeeper` to accept new WAL. `FullAccessTimeline` is now renamed to `WalResidentTimeline`. This struct contains guard to notify manager about active tasks requiring on-disk WAL access. All guards are issued by the manager, all requests are sent via channel using `ManagerCtl`. When manager receives request to issue a guard, it unevicts timeline if it's currently evicted. Fixed a bug in partial WAL backup, it used `term` instead of `last_log_term` previously. After this commit is merged, next step is to roll this change out, as in issue #6338.	2024-06-26 18:58:56 +01:00
Vlad Lazar	dd3adc3693	docker: downgrade openssl to 1.1.1w (#8168 ) ## Problem We have seen numerous segfault and memory corruption issue for clients using libpq and openssl 3.2.2. I don't know if this is a bug in openssl or libpq. Downgrading to 1.1.1w fixes the issues for the storage controller and pgbench. ## Summary of Changes: Use openssl 1.1.1w instead of 3.2.2	2024-06-26 17:27:23 +00:00
Heikki Linnakangas	5b871802fd	Add counters for commands processed through the libpq page service API (#8089 ) I was looking for metrics on how many computes are still using protocol version 1 and 2. This provides counters for that as "pagestream" and "pagestream_v2" commands, but also all the other commands. The new metrics are global for the whole pageserver instance rather than per-tenant, so the additional metrics bloat should be fairly small.	2024-06-26 19:53:03 +03:00
Heikki Linnakangas	24ce73ffaf	Silence compiler warning (#8153 ) I saw this compiler warning on my laptop: pgxn/neon_walredo/walredoproc.c:178:10: warning: using the result of an assignment as a condition without parentheses [-Wparentheses] if (err = close_range_syscall(3, ~0U, 0)) { ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pgxn/neon_walredo/walredoproc.c:178:10: note: place parentheses around the assignment to silence this warning if (err = close_range_syscall(3, ~0U, 0)) { ^ ( ) pgxn/neon_walredo/walredoproc.c:178:10: note: use '==' to turn this assignment into an equality comparison if (err = close_range_syscall(3, ~0U, 0)) { ^ == 1 warning generated. I'm not sure what compiler version or options cause that, but it's a good warning. Write the call a little differently, to avoid the warning and to make it a little more clear anyway. (The 'err' variable wasn't used for anything, so I'm surprised we were not seeing a compiler warning on the unused value, too.)	2024-06-26 19:19:27 +03:00
Arthur Petukhovsky	3118c24521	Panic on unexpected error in simtests (#8169 )	2024-06-26 16:46:14 +01:00
Alexander Bayandin	5af9660b9e	CI(build-tools): don't install Postgres 14 (#6540 ) ## Problem We install Postgres 14 in `build-tools` image, but we don't need it. We use Postgres binaries, which we build ourselves. ## Summary of changes - Remove Postgresql 14 installation from `build-tools` image	2024-06-26 16:37:04 +01:00
Conrad Ludgate	d7e349d33c	proxy: report blame for passthrough disconnect io errors (#8170 ) ## Problem Hard to debug the disconnection reason currently. ## Summary of changes Keep track of error-direction, and therefore error source (client vs compute) during passthrough.	2024-06-26 15:11:26 +00:00
Arthur Petukhovsky	47e5bf3bbb	Improve term reject message in walproposer (#8164 ) Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-06-26 15:26:52 +01:00
Alex Chi Z	5d2f9ffa89	test(bottom-most-compaction): wal apply order (#8163 ) A follow-up on https://github.com/neondatabase/neon/pull/8103/. Previously, main branch fails with: ``` assertion `left == right` failed left: b"value 3@0x10@0x30@0x28@0x40" right: b"value 3@0x10@0x28@0x30@0x40" ``` This gets fixed after #8103 gets merged. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-26 09:34:41 -04:00
Heikki Linnakangas	fdadd6a152	Remove primary_is_running (#8162 ) This was a half-finished mechanism to allow a replica to enter hot standby mode sooner, without waiting for a running-xacts record. It had issues, and we are working on a better mechanism to replace it. The control plane might still set the flag in the spec file, but compute_ctl will simply ignore it.	2024-06-26 15:13:03 +03:00
Peter Bendel	9b623d3a2c	add commit hash to S3 object identifier for artifacts on S3 (#8161 ) In future we may want to run periodic tests on dedicated cloud instances that are not GitHub action runners. To allow these to download artifact binaries for a specific commit hash we want to make the search by commit hash possible and prefix the S3 objects with `artifacts/${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}` --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-06-26 05:46:52 +00:00
Alex Chi Z	9b98823d61	bottom-most-compaction: use in test_gc_feedback + fix bugs (#8103 ) Adds manual compaction trigger; add gc compaction to test_gc_feedback Part of https://github.com/neondatabase/neon/issues/8002 ``` test_gc_feedback[debug-pg15].logical_size: 50 Mb test_gc_feedback[debug-pg15].physical_size: 2269 Mb test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1 ``` ## Summary of changes * Add the manual compaction trigger * Use in test_gc_feedback * Add a guard to avoid running it with retain_lsns * Fix: Do `schedule_compaction_update` after compaction * Fix: Supply deltas in the correct order to reconstruct value --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-25 23:00:14 +00:00
Alex Chi Z	76864e6a2a	feat(pageserver): add image layer iterator (#8006 ) part of https://github.com/neondatabase/neon/issues/8002 ## Summary of changes This pull request adds the image layer iterator. It buffers a fixed amount of key-value pairs in memory, and give developer an iterator abstraction over the image layer. Once the buffer is exhausted, it will issue 1 I/O to fetch the next batch. Due to the Rust lifetime mysteries, the `get_stream_from` API has been refactored to `into_stream` and consumes `self`. Delta layer iterator implementation will be similar, therefore I'll add it after this pull request gets merged. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-25 20:49:29 +00:00
Conrad Ludgate	6c5d3b5263	proxy fix wake compute console retry (#8141 ) ## Problem 1. Proxy is retrying errors from cplane that shouldn't be retried 2. ~~Proxy is not using the retry_after_ms value~~ ## Summary of changes 1. Correct the could_retry impl for ConsoleError. 2. ~~Update could_retry interface to support returning a fixed wait duration.~~	2024-06-25 18:07:54 +00:00
Christian Schwarz	cd9a550d97	clippy-deny the `todo!()` macro (#4340 ) `todo!()` shouldn't slip into prod code	2024-06-25 18:03:27 +00:00
John Spray	07f21dd6b6	pageserver: remove attach/detach apis (#8134 ) ## Problem These APIs have been deprecated for some time, but were still used from test code. Closes: https://github.com/neondatabase/neon/issues/4282 ## Summary of changes - It is still convenient to do a "tenant_attach" from a test without having to write out a location_conf body, so those test methods have been retained with implementations that call through to their location_conf equivalent.	2024-06-25 17:38:06 +01:00
Heikki Linnakangas	64a4461191	Fix submodule references to match the REL_*_STABLE_neon branches (#8159 ) No code changes, just point to the correct commit SHAs.	2024-06-25 19:05:13 +03:00
Yuchen Liang	961fc0ba8f	feat(pageserver): add metrics for number of valid leases after each refresh (#8147 ) Part of #7497, closes #8120. ## Summary of changes This PR adds a metric to track the number of valid leases after `GCInfo` gets refreshed each time. Besides this metric, we should also track disk space and synthetic size (after #8071 is closed) to make sure leases are used properly. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-25 15:43:12 +00:00
Alexander Bayandin	9b2f9419d9	CI: upload docker cache only from main (#8157 ) ## Problem The Docker build cache gets invalidated by PRs ## Summary of changes - Upload cache only from the main branch	2024-06-25 16:18:22 +01:00
Christian Schwarz	947f6da75e	L0 flush: avoid short-lived allocation when checking key_range empty (#8154 ) We only use `keys` to check if it's empty so we can bail out early. No need to collect the keys for that. Found this while doing research for https://github.com/neondatabase/neon/issues/7418	2024-06-25 17:04:44 +02:00
Vlad Lazar	7026dde9eb	storcon: update db related dependencides (#8155 ) ## Problem Storage controller runs into memory corruption issue on the drain/fill code paths. ## Summary of changes Update db related depdencies in the unlikely case that the issue was fixed in diesel.	2024-06-25 15:06:18 +01:00
Heikki Linnakangas	d502313841	Fix MVCC bug with prepared xact with subxacts on standby (#8152 ) We did not recover the subtransaction IDs of prepared transactions when starting a hot standby from a shutdown checkpoint. As a result, such subtransactions were considered as aborted, rather than in-progress. That would lead to hint bits being set incorrectly, and the subtransactions suddenly becoming visible to old snapshots when the prepared transaction was committed. To fix, update pg_subtrans with prepared transactions's subxids when starting hot standby from a shutdown checkpoint. The snapshots taken from that state need to be marked as "suboverflowed", so that we also check the pg_subtrans. Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0%40iki.fi NEON: cherry-picked from the upstream thread ahead of time, to unblock https://github.com/neondatabase/neon/pull/7288. I expect this to be committed to upstream in the next few days, superseding this. NOTE: I did not include the new regression test on v15 and v14 branches, because the test would need some adapting, and we don't run the perl tests on Neon anyway.	2024-06-25 16:29:32 +03:00
Yuchen Liang	219e78f885	feat(pageserver): add an optional lease to the get_lsn_by_timestamp API (#8104 ) Part of #7497, closes #8072. ## Problem Currently the `get_lsn_by_timestamp` and branch creation pageserver APIs do not provide a pleasant client experience where the looked-up LSN might be GC-ed between the two API calls. This PR attempts to prevent common races between GC and branch creation by making use of LSN leases provided in #8084. A lease can be optionally granted to a looked-up LSN. With the lease, GC will not touch layers needed to reconstruct all pages at this LSN for the duration of the lease. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-06-24 20:12:24 +00:00
John Spray	1ea5d8b132	tests: accomodate some messages that can fail tests (#8144 ) ## Problem - `test_storage_controller_many_tenants` can fail with warnings in the storage controller about tenant creation holding a lock for too long, because this test stresses the machine running the test with many concurrent timeline creations - `test_tenant_delete_smoke` can fail when synthetic remote storage errors show up ## Summary of changes - tolerate warnings about slow timeline creation in test_storage_controller_many_tenants - tolerate both possible errors during error_tolerant_delete	2024-06-24 17:03:53 +00:00
John Spray	3d760938e1	storcon_cli: remove old tenant-scatter command (#8127 ) ## Problem This command was used in the very early days of sharding, before the storage controller had anti-affinity + scheduling optimization to spread out shards. ## Summary of changes - Remove `storcon_cli tenant-scatter`	2024-06-24 12:57:16 -04:00
Alex Chi Z	9211de0df7	test(pageserver): add delta records tests for gc-compaction (#8078 ) Part of https://github.com/neondatabase/neon/issues/8002 This pull request adds tests for bottom-most gc-compaction with delta records. Also fixed a bug in the compaction process that creates overlapping delta layers by force splitting at the original delta layer boundary. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-24 15:50:31 +00:00
Alex Chi Z	d8ffe662a9	fix(pageserver): handle version number in draw timeline (#8102 ) We now have a `vX` number in the file name, i.e., `000000067F0000000400000B150100000000-000000067F0000000400000D350100000000__00000000014B7AC8-v1-00000001` The related pull request for new-style path was merged a month ago https://github.com/neondatabase/neon/pull/7660 ## Summary of changes Fixed the draw timeline dir command to handle it. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-06-24 15:31:06 +00:00
Arthur Petukhovsky	a4db2af1f0	Truncate waltmp file on creation (#8133 ) Previously in safekeeper code, new segment file was opened without truncate option. I don't think there is a reason to do it, this commit replaces it with `File::create` to make it simpler and remove `clippy::suspicious_open_options` linter warning.	2024-06-24 14:07:59 +00:00
John Spray	47fdf93cf0	tests: fix a flake in test_sharding_split_compaction (#8136 ) ## Problem This test could occasionally trigger a "removing local file ... because it has unexpected length log" when using the `compact-shard-ancestors-persistent` failpoint is in use, which is unexpected because that failpoint stops the process when the remote metadata is in sync with local files. It was because there are two shards on the same pageserver, and while the one being compacted explicitly stops at the failpoint, another shard was compacting in the background and failing at an unclean point. The test intends to disable background compaction, but was mistakenly revoking the value of `compaction_period` when it updated `pitr_interval`. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8123/9602976462/index.html#/testresult/7dd6165da7daef40 ## Summary of changes - Update `TENANT_CONF` in the test to use properly typed values, so that it is usable in pageserver APIs as well as via neon_local. - When updating tenant config with `pitr_interval`, retain the overrides from the start of the test, so that there won't be any background compaction going on during the test.	2024-06-24 14:54:54 +01:00
John Spray	de05f90735	pageserver: add more info-level logging in shard splits (#8137 ) ## Problem `test_sharding_autosplit` is occasionally failing on warnings about shard splits taking longer than expected (`Exclusive lock by ShardSplit was held for`...) It's not obvious which part is taking the time (I suspect remote storage uploads). Example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/9618788427/index.html#testresult/b395294d5bdeb783/ ## Summary of changes - Since shard splits are infrequent events, we can afford to be very chatty: add a bunch of info-level logging throughout the process.	2024-06-24 11:53:43 +01:00
John Spray	188797f048	pageserver: remove code that resumes tenant deletions after restarts (#8091 ) #8082 removed the legacy deletion path, but retained code for completing deletions that were started before a pageserver restart. This PR cleans up that remaining code, and removes all the pageserver code that dealt with tenant deletion markers and resuming tenant deletions. The release at https://github.com/neondatabase/neon/pull/8138 contains https://github.com/neondatabase/neon/pull/8082, so we can now merge this to `main`	2024-06-24 11:41:11 +01:00
Arpad Müller	5446e08891	Move remote_storage config related code into dedicated module (#8132 ) Moves `RemoteStorageConfig` and related structs and functions into a dedicated module. Also implements `Serialize` for the config structs (requested in #8126). Follow-up of #8126	2024-06-24 12:29:54 +02:00
Conrad Ludgate	78d9059fc7	proxy: update tokio-postgres to allow arbitrary config params (#8076 ) ## Problem Fixes https://github.com/neondatabase/neon/issues/1287 ## Summary of changes tokio-postgres now supports arbitrary server params through the `param(key, value)` method. Some keys are special so we explicitly filter them out.	2024-06-24 10:20:27 +00:00
Arpad Müller	75747cdbff	Use serde for RemoteStorageConfig parsing (#8126 ) Adds a `Deserialize` impl to `RemoteStorageConfig`. We thus achieve the same as #7743 but with less repetitive code, by deriving `Deserialize` impls on `S3Config`, `AzureConfig`, and `RemoteStorageConfig`. The disadvantage is less useful error messages. The git history of this PR contains a state where we go via an intermediate representation, leveraging the `serde_json` crate, without it ever being actual json though. Also, the PR adds deserialization tests. Alternative to #7743 .	2024-06-22 17:57:09 +00:00
Vlad Lazar	8fe3f17c47	storcon: improve drain and fill shard placement (#8119 ) ## Problem While adapting the storage controller scale test to do graceful rolling restarts via drain and fill, I noticed that secondaries are also being rescheduled, which, in turn, caused the storage controller to optimise attachments. ## Summary of changes * Introduce a transactional looking rescheduling primitive (i.e. "try to schedule to this secondary, but leave everything as is if you can't") * Use it for the drain and fill stages to avoid calling into `Scheduler::schedule` and having secondaries move around.	2024-06-22 14:20:58 +00:00
Anastasia Lubennikova	8776089c70	Remove kq_imcx extension support per customer request neondatabase/cloud#13648	2024-06-21 20:22:54 +01:00
John Spray	b74232eb4d	tests: allow-list neon_local endpoint errors from storage controller (#8123 ) ## Problem For testing, the storage controller has a built-in hack that loads neon_local endpoint config from disk, and uses it to reconfigure endpoints when the attached pageserver changes. Some tests that stop an endpoint while the storage controller is running could occasionally fail on log errors from the controller trying to use its special test-mode calls into neon local Endpoint. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8117/9592392425/index.html#/testresult/9d2bb8623d0d53f8 ## Summary of changes - Give NotifyError an explicit NeonLocal variant, to avoid munging these into generic 500s (I don't want to ignore 500s in general) - Allow-list errors related to the local notification hook. The expectation is that tests using endpoints/workloads should be independently checking that those endpoints work: if neon_local generates an error inside the storage controller, that's ignorable.	2024-06-21 17:23:31 +00:00
Vlad Lazar	ee3081863e	storcon: implement endpoints for cancellation of drain and fill operations (#8029 ) ## Problem There's no way to cancel drain and fill operations. ## Summary of changes Implement HTTP endpoints to allow cancelling of background operations. When the operationis cancelled successfully, the node scheduling policy will revert to `Active`.	2024-06-21 17:13:51 +01:00
John Spray	15728be0e1	pageserver: always detach before deleting (#8082 ) In #7957 we enabled deletion without attachment, but retained the old-style deletion (return 202, delete in background) for attached tenants. In this PR, we remove the old-style deletion path, such that if the tenant delete API is invoked while a tenant is detached, it is simply detached before completing the deletion. This intentionally doesn't rip out all the old deletion code: in case a deletion was in progress at time of upgrade, we keep around the code for finishing it for one release cycle. The rest of the code removal happens in https://github.com/neondatabase/neon/pull/8091 Now that deletion will always be via the new path, the new path is also updated to use some retries around remote storage operations, to tripping up the control plane with 500s if S3 has an intermittent issue.	2024-06-21 15:39:19 +01:00
Arthur Petukhovsky	f45cf28247	Add eviction_state to control file (#8125 ) This is a preparation for #8022, to make the PR both backwards and foward compatible. This commit adds `eviction_state` field to control file. Adds support for reading it, but writes control file in old format where possible, to keep the disk format forward compatible. Note: in `patch_control_file`, new field gets serialized to json like this: - `"eviction_state": "Present"` - `"eviction_state": {"Offloaded": "0/8F"}`	2024-06-21 13:15:02 +01:00
Peter Bendel	82266a252c	Allow longer timeout for starting pageserver, safe keeper and storage controller in test cases to make test cases less flaky (#8079 ) ## Problem see https://github.com/neondatabase/neon/issues/8070 ## Summary of changes the neon_local subcommands to - start neon - start pageserver - start safekeeper - start storage controller get a new option -t=xx or --start-timeout=xx which allows to specify a longer timeout in seconds we wait for the process start. This is useful in test cases where the pageserver has to read a lot of layer data, like in pagebench test cases. In addition we exploit the new timeout option in the python test infrastructure (python fixtures) and modify the flaky testcase to increase the timeout from 10 seconds to 1 minute. Example from the test execution ```bash RUST_BACKTRACE=1 NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=1 DEFAULT_PG_VERSION=15 BUILD_TYPE=release ./scripts/pytest test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py ... 2024-06-19 09:29:34.590 INFO [neon_fixtures.py:1513] Running command "/instance_store/neon/target/release/neon_local storage_controller start --start-timeout=60s" 2024-06-19 09:29:36.365 INFO [broker.py:34] starting storage_broker to listen incoming connections at "127.0.0.1:15001" 2024-06-19 09:29:36.365 INFO [neon_fixtures.py:1513] Running command "/instance_store/neon/target/release/neon_local pageserver start --id=1 --start-timeout=60s" 2024-06-19 09:29:36.366 INFO [neon_fixtures.py:1513] Running command "/instance_store/neon/target/release/neon_local safekeeper start 1 --start-timeout=60s" ```	2024-06-21 10:36:12 +00:00
John Spray	59f949b4a8	pageserver: remove unused load/ignore APIs (#8122 ) ## Problem These APIs have be unused for some time. They were superseded by /location_conf: the equivalent of ignoring a tenant is now to put it in secondary mode. ## Summary of changes - Remove APIs - Remove tests & helpers that used them - Remove error variants that are no longer needed.	2024-06-21 10:02:15 +00:00
Vlad Lazar	01399621d5	storcon: avoid promoting too many shards of the same tenant (#8099 ) ## Problem The fill planner introduced in https://github.com/neondatabase/neon/pull/8014 selects tenant shards to promote strictly based on attached shard count load (tenant shards on nodes with the most attached shard counts are considered first). This approach runs the risk of migrating too many shards belonging to the same tenant on the same primary node. This is bad for availability and causes extra reconciles via the storage controller's background optimisations. Also see https://github.com/neondatabase/neon/pull/8014#discussion_r1642456241. ## Summary of changes Refine the fill plan to avoid promoting too many shards belonging to the same tenant on the same node. We allow for `max(1, shard_count / node_count)` shards belonging to the same tenant to be promoted.	2024-06-21 10:19:01 +01:00
Jure Bajic	0792bb6785	Add tracing for shared locks in `id_lock_map` (#7618 ) ## Problem Storage controller shared locks do not print a warning when held for long time spans. ## Summary of changes Extension of issue https://github.com/neondatabase/neon/issues/7108 in tracing to exclusive lock in `id_lock_map` was added, to add the same for shared locks. It was mentioned in the comment https://github.com/neondatabase/neon/pull/7397#discussion_r1587961160	2024-06-21 09:47:04 +01:00

1 2 3 4 5 ...

5510 Commits