rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-17 10:22:56 +00:00

Author	SHA1	Message	Date
Andrey Taranik	e5d9c003f5	try new qemu based runners	2024-08-19 17:10:17 +03:00
Andrey Taranik	290ce3ed46	try aws arm64 8 core	2024-08-18 02:45:55 +03:00
Andrey Taranik	1138e286b9	try aws arm64 16core runners	2024-08-17 23:04:50 +03:00
Andrey Taranik	9173847f81	try 80 core metal again	2024-08-17 20:20:07 +03:00
Andrey Taranik	6ae62574d0	try 16 core hcloud again	2024-08-17 19:34:31 +03:00
Andrey Taranik	5e074d8536	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-17 17:49:47 +03:00
Andrey Taranik	3a22daf33e	runner lables fix	2024-08-17 16:37:17 +03:00
Andrey Taranik	f40f627730	tey aws arm64 runners	2024-08-17 16:30:24 +03:00
Konstantin Knizhnik	2be69af6c3	Track holes to be able to reuse them once LFC limit is increased (#8575 ) ## Problem Multiple increase/decrease LFC limit may cause unlimited growth of LFC file because punched holes while LFC shrinking are not reused when LFC is extended. ## Summary of changes Keep track of holes and reused them when LFC size is increased. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-08-16 22:19:44 +03:00
Sasha Krassovsky	c6b6b7700a	Fix superuser check in test_snap_files (#8749 ) ## Problem Current superuser check always passes because it returns a tuple like `(False,)`, and then the `if not superuser` passes. ## Summary of changes Fixes the issue by unwrapping the tuple. Verified that it works against a project where I don't have superuser.	2024-08-16 19:13:18 +01:00
John Spray	e2d89f7991	pageserver: prioritize secondary downloads to get most recent layers first, except l0s (#8729 ) ## Problem When a secondary location is trying to catch up while a tenant is receiving new writes, it can become quite wasteful: - Downloading L0s which are soon destroyed by compaction to L1s - Downloading older layer files which are soon made irrelevant when covered by image layers. ## Summary of changes Sort the layer files in the heatmap: - L0 layers are the lowest priority - Other layers are sorted to download the highest LSNs first.	2024-08-16 14:35:02 +02:00
Arseny Sher	25e7d321f4	safekeeper: cross check divergence point in ProposerElected handling. Previously, we protected from multiple ProposerElected messages from the same walproposer with the following condition: msg.term == self.get_last_log_term() && self.flush_lsn() > msg.start_streaming_at It is not exhaustive, i.e. we could still proceed to truncating WAL even though safekeeper inserted something since the divergence point has been calculated. While it was most likely safe because walproposer can't use safekeeper position to commit WAL until last_log_term reaches the current walproposer term, let's be more careful and properly calculate the divergence point like walproposer does.	2024-08-16 15:22:46 +03:00
Vlad Lazar	3f91ea28d9	tests: add infra and test for storcon leadership transfer (#8587 ) ## Problem https://github.com/neondatabase/neon/pull/8588 implemented the mechanism for storage controller leadership transfers. However, there's no tests that exercise the behaviour. ## Summary of changes 1. Teach `neon_local` how to handle multiple storage controller instances. Each storage controller instance gets its own subdirectory (`storage_controller_1, ...`). `storage_controller start\|stop` subcommands have also been extended to optionally accept an instance id. 2. Add a storage controller proxy test fixture. It's a basic HTTP server that forwards requests from pageserver and test env to the currently configured storage controller. 3. Add a test which exercises storage controller leadership transfer. 4. Finally fix a couple bugs that the test surfaced	2024-08-16 13:05:04 +01:00
Andrey Taranik	56b94b7d1b	return to large-arm64	2024-08-16 14:18:01 +03:00
Heikki Linnakangas	7fdc3ea162	Add retroactive RFC about physical replication (#8546 ) We've had physical replication support for a long time, but we never created an RFC for the feature. This RFC does that after the fact. Even though we've already implemented the feature, let's have a design discussion as if it hadn't done that. It can still be quite insightful. This is written from a pretty compute-centric viewpoint, not much on how it works in the control plane.	2024-08-16 11:30:53 +01:00
Andrey Taranik	fe445b2945	more parallelism	2024-08-16 13:21:19 +03:00
Andrey Taranik	7f49f45a53	tune parallelism	2024-08-16 13:08:36 +03:00
Andrey Taranik	980b212789	try more parallelism	2024-08-16 11:56:11 +03:00
Andrey Taranik	2b17a03911	arm64 80 cores debian 12	2024-08-16 10:20:23 +03:00
Andrey Taranik	b50a9d84d4	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-16 09:06:09 +03:00
Joonas Koivunen	4763a960d1	chore: log if we have an open layer or any frozen on shutdown (#8740 ) Some benchmarks are failing with a "long" flushing, which might be because there is a queue of in-memory layers, or something else. Add logging to narrow it down. Private slack DM ref: https://neondb.slack.com/archives/D049K7HJ9JM/p1723727305238099	2024-08-16 06:10:05 +01:00
Sasha Krassovsky	df086cd139	Add logical replication test to exercise snapfiles (#8364 )	2024-08-15 15:34:45 -07:00
Alexander Bayandin	69cb1ee479	CI(replication-tests): store test results & change notification channel (#8687 ) ## Problem We want to store Nightly Replication test results in the database and notify the relevant Slack channel about failures ## Summary of changes - Store test results in the database - Notify `on-call-compute-staging-stream` about failures	2024-08-15 22:41:58 +01:00
Alexander Bayandin	4e58fd9321	CI(label-for-external-users): use CI_ACCESS_TOKEN (#8738 ) ## Problem `secrets.GITHUB_TOKEN` (with any permissions) is not enough to get a user's membership info if they decide to hide it. ## Summary of changes - Use `secrets.CI_ACCESS_TOKEN` for `gh api` call - Use `pull_request_target` instead of `pull_request` event to get access to secrets	2024-08-15 18:37:15 +01:00
Andrey Taranik	6fd3c9daa5	trigger build	2024-08-15 20:27:24 +03:00
Andrey Taranik	409171ab08	try 16 cores	2024-08-15 16:46:57 +03:00
Andrey Taranik	f26240c9dc	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-15 16:46:00 +03:00
Konstantin Knizhnik	f087423a01	Handle reload config file request in LR monitor (#8732 ) ## Problem Logical replication BGW checking replication lag is not reloading config ## Summary of changes Add handling of reload config request Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-08-15 16:28:25 +03:00
Joonas Koivunen	24d347f50b	storcon: use tracing for logging panics (#8734 ) this gives spans for panics, and does not globber loki output by writing to stderr while all of the other logging is to stdout. See: #3475	2024-08-15 16:27:07 +03:00
Andrey Taranik	1c771267ab	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-15 16:15:17 +03:00
Joonas Koivunen	52641eb853	storcon: add spans to drain/fill ops (#8735 ) this way we do not need to repeat the %node_id everywhere, and we get no stray messages in logs from within the op.	2024-08-15 15:30:04 +03:00
Andrey Taranik	9f1b7b72ed	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-15 14:56:37 +03:00
Andrey Taranik	2476f7ef74	try arm64 with 80 cores	2024-08-15 14:56:14 +03:00
Joonas Koivunen	d9a57aeed9	storcon: deny external node configuration if an operation is ongoing (#8727 ) Per #8674, disallow node configuration while drain/fill are ongoing. Implement it by adding a only-http wrapper `Service::external_node_configure` which checks for operation existing before configuring. Additionally: - allow cancelling drain/fill after a pageserver has restarted and transitioned to WarmingUp Fixes: #8674	2024-08-15 10:54:05 +01:00
Alexander Bayandin	a9c28be7d0	fix(pageserver): allow unused_imports in download.rs on macOS (#8733 ) ## Problem On macOS, clippy fails with the following error: ``` error: unused import: `crate::virtual_file::owned_buffers_io::io_buf_ext::IoBufExt` --> pageserver/src/tenant/remote_timeline_client/download.rs:26:5 \| 26 \| use crate::virtual_file::owned_buffers_io::io_buf_ext::IoBufExt; \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = note: `-D unused-imports` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unused_imports)]` ``` Introduced in https://github.com/neondatabase/neon/pull/8717 ## Summary of changes - allow `unused_imports` for `crate::virtual_file::owned_buffers_io::io_buf_ext::IoBufExt` on macOS in download.rs	2024-08-15 10:06:28 +01:00
Vlad Lazar	fef77b0cc9	safekeeper: consider partial uploads when pulling timeline (#8628 ) ## Problem The control file contains the id of the safekeeper that uploaded it. Previously, when sending a snapshot of the control file to another sk, it would eventually be gc-ed by the receiving sk. This is incorrect because the original sk might still need it later. ## Summary of Changes When sending a snapshot and the control file contains an uploaded segment: * Create a copy of the segment in s3 with the destination sk in the object name * Tweak the streamed control file to point to the object created in the previous step Note that the snapshot endpoint now has to know the id of the requestor, so the api has been extended to include the node id of the destination sk. Closes https://github.com/neondatabase/neon/issues/8542	2024-08-15 09:02:33 +01:00
Andrey Taranik	f555cb3970	try cloud arm64 servers	2024-08-15 03:36:00 +03:00
Andrey Taranik	10a726503a	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-15 03:34:07 +03:00
Christian Schwarz	168913bdf0	refactor(write path): newtype to enforce use of fully initialized slices (#8717 ) The `tokio_epoll_uring::Slice` / `tokio_uring::Slice` type is weird. The new `FullSlice` newtype is better. See the doc comment for details. The naming is not ideal, but we'll clean that up in a future refactoring where we move the `FullSlice` into `tokio_epoll_uring`. Then, we'll do the following: * tokio_epoll_uring::Slice is removed * `FullSlice` becomes `tokio_epoll_uring::IoBufView` * new type `tokio_epoll_uring::IoBufMutView` for the current `tokio_epoll_uring::Slice<IoBufMut>` Context ------- I did this work in preparation for https://github.com/neondatabase/neon/pull/8537. There, I'm changing the type that the `inmemory_layer.rs` passes to `DeltaLayerWriter::put_value_bytes` and thus it seemed like a good opportunity to make this cleanup first.	2024-08-14 21:57:17 +02:00
Alexander Bayandin	aa2e16f307	CI: misc cleanup & fixes (#8559 ) ## Problem A bunch of small fixes and improvements for CI, that are too small to have a separate PR for them ## Summary of changes - CI(build-and-test): fix parenthesis - CI(actionlint): fix path to workflow file - CI: remove default args from actions/checkout - CI: remove `gen3` label, using a couple `self-hosted` + `small{,-arm64}`/`large{,-arm64}` is enough - CI: prettify Slack messages, hide links behind text messages - C(build-and-test): add more dependencies to `conclusion` job	2024-08-14 17:56:59 +01:00
Alexander Bayandin	70b18ff481	CI(neon-image): add ARM-specific RUSTFLAGS (#8566 ) ## Problem It's recommended that a couple of additional RUSTFLAGS be set up to improve the performance of Rust applications on AWS Graviton. See `57dc813626/rust.md` Note: Apple Silicon is compatible with neoverse-n1: ``` $ clang --version Apple clang version 15.0.0 (clang-1500.3.9.4) Target: arm64-apple-darwin23.6.0 Thread model: posix InstalledDir: /Applications/Xcode_15.4.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin $ $ clang --print-supported-cpus 2>&1 \| grep neoverse- neoverse-512tvb neoverse-e1 neoverse-n1 neoverse-n2 neoverse-v1 neoverse-v2 ``` ## Summary of changes - Add `-Ctarget-feature=+lse -Ctarget-cpu=neoverse-n1` to RUSTFLAGS for ARM images	2024-08-14 17:03:21 +01:00
Andrey Taranik	7ba86e15fa	debug arm64 builds	2024-08-14 18:57:28 +03:00
Andrey Taranik	7ba42bfdb4	Merge branch 'main' into cicd/debug-regress-tests-on-arm	2024-08-14 18:33:19 +03:00
Andrey Taranik	4f8a39d6c6	try metal arm64 runners	2024-08-14 17:46:13 +03:00
Joonas Koivunen	60fc1e8cc8	chore: even more responsive compaction cancellation (#8725 ) Some benchmarks and tests might still fail because of #8655 (tracked in #8708) because we are not fast enough to shut down ([one evidence]). Partially this is explained by the current validation mode of streaming k-merge, but otherwise because that is where we use a lot of time in compaction. Outside of L0 => L1 compaction, the image layer generation is already guarded by vectored reads doing cancellation checks. 32768 is a wild guess based on looking how many keys we put in each layer in a bench (1-2 million), but I assume it will be good enough divisor. Doing checks more often will start showing up as contention which we cannot currently measure. Doing checks less often might be reasonable. [one evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10384136483/index.html#suites/9681106e61a1222669b9d22ab136d07b/96e6d53af234924/ Earlier PR: #8706.	2024-08-14 14:48:15 +01:00
Alexander Bayandin	54c5da3981	CI(build-and-test): set RUSTFLAGS for ARM	2024-08-14 13:57:20 +01:00
Alexander Bayandin	c1378dc43b	CI: don't collect code coverage on arm64 runners	2024-08-14 13:53:16 +01:00
Alexander Bayandin	50b9fb430a	test_runner: add __arch parameter to Allure report	2024-08-14 13:53:16 +01:00
Alexander Bayandin	486eaba028	CI(build-and-test): run regression tests on arm	2024-08-14 13:53:16 +01:00
Alexander Bayandin	d4d70cc314	CI(build-and-test): make pg-versions configurable	2024-08-14 13:53:16 +01:00

5915 Commits