
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-03 20:20:38 +00:00

Author	SHA1	Message	Date
Shany Pozin	2266ee5971	Merge pull request #4803 from neondatabase/releases/2023-07-25 Release 2023-07-25 release-3592	2023-07-25 14:21:07 +03:00
Joonas Koivunen	f9214771b4	fix: count broken tenant more correct (#4800 ) count only once; on startup create the counter right away because we will not observe any changes later. small, probably never reachable from outside fix for #4796.	2023-07-25 12:31:24 +03:00
Joonas Koivunen	77a68326c5	Thin out TenantState metric, keep set of broken tenants (#4796 ) We currently have a timeseries for each of the tenants in different states. We only want this for Broken. Other states could be counters. Fix this by making the `pageserver_tenant_states_count` a counter without a `tenant_id` and add a `pageserver_broken_tenants_count` which has a `tenant_id` label, each broken tenant being 1.	2023-07-25 11:15:54 +03:00
Joonas Koivunen	a25504deae	Limit concurrent compactions (#4777 ) Compactions can create a lot of concurrent work right now with #4265. Limit compactions to use at most 6/8 background runtime threads.	2023-07-25 10:19:04 +03:00
Joonas Koivunen	294b8a8fde	Convert per timeline metrics to global (#4769 ) Cut down the per-(tenant, timeline) histograms by making them global: - `pageserver_getpage_get_reconstruct_data_seconds` - `pageserver_read_num_fs_layers` - `pageserver_remote_operation_seconds` - `pageserver_remote_timeline_client_calls_started` - `pageserver_wait_lsn_seconds` - `pageserver_io_operations_seconds` --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-07-25 00:43:27 +03:00
Alex Chi Z	407a20ceae	add proxy unit tests for retry connections (#4721 ) Given now we've refactored `connect_to_compute` as a generic, we can test it with mock backends. In this PR, we mock the error API and connect_once API to test the retry behavior of `connect_to_compute`. In the next PR, I'll add mock for credentials so that we can also test behavior with `wake_compute`. ref https://github.com/neondatabase/neon/issues/4709 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-24 20:41:42 +03:00
arpad-m	e5b7ddfeee	Preparatory pageserver async conversions (#4773 ) In #4743, we'd like to convert the read path to use `async` rust. In preparation of that, this PR switches some functions that are calling lower level functions like `BlockReader::read_blk`, `BlockCursor::read_blob`, etc into `async`. The PR does not switch all functions however, and only focuses on the ones which are easy to switch. This leaves around some async functions that are (currently) unnecessarily `async`, but on the other hand it makes future changes smaller in diff. Part of #4743 (but does not completely address it).	2023-07-24 14:01:54 +02:00
Alek Westover	7feb0d1a80	`unwrap` instead of passing `anyhow::Error` on failure to spawn a thread (#4779 )	2023-07-21 15:17:16 -04:00
Konstantin Knizhnik	457e3a3ebc	Mx offset bug (#4775 ) Fix mx_offset_to_flags_offset() function. Fixes issue #4774. Postgres `MXOffsetToFlagsOffset` was not correctly converted to Rust because the cast to u16 was done before the modulo operation, which is valid only if the divisor is a power of two. Add a small Rust unit test to check that the function produces the same results as the PostgreSQL macro, and extend the existing Python test to cover this bug. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-07-21 21:20:53 +03:00
Joonas Koivunen	25d2f4b669	metrics: chunked responses (#4768 ) Metrics can get really large, on the order of hundreds of megabytes, which we used to buffer completely (after a few rounds of growing the buffer).	2023-07-21 15:10:55 +00:00
Alex Chi Z	1685593f38	stable merge and sort in compaction (#4573 ) Per discussion at https://github.com/neondatabase/neon/pull/4537#discussion_r1242086217, it looks like a better idea to use `<` instead of `<=` for all these comparisons. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-21 10:15:44 -04:00
dependabot[bot]	8d0f4a7857	Bump aiohttp from 3.7.4 to 3.8.5 (#4762 )	2023-07-20 22:33:50 +03:00
Alex Chi Z	3fc3666df7	make flush frozen layer an atomic operation (#4720 ) ## Problem close https://github.com/neondatabase/neon/issues/4712 ## Summary of changes Previously, when flushing frozen layers, it was split into two operations: add delta layer to disk + remove frozen layer from memory. This would cause a short period of time where we will have the same data both in frozen and delta layer. In this PR, we merge them into one atomic operation in layer map manager, therefore simplifying the code. Note that if we decide to create image layers for L0 flush, it will still be split into two operations on layer map. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-20 13:39:19 -04:00
Joonas Koivunen	89746a48c6	chore: fix copypaste caused flakyness (#4763 ) I introduced a copypaste error leading to flaky [test failure][report] in #4737. Solution is to use correct/unique test name. I also looked into providing a proper fn name via macro but ... Yeah, it's probably not a great idea. [report]: https://github.com/neondatabase/neon/actions/runs/5612473297/job/15206293430#step:15:197	2023-07-20 19:55:40 +03:00
Joonas Koivunen	8d27a9c54e	Less verbose eviction failures (#4737 ) As seen in staging logs with some massive compactions (create_image_layer), in addition to racing with compaction or gc or even between two invocations to `evict_layer_batch`. Cc: #4745 Fixes: #3851 (organic tech debt reduction) Solution is not to log the Not Found in such cases; it is perfectly natural to happen. Route to this is quite long, but implemented two cases of "race between two eviction processes" which are like our disk usage based eviction and eviction_task, both have the separate "lets figure out what to evict" and "lets evict" phases.	2023-07-20 17:45:10 +03:00
arpad-m	d98cb39978	pageserver: use tokio::time::timeout where possible (#4756 ) Removes a bunch of cases which used `tokio::select` to emulate the `tokio::time::timeout` function. I've done an additional review on the cancellation safety of these futures, all of them seem to be cancellation safe (not that `select!` allows non-cancellation-safe futures, but as we touch them, such a review makes sense). Furthermore, I correct a few mentions of a non-existent `tokio::timeout!` macro in the docs to the `tokio::time::timeout` function.	2023-07-20 16:19:38 +02:00
Alexander Bayandin	27c73c8740	Bump pg_embedding extension (#4758 ) ``` % git log --pretty=oneline 2465f831ea1f8d49c1d74f8959adb7fc277d70cd..eeb3ba7c3a60c95b2604dd543c64b2f1bb4a3703 eeb3ba7c3a60c95b2604dd543c64b2f1bb4a3703 (HEAD -> main, origin/main) Fixc in-mmeory index rebuild after TRUNCATE 1d7cfcfe3d58e2cf4566900437c609725448d14b Correctly handle truncate forin-0memory HNSW index 8fd2a4a191f67858498d876ec378b58e76b5874a :Fix empty index search issue 30e9ef4064cff40c60ff2f78afeac6c296722757 Fix extensiomn name in makefile 23bb5d504aa21b1663719739f6eedfdcb139d948 Fix getting memory size at Mac OS/X 39193a38d6ad8badd2a8d1dce2dd999e1b86885d Update a comment for the extension bf3b0d62a7df56a5e4db9d9e62dc535794c425bc Merge branch 'main' of https://github.com/neondatabase/pg_embedding c2142d514280e14322d1026f0c811876ccf7a91f Update README.md 53b641880f786d2b69a75941c49e569018e8e97e Create LICENSE 093aaa36d5af183831bf370c97b563c12d15f23a Update README.md 91f0bb84d14cb26fd8b452bf9e1ecea026ac5cbc Update README.md 7f7efa38015f24ee9a09beca3009b8d0497a40b4 Update README.md 71defdd4143ecf35489d93289f6cdfa2545fbd36 Merge pull request #4 from neondatabase/danieltprice-patch-1 e06c228b99c6b7c47ebce3bb7c97dbd494088b0a Update README.md d7e52b576b47d9023743b124bdd0360a9fc98f59 Update README.md 70ab399c861330b50a9aff9ab9edc7044942a65b Merge pull request #5 from neondatabase/oom_error_reporting 0aee1d937997198fa2d2b2ed7a0886d1075fa790 Fix OOM error reporting and support vectprization for ARM 18d80079ce60b2aa81d58cefdf42fc09d2621fc1 Update README.md ```	2023-07-20 12:32:57 +01:00
Joonas Koivunen	9e871318a0	Wait detaches or ignores on pageserver shutdown (#4678 ) Adds in a barrier for the duration of the `Tenant::shutdown`. `pageserver_shutdown` will join this await, `detach`es and `ignore`s will not. Fixes #4429. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-20 13:14:13 +03:00
bojanserafimov	e1061879aa	Improve startup python test (#4757 )	2023-07-19 23:46:16 -04:00
Daniel	f09e82270e	Update comment for hnsw extension (#4755 ) Updated the description that appears for hnsw when you query extensions: ``` neondb=> SELECT * FROM pg_available_extensions WHERE name = 'hnsw'; name \| default_version \| installed_version \| comment ----------------------+-----------------+-------------------+-------------------------------------------------- hnsw \| 0.1.0 \| \| Deprecated Please use pg_embedding instead (1 row) ``` --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-19 19:08:25 +01:00
Alexander Bayandin	d4a5fd5258	Disable extension uploading to S3 (#4751 ) ## Problem We're going to reset S3 buckets for extensions (https://github.com/neondatabase/aws/pull/413), and as soon as we're going to change the format we store extensions on S3. Let's stop uploading extensions in the old format. ## Summary of changes - Disable `aws s3 cp` step for extensions	2023-07-19 15:44:14 +01:00
Arseny Sher	921bb86909	Use safekeeper tenant only port in all tests and actually test it. Compute now uses special safekeeper WAL service port allowing auth tokens with only tenant scope. Adds understanding of this port to neon_local and fixtures, as well as test of both ports behaviour with different tokens. ref https://github.com/neondatabase/neon/issues/4730	2023-07-19 06:03:51 +04:00
Arseny Sher	1e7db5458f	Add one more WAL service port allowing only tenant scoped auth tokens. It will make it easier to limit access at network level, with e.g. k8s network policies. ref https://github.com/neondatabase/neon/issues/4730	2023-07-19 06:03:51 +04:00
Alexander Bayandin	b4d36f572d	Use sharded-slab from crates (#4729 ) ## Problem We use a patched version of `sharded-slab` with increased MAX_THREADS [1]. It is not required anymore because safekeepers are async now. A valid comment from the original PR tho [1]: > Note that patch can affect other rust services, not only the safekeeper binary. - [1] https://github.com/neondatabase/neon/pull/4122 ## Summary of changes - Remove patch for `sharded-slab`	2023-07-18 13:50:44 +01:00
Shany Pozin	b58445d855	Merge pull request #4746 from neondatabase/releases/2023-07-18 Release 2023-07-18 release-3568	2023-07-18 14:45:39 +03:00
Conrad Ludgate	36050e7f3d	Merge branch 'release' into releases/2023-07-18	2023-07-18 12:00:09 +01:00
Joonas Koivunen	762a8a7bb5	python: more linting (#4734 ) Ruff has "B" class of lints, including B018 which will nag on useless expressions, related to #4719. Enable such lints and fix the existing issues. Most notably: - https://beta.ruff.rs/docs/rules/mutable-argument-default/ - https://beta.ruff.rs/docs/rules/assert-false/ --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-18 12:56:40 +03:00
Conrad Ludgate	2e8a3afab1	proxy: merge handle_client (#4740 ) ## Problem Second half of #4699. we were maintaining 2 implementations of handle_client. ## Summary of changes Merge the handle_client code, but abstract some of the details. ## Checklist before requesting a review - [X] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2023-07-17 22:20:23 +01:00
Alexander Bayandin	4580f5085a	test_runner: run benchmarks in parallel (#4683 ) ## Problem Benchmarks run takes about an hour on main branch (in a single job), which delays pipeline results. And it takes another hour if we want to restart the job due to some failures. ## Summary of changes - Use `pytest-split` plugin to run benchmarks on separate CI runners in 4 parallel jobs - Add `scripts/benchmark_durations.py` for getting benchmark durations from the database to help `pytest-split` schedule tests more evenly. It uses p99 for the last 10 days' results (durations). The current distribution could be better; each worker's durations vary from 9m to 35m, but this could be improved in consequent PRs.	2023-07-17 20:09:45 +01:00
Conrad Ludgate	e074ccf170	reduce proxy timeouts (#4708 ) ## Problem 10 retries * 10 second timeouts makes for a very long retry window. ## Summary of changes Adds a 2s timeout to sql_over_http connections, and also reduces the 10s timeout in TCP.	2023-07-17 20:05:26 +01:00
George MacKerron	196943c78f	CORS preflight OPTIONS support for /sql (http fetch) endpoint (#4706 ) ## Problem HTTP fetch can't be used from browsers because proxy doesn't support [CORS 'preflight' `OPTIONS` requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests). ## Summary of changes Added a simple `OPTIONS` endpoint for `/sql`.	2023-07-17 20:01:25 +01:00
bojanserafimov	149dd36b6b	Update pg: add startup logs (#4736 )	2023-07-17 14:47:08 -04:00
Kirill Bulatov	be271e3edf	Use upstream version of tokio-tar (#4722 ) tokio-tar 0.3.1 got released, including all changes from the fork currently used, switch over to that one.	2023-07-17 17:18:33 +01:00
Conrad Ludgate	7c85c7ea91	proxy: merge connect compute (#4713 ) ## Problem Half of #4699. TCP/WS have one implementation of `connect_to_compute`, HTTP has another implementation of `connect_to_compute`. Having both is annoying to deal with. ## Summary of changes Creates a set of traits `ConnectMechanism` and `ShouldError` that allows the `connect_to_compute` to be generic over raw TCP stream or tokio_postgres based connections. I'm not super happy with this. I think it would be nice to remove tokio_postgres entirely but that will need a lot more thought to be put into it. I have also slightly refactored the caching to use fewer references. Instead using ownership to ensure the state of retrying is encoded in the type system.	2023-07-17 15:53:01 +01:00
Alex Chi Z	1066bca5e3	compaction: allow duplicated layers and skip in replacement (#4696 ) ## Problem Compactions might generate files of exactly the same name as before compaction due to our naming of layer files. This could have already caused some mess in the system, and is known to cause some issues like https://github.com/neondatabase/neon/issues/4088. Therefore, we now consider duplicated layers in the post-compaction process to avoid violating the layer map duplicate checks. related previous works: close https://github.com/neondatabase/neon/pull/4094 error reported in: https://github.com/neondatabase/neon/issues/4690, https://github.com/neondatabase/neon/issues/4088 ## Summary of changes If a file already exists in the layer map before the compaction, do not modify the layer map and do not delete the file. The file on disk at that time should be the new one overwritten by the compaction process. This PR also adds a test case with a fail point that produces exactly the same set of files. This bypassing behavior is safe because the produced layer files have the same content / are the same representation of the original file. An alternative might be directly removing the duplicate check in the layer map, but I feel it would be good if we can prevent that in the first place. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-17 17:26:29 +03:00
bojanserafimov	1aad8918e1	Document recommended ccls setup (#4723 )	2023-07-17 09:21:42 -04:00
Christian Schwarz	966213f429	basebackup query metric: use same buckets as control plane (#4732 ) The `CRITICAL_OPS_BUCKETS` is not useful for getting an accurate picture of basebackup latency because all the observations that negatively affect our SLI fall into one bucket, i.e., 100ms-1s. Use the same buckets as control plane instead.	2023-07-17 13:46:13 +02:00
arpad-m	35e73759f5	Reword comment and add comment on race condition (#4725 ) The race condition that caused #4526 is still not fixed, so point it out in a comment. Also, reword a comment in upload.rs. Follow-up of #4694	2023-07-17 12:49:58 +02:00
Vadim Kharitonov	48936d44f8	Update postgres version (#4727 )	2023-07-16 13:40:59 +03:00
Em Sharnoff	2eae0a1fe5	Update vm-builder v0.12.1 -> v0.13.1 (#4728 ) This should only affect the version of the vm-informant used. The only change to the vm-informant from v0.12.1 to v0.13.1 was: * https://github.com/neondatabase/autoscaling/pull/407 Just a typo fix; worth getting in anyways.	2023-07-15 15:38:15 -07:00
dependabot[bot]	53470ad12a	Bump cryptography from 41.0.0 to 41.0.2 (#4724 )	2023-07-15 14:36:13 +03:00
Alexander Bayandin	edccef4514	Make CI more friendly for external contributors (#4663 ) ## Problem CI doesn't work for external contributors (for PRs from forks), see #2222 for more information. I'm proposing the following: - External PR is created - PR is reviewed so that it doesn't contain any malicious code - Label `approved-for-ci-run` is added to that PR (by the reviewer) - A new workflow picks up this label and creates an internal branch from that PR (the branch name is `ci-run/pr-`) - CI is run on the branch, but the results are also propagated to the PRs check - We can merge a PR itself if it's green; if not — repeat. ## Summary of changes - Create `approved-for-ci-run.yml` workflow which handles `approved-for-ci-run` label - Trigger `build_and_test.yml` and `neon_extra_builds.yml` workflows on `ci-run/pr-` branches	2023-07-15 11:58:15 +01:00
arpad-m	982fce1e72	Fix rustdoc warnings and test cargo doc in CI (#4711 ) ## Problem `cargo +nightly doc` is giving a lot of warnings: broken links, naked URLs, etc. ## Summary of changes * update the `proc-macro2` dependency so that it can compile on latest Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and https://github.com/dtolnay/proc-macro2/issues/398 * allow the `private_intra_doc_links` lint, as linking to something that's private is always more useful than just mentioning it without a link: if the link breaks in the future, at least there is a warning due to that. Also, one might enable [`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options) in the future and make these links work in general. * fix all the remaining warnings given by `cargo +nightly doc` * make it possible to run `cargo doc` on stable Rust by updating `opentelemetry` and associated crates to version 0.19, pulling in a fix that previously broke `cargo doc` on stable: https://github.com/open-telemetry/opentelemetry-rust/pull/904 * Add `cargo doc` to CI to ensure that it won't get broken in the future. Fixes #2557 ## Future work * Potentially, it might make sense, for development purposes, to publish the generated rustdocs somewhere, like for example [how the rust compiler does it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html). I will file an issue for discussion.	2023-07-15 05:11:25 +03:00
Vadim Kharitonov	e767ced8d0	Update rust to 1.71.0 (#4718 ) Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-14 18:34:01 +02:00
Alex Chi Z	1309571f5d	proxy: switch to structopt for clap parsing (#4714 ) Using `#[clap]` for parsing cli opts, which is easier to maintain. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-14 19:11:01 +03:00
Joonas Koivunen	9a69b6cb94	Demote deletion warning, list files (#4688 ) Handle test failures like: ``` AssertionError: assert not ['$ts WARN delete_timeline{tenant_id=X timeline_id=Y}: About to remove 1 files\n'] ``` Instead of logging: ``` WARN delete_timeline{tenant_id=X timeline_id=Y}: Found 1 files not bound to index_file.json, proceeding with their deletion WARN delete_timeline{tenant_id=X timeline_id=Y}: About to remove 1 files ``` For each one operation of timeline deletion, list all unref files with `info!`, and then continue to delete them with the added spice of logging the rare/never happening non-utf8 name with `warn!`. Rationale for `info!` instead of `warn!`: this is a normal operation; like we had mentioned in `test_import.py` -- basically whenever we delete a timeline which is not idle. Rationale for N * (`info!`|`warn!`): symmetry for the layer deletions; if we could ever need those, we could also need these for layer files which are not yet mentioned in `index_part.json`. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-14 18:59:16 +03:00
Joonas Koivunen	cc82cd1b07	spanchecks: Support testing without tracing (#4682 ) Tests cannot be run without configuring tracing. Split from #4678. Does not nag about the span checks when there is no subscriber configured, because then the spans will have no links and nothing can be checked. Sadly the `SpanTrace::status()` cannot be used for this. `tracing` is always configured in regress testing (running with `pageserver` binary), which should be enough. Additionally cleans up the test code in span checks to be in the test code. Fixes a `#[should_panic]` test which was flaky before these changes, but the `#[should_panic]` hid the flakiness. Rationale for need: Unit tests might not be testing only the public or `feature="testing"` APIs which are only testable within `regress` tests so not all spans might be configured.	2023-07-14 17:45:25 +03:00
Alex Chi Z	c76b74c50d	semantic layer map operations (#4618 ) ## Problem ref https://github.com/neondatabase/neon/issues/4373 ## Summary of changes A step towards immutable layer map. I decided to finish the refactor with this new approach and apply https://github.com/neondatabase/neon/pull/4455 on this patch later. In this PR, we moved all modifications of the layer map to one place with semantic operations like `initialize_local_layers`, `finish_compact_l0`, `finish_gc_timeline`, etc, which is now part of `LayerManager`. This makes it easier to build new features upon this PR: * For immutable storage state refactor, we can simply replace the layer map with `ArcSwap<LayerMap>` and remove the `layers` lock. Moving towards it requires us to put all layer map changes in a single place as in https://github.com/neondatabase/neon/pull/4455. * For manifest, we can write to manifest in each of the semantic functions. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-13 17:35:27 +03:00
Alexey Kondratov	ed938885ff	[compute_ctl] Fix deletion of template databases (#4661 ) If database was created with `is_template true` Postgres doesn't allow dropping it right away and throws error ``` ERROR: cannot drop a template database ``` so we have to unset `is_template` first. Fixing it, I noticed that our `escape_literal` isn't exactly correct and following the same logic as in `quote_literal_internal`, we need to prepend string with `E`. Otherwise, it's not possible to filter `pg_database` using `escape_literal()` result if name contains `\`, for example. Also use `FORCE` to drop database even if there are active connections. We run this from `cloud_admin`, so it should have enough privileges. NB: there could be other db states, which prevent us from dropping the database. For example, if db is used by any active subscription or logical replication slot. TODO: deal with it once we allow logical replication. Proper fix should involve returning an error code to the control plane, so it could figure out that this is a non-retryable error, return it to the user and mark operation as permanently failed. Related to neondatabase/cloud#4258	2023-07-13 13:18:35 +02:00
Conrad Ludgate	db4d094afa	proxy: add more error cases to retry connect (#4707 ) ## Problem In the logs, I noticed we still weren't retrying in some cases. Seemed to be timeouts but we explicitly wanted to handle those ## Summary of changes Retry on io::ErrorKind::TimedOut errors. Handle IO errors in tokio_postgres::Error.	2023-07-13 11:47:27 +01:00
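
The truncation bug described in commit 457e3a3ebc (#4775) can be illustrated with a minimal sketch. This is not the code from the neon repository: the function names are hypothetical, and the constants are assumed to match PostgreSQL's `multixact.c` for the default 8 KiB block size (4 members per group, 20 bytes per group, 409 groups per page). Because 409 is not a power of two, narrowing the group number to `u16` before taking the remainder changes the result once the group number exceeds 65535.

```rust
// Assumed constants, mirroring PostgreSQL's multixact.c for BLCKSZ = 8192.
const BLCKSZ: u32 = 8192;
const MEMBERS_PER_MEMBERGROUP: u32 = 4;
// Four 4-byte transaction ids plus four flag bytes per group.
const MEMBERGROUP_SIZE: u32 = 4 * MEMBERS_PER_MEMBERGROUP + 4; // 20
const MEMBERGROUPS_PER_PAGE: u32 = BLCKSZ / MEMBERGROUP_SIZE; // 409

/// Hypothetical reproduction of the bug: the group number is narrowed
/// to u16 *before* the remainder is taken, so high bits are lost.
pub fn flags_offset_truncating(xid: u32) -> usize {
    let group = (xid / MEMBERS_PER_MEMBERGROUP) as u16;
    (group % MEMBERGROUPS_PER_PAGE as u16) as usize * MEMBERGROUP_SIZE as usize
}

/// Sketch of the corrected order: take the remainder in full width first;
/// the result is below 409 * 20 and therefore safe to narrow afterwards.
pub fn flags_offset(xid: u32) -> usize {
    let group = (xid / MEMBERS_PER_MEMBERGROUP) % MEMBERGROUPS_PER_PAGE;
    (group * MEMBERGROUP_SIZE) as usize
}
```

For small offsets both functions agree, which is presumably why the bug was easy to miss; they diverge only for offsets whose group number no longer fits in 16 bits, for example an offset of 262144 (group number 65536).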


3592 Commits