rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-03 12:10:36 +00:00

Author	SHA1	Message	Date
Christian Schwarz	67345d66ea	Merge pull request #4858 from neondatabase/releases/2023-08-01 Release 2023-08-01	2023-08-01 10:44:01 +02:00
Alex Chi Z	7b6c849456	support isolation level + read only for http batch sql (#4830 ) We will retrieve `neon-batch-isolation-level` and `neon-batch-read-only` from the http header, which sets the txn properties. https://github.com/neondatabase/serverless/pull/38#issuecomment-1653130981 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-08-01 02:59:11 +03:00
Joonas Koivunen	326189d950	consumption_metrics: send timeline_written_size_delta (#4822 ) We want to have timeline_written_size_delta which is defined as difference to the previously sent `timeline_written_size` from the current `timeline_written_size`. Solution is to send it. On the first round `disk_consistent_lsn` is used which is captured during `load` time. After that an incremental "event" is sent on every collection. Incremental "events" are not part of deduplication. I've added some infrastructure to allow somewhat typesafe `EventType::Absolute` and `EventType::Incremental` factories per metrics, now that we have our first `EventType::Incremental` usage.	2023-07-31 22:10:19 +03:00
bojanserafimov	ddbe170454	Prewarm compute nodes (#4828 )	2023-07-31 14:13:32 -04:00
Alexander Bayandin	39e458f049	test_compatibility: fix pg_tenant_only_port port collision (#4850 ) ## Problem Compatibility tests fail from time to time due to `pg_tenant_only_port` port collision (added in https://github.com/neondatabase/neon/pull/4731) ## Summary of changes - replace `pg_tenant_only_port` value in config with new port - remove old logic, than we don't need anymore - unify config overrides	2023-07-31 20:49:46 +03:00
Vadim Kharitonov	e1424647a0	Update pg_embedding to 0.3.1 version (#4811 )	2023-07-31 20:23:18 +03:00
Yinnan Yao	705ae2dce9	Fix error message for listen_pg_addr_tenant_only binding (#4787 ) ## Problem Wrong use of `conf.listen_pg_addr` in `error!()`. ## Summary of changes Use `listen_pg_addr_tenant_only` instead of `conf.listen_pg_addr`. Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>	2023-07-31 14:40:52 +01:00
Conrad Ludgate	eb78603121	proxy: div by zero (#4845 ) ## Problem 1. In the CacheInvalid state loop, we weren't checking the `num_retries`. If this managed to get up to `32`, the retry_after procedure would compute 2^32 which would overflow to 0 and trigger a div by zero 2. When fixing the above, I started working on a flow diagram for the state machine logic and realised it was more complex than it had to be: a. We start in a `Cached` state b. `Cached`: call `connect_once`. After the first connect_once error, we always move to the `CacheInvalid` state, otherwise, we return the connection. c. `CacheInvalid`: we attempt to `wake_compute` and we either switch to Cached or we retry this step (or we error). d. `Cached`: call `connect_once`. We either retry this step or we have a connection (or we error) - After num_retries > 1 we never switch back to `CacheInvalid`. ## Summary of changes 1. Insert a `num_retries` check in the `handle_try_wake` procedure. Also using floats in the retry_after procedure to prevent the overflow entirely 2. Refactor connect_to_compute to be more linear in design.	2023-07-31 09:30:24 -04:00
John Spray	f0ad603693	pageserver: add unit test for deleted_at in IndexPart (#4844 ) ## Problem Existing IndexPart unit tests only exercised the version 1 format (i.e. without deleted_at set). ## Summary of changes Add a test that sets version to 2, and sets a value for deleted_at. Closes https://github.com/neondatabase/neon/issues/4162	2023-07-31 12:51:18 +01:00
Arpad Müller	e5183f85dc	Make DiskBtreeReader::dump async (#4838 ) ## Problem `DiskBtreeReader::dump` calls `read_blk` internally, which we want to make async in the future. As it is currently relying on recursion, and async doesn't like recursion, we want to find an alternative to that and instead traverse the tree using a loop and a manual stack. ## Summary of changes * Make `DiskBtreeReader::dump` and all the places calling it async * Make `DiskBtreeReader::dump` non-recursive internally and use a stack instead. It now deparses the node in each iteration, which isn't optimal, but on the other hand it's hard to store the node as it is referencing the buffer. Self referential data are hard in Rust. For a dumping function, speed isn't a priority so we deparse the node multiple times now (up to branching factor many times). Part of https://github.com/neondatabase/neon/issues/4743 I have verified that output is unchanged by comparing the output of this command both before and after this patch: ``` cargo test -p pageserver -- particular_data --nocapture ```	2023-07-31 12:52:29 +02:00
Joonas Koivunen	89ee8f2028	fix: demote warnings, fix flakyness (#4837 ) `WARN ... found future (image\|delta) layer` are not actionable log lines. They don't need to be warnings. `info!` is enough. This also fixes some known but not tracked flakyness in [`test_remote_timeline_client_calls_started_metric`][evidence]. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4829/5683495367/index.html#/testresult/34fe79e24729618b Closes #3369. Closes #4473.	2023-07-31 07:43:12 +00:00
Alex Chi Z	a8f3540f3d	proxy: add unit test for wake_compute (#4819 ) ## Problem ref https://github.com/neondatabase/neon/pull/4721, ref https://github.com/neondatabase/neon/issues/4709 ## Summary of changes This PR adds unit tests for wake_compute. The patch adds a new variant `Test` to auth backends. When `wake_compute` is called, we will verify if it is the exact operation sequence we are expecting. The operation sequence now contains 3 more operations: `Wake`, `WakeRetry`, and `WakeFail`. The unit tests for proxy connects are now complete and I'll continue work on WebSocket e2e test in future PRs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-28 19:10:55 -04:00
Konstantin Knizhnik	4338eed8c4	Make it possible to grant self perfmissions to self created roles (#4821 ) ## Problem See: https://neondb.slack.com/archives/C04USJQNLD6/p1689973957908869 ## Summary of changes Bump Postgres version ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-07-28 22:06:03 +03:00
Joonas Koivunen	2fbdf26094	test: raise timeout to avoid flakyness (#4832 ) 2s timeout was too tight for our CI, [evidence](https://neon-github-public-dev.s3.amazonaws.com/reports/main/5669956577/index.html#/testresult/6388e31182cc2d6e). 15s might be better. Also cleanup code no longer needed after #4204.	2023-07-28 14:32:01 -04:00
Alexander Bayandin	7374634845	test_runner: clean up test_compatibility (#4770 ) ## Problem We have some amount of outdated logic in test_compatibility, that we don't need anymore. ## Summary of changes - Remove `PR4425_ALLOWED_DIFF` and tune `dump_differs` method to accept allowed diffs in the future (a cleanup after https://github.com/neondatabase/neon/pull/4425) - Remote etcd related code (a cleanup after https://github.com/neondatabase/neon/pull/2733) - Don't set `preserve_database_files`	2023-07-28 16:15:31 +01:00
Alexander Bayandin	9fdd3a4a1e	test_runner: add amcheck to test_compatibility (#4772 ) Run `pg_amcheck` in forward and backward compatibility tests to catch some data corruption. ## Summary of changes - Add amcheck compiling to Makefile - Add `pg_amcheck` to test_compatibility	2023-07-28 16:00:55 +01:00
Alek Westover	3681fc39fd	modify `relative_path_to_s3_object` logic for `prefix=None` (#4795 ) see added unit tests for more description	2023-07-28 10:03:18 -04:00
Joonas Koivunen	67d2fa6dec	test: fix `test_neon_cli_basics` flakyness without making it better for future (#4827 ) The test was starting two endpoints on the same branch as discovered by @petuhovskiy. The fix is to allow passing branch-name from the python side over to neon_local, which already accepted it. Split from #4824, which will handle making this more misuse resistant.	2023-07-27 19:13:58 +03:00
Dmitry Rodionov	cafbe8237e	Move tenant/delete.rs to tenant/timeline/delete.rs (#4825 ) move tenant/delete.rs to tenant/timeline/delete.rs to prepare for appearance of tenant deletion routines in tenant/delete.rs	2023-07-27 15:52:36 +03:00
Joonas Koivunen	3e425c40c0	fix(compute_ctl): remove stray variable in error message (#4823 ) error is not needed because anyhow will have the cause chain reported anyways. related to test_neon_cli_basics being flaky, but doesn't actually fix any flakyness, just the obvious stray `{e}`.	2023-07-27 15:40:53 +03:00
Joonas Koivunen	395bd9174e	test: allow future image layer warning (#4818 ) https://neon-github-public-dev.s3.amazonaws.com/reports/main/5670795960/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/5a73fa4a69399123/retries Allow it because we are doing immediate stop.	2023-07-27 10:22:44 +03:00
Alek Westover	b9a7a661d0	add list of public extensions and lookup table for libraries (#4807 )	2023-07-26 15:55:55 -04:00
Joonas Koivunen	48ce95533c	test: allow normal warnings in test_threshold_based_eviction (#4801 ) See: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5654328815/index.html#suites/3fc871d9ee8127d8501d607e03205abb/3482458eba88c021	2023-07-26 20:20:12 +03:00
Dmitry Rodionov	874c31976e	dedup cleanup fs traces (#4778 ) This is a follow up for discussion: https://github.com/neondatabase/neon/pull/4552#discussion_r1253417777 see context there	2023-07-26 18:39:32 +03:00
Conrad Ludgate	231d7a7616	proxy: retry compute wake in auth (#4817 ) ## Problem wake_compute can fail sometimes but is eligible for retries. We retry during the main connect, but not during auth. ## Summary of changes retry wake_compute during auth flow if there was an error talking to control plane, or if there was a temporary error in waking the compute node	2023-07-26 16:34:46 +01:00
arpad-m	5705413d90	Use OnceLock instead of manually implementing it (#4805 ) ## Problem In https://github.com/neondatabase/neon/issues/4743 , I'm trying to make more of the pageserver async, but in order for that to happen, I need to be able to persist the result of `ImageLayer::load` across await points. For that to happen, the return value needs to be `Send`. ## Summary of changes Use `OnceLock` in the image layer instead of manually implementing it with booleans, locks and `Option`. Part of #4743	2023-07-26 17:20:09 +02:00
Conrad Ludgate	35370f967f	proxy: add some connection init logs (#4812 ) ## Problem The first session event we emit is after we receive the first startup packet from the client. This means we can't detect any issues between TCP open and handling of the first PG packet ## Summary of changes Add some new logs for websocket upgrade and connection handling	2023-07-26 15:03:51 +00:00
Alexander Bayandin	b98419ee56	Fix allure report overwriting for different Postgres versions (#4806 ) ## Problem We've got an example of Allure reports from 2 different runners for the same build that started to upload at the exact second, making one overwrite another ## Summary of changes - Use the Postgres version to distinguish artifacts (along with the build type)	2023-07-26 15:19:18 +01:00
Alexander Bayandin	86a61b318b	Bump certifi from 2022.12.7 to 2023.7.22 (#4815 )	2023-07-26 16:32:56 +03:00
Alek Westover	5f8fd640bf	Upload Test Remote Extensions (#4792 ) We need some real extensions in S3 to accurately test the code for handling remote extensions. In this PR we just upload three extensions (anon, kq_imcx and postgis), which is enough for testing purposes for now. In addition to creating and uploading the extension archives, we must generate a file `ext_index.json` which specifies important metadata about the extensions. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-26 15:24:03 +03:00
bojanserafimov	916a5871a6	compute_ctl: Parse sk connstring (#4809 )	2023-07-26 08:10:49 -04:00
Dmitry Rodionov	700d929529	Init Timeline in Stopping state in create_timeline_struct when Cause::Delete (#4780 ) See https://github.com/neondatabase/neon/pull/4552#discussion_r1258368127 for context. TLDR: use CreateTimelineCause to infer desired state instead of using .set_stopping after initialization	2023-07-26 14:05:18 +03:00
bojanserafimov	520046f5bd	cold starts: Add sync-safekeepers fast path (#4804 )	2023-07-25 19:44:18 -04:00
Conrad Ludgate	2ebd2ce2b6	proxy: record connection type (#4802 ) ## Problem We want to measure how many users are using TCP/WS connections. We also want to measure how long it takes to establish a connection with the compute node. I plan to also add a separate counter for HTTP requests, but because of pooling this needs to be disambiguated against new HTTP compute connections ## Summary of changes * record connection type (ws/tcp) in the connection counters. * record connection latency including retry latency	2023-07-25 18:57:42 +03:00
Alex Chi Z	bcc2aee704	proxy: add tests for batch http sql (#4793 ) This PR adds an integration test case for batch HTTP SQL endpoint. https://github.com/neondatabase/neon/pull/4654/ should be merged first before we land this PR. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-25 15:08:24 +00:00
Dmitry Rodionov	6d023484ed	Use mark file to allow for deletion operations to continue through restarts (#4552 ) ## Problem Currently we delete local files first, so if pageserver restarts after local files deletion then remote deletion is not continued. This can be solved with inversion of these steps. But even if these steps are inverted when index_part.json is deleted there is no way to distinguish between "this timeline is good, we just didnt upload it to remote" and "this timeline is deleted we should continue with removal of local state". So to solve it we use another mark file. After index part is deleted presence of this mark file indentifies that it was a deletion intention. Alternative approach that was discussed was to delete all except metadata first, and then delete metadata and index part. In this case we still do not support local only configs making them rather unsafe (deletion in them is already unsafe, but this direction solidifies this direction instead of fixing it). Another downside is that if we crash after local metadata gets removed we may leave dangling index part on the remote which in theory shouldnt be a big deal because the file is small. It is not a big change to choose another approach at this point. ## Summary of changes Timeline deletion sequence: 1. Set deleted_at in remote index part. 2. Create local mark file. 3. Delete local files except metadata (it is simpler this way, to be able to reuse timeline initialization code that expects metadata) 4. Delete remote layers 5. Delete index part 6. Delete meta, timeline directory. 7. Delete mark file. This works for local only configuration without remote storage. Sequence is resumable from any point. resolves #4453 resolves https://github.com/neondatabase/neon/pull/4552 (the issue was created with async cancellation in mind, but we can still have issues with retries if metadata is deleted among the first by remove_dir_all (which doesnt have any ordering guarantees)) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-07-25 16:25:27 +03:00
Nick Randall	062159ac17	support non-interactive transactions in sql-over-http (#4654 ) This PR adds support for non-interactive transaction query endpoint. It accepts an array of queries and parameters and returns an array of query results. The queries will be run in a single transaction one after another on the proxy side.	2023-07-25 13:03:55 +01:00
cui fliter	f2e2b8a7f4	fix some typos (#4662 ) Typos fix Signed-off-by: cui fliter <imcusg@gmail.com>	2023-07-25 14:39:29 +03:00
Shany Pozin	2266ee5971	Merge pull request #4803 from neondatabase/releases/2023-07-25 Release 2023-07-25 release-3592	2023-07-25 14:21:07 +03:00
Joonas Koivunen	f9214771b4	fix: count broken tenant more correct (#4800 ) count only once; on startup create the counter right away because we will not observe any changes later. small, probably never reachable from outside fix for #4796.	2023-07-25 12:31:24 +03:00
Joonas Koivunen	77a68326c5	Thin out TenantState metric, keep set of broken tenants (#4796 ) We currently have a timeseries for each of the tenants in different states. We only want this for Broken. Other states could be counters. Fix this by making the `pageserver_tenant_states_count` a counter without a `tenant_id` and add a `pageserver_broken_tenants_count` which has a `tenant_id` label, each broken tenant being 1.	2023-07-25 11:15:54 +03:00
Joonas Koivunen	a25504deae	Limit concurrent compactions (#4777 ) Compactions can create a lot of concurrent work right now with #4265. Limit compactions to use at most 6/8 background runtime threads.	2023-07-25 10:19:04 +03:00
Joonas Koivunen	294b8a8fde	Convert per timeline metrics to global (#4769 ) Cut down the per-(tenant, timeline) histograms by making them global: - `pageserver_getpage_get_reconstruct_data_seconds` - `pageserver_read_num_fs_layers` - `pageserver_remote_operation_seconds` - `pageserver_remote_timeline_client_calls_started` - `pageserver_wait_lsn_seconds` - `pageserver_io_operations_seconds` --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-07-25 00:43:27 +03:00
Alex Chi Z	407a20ceae	add proxy unit tests for retry connections (#4721 ) Given now we've refactored `connect_to_compute` as a generic, we can test it with mock backends. In this PR, we mock the error API and connect_once API to test the retry behavior of `connect_to_compute`. In the next PR, I'll add mock for credentials so that we can also test behavior with `wake_compute`. ref https://github.com/neondatabase/neon/issues/4709 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-24 20:41:42 +03:00
arpad-m	e5b7ddfeee	Preparatory pageserver async conversions (#4773 ) In #4743, we'd like to convert the read path to use `async` rust. In preparation of that, this PR switches some functions that are calling lower level functions like `BlockReader::read_blk`, `BlockCursor::read_blob`, etc into `async`. The PR does not switch all functions however, and only focuses on the ones which are easy to switch. This leaves around some async functions that are (currently) unnecessarily `async`, but on the other hand it makes future changes smaller in diff. Part of #4743 (but does not completely address it).	2023-07-24 14:01:54 +02:00
Alek Westover	7feb0d1a80	`unwrap` instead of passing `anyhow::Error` on failure to spawn a thread (#4779 )	2023-07-21 15:17:16 -04:00
Konstantin Knizhnik	457e3a3ebc	Mx offset bug (#4775 ) Fix mx_offset_to_flags_offset() function Fixes issue #4774 Postgres `MXOffsetToFlagsOffset` was not correctly converted to Rust because cast to u16 is done before division by modulo. It is possible only if divider is power of two. Add a small rust unit test to check that the function produces same results as the PostgreSQL macro, and extend the existing python test to cover this bug. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-07-21 21:20:53 +03:00
Joonas Koivunen	25d2f4b669	metrics: chunked responses (#4768 ) Metrics can get really large in the order of hundreds of megabytes, which we used to buffer completly (after a few rounds of growing the buffer).	2023-07-21 15:10:55 +00:00
Alex Chi Z	1685593f38	stable merge and sort in compaction (#4573 ) Per discussion at https://github.com/neondatabase/neon/pull/4537#discussion_r1242086217, it looks like a better idea to use `<` instead of `<=` for all these comparisons. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-21 10:15:44 -04:00
dependabot[bot]	8d0f4a7857	Bump aiohttp from 3.7.4 to 3.8.5 (#4762 )	2023-07-20 22:33:50 +03:00

1 2 3 4 5 ...

3630 Commits