## Problem
Currently, image generation reads delta layers before writing out
subsequent image layers, which updates the access time of the delta
layers and effectively puts them at the back of the queue for eviction.
This is the opposite of what we want, because after a delta layer is
covered by a later image layer, it's likely that subsequent reads of
latest data will hit the image rather than the delta layer, so the delta
layer should be quite a good candidate for eviction.
## Summary of changes
`RequestContext` gets a new `ATimeBehavior` field, and a
`RequestContextBuilder` helper so that we can optionally add the new
field without growing `RequestContext::new` every time we add something
like this.
Request context is passed into the `record_access` function, and the
access time is not updated if `ATimeBehavior::Skip` is set.
The compaction background task constructs its request context with this
skip policy.
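A minimal sketch of the shape this takes (the field set and method names are illustrative, not the exact pageserver types):

```rust
/// Whether reads performed under this context should update layer access times.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ATimeBehavior {
    Update,
    Skip,
}

pub struct RequestContext {
    // ... existing fields elided ...
    atime_behavior: ATimeBehavior,
}

pub struct RequestContextBuilder {
    atime_behavior: ATimeBehavior,
}

impl RequestContextBuilder {
    pub fn new() -> Self {
        // Updating access times stays the default; only callers that opt in skip it.
        Self { atime_behavior: ATimeBehavior::Update }
    }

    pub fn atime_behavior(mut self, b: ATimeBehavior) -> Self {
        self.atime_behavior = b;
        self
    }

    pub fn build(self) -> RequestContext {
        RequestContext { atime_behavior: self.atime_behavior }
    }
}

// In record_access: only bump the access time if the context allows it.
fn record_access(access_time: &mut std::time::SystemTime, ctx: &RequestContext) {
    if ctx.atime_behavior == ATimeBehavior::Skip {
        return;
    }
    *access_time = std::time::SystemTime::now();
}
```

The compaction background task would then build its context along the lines of `RequestContextBuilder::new().atime_behavior(ATimeBehavior::Skip).build()`.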
Closes: https://github.com/neondatabase/neon/issues/4969
This code was mostly copied from walsender.c, and the idea was to keep it
similar to walsender.c so that we could easily copy-paste future upstream
changes from walsender.c to walproposer_utils.c. But right now I see that
deleting it doesn't break anything, so it's better to remove the unused parts.
It allows the term leader to ensure it pulls data from the correct term. Absence of
it wasn't very problematic due to CRC checks, but let's be strict.
walproposer still doesn't use it as we're going to remove recovery completely
from it.
Patches a bug in vm-builder where it did not include enough parameters
in the query string. These parameters are `host=localhost port=5432`.
They were not necessary for the monitor because the `pq` Go
Postgres driver includes them by default.
## Problem
In some places, the lock on `InMemoryLayerInner` is only taken to
obtain `end_lsn`. This is not needed, however, if we move `end_lsn` to
`InMemoryLayer` instead.
## Summary of changes
Make `end_lsn` a member of `InMemoryLayer`, and do less locking of
`InMemoryLayerInner`. `end_lsn` is changed from an `Option<Lsn>` into an
`OnceLock<Lsn>`. Thanks to this change, we no longer need to take the lock
at all in three functions.
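A minimal sketch of the change, using simplified stand-ins for the real types:

```rust
use std::sync::{Mutex, OnceLock};

// Stand-in for the real Lsn type.
#[derive(Clone, Copy, Debug)]
struct Lsn(u64);

struct InMemoryLayerInner {
    // ... fields that still need the lock ...
}

struct InMemoryLayer {
    inner: Mutex<InMemoryLayerInner>,
    /// Set exactly once when the layer is frozen; readable without taking `inner`.
    end_lsn: OnceLock<Lsn>,
}

impl InMemoryLayer {
    /// Previously this had to lock `inner` to read an `Option<Lsn>`.
    fn get_end_lsn(&self) -> Option<Lsn> {
        self.end_lsn.get().copied()
    }

    fn freeze(&self, end_lsn: Lsn) {
        // Panics if the layer is frozen twice.
        self.end_lsn
            .set(end_lsn)
            .expect("InMemoryLayer frozen twice");
    }
}
```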
Part of #4743 . Suggested in
https://github.com/neondatabase/neon/pull/4905#issuecomment-1666458428 .
Found this log on staging:
```
2023-08-10T17:42:58.573790Z INFO handling interactive connection from client protocol="ws"
```
We seem to be losing the websocket span when spawning the connection handler; this patch fixes it.
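The likely shape of the fix, assuming `tokio` and `tracing` (handler and span names here are illustrative):

```rust
use tracing::{info_span, Instrument};

async fn handle_ws_client() {
    tracing::info!("handling interactive connection from client");
}

fn spawn_ws_handler() {
    let span = info_span!("ws_connection", protocol = "ws");
    // Without `.instrument(span)`, the spawned future runs outside the span
    // that was current at spawn time, and its logs lose the websocket context.
    tokio::spawn(handle_ws_client().instrument(span));
}
```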
## Problem
The 1MB response limit is very small.
## Summary of changes
This data is not yet tracked, so we shouldn't raise the limit too high yet.
But as discussed with @kelvich and @conradludgate, this PR lifts it to
10MB and also adds details of the limit to the error response.
`pg_regress` is flaky: https://github.com/neondatabase/neon/issues/559
Consolidate the `CHECKPOINT` into `check_restored_datadir_content`, and add a
wait for `wait_for_last_flush_lsn`.
Some recently introduced flakiness was fixed with #4948.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
The `DiskBtreeReader::visit` function calls `read_blk` internally, and
while #4863 converted the API of `visit` to async, the internal function
is still recursive. So, analogously to #4838, we turn the recursive
function into an iterative one.
## Summary of changes
First, we prepare the change by moving the for loop outside of the case
switch, so that there is only one place that recurses. Then, we switch
from recursion to an approach where we store the search path through the
tree on a stack allocated on the heap.
The caller of the `visit` function can control when the search over the
B-tree ends by returning `false` from the closure. This is used either
to find one specific entry (by always returning `false`), to iterate
over all entries of the B-tree (by always returning `true`), or to look
for ranges (mostly in tests, but `get_value_reconstruct_data` also has
such a use).
Each stack entry contains two things: the block number (aka the block's
offset) and a children iterator. The children iterator is constructed
depending on the search direction and on the result of a binary search
over the node's children list. It is the only thing that survives a
spill/push to the stack; everything else is reconstructed. In other
words, each stack spill will, if the search is still ongoing, cause a
full re-parsing of the node. Theoretically, this would be a linear
overhead in the number of leaves the search visits. However, one needs
to note:
* workloads that look for a specific entry only ever visit one leaf, so
this is mostly about workloads that visit larger ranges, including ones
that visit the entire B-tree.
* the requests first hit the page cache, so often the cost is just in
terms of node deserialization.
* for nodes that only have leaf nodes as children, no spilling to the
stack-on-heap happens (outside of the initial request where the iterator
is `None`). In other words, for balanced trees, the spilling overhead is
$\Theta\left(\frac{n}{b^2}\right)$, where `b` is the branching factor
and `n` is the number of nodes in the tree. The B-trees in the current
implementation have a branching factor of roughly `PAGE_SZ/L`, where
`PAGE_SZ` is 8192 and `L` is `DELTA_KEY_SIZE = 26` or `KEY_SIZE = 18`
in production code, so this gives us an estimate that we'd be re-loading
an inner node for every ~99000 leaves in the B-tree in the worst case.
Due to the points above, I'd say that not fully caching the inner
nodes with inner children is reasonable, especially as we also want to
be fast for the "find one specific entry" workloads, where the stack
content is never accessed: anything that makes the spilling
computationally more complex would just waste cycles there, even though
these workloads "only" spill one node for each depth level of the
B-tree (which is practically always a low single-digit number:
Kleppmann points out on page 81 that with a branching factor of 500, a
four-level B-tree with 4 KB pages can store 250 TB of data).
Disclaimer: this is all reasoning I did in my head; I have not
confirmed it with any benchmarks or data.
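To make the traversal idea concrete, here is a heavily simplified sketch of the recursion-to-explicit-stack rewrite with in-memory nodes; the real `DiskBtreeReader` parses on-disk nodes via `read_blk` and builds its children iterator from a binary search and the search direction, but the stack mechanics are the same in spirit:

```rust
// Simplified stand-in for a parsed B-tree node.
enum Node {
    Inner { children: Vec<u32> },          // block numbers of children
    Leaf { entries: Vec<(Vec<u8>, u64)> }, // (key, value)
}

fn visit<F>(nodes: &[Node], root: u32, mut visitor: F)
where
    F: FnMut(&[u8], u64) -> bool, // return false to stop the search
{
    // Each stack entry: (block number, position within that node's children).
    // `None` means "not visited yet"; on resume the node is re-parsed and the
    // saved position tells us where to continue.
    let mut stack: Vec<(u32, Option<usize>)> = vec![(root, None)];
    while let Some((blknum, pos)) = stack.pop() {
        match &nodes[blknum as usize] {
            Node::Leaf { entries } => {
                for (key, value) in entries {
                    if !visitor(key, *value) {
                        return;
                    }
                }
            }
            Node::Inner { children } => {
                let next = pos.unwrap_or(0);
                if let Some(&child) = children.get(next) {
                    // Remember where to resume in this node, then descend.
                    stack.push((blknum, Some(next + 1)));
                    stack.push((child, None));
                }
            }
        }
    }
}
```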
Builds on top of #4863, part of #4743
The test mutates a shared directory, which does not work with multiple
concurrent tests. It is being fixed, so this should be a very temporary
band-aid.
Cc: #4949.
## Problem
For some tests, we override the default timeout (300s / 5m) with larger
values like 600s / 10m or even 1800s / 30m, even when it's not required.
I've collected some statistics on test durations (for the last 60 days):
| test | max (s) | p99 (s) | p50 (s) | count |
|-----------------------------------|---------|---------|---------|-------|
| test_hot_standby | 9 | 2 | 2 | 5319 |
| test_import_from_vanilla | 16 | 9 | 6 | 5692 |
| test_import_from_pageserver_small | 37 | 7 | 5 | 5719 |
| test_pg_regress | 101 | 73 | 44 | 5642 |
| test_isolation | 65 | 56 | 39 | 5692 |
A couple of tests were left with a custom 600s / 10m timeout:
| test | max (s) | p99 (s) | p50 (s) | count |
|-----------------------------------|---------|---------|---------|-------|
| test_gc_cutoff | 456 | 224 | 109 | 5694 |
| test_pageserver_chaos | 528 | 267 | 121 | 5712 |
## Summary of changes
- Remove `@pytest.mark.timeout` annotation from several tests
Don't panic if a library or extension is not found in remote extension storage
or the download has failed. Instead, log the error and proceed: if the file is
not present locally either, Postgres will fail with a Postgres error. If it is a
`shared_preload_library`, Postgres won't start because of the bad config. Otherwise,
it will just fail to run the SQL function/command that needs the library.
Also, don't try to download extensions if remote storage is not configured.
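Roughly the intended control flow (function names here are hypothetical, for illustration only):

```rust
// Hypothetical names for illustration; the real code lives in compute_ctl.
async fn download_extension_file(name: &str) -> anyhow::Result<()> {
    // ... fetch from remote extension storage ...
    Ok(())
}

async fn ensure_extension(name: &str, remote_storage_configured: bool) {
    if !remote_storage_configured {
        // No remote storage: don't even attempt a download.
        return;
    }
    if let Err(e) = download_extension_file(name).await {
        // Previously this was a panic; now we log and proceed. If the file is
        // also missing locally, Postgres itself reports the error later.
        tracing::warn!("failed to download extension {name}: {e:#}");
    }
}
```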
In the quest to solve #4745 by moving download/evictedness to be an
internally mutable aspect of a Layer, and by getting rid of `trait
PersistentLayer` and `layer_removal_cs` at least for prod usage, we
present some misc cleanups.
---------
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
## Problem
One might wonder why the code here doesn't use `TimelineId` or
`TenantId`. I originally had a refactor to use them, but then discarded
it, because converting to strings every time there is a read or write
is wasteful.
## Summary of changes
We add some docs explaining why no `TimelineId` or `TenantId` is
used here.
## Problem
The current test history format is a bit inconvenient:
- It stores all test results in one row, so every query has to include
subqueries that expand the tests
- It includes duplicated test results if a rerun is triggered manually
for one of the test jobs (for example, if we rerun `debug-pg14`, the
report will include duplicates for the other build types/postgres
versions)
- It doesn't have a reference to run_id, which we use to create a link
to the allure report
Here's the proposed new format:
```
id BIGSERIAL PRIMARY KEY,
parent_suite TEXT NOT NULL,
suite TEXT NOT NULL,
name TEXT NOT NULL,
status TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL,
stopped_at TIMESTAMPTZ NOT NULL,
duration INT NOT NULL,
flaky BOOLEAN NOT NULL,
build_type TEXT NOT NULL,
pg_version INT NOT NULL,
run_id BIGINT NOT NULL,
run_attempt INT NOT NULL,
reference TEXT NOT NULL,
revision CHAR(40) NOT NULL,
raw JSONB COMPRESSION lz4 NOT NULL,
```
## Summary of changes
- Misc allure changes:
  - Update allure to 2.23.1
  - Delete files from previous runs in the HTML report (by using `sync
--delete` instead of `mv`)
  - Use `test-cases/*.json` instead of `suites.json`; this directory
lets us catch all reruns.
- Until we have migrated `scripts/flaky_tests.py` and
`scripts/benchmark_durations.py`, store test results in 2 formats (in 2
different databases).
## Problem
When an endpoint is shutting down, it can take a few seconds. Currently,
when starting a new compute, this causes an "endpoint is in transition"
error. We need to add delays before retrying to ensure that we allow
time for the endpoint to shut down properly.
## Summary of changes
Adds a delay before retrying in auth; `connect_to_compute` already has
this delay.
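A sketch of the retry-with-delay shape, assuming `tokio` (the constants and helper name are illustrative, not the proxy's actual tuning):

```rust
use std::time::Duration;

// Illustrative constants; not the proxy's actual values.
const NUM_RETRIES: u32 = 5;
const BASE_DELAY: Duration = Duration::from_millis(100);

async fn authenticate_with_retry<F, Fut, T, E>(mut attempt: F) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut last_err = None;
    for i in 0..NUM_RETRIES {
        match attempt().await {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e),
        }
        if i + 1 < NUM_RETRIES {
            // Give the endpoint time to finish shutting down before retrying,
            // instead of immediately hitting "endpoint is in transition" again.
            tokio::time::sleep(BASE_DELAY * 2u32.pow(i)).await;
        }
    }
    Err(last_err.expect("at least one attempt was made"))
}
```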
## Problem
The safekeeper advertises the same address specified in `--listen-pg`,
which is problematic when the listening address is different from the
address that the pageserver can use to connect to the safekeeper.
## Summary of changes
Add a new optional flag called `--advertise-pg` for the address to be
advertised. If this flag is not specified, the behavior is the same as
before.
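A sketch of the fallback logic, assuming a `clap`-derive CLI (struct names and default values are illustrative):

```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Address to listen on for pageserver/walproposer connections.
    #[arg(long, default_value = "127.0.0.1:5454")]
    listen_pg: String,

    /// Address to advertise to other nodes; defaults to --listen-pg.
    #[arg(long)]
    advertise_pg: Option<String>,
}

fn main() {
    let args = Args::parse();
    // Fall back to the listen address if no advertise address was given,
    // preserving the old behavior.
    let advertised = args.advertise_pg.as_deref().unwrap_or(&args.listen_pg);
    println!("advertising {advertised}");
}
```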
## Problem
Currently, knowing how long pageserver startup took requires inspecting
the logs.
## Summary of changes
A `pageserver_startup_duration_ms` metric is added, with a `phase` label for
the different phases of startup. The phases correspond to the existing
wait points in the code:
- Start of doing I/O
- When tenant load is done
- When initial size calculation is done
- When background jobs start
- "complete", when everything is done.
`pageserver_startup_is_loading` is a 0/1 gauge that indicates whether we are in the initial load of tenants.
`pageserver_tenant_activation_seconds` is a histogram of time in seconds taken to activate a tenant.
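As a rough sketch of how the duration metric could be recorded with the `prometheus` crate (the phase names, helper, and gauge-per-phase shape are assumptions, not the exact pageserver code):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_gauge_vec, GaugeVec};

// One sample per phase, recorded when that phase completes.
static STARTUP_DURATION_MS: Lazy<GaugeVec> = Lazy::new(|| {
    register_gauge_vec!(
        "pageserver_startup_duration_ms",
        "Time from process start until the given startup phase completed",
        &["phase"]
    )
    .expect("failed to register metric")
});

fn record_phase(started_at: std::time::Instant, phase: &str) {
    STARTUP_DURATION_MS
        .with_label_values(&[phase])
        .set(started_at.elapsed().as_millis() as f64);
}
```

Callers would invoke something like `record_phase(start, "initial_tenant_load")` at each wait point listed above (phase label values illustrative).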
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
The pageserver<->safekeeper protocol uses error messages to indicate end
of stream. pageserver already logs these at INFO level, but the inner
error message includes the word "ERROR", which interferes with log
searching.
Example:
```
walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8
```
The inner `DbError` has a severity of ERROR, so `DbError`'s `Display`
implementation includes that ERROR even though we are actually
logging the error at INFO level.
## Summary of changes
Introduce an explicit `WalReceiverError` type and, in its `From<>` impl
for postgres errors, apply the logic from `ExpectedError` for
expected errors, plus a new condition for successes.
The new output looks like:
```
walreceiver connection handling ended: Successful completion: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes
```
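For illustration, a sketch of what such a type and its `From` impl could look like (variant names and the exact match conditions are assumptions; the real code inspects the safekeeper's `DbError` more precisely):

```rust
use tokio_postgres::error::SqlState;

/// Classifies errors from the safekeeper connection so callers can log them
/// at an appropriate level without the misleading "ERROR" prefix.
enum WalReceiverError {
    /// Graceful end of streaming ("ending streaming to ...").
    SuccessfulCompletion(String),
    /// Errors we expect during normal operation (shutdowns, reconnects).
    ExpectedSafekeeperError(tokio_postgres::Error),
    /// Everything else.
    Other(anyhow::Error),
}

impl From<tokio_postgres::Error> for WalReceiverError {
    fn from(err: tokio_postgres::Error) -> Self {
        // Pull out an owned (code, message) pair if this is a DbError.
        let classification = err
            .as_db_error()
            .map(|db| (db.code().clone(), db.message().to_string()));
        match classification {
            // Illustrative conditions: the real code matches the specific
            // codes/messages the safekeeper sends when it ends the stream.
            Some((code, msg)) if code == SqlState::SUCCESSFUL_COMPLETION => {
                WalReceiverError::SuccessfulCompletion(msg)
            }
            Some((_, msg)) if msg.contains("shutting down") => {
                WalReceiverError::ExpectedSafekeeperError(err)
            }
            _ => WalReceiverError::Other(err.into()),
        }
    }
}
```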
## Problem
I spent a few minutes seeing how fast I could get our regression test
suite to run on my workstation, for when I want to run a "did I break
anything?" smoke test before pushing to CI.
- Test runtime was dominated by a couple of tests that run for longer
than all the others take together
- Test concurrency was limited to <16 by the ports-per-worker setting
There's no "right answer" for how long a test should
be, but as a rule of thumb, no one test should run
for much longer than the time it takes to run all the
other tests together.
## Summary of changes
- Make the ports per worker setting dynamic depending on worker count
- Modify the longest running tests to run for a shorter time
(`test_duplicate_layers` which uses a pgbench runtime) or fewer
iterations (`test_restarts_frequent_checkpoints`).
## Problem
PR #4839 didn't output the keys/values in LSN order; for a given
key, the LSNs were kept in incoming file order.
I think ordering by LSN is what's expected.
## Summary of changes
We now also sort by `(key, lsn)`, like we did before #4839.
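The fix boils down to sorting by the composite key, e.g. (with stand-in types):

```rust
// Illustrative stand-ins for the real Key/Lsn types.
type Key = u64;
type Lsn = u64;

fn sort_deltas(all_keys: &mut Vec<(Key, Lsn)>) {
    // Sorting by key alone left the LSNs for a given key in incoming-file
    // order; sorting by the (key, lsn) tuple restores the pre-#4839 order.
    all_keys.sort_by_key(|&(key, lsn)| (key, lsn));
}
```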
## Problem
Running `pagectl draw-timeline` on a pageserver directory wasn't working
out of the box because it trips up on the `metadata` file.
## Summary of changes
Just ignore the `metadata` file in the list of input files passed to
`draw-timeline`.
Cache-change log messages are now at DEBUG2 level.
Logs that indicate a disabled cache now explicitly call out that the file cache is disabled, at WARNING level instead of LOG/INFO.
We currently cannot drop a tenant before removing its directory, nor use
`Tenant::drop` for this. This creates unnecessary or unactionable warnings,
at least during detach. Silence the most typical one (file not found) and
log the remaining ones at `error!`.
Cc: #2442
We don't know how our s3 remote_storage is performing, or if it's
blocking the shutdown. Well, for sampling reasons, we will not really
know even after this PR.
Add metrics:
- align remote_storage metrics towards #4813 goals
- histogram
`remote_storage_s3_request_seconds{request_type=(get_object|put_object|delete_object|list_objects),
result=(ok|err|cancelled)}`
- histogram `remote_storage_s3_wait_seconds{request_type=(same kinds)}`
- counter `remote_storage_s3_cancelled_waits_total{request_type=(same
kinds)}`
Follow-up work:
- After release, remove the old metrics, migrate dashboards
Histogram buckets are rough guesses and need to be tuned. In the pageserver we
have a download timeout of 120s, so I think the 100s bucket is quite
nice.
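A sketch of what registering the request histogram looks like with the `prometheus` crate (bucket values here are placeholders, not the ones chosen in this PR):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

static S3_REQUEST_SECONDS: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "remote_storage_s3_request_seconds",
        "Time spent on S3 requests, by request type and outcome",
        &["request_type", "result"],
        // Placeholder buckets; as noted above, these are rough guesses that
        // need tuning, with ~100s being useful given the 120s download timeout.
        vec![0.01, 0.1, 1.0, 10.0, 100.0]
    )
    .expect("failed to register metric")
});

async fn timed_get_object() {
    let started = std::time::Instant::now();
    let result: Result<(), ()> = Ok(()); // stand-in for the actual S3 call
    let label = match result {
        Ok(_) => "ok",
        Err(_) => "err",
    };
    S3_REQUEST_SECONDS
        .with_label_values(&["get_object", label])
        .observe(started.elapsed().as_secs_f64());
}
```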
## Problem
The current output from a prod binary at startup is:
```
git-env:765455bca22700e49c053d47f44f58a6df7c321f failpoints: true, features: [] launch_timestamp: 2023-08-02 10:30:35.545217477 UTC
```
It's confusing to read that line, then read the code and think "if
failpoints is true, but not in the features list, what does that mean?".
As far as I can tell, the check of `fail/failpoints` is just always
false because cargo doesn't expose features across crates like this: the
`fail/failpoints` syntax works in the cargo CLI but not from a macro in
some crate other than `fail`.
## Summary of changes
Remove the lines that try to check `fail/failpoints` from the pageserver
entrypoint module. This has no functional impact but makes the code
slightly easier to understand when trying to make sense of the line
printed on startup.
## Problem
Pre-requisites for #4852 and #4853
## Summary of changes
1. Include the client's IP address (which we already log) in the span
info so we have it on all associated logs. This makes it easier to build
dashboards based on IP addresses.
2. Switch to a consistent error/warning log for errors during
connection. This includes the error, num_retries, retriable=true/false, and a
consistent log message that we can grep for.
## Problem
The functions `DeltaLayer::load_inner` and `ImageLayer::load_inner` are
calling `read_blk` internally, which we would like to turn into an async
fn.
## Summary of changes
We switch from `once_cell`'s `OnceCell` implementation to the one in
`tokio` in order to be able to call an async `get_or_try_init` function.
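The key difference, sketched with a simplified stand-in for the real layer type:

```rust
use tokio::sync::OnceCell;

struct DeltaLayerLike {
    // once_cell::sync::OnceCell only offers blocking initialization;
    // tokio's OnceCell accepts an async initializer.
    inner: OnceCell<String>,
}

impl DeltaLayerLike {
    async fn load(&self) -> anyhow::Result<&String> {
        self.inner
            .get_or_try_init(|| async {
                // In the real code this is load_inner(), which calls read_blk().
                Ok::<_, anyhow::Error>("loaded contents".to_string())
            })
            .await
    }
}
```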
Builds on top of #4839, part of #4743
During the deploys of 2023-08-03 we logged too much on shutdown. Fix the
logging by timing each top-level shutdown step and possibly warning when it
takes more than a rough threshold, based on how long I think it
should be taking. Also remove all shutdown logging from
background tasks, since there is already "shutdown is taking a long time"
logging.
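Roughly the shape of the per-step timing (the helper name, threshold, and use of `tracing` are illustrative):

```rust
use std::time::{Duration, Instant};

/// Run one top-level shutdown step and warn if it exceeds a rough threshold.
async fn timed_shutdown_step<F: std::future::Future>(
    name: &str,
    threshold: Duration,
    fut: F,
) -> F::Output {
    let started = Instant::now();
    let out = fut.await;
    let elapsed = started.elapsed();
    if elapsed > threshold {
        tracing::warn!("shutdown step {name} took {elapsed:?}, expected under {threshold:?}");
    }
    out
}
```

Each top-level step would then be wrapped along the lines of `timed_shutdown_step("freeze_and_flush", Duration::from_secs(5), shutdown_tenants()).await` (names and threshold hypothetical).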
Co-authored-by: John Spray <john@neon.tech>
## Problem
`DiskBtreeReader::get` and `DiskBtreeReader::visit` both call `read_blk`
internally, which we would like to make async in the future. This PR
focuses on making the interface of these two functions `async`. There is
further work to be done in the form of making `visit` non-recursive,
similar to #4838. For that, see
https://github.com/neondatabase/neon/pull/4884.
Builds on top of https://github.com/neondatabase/neon/pull/4839, part of
https://github.com/neondatabase/neon/issues/4743
## Summary of changes
Make `DiskBtreeReader::get` and `DiskBtreeReader::visit` async functions
and `await` in the places that call these functions.
## Problem
When setting up for the first time, I hit a couple of nits running the tests:
- It wasn't obvious that `openssl` and `poetry` were needed (poetry is
mentioned kind of obliquely via "dependency installation notes" rather
than being in the list of rpm/deb packages to install).
- It wasn't obvious how to get the tests to run for just particular
parameters (e.g. just release mode).
## Summary of changes
Add openssl and poetry to the package lists.
Add an example of how to run pytest for just a particular build type and
postgres version.
## Problem
neon_fixtures.py has grown to an unmanageable size, and it attracts conflicts.
When adding specific utils under, for example, `fixtures/pageserver`,
things sometimes need to import stuff from `neon_fixtures.py`, which
creates a circular import. This is usually only needed for type
annotations, so the `typing.TYPE_CHECKING` flag can mask the issue.
Nevertheless, I believe that splitting neon_fixtures.py into smaller
parts is a better approach.
Currently the PR contains small things, but I plan to continue and move
NeonEnv to its own `fixtures.env` module. To keep the diff small, I think
this PR can already be merged, to cause fewer conflicts.
UPD: it looks like it's currently not really possible to fully avoid
the usage of `typing.TYPE_CHECKING`, because some components directly depend
on each other, e.g. the Env -> Cli -> Env cycle. But it's still worth
avoiding it in as many places as possible, and decreasing neon_fixtures.py's
size still makes sense.
## Problem
If AWS credentials are not set locally (via the
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars), the
`test_remote_library[release-pg15-mock_s3]` test fails with the
following error:
```
ERROR could not start the compute node: Failed to download a remote file: Failed to download S3 object: failed to construct request
```
## Summary of changes
- set AWS credentials for endpoints programmatically
## Problem
The k-merge in pageserver compaction currently relies on iterators over
the keys and also over the values. This approach does not support async
code because we are using iterators, and those don't support async in
general. Also, the k-merge implementation we use doesn't support async
either. Instead, as we already load all the keys into memory, just do
the sorting in memory.
## Summary of changes
The PR can be read commit-by-commit, but most importantly, it:
* Stops using kmerge in compaction, using slice sorting instead.
* Makes `load_keys` and `load_val_refs` async, using `Handle::block_on`
in the compaction code as we don't want to turn the compaction function,
called inside `spawn_blocking`, into an async fn.
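A minimal sketch of that pattern (type and function names are stand-ins, not the real compaction code):

```rust
use tokio::runtime::Handle;

// Stand-ins for the real key/LSN types.
type Key = u64;
type Lsn = u64;

async fn load_keys() -> Vec<(Key, Lsn)> {
    // In the real code this is async because it ends up calling read_blk().
    vec![(2, 10), (1, 20), (1, 10)]
}

fn compact_blocking(handle: Handle) -> Vec<(Key, Lsn)> {
    // The compaction function runs inside spawn_blocking, so instead of
    // turning it into an async fn we block on the async loaders here.
    let mut all_keys = handle.block_on(load_keys());
    // All keys are already in memory, so a plain slice sort replaces k-merge.
    all_keys.sort_by_key(|&(key, lsn)| (key, lsn));
    all_keys
}

async fn compact() -> Vec<(Key, Lsn)> {
    let handle = Handle::current();
    tokio::task::spawn_blocking(move || compact_blocking(handle))
        .await
        .expect("compaction task panicked")
}
```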
Builds on top of #4836, part of
https://github.com/neondatabase/neon/issues/4743