This is a cherry-picked-then-improved version of release branch commit
4204960942 (PR #4861).
The commit
commit 5f8fd640bf
Author: Alek Westover <alek.westover@gmail.com>
Date: Wed Jul 26 08:24:03 2023 -0400
Upload Test Remote Extensions (#4792)
switched to using the release tag instead of `latest`, but the
`promote-images` job only uploads `latest` to the prod ECR.
The switch to using the release tag was good in principle, but
it broke the release pipeline. So, switch the release pipeline
back to using `latest`.
Note that a proper fix should abandon the use of the `:latest` tag
entirely: currently, if a `main` pipeline runs concurrently
with a `release` pipeline, the `release` pipeline may end
up using the `main` pipeline's images.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Accidentally setting is_incremental=true for ImageLayers costs a lot of
debugging time. Removes all APIs that would allow doing that; they can
easily be restored later *when needed*.
Split off from #4938.
When doing global queries in VictoriaMetrics, the per-timeline
histograms make us run into cardinality limits.
We don't want to give them up just yet because we don't
have an alternative for drilling down on timeline-specific
performance issues.
So, add a pre-aggregated histogram and add observations to it
whenever we add observations to the per-timeline histogram.
While we're at it, switch to using a `strum`-derived enum for the operation
type names.
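As a rough sketch (names and wiring assumed, not the actual pageserver metrics code), each observation now lands in both histograms:

```
use strum_macros::IntoStaticStr;

// hypothetical operation enum; `IntoStaticStr` + kebab-case gives the label
// value, e.g. <&'static str>::from(StorageTimeOperation::Compact) == "compact"
#[derive(Clone, Copy, IntoStaticStr)]
#[strum(serialize_all = "kebab-case")]
enum StorageTimeOperation {
    LayerFlush,
    Compact,
    Gc,
}

struct StorageTimeMetrics {
    /// labelled by operation + tenant + timeline; hits cardinality limits in global queries
    timeline_histo: prometheus::Histogram,
    /// labelled by operation only; safe for global VictoriaMetrics queries
    global_histo: prometheus::Histogram,
}

impl StorageTimeMetrics {
    fn observe(&self, seconds: f64) {
        // every per-timeline observation is mirrored into the pre-aggregated one
        self.timeline_histo.observe(seconds);
        self.global_histo.observe(seconds);
    }
}
```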
#4938 will make on-demand download of layers during compaction possible, which
conflicts with our "policy" of no `spawn_blocking(|| ...
Handle::block_on(async { spawn_blocking(...).await }))`, because this
nesting poses a clear deadlock risk. The nested `spawn_blocking`s come from the
download using `tokio::fs::File`.
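For illustration, a minimal sketch of the anti-pattern (not the actual compaction code):

```
use tokio::runtime::Handle;

// A blocking-pool thread parks itself in `block_on` while the future it
// drives needs yet another blocking-pool thread. Under enough concurrent
// compactions, all blocking threads can end up waiting like this.
async fn compact_antipattern(handle: Handle) {
    tokio::task::spawn_blocking(move || {
        // this blocking thread now just waits on async work...
        handle.block_on(async {
            // ...which in turn needs another blocking thread,
            // e.g. for an on-demand layer download via `tokio::fs::File`
            tokio::task::spawn_blocking(|| { /* synchronous I/O */ })
                .await
                .unwrap();
        });
    })
    .await
    .unwrap();
}
```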
- Remove `spawn_blocking` from caller of `compact_level0_phase1`
- Remove `Handle::block_on` from `compact_level0_phase1` (indentation
change)
- Revert to `AsLayerDesc::layer_desc` usage temporarily (until it
becomes field access in #4938)
## Problem
Many duplicate Windows crates pollute the `cargo deny` output.
## Summary of changes
We don't build those crates, so remove those targets from being checked.
## Problem
The `cargo deny` lint is broken.
Links to the CVEs:
[rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052)
[rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053)
One is fixed; the other one isn't, so we allow it (for now) to unbreak
CI. Later we'll try to get rid of webpki in favour of the rustls
fork.
## Summary of changes
```
+ignore = ["RUSTSEC-2023-0052"]
```
## Problem
There are some common types that we pass into tenants and timelines as
we construct them, such as remote storage and the broker client.
Currently the list is small, but this is likely to grow -- the deletion
queue PR (#4960) pushed some methods to the point of clippy complaining
they had too many args, because of the extra deletion queue client being
passed around.
There are some shared objects that currently aren't passed around
explicitly because they use a static `once_cell` (e.g.
CONCURRENT_COMPACTIONS), but as we add more resource management and
concurrency control over time, it will be more readable & testable to
pass a type around in the respective Resources object, rather than to
coordinate via static objects. The `Resources` structures in this PR
will make it easier to add references to central coordination functions,
without having to rely on statics.
## Summary of changes
- For `Tenant`, the `broker_client` and `remote_storage` are bundled
into `TenantSharedResources`
- For `Timeline`, the `remote_client` is wrapped into
`TimelineResources`.
Both of these structures will get an additional deletion queue member in
#4960.
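A rough sketch of the shape (field types are placeholders, not the exact definitions in this PR):

```
use std::sync::Arc;

// placeholder stand-ins for the real types
struct BrokerClientChannel;
struct GenericRemoteStorage;
struct RemoteTimelineClient;

/// Handed to `Tenant` at construction; new members (e.g. a deletion queue
/// client in #4960) can be added without touching every caller's signature.
#[derive(Clone)]
struct TenantSharedResources {
    broker_client: Arc<BrokerClientChannel>,
    remote_storage: Option<Arc<GenericRemoteStorage>>,
}

/// Handed to `Timeline` at construction for the same reason.
struct TimelineResources {
    remote_client: Option<Arc<RemoteTimelineClient>>,
}

fn spawn_tenant(resources: TenantSharedResources) {
    // one bundle instead of N separate arguments
    let _ = resources;
}
```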
## Problem
IndexPart contains two redundant lists of layer names: a set of the
names, and then a map of name to metadata.
We already required that all the layers in `timeline_layers` are also in
`layers_metadata`, in `initialize_with_current_remote_index_part`, so if
there were any index_part.json files in the field that relied on these
sets being different, they would already be broken.
## Summary of changes
`timeline_layers` is made private and no longer read at runtime. It is
still serialized, but not deserialized.
`disk_consistent_lsn` is also made private, as this field only exists
for convenience of humans reading the serialized JSON.
This prepares us to entirely remove `timeline_layers` in a future
release, once this change is fully deployed, and therefore no
pageservers are trying to read the field.
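For illustration, a simplified serde sketch of "serialized but not deserialized" (the real `IndexPart` has more fields and uses layer-name/metadata types rather than plain strings):

```
use serde::{Deserialize, Serialize};
use std::collections::{HashMap, HashSet};

#[derive(Serialize, Deserialize)]
struct IndexPart {
    /// still written so older pageservers can read it, but never consulted at
    /// runtime; on deserialization it defaults to empty and is ignored
    #[serde(skip_deserializing)]
    timeline_layers: HashSet<String>,
    /// the authoritative list: every layer name maps to its metadata
    layers_metadata: HashMap<String, LayerMetadata>,
}

#[derive(Serialize, Deserialize)]
struct LayerMetadata {
    file_size: u64,
}
```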
I made a mistake when adding `env.initial_timeline:
Optional[TimelineId]` in #3839; I should have just generated it and
used it to create a specific timeline. This PR fixes those mistakes, and
removes some extra calls into psql, which must be slower than Python field
access.
I'm still a bit nervous about the attach -> crash case, but it should work
(unlike the case with timelines). Ideally it would be covered with a
test.
This continues the tradition of adding bool flags to Tenant::set_stopping.
The lifecycle project will probably help with fixing that.
## Problem
Before, DeltaLayer dumping (via `cargo run --release -p pagectl --
print-layer-file`) would crash, as one can't call `Handle::block_on` in
an async executor thread.
## Summary of changes
Avoid the problem by using `DeltaLayerInner::load_keys` to load the keys
into RAM (which we already do during compaction), and then load the
values one by one during dumping.
- move them to pageserver, which is the only crate depending on `fail`
- "move" the exported macro to the new module
- support the same failpoints at init time as at runtime
Found while debugging test failures and making tests more repeatable by
allowing "exit" from pageserver start via environment variables. Made
those changes to `test_gc_cutoff.py`.
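A hedged sketch of the init-time path (the exact wiring in the pageserver differs; the action names here are standard fail-rs ones):

```
/// apply a ";"-separated "name=action" failpoint spec before startup,
/// mirroring what the runtime failpoint API accepts
fn apply_failpoints(spec: &str) {
    for cfg in spec.split(';').filter(|s| !s.is_empty()) {
        let (name, action) = cfg
            .split_once('=')
            .expect("failpoint spec must look like name=action");
        // standard fail-rs actions include "off", "return", "panic", "sleep(1000)"
        fail::cfg(name, action).expect("invalid failpoint action");
    }
}
```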
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
A very slight change that allows us to configure the UID of the
neon-postgres cgroup owner. We start postgres in this cgroup so we can
scale it with the cgroups v2 api. Currently, the control plane
overwrites the entrypoint set by `vm-builder`, so `compute_ctl` (and
thus postgres), is not started in the neon-postgres cgroup. Having
`compute_ctl` start postgres in the cgroup should fix this. However, at
the moment it appears that it does not have the correct permissions.
Configuring the neon-postgres UID to `postgres` (which is the UID
`compute_ctl` runs under) should hopefully fix this.
See #4920 - the PR to modify `compute_ctl` to start postgres in the
cgroup.
See: neondatabase/autoscaling#480, neondatabase/autoscaling#477. Both of
these PRs are part of an effort to increase `vm-builder`'s
configurability and allow us to adjust it as we integrate with the
monitor.
(This PR is the successor of https://github.com/neondatabase/neon/pull/4984 )
## Summary
The current way in which `EphemeralFile` uses `PageCache` complicates
the Pageserver code base to a degree that isn't worth it.
This PR refactors how we cache `EphemeralFile` contents, by exploiting
the append-only nature of `EphemeralFile`.
The result is that `PageCache` only holds `ImmutableFilePage` and
`MaterializedPage`.
These types of pages are read-only and evictable without write-back.
This allows us to remove the writeback code from `PageCache`, also
eliminating an entire failure mode.
Further, many great open-source libraries exist that solve the problem of a
read-only cache much better than our `page_cache.rs` (e.g., better
replacement policy, less global locking).
With this PR, we can now explore using them.
## Problem & Analysis
Before this PR, `PageCache` had three types of pages:
* `ImmutableFilePage`: caches Delta / Image layer file contents
* `MaterializedPage`: caches results of Timeline::get (page
materialization)
* `EphemeralPage`: caches `EphemeralFile` contents
`EphemeralPage` is quite different from `ImmutableFilePage` and
`MaterializedPage`:
* Immutable and materialized pages are for the acceleration of (future)
reads of the same data using `PAGE_CACHE_SIZE * PAGE_SIZE` bytes of
DRAM.
* Ephemeral pages are a write-back cache of `EphemeralFile` contents,
i.e., if there is pressure in the page cache, we spill `EphemeralFile`
contents to disk.
`EphemeralFile` is only used by `InMemoryLayer`, for the following
purposes:
* **write**: when filling up the `InMemoryLayer`, via `impl BlobWriter
for EphemeralFile`
* **read**: when doing **page reconstruction** for a page@lsn that isn't
written to disk
* **read**: when writing L0 layer files, we re-read the `InMemoryLayer`
and put the contents into the L0 delta writer
(**`create_delta_layer`**). This happens every 10min or when
InMemoryLayer reaches 256MB in size.
The access patterns of the `InMemoryLayer` use case are as follows:
* **write**: via `BlobWriter`, strictly append-only
* **read for page reconstruction**: via `BlobReader`, random
* **read for `create_delta_layer`**: via `BlobReader`, dependent on
data, but generally random. Why?
* in classical LSM terms, this function is what writes the
memory-resident `C0` tree into the disk-resident `C1` tree
* in our system, though, the values of InMemoryLayer are stored in an
EphemeralFile, and hence they are not guaranteed to be memory-resident
* the function reads `Value`s in `Key, LSN` order, which is `!=` insert
order
What do these `EphemeralFile`-level access patterns mean for the page
cache?
* **write**:
* the common case is that `Value` is a WAL record, and if it isn't a
full-page-image WAL record, then it's smaller than `PAGE_SIZE`
* So, the `EphemeralPage` pages act as a buffer for these `< PAGE_SIZE`
sized writes.
* If there's no page cache eviction between subsequent
`InMemoryLayer::put_value` calls, the `EphemeralPage` is still resident,
so the page cache avoids doing a `write` system call.
* In practice, a busy page server will have page cache evictions because
we only configure 64MB of page cache size.
* **reads for page reconstruction**: read acceleration, just as for the
other page types.
* **reads for `create_delta_layer`**:
* The `Value` reads happen through a `BlockCursor`, which optimizes the
case of repeated reads from the same page.
* So, the best case is that subsequent values are located on the same
page; hence the `BlockCursor`'s buffer is maximally effective.
* The worst case is that each `Value` is on a different page; hence the
`BlockCursor`'s 1-page-sized buffer is ineffective.
* The best case translates into `256MB/PAGE_SIZE` page cache accesses,
one per page.
* the worst case translates into `#Values` page cache accesses
* again, the page cache accesses must be assumed to be random because
the `Value`s aren't accessed in insertion order but `Key, LSN` order.
## Summary of changes
Preliminaries for this PR were:
- #5003
- #5004
- #5005
- uncommitted microbenchmark in #5011
Based on the observations outlined above, this PR makes the following
changes:
* Rip out `EphemeralPage` from `page_cache.rs`
* Move the `block_io::FileId` to `page_cache::FileId`
* Add a `PAGE_SIZE`d buffer to the `EphemeralFile` struct.
It's called `mutable_tail` (see the sketch after this list).
* Change `write_blob` to use `mutable_tail` for the write buffering
instead of a page cache page.
* if `mutable_tail` is full, it writes it out to disk, zeroes it out,
and re-uses it.
* There is explicitly no double-buffering, so that memory allocation per
`EphemeralFile` instance is fixed.
* Change `read_blob` to return different `BlockLease` variants depending
on `blknum`
* for the `blknum` that corresponds to the `mutable_tail`, return a ref
to it
* Rust borrowing rules prevent `write_blob` calls while refs are
outstanding.
* for all non-tail blocks, return a page-cached `ImmutablePage`
* It is safe to page-cache these as ImmutablePage because EphemeralFile
is append-only.
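A simplified sketch of the `mutable_tail` write path described above (not the actual `EphemeralFile` code; error handling and page-cache pre-warming are omitted):

```
use std::fs::File;
use std::io::{self, Write};

const PAGE_SIZE: usize = 8192;

struct EphemeralFile {
    file: File,
    /// number of full pages already flushed to disk
    written_pages: u32,
    /// bytes currently used in the mutable tail page
    tail_len: usize,
    /// the single fixed write buffer; explicitly no double-buffering
    mutable_tail: Box<[u8; PAGE_SIZE]>,
}

impl EphemeralFile {
    fn write_blob(&mut self, mut buf: &[u8]) -> io::Result<()> {
        while !buf.is_empty() {
            let free = PAGE_SIZE - self.tail_len;
            let n = free.min(buf.len());
            self.mutable_tail[self.tail_len..self.tail_len + n].copy_from_slice(&buf[..n]);
            self.tail_len += n;
            buf = &buf[n..];
            if self.tail_len == PAGE_SIZE {
                // tail is full: issue our own write() call, then zero and reuse it
                // (the real code also pre-warms the page cache with this page)
                self.file.write_all(&self.mutable_tail[..])?;
                self.written_pages += 1;
                self.mutable_tail.fill(0);
                self.tail_len = 0;
            }
        }
        Ok(())
    }
}
```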
## Performance
How do the changes above affect performance?
My claim is: not significantly.
* **write path**:
* before this PR, the `EphemeralFile::write_blob` didn't issue its own
`write` system calls.
* If there were enough free pages, it didn't issue *any* `write` system
calls.
* If it had to evict other `EphemeralPage`s to get a page for its
writes (`get_buf_for_write`), the page cache code would implicitly issue
the writeback of victim pages as needed.
* With this PR, `EphemeralFile::write_blob` *always* issues *all* of its
*own* `write` system calls.
* Also, the writes are explicit instead of implicit through page cache
write back, which will help #4743
* The perf impact of always doing the writes is the CPU overhead and
syscall latency.
* Before this PR, we might have never issued them if there were enough
free pages.
* We don't issue `fsync` and can expect the writes to only hit the
kernel page cache.
* There is also an advantage in issuing the writes directly: the perf
impact is paid by the tenant that caused the writes, instead of whatever
tenant evicts the `EphemeralPage`.
* **reads for page reconstruction**: no impact.
* The `write_blob` function pre-warms the page cache when it writes the
`mutable_tail` to disk.
* So, the behavior is the same as with the EphemeralPages before this
PR.
* **reads for `create_delta_layer`**: no impact.
* Same argument as for page reconstruction.
* Note for the future:
* going through the page cache likely causes read amplification here.
Why?
* Due to the `Key,Lsn`-ordered access pattern, we don't read all the
values in the page before moving to the next page. In the worst case, we
might read the same page multiple times to read different `Values` from
it.
* So, it might be better to bypass the page cache here.
* Idea drafts:
* bypass PS page cache + prefetch pipeline + iovec-based IO
* bypass PS page cache + use `copy_file_range` to copy from ephemeral
file into the L0 delta file, without going through user space
## Problem
The performance benchmark in `test_runner/performance/test_layer_map.py`
is currently failing due to the warning added in #4888.
## Summary of changes
The test mentioned has a `compaction_target_size` of 8192, which is just
one page size. This is an unattainable goal, as we generate at least
three pages: one for the header, one for the b-tree (minimally sized
ones have just the root node in a single page), one for the data.
Therefore, we add two pages to the warning limit. The warning text
becomes a bit less accurate but I think this is okay.
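In other words (the exact formula is assumed here, for illustration only):

```
const PAGE_SZ: u64 = 8192;

// a layer needs at least a header page, a b-tree root page, and a data page,
// so give the "layer exceeds target size" warning two extra pages of slack
fn warn_threshold(compaction_target_size: u64) -> u64 {
    compaction_target_size + 2 * PAGE_SZ
}
```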
## Problem
Errors and notices that happen during a pooled connection's lifecycle have
no session identifiers.
## Summary of changes
Using a watch channel, we set the session ID whenever it changes. This
way we can see the status of a connection for that session.
Also, add a connection ID to be able to trace the entire connection
lifecycle.
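A rough sketch of the idea (names assumed; the proxy's actual types differ):

```
use tokio::sync::watch;
use uuid::Uuid;

/// one long-lived task per pooled connection; it logs whenever a new session
/// takes the connection over, so errors/notices can be correlated with it
async fn connection_task(mut session_rx: watch::Receiver<Option<Uuid>>, conn_id: Uuid) {
    while session_rx.changed().await.is_ok() {
        let session_id = *session_rx.borrow();
        tracing::info!(%conn_id, ?session_id, "session attached to pooled connection");
    }
    // the sender side was dropped: the pool is retiring this connection
}
```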
I will have to change these as I change the remote_timeline_client API in
#4938. So, a bit of cleanup: handle my comments which were just resolved
during the initial review.
Cleanup:
- use unwrap in tests instead of mixed `?` and `unwrap`
- use `Handle` instead of `&'static Reactor` to make the
RemoteTimelineClient more natural
- use arrays in tests
- use plain `#[tokio::test]`
## Problem
A customer is having trouble connecting to neon from their production
environment. The logs show a mix of "Internal error" and "authentication
protocol violation" but not the full error
## Summary of changes
Make sure we don't miss any logs during SASL/SCRAM
A rather temporary solution before the proper one:
https://github.com/neondatabase/neon/issues/5006
That requires more plumbing, so for now let's not attach deleted tenants,
and implement resume later.
Additionally, fix `assert_prefix_empty`. It had a buggy prefix calculation,
but since we always asserted for the absence of stuff, it worked. Here I
started to assert for the presence of stuff too, and it failed. Added more
"presence" asserts to other places to be confident that it works.
Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)
## Problem
The previous version of neon (that we use in the forward compatibility test)
now has the `amcheck` extension installed. We can run `pg_amcheck`
unconditionally.
## Summary of changes
- Run `pg_amcheck` in compatibility tests unconditionally
Before this patch, we had `off` and `blknum` as function-wide
mutable state. Now they are contained in the `Writer` struct.
The use of `push_bytes` instead of index-based filling of the buffer
also makes it easier to reason about what's going on.
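A simplified sketch of the resulting shape (not the actual `blob_io` code):

```
const PAGE_SZ: usize = 8192;

struct Writer {
    buf: Vec<u8>,
    /// block number the write position is currently in
    blknum: u32,
    /// offset within the current block
    off: usize,
}

impl Writer {
    fn push_bytes(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
        self.off += bytes.len();
        while self.off >= PAGE_SZ {
            // crossed a page boundary; account for the finished block
            self.off -= PAGE_SZ;
            self.blknum += 1;
        }
    }
}
```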
This is prep for https://github.com/neondatabase/neon/pull/4994
Don't download ext_index.json from S3, but instead receive it as part of the
spec from the control plane.
This eliminates S3 access for most compute starts,
and also allows us to update the extensions spec on the fly.
Restores #4937 work relating to the ability to use `ResidentDeltaLayer`
(which is an Arc wrapper) in #4938 for the `ValueRef`s by removing the
borrow from `ValueRef` and providing it from an upper layer.
This should not have any functional changes, most importantly, the
`main` will continue to use the borrowed `DeltaLayerInner`. It might be
that I can change #4938 to be like this. If that is so, I'll gladly rip
out the `Ref` and move the borrow back. But I'll first want to look at
the current test failures.
This makes it more explicit that these are different u64-sized
namespaces.
Re-using one in place of the other would be catastrophic.
Prep for https://github.com/neondatabase/neon/pull/4994
which will eliminate the ephemeral_file::FileId and move the
blob_io::FileId into page_cache.
It makes sense to have this preliminary commit though,
to minimize the amount of new concepts in #4994 and the other
preliminaries that depend on that work.
## Problem
As documented, the global connection pool will see high contention.
## Summary of changes
Use DashMap rather than Mutex<HashMap>.
Of note, DashMap currently uses a RwLock internally, but it's partially
sharded to reduce contention by a factor of N. We could potentially use
flurry, which is a port of Java's ConcurrentHashMap, but I have no good
understanding of its performance characteristics. DashMap is at least
equivalent to HashMap, but with less contention.
See the read heavy benchmark to analyse our expected performance
<https://github.com/xacrimon/conc-map-bench#ready-heavy>
I also spoke with the developer of DashMap recently, and they are
working on porting the implementation to use a concurrent HAMT, FWIW.
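A rough sketch of the change in shape (types are placeholders, not the proxy's actual pool):

```
use dashmap::DashMap;
use std::collections::HashMap;
use std::sync::Mutex;

struct PooledConn;

/// before: every lookup serializes on one lock
struct GlobalPoolBefore {
    pools: Mutex<HashMap<String, Vec<PooledConn>>>,
}

/// after: DashMap shards the map, so unrelated endpoints rarely contend
struct GlobalPoolAfter {
    pools: DashMap<String, Vec<PooledConn>>,
}

impl GlobalPoolAfter {
    fn take_conn(&self, endpoint: &str) -> Option<PooledConn> {
        // only the shard this endpoint hashes to is locked
        self.pools.get_mut(endpoint).and_then(|mut conns| conns.pop())
    }
}
```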
## Problem
PR #4839 has already reduced the number of b-tree traversals and vec
creations from 3 to 2, but as pointed out in
https://github.com/neondatabase/neon/pull/4839#discussion_r1279167815 ,
we would ideally just traverse the b-tree once during compaction.
After #4836, the two vecs created are one for the list of keys, lsns and
sizes, and one for the list of `(key, lsn, value reference)`. However,
they are not equal, as pointed out in
https://github.com/neondatabase/neon/pull/4839#issuecomment-1660418012
and the following comment: the key vec creation combines multiple
entries for which the lsn is changing but the key stays the same into
one, with the size being the sum of the sub-sizes. In SQL, this would
correspond to something like `SELECT key, lsn, SUM(size) FROM b_tree
GROUP BY key;` and `SELECT key, lsn, val_ref FROM b_tree;`. Therefore,
the join operation is non-trivial.
## Summary of changes
This PR merges the two lists of keys and value references into one. It's
not a trivial change and affects the size pattern of the resulting
files, which is why this is in a separate PR from #4839 .
The key vec is used in compaction for determining when to start a new
layer file. The loop uses various thresholds to come to this conclusion,
but the grouping by key has led to the behaviour that, regardless of
the threshold, it only starts a new file when either a new key or a new
input delta file is encountered.
The new code now does the combination after the merging and sorting of
the various keys from the delta files. This *mostly* does the same as
the old code, except for a detail: with the grouping done on a
per-delta-layer basis, the sorted and merged vec would still have
multiple entries for multiple delta files, but now, we don't have an
easy way to tell when a new input delta layer file is encountered, so we
cannot create multiple entries on that basis easily.
To prevent possibly infinite growth, our new grouping code compares the
combined size with the threshold, and if it is exceeded, it cuts a new
entry so that the downstream code can cut a new output file. This is a
tradeoff, however: if the threshold is too small, we risk
putting entries for the same key into multiple layer files, but if the
threshold is too big, we can in some instances exceed the target size.
Currently, we set the threshold to the target size, so in theory we
would stay below or roughly at double the `target_file_size`.
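A simplified sketch of that grouping step (types reduced to plain integers for illustration):

```
// merge consecutive (key, lsn, size) entries with the same key into one
// accumulated entry, but cut a new entry once the accumulated size exceeds
// the threshold, so the downstream loop can still start a new layer file
fn group_keys(
    entries: impl IntoIterator<Item = (u64, u64, u64)>, // (key, lsn, size)
    threshold: u64,
) -> Vec<(u64, u64, u64)> {
    let mut grouped: Vec<(u64, u64, u64)> = Vec::new();
    for (key, lsn, size) in entries {
        // same key as the previous entry and still under the threshold:
        // accumulate the size instead of emitting a new entry
        let same_key_and_small = matches!(
            grouped.last(),
            Some(&(last_key, _, acc)) if last_key == key && acc < threshold
        );
        if same_key_and_small {
            grouped.last_mut().unwrap().2 += size;
        } else {
            // new key, or accumulated size exceeded the threshold: cut a new entry
            grouped.push((key, lsn, size));
        }
    }
    grouped
}
```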
We also fix the way the size was calculated for the last key. The calculation
was wrong and accounted for the old layer's btree, even though we
already account for the overhead of the in-construction btree.
Builds on top of #4839 .
Previously list_prefixes was incorrectly used for that purpose. Change
to use list_files. Add a test.
Some drive-by refactorings on the Python side move helpers out of a
specific test file to be widely accessible.
resolves https://github.com/neondatabase/neon/issues/4499
For `delete_objects`, it was injecting failures for the whole delete_objects
operation and then for every delete it contains. Make it fail once for the whole operation.
## Problem
This was set to 5 seconds, which was very close to how long a compaction
took on my workstation, and when deletion is blocked on compaction the
test would fail.
We will fix this to make compactions drop out on deletion, but for the
moment let's stabilize the test.
## Summary of changes
Change timeout on timeline deletion in
`test_timeline_deletion_with_files_stuck_in_upload_queue` from 5 seconds
to 30 seconds.
To this end:
1) add an -e option to the 'neon_local safekeeper start' command, appending extra
options to the safekeeper invocation;
2) allow multiple occurrences of the same option for safekeepers; the last
value is taken;
3) allow specifying an empty string for the *-auth-public-key-path options, which
disables auth for the service.
The same option enables auth and specifies the public key, so this also allows
using different public keys. The motivation is to
1) allow e.g. changing the pageserver key/token without replacing all compute
tokens;
2) enable auth gradually.
## Problem
Deletions can possibly be reordered. Use fsync to avoid the case where the
mark file doesn't exist but other tenant/timeline files do.
See the added comments.
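A hedged sketch of the ordering concern on Unix (not the actual deletion code):

```
use std::fs::{self, File};
use std::io;
use std::path::Path;

fn delete_tenant(tenant_dir: &Path) -> io::Result<()> {
    let mark = tenant_dir.join("deleted");
    // 1. create the deletion mark and make it durable
    File::create(&mark)?.sync_all()?;
    // 2. fsync the parent directory so the new entry cannot be reordered
    //    after the deletions below (opening a directory works on Unix)
    File::open(tenant_dir)?.sync_all()?;
    // 3. only now start removing timeline files; after a crash we either see
    //    the mark plus leftovers, or nothing, but never leftovers without a mark
    fs::remove_dir_all(tenant_dir.join("timelines"))?;
    Ok(())
}
```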
Resolves #4987
## Problem
While adding the new test results format, I also changed the way we
upload Allure reports to S3
(722c7956bb)
to avoid duplicated results from previous runs. But it broke links to
earlier results (the results are still available, but at different URLs).
This PR fixes this (by reverting the logic from the
722c7956bb
changes) and moves the logic for storing test results in the db to the allure
generate step. This allows us to avoid duplicate test results in the db
and saves some time on the extra S3 downloads that happened in a different
job before this PR.
Ref https://neondb.slack.com/archives/C059ZC138NR/p1691669522160229
## Summary of changes
- Move test results storing logic from a workflow to
`actions/allure-report-generate`
## Problem
HTTP batch queries currently allow us to set the isolation level and
read only, but not deferrable.
## Summary of changes
Add support for deferrable.
Echo deferrable status in response headers only if true.
Likewise, now echo read-only status in response headers only if true.
#4942 left the old metrics in place for migration purposes. It was noticed
that the total number of deleted objects was missing from the new metrics;
add it.
While reviewing, it was noticed that delete_object could just be
delete_objects of one.
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>