Commit Graph

3040 Commits

Author SHA1 Message Date
Christian Schwarz
bef32e336f doc: test: comment on name_filter 2023-03-30 17:56:20 +02:00
Christian Schwarz
6f63705a6d test: test_pageserver_respects_overridden_resident_size: minor clarification 2023-03-30 17:52:56 +02:00
Christian Schwarz
4017bf3e8e tests: make test_partial_evict_tenant less flaky, hopefully
I suspect we need to account for the slight overage
allowed by the rewrite.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3905/debug/4565272357/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/3dbd8cde3049a19c/
2023-03-30 17:52:56 +02:00
Christian Schwarz
e2e0ffd344 doc: serde_regex can also Serialize 2023-03-30 17:52:56 +02:00
Christian Schwarz
05b45a35d7 fix: macOS build 2023-03-30 17:52:25 +02:00
Christian Schwarz
1417c6880e fix: cfg(not(feature = testing)) builds 2023-03-30 16:00:10 +02:00
Christian Schwarz
140ef67dd8 refactor: replace LD_PRELOAD with a baked-in solution 2023-03-30 15:51:26 +02:00
Christian Schwarz
d5b8f123ec test: fix: accidentally re-enabled compaction & gc 2023-03-30 14:06:15 +02:00
Christian Schwarz
cbaba7c089 tests: fix flakiness: walreceiver triggered on-demand downloads after evictions 2023-03-30 13:58:41 +02:00
Christian Schwarz
6dab4f3539 test: refactor: don't capture layermap info, we never use it 2023-03-30 13:47:43 +02:00
Christian Schwarz
2acec7cf5b test: use 33% in the hope that it makes test failures easier to understand 2023-03-30 13:25:22 +02:00
Christian Schwarz
0834155882 tests: fix "[Errno 39] Directory not empty"
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3905/debug/4562684660/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/f89800acfdc76ec7/
2023-03-30 12:24:01 +02:00
Christian Schwarz
81088f238e fix: do a single write to stderr from the statvfs_ldpreload 2023-03-30 11:16:09 +02:00
Christian Schwarz
d1b398702d random_init_delay: remove the minimum of 10 seconds
Before this patch, the range from which the random delay is picked
was at least 10 seconds.

The disk usage eviction tests are apparently the first to wait for
a background loop to do its job.
With this patch, they only need to wait for the 1s period they configure.
2023-03-30 11:13:46 +02:00
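The change in d1b398702d can be illustrated with a minimal Rust sketch. It assumes the old code clamped the range to max(period, 10s) and picked the delay uniformly from [0, range). The function names `init_delay_before` and `init_delay_after` are hypothetical, and the `fraction` argument stands in for a uniform random sample in [0, 1) that a random number generator would supply in practice.

```rust
use std::time::Duration;

/// Hypothetical model of the behaviour before the patch (assumption):
/// the range is never shorter than 10 seconds, whatever the period.
fn init_delay_before(period: Duration, fraction: f64) -> Duration {
    let range = period.max(Duration::from_secs(10));
    // `fraction` stands in for a random sample in [0, 1).
    range.mul_f64(fraction.clamp(0.0, 1.0))
}

/// Hypothetical model of the behaviour after the patch:
/// the range is simply the configured period.
fn init_delay_after(period: Duration, fraction: f64) -> Duration {
    period.mul_f64(fraction.clamp(0.0, 1.0))
}
```

With the 1s period that the disk usage eviction tests configure, the old variant could wait up to 10s before the first iteration, while the new one stays below 1s.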
Christian Schwarz
919e11e001 fixup: forgot the lib prefix 2023-03-29 19:48:51 +02:00
Christian Schwarz
1b9b1e5e71 fixup: need to actually build the libstatvfs_ldpreload.so 2023-03-29 19:40:04 +02:00
Christian Schwarz
5cbe8341e4 fix: CI requires the cdylibs 2023-03-29 19:26:29 +02:00
Christian Schwarz
133406f4ae test: skip the LD_PRELOAD tests on macOS 2023-03-29 18:44:22 +02:00
Christian Schwarz
87e439b6de fix: adopt workspace_hack for the statvfs preload 2023-03-29 18:07:02 +02:00
Christian Schwarz
f75ee57aa6 fix: clippy 2023-03-29 18:05:57 +02:00
Christian Schwarz
11ff3d4ea1 tests: more statvfs tests 2023-03-29 17:40:18 +02:00
Christian Schwarz
7b1c5f46ab refactor: require caller to stop pageserver before starting with mock 2023-03-29 17:39:53 +02:00
Christian Schwarz
71ca140fd2 refactor: move testing of broken-tenant-skipping into separate test
Keeping it in the same test makes counting the mocked statvfs bytes more difficult.
2023-03-29 17:07:39 +02:00
Christian Schwarz
d786614384 fix: statvfs_ldpreload: specify used bytes instead of avail bytes 2023-03-29 16:49:14 +02:00
Christian Schwarz
f32ebb74a1 refactor: move the ldpreload setup code into a method on EvictionEnv 2023-03-29 15:55:13 +02:00
Christian Schwarz
c8784cba6b wire up the statvfs LD_PRELOAD library in an example test 2023-03-29 15:44:14 +02:00
Christian Schwarz
555ccb8c91 feat: add LD_PRELOADable library for mocking statvfs
use like so:

env RUST_LOG=pageserver=info,pageserver::disk_usage_eviction_task=debug \
    LD_PRELOAD=$PWD/target/debug/libstatvfs_ldpreload.so \
    NEON_STATVFS_LDPRELOAD_CONFIG="$(echo '{}' | jq '{magic: "foobar", mock: { type: "Failure", mocked_error: "EIO" }}')" \
    ./target/debug/neon_local pageserver start
2023-03-29 15:01:45 +02:00
Christian Schwarz
216f613e24 Merge pull request #3890 from neondatabase/heikki/disk-usage-eviction
Rewrite parts of the disk usage eviction implementation to make it more understandable (I hope).
2023-03-29 12:43:38 +02:00
Christian Schwarz
b47a02569f tests: fully read-only warmup + wait for remote storage upload 2023-03-29 11:55:02 +02:00
Christian Schwarz
699ca672a4 test: refine test_pageserver_respects_overridden_resident_size 2023-03-29 11:22:53 +02:00
Christian Schwarz
a698ddb8a4 fix: avoid needless timeline.clone() 2023-03-29 10:58:59 +02:00
Christian Schwarz
57d215e6bb fix: suggestions committed from GitHub web didn't compile 2023-03-29 10:55:57 +02:00
Christian Schwarz
83813f2cb1 fix: remove unneeded clippy allow 2023-03-29 10:55:45 +02:00
Christian Schwarz
bdc7f8d192 fix: remove now-unused is_sorted 2023-03-29 10:52:56 +02:00
Christian Schwarz
9a55e4f909 fix: structured logging of tenant_id
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-29 10:51:14 +02:00
Christian Schwarz
0b9a44a879 fix: structured logging of tenant_id
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-29 10:50:36 +02:00
Christian Schwarz
a6f9ebf178 fix: repeat tenant_id in debug message 2023-03-29 10:49:06 +02:00
Christian Schwarz
370b3637db doc: add explainer to debug_assert 2023-03-29 10:47:15 +02:00
Christian Schwarz
88753b3325 doc: link follow-up issue in TODO comment 2023-03-29 10:36:55 +02:00
Christian Schwarz
bb5947afde test: test_pageserver_respects_overridden_resident_size: use absolute wiggle room instead of percentage
Heikki added the `*0.75` in

    commit 11b16614a3
    Author: Heikki Linnakangas <heikki@neon.tech>
    Date:   Tue Mar 28 01:13:33 2023 +0300

        Fix test for change in behavior close to the min_resident_size boundary

        This PR changed the behavior to match my expectation per my comment:
        https://github.com/neondatabase/neon/pull/3809/files#r1149837135

Without it, the test fails because we fall back to global LRU, and we
have an assert on that.

The reason it falls back to global LRU is that
`target = delta_between_small_and_big_tenant`
doesn't leave any wiggle room to go over the min_resident_size boundary.

But, we redefined min_resident_size to include up to 1 layer above it
in this branch.
Multiply that by two because we're dealing with 2 tenants here.
2023-03-28 19:16:59 +02:00
Christian Schwarz
d6c2867b46 doc: add debug_assert for self-documenting candidates.sort_unstable_by_key() 2023-03-28 19:16:26 +02:00
Christian Schwarz
386c2d0112 refactor: go back to a single list
The MinResidentSizePartition is effectively what `overage` was earlier,
but more expressive and outside of EvictionCandidates.

So switch the code back to a single list,
but use (MinResidentSizePartition, EvictionCandidates) tuples.

That eliminates the need for iter_in_eviction_order() altogether.

It consumes 8 bytes more memory per candidate, but that doesn't matter
for now.
2023-03-28 18:25:44 +02:00
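A minimal Rust sketch of the single-list idea from 386c2d0112, under stated assumptions: the `Above` variant is declared before `Below`, so the derived `Ord` sorts candidates above a tenant's min_resident_size first, and within a partition older `last_activity_ts` values come first (LRU). The simplified `EvictionCandidate` struct, its fields, and the helpers `eviction_order` and `select_for_eviction` are hypothetical, not the actual pageserver types.

```rust
/// Declaration order matters: the derived Ord makes `Above < Below`,
/// so layers above min_resident_size are evicted before any below it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum MinResidentSizePartition {
    Above,
    Below,
}

/// Hypothetical, simplified candidate; field names are illustrative.
#[derive(Debug, Clone)]
struct EvictionCandidate {
    layer_name: String,
    last_activity_ts: u64, // smaller value = accessed longer ago
    file_size: u64,
}

/// One sort over (partition, last access) yields the full eviction order.
fn eviction_order(
    mut candidates: Vec<(MinResidentSizePartition, EvictionCandidate)>,
) -> Vec<(MinResidentSizePartition, EvictionCandidate)> {
    candidates.sort_unstable_by_key(|(partition, candidate)| {
        (*partition, candidate.last_activity_ts)
    });
    candidates
}

/// Walk the ordered list, evicting until the assumed usage reaches the target.
/// `usage_assumed` borrows the name from 07c44f9151 but is a simplification.
fn select_for_eviction(
    candidates: &[(MinResidentSizePartition, EvictionCandidate)],
    mut usage_assumed: u64,
    target: u64,
) -> Vec<String> {
    let mut evicted = Vec::new();
    for (_, candidate) in candidates {
        if usage_assumed <= target {
            break;
        }
        usage_assumed = usage_assumed.saturating_sub(candidate.file_size);
        evicted.push(candidate.layer_name.clone());
    }
    evicted
}
```

Because a single sort over the tuples produces the eviction order directly, a separate iterator that stitches two lists together is not needed in this model.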
Christian Schwarz
704d4f4640 doc: improve comment on min_resident_size 2023-03-28 18:06:44 +02:00
Christian Schwarz
dc72a9534e doc: update doc comment for collect_eviction_candidates
And move the impl of MinResidentSizePartitionedCandidates
below it because it makes sense when reading the code top-down.
2023-03-28 18:05:31 +02:00
Christian Schwarz
07c44f9151 doc: hint that usage_assumed is modified in the loop 2023-03-28 17:47:11 +02:00
Christian Schwarz
0c10e6d3e7 feat: demote info logs to debug
These are logged per tenant; we don't want to emit thousands of log lines
when this code runs.
2023-03-28 17:36:59 +02:00
Christian Schwarz
85becb148f feat: bring back min_resident=max(all layers) behavior 2023-03-28 17:36:59 +02:00
Christian Schwarz
ea3c76a9d6 refactor: instead of 'overage', have two separate lists 2023-03-28 17:36:59 +02:00
Christian Schwarz
799576ab1e Merge branch 'problame/disk-usage-eviction' into heikki/disk-usage-eviction 2023-03-28 15:24:52 +02:00
Joonas Koivunen
b1d54024e7 doc: why tokio mutex instead of std mutex 2023-03-28 14:14:14 +03:00