Commit Graph

660 Commits

Author SHA1 Message Date
Christian Schwarz
bef32e336f doc: test: comment on name_filter 2023-03-30 17:56:20 +02:00
Christian Schwarz
6f63705a6d test: test_pageserver_respects_overridden_resident_size: minor clarification 2023-03-30 17:52:56 +02:00
Christian Schwarz
4017bf3e8e tests: unflaky test_partial_evict_tenant, hopefully
I suspect we need to account for the slight overage
allowed by the rewrite.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3905/debug/4565272357/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/3dbd8cde3049a19c/
2023-03-30 17:52:56 +02:00
Christian Schwarz
140ef67dd8 refactor: replace LD_PRELOAD with a baked-in solution 2023-03-30 15:51:26 +02:00
Christian Schwarz
d5b8f123ec test: fix: accidentally re-enabled compaction & gc 2023-03-30 14:06:15 +02:00
Christian Schwarz
cbaba7c089 tests: fix flakiness: walreceiver triggered on-demand downloads after evictions 2023-03-30 13:58:41 +02:00
Christian Schwarz
6dab4f3539 test: refactor: don't capture layermap info, we never use it 2023-03-30 13:47:43 +02:00
Christian Schwarz
2acec7cf5b test: use 33% in hopes it make test failures easier to understand 2023-03-30 13:25:22 +02:00
Christian Schwarz
0834155882 tests: fix "[Errno 39] Directory not empty"
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3905/debug/4562684660/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/f89800acfdc76ec7/
2023-03-30 12:24:01 +02:00
Christian Schwarz
133406f4ae test: skip the LD_PRELOAD tests on macOS 2023-03-29 18:44:22 +02:00
Christian Schwarz
11ff3d4ea1 tests: more statvfs tests 2023-03-29 17:40:18 +02:00
Christian Schwarz
7b1c5f46ab refactor: require caller to stop pageserver before starting with mock 2023-03-29 17:39:53 +02:00
Christian Schwarz
71ca140fd2 refactor: move testing of broken-tenant-skipping into separate test
It makes the mocked statvfs bytes counting more difficult.
2023-03-29 17:07:39 +02:00
Christian Schwarz
f32ebb74a1 refactor: move the ldpreload setup code into a method on EvictionEnv 2023-03-29 15:55:13 +02:00
Christian Schwarz
c8784cba6b wire up the statvfs ldpreload thing in an example test 2023-03-29 15:44:14 +02:00
Christian Schwarz
b47a02569f tests: fully read-only warmup + wait for remote storage upload 2023-03-29 11:55:02 +02:00
Christian Schwarz
699ca672a4 test: refine test_pageserver_respects_overridden_resident_size 2023-03-29 11:22:53 +02:00
Christian Schwarz
bb5947afde test: test_pageserver_respects_overridden_resident_size: use absolute wiggle room instead of percentage
Heikki added the `*0.75` in

    commit 11b16614a3
    Author: Heikki Linnakangas <heikki@neon.tech>
    Date:   Tue Mar 28 01:13:33 2023 +0300

        Fix test for change in behavior close to the min_resident_size boundary

        This PR changed the behavior to match my expectation per my comment:
        https://github.com/neondatabase/neon/pull/3809/files#r1149837135

Without it, the test fails because we fall back to global LRU, and we
have an assert on that.

The reason why it falls back to global LRU is that
`target = delta_between_small_and_big_tenant`
doesn't leave any wiggle-room to go over min_resident_size boundary.

But, we redefined min_resident_size to include up to 1 layer above it
in this branch.
Multiply that by two because we're dealing with 2 tenants here.
2023-03-28 19:16:59 +02:00
Christian Schwarz
ea3c76a9d6 refactor: instead of 'overage', have two separate lists 2023-03-28 17:36:59 +02:00
Heikki Linnakangas
11b16614a3 Fix test for change in behavior close to the min_resident_size boundary
This PR changed the behavior to match my expectation per my comment:
https://github.com/neondatabase/neon/pull/3809/files#r1149837135
2023-03-28 01:28:27 +03:00
Christian Schwarz
de3a7470de Merge remote-tracking branch 'origin/main' into problame/disk-usage-eviction 2023-03-27 19:04:55 +02:00
Joonas Koivunen
e7d41c32ae fixup: import Dict 2023-03-27 19:40:32 +03:00
Christian Schwarz
67fc8ec5c4 fix: use Dict in type annotation
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-03-27 18:31:40 +02:00
dependabot[bot]
4071ff8c7b Bump openssl from 0.10.45 to 0.10.48 in /test_runner/pg_clients/rust/tokio-postgres (#3879) 2023-03-25 12:33:39 +00:00
Dmitry Rodionov
870ba43a1f return proper http codes in timeline delete endpoint (#3876)
return proper http codes in timeline delete endpoint
+ fix openapi spec for detach to include 404 responses
2023-03-24 19:25:39 +02:00
Kirill Bulatov
8bd565e09e Ensure branches with no layers have their remote storage counterpart created eventually (#3857)
Discovered during writing a test for
https://github.com/neondatabase/neon/pull/3843
2023-03-22 17:42:31 +02:00
Shany Pozin
0f7de84785 Allow calling detach on ignored tenant (#3834)
## Describe your changes
Added a query param to detach API
Allow to remove local state of a tenant even if its not in the memory
(following ignore API)
## Issue ticket number and link
#3828
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-03-22 07:17:00 +00:00
Kirill Bulatov
dd22c87100 Remove older layer metadata format support code (#3854)
The PR enforces current newest `index_part.json` format in the type
system (version `1`), not allowing any previous forms of it, that were
used in the past.
Similarly, the code to mitigate the
https://github.com/neondatabase/neon/issues/3024 issue is now also
removed.

Current code does not produce old formats and extra files in the
index_part.json, in the future we will be able to use
https://github.com/neondatabase/aversion or other approach to make
version transitions more explicit.

See https://neondb.slack.com/archives/C033RQ5SPDH/p1679134185248119 for
the justification on the breaking changes.
2023-03-21 23:33:28 +02:00
Christian Schwarz
834e94f5e3 fix: rebase fallout 2023-03-20 17:24:46 +01:00
Christian Schwarz
68b3e68642 refactor: PageserverHttpClient method for breaking tenant from tests 2023-03-20 17:18:15 +01:00
Christian Schwarz
4aa528ce45 chore: python format 2023-03-20 16:51:11 +01:00
Christian Schwarz
5015194b40 refactor: rename wanted_trimmed_bytes to evict_bytes 2023-03-20 16:51:11 +01:00
Christian Schwarz
5aef192bf2 disk-usage-based layer eviction
This patch adds a pageserver-global background loop that evicts layers
in response to a shortage of available bytes in the $repo/tenants
directory's filesystem.

The loop runs periodically at a configurable `period`.

Each loop iteration uses `statvfs` to determine filesystem-level space
usage.  It compares the returned usage data against two different types
of thresholds. The iteration tries to evict layers until app-internal
accounting says we should be below the thresholds.  We cross-check this
internal accounting with the real world by making another `statvfs` at
the end of the iteration.  We're good if that second statvfs shows that
we're _actually_ below the configured thresholds.  If we're still above
one or more thresholds, we emit a warning log message, leaving it to the
operator to investigate further.

There are two thresholds: `max_usage_pct` is the relative available
space, expressed in percent of the total filesystem space. If the actual
usage is higher, the threshold is exceeded.  `min_avail_bytes` is the
absolute available space in bytes. If the actual usage is lower, the
threshold is exceeded.

The iteration evicts layers in LRU fashion with a reservation of up to
`min_resident_size` bytes of the most recent layers per tenant.
The layers not part of the per-tenant reservation are evicted
least-recently-used first until we're below all thresholds.
If the above doesn't relieve enough pressure, we fall back to Global LRU.

In addition to the loop, there is also an HTTP endpoint to perform
one loop iteration synchronous to the request.
The endpoint takes an absolute number of bytes that the iteration
needs to evict before pressure is relieved.
The tests use this endpoint, which is a great simplification over
setting up loopback-mounts in the tests, which would be required to
test the statvfs part of the implementation.
We will rely on manual testing in staging to test the statvfs parts.

The HTTP endpoint is also handy in emergencies where an operator wants
the pageserver to evict a given amount of space _now.
Hence, it's arguments documented in openapi_spec.yml.
The response type isn't documented though because we don't consider
it stable. The endpoint should _not_ be used by Console.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>

fixes https://github.com/neondatabase/neon/issues/3728
2023-03-20 16:51:09 +01:00
Christian Schwarz
881356c417 add metrics to detect eviction-induced thrashing (#3837)
This patch adds two metrics that will enable us to detect *thrashing* of
layers, i.e., repetitions of `eviction, on-demand-download, eviction,
... ` for a given layer.

The first metric counts all layer evictions per timeline. It requires no
further explanation. The second metric counts the layer evictions where
the layer was resident for less than a given threshold.

We can alert on increments to the second metric. The first metric will
serve as a baseline, and further, it's generally interesting, outside of
thrashing.

The second metric's threshold is configurable in PageServerConf and
defaults to 24h. The threshold value is reproduced as a label in the
metric because the counter's value is semantically tied to that
threshold. Since changes to the config and hence the label value are
infrequent, this will have low storage overhead in the metrics storage.

The data source to determine the time that the layer was resident is the
file's `mtime`. Using `mtime` is more of a crutch. It would be better if
Pageserver did its own persistent bookkeeping of residence change events
instead of relying on the filesystem. We had some discussion about this:
https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900

My position is that `mtime` is good enough for now. It can theoretically
jump forward if someone copies files without resetting `mtime`. But that
shouldn't happen in practice. Note that moving files back and forth
doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`,
which is typically used for filesystem-level backup / restore, correctly
syncs `mtime`.

I've added a label that identifies the data source to keep options open
for a future, better data source than `mtime`. Since this value will
stay the same for the time being, it's not a problem for metrics
storage.

refs https://github.com/neondatabase/neon/issues/3728
2023-03-20 16:11:36 +01:00
Heikki Linnakangas
fea4b5f551 Switch to EdDSA algorithm for the storage JWT authentication tokens.
The control plane currently only supports EdDSA. We need to either teach
the storage to use EdDSA, or the control plane to use RSA. EdDSA is more
modern, so let's use that.

We could support both, but it would require a little more code and tests,
and we don't really need the flexibility since we control both sides.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
77107607f3 Allow JWT key generation to fail if authentication is not enabled.
This allows you to run without the 'openssl' binary as long as you
don't enable authentication. This becomes more important with the next
commit, which switches the JWT algorithm to EdDSA. LibreSSL does not
support EdDSA, and LibreSSL comes with macOS, so the next commit makes
it much more likely for the key generation to fail for macOS users.

To allow running without a keypair, don't generate the authentication
token in the 'neon_local init' step. Instead, generate a new token on
every request that needs one, using the private key.
2023-03-20 16:28:01 +02:00
Joonas Koivunen
0c1228c37a feat: store initial timeline in env fixture (#3839)
minor change, but will allow more use in future for the default
tenants.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-20 11:57:27 +02:00
Shany Pozin
93f3f4ab5f Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818)
## Describe your changes
Add Error enum for tenant state response to allow better error handling
in mgmt api
## Issue ticket number and link
#2238
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-19 10:44:42 +02:00
Arthur Petukhovsky
b067378d0d Measure cross-AZ traffic in safekeepers (#3806)
Create `safekeeper_pg_io_bytes_total` metric to track total amount of
bytes written/read in a postgres connections to safekeepers. This metric
has the following labels:
- `client_az` – availability zone of the connection initiator, or
`"unknown"`
- `sk_az` – availability zone of the safekeeper, or `"unknown"`
- `app_name` – `application_name` of the postgres client
- `dir` – data direction, either `"read"` or `"write"`
- `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from
`client_az` and `sk_az`, exists purely for convenience.

This is implemented by passing availability zone in the connection
string, like this: `-c tenant_id=AAA timeline_id=BBB
availability-zone=AZ-1`.

Update ansible deployment scripts to add availability_zone argument
to safekeeper and pageserver in systemd service files.
2023-03-16 17:24:01 +03:00
Joonas Koivunen
768c8d9972 test: allow gc to get unlucky (#3826)
this failure case was probably introduced by b220ba6, because earlier
the gc would always have run fast enough for restart every 1s. however,
test got added later, so we have just been lucky.

fixes #3824 by allowing this error to happen.
2023-03-16 11:03:21 +02:00
Arthur Petukhovsky
cd17802b1f Add pg_clients test for serverless driver (#3827)
Fixes #3819
2023-03-15 16:18:38 +03:00
Heikki Linnakangas
10a5d36af8 Separate mgmt and libpq authentication configs in pageserver. (#3773)
This makes it possible to enable authentication only for the mgmt HTTP
API or the compute API. The HTTP API doesn't need to be directly
accessible from compute nodes, and it can be secured through network
policies. This also allows rolling out authentication in a piecemeal
fashion.
2023-03-15 13:52:29 +02:00
Heikki Linnakangas
2672fd09d8 Make test independent of the order of config lines. 2023-03-14 20:10:34 +02:00
Heikki Linnakangas
4a92799f24 Fix check for trailing garbage in basebackup import.
There was a warning for trailing garbage after end-of-tar archive, but
it didn't always work. The reason is that we created a StreamReader
over the original copyin-stream, but performed the check for garbage
on the copyin-stream. There could be some garbage bytes buffered in
the StreamReader, which were not caught by the warning.

I considered turning the the warning into a fatal error, aborting the
import, but I wasn't sure if we handle aborting the import properly.
Do we clean up the timeline directory on error? If we don't, we should
make that more robust, but that's a different story.

Also, normally a valid tar archive ends with two 512-byte blocks of zeros.
The tokio_tar crate stops at the first all-zeros block. Read and check
the second all-zeros block, and error out if it's not there, or contains
something unexpected.
2023-03-14 19:45:33 +02:00
Joonas Koivunen
15b692ccc9 test: more strict finding of WARN, ERROR lines (#3798)
this prevents flakyness when `WARN|ERROR` appears in some other part of
the line, for example in a random filename.
2023-03-14 15:27:52 +02:00
Alexander Bayandin
3d869cbcde Replace flake8 and isort with ruff (#3810)
- Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort
- Update mypy and black
2023-03-14 13:25:44 +00:00
Dmitry Ivanov
15ed6af5f2 Add descriptions to proxy's python tests. 2023-03-14 01:32:37 +03:00
Heikki Linnakangas
e0ee138a8b Add a test for tokio-postgres client to the driver test suite.
It is fully supported. To enable TLS, though, it requires some extra
glue code, and a dependency to a TLS library.
2023-03-13 12:14:37 +02:00
Joonas Koivunen
8699342249 Ondemand rx bytes and layer count (#3777)
Adds two new *global* metrics:
- pageserver_remote_ondemand_downloaded_layers_total
- pageserver_remote_ondemand_downloaded_bytes_total

An existing test is repurposed once more to check that we do get some
reasonable counts. These are to replace guessing from the nic RX bytes
metric how much was on-demand downloaded.

First part of #3745: This does not add the "(un)?avoidable" metric,
which I plan to add as a new metric, which will be a subset of the
counts of the metrics added here.
2023-03-13 09:26:49 +02:00
Joonas Koivunen
ce8fbbd910 Fix allowed error again (#3790)
Fixes #3360 again, this time checking all other "Error processing HTTP
request" messages and aligning the regex with the two others.
2023-03-10 17:44:12 +00:00