4867 Commits

Author SHA1 Message Date
Christian Schwarz
261f116a2d local benchmark run
test_bulk_insert[neon-release-pg14-std-fs].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-std-fs].wal_recovery: 8.381 s

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 11.760 s
2024-03-14 17:29:04 +00:00
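Benchmark lines like the ones above can be compared without eyeballing them. A small parser can turn each `test_bulk_insert[...].metric: value unit` line into a lookup table and then compute the wal_recovery ratio between two io engines. This is a minimal sketch. The helper names `parse_results` and `slowdown` are hypothetical and not part of the Neon test suite, and the line format is assumed from the output pasted above.

```python
import re
from typing import Dict, Tuple

# Assumed format: <test>[<variant>].<metric>: <value> <unit>
_LINE_RE = re.compile(
    r"^(?P<test>\w+)\[(?P<variant>[^\]]+)\]\.(?P<metric>\w+): (?P<value>[0-9.]+) (?P<unit>\S+)$"
)


def parse_results(text: str) -> Dict[Tuple[str, str], Tuple[float, str]]:
    """Map (variant, metric) -> (value, unit) for every metric line in `text`."""
    results: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for line in text.splitlines():
        m = _LINE_RE.match(line.strip())
        if m:  # blank lines and non-metric lines are skipped
            results[(m["variant"], m["metric"])] = (float(m["value"]), m["unit"])
    return results


def slowdown(results, baseline: str, candidate: str, metric: str = "wal_recovery") -> float:
    """Return candidate/baseline for one metric; > 1.0 means the candidate is slower."""
    base, base_unit = results[(baseline, metric)]
    cand, cand_unit = results[(candidate, metric)]
    if base_unit != cand_unit:
        raise ValueError(f"unit mismatch: {base_unit} vs {cand_unit}")
    return cand / base
```

Applied to the local run above, `slowdown(...)` returns about 1.40. In other words, tokio-epoll-uring's wal_recovery took roughly 40% longer than std-fs.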
Christian Schwarz
79a6cda45d page_cache_priming_writer: done 2024-03-14 17:19:36 +00:00
Christian Schwarz
f0a67c4071 WIP: page_cache_priming_writer: clippy 2024-03-14 17:19:36 +00:00
Christian Schwarz
a1f4eb2815 WIP: page_cache_priming_writer: bugfixes 2024-03-14 17:15:30 +00:00
Christian Schwarz
eb1ccd7988 WIP: page_cache_priming_writer: plumb through RequestContext for previous commit, yet more churn -,- 2024-03-14 17:15:21 +00:00
Christian Schwarz
f1f0452722 WIP: page_cache_priming_writer (is it really worth it?) 2024-03-14 15:46:41 +00:00
Christian Schwarz
40200b7521 figure out why & when exactly zeroes past write offset are required & assert it 2024-03-14 12:04:17 +00:00
Christian Schwarz
91bd729be2 junk up owned_buffers_io from previous commit to deal with EphemeralFile speciality of reading zeroes past end-of-file
This makes it diverge semantically from what's in the tokio-epoll-uring
download PR :(
2024-03-13 18:23:12 +00:00
Christian Schwarz
4b84f23cea larger buffers for the write path
The OwnedAsyncWrite stuff is based on the code in
tokio-epoll-uring on-demand download PR (#6992), which hasn't merged
yet.
2024-03-13 18:23:08 +00:00
Christian Schwarz
644c5e243d Revert "experiment(repeat, without preceding reverts) demonstrate that std-fs performs better because it hits the page cache"
This reverts commit d66ccbae5e.
2024-03-13 15:28:24 +00:00
Christian Schwarz
d66ccbae5e experiment(repeat, without preceding reverts) demonstrate that std-fs performs better because it hits the page cache
... by forcing each write system call to go to disk

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 346 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 93.417 s

test_bulk_insert[neon-release-pg14-std-fs].wal_written: 346 MB
test_bulk_insert[neon-release-pg14-std-fs].wal_recovery: 86.009 s

=> ~8% instead of 2x difference
2024-03-13 15:27:41 +00:00
Christian Schwarz
578a2d5d5f Revert "experiment: for create_delta_layer, use global io_engine, but inside a spawn_blocking single-threaded runtime"
This reverts commit 72a8e090dd.
2024-03-13 15:09:11 +00:00
Christian Schwarz
c9d1f51a93 Revert "experiment: for create_delta_layer _write path_, use StdFs io engine in a spawn_blocking thread single-threaded runtime"
This reverts commit 4a8e7f8716.
2024-03-13 15:09:06 +00:00
Christian Schwarz
1339834297 Revert "experiment: for EphemeralFile write path, use StdFs io engine"
This reverts commit c8c04c0db8.
2024-03-13 15:09:00 +00:00
Christian Schwarz
746fc530c5 "experiment: demonstrate that std-fs performs better because it hits the page cache"
This reverts commit 2edbc07733.
2024-03-13 15:08:43 +00:00
Christian Schwarz
94311052cd previous commit's numbers were with all the preceding experiments 2024-03-13 15:08:11 +00:00
Christian Schwarz
2edbc07733 experiment: demonstrate that std-fs performs better because it hits the page cache
... by forcing each write system call to go to disk

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 346 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 92.559 s

test_bulk_insert[neon-release-pg14-std-fs].wal_written: 346 MB
test_bulk_insert[neon-release-pg14-std-fs].wal_recovery: 81.998 s

=> 10%ish worse instead of 2x
2024-03-13 15:07:46 +00:00
Christian Schwarz
c8c04c0db8 experiment: for EphemeralFile write path, use StdFs io engine
together with previous commits, this brings us back down to
pre-regression

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 9.991 s
2024-03-13 14:09:21 +00:00
Christian Schwarz
4a8e7f8716 experiment: for create_delta_layer _write path_, use StdFs io engine in a spawn_blocking thread single-threaded runtime
builds on top of the previous commit

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 13.153 s
2024-03-13 14:09:16 +00:00
Christian Schwarz
72a8e090dd experiment: for create_delta_layer, use global io_engine, but inside a spawn_blocking single-threaded runtime
This makes things worse with tokio-epoll-uring

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 19.574 s

This is a partial revert of 3da410c8fe
2024-03-13 14:09:12 +00:00
Christian Schwarz
e0ea465aed Revert "experiment: Revert "tokio-epoll-uring: use it on the layer-creating code paths (#6378)""
This reverts commit d3c157eeee.
2024-03-13 12:45:53 +00:00
Christian Schwarz
d3c157eeee experiment: Revert "tokio-epoll-uring: use it on the layer-creating code paths (#6378)"
Unchanged

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 9.194 s

This reverts commit 3da410c8fe.
2024-03-13 12:45:40 +00:00
Christian Schwarz
c600355802 Revert "experiment: StdFs for EphemeralFile writes isn't the bottleneck"
This reverts commit 57241c1c5a.
2024-03-13 12:36:21 +00:00
Christian Schwarz
57241c1c5a experiment: StdFs for EphemeralFile writes isn't the bottleneck
With this

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 16.053 s

down from

test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-tokio-epoll-uring].wal_recovery: 17.669 s

but the regression is from baseline

test_bulk_insert[neon-release-pg14-std-fs].wal_written: 345 MB
test_bulk_insert[neon-release-pg14-std-fs].wal_recovery: 9.335 s
2024-03-13 11:41:17 +00:00
Christian Schwarz
b9c30dbd6b fix the parametrization 2024-03-12 20:24:45 +00:00
Christian Schwarz
35f8735a27 wip 2024-03-12 20:13:44 +00:00
Christian Schwarz
095130c1b3 DO NOT MERGE: always parametrize 2024-03-12 20:09:14 +00:00
Christian Schwarz
5a0277476d Revert "make changes preparing next commit"
This reverts commit e85a631ddb.
2024-03-12 20:09:14 +00:00
Christian Schwarz
6348833bdc expose that virtual_file_io_engine and get_vectored_impl were never set 2024-03-12 20:09:14 +00:00
Christian Schwarz
dbabd4e4ea Revert "expose that pageserver_virtual_file_io_engine test param was never used (same for get_vectored_impl)"
This reverts commit 5b8888ce6b.
2024-03-12 20:09:14 +00:00
Christian Schwarz
5b8888ce6b expose that pageserver_virtual_file_io_engine test param was never used (same for get_vectored_impl) 2024-03-12 19:56:02 +00:00
Christian Schwarz
e85a631ddb make changes preparing next commit 2024-03-12 19:56:02 +00:00
Christian Schwarz
95deea4f39 Revert "Revert "tokio-epoll-uring: use it on the layer-creating code paths (#6378)""
This reverts commit 9876045444.
2024-03-12 18:53:16 +00:00
Christian Schwarz
9876045444 Revert "tokio-epoll-uring: use it on the layer-creating code paths (#6378)"
This reverts commit 3da410c8fe.
2024-03-12 18:53:11 +00:00
Jure Bajic
bac06ea1ac pageserver: fix read path max lsn bug (#7007)
## Summary of changes
The problem it fixes: when `request_lsn` is `u64::MAX-1`,
`cont_lsn` becomes `u64::MAX`, which is the same as `prev_lsn`, and that
stops the loop.

Closes https://github.com/neondatabase/neon/issues/6812
2024-03-12 16:32:47 +00:00
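The loop-termination bug described in #7007 can be modeled in a few lines. This is a hypothetical sketch, not the pageserver's read path. It assumes that `prev_lsn` starts at the sentinel `u64::MAX`, that `cont_lsn` starts at `request_lsn + 1`, and that the loop stops when the two are equal. The fixed variant uses `None` as the "no previous iteration" marker. The actual fix in #7007 may be structured differently.

```python
U64_MAX = 2**64 - 1


def buggy_loop_enters(request_lsn: int) -> bool:
    """Hypothetical pre-fix guard: prev_lsn starts at the sentinel u64::MAX."""
    prev_lsn = U64_MAX
    cont_lsn = request_lsn + 1  # the search starts just above the requested LSN
    # cont_lsn == prev_lsn is read as "no progress", so the loop bails out
    # immediately when request_lsn + 1 happens to equal the sentinel.
    return cont_lsn != prev_lsn


def fixed_loop_enters(request_lsn: int) -> bool:
    """Same model, but "no previous iteration" is None instead of a real LSN value."""
    prev_lsn = None
    cont_lsn = request_lsn + 1
    return prev_lsn is None or cont_lsn != prev_lsn
```

The underlying design point is that a value a caller can legitimately reach should not double as a sentinel.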
John Spray
7ae8364b0b storage controller: register nodes in re-attach request (#7040)
## Problem

Currently we manually register nodes with the storage controller, and
use a script during deploy to register with the cloud control plane.
Rather than extend that script further, nodes should just register on
startup.

## Summary of changes

- Extend the re-attach request to include an optional
NodeRegisterRequest
- If the `register` field is set, handle it like a normal node
registration before executing the normal re-attach work.
- Update tests/neon_local that used to rely on doing an explicit
register step that could be enabled/disabled.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-03-12 14:47:12 +00:00
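Below is a sketch of the re-attach handling described above, with an optional registration folded into the request. Only `NodeRegisterRequest` and the `register` field are named in the PR description. The class and method names (`ReAttachRequest`, `StorageController`, `re_attach`, `register_node`) and the fields shown on `NodeRegisterRequest` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeRegisterRequest:
    # Hypothetical subset of fields; a real request would carry more (addresses, ports, ...).
    node_id: int
    listen_http_addr: str


@dataclass
class ReAttachRequest:
    node_id: int
    register: Optional[NodeRegisterRequest] = None


class StorageController:
    def __init__(self) -> None:
        self.nodes: Dict[int, NodeRegisterRequest] = {}
        self.attachments: Dict[int, List[str]] = {}  # node_id -> shards attached there

    def register_node(self, req: NodeRegisterRequest) -> None:
        # In this sketch, re-registering a known node simply overwrites its entry.
        self.nodes[req.node_id] = req

    def re_attach(self, req: ReAttachRequest) -> List[str]:
        # If the node sent a registration, handle it first, like an explicit register call.
        if req.register is not None:
            self.register_node(req.register)
        if req.node_id not in self.nodes:
            raise KeyError(f"node {req.node_id} is not registered")
        # ...then do the normal re-attach work: tell the node which shards it should hold.
        return self.attachments.get(req.node_id, [])
```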
Conrad Ludgate
1f7d54f987 proxy refactor tls listener (#7056)
## Problem

Now that we have tls-listener vendored, we can refactor and remove a lot
of bloated code and make the whole flow a bit simpler

## Summary of changes

1. Remove dead code
2. Move the error handling to inside the `TlsListener` accept() function
3. Extract the peer_addr from the PROXY protocol header and log it with
errors
2024-03-12 13:05:40 +00:00
Arthur Petukhovsky
580e136b2e Forward all backpressure feedback to compute (#7079)
Previously we aggregated ps_feedback on each safekeeper and sent it to
walproposer with every AppendResponse. This PR changes it to send
ps_feedback to walproposer right after receiving it from pageserver,
without aggregating it in memory. Also contains some preparations for
implementing backpressure support for sharding.
2024-03-12 12:14:02 +00:00
Conrad Ludgate
09699d4bd8 proxy: cancel http queries on timeout (#7031)
## Problem

On HTTP query timeout, we should try and cancel the current in-flight
SQL query.

## Summary of changes

Trigger a cancellation command in postgres once the timeout is reached
2024-03-12 11:52:00 +00:00
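A timeout enforced only on the client side can leave the SQL query running in Postgres, which is why the proxy also issues a cancellation. The following is a minimal asyncio sketch of that pattern. It assumes caller-supplied `run_query` and `send_cancel` coroutines, which are hypothetical stand-ins for the proxy's own query and Postgres cancel-request machinery (Rust code not shown here).

```python
import asyncio
from typing import Any, Awaitable, Callable


async def query_with_timeout(
    run_query: Callable[[], Awaitable[Any]],
    send_cancel: Callable[[], Awaitable[None]],
    timeout_s: float,
) -> Any:
    try:
        # wait_for cancels the local awaitable when the timeout fires...
        return await asyncio.wait_for(run_query(), timeout_s)
    except asyncio.TimeoutError:
        # ...but that does not stop the server-side query, so explicitly ask Postgres to cancel it.
        await send_cancel()
        raise
```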
John Spray
89cf714890 tests/neon_local: rename "attachment service" -> "storage controller" (#7087)
Not a user-facing change, but can break any existing `.neon` directories
created by neon_local, as the name of the database used by the storage
controller changes.

This PR changes all the locations apart from the path of
`control_plane/attachment_service` (waiting for an opportune moment to
do that one, because it's the most conflict-prone with respect to ongoing PRs like
#6676)
2024-03-12 11:36:27 +00:00
Heikki Linnakangas
621ea2ec44 tests: try to make restored-datadir comparison tests not flaky v2
This test occasionally fails with a difference in "pg_xact/0000" file
between the local and restored datadirs. My hypothesis is that
something changed in the database between the last explicit checkpoint
and the shutdown. I suspect autovacuum; it could certainly create
transactions.

To fix, be more precise about the point in time that we compare. Shut
down the endpoint first, then read the last LSN (i.e. the shutdown
checkpoint's LSN) from the local disk with pg_controldata, and use
exactly that LSN in the basebackup.

Closes #559
2024-03-11 23:29:32 +04:00
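The key step above is reading the shutdown checkpoint's LSN with pg_controldata. Below is a minimal sketch of extracting it from pg_controldata's text output. It assumes the `Latest checkpoint location:` line format and the usual `X/Y` hexadecimal LSN notation. The function names are hypothetical, and the actual test fixture may parse the output differently.

```python
import re

# Matches e.g. "Latest checkpoint location:           0/1696A60"
_CHECKPOINT_RE = re.compile(
    r"^Latest checkpoint location:\s+([0-9A-F]+)/([0-9A-F]+)\s*$",
    re.MULTILINE | re.IGNORECASE,
)


def shutdown_checkpoint_lsn(controldata_output: str) -> str:
    """Return the latest checkpoint LSN (as "X/Y") from pg_controldata output."""
    m = _CHECKPOINT_RE.search(controldata_output)
    if m is None:
        raise ValueError("no 'Latest checkpoint location' line in pg_controldata output")
    return f"{m.group(1)}/{m.group(2)}"


def lsn_to_int(lsn: str) -> int:
    """Convert "X/Y" notation to a 64-bit integer: high 32 bits X, low 32 bits Y."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)
```

In a test, the input could come from `subprocess.run(["pg_controldata", "-D", datadir], capture_output=True, text=True, check=True).stdout`. The resulting LSN would then be passed to the basebackup request.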
Heikki Linnakangas
74d09b78c7 Keep walproposer alive until shutdown checkpoint is safe on safekeepers
The walproposer pretends to be a walsender in many ways. It has a
WalSnd slot, it claims to be a walsender by calling
MarkPostmasterChildWalSender() etc. But one difference from real
walsenders was that the postmaster still treated it as a bgworker
rather than a walsender. The difference is that at shutdown,
walsenders are not killed until the very end, after the checkpointer
process has written the shutdown checkpoint and exited.

As a result, the walproposer always got killed before the shutdown
checkpoint was written, so the shutdown checkpoint never made it to
safekeepers. That's fine in principle; we don't require a clean
shutdown after all. But it also feels a bit silly not to stream the
shutdown checkpoint. It could be useful for initializing hot standby
mode in a read replica, for example.

Change postmaster to treat background workers that have called
MarkPostmasterChildWalSender() as walsenders. That unfortunately
requires another small change in postgres core.

After doing that, walproposers stay alive longer. However, it also
means that the checkpointer will wait for the walproposer to switch to
WALSNDSTATE_STOPPING state, when the checkpointer sends the
PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in
walproposer to receive and handle that signal reliably. Instead, we
mark walproposer as being in WALSNDSTATE_STOPPING always.

In commit 568f91420a, I assumed that shutdown will wait for all the
remaining WAL to be streamed to safekeepers, but before this commit
that was not true, and the test became flaky. This should make it
stable again.

Some tests wrongly assumed that no WAL could have been written between
pg_current_wal_flush_lsn and a quick pg stop after it. Fix them by introducing
flush_ep_to_pageserver which first stops the endpoint and then waits till all
committed WAL reaches the pageserver.

In passing, extract the safekeeper HTTP client into its own module.
2024-03-11 23:29:32 +04:00
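The `flush_ep_to_pageserver` approach from the last paragraph (stop the endpoint first, then wait until all committed WAL reaches the pageserver) boils down to a stop-then-poll loop. This sketch uses hypothetical callbacks and does not reproduce the helper's real signature. `stop_endpoint` is assumed to stop the endpoint and return the last committed LSN, and `pageserver_last_record_lsn` is assumed to report how far the pageserver has ingested.

```python
import time
from typing import Callable


def flush_ep_to_pageserver_sketch(
    stop_endpoint: Callable[[], int],
    pageserver_last_record_lsn: Callable[[], int],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.05,
) -> int:
    # Stop first: WAL (e.g. the shutdown checkpoint) can still be written after
    # pg_current_wal_flush_lsn(), so only a stopped endpoint has a final commit LSN.
    commit_lsn = stop_endpoint()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if pageserver_last_record_lsn() >= commit_lsn:
            return commit_lsn
        time.sleep(poll_interval_s)
    raise TimeoutError(f"pageserver did not reach LSN {commit_lsn} within {timeout_s}s")
```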
Arseny Sher
0cf0731d8b SIGQUIT instead of SIGKILL prewarmed postgres.
This avoids orphaned processes that keep using a wiped datadir and produce confusing logging.
2024-03-11 22:36:52 +04:00
Sasha Krassovsky
98723844ee Don't return from inside PG_TRY (#7095)
## Problem
Returning from inside PG_TRY is a bug, and we currently do that.

## Summary of changes
Make it break and then return false. This should also help stabilize
test_bad_connection.py
2024-03-11 18:36:39 +00:00
Alex Chi Z
73a8c97ac8 fix: warnings when compiling neon extensions (#7053)
Follow-up to https://github.com/neondatabase/neon/pull/7010; closes
https://github.com/neondatabase/neon/issues/6188

## Summary of changes

This pull request should fix all warnings except
`-Wdeclaration-after-statement` in the neon extension compilation.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-03-11 17:49:58 +00:00
Christian Schwarz
17a3c9036e follow-up(#7077): adjust flaky-test-detection cutoff date for tokio-epoll-uring (#7090)
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-03-11 16:36:49 +00:00
Joonas Koivunen
8c5b310090 fix: Layer delete on drop and eviction can outlive timeline shutdown (#7082)
This is a follow-up to #7051, where it went unnoticed that `LayerInner::drop` and
`LayerInner::evict_blocking` also need to enter a gate before
deleting the file. Without the gate, these paths can delete a layer
file that a newer Timeline instance has already checked out as resident,
in the same way as in #7051.
2024-03-11 16:54:06 +01:00
Christian Schwarz
8224580f3e fix(tenant/timeline metrics): race condition during shutdown + recreation (#7064)
Tenant::shutdown or Timeline::shutdown completes and becomes externally
observable before the corresponding Tenant/Timeline object is dropped.

For example, after observing a Tenant::shutdown to complete, we could
attach the same tenant_id again. The shut down Tenant object might still
be around at the time of the attach.

The race is then the following:
- old object's metrics are still around
- new object uses with_label_values
- old object calls remove_label_values

The outcome is that the new object will have the metric objects (they're
an Arc internally), but the metrics won't be part of the internal registry
and hence they'll be missing in `/metrics`.

Later, when the new object gets shut down and tries to
remove_label_values, it will observe an error because
the metric was already removed by the old object.

Changes
-------

This PR moves metric removal to `shutdown()`.

An alternative design would be to multi-version the metrics using a
distinguishing label, or to use a better metrics crate that allows
removing metrics from the registry through the locally held metric
handle instead of interacting with the (globally shared) registry.

refs https://github.com/neondatabase/neon/pull/7051
2024-03-11 15:41:41 +01:00
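The race described above can be reproduced with a toy registry. Here, `Registry` is a hypothetical stand-in for a Prometheus-style registry whose `with_label_values` is get-or-create, so a second caller with the same labels receives the same handle. It is not the real metrics crate. The two functions contrast removing metrics when the old object is dropped with removing them in `shutdown()`.

```python
from typing import Dict


class Registry:
    """Toy global registry keyed by a tenant_id label value."""

    def __init__(self) -> None:
        self.series: Dict[str, dict] = {}

    def with_label_values(self, tenant_id: str) -> dict:
        # Get-or-create: a second caller with the same labels gets the SAME handle back.
        return self.series.setdefault(tenant_id, {"value": 0})

    def remove_label_values(self, tenant_id: str) -> None:
        if tenant_id not in self.series:
            raise KeyError(f"metric for tenant {tenant_id} was already removed")
        del self.series[tenant_id]


def racy_removal_on_drop(reg: Registry) -> dict:
    reg.with_label_values("t1")               # old Tenant object registers its metric
    # old shutdown() has completed, but removal is deferred until the object is dropped
    new_handle = reg.with_label_values("t1")  # re-attach: the new object gets the old handle
    reg.remove_label_values("t1")             # old object is finally dropped
    return new_handle                         # still usable, but no longer exported


def removal_in_shutdown(reg: Registry) -> dict:
    reg.with_label_values("t1")
    reg.remove_label_values("t1")             # done inside shutdown(), before it is observable
    return reg.with_label_values("t1")        # re-attach registers a fresh, exported series
```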
Christian Schwarz
2b0f3549f7 default to tokio-epoll-uring in CI tests & on Linux (#7077)
All of production is using it now as of
https://github.com/neondatabase/aws/pull/1121

The change in `flaky_tests.py` resets the flakiness detection logic.

The alternative would have been to repeat the choice of io engine in
each test name, which would junk up the various test reports too much.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-03-11 14:35:59 +00:00
John Spray
b4972d07d4 storage controller: refactor non-mutable members up into Service (#7086)
result_tx and compute_hook were in ServiceState (i.e. behind a sync
mutex), but didn't need to be.

Moving them up into Service removes a bunch of boilerplate clones.

While we're here, create a helper `Service::maybe_reconcile_shard` which
avoids writing out all the `&self.` arguments to
`TenantState::maybe_reconcile` everywhere we call it.
2024-03-11 14:29:32 +00:00