Remove sync postgres_backend_async, tidy up its split usage.

- Use postgres_backend_async throughout safekeeper.
- Use framed.rs in postgres_backend_async, similar to tokio_util::codec::Framed
  but with slightly more efficient split. Now IO functions are also cancellation
  safe. Also, there is no allocation in each message read anymore, and data in
  read messages still directly point to input buffer without copies.
- In both safekeeper COPY streams, do read-write from the same thread/task with
  select! for easier error handling.
- Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL
  -- join split parts back catching errors from them before returning.

Initially I hoped to do that read-write without split at all, through polling
IO:
https://github.com/neondatabase/neon/pull/3522
However that turned out to be more complicated than I initially expected
due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>>
which is Send construct just to satisfy the checker; 2) can be workaround with
transmute. But this is so messy that I decided to leave split.

proxy stream.rs is adapted with minimal changes. It also benefits from framed.rs
improvements described above.
This commit is contained in:
Arseny Sher
2023-02-02 12:03:45 +04:00
parent 744428229b
commit 06b97c60f3
49 changed files with 2345 additions and 2638 deletions

View File

@@ -2049,8 +2049,10 @@ class NeonPageserver(PgProtocol):
".*Connection aborted: connection error: error communicating with the server: Broken pipe.*",
".*Connection aborted: connection error: error communicating with the server: Transport endpoint is not connected.*",
".*Connection aborted: connection error: error communicating with the server: Connection reset by peer.*",
# FIXME: replication patch for tokio_postgres regards any but CopyDone/CopyData message in CopyBoth stream as unexpected
".*Connection aborted: connection error: unexpected message from server*",
".*kill_and_wait_impl.*: wait successful.*",
".*Replication stream finished: db error: ERROR: Socket IO error: end streaming to Some.*",
".*Replication stream finished: db error:.*ending streaming to Some*",
".*query handler for 'pagestream.*failed: Broken pipe.*", # pageserver notices compute shut down
".*query handler for 'pagestream.*failed: Connection reset by peer.*", # pageserver notices compute shut down
# safekeeper connection can fail with this, in the window between timeline creation

View File

@@ -1106,8 +1106,8 @@ def test_delete_force(neon_env_builder: NeonEnvBuilder, auth_enabled: bool):
# FIXME: are these expected?
env.pageserver.allowed_errors.extend(
[
".*Failed to process query for timeline .*: Timeline .* was not found in global map.*",
".*Failed to process query for timeline .*: Timeline .* was cancelled and cannot be used anymore.*",
".*Timeline .* was not found in global map.*",
".*Timeline .* was cancelled and cannot be used anymore.*",
]
)