Commit Graph

126 Commits

Author SHA1 Message Date
Dmitry Ivanov
18d3d078ad [WIP] [proxy] Migrate to async 2022-02-08 05:43:32 +03:00
Dmitry Ivanov
c2927353a5 Enable async deserialization of FeMessage
Now it's possible to call Fe{Startup,}Message in both
sync and async contexts, which is good for proxy.

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2022-01-28 19:40:37 +03:00
Arseny Sher
86045ac36c Prefix per-cluster directory with ztenant_id in safekeeper.
Currently ztimelineids are unique, but all APIs accept the pair, so let's keep
it everywhere for uniformity.

Carry around ZTTId containing both ZTenantId and ZTimelineId for simplicity.

(existing clusters on staging ought to be preprocessed for that)
2022-01-27 17:22:07 +03:00
Konstantin Knizhnik
79f0e44a20 Gc cutoff rwlock (#1139)
* Reproduce github issue #1047.

* Use RwLock to protect gc_cuttof_lsn

* Eeduce number of updates in test_gc_aggressive

* Change  test_prohibit_get_page_at_lsn_for_garbage_collected_pages test

* Change  test_prohibit_get_page_at_lsn_for_garbage_collected_pages

* Lock latest_gc_cutoff_lsn in all operations accessing storage to prevent race conditions with GC

* Remove random sleep between wait_for_lsn and get_page_at_lsn

* Initialize latest_gc_cutoff with initdb_lsn and remove separate check that lsn >= initdb_lsn

* Update test_prohibit_branch_creation_on_pre_initdb_lsn test

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
2022-01-27 14:41:16 +03:00
anastasia
5abe2129c6 Extend replication protocol with ZentihFeedback message
to pass current_timeline_size to compute node

Put standby_status_update fields into ZenithFeedback and send them as one message.
Pass values sizes together with keys in ZenithFeedback message.
2022-01-27 11:20:45 +03:00
Dmitry Rodionov
e6f2d70517 use 2021 rust edition 2022-01-25 18:48:49 +03:00
Dmitry Ivanov
703716228e Use &str instead of String in BeMessage::ErrorResponse
There's no need in allocating string literals in the heap.
2022-01-24 18:49:05 +03:00
Dmitry Rodionov
37c440c5d3 Introduce first version of tenant migraiton between pageservers
This patch includes attach/detach http endpoints in pageservers. Some
changes in callmemaybe handling inside safekeeper and an integrational
test to check migration with and without load. There are still some
rough edges that will be addressed in follow up patches
2022-01-24 17:20:15 +03:00
Dmitry Ivanov
d3542c34f1 Refactoring: use anyhow::Context's methods where possible 2022-01-19 16:33:48 +03:00
Heikki Linnakangas
dab30c27b6 Refactor thread management and shutdown
This introduces a new module to handle thread creation and shutdown.
All page server threads are now registered in a global hash map, and
there's a function to request individual threads to shut down gracefully.

Thread shutdown request is signalled to the thread with a flag, as well
as a Future that can be used to wake up async operations if shutdown is
requested. Use that facility to have the libpq listener thread respond
to pageserver shutdown, based on Kirill's earlier prototype
(https://github.com/zenithdb/zenith/pull/1088). That addresses
https://github.com/zenithdb/zenith/issues/1036, previously the libpq
listener thread would not exit until one more connection arrives.

This also eliminates a resource leak in the accept() loop. Previously,
we added the JoinHanlde of each new thread to a vector but old handles
for threads that had already exited were never removed.
2022-01-14 18:36:10 +02:00
Heikki Linnakangas
adb0b3dada Include backtrace in error messages in the log.
'anyhow' crate can include a backtrace in all errors, when the
'backtrace' feature is enabled. Enable it, and change the places that used
'{:#}' or '{}' to '{:?}', so that the backtrace is printed.
2022-01-14 10:10:17 +02:00
bojanserafimov
5b9391b51d Support "query cancel" in proxy (#1052) 2022-01-05 17:27:12 -05:00
bojanserafimov
24eca8d58b Parse cancel message in pq_proto (#1060) 2021-12-28 16:43:44 -05:00
Bojan Serafimov
1e3ddd43bc Add struct for key data 2021-12-28 22:40:22 +03:00
Bojan Serafimov
989371493b Add BeMessage::BackendKeyData variant 2021-12-28 22:40:22 +03:00
Kirill Bulatov
f0afd08667 Fix zenith init defaults 2021-12-28 00:21:48 +02:00
Arseny Sher
a163650a99 Refactor Postgres command parsing in safekeeper.
Do it separately with SafekeeperPostgresCommand enum as a result. Since query is
always C string, switch postgres_backend process_query argument from Bytes to
&str.

Make passing ztli/ztenant id in safekeeper connection string optional; this is
needed for upcoming intra-safekeeper heartbeat cmd which is not bound to any
timeline.
2021-12-24 15:48:13 +03:00
Kirill Bulatov
114a757d1c Use generic config parameters in pageserver cli
Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
2021-12-23 18:58:28 +02:00
anastasia
3b61f364f7 Stop WAL streaming threads, when compute node is shut down.
WAL stream uses the 2 connections:
1. Compute node (walproposer) -> Safekeeper (ReceiveWalConn module)

When compute node is shut down, safekeeper needs to stop the respective receiving thread.
Prior to this PR it didn't work because PostgresBackend haven't handled disconnection properly.

2. Safekeeper (ReplicationConn module) -> pageserver (walreceiver thread)

When incoming WAL stream is gone, safekeeper can stop streaming WAL and cancel connection as soon as replica is caught up.
Note that the WAL can be streamed to multiple replicas simultaneously, only disconnect ones that are caught up to the last_recieved_lsn.
2021-12-20 12:34:28 +03:00
Kirill Bulatov
673c297949 Download timelines on demand 2021-12-10 17:23:35 +02:00
Dmitry Ivanov
7cec13d1df Improve shutdown story for code coverage
This patch introduces fixes for several problems affecting
LLVM-based code coverage:

* Daemonizing parent processes should call _exit() to prevent
coverage data file corruption (*.profraw) due to concurrent writes.

* Implement proper shutdown handlers in safekeeper.
2021-12-06 13:27:52 +03:00
Dmitry Rodionov
2669d140f8 use full commit sha for version info
for builds in docker this is not needed, since environment variable
with commit sha already contains full version
2021-12-01 17:35:57 +03:00
Dmitry Rodionov
130184fee9 Prohibit branch creation and basebackup at out of scope lsns
Out of scope LSNs include pre initdb LSNs, and LSNs prior to
latest_gc_cutoff.

To get there there was also two cleanups:
* Fix error handling in Execute message handler. This fixes behaviour
  when basebackup retured an error. Previously pageserver thread just
  died.
* Remove "ancestor" file which previously contained ancestor id and
  branch lsn. Currently the same data can be obtained from metadata file.
  And just the way we handled ancestor file in the code introduced the
  case when branching fails timeline directory is created but there is no data in it
  except ancestor file. And this confused gc because it scans
  directories. So it is better to just remove ancestor file and clean up
  this timeline directory creation so it happens after all validity
  checks have passed
2021-11-25 15:27:16 +03:00
Dmitry Ivanov
0ccfc62e88 [proxy] Pass PostgreSQL version to client
Fixes #779
2021-11-17 16:28:44 +03:00
Dmitry Ivanov
43ded1c54b [proxy] Minor cleanup 2021-11-17 16:28:44 +03:00
Dmitry Rodionov
44111e3ba3 Prohibit branch creation at lsn that was already garbage collected.
This introduces new timeline field latest_gc_cutoff. It is updated
before each gc iteration. New check is added to branch_timelines to
prevent branch creation with start point less than latest_gc_cutoff.
Also this adds a check to get_page_at_lsn which asserts that lsn at
which the page is requested was not garbage collected. This check
currently is triggered for readonly nodes which are pinned to specific
lsn and because they are not tracked in pageserver garbage collection
can remove data that still might be referenced. This is a bug and will
be fixed separately.
2021-11-15 20:03:16 +03:00
Patrick Insinger
1ce4976e36 pageserver - track size of VecMaps 2021-11-10 11:09:34 -08:00
Dmitry Rodionov
987833e0b9 Propagate git SHA to zenith binaries
Git commit sha is displayed when --version flag is used and is written
to logs during service startup. Uses git_version crate when git is
available, and GIT_VERSION environment variable otherwise which is the case for docker
builds.
2021-11-04 14:22:29 +03:00
Heikki Linnakangas
b38e841f2d Use poll() in communication with WAL redo process.
The tokio futures added some overhead, so switch to plain non-blocking
I/O with poll(). In a simple pgbench test on my laptop (select-only
queries, scale-factor 1 `pgbench -P1 -T50 -S`), this gives about 10%
improvement, from about 4300 TPS to 4800 TPS.
2021-11-04 10:39:04 +02:00
Patrick Insinger
b532470792 Set SO_REUSEADDR for all TCP listeners 2021-10-29 12:45:26 -07:00
Heikki Linnakangas
af429fb401 Improve 'zenith' CLI utility for safekeepers and a config file.
The 'zenith' CLI utility can now be used to launch safekeepers. By
default, one safekeeper is configured. There are new 'safekeeper
start/stop' subcommands to manage the safekeepers. Each safekeeper is
given a name that can be used to identify the safekeeper to start/stop
with the 'zenith start/stop' commands. The safekeeper data is stored
in '.zenith/safekeepers/<name>'.

The 'zenith start' command now starts the pageserver and also all
safekeepers. 'zenith stop' stops pageserver, all safekeepers, and all
postgres nodes.

Introduce new 'zenith pageserver start/stop' subcommands for
starting/stopping just the page server.

The biggest change here is to the 'zenith init' command. This adds a
new 'zenith init --config=<path to toml file>' option. It takes a toml
config file that describes the environment. In the config file, you
can specify options for the pageserver, like the pg and http ports,
and authentication. For each safekeeper, you can define a name and the
pg and http ports. If you don't use the --config option, you get a
default configuration with a pageserver and one safekeeper. Note that
that's different from the previous default of no safekeepers.  Any
fields that are omitted in the configuration file are filled with
defaults. You can also specify the initial tenant ID in the config
file. A couple of sample config files are added in the control_plane/
directory.

The --pageserver-pg-port, --pageserver-http-port, and
--pageserver-auth options to 'zenith init' are removed. Use a config
file instead.

Finally, change the python test fixtures to use the new 'zenith'
commands and the config file to describe the environment.
2021-10-27 10:49:38 +03:00
Kirill Bulatov
d88377f9f0 Remove log from zenith_utils 2021-10-26 23:24:11 +03:00
Kirill Bulatov
ecd577c934 Simplify tracing declarations 2021-10-26 23:24:11 +03:00
Konstantin Knizhnik
c310932121 Implement backpressure for compute node to avoid WAL overflow
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2021-10-21 18:15:50 +03:00
Dmitry Ivanov
85116a8375 [proxy] Prevent TLS stream from hanging
This change causes writer halves of a TLS stream to always flush after a
portion of bytes has been written by `std::io::copy`. Furthermore, some
cosmetic and minor functional changes are made to facilitate debug.
2021-10-20 14:15:49 +03:00
Arseny Sher
de744a44dd Add /timeline http request to safekeeper returning its status.
Which is mainly generational state (terms) and useful LSNs.

Also add /status basic healthcheck request which is now used in tests to
determine the safekeeper is up; this fixes #726.

ref #115
2021-10-14 19:02:38 +03:00
Arseny Sher
96f1175a80 Cleanup hardcoded oids. 2021-10-13 10:52:47 +03:00
anastasia
d7c9dd06f4 Implement graceful shutdown at 'pageserver stop':
- perform checkpoint for each tenant repository.
- wait for the completion of all threads.

Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.
2021-10-11 13:35:01 +03:00
Heikki Linnakangas
7216f22609 Use tracing crate to have more context in log messages.
Whenever we start processing a request, we now enter a tracing "span"
that includes context information like the tenant and timeline ID, and
the operation we're performing. That context information gets attached
to every log message we create within the span. That way, we don't need
to include basic context information like that in every log message, and
it also becomes easier to filter the logs programmatically.

This removes the eplicit timeline and tenant IDs from most log messages,
as you get that information from the enclosing span now.

Also improve log messages in general, dialing down the level of some
messages that are not very useful, and adding information to others.

We now obey the RUST_LOG env variable, if it's set.

The 'tracing' crate allows for different log formatters, like JSON or
bunyan output. The one we use now is human-readable multi-line format,
which is nice when reading the log directly, but hard for
post-processing.  For production, we'll probably want JSON output and
some tools for working with it, but that's left as a TODO. The log
format is easy to change.
2021-10-11 08:59:06 +03:00
Patrick Insinger
0baf4bc796 fix cargo doc complaints 2021-10-09 08:45:46 -07:00
Patrick Insinger
c356030660 pageserver - use VecMap for delta metadata & sizes 2021-10-08 15:05:22 -07:00
Patrick Insinger
c4bb6d78d4 pageserver - use VecMap for in memory segsizes 2021-10-08 14:37:32 -07:00
Patrick Insinger
3b82e806f2 pageserver - use VecMap for in-memory PageVersions 2021-10-08 14:11:07 -07:00
Egor Suvorov
7e190d72a5 Make pageserver_ prefix for common metric names configurable (#681) 2021-10-05 19:06:44 +03:00
Patrick Insinger
664b99b5ac pageserver - use constant TIMELINE_ID for tests 2021-10-04 08:36:35 -07:00
Max Sharnoff
84f7dcd052 Fix clippy errors on nightly (2021-09-29) (#691)
Most of the changes are for the new if-then-panic lint added in
https://github.com/rust-lang/rust-clippy/pull/7669.
2021-10-01 15:45:42 -07:00
Patrick Insinger
0a8aaa2c24 zenith_utils - add crashsafe_dir
Utility for creating directories and directory trees in a crash safe
manor.

Minimizes calls to fsync for trees.
2021-10-01 11:41:39 -07:00
Stas Kelvich
3bac4d485d Fix EncryptionResponse message in pq_proto.rs
Positive EncryptionResponse should set 'S' byte, not 'Y'. With that
fix it is possible to connect to proxy with SSL enabled and read
deciphered notice text. But after the first query everything stucks.
2021-09-27 11:56:43 +03:00
sharnoff
a72707b8cb Redo #655 with fix: Allow LeSer/BeSer impls missing either Serialize or Deserialize
Commit message copied below:

* Allow LeSer/BeSer impls missing Serialize/Deserialize

Currently, using `LeSer` or `BeSer` requires that the type implements
both `Serialize` and `DeserializeOwned`, even if we're only using the
trait for one of those functionalities.

Moving the bounds to the methods gives the convenience of the traits
without requiring unnecessary derives.

* Remove unused #[derive(Serialize/Deserialize)]

This should hopefully reduce compile times - if only by a little bit.

Some of these were already unused (we weren't using LeSer/BeSer for the
types), but most are have *become* unused with the change to
LeSer/BeSer.
2021-09-24 10:58:01 -07:00
Max Sharnoff
0f770967b4 Revert "Allow LeSer/BeSer impls missing either Serialize or Deserialize (#655)
This reverts commit bd9f4794d9.
2021-09-24 10:18:36 -07:00