Commit Graph

1138 Commits

Author SHA1 Message Date
Alexey Kondratov
f64074c609 Move compute_tools from console repo (zenithdb/console#383)
Currently it's included with minimal changes and lives aside of the main
workspace. Later we may re-use and combine common parts with zenith
control_plane.

This change is mostly needed to unify cloud deployment pipeline:
1.1. build compute-tools image
1.2. build compute-node image based on the freshly built compute-tools
2. build zenith image

So we can roll new compute image and new storage required by it to
operate properly. Also it becomes easier to test console against some
specific version of compute-node/-tools.
2021-12-28 20:17:29 +03:00
anastasia
eba897ffe7 send CallmeEvent::Unsubscribe request only when pageserver is caught up with safekeeper and it's time to stop streaming 2021-12-28 17:50:48 +03:00
anastasia
5ef2b1baf7 Add new test illustrating issue with sync-safekeepers.
If safekeepers sync fast enough, callmemaybe thread may never make a call before receiving Unsubscribe request. This leads to the situation, when pageserver lacks data that exists on safekeepers.
2021-12-28 17:50:48 +03:00
Kirill Bulatov
f0afd08667 Fix zenith init defaults 2021-12-28 00:21:48 +02:00
Kirill Bulatov
b494ac1ea0 Remove redundant pageserver cli params 2021-12-27 18:38:54 +02:00
Arseny Sher
a163650a99 Refactor Postgres command parsing in safekeeper.
Do it separately with SafekeeperPostgresCommand enum as a result. Since query is
always C string, switch postgres_backend process_query argument from Bytes to
&str.

Make passing ztli/ztenant id in safekeeper connection string optional; this is
needed for upcoming intra-safekeeper heartbeat cmd which is not bound to any
timeline.
2021-12-24 15:48:13 +03:00
anastasia
980f5f8440 Propagate remote_consistent_lsn to safekeepers.
Change meaning of lsns in HOT_STANDBY_FEEDBACK:
flush_lsn = disk_consistent_lsn,
apply_lsn = remote_consistent_lsn
Update compute node backpressure configuration respectively.

Update compute node configuration:
set 'synchronous_commit=remote_write' in setup without safekeepers.
This way compute node doesn't have to wait for data checkpoint on pageserver.
This doesn't guarantee data durability, but we only use this setup for tests, so it's fine.
2021-12-24 15:32:54 +03:00
Kirill Bulatov
42647f606e Use correct pageserver CLI parameters in docker entrypoint 2021-12-24 03:41:45 +02:00
bojanserafimov
b807570f46 Use parking_lot::Mutex instead of std::Mutex in walreceiver (#1045) 2021-12-23 14:25:44 -05:00
Kirill Bulatov
114a757d1c Use generic config parameters in pageserver cli
Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
2021-12-23 18:58:28 +02:00
Andrey Taranik
9854ded56b Feature/proxy deploy (#1046)
* zenith proxy deployment

* proxy deploy ci fix

* ci cleanup or zenith proxy deploy
2021-12-23 15:53:28 +03:00
Heikki Linnakangas
fdd987c3ad Refactor the way Image- and DeltaLayers are created
Introduce builder objects, DeltaLayerWriter and ImageLayerWriter.
This gives more flexibility, as the DeltaLayer::create and
ImageLayer::create functions don't need to know about the details of
the format of where the page versions are coming from. This allows us
to change the format used in InMemoryLayer more easily, without having
to modify Delta- and ImageLayer code.

Also refactor the code in InMemoryLayer::write_to_disk for clarity.
2021-12-23 00:33:16 +02:00
Heikki Linnakangas
da62407fce Change the meaning of 'blknum' argument in Layer trait
Previously, the 'blknum' argument of various Layer functions was the
block number within the overall relation. That was pretty confusing,
because an individual layer only holds data from a one segment of the
relation. Furthermore, the 'put_truncation' function already dealt
with per-segment size, not overall relation size, adding to the
confusion.

Change the meaning of the 'blknum' argument to mean the block number
within the segment, not the overall relation.
2021-12-22 16:55:37 +02:00
Heikki Linnakangas
1cc181ca32 Fix WAL redo of commit records with subtransactions.
If a commit record contains XIDs that are stored on different CLOG pages,
we duplicate the commit record for each affected CLOG page. In the redo
routine, we must only apply the parts of the record that apply to the
CLOG page being restored. We got that right in the loop that handles the
sub-XIDs, but incorrectly always set the bit that corresponds to the main
XID.
2021-12-21 23:08:01 +02:00
Heikki Linnakangas
927587cec8 Fix comments in tests 2021-12-21 22:38:33 +02:00
Heikki Linnakangas
bcf80eaa95 Fix multixacts members WAL redo.
The logic to compute the page number was broken, and as a result, only
the first page of multixact members was updated correctly. All the
rest were left as zeros. Improve test_multixact.py to generate more
multixacts, to cover this case.

Also fix the check that the restored PG data directory matches the
original one. Previously, the test compared the 'pg_new' cluster,
which is a bit silly because the test restored the 'pg_new' cluster
only a few lines earlier, so if the multixact WAL redo is somehow
broken, the comparison will just compare two broken data directories
and report success. Change it to compare the original datadir, the one
where the multixacts were originally created, with a restored image of
the same.
2021-12-21 17:50:06 +02:00
Arthur Petukhovsky
f56db3da68 Bump vendor/postgres (#996) 2021-12-21 16:53:08 +03:00
Konstantin Knizhnik
68aa9d2715 Set utf8 encoding in initdb (#993)
refer #992
2021-12-21 15:43:34 +03:00
Konstantin Knizhnik
76777f5812 Add utility for dumping/editing metadata file (#1031) 2021-12-21 15:43:15 +03:00
Arseny Sher
56312522f9 Make safekeeper namings more consistent with reality.
s/send_wal.rs/handler.rs
s/SendWalHandler/SafekeeperPostgresHandler
s/replication.rs/send_wal.rs
2021-12-21 13:24:23 +03:00
Dmitry Rodionov
2d9d0658e8 adjust benchmarking script for go console 2021-12-20 13:54:10 +03:00
anastasia
3b61f364f7 Stop WAL streaming threads, when compute node is shut down.
WAL stream uses the 2 connections:
1. Compute node (walproposer) -> Safekeeper (ReceiveWalConn module)

When compute node is shut down, safekeeper needs to stop the respective receiving thread.
Prior to this PR it didn't work because PostgresBackend haven't handled disconnection properly.

2. Safekeeper (ReplicationConn module) -> pageserver (walreceiver thread)

When incoming WAL stream is gone, safekeeper can stop streaming WAL and cancel connection as soon as replica is caught up.
Note that the WAL can be streamed to multiple replicas simultaneously, only disconnect ones that are caught up to the last_recieved_lsn.
2021-12-20 12:34:28 +03:00
anastasia
90e5b6f983 Don't try to reconnect failed walreceiver. If necessary, wal service will send new callmemaybe request 2021-12-20 12:34:28 +03:00
Heikki Linnakangas
75cbaafb96 Remove old ephemeral files on pageserver restart.
The ephemeral files are not usable after restart, so just delete them.
Before this, you got "unrecognized filename in timeline dir" warnings
about them, as Konstantin noted at:
https://github.com/zenithdb/zenith/issues/906#issuecomment-995530870.

While we're at it, refactor away the list_files() function, moving the
logic fully into the caller. Seems more straightforward.
2021-12-17 00:00:02 +02:00
Andrey Taranik
5d5c2738a6 staging deployment flow fix (#1029) 2021-12-16 22:54:01 +03:00
Andrey Taranik
cbe155ff48 storage CI flow for staging environment (#1003)
* storage CI flow for staging environment

* prevent deploy version older than already deployed
2021-12-16 17:05:20 +03:00
Kirill Bulatov
29143b018e Disable rustc incremental compilation to avoid ICEs 2021-12-15 21:57:34 +03:00
Heikki Linnakangas
d8a367dd32 Remove dead code, fix typos. 2021-12-15 19:58:03 +02:00
Kirill Bulatov
ca60561a01 Propagate disk consistent lsn in timeline sync statuses 2021-12-15 15:13:21 +02:00
Andrey Taranik
86a409a174 cleanup circleci config after test 2021-12-15 16:08:31 +03:00
Andrey Taranik
66242f0d0e tag docker image by commit sha and add docker build for compute 2021-12-15 16:08:31 +03:00
Heikki Linnakangas
7f78e80c51 Refactor WAL ingestion code.
Rename save_decoded_record() to ingest_record(), and move the
responsibility for decoding the record into ingest_record().

Also move the responsibility of updating the CheckPoint relish to
ingest_record(). Put it in a new WalIngest struct, to help with tracking
that.
2021-12-14 20:24:03 +02:00
Heikki Linnakangas
f8f88154d5 Split restore_local_repo.rs into two files, with more descriptive names. 2021-12-14 20:24:03 +02:00
Kirill Bulatov
5cff7d1de9 Use proper download order 2021-12-14 15:32:22 +02:00
Arseny Sher
8f0cafd508 Grab safekeeper.lock on the whole directory instead of per tli.
closes #976
2021-12-13 22:11:04 +03:00
Heikki Linnakangas
e0d41ac6a3 Move constants related to metadata file to metadata.rs.
They're not used anywhere else, so seems like a better place.
2021-12-13 16:57:16 +02:00
Heikki Linnakangas
72ef59c378 Fix small typos in comments, add a comment.
The introducing paragraph README could use some more love, but let's at
least fix the typos.
2021-12-13 13:44:08 +02:00
Kirill Bulatov
673c297949 Download timelines on demand 2021-12-10 17:23:35 +02:00
Kirill Bulatov
e61732ca7c Compress checkpoint files before streaming into S3 2021-12-10 17:23:35 +02:00
Heikki Linnakangas
cb4a8396fb Use rustls rather than native-tls in all dependencies.
We depends on rustls in postgres_backend anyway, so might as well use it
for all TLS stuff. Seems better to depend on only one library both from a
security point of view, and because fewer dependencies means less code to
compile. With this commit, we no longer depend on OpenSSL.
2021-12-10 15:14:27 +02:00
Heikki Linnakangas
c77e30116e Split waldecoder.rs into two source files.
Move the code for decoding a WAL stream into WAL records into
'postgres_ffi', and keep the code to parse the WAL records deeper in
'pageserver' crate, renamed to walrecord.rs.

This tidies up the dependencies a bit. 'walkeeper' reuses the same
waldecoder routines, and it used to depend on 'pageserver' because of
that. Now it only depends on 'postgres_ffi'.

(The comment in walkeeper/Cargo.toml that claimed that the dependency was
needed for ZTimelineId was obsolete. ZTimelineId is defined in
'zenith_utils', the dependency was actually needed for the waldecoder.)
2021-12-10 15:14:13 +02:00
Heikki Linnakangas
9d369f158c Update rust-s3 to version 0.28.0
0.28.0 includes two changes I submitted to upstream:

- Add support for older ListObjects API, needed to use rust-s3 with Google
  Cloud Storage: https://github.com/durch/rust-s3/pull/229

- If file is smaller than one chunk, don't initiate multi-part upload.
  https://github.com/durch/rust-s3/pull/228

These are not critical for Zenith right now, but let's stay up-to-date.
2021-12-10 14:52:08 +02:00
Heikki Linnakangas
6ecd442fb9 Remove a bunch of unnecessary dependencies. 2021-12-10 14:24:33 +02:00
Heikki Linnakangas
f3f059c1f8 Fix a few cases where request beyond end of rel would error out.
Currently, we return an all-zeros page if you request a block beyond end of
a relation. That has been implemented in LayeredTimeline::materialize_page,
so that if Layer::get_page_reconstruct_data returns Missing, it returns
and all-zeros page.

However InMemoryLayer and DeltaLayer would return Continue, not Missing,
in that case, and materialize_page would try to find the predecessor
layer. If there was a preceding image layer, then everything would still
work, but if there wasn't, it would return a "could not find predecessor
of layer" error. Fix that in InMemoryLayer and DeltaLayer, making them
check the size of the relation and return Missing in that case.

This is hard to reproduce at the moment, but it happened quickly with
pgbench when I modified InMemoryLayer::write_to_disk so that it didn't
always create a new ImageLayer.
2021-12-09 17:46:48 +02:00
Dmitry Ivanov
8388e14bbd [scripts/git-upload] Fix logic of --forbid-overwrite 2021-12-09 14:06:17 +03:00
anastasia
5293e183c5 callmemaybe. review code cleanup 2021-12-09 13:31:49 +03:00
anastasia
93ff5f7ff0 Add default value for safekeeper --recall option. DEFAULT_RECALL_PERIOD is 1 second. 2021-12-09 13:31:49 +03:00
anastasia
41dce68bdd callmemaybe refactoring
- Don't spawn a separate thread for each connection.
Instead use one thread per safekeeper, that iterates over all connections and sends callback requests for them.

-Use tokio postgres to connect to the pageserver, to avoid spawning a new thread for each connection.

callmemaybe review fixes:
- Spawn all request_callback tasks separately.
- Remember 'last_call_time' and only send request_callback if 'recall_period' has passed.
- If task hasn't finished till next recall, abort it and try again.
- Add pause/resume CallmeEvents to avoid spamming pageserver when connection already established.
2021-12-09 13:31:49 +03:00
Dmitry Rodionov
7dece8e4a0 skip temporary table files when comparing directories in regress tests 2021-12-09 12:53:26 +03:00
Arseny Sher
37c85d5fd9 Switch safekeeper from log to tracing logging.
Add context to wal acceptor and wal sender threads showing timeline id and
unique id differentiating them.
2021-12-09 06:57:46 +03:00