Commit Graph

1372 Commits

Author SHA1 Message Date
Anton Shyrabokau
be6a6958e2 CI: rebuild postgres when Makefile changes (#1429) 2022-03-28 18:19:20 -07:00
Kirill Bulatov
0e44887929 Show more S3 logs and less verbove WAL logs 2022-03-29 00:36:06 +03:00
Dhammika Pathirana
1aa57fc262 Fix tone down compact log chatter
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-03-28 13:24:13 -07:00
Alexey Kondratov
9a4f0930c0 Turn off S3 for pageserver on staging 2022-03-28 14:14:17 -05:00
Alexey Kondratov
d88f8b4a7e Fix storage deploy condition in ansible playbook 2022-03-28 13:30:40 -05:00
Arthur Petukhovsky
8a901de52a Refactor control file update at safekeeper.
Record global_commit_lsn, have common routine for control file update, add
SafekeeperMemstate.
2022-03-28 21:52:12 +04:00
Alexey Kondratov
a883202495 Enable S3 for pageserver on staging
Follow-up for #1417. Previously we had a problem uploading to S3
due to huge ammount of existing not yet uploaded data. Now we have a
fresh pageserver with LSM storage on staging, so we can try enabling it
once again.
2022-03-28 12:04:40 -05:00
Arseny Sher
780b46ad27 Bump vendor/postgres to fix commit_lsn going backwards. 2022-03-28 20:37:33 +04:00
Arseny Sher
75002adc14 Make shared_buffers large in test_pageserver_catchup.
We intentionally write while pageserver is down, so we shouldn't query it.

Noticed by @petuhovskiy at
https://github.com/zenithdb/postgres/pull/141#issuecomment-1080261700
2022-03-28 20:34:06 +04:00
Heikki Linnakangas
07342f7519 Major storage format rewrite.
This is a backwards-incompatible change. The new pageserver cannot
read repositories created with an old pageserver binary, or vice
versa.

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, like which
relations exist and what are their sizes, from Repository to a new
module, pgdatadir_mapping.rs. The interface to Repository is now a
simple key-value PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible from handling branching, and every GET operation comes
with an LSN.

Mapping from Postgres data directory to keys/values
---------------------------------------------------

All the data is now stored in the key-value store. The
'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects
like relation pages and SLRUs, to key-value pairs.

The key to the Repository key-value store is a Key struct, which
consists of a few integer fields. It's wide enough to store a full
RelFileNode, fork and block number, and to distinguish those from
metadata keys.

'pgdatadir_mapping.rs' is also responsible for maintaining a
"partitioning" of the keyspace. Partitioning means splitting the
keyspace so that each partition holds a roughly equal number of keys.
The partitioning is used when new image layer files are created, so
that each image layer file is roughly the same size.

The partitioning is also responsible for reclaiming space used by
deleted keys. The Repository implementation doesn't have any explicit
support for deleting keys. Instead, the deleted keys are simply
omitted from the partitioning, and when a new image layer is created,
the omitted keys are not copied over to the new image layer. We might
want to implement tombstone keys in the future, to reclaim space
faster, but this will work for now.

Changes to low-level layer file code
------------------------------------

The concept of a "segment" is gone. Each layer file can now store an
arbitrary range of Keys.

Checkpointing, compaction
-------------------------

The background tasks are somewhat different now. Whenever
checkpoint_distance is reached, the WAL receiver thread "freezes" the
current in-memory layer, and creates a new one. This is a quick
operation and doesn't perform any I/O yet. It then launches a
background "layer flushing thread" to write the frozen layer to disk,
as a new L0 delta layer. This mechanism takes care of durability. It
replaces the checkpointing thread.

Compaction is a new background operation that takes a bunch of L0
delta layers, and reshuffles the data in them. It runs in a separate
compaction thread.

Deployment
----------

This also contains changes to the ansible scripts that enable having
multiple different pageservers running at the same time in the staging
environment. We will use that to keep an old version of the pageserver
running, for clusters created with the old version, at the same time
with a new pageserver with the new binary.

Author: Heikki Linnakangas
Author: Konstantin Knizhnik <knizhnik@zenith.tech>
Author: Andrey Taranik <andrey@zenith.tech>
Reviewed-by: Matthias Van De Meent <matthias@zenith.tech>
Reviewed-by: Bojan Serafimov <bojan@zenith.tech>
Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech>
Reviewed-by: Anton Shyrabokau <antons@zenith.tech>
Reviewed-by: Dhammika Pathirana <dham@zenith.tech>
Reviewed-by: Kirill Bulatov <kirill@zenith.tech>
Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech>
Reviewed-by: Alexey Kondratov <alexey@zenith.tech>
2022-03-28 05:41:15 -05:00
Kirill Bulatov
55de0b88f5 Hide remote timeline index access details 2022-03-28 12:36:01 +03:00
Kirill Bulatov
d56a0ee19a Avoid recompiling tests for release profile 2022-03-26 08:38:45 +02:00
Kirill Bulatov
18dfc769d8 Use cachepot to cache more rustc builds 2022-03-26 08:38:45 +02:00
Heikki Linnakangas
5e04dad360 Add more variants of the sequential scan performance tests.
More rows, and test with serial and parallel plans. But fewer iterations,
so that the tests run in < 1 minutes, and we don't need to mark them as
"slow".
2022-03-25 23:42:13 +02:00
Dmitry Rodionov
b8cba059a5 temporary disable s3 integration on staging until LSM storge rewrite lands 2022-03-26 00:19:25 +04:00
Heikki Linnakangas
e3fa00972e Use RwLocks in image and delta layers for more concurrency.
With a Mutex, only one thread could read from the layer at a time. I did
some ad hoc profiling with pgbench and saw that a fair amout of time was
spent blocked on these Mutexes.
2022-03-25 15:34:38 +02:00
Kirill Bulatov
b39d1b1717 Exit only on important thread failures 2022-03-25 11:58:54 +02:00
Kirill Bulatov
28bc8e3f5c Log pageserver threads better and shut down on errors in them 2022-03-25 11:58:54 +02:00
Kirill Bulatov
6244fd9e7e Better error messages on zenith cli subcommand invocations 2022-03-25 11:58:54 +02:00
Kirill Bulatov
f6b1d76c30 Replace assert! with ensure! for anyhow::Result functions 2022-03-25 11:58:54 +02:00
Kirill Bulatov
edc7bebcb5 Remove obvious panic sources 2022-03-25 11:58:54 +02:00
Kirill Bulatov
a201d33edc Properly print cachepot stats 2022-03-24 21:11:02 +02:00
Heikki Linnakangas
825d363170 Remove some unnecessary Ord etc. trait implementations.
It doesn't make much sense to compare TimelineMetadata structs with
< or >. But we depended on that in the remote storage upload code,
so replace BTreeSets with Vecs there.
2022-03-24 12:20:06 +02:00
Dmitry Rodionov
b9a1a75b0d clean up unused imports in python tests 2022-03-24 12:47:22 +04:00
Dmitry Rodionov
d3a9cb44a6 tweak timeouts for tenant relocation test 2022-03-24 12:47:22 +04:00
Heikki Linnakangas
c718870517 Tiny refactoring of page_cache::init function.
The init function only needs the 'page_cache_size' from the config, so
seems slightly nicer to pass just that.
2022-03-24 09:46:07 +02:00
Dmitry Rodionov
8437fc056e some follow ups after s3 integration was enabled on staging
* do not error out when upload file list is empty
* ignore ephemeral files during sync initialization
2022-03-23 23:35:36 +04:00
Dmitry Rodionov
8b8d78a3a0 use main branch of our bookfile crate 2022-03-23 22:05:43 +04:00
Dmitry Rodionov
8a86276a6e add more context to error 2022-03-23 18:38:15 +04:00
Dmitry Rodionov
0be7ed0cb5 decrease log message severity for timeline checkpoint internals 2022-03-23 18:20:43 +04:00
Dmitry Rodionov
e80ae4306a change log level from info to debug for timeline gc messages 2022-03-23 18:20:43 +04:00
Heikki Linnakangas
123fcd5d0d Revert accidental bump of vendor/postgres submodule
I accidentally bumped it in commit 3b069f5aef. It didn't seem to cause
any harm, but it was not intentional.
2022-03-23 15:45:29 +02:00
Kirill Bulatov
15434ba7e0 Show cachepot build stats 2022-03-23 14:12:59 +02:00
Andrey Taranik
a4d0d78e9e s3 settings for pageserver (#1388) 2022-03-23 13:39:55 +03:00
Dmitry Rodionov
e13bdd77fe add safekepeers gossip annd storage messaging rfcs
they were in prs during rfc repo import

in addition to just import I've added sequence diagrams to storage
messaging rfc
2022-03-22 15:01:26 +04:00
Kirill Bulatov
bd6bef468c Provide single list timelines HTTP API handle 2022-03-21 13:42:21 +02:00
Kirill Bulatov
77ed2a0fa0 Run GitHub testing workflow on every push 2022-03-21 12:46:33 +02:00
Kirill Bulatov
37ebbb598d Add a macOs build 2022-03-21 12:46:33 +02:00
Kirill Bulatov
063f9ba81d Use serde_with to (de)serialize ZId and Lsn to hex 2022-03-21 12:46:07 +02:00
Heikki Linnakangas
3b069f5aef Fix name of directory used in unit test.
There's another test called 'timeline_load'. If the two tests run in
parallel, they would conflict and fail.
2022-03-18 21:27:48 +02:00
Dmitry Rodionov
b19870cd88 guard against partial uploads to local storage 2022-03-18 18:14:57 +03:00
Dmitry Rodionov
7738254f83 refactor timeline memory state management 2022-03-18 18:14:57 +03:00
Dmitry Ivanov
a7544eead5 Remove the last non-borrowed string from BeMessage (#1376) 2022-03-17 16:46:58 +03:00
Andrey Taranik
ab124c161b Merge branch 'release' into main 2022-03-17 00:05:51 +03:00
Andrey Taranik
1fddb0556f deploy playbook fix - interaction with console (#1374) 2022-03-17 00:01:17 +03:00
Andrey Taranik
15a2a2bf04 release 2202-03-16 (#1373)
production deploy
2022-03-16 23:00:01 +03:00
Dmitry Ivanov
705f51db27 [proxy] Propagate some errors to user (#1329)
* [proxy] Propagate most errors to user

This change enables propagation of most errors to the user
(e.g. auth and connectivity errors). Some of them will be
stripped of sensitive information.

As a side effect, most occurrences of `anyhow::Error` were
replaced with concrete error types.

* [proxy] Box weighty errors
2022-03-16 21:20:04 +03:00
Heikki Linnakangas
9c1a9a1d9f Update Cargo.lock for new dependencies (#1354)
Commit b2ad8342d2 added dependency on 'criterion', which pulled along
some other crates.
2022-03-14 21:06:25 +03:00
Arseny Sher
d5a96d3d50 Fix finding end of WAL on safekeepers after f86cf93435.
That commit dropped wal_start_lsn, now we're looking since commit_lsn, which is
the real end of WAL if no records follow it.

ref #1351
2022-03-14 18:54:59 +03:00
Heikki Linnakangas
d93fc371f3 Import all existing RFCs documents from the separate 'rfcs' repository. 2022-03-11 18:49:36 +02:00