Commit Graph

237 Commits

Author SHA1 Message Date
Bojan Serafimov
20923e70f5 Merge branch 'main' into bojan-psbench-over-kvstore 2022-04-12 13:04:59 -04:00
Bojan Serafimov
98149f1a08 Parse output 2022-04-12 10:53:14 -04:00
Bojan Serafimov
fbc4206f2d Simplify 2022-04-11 21:50:50 -04:00
Konstantin Knizhnik
07a9553700 Add test for restore from WAL (#1366)
* Add test for restore from WAL

* Fix python formatting

* Choose unused port in wal restore test

* Move recovery tests to zenith_utils/scripts

* Set LD_LIBRARY_PATH in wal recovery scripts

* Fix python test formatting

* Fix mypy warning

* Bump postgres version

* Bump postgres version
2022-04-11 22:30:08 +03:00
Arthur Petukhovsky
6bc78a0e77 Log more info in test_many_timelines asserts (#1473)
It will help to debug #1470 as soon as it happens again
2022-04-07 01:44:26 +03:00
bojanserafimov
6fe443e239 Improve random_writes test (#1469)
If you want to test with a 3GB database by tweaking some constants you'll hit a query timeout. I fix that by batching the inserts.
2022-04-06 18:32:10 -04:00
Bojan Serafimov
d614291c44 Test latest pages 2022-04-05 20:33:57 -04:00
Arthur Petukhovsky
a40b7cd516 Fix timeouts in test_restarts_under_load (#1436)
* Enable backpressure in test_restarts_under_load

* Remove hacks because #644 is fixed now

* Adjust config in test_restarts_under_load
2022-03-31 17:00:09 +03:00
Anton Shyrabokau
5c5629910f Add a test case for reading historic page versions (#1314)
* Add a test case for reading historic page versions

 Test read_page_at_lsn returns correct results when compared to page inspect.
 Validate possiblity of reading pages from dropped relation.
 Ensure funcitons read latest version when null lsn supplied.
 Check that functions do not poison buffer cache with stale page versions.
2022-03-29 22:13:06 -07:00
Arseny Sher
ec3bc74165 Add safekeeper information exchange through etcd.
Safekeers now publish to and pull from etcd per-timeline data. Immediate goal is
WAL truncation, for which every safekeeper must know remote_consistent_lsn; the
next would be callmemaybe replacement.

Adds corresponding '--broker' argument to safekeeper and ability to run etcd in
tests.

Adds test checking remote_consistent_lsn is indeed communicated.
2022-03-29 18:16:49 +04:00
Arthur Petukhovsky
fd78110c2b Add default statement_timeout for tests (#1423) 2022-03-29 09:57:00 +03:00
Arseny Sher
75002adc14 Make shared_buffers large in test_pageserver_catchup.
We intentionally write while pageserver is down, so we shouldn't query it.

Noticed by @petuhovskiy at
https://github.com/zenithdb/postgres/pull/141#issuecomment-1080261700
2022-03-28 20:34:06 +04:00
Heikki Linnakangas
07342f7519 Major storage format rewrite.
This is a backwards-incompatible change. The new pageserver cannot
read repositories created with an old pageserver binary, or vice
versa.

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, like which
relations exist and what are their sizes, from Repository to a new
module, pgdatadir_mapping.rs. The interface to Repository is now a
simple key-value PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible from handling branching, and every GET operation comes
with an LSN.

Mapping from Postgres data directory to keys/values
---------------------------------------------------

All the data is now stored in the key-value store. The
'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects
like relation pages and SLRUs, to key-value pairs.

The key to the Repository key-value store is a Key struct, which
consists of a few integer fields. It's wide enough to store a full
RelFileNode, fork and block number, and to distinguish those from
metadata keys.

'pgdatadir_mapping.rs' is also responsible for maintaining a
"partitioning" of the keyspace. Partitioning means splitting the
keyspace so that each partition holds a roughly equal number of keys.
The partitioning is used when new image layer files are created, so
that each image layer file is roughly the same size.

The partitioning is also responsible for reclaiming space used by
deleted keys. The Repository implementation doesn't have any explicit
support for deleting keys. Instead, the deleted keys are simply
omitted from the partitioning, and when a new image layer is created,
the omitted keys are not copied over to the new image layer. We might
want to implement tombstone keys in the future, to reclaim space
faster, but this will work for now.

Changes to low-level layer file code
------------------------------------

The concept of a "segment" is gone. Each layer file can now store an
arbitrary range of Keys.

Checkpointing, compaction
-------------------------

The background tasks are somewhat different now. Whenever
checkpoint_distance is reached, the WAL receiver thread "freezes" the
current in-memory layer, and creates a new one. This is a quick
operation and doesn't perform any I/O yet. It then launches a
background "layer flushing thread" to write the frozen layer to disk,
as a new L0 delta layer. This mechanism takes care of durability. It
replaces the checkpointing thread.

Compaction is a new background operation that takes a bunch of L0
delta layers, and reshuffles the data in them. It runs in a separate
compaction thread.

Deployment
----------

This also contains changes to the ansible scripts that enable having
multiple different pageservers running at the same time in the staging
environment. We will use that to keep an old version of the pageserver
running, for clusters created with the old version, at the same time
with a new pageserver with the new binary.

Author: Heikki Linnakangas
Author: Konstantin Knizhnik <knizhnik@zenith.tech>
Author: Andrey Taranik <andrey@zenith.tech>
Reviewed-by: Matthias Van De Meent <matthias@zenith.tech>
Reviewed-by: Bojan Serafimov <bojan@zenith.tech>
Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech>
Reviewed-by: Anton Shyrabokau <antons@zenith.tech>
Reviewed-by: Dhammika Pathirana <dham@zenith.tech>
Reviewed-by: Kirill Bulatov <kirill@zenith.tech>
Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech>
Reviewed-by: Alexey Kondratov <alexey@zenith.tech>
2022-03-28 05:41:15 -05:00
Bojan Serafimov
2f7d9d2dd5 Count modified bits per wal (WIP) 2022-03-27 18:41:14 -04:00
Heikki Linnakangas
5e04dad360 Add more variants of the sequential scan performance tests.
More rows, and test with serial and parallel plans. But fewer iterations,
so that the tests run in < 1 minutes, and we don't need to mark them as
"slow".
2022-03-25 23:42:13 +02:00
Dmitry Rodionov
b9a1a75b0d clean up unused imports in python tests 2022-03-24 12:47:22 +04:00
Dmitry Rodionov
d3a9cb44a6 tweak timeouts for tenant relocation test 2022-03-24 12:47:22 +04:00
Kirill Bulatov
bd6bef468c Provide single list timelines HTTP API handle 2022-03-21 13:42:21 +02:00
Kirill Bulatov
063f9ba81d Use serde_with to (de)serialize ZId and Lsn to hex 2022-03-21 12:46:07 +02:00
Bojan Serafimov
21f9774ea4 Merge branch 'heikki-kvstore' into bojan-psbench-over-kvstore 2022-03-18 16:27:18 -04:00
Bojan Serafimov
098d7046f8 Improve test 2022-03-18 14:37:01 -04:00
Dmitry Rodionov
7738254f83 refactor timeline memory state management 2022-03-18 18:14:57 +03:00
Heikki Linnakangas
13ec0ce7b2 fix formatting 2022-03-17 19:40:08 +02:00
Heikki Linnakangas
80fc133833 Add sequential scan tests 2022-03-17 17:12:30 +02:00
Bojan Serafimov
811d46f070 Fix 2022-03-16 22:39:55 -04:00
Bojan Serafimov
728f299641 Add hot page workload 2022-03-16 19:18:28 -04:00
Bojan Serafimov
fb49418e7f Trim dead code 2022-03-16 19:05:40 -04:00
Bojan Serafimov
96c2b3a80a WIP working pageserver get_page client 2022-03-16 14:31:42 -04:00
Heikki Linnakangas
c559c72ede Merge remote-tracking branch 'origin/main' into HEAD 2022-03-14 10:26:05 +02:00
Kirill Bulatov
093ad8ab59 Send 409 HTTP responses on timeline and tenant creation for existing entity 2022-03-10 19:38:58 +02:00
Kirill Bulatov
c51d545fd9 Serialize Lsn as strings in http api 2022-03-10 19:38:58 +02:00
Kirill Bulatov
dd74c66ef0 Do not create timeline along with tenant 2022-03-10 19:38:58 +02:00
Kirill Bulatov
a5e10c4f64 Tidy up pageserver's endpoints 2022-03-10 19:38:58 +02:00
Kirill Bulatov
7b5482bac0 Properly store the branch name mappings 2022-03-10 19:38:58 +02:00
Kirill Bulatov
c7569dce47 Allow passing initial timeline id into zenith CLI commands 2022-03-10 19:38:58 +02:00
Kirill Bulatov
4d0f7fd1e4 Update Zenith CLI config between runs 2022-03-10 19:38:58 +02:00
Kirill Bulatov
f49990ed43 Allow creating timelines by branching off ancestors 2022-03-10 19:38:58 +02:00
anastasia
87f306c516 Tune backpressure in python tests to make them more stable 2022-03-10 17:36:09 +04:00
Heikki Linnakangas
98ec8418c4 Fix bug with the partitioning and GC 2022-03-10 13:20:08 +02:00
bojanserafimov
15b19a0a57 [proxy] Test connstr options (#1344)
* Add proxy test
* Fix typo
2022-03-09 22:47:06 +03:00
Heikki Linnakangas
6127b6638b Major storage format rewrite
Major changes and new concepts:

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, like which relations
exist and what are their sizes, from Repository to a new module,
pgdatadir_mapping.rs. The interface to Repository is now a simple key-value
PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible from handling branching, and every GET operation comes with
an LSN.

Key
---

The key to the Repository key-value store is a Key struct, which consists
of a few integer fields. It's wide enough to store a full RelFileNode,
fork and block number, and to distinguish those from metadata keys.

See pgdatadir_mapping.rs for how relation blocks and metadata keys are
mapped to the Key struct.

Store arbitrary key-ranges in the layer files
---------------------------------------------

The concept of a "segment" is gone. Each layer file can store an arbitrary
range of Keys.

TODO:

- Deleting keys, to reclaim space. This isn't visible to Postgres, dropping
  or truncating a relation works as you would expect if you look at it from
  the compute node. If you drop a relation, for example, the relation is
  removed from the metadata entry, so that it appears to be gone. However,
  the layered repository implementation never reclaims the storage.

- Tracking "logical database size", for disk space quotas. That ought to
  be reimplemented now in pgdatadir_mapping.rs, or perhaps in walingest.rs.

- LSM compaction. The logic for checkpointing and creating image layers is
  very dumb. AFAIK the *read* code could deal with a full-fledged LSM tree
  now consisting of the delta and image layers. But there's no code to
  take a bunch of delta layers and compact them, and the heuristics for
  when to create image layers is pretty dumb.

- The code to track the layers is inefficient. All layers are just stored in
  a vector, and whenever we need to find a layer, we do a linear search in
  it.
2022-03-09 11:36:39 +02:00
Dmitry Rodionov
1d90b1b205 add node id to pageserver (#1310)
* Add --id argument to safekeeper setting its unique u64 id.

In preparation for storage node messaging. IDs are supposed to be monotonically
assigned by the console. In tests it is issued by ZenithEnv; at the zenith cli
level and fixtures, string name is completely replaced by integer id. Example
TOML configs are adjusted accordingly.

Sequential ids are chosen over Zid mainly because they are compact and easy to
type/remember.

* add node id to pageserver

This adds node id parameter to pageserver configuration. Also I use a
simple builder to construct pageserver config struct to avoid setting
node id to some temporary invalid value. Some of the changes in test
fixtures are needed to split init and start operations for envrionment.

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2022-03-04 01:10:42 +03:00
bojanserafimov
137d616e76 [proxy] Add pytest fixture (#1311) 2022-02-24 11:20:07 -05:00
Kirill Bulatov
917c640818 Fix mypy for the new Python 2022-02-24 14:24:36 +03:00
anastasia
58ee5d005f Add --pageserver-config-override to ZenithEnvBuilder to tune checkpointer and GC in tests.
Usage example:
zenith_env_builder.pageserver_config_override = "checkpoint_period = '100 s'; checkpoint_distance = 1073741824"
2022-02-23 19:59:35 +03:00
anastasia
74a0942a77 Fix zenith feedback processing at compute node.
Add test for backpressure
2022-02-22 13:56:21 +03:00
anastasia
abb422d5de Fix SafekeeperMetrics parsing in python tests 2022-02-21 13:45:22 +03:00
bojanserafimov
fdc15de8b2 Add perf test: test_random_writes (#1292) 2022-02-18 15:46:29 -05:00
Bojan Serafimov
4c64b10aec Revert removal of ignore hint 2022-02-17 13:41:49 +02:00
Bojan Serafimov
ad262a46ad Remove redundant pytest_plugins assignment 2022-02-17 13:41:49 +02:00