Commit Graph

1753 Commits

Author SHA1 Message Date
bojanserafimov
d5258cdc4d [proxy] Don't print passwords (#1298) 2022-04-06 20:05:24 -04:00
Arthur Petukhovsky
6bc78a0e77 Log more info in test_many_timelines asserts (#1473)
It will help to debug #1470 as soon as it happens again
2022-04-07 01:44:26 +03:00
bojanserafimov
6fe443e239 Improve random_writes test (#1469)
If you want to test with a 3GB database by tweaking some constants you'll hit a query timeout. I fix that by batching the inserts.
2022-04-06 18:32:10 -04:00
Alexey Kondratov
d0c246ac3c Update pageserver OpenAPI spec with missing attach/detach methods (#1463)
We have these methods for some time in the API, so mentioning them in the
spec could be useful for console (see zenithdb/console#867), as we generate
pageserver HTTP API golang client there.
2022-04-05 20:01:57 +03:00
Heikki Linnakangas
2f784144fe Avoid deadlock when locking two buffers.
It happened in unit tests. If a thread tries to read a buffer while
already holding a lock on one buffer, the code to find a victim buffer
to evict could try to evict the buffer that's already locked. To fix,
skip locked buffers.
2022-04-04 20:12:31 +03:00
Heikki Linnakangas
222b723354 Handle read errors when dumping a delta layer file.
If a file is corrupt, let's not stop on first read error, but continue
dumping.
2022-04-04 20:12:28 +03:00
Heikki Linnakangas
089ba6abfe Clean up some comments that still referred to 'segments' 2022-04-04 20:12:25 +03:00
Arthur Petukhovsky
a5a478c321 Bump vendor/postgres to store WAL on disk only (#1342)
Now WAL is no longer held in compute memory
2022-04-04 16:32:30 +03:00
Konstantin Knizhnik
fcf613b6e3 Fix unit tests build 2022-04-04 10:43:27 +03:00
Konstantin Knizhnik
572b3f48cf Add compaction_target_size parameter 2022-04-04 10:43:27 +03:00
Konstantin Knizhnik
bef9b837f1 Replace rwlock with mutex in repartition 2022-04-04 10:43:27 +03:00
Konstantin Knizhnik
232fe14297 Refactor partitioning 2022-04-04 10:43:27 +03:00
Konstantin Knizhnik
92031d376a Fix unit tests 2022-04-04 10:43:27 +03:00
Konstantin Knizhnik
1f0b406b63 Perform repartitioning in compaction thread
refer #1441
2022-04-04 10:43:27 +03:00
Kirill Bulatov
4c9447589a Place an info span into gc loop step 2022-04-03 19:30:36 +03:00
Kirill Bulatov
9e5423c867 Assert in a more informative way 2022-04-03 19:30:36 +03:00
Kirill Bulatov
43c16c5145 Don't log ZIds in the timeline load span 2022-04-03 19:30:36 +03:00
bojanserafimov
af712798e7 Fix pageserver readme formatting
I put the diagram in a fixed-width block, since it wasn't rendering correctly on github.
2022-04-02 00:36:54 +03:00
Dmitry Ivanov
f5da652388 [proxy] Enable keepalives for all tcp connections (#1448) 2022-03-31 20:44:57 +03:00
Anastasia Lubennikova
8745b022a9 Extend LayerMap dump() function to print also open_layers and frozen_layers.
Add verbose option to chose if we need to print all layer's keys or not.
2022-03-31 17:26:24 +03:00
Arthur Petukhovsky
a40b7cd516 Fix timeouts in test_restarts_under_load (#1436)
* Enable backpressure in test_restarts_under_load

* Remove hacks because #644 is fixed now

* Adjust config in test_restarts_under_load
2022-03-31 17:00:09 +03:00
Konstantin Knizhnik
1aa8fe43cf Fix race condition in image layer (#1440)
* Fix race condition in image layer

refer #1439

* Add explicit drop(inner) in layer load method

* Add explicit drop(inner) in layer load method
2022-03-31 15:47:59 +03:00
Dmitry Rodionov
649f324fe3 make logging in basebackup more consistent 2022-03-30 17:58:51 +03:00
Dmitry Rodionov
8609234204 decrease the log level to debug because it is too noisy 2022-03-30 10:13:38 +03:00
Anton Shyrabokau
5c5629910f Add a test case for reading historic page versions (#1314)
* Add a test case for reading historic page versions

 Test read_page_at_lsn returns correct results when compared to page inspect.
 Validate possiblity of reading pages from dropped relation.
 Ensure funcitons read latest version when null lsn supplied.
 Check that functions do not poison buffer cache with stale page versions.
2022-03-29 22:13:06 -07:00
Kirill Bulatov
277e41f4b7 Show s3 spans in logs and improve the log messages 2022-03-29 19:21:31 +03:00
Arthur Petukhovsky
ce0243bc12 Add metric for last_record_lsn (#1430) 2022-03-29 18:54:24 +03:00
Arseny Sher
ec3bc74165 Add safekeeper information exchange through etcd.
Safekeers now publish to and pull from etcd per-timeline data. Immediate goal is
WAL truncation, for which every safekeeper must know remote_consistent_lsn; the
next would be callmemaybe replacement.

Adds corresponding '--broker' argument to safekeeper and ability to run etcd in
tests.

Adds test checking remote_consistent_lsn is indeed communicated.
2022-03-29 18:16:49 +04:00
Dmitry Rodionov
9594362f74 change python cache version to 2 (fixes python cache in circle CI) 2022-03-29 10:42:30 +03:00
Dmitry Rodionov
eee0f51e0c use cargo-hakari to manage workspace_hack crate
workspace_hack is needed to avoid recompilation when different crates
inside the workspace depend on the same packages but with different
features being enabled. Problem occurs when you build crates separately
one by one. So this is irrelevant to our CI setup because there we build
all binaries at once, but it may be relevant for local development.

this also changes cargo's resolver version to 2
2022-03-29 10:42:04 +03:00
Arthur Petukhovsky
fd78110c2b Add default statement_timeout for tests (#1423) 2022-03-29 09:57:00 +03:00
Anton Shyrabokau
be6a6958e2 CI: rebuild postgres when Makefile changes (#1429) 2022-03-28 18:19:20 -07:00
Kirill Bulatov
0e44887929 Show more S3 logs and less verbove WAL logs 2022-03-29 00:36:06 +03:00
Dhammika Pathirana
1aa57fc262 Fix tone down compact log chatter
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-03-28 13:24:13 -07:00
Alexey Kondratov
9a4f0930c0 Turn off S3 for pageserver on staging 2022-03-28 14:14:17 -05:00
Alexey Kondratov
d88f8b4a7e Fix storage deploy condition in ansible playbook 2022-03-28 13:30:40 -05:00
Arthur Petukhovsky
8a901de52a Refactor control file update at safekeeper.
Record global_commit_lsn, have common routine for control file update, add
SafekeeperMemstate.
2022-03-28 21:52:12 +04:00
Alexey Kondratov
a883202495 Enable S3 for pageserver on staging
Follow-up for #1417. Previously we had a problem uploading to S3
due to huge ammount of existing not yet uploaded data. Now we have a
fresh pageserver with LSM storage on staging, so we can try enabling it
once again.
2022-03-28 12:04:40 -05:00
Arseny Sher
780b46ad27 Bump vendor/postgres to fix commit_lsn going backwards. 2022-03-28 20:37:33 +04:00
Arseny Sher
75002adc14 Make shared_buffers large in test_pageserver_catchup.
We intentionally write while pageserver is down, so we shouldn't query it.

Noticed by @petuhovskiy at
https://github.com/zenithdb/postgres/pull/141#issuecomment-1080261700
2022-03-28 20:34:06 +04:00
Heikki Linnakangas
07342f7519 Major storage format rewrite.
This is a backwards-incompatible change. The new pageserver cannot
read repositories created with an old pageserver binary, or vice
versa.

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, like which
relations exist and what are their sizes, from Repository to a new
module, pgdatadir_mapping.rs. The interface to Repository is now a
simple key-value PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible from handling branching, and every GET operation comes
with an LSN.

Mapping from Postgres data directory to keys/values
---------------------------------------------------

All the data is now stored in the key-value store. The
'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects
like relation pages and SLRUs, to key-value pairs.

The key to the Repository key-value store is a Key struct, which
consists of a few integer fields. It's wide enough to store a full
RelFileNode, fork and block number, and to distinguish those from
metadata keys.

'pgdatadir_mapping.rs' is also responsible for maintaining a
"partitioning" of the keyspace. Partitioning means splitting the
keyspace so that each partition holds a roughly equal number of keys.
The partitioning is used when new image layer files are created, so
that each image layer file is roughly the same size.

The partitioning is also responsible for reclaiming space used by
deleted keys. The Repository implementation doesn't have any explicit
support for deleting keys. Instead, the deleted keys are simply
omitted from the partitioning, and when a new image layer is created,
the omitted keys are not copied over to the new image layer. We might
want to implement tombstone keys in the future, to reclaim space
faster, but this will work for now.

Changes to low-level layer file code
------------------------------------

The concept of a "segment" is gone. Each layer file can now store an
arbitrary range of Keys.

Checkpointing, compaction
-------------------------

The background tasks are somewhat different now. Whenever
checkpoint_distance is reached, the WAL receiver thread "freezes" the
current in-memory layer, and creates a new one. This is a quick
operation and doesn't perform any I/O yet. It then launches a
background "layer flushing thread" to write the frozen layer to disk,
as a new L0 delta layer. This mechanism takes care of durability. It
replaces the checkpointing thread.

Compaction is a new background operation that takes a bunch of L0
delta layers, and reshuffles the data in them. It runs in a separate
compaction thread.

Deployment
----------

This also contains changes to the ansible scripts that enable having
multiple different pageservers running at the same time in the staging
environment. We will use that to keep an old version of the pageserver
running, for clusters created with the old version, at the same time
with a new pageserver with the new binary.

Author: Heikki Linnakangas
Author: Konstantin Knizhnik <knizhnik@zenith.tech>
Author: Andrey Taranik <andrey@zenith.tech>
Reviewed-by: Matthias Van De Meent <matthias@zenith.tech>
Reviewed-by: Bojan Serafimov <bojan@zenith.tech>
Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech>
Reviewed-by: Anton Shyrabokau <antons@zenith.tech>
Reviewed-by: Dhammika Pathirana <dham@zenith.tech>
Reviewed-by: Kirill Bulatov <kirill@zenith.tech>
Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech>
Reviewed-by: Alexey Kondratov <alexey@zenith.tech>
2022-03-28 05:41:15 -05:00
Kirill Bulatov
55de0b88f5 Hide remote timeline index access details 2022-03-28 12:36:01 +03:00
Kirill Bulatov
d56a0ee19a Avoid recompiling tests for release profile 2022-03-26 08:38:45 +02:00
Kirill Bulatov
18dfc769d8 Use cachepot to cache more rustc builds 2022-03-26 08:38:45 +02:00
Heikki Linnakangas
5e04dad360 Add more variants of the sequential scan performance tests.
More rows, and test with serial and parallel plans. But fewer iterations,
so that the tests run in < 1 minutes, and we don't need to mark them as
"slow".
2022-03-25 23:42:13 +02:00
Dmitry Rodionov
b8cba059a5 temporary disable s3 integration on staging until LSM storge rewrite lands 2022-03-26 00:19:25 +04:00
Heikki Linnakangas
e3fa00972e Use RwLocks in image and delta layers for more concurrency.
With a Mutex, only one thread could read from the layer at a time. I did
some ad hoc profiling with pgbench and saw that a fair amout of time was
spent blocked on these Mutexes.
2022-03-25 15:34:38 +02:00
Kirill Bulatov
b39d1b1717 Exit only on important thread failures 2022-03-25 11:58:54 +02:00
Kirill Bulatov
28bc8e3f5c Log pageserver threads better and shut down on errors in them 2022-03-25 11:58:54 +02:00
Kirill Bulatov
6244fd9e7e Better error messages on zenith cli subcommand invocations 2022-03-25 11:58:54 +02:00