Commit Graph

646 Commits

Author SHA1 Message Date
Heikki Linnakangas
6a264aaca3 Stopgap "fix" for test_parallel_copy failure in debug mode. 2022-03-14 19:54:38 +02:00
Heikki Linnakangas
60ed6b3710 Shave some CPU cycles from reading blobs from files.
This shows up in 'perf' profile when running in debug mode. Not so
significant in release mode, but still.
2022-03-14 19:53:00 +02:00
Heikki Linnakangas
89690d7349 Prevent compaction from running at same time as GC.
For same reasons as we prohibited concurrent checkpointing and GC
previosly.
2022-03-14 14:22:04 +02:00
Heikki Linnakangas
09f2dff537 Refactor the checkpoint and compaction functions.
The concept of a "checkpoint" had become quite muddled. This tries to
clarify it again.
2022-03-14 13:22:46 +02:00
Heikki Linnakangas
2d8587f67d Separate flushing in-memory layer to disk from checkpoints.
When 'checkpoint_distance' is reached, freeze the current in-memory
layer directly in the WAL receiver thread. And to flush the frozen
layer to disk, launch a separate "layer flushing thread". This leaves
only the compaction duty to the checkpoint thread.
2022-03-14 11:37:22 +02:00
Heikki Linnakangas
c559c72ede Merge remote-tracking branch 'origin/main' into HEAD 2022-03-14 10:26:05 +02:00
Heikki Linnakangas
f06707badc Bugfix: a few constant keys were missing from collect_keyspace
As a result, you got "could not find data for key" errors.
2022-03-13 01:15:32 +02:00
Heikki Linnakangas
64cdd6064d Don't ClearVisibilityMapFlags records for non-existent blocks.
We create a ClearVisibilityMapFlags record for the VM page, when a heap
WAL record indicates that the VM bit needs to be cleared. However,
sometimes the VM block would not exist. It seems that PostgreSQL
sometimes sets the clear-VM bit on WAL records, even though the
corresponding VM page hasn't been initialized yet. There's no point in
trying to clear a bit on a non-existent bit, so just skip emitting the
record if the VM page doesn't exist.

I'm not entirely sure why we're only seeing this bug with this PR, I
think it existed before. Maybe we were more sloppy and returned
an all-zeros page?
2022-03-13 01:14:58 +02:00
Heikki Linnakangas
ee40297758 Refactor keyspace code
Have separate classes for the KeySpace, a partitioning of the KeySpace
(KeyPartitioning), and a builder object used to construct the KeySpace.
Previously, KeyPartitioning did all those things, and it was a bit
confusing.
2022-03-11 16:24:13 +02:00
Heikki Linnakangas
d5b8380dae Improve comments on image layer.
Make it more explicit that if a key doesn't exist in an image layer, it
doesn't exist.
2022-03-11 09:47:09 +02:00
Heikki Linnakangas
bce2da4e55 Another 'tablespace' test fix. 2022-03-11 00:53:46 +02:00
Dhammika Pathirana
5d7bd8643a Fix page reconstruct time histo
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-03-10 14:42:28 -08:00
Dhammika Pathirana
27dadba52c Fix retain references to layer histograms
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>
2022-03-10 14:42:28 -08:00
Dhammika Pathirana
f67d010d1b Add ps smgr/storage metrics tenant tags
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>

Add tenant_id,timeline_id in smgr/storage metrics (#1234)
2022-03-10 14:42:28 -08:00
Heikki Linnakangas
3948956e87 Fix pg_table_size() on a view 2022-03-10 23:35:24 +02:00
Heikki Linnakangas
a726b555fb Handle tablespaces gracefully.
We don't really support tablespaces. But this makes the 'tablespace'
Postgres regression test pass, like it did previously.
2022-03-10 22:56:37 +02:00
Kirill Bulatov
093ad8ab59 Send 409 HTTP responses on timeline and tenant creation for existing entity 2022-03-10 19:38:58 +02:00
Kirill Bulatov
c51d545fd9 Serialize Lsn as strings in http api 2022-03-10 19:38:58 +02:00
Kirill Bulatov
fe6fccfdae Allow already existing repo when creating a tenant 2022-03-10 19:38:58 +02:00
Kirill Bulatov
dd74c66ef0 Do not create timeline along with tenant 2022-03-10 19:38:58 +02:00
Kirill Bulatov
a5e10c4f64 Tidy up pageserver's endpoints 2022-03-10 19:38:58 +02:00
Kirill Bulatov
7b5482bac0 Properly store the branch name mappings 2022-03-10 19:38:58 +02:00
Kirill Bulatov
c7569dce47 Allow passing initial timeline id into zenith CLI commands 2022-03-10 19:38:58 +02:00
Kirill Bulatov
4d0f7fd1e4 Update Zenith CLI config between runs 2022-03-10 19:38:58 +02:00
Kirill Bulatov
f49990ed43 Allow creating timelines by branching off ancestors 2022-03-10 19:38:58 +02:00
Kirill Bulatov
0c91091c63 Avoid point in time concept on pageserver level 2022-03-10 19:38:58 +02:00
Kirill Bulatov
10f811e886 Use timeline instead of branch in pageserver's API 2022-03-10 19:38:58 +02:00
Heikki Linnakangas
0e3512aad0 Crank down logging again 2022-03-10 18:50:12 +02:00
Heikki Linnakangas
dd56eeefbf Crank up logging 2022-03-10 15:45:50 +02:00
Heikki Linnakangas
d19a293e7e Add a test for branching 2022-03-10 14:56:13 +02:00
Heikki Linnakangas
be4aebd7e9 silence clippy 2022-03-10 13:36:28 +02:00
Heikki Linnakangas
dac73328ba Fix bug where reldir was not written to image layer. 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
fb79c7f1f0 Make compaction more concurrent 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
e7bd74d558 Tidy up 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
da8beffc95 Fix logical timeline size tracking 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
98ec8418c4 Fix bug with the partitioning and GC 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
92d1322cd5 comments, other cleanup 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
2896d35a8b rustfmt and clippy fixes 2022-03-10 13:20:08 +02:00
Heikki Linnakangas
e096c62494 Misc fixes and stuff 2022-03-09 11:36:39 +02:00
Heikki Linnakangas
356f716d39 Fixes 2022-03-09 11:36:39 +02:00
Heikki Linnakangas
798ff26fb0 More work on compaction, and resurrect some unit tests 2022-03-09 11:36:39 +02:00
Heikki Linnakangas
28045890eb Work on compaction. 2022-03-09 11:36:39 +02:00
Heikki Linnakangas
6127b6638b Major storage format rewrite
Major changes and new concepts:

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, like which relations
exist and what are their sizes, from Repository to a new module,
pgdatadir_mapping.rs. The interface to Repository is now a simple key-value
PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible from handling branching, and every GET operation comes with
an LSN.

Key
---

The key to the Repository key-value store is a Key struct, which consists
of a few integer fields. It's wide enough to store a full RelFileNode,
fork and block number, and to distinguish those from metadata keys.

See pgdatadir_mapping.rs for how relation blocks and metadata keys are
mapped to the Key struct.

Store arbitrary key-ranges in the layer files
---------------------------------------------

The concept of a "segment" is gone. Each layer file can store an arbitrary
range of Keys.

TODO:

- Deleting keys, to reclaim space. This isn't visible to Postgres, dropping
  or truncating a relation works as you would expect if you look at it from
  the compute node. If you drop a relation, for example, the relation is
  removed from the metadata entry, so that it appears to be gone. However,
  the layered repository implementation never reclaims the storage.

- Tracking "logical database size", for disk space quotas. That ought to
  be reimplemented now in pgdatadir_mapping.rs, or perhaps in walingest.rs.

- LSM compaction. The logic for checkpointing and creating image layers is
  very dumb. AFAIK the *read* code could deal with a full-fledged LSM tree
  now consisting of the delta and image layers. But there's no code to
  take a bunch of delta layers and compact them, and the heuristics for
  when to create image layers is pretty dumb.

- The code to track the layers is inefficient. All layers are just stored in
  a vector, and whenever we need to find a layer, we do a linear search in
  it.
2022-03-09 11:36:39 +02:00
Heikki Linnakangas
c7c1e19667 Use more generics, less dyn 2022-03-09 11:36:38 +02:00
Kirill Bulatov
9424bfae22 Use a separate newtype for ZId that (de)serialize as hex strings 2022-03-04 10:58:40 +02:00
Dmitry Rodionov
1d90b1b205 add node id to pageserver (#1310)
* Add --id argument to safekeeper setting its unique u64 id.

In preparation for storage node messaging. IDs are supposed to be monotonically
assigned by the console. In tests it is issued by ZenithEnv; at the zenith cli
level and fixtures, string name is completely replaced by integer id. Example
TOML configs are adjusted accordingly.

Sequential ids are chosen over Zid mainly because they are compact and easy to
type/remember.

* add node id to pageserver

This adds node id parameter to pageserver configuration. Also I use a
simple builder to construct pageserver config struct to avoid setting
node id to some temporary invalid value. Some of the changes in test
fixtures are needed to split init and start operations for envrionment.

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2022-03-04 01:10:42 +03:00
Kirill Bulatov
949f8b4633 Fix 1.59 rustc clippy warnings 2022-03-02 21:35:34 +02:00
Heikki Linnakangas
5120ba4b5f Refactor the interface for using cached page image.
Instead of passing it as a separate argument to get_page_reconstruct_data,
the caller can fill it in the PageReconstructData struct.
2022-02-24 10:37:12 +02:00
Heikki Linnakangas
e4670a5f1e Remove the PageVersions abstraction.
Since commit fdd987c3ad, it was only used in InMemoryLayers. Let's
just "inline" the code into InMemoryLayer itself.

I originally did this as part of a bigger PR (#1267). With that PR,
one in-memory layer, and one ephemeral file, would hold page versions
belonging to multiple segments. Currently, PageVersions can only hold
versions for a single segment, so that would need to be changed.
Rather than modify PageVersions to support that, just remove it
altogether.
2022-02-23 21:04:39 +02:00
Heikki Linnakangas
7fae894648 Move a few unit tests specific to layered file format.
These tests have intimate knowledge of the directory layeout and layer
file names used by the LayeredRepository implementation of the
Repository trait. Move them, so that all the tests that remain in
repository.rs are expected to work without changes with any
implementation of Repository. Not that we have any plans to create
another Repository implementaiton any time soon, but as long as we
have the Repository interface, let's try to maintain that abstraction
in the tests too.
2022-02-23 20:32:06 +02:00