Without this step, the page versions aren't actually removed; they're
just marked for deletion on the next RocksDB "merge" or "compact"
operation.
Author: Konstantin Knizhnik
- Print the number of dropped relations, and the number of relations
encountered overall.
- If a block has only one page version, the latest one, don't count it as
a "truncated" version history. Only count pages for which we actually
removed some old versions.
- Change "last" to "latest" in variable names and comments. "Last" could
be interpreted as "oldest", but here it means "newest".
- Add a comment noting that the GC code depends on get_page_at_lsn_nowait
to store the materialized page version in the repository.
- Change "last" to "latest" in variable names for clarity. "Last" could
be interpreted as the oldest, but here it means newest.
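
The counting rule from the GC statistics change above (a page counts as
"truncated" only if old versions were actually removed) can be sketched
as follows. This is a minimal illustration, not the repository's GC code:
the `GcStats` struct, the `gc_keep_latest` function, and the simplifying
assumption that GC keeps only the latest version of each block are all
hypothetical.

```rust
use std::collections::BTreeMap;

/// Counters reported at the end of a GC pass (hypothetical names).
#[derive(Debug, Default, PartialEq)]
struct GcStats {
    pages_seen: usize,
    pages_truncated: usize,
    versions_removed: usize,
}

/// Keep only the latest (highest-LSN) version of every block.
/// `versions` maps a block number to the LSNs of its stored page versions.
fn gc_keep_latest(versions: &mut BTreeMap<u32, Vec<u64>>) -> GcStats {
    let mut stats = GcStats::default();
    for lsns in versions.values_mut() {
        stats.pages_seen += 1;
        let latest = match lsns.iter().max() {
            Some(&lsn) => lsn,
            None => continue, // no versions stored for this block
        };
        let before = lsns.len();
        lsns.retain(|&lsn| lsn == latest);
        let removed = before - lsns.len();
        // A block that only ever had its latest version is not "truncated".
        if removed > 0 {
            stats.pages_truncated += 1;
            stats.versions_removed += removed;
        }
    }
    stats
}
```

A block with versions at LSNs 10, 20 and 30 contributes one truncated
page and two removed versions; a block with a single version contributes
neither.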
To simplify cloud ops, allow configuration via file.
TOML is used as the config format, and the file is stored in the working
directory.
Arguments used at initialization are saved in the config file.
Config file params may be overridden by CLI arguments.
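
The precedence described above (CLI arguments override the config file,
which overrides built-in defaults) might look roughly like this. It is a
sketch only: the `PartialConfig` and `Config` types, the field names, and
the default values are hypothetical, and TOML parsing is left out.

```rust
/// Settings as they might appear in either source; `None` means "not given".
#[derive(Debug, Clone, Default, PartialEq)]
struct PartialConfig {
    listen_addr: Option<String>,
    daemonize: Option<bool>,
}

/// Fully resolved settings.
#[derive(Debug, PartialEq)]
struct Config {
    listen_addr: String,
    daemonize: bool,
}

/// Resolve each setting with the precedence CLI > config file > default.
fn resolve(file: PartialConfig, cli: PartialConfig) -> Config {
    Config {
        listen_addr: cli
            .listen_addr
            .or(file.listen_addr)
            // Hypothetical default address, for illustration only.
            .unwrap_or_else(|| "127.0.0.1:64000".to_string()),
        daemonize: cli.daemonize.or(file.daemonize).unwrap_or(false),
    }
}
```

Keeping every field optional in both sources makes the override rule a
single `or` chain per setting.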
Use CLI args instead of environment variables to parameterize the
working directory and postgres distribution.
Before this change, there was a mixture of environment variables and CLI
arguments that needed to be set. Moving to a single input simplifies
cloud configuration management.
The bool::then function was added in Rust 1.50. I'm still using 1.48 on
my laptop. We haven't decided what Rust version we will require
(https://github.com/zenithdb/zenith/issues/138), and I'll probably need
to upgrade sooner or later, but this will do for now.
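
For reference, code that must build on a toolchain older than Rust 1.50
can replace `cond.then(|| value)` with a plain `if`/`else`. The helper
name `then_compat` below is hypothetical and is not taken from the
repository; it only shows the equivalence.

```rust
/// Equivalent of `cond.then(make)` for toolchains older than Rust 1.50.
fn then_compat<T>(cond: bool, make: impl FnOnce() -> T) -> Option<T> {
    if cond {
        Some(make())
    } else {
        None
    }
}
```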
Clear a clippy warning about manual flatten.
This isn't good error handling, but panicking is probably better than
spinning forever if stdin returns EOF.
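
As a generic illustration of the "manual flatten" warning mentioned
above (clippy's `manual_flatten` lint), the two functions below compute
the same result. They are not the code touched by this commit; the
function names and the use of `Option<u32>` are assumptions made for the
example.

```rust
/// Shape that clippy's `manual_flatten` lint complains about:
/// a `for` loop whose whole body is an `if let` on the loop item.
#[allow(clippy::manual_flatten)]
fn sum_manual(values: &[Option<u32>]) -> u32 {
    let mut total = 0u32;
    for v in values {
        if let Some(n) = v {
            total += n;
        }
    }
    total
}

/// Same loop written with `Iterator::flatten`, which skips `None` items.
fn sum_flattened(values: &[Option<u32>]) -> u32 {
    let mut total = 0u32;
    for n in values.iter().flatten() {
        total += n;
    }
    total
}
```

Note that flattening silently skips the non-matching items, so it is only
appropriate when ignoring them is the intended behavior.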
Previously, a transaction could commit regardless of whether the
pageserver had caught up. This patch aims to fix that.
There are two notable changes:
1. ComputeControlPlane::new_node() now sets the
`synchronous_standby_names = 'pageserver'` parameter to delay
transaction commit until pageserver acting as a standby has
fetched and ack'd a relevant portion of WAL.
2. pageserver now has to:
- Specify the `application_name = pageserver` which matches the
one in `synchronous_standby_names`.
- Properly reply with the ack'd LSNs.
This means that some tests don't need sleeps anymore.
TODO: We should probably make this behavior configurable.
Fixes #187.
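
To make "reply with the ack'd LSNs" concrete: in the PostgreSQL streaming
replication protocol, a standby acknowledges WAL with a "standby status
update" payload (type byte 'r', three 64-bit LSNs, a 64-bit timestamp,
and a reply-requested flag), sent inside a CopyData message. The function
below is a sketch of that encoding, not the pageserver's actual code; its
name and signature are hypothetical, and the caller is assumed to supply
the timestamp in microseconds since 2000-01-01.

```rust
/// Encode the payload of a standby status update ('r') message.
/// All integers are big-endian, as everywhere in the protocol.
fn standby_status_update(
    write_lsn: u64,
    flush_lsn: u64,
    apply_lsn: u64,
    ts_micros: i64,
    reply_requested: bool,
) -> Vec<u8> {
    let mut buf = Vec::with_capacity(34);
    buf.push(b'r');
    buf.extend_from_slice(&write_lsn.to_be_bytes()); // last WAL byte + 1 received
    buf.extend_from_slice(&flush_lsn.to_be_bytes()); // last WAL byte + 1 flushed
    buf.extend_from_slice(&apply_lsn.to_be_bytes()); // last WAL byte + 1 applied
    buf.extend_from_slice(&ts_micros.to_be_bytes()); // sender's clock
    buf.push(reply_requested as u8);
    buf
}
```

With `synchronous_standby_names` set, the primary delays commit until the
named standby has acknowledged the commit's WAL through messages like
this one.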
postgres_backend now handles communication with the client and passes
queries to the provided handler. There are currently two handlers, one
for wal_acceptor and one for pageserver.
BytesMut is used again for writing data, which avoids manual message
length calculation.
ref #118
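
One way a growable buffer avoids computing message lengths up front is to
write a placeholder length, append the body, and then patch the length in
place. The sketch below uses a plain `Vec<u8>` as a stand-in for
BytesMut; the `write_message` helper is hypothetical and is not the
postgres_backend API.

```rust
/// Append one protocol message: a type byte, a big-endian i32 length
/// (which counts itself but not the type byte), then the body.
fn write_message(buf: &mut Vec<u8>, tag: u8, write_body: impl FnOnce(&mut Vec<u8>)) {
    buf.push(tag);
    let len_pos = buf.len();
    buf.extend_from_slice(&[0u8; 4]); // placeholder for the length
    write_body(buf);
    // Everything from the length field to the end of the body is counted.
    let len = (buf.len() - len_pos) as i32;
    buf[len_pos..len_pos + 4].copy_from_slice(&len.to_be_bytes());
}
```

For example, `write_message(&mut buf, b'Z', |b| b.push(b'I'))` produces a
ReadyForQuery message with a length field of 5.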
- All timelines are now stored in the same rocksdb repository. The GET
functions have been taught to follow the ancestors.
- Change the way relation size is stored. Instead of inserting "tombstone"
entries for blocks that are truncated away, store relation size as a
separate key-value entry for each relation.
- Add an abstraction for the key-value store: ObjectStore. It allows
swapping RocksDB with some other key-value store easily. Perhaps we
will write our own storage implementation using that interface, or
perhaps we'll need a different abstraction, but this is a small
improvement over the status quo in any case.
- Garbage Collection is broken and commented out. It's not clear where and
how it should be implemented.
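
The ideas in the list above (a key-value abstraction, relation size
stored under its own key, and lookups that fall back to ancestor
timelines) can be sketched together as follows. Everything here is an
assumption made for illustration: the real ObjectStore trait, key
layout, and timeline identifiers may look quite different, and the
sketch ignores LSNs entirely.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal key-value interface.
trait ObjectStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&mut self, key: &[u8], value: &[u8]);
}

/// In-memory implementation, standing in for RocksDB.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl ObjectStore for MemStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }
}

/// Key under which the size of relation `rel` on `timeline` is stored.
fn rel_size_key(timeline: u32, rel: u32) -> Vec<u8> {
    let mut key = b"relsize/".to_vec();
    key.extend_from_slice(&timeline.to_be_bytes());
    key.extend_from_slice(&rel.to_be_bytes());
    key
}

/// Store the relation size as one entry, instead of per-block tombstones.
fn put_rel_size(store: &mut impl ObjectStore, timeline: u32, rel: u32, nblocks: u32) {
    store.put(&rel_size_key(timeline, rel), &nblocks.to_be_bytes());
}

/// Look on the timeline itself first, then walk up its ancestors.
/// `parents` maps a timeline to its ancestor and is assumed to be acyclic.
fn get_rel_size(
    store: &impl ObjectStore,
    parents: &BTreeMap<u32, u32>,
    mut timeline: u32,
    rel: u32,
) -> Option<u32> {
    loop {
        if let Some(value) = store.get(&rel_size_key(timeline, rel)) {
            if value.len() != 4 {
                return None; // malformed entry
            }
            let mut bytes = [0u8; 4];
            bytes.copy_from_slice(&value);
            return Some(u32::from_be_bytes(bytes));
        }
        timeline = *parents.get(&timeline)?;
    }
}
```

Because the functions only depend on the trait, swapping `MemStore` for
another backend does not change the lookup logic.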
Move `save_decoded_record` out of the Timeline trait. The storage
implementation shouldn't need to know how to decode records.
Also move put_create_database() out of the Timeline trait. Add a new
`list_rels` function to Timeline to support it, instead.
Rename `get_relsize` to `get_rel_size`, and `get_relsize_exists` to
`get_rel_exists`. Seems nicer.
Derive Serialize+Deserialize for RelTag, BufferTag, CacheKey. Replace
handwritten pack/unpack functions with ser, des from
zenith_utils::bin_ser (which uses the bincode crate).
There are some ugly hybrids in walredo.rs, but those functions are
already doing a lot of questionable manual byte-twiddling, so hopefully
the weirdness will go away when we get better postgres protocol
wrappers.
- Previously, on first use of a timeline, we checked whether there was
a snapshot and WAL for the timeline, and loaded it all into the
(rocksdb) repository. That's a waste of effort if we had already done
that earlier and then stopped and restarted the server. Track the
last LSN that we have loaded into the repository, and only load the
recent missing WAL after that.
- When you create a new zenith repository with "zenith init",
immediately load the initial empty postgres cluster into the rocksdb
repository. Previously, we only did that on the first connection. This
way, we don't need any "load from filesystem" codepath during normal
operation, we can assume that the repository for a timeline is always
up to date. (We might still want to use the functionality to import an
existing PostgreSQL data directory into the repository in the future,
as a separate Import feature, but not today.)
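
The "only load the missing WAL" idea from the first item above can be
sketched like this. It is a simplified illustration, not the pageserver
code: the `Timeline` struct and its fields are hypothetical, WAL records
are reduced to their LSNs, and the records are assumed to arrive in
ascending LSN order. In a real implementation the last loaded LSN would
have to be stored durably so that it survives a restart.

```rust
/// Hypothetical in-memory stand-in for a timeline in the repository.
struct Timeline {
    /// LSN of the last WAL record already loaded into the repository.
    last_loaded_lsn: u64,
    /// Records that have been applied, kept here only for illustration.
    applied: Vec<u64>,
}

impl Timeline {
    /// Apply only the records past `last_loaded_lsn`.
    /// Returns how many records were actually loaded.
    fn load_missing_wal(&mut self, record_lsns: &[u64]) -> usize {
        let mut loaded = 0;
        for &lsn in record_lsns {
            if lsn <= self.last_loaded_lsn {
                continue; // already in the repository, skip it
            }
            self.applied.push(lsn);
            self.last_loaded_lsn = lsn;
            loaded += 1;
        }
        loaded
    }
}
```

Calling `load_missing_wal` a second time with the same records loads
nothing, which is the saving this change is after.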
This includes the following commits:
35a1c3d521 Specify right LSN in test_createdb.py
d95e1da742 Fix issue with propagation of CREATE DATABASE to the branch
8465738aa5 [refer #167] Fix handling of pg_filenode.map files in page server
86056abd0e Fix merge conflict: set initial WAL position to second segment because of pg_resetwal
2bf2dd1d88 Add nonrelfile_utils.rs file
20b6279beb Fix restoring non-relational data during compute node startup
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup
It also includes some older changes related to storing CLOG and MultiXact
data as "pseudorelations" in the page server.
With this revert, we go back to the situation where, when you create a
new compute node, we ship *all* the WAL from the beginning of time to
the compute node. Obviously we need a better solution, like the code
that this reverts. But per discussion with Konstantin and Stas, this
stuff was still half-baked, and it's better for it to live in a branch
for now, until it's more complete and has gone through some review.