Commit Graph

36 Commits

Author SHA1 Message Date
Heikki Linnakangas
de9d5e0aa4 Remove unnecessary dependencies.
Found by "cargo udeps"
2021-08-25 18:51:15 +03:00
Heikki Linnakangas
5998744bcc Remove rocksdb implementation.
The layered storage format is good enough that we don't need the rocksdb
implementation anymore. There are a lot of known issues but we'll keep
working on them.
2021-08-25 18:37:22 +03:00
Dmitry Rodionov
23b5249512 translate pageserver api to http 2021-08-24 19:05:00 +03:00
Heikki Linnakangas
2450f82de5 Introduce a new "layered" repository implementation.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.

This make the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>"
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.

Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.
2021-08-16 10:06:48 +03:00
Dmitry Ivanov
cb1b4a12a6 Add some prometheus metrics to pageserver
The metrics are served by an http endpoint, which
is meant to be spawned in a new thread.

In the future the endpoint will provide more APIs,
but for the time being, we won't bother with proper routing.
2021-08-03 21:42:24 +03:00
Heikki Linnakangas
47824c5fca Remove page server interactive mode.
It was pretty cool, but no one used it, and it had gotten badly out of
date. The main interesting thing with it was to see some basic metrics
on the fly, while the page server is running, but the metrics collection
had been broken for a long time, too. Best to just remove it.
2021-07-23 12:21:21 +03:00
Dmitry Rodionov
ed0fcfa9b7 replace parse_duration crate because of unpatched known vulnerability
resolves #87
2021-07-16 14:30:27 +03:00
Patrick Insinger
cc169a6896 pageserver - config file
To simplify cloud ops, allow configuration via file.
toml is used as the config format, and the file is stored in the working
directory.
Arguments used at initialization are saved in the config file.
Config file params may be overridden by CLI arguments.
2021-06-14 09:40:22 -07:00
Stas Kelvich
002cd8ed5b Dockerfile for pageserver. 2021-06-01 16:08:32 +03:00
Heikki Linnakangas
34f4207501 Refactoring of the Repository/Timeline stuff
- All timelines are now stored in the same rocksdb repository. The GET
  functions have been taught to follow the ancestors.

- Change the way relation size is stored. Instead of inserting "tombstone"
  entries for blocks that are truncated away, store relation size as
  separate key-value entry for each relation

- Add an abstraction for the key-value store: ObjectStore. It allows
  swapping RocksDB with some other key-value store easily. Perhaps we
  will write our own storage implementation using that interface, or
  perhaps we'll need a different abstraction, but this is a small
  improvement over status quo in any case.

- Garbage Collection is broken and commented out. It's not clear where and
  how it should be implemented.
2021-05-27 20:07:50 +03:00
Stas Kelvich
746f667311 Refactor CLI and CLI<->pageserver interfaces to support remote pageserver
This patch started as an effort to support CLI working against remote
pageserver, but turned into a pretty big refactoring.

* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All tests-related code from local_env went into integration_tests/src/lib.rs as an
extension to PostgresNode trait.
* Paths-generating functions were concentrated around corresponding config
types (LocalEnv and PageserverConf).
2021-05-17 19:17:51 +03:00
Patrick Insinger
99d80aba52 use pageserver for pg list command 2021-05-12 12:34:03 +03:00
Eric Seppanen
0cbb3798da try using serde to do all the serialization in wal_service
This version validates on every call that our result is exactly the same
as the previous result.

NodeId is a strange corner case: one field is serialized little-endian
and one field is serialized big-endian. Hopefully we can fix that in the
future.
2021-05-10 16:21:05 -07:00
Eric Seppanen
1767208563 remove tokio-postgres from dependencies 2021-05-10 15:24:55 -07:00
Eric Seppanen
4b46693c81 adapt to new upstream tokio-postgres replication interface
Switch over to a newer version of rust-postgres PR752. A few
minor changes are required:
- PgLsn::UNDEFINED -> PgLsn::from(0)
- PgTimestamp -> SystemTime
2021-05-10 15:24:55 -07:00
Eric Seppanen
df5a55c445 add workspace_hack crate
Our builds can be a little inconsistent, because Cargo doesn't deal well
with workspaces where there are multiple crates which have different
dependencies that select different features. As a workaround, copy what
other big rust projects do: add a workspace_hack crate.

This crate just pins down a set of dependencies and features that
satisfies all of the workspace crates.

The benefits are:
- running `cargo build` from one of the workspace subdirectories now
  works without rebuilding anything.
- running `cargo install` works (without rebuilding anything).
- making small dependency changes is much less likely to trigger large
  dependency rebuilds.
2021-05-07 13:08:31 -07:00
Eric Seppanen
2e0d45d092 Switch to upstream rust-s3
The local fork of rust-s3 has some code to support Google Cloud, but
that PR no longer applies upstream, and will need significant changes
before it can be re-submitted.

In the meantime, we might as well just use the most similar upstream
release. The benefit of switching is that it fixes a feature-resolution
bug that was causing us to build 24 more crates than needed (mostly
async-std and its dependencies).
2021-05-04 12:02:00 -07:00
Eric Seppanen
a3818dee58 pin dependencies to versions
If there isn't any version specified for a dependency crate, Cargo may
choose a newer version. This could happen when Cargo.lock is updated
("cargo update") but can also happen unexpectedly when adding or
changing other dependencies. This can allow API-breaking changes to be
picked up, breaking the build.

To prevent this, specify versions for all dependencies. Cargo is still
allowed to pick newer versions that are (hopefully) non-breaking, by
analyzing the semver version number.

There are two special cases here:

1. serde_derive::{Serialize, Deserialize} isn't really used any more. It
was only a separate crate in the past because of compiler limitations.
Nowadays, people turn on the "derive" feature of the serde crate and
use serde::{Serialize, Deserialize}.

2. parse_duration is unmaintained and has an open security issue. (gh
iss. 87) That issue probably isn't critical for us because of where we
use that crate, but it's probably still better to pin the version so we
can't get hit with an API-breaking change at an awkward time.
2021-05-03 14:02:10 -07:00
Heikki Linnakangas
93d7d2ae2a Refactor pagecache <-> Wal redo communication
After the rocksdb patch (commit 6aa38d3f7d), the CacheEntry struct was
used only momentarily in the communication between the page_cache and
the walredo modules. It was in fact not stored in any cache anymore.
For clarity, refactor the communication.

There is now a WalRedoManager struct, with `request_redo` function,
that can be used to request WAL replay of a particular page. It sends
a request to a queue like before, but the queue has been replaced with
tokio::sync::mpsc. Previously, the resulting page image was stored
directly in the CacheEntry, and the requestor was notified using a
condition variable. Now, the requestor includes a 'oneshot' channel in
the request, and the WAL redo manager sends the response there.
2021-04-24 12:24:04 +03:00
Konstantin Knizhnik
ed30f2096c Disable GC by default 2021-04-22 11:30:27 +03:00
Konstantin Knizhnik
da9508716d Address issues from Eric's review 2021-04-22 10:37:52 +03:00
Konstantin Knizhnik
9e7c45cb72 Merge with master 2021-04-22 09:45:13 +03:00
Eric Seppanen
2cd730d31f page_cache: replace long mutex sleep with SeqWait
When calling into the page cache, it was possible to wait on a blocking
mutex, which can stall the async executor.

Replace that sleep with a SeqWait::wait_for(lsn).await so that the
executor can go on with other work while we wait.

Change walreceiver_works to an AtomicBool to avoid the awkwardness of
taking the lock, then dropping it while we call wait_for and then
acquiring it again to do real work.
2021-04-21 18:02:13 -07:00
Konstantin Knizhnik
4f3f0304c2 Merge branch 'main' into rocksdb_pageserver 2021-04-21 19:05:02 +03:00
Konstantin Knizhnik
c981f4ad66 Implement garbage collection of unused versions 2021-04-21 19:04:30 +03:00
Heikki Linnakangas
e911427872 Remove some unnecessary dependencies 2021-04-21 16:42:12 +03:00
Konstantin Knizhnik
07507274c0 Merge branch 'main' into rocksdb_pageserver 2021-04-21 16:06:31 +03:00
Heikki Linnakangas
3600b33f1c Implement "timelines" in page server
This replaces the page server's "datadir" concept. The Page Server now
always works with a "Zenith Repository". When you initialize a new
repository with "zenith init", it runs initdb and loads an initial
basebackup of the freshly-created cluster into the repository, on "main"
branch. Repository can hold multiple "timelines", which can be given
human-friendly names, making them "branches". One page server simultaneously
serves all timelines stored in the repository, and you can have multiple
Postgres compute nodes connected to the page server, as long they all
operate on a different timeline.

There is a new command "zenith branch", which can be used to fork off
new branches from existing branches.

The repository uses the directory layout desribed as Repository format
v1 in https://github.com/zenithdb/rfcs/pull/5. It it *highly* inefficient:
- we never create new snapshots. So in practice, it's really just a base
  backup of the initial empty cluster, and everything else is reconstructed
  by redoing all WAL

- when you create a new timeline, the base snapshot and *all* WAL is copied
  from the new timeline to the new one. There is no smarts about
  referencing the old snapshots/wal from the ancestor timeline.

To support all this, this commit includes a bunch of other changes:

- Implement "basebackup" funtionality in page server. When you initialize
  a new compute node with "zenith pg create", it connects to the page
  server, and requests a base backup of the Postgres data directory on
  that timeline. (the base backup excludes user tables, so it's not
  as bad as it sounds).

- Have page server's WAL receiver write the WAL into timeline dir. This
  allows running a Page Server and Compute Nodes without a WAL safekeeper,
  until we get around to integrate that properly into the system. (Even
  after we integrate WAL safekeeper, this is perhaps how this will operate
  when you want to run the system on your laptop.)

- restore_datadir.rs was renamed to restore_local_repo.rs, and heavily
  modified to use the new format. It now also restores all WAL.

- Page server no longer scans and restores everything into memory at startup.
  Instead, when the first request is made for a timeline, the timeline is
  slurped into memory at that point.

- The responsibility for telling page server to "callmemaybe" was moved
  into Postgres libpqpagestore code. Also, WAL producer connstring cannot
  be specified in the pageserver's command line anymore.

- Having multiple "system identifiers" in the same page server is no
  longer supported. I repurposed much of that code to support multiple
  timelines, instead.

- Implemented very basic, incomplete, support for PostgreSQL's Extended
  Query Protocol in page_service.rs. Turns out that rust-postgres'
  copy_out() function always uses the extended query protocol to send
  out the command, and I'm using that to stream the base backup from the
  page server.

TODO: I haven't fixed the WAL safekeeper for this scheme, so all the
integration tests involving safekeepers are failing. My plan is to modify
the safekeeper to know about Zenith timelines, too, and modify it to work
with the same Zenith repository format. It only needs to care about the
'.zenith/timelines/<timeline>/wal' directories.
2021-04-20 19:11:27 +03:00
Konstantin Knizhnik
95160dee6d Merge with main branch 2021-04-19 17:00:30 +03:00
Konstantin Knizhnik
8aa3013ec2 Merge branch 'main' into rocksdb_pageserver 2021-04-19 16:28:29 +03:00
Eric Seppanen
b32cc6a088 pageserver: change over some error handling to anyhow+thiserror
This is a first attempt at a new error-handling strategy:
- Use anyhow::Error as the first choice for easy error handling
- Use thiserror to generate local error types for anything that
  needs it (no error type is available to us) or will be inspected
  or matched on by higher layers.
2021-04-18 23:06:35 -07:00
Eric Seppanen
35e0099ac6 pin remote rust-s3 dependency to a git hash
Using the hash should allow us to change the remote repo and propagate
that change to user builds without that change becoming visible at a
random time.

It's unfortunate that we can't declare this dependency once in the
top-level Cargo.toml; that feature request is rust-lang rfc 2906.
2021-04-16 15:26:11 -07:00
Eric Seppanen
e8032f26e6 adopt new tokio-postgres:replication branch
This PR has evolved a lot; jump to the newer version. This should make
it easier to handle keepalive messages.
2021-04-16 08:29:47 -07:00
lubennikovaav
82dc1e82ba Restore pageserver from s3 or local datadir (#9)
* change pageserver --skip-recovery option to --restore-from=[s3|local]
* implement restore from local pgdata 
* add simple test for local restore
2021-04-14 21:14:10 +03:00
Konstantin Knizhnik
07fb30747a Store pageserver data in RocksDB 2021-04-08 19:39:30 +03:00
Heikki Linnakangas
1367332447 Separate walkeeper and pageserver sources into different directories.
The integration tests, which depend on both walkeeper and pageserver,
are moved into yet another directory, 'integration_tests'.
2021-04-06 13:15:26 +03:00