- Change hardcoded OLDEST_INMEM_DISTANCE value to pageserver config option checkpoint_distance.
- Get rid of 'force' flag in checkpoint_internal(). Use checkpoint_distance=0 instead.
Support is done via pytest-xdist plugin.
To use the feature add -n<concurrency> to pytest invocation
e.g. pytest -n8 to run 8 tests in parallel.
Changes in code are mostly about ports assigning. Previously port for
pageserver was hardcoded without the ability to override through zenith
cli and ports for started compute nodes were calculated twice, in zenith
cli and in test code. Now zenith cli supports port arguments for
pageserver and compute nodes to be passed explicitly.
Tests are modified in such a way that each worker gets a non overlapping
port range which can be configured and now contains 100 ports. These
ports are distributed to test services (pageserver, wal acceptors,
compute nodes) so they can work independently.
This contains a lowest common denominator of pageserver and safekeeper log
initialisation routines. It uses daemonize flag to decide where to
stream log messages. In case daemonize is true log messages are
forwarded to file. Otherwise streaming to stdout is used. Usage of
stdout for log output is the default in docker side of things, so make
it easier to browse our logs via builtin docker commands.
Ran into problems launching the WAL redo process on OS X after 4b73ad.
Launching the `initdb` process was met with "bad file descriptor" errors.
Using dtrace, I found shortly after calling `posix_spawn` for `initdb`,
`kevent` was returning this error.
I haven't dug super deep to see if the daemonization itself is the
problem, but this commit fixes it for me. My hunch is that some file
descriptors used when the Tokio runtime is initailzed become invalid
in the daemon process.
by binding sockets before daemonization
also use less annoying error reporting by not printing full error
messages for connect errors in first several connection retries
closes#507
Once upon a time, 'page_cache.rs' contained an actual page cache, but
it hasn't for a very long time. Rename to reflect what it actually does
these days.
Now that we only have one Repository implementation, no need for the
command-line options to choose it either. I'm removing these as a separate
commit to show what we will need to do if we add another Repository
implementation in the future (even though I don't foresee us doing that
any time soon)
The layered storage format is good enough that we don't need the rocksdb
implementation anymore. There are a lot of known issues but we'll keep
working on them.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.
This make the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>"
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.
Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.
Current state with authentication.
Page server validates JWT token passed as a password during connection
phase and later when performing an action such as create branch tenant
parameter of an operation is validated to match one submitted in token.
To allow access from console there is dedicated scope: PageServerApi,
this scope allows access to all tenants. See code for access validation in:
PageServerHandler::check_permission.
Because we are in progress of refactoring of communication layer
involving wal proposer protocol, and safekeeper<->pageserver. Safekeeper
now doesn’t check token passed from compute, and uses “hardcoded” token
passed via environment variable to communicate with pageserver.
Compute postgres now takes token from environment variable and passes it
as a password field in pageserver connection. It is not passed through
settings because then user will be able to retrieve it using pg_settings
or SHOW ..
I’ve added basic test in test_auth.py. Probably after we add
authentication to remaining network paths we should enable it by default
and switch all existing tests to use it.
The metrics are served by an http endpoint, which
is meant to be spawned in a new thread.
In the future the endpoint will provide more APIs,
but for the time being, we won't bother with proper routing.
It was pretty cool, but no one used it, and it had gotten badly out of
date. The main interesting thing with it was to see some basic metrics
on the fly, while the page server is running, but the metrics collection
had been broken for a long time, too. Best to just remove it.
this patch adds support for tenants. This touches mostly pageserver.
Directory layout on disk is changed to contain new layer of indirection.
Now path to particular repository has the following structure: <pageserver workdir>/tenants/<tenant
id>. Tenant id has the same format as timeline id. Tenant id is included in
pageserver commands when needed. Also new commands are available in
pageserver: tenant_list, tenant_create. This is also reflected CLI.
During init default tenant is created and it's id is saved in CLI config,
so following commands can use it without extra options. Tenant id is also included in
compute postgres configuration, so it can be passed via ServerInfo to
safekeeper and in connection string to pageserver.
For more info see docs/multitenancy.md.
To simplify cloud ops, allow configuration via file.
toml is used as the config format, and the file is stored in the working
directory.
Arguments used at initialization are saved in the config file.
Config file params may be overridden by CLI arguments.
Use CLI args instead of environment variables to parameterize the
working directory and postgres distirbution.
Before this change, there was a mixture of environment variables and CLI
arguments that needed to be set. Moving to a single input simplifies
cloud configuration management.
Parse all the command line options before calling "zenith init" and
changing current working dir. The rest of the options don't make any
difference if we're initializing a new repository, but it seems strange
and error-prone to parse some arguments at different times.
It's created once early in server startup, after parsing the
command-line options, and never modified afterwards. To simplify
things, pass it around as static ref, instead of making copies in all
the different structs. We still pass around a reference to it, rather
than putting it in a global variable, to allow unit testing with
different configs in the same process.
Commit 746f667311 added the 'workdir' field and the get_*_path()
functions, with the idea that we cd into the directory at page server
startup, so that the get_*_path() functions can always return paths
relative to '.', but 'workdir' shows the original path to it. Change it
so that 'conf.workdir' is always set to '.', too, and the get_*_path()
functions include 'workdir' in the returned paths. Why? Because that
allows writing unit tests without changing the current directory.
When I was working on commit 97992226d3, I initially wrote the test so
that it changed the current working directory, just like commit 746f667311
did. But that was problematic, when I tried to add another unit test that
*also* wants to change the current working dir, because they could then
not run concurrently. In fact, they could not even run serially, unless
the current directory was carefully reset after the test. So it is better
to avoid changing the current directory in tests.
Commit 746f667311 moved the "chdir" earlier in the startup sequence,
before daemonizing. But it forgot to remove a corresponding chdir call
later in the sequence when not in daemonize mode. As a result, if you
tried to start the pageserver without the --daemonize option, it always
failed with "No such file or directory" error.
This patch started as an effort to support CLI working against remote
pageserver, but turned into a pretty big refactoring.
* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All tests-related code from local_env went into integration_tests/src/lib.rs as an
extension to PostgresNode trait.
* Paths-generating functions were concentrated around corresponding config
types (LocalEnv and PageserverConf).
Multiple fds writing to the same file doesn't work. One fd will
overwrite the output of the other fd. We were opening log files three
times (stdout, stderr, and slog).
The symptoms can be seen when the program panics; the final file will
have truncated or lost messages. After this change, all messages are
preserved. If panicking and logging are concurrent (and they definitely
can be), some of the messages may be interleaved in slightly
inconvenient ways.
File::try_clone() is essentially `dup` underneath, meaning the two will
share the same file offset.
This moves things around:
- The PageCache is split into two structs: Repository and Timeline. A
Repository holds multiple Timelines. In order to get a page version,
you must first get a reference to the Repository, then the Timeline
in the repository, and finally call the get_page_at_lsn() function
on the Timeline object. This sounds complicated, but because each
connection from a compute node, and each WAL receiver, only deals
with one timeline at a time, the callers can get the reference to
the Timeline object once and hold onto it. The Timeline corresponds
most closely to the old PageCache object.
- Repository and Timeline are now abstract traits, so that we can
support multiple implementations. I don't actually expect us to have
multiple implementations for long. We have the RocksDB
implementation now, but as soon as we have a different
implementation that's usable, I expect that we will retire the
RocksDB implementation. But I think this abstraction works as good
documentation in any case: it's now easier to see what the interface
for storing and loading pages from the repository is, by looking at
the Repository/Timeline traits. They abstract traits are in
repository.rs, and the RocksDB implementation of them is in
repository/rocksdb.rs.
- page_cache.rs is now a "switchboard" to get a handle to the
repository. Currently, the page server can only handle one
repository at a time, so there isn't much there, but in the future
we might do multi-tenancy there.