Commit Graph

368 Commits

Author SHA1 Message Date
Stas Kelvich
746f667311 Refactor CLI and CLI<->pageserver interfaces to support remote pageserver
This patch started as an effort to support CLI working against remote
pageserver, but turned into a pretty big refactoring.

* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All tests-related code from local_env went into integration_tests/src/lib.rs as an
extension to PostgresNode trait.
* Paths-generating functions were concentrated around corresponding config
types (LocalEnv and PageserverConf).
2021-05-17 19:17:51 +03:00
Patrick Insinger
53ea6702bd zenith - pg list relax pageserver dependency 2021-05-17 11:14:01 -04:00
Heikki Linnakangas
952424b78c Move save_decoded_record() function to Repository trait.
The function doesn't depend on the implementation of the Repository, it
only calls the public interface functions.
2021-05-17 15:16:28 +03:00
Stas Kelvich
d737c40eec copy safekeeper README from older C version to current Rust version 2021-05-17 11:43:18 +03:00
Heikki Linnakangas
532918e13d Fix branch creation at a point other than end-of-WAL
When creating a new branch, we copied all WAL from the source timeline
to the new one, and it was being picked up and digested into the
repository on first use of the timeline. Fix by copying the WAL only
up to the branch's starting point.

We should probably move the branch-creation code from the CLI to page
server itself - that's what I was starting to hack on when I noticed this
bug - but let's fix this first.

Add a regression test. To test multiple branches, enhance the python
test fixture to manage multiple running Postgres instances. Also, for
convenience, add a function to the postgres fixture to open a connection
to the server with psycopg2.
2021-05-17 10:09:34 +03:00
Heikki Linnakangas
b266c28345 Use common Lsn datatype in a few more places
This isn't just cosmetic, this also fixes one bug: the code in
parse_point_in_time() function used str::parse::<u64>() to parse the
parts of the LSN string (e.g. 0/1A2B3C4D). That's wrong, because the
LSN consists of hex digits, not base-10.
2021-05-17 10:07:42 +03:00
Konstantin Knizhnik
04dc698d4b Add support of twophase transactions 2021-05-16 00:03:20 +03:00
Heikki Linnakangas
6b11b4250e Fix compilation with older rust version.
Commit 9ece1e863d used `slice.fill`, which isn't available until Rust
v1.50.0. I have 1.48.0 installed, so it was failing to compile for me.

We haven't really standardized on any particular Rust version, and if
there's a good feature we need in a recent version, let's bump up the
minimum requirement. But this is simple enough to work around.
2021-05-15 01:42:33 +03:00
Konstantin Knizhnik
15d1c1f8bf Update submodule version 2021-05-14 17:15:14 +03:00
Konstantin Knizhnik
9ece1e863d Compute and restore pg_xact, pg_multixact and pg_filenode.map files 2021-05-14 16:35:09 +03:00
anastasia
2870150365 bump vendor/postgres 2021-05-14 13:55:02 +03:00
anastasia
7b281900f9 Add a function to change postgresql.conf in python tests. Add test_config as an example 2021-05-14 13:55:02 +03:00
Heikki Linnakangas
97992226d3 Add some unit tests for the Repository/Timeline interface. 2021-05-14 12:44:52 +03:00
Heikki Linnakangas
270356ec38 Refactor WalRedoManager for easier testing.
Turn WalRedoManager into an abstract trait, so that it can be easily
mocked in unit tests.

One change here is that the WAL redo manager is no longer tied to a
specific zenith timeline. It didn't do anything with that information
aside from using it in the dummy datadir's name. We could use any
random string for that purpose, it's just to prevent two WAL redo
managers from stepping over each other. But this commit actually
changes things so that all timelines use the same WAL redo manager, so
that's not necessary. We will probably want to maintain a pool of WAL
redo processes in the future, but for now let's keep it simple.

In the passing, fix some comments.
2021-05-14 12:44:49 +03:00
Heikki Linnakangas
c2db828481 Create RocksDB databases under correct path.
We used to create them under .zenith/.zenith/<timelineid>. The double
.zenith was clearly not intentional. Change it to
.zenith/timelines/<timelineid>.

Fixes https://github.com/zenithdb/zenith/issues/127
2021-05-14 12:44:44 +03:00
Eric Seppanen
71e93faed7 fix endian typos in BeSer
Cut/paste error: BeSer was using the little-endian config in two places.

Add better unit tests so this can't happen again.
2021-05-13 19:04:17 -07:00
Eric Seppanen
54d52e07db .gitignore integration_tests/.zenith
It's a bit annoying that the .zenith state can show up in multiple
places, but since this is how the regression tests run if you launch
them from the git root directory, ignore this one too.
2021-05-13 13:47:22 -07:00
Heikki Linnakangas
4dccdb33ab Fix comment formatting.
The module comment should use "//!" instead of "///". Otherwise, it is
considered to apply to the *next* thing, in this case the "use" statement
that follows, not the file as whole. "cargo fmt" revealed this by insisting
to move the "use crate::pg_constants" line to before the comment.
2021-05-13 22:06:05 +03:00
anastasia
38c4b6f02f Move postgres code related to zenith pageserver to contrib/zenith.
- vendor/postgres changes
- Respective changes in RUST code: upload shared library, use new GUC names.
- Add contrib build to Makefile.
2021-05-13 16:23:21 +03:00
Eric Seppanen
6ff3f1b9fd don't open log files multiple times
Multiple fds writing to the same file doesn't work. One fd will
overwrite the output of the other fd. We were opening log files three
times (stdout, stderr, and slog).

The symptoms can be seen when the program panics; the final file will
have truncated or lost messages. After this change, all messages are
preserved. If panicking and logging are concurrent (and they definitely
can be), some of the messages may be interleaved in slightly
inconvenient ways.

File::try_clone() is essentially `dup` underneath, meaning the two will
share the same file offset.
2021-05-13 00:32:39 -07:00
Patrick Insinger
4c5e23d014 pageserver - fix ParameterStatus write call 2021-05-12 20:59:04 -04:00
Patrick Insinger
99d80aba52 use pageserver for pg list command 2021-05-12 12:34:03 +03:00
Konstantin Knizhnik
2f2dff4c8d Merge with main brnach 2021-05-12 10:46:01 +03:00
Konstantin Knizhnik
22e7fcbf2d Handle visbility map updates in WAL redo 2021-05-12 10:38:43 +03:00
Patrick Insinger
372617a4f5 test_runner - pgrep remove -c arg
macOS doesn't support it
2021-05-11 17:52:22 -04:00
Patrick Insinger
49d1921a28 page_server - add python api tests 2021-05-11 14:16:22 -04:00
Patrick Insinger
d8e509d29e page_service - use anyhow for error handling 2021-05-11 14:11:10 -04:00
Patrick Insinger
d5bfe84d9e cargo fmt 2021-05-11 12:35:09 -04:00
Arseny Sher
8fff26ad49 Make Repository API return abstract dyn Timeline.
+ minor cargo fmt cleanup
2021-05-11 15:27:23 +03:00
Heikki Linnakangas
5f4e32f505 Require valid WAL streaming point.
If timeline doesn't have a valid "last valid LSN", refuse WAL streaming.
The previous behavior was to start streaming from the very beginning of
time. That was needed to support bootstrapping the page server with no
data at all (see commit bd606ab37a), but we no longer do that.
2021-05-11 11:12:14 +03:00
Heikki Linnakangas
fb71c85a79 Implement std::fmt::Display for RelTag, for debug messages. 2021-05-11 10:55:51 +03:00
Heikki Linnakangas
ff76226a35 Remove obsolete mgmt-console.
It has served its purpose. A new management console is in the works. The
old code is available in git history if we need anything from it.
2021-05-11 10:54:41 +03:00
Eric Seppanen
6e748147b6 test_runner: fix relative import syntax
Somehow I never learned this part correctly: relative imports use the
syntax "import .file" for a file sitting in the same directory.

This error wasn't terribly obvious, but the Pylance linter is yelling at
me so I'll fix it now before anyone else notices.
2021-05-11 00:09:39 -07:00
Eric Seppanen
e5df42feef add workspace_hack dependency to zenith_utils
I didn't think this mattered, but it does: if you add a dependency to
zenith_utils, but forget to request a feature you need, the crate will
build from the workspace root, but not by itself.

It's probably better to pull in the whole dependency tree.

This leaves one problem unsolved: the missing feature above will now be
a latent bug. If that feature gets removed later by other crates, and
then the workspace_hack Cargo.toml is updated, this missing feature will
become a build failure.
2021-05-10 18:21:45 -07:00
Eric Seppanen
73647e5715 wal_service: fix NodeId order/endian issues
Add fixes suggested in code review.

In a previous commit, I changed the NodeId field order and types to try
to preserve the exact serialization that was happening. Unfortunately,
that serialization was incorrect and the original struct was mostly
correct.

Change uuid to be a [u8; 16] as it was intended to be a byte array; that
will clearly indicate to serde serializers that no endian swaps will
ever be needed.
2021-05-10 16:21:05 -07:00
Eric Seppanen
95db33f3f9 wal_service: comment cleanup 2021-05-10 16:21:05 -07:00
Eric Seppanen
bace19ffbe wal_service: switch to Lsn type
Replace XLogRecPtr with Lsn in wal_service.rs .

This removes the last use of XLogSegmentOffset and XLByteToSeg, so
delete them. (replaced by Lsn::segment_offset and Lsn::segment_number.)
2021-05-10 16:21:05 -07:00
Eric Seppanen
60d66267a9 add serde support to Lsn type
A serialized Lsn and a serialized u64 should be identical.
2021-05-10 16:21:05 -07:00
Eric Seppanen
294320e6a8 wal_service: drop repr(C)
The C memory representation is only needed if we want to guarantee the
same memory layout as some other program. Since we're using serde to
serialize these data structures, we can let the compiler do what it
wants.
2021-05-10 16:21:05 -07:00
Eric Seppanen
28b4d9abb3 wal_service: use anyhow for error handling
We may eventually want precise error types for some of this, but
anyhow::Error is a lot easier than trying to force io::Error.
2021-05-10 16:21:05 -07:00
Eric Seppanen
8d8bc304c1 work around NodeId endian issues
Instead of playing games during serialize/deserialize, just treat
NodeId::term as an 8-byte array instead of a u64.
2021-05-10 16:21:05 -07:00
Eric Seppanen
4788248e11 wal_service: remove manual serialization code
Commit to serde for serialization of data structures.
2021-05-10 16:21:05 -07:00
Eric Seppanen
0cbb3798da try using serde to do all the serialization in wal_service
This version validates on every call that our result is exactly the same
as the previous result.

NodeId is a strange corner case: one field is serialized little-endian
and one field is serialized big-endian. Hopefully we can fix that in the
future.
2021-05-10 16:21:05 -07:00
Eric Seppanen
36c12247b9 add bin_ser module
This module adds two traits that implement bincode-based serialization.
BeSer implements methods for big-endian encoding/decoding.
LeSer implements methods for little-endian encoding/decoding.

Right now, the BeSer and LeSer methods have the same names, meaning you
can't `use` them both at the same time. This is intended to be a safety
mechanism: mixing big-endian and little-endian encoding in the same file
is error-prone. There are ways around this, but the easiest fix is to
put the big-endian code and little-endian code in different files or
submodules.
2021-05-10 16:21:05 -07:00
Eric Seppanen
1767208563 remove tokio-postgres from dependencies 2021-05-10 15:24:55 -07:00
Eric Seppanen
d25656797c switch pageserver to blocking postgres interface 2021-05-10 15:24:55 -07:00
Eric Seppanen
6c825dcbaa switch walkeeper over to new postgres blocking interface
This is a big async -> sync conversion. Most of it is a pretty
straightforward conversion of removing `async` and `.await` and swapping
in the right std modules.

I didn't find a thread-blocking version of `Notify` so I wrote one, and
then realized that there was already a Mutex being used there, so I
deleted my Notify and just used Condvar instead.

There is one part that seems odd to me: in `handle_start_replication`
there is a place where the previous code was doing a non-blocking read;
there is no TcpStream::try_read() so I fell back on manually flipping
the socket to non-blocking mode and then back again. This seems pretty
gross, but I'm not sure exactly what to replace this with: a background
thread? Extract the fd and run select() on it to first test if it's
readable?
2021-05-10 15:24:55 -07:00
Eric Seppanen
4b46693c81 adapt to new upstream tokio-postgres replication interface
Switch over to a newer version of rust-postgres PR752. A few
minor changes are required:
- PgLsn::UNDEFINED -> PgLsn::from(0)
- PgTimestamp -> SystemTime
2021-05-10 15:24:55 -07:00
Eric Seppanen
8952066ecb circleci: Save the postgres logs as artifacts 2021-05-09 22:20:58 -07:00
Eric Seppanen
d26b76fe7c cargo fmt 2021-05-07 13:11:44 -07:00