Commit Graph

192 Commits

Author SHA1 Message Date
Heikki Linnakangas
cffc979058 Fix a couple of typos in comments. 2021-05-27 14:32:04 +03:00
Heikki Linnakangas
cb6e2d9ddb Minor refactoring and cleanup of the Timeline interface.
Move `save_decoded_record` out of the Timeline trait. The storage
implementation shouldn't need to know how to decode records.

Also move put_create_database() out of the Timeline trait. Add a new
`list_rels` function to Timeline to support it, instead.

Rename `get_relsize` to `get_rel_size`, and `get_relsize_exists` to
`get_rel_exists`. Seems nicer.
2021-05-27 09:44:46 +03:00
Alexey Kondratov
b1a424dfa9 Add more info about borrowed from Postgres structures (RelTag and BufferTag) 2021-05-26 12:05:13 +03:00
Eric Seppanen
7c73afc1af switch repository types to serde
Derive Serialize+Deserialize for RelTag, BufferTag, CacheKey. Replace
handwritten pack/unpack functions with ser, des from
zenith_utils::bin_ser (which uses the bincode crate).

There are some ugly hybrids in walredo.rs, but those functions are
already doing a lot of questionable manual byte-twiddling, so hopefully
the weirdness will go away when we get better postgres protocol
wrappers.
2021-05-25 14:56:19 -07:00
Eric Seppanen
6f9175ca2d cargo fmt 2021-05-24 17:28:56 -07:00
Heikki Linnakangas
69fa10ff86 Fix rocksdb get_relsize() implementation to work with historic LSNs. 2021-05-24 17:12:18 +03:00
Heikki Linnakangas
d5fe515363 Implement "checkpointing" in the page server.
- Previously, we checked on first use of a timeline, whether there is
  a snapshot and WAL for the timeline, and loaded it all into the
  (rocksdb) repository. That's a waste of effort if we had done that
  earlier already, and stopped and restarted the server. Track the
  last LSN that we have loaded into the repository, and only load the
  recent missing WAL after that.

- When you create a new zenith repository with "zenith init",
  immediately load the initial empty postgres cluster into the rocksdb
  repository. Previously, we only did that on the first connection. This
  way, we don't need any "load from filesystem" codepath during normal
  operation, we can assume that the repository for a timeline is always
  up to date. (We might still want to use the functionality to import an
  existing PostgreSQL data directory into the repository in the future,
  as a separate Import feature, but not today.)
2021-05-24 17:02:05 +03:00
Heikki Linnakangas
6a9c036ac1 Revert all changes related to storing and restoring non-rel data in page server
This includes the following commits:

35a1c3d521 Specify right LSN in test_createdb.py
d95e1da742 Fix issue with propagation of CREATE DATABASE to the branch
8465738aa5 [refer #167] Fix handling of pg_filenode.map files in page server
86056abd0e Fix merge conflict: set initial WAL position to second segment because of pg_resetwal
2bf2dd1d88 Add nonrelfile_utils.rs file
20b6279beb Fix restoring non-relational data during compute node startup
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup

As well as some older changes related to storing CLOG and MultiXact data as
"pseudorelation" in the page server.

With this revert, we go back to the situtation that when you create a
new compute node, we ship *all* the WAL from the beginning of time to
the compute node. Obviously we need a better solution, like the code
that this reverts. But per discussion with Konstantin and Stas, this
stuff was still half-baked, and it's better for it to live in a branch
for now, until it's more complete and has gone through some review.
2021-05-24 16:05:45 +03:00
anastasia
6f9a582973 increase wait_lsn timeout to make tests more stable 2021-05-24 15:29:16 +03:00
anastasia
a0e23e6f3f Debug Timed out while waiting for WAL record problem 2021-05-24 15:29:16 +03:00
anastasia
84508d4f68 fix replay of nextMulti and nextMultiOffset fields 2021-05-24 15:17:35 +03:00
Eric Seppanen
4aabc9a682 easy clippy cleanups
Various things that clippy complains about, and are really easy to
fix.
2021-05-23 13:17:15 -07:00
Konstantin Knizhnik
d95e1da742 Fix issue with propagation of CREATE DATABASE to the branch 2021-05-21 12:06:46 +03:00
Stas Kelvich
c2b2ab974c Hide initdb output from "zenith init" command 2021-05-21 00:26:31 +03:00
Stas Kelvich
d45839879c Bind to socket earlier during pageserver init.
That allows printing reasonable error message instead of panicking if
address is already in use.
2021-05-21 00:26:31 +03:00
Heikki Linnakangas
2127a65e27 Tidy up the code to launch WAL redo process a little bit
- if removing the old datadir fails, throw an error
- obey PageServerConf.workdir
2021-05-20 19:29:00 +03:00
Heikki Linnakangas
ecf2d181c4 Tidy up the code to create PageServerConf
Parse all the command line options before calling "zenith init" and
changing current working dir. The rest of the options don't make any
difference if we're initializing a new repository, but it seems strange
and error-prone to parse some arguments at different times.
2021-05-20 19:28:57 +03:00
Konstantin Knizhnik
8465738aa5 [refer #167] Fix handling of pg_filenode.map files in page server 2021-05-20 19:16:16 +03:00
Konstantin Knizhnik
3645133700 Fix conflicts with main branch 2021-05-20 14:39:27 +03:00
Konstantin Knizhnik
20b6279beb Fix restoring non-relational data during compute node startup 2021-05-20 14:14:52 +03:00
Konstantin Knizhnik
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup 2021-05-20 14:13:47 +03:00
Alexey Kondratov
b5f60f3874 Issue #144: Refactor errors handling during branches tree printing 2021-05-20 12:49:04 +03:00
Alexey Kondratov
0ec56cd21f Issue #144: Branching output of zenith branch
* Add ancestor_id to pg_list->branch_list output of pageserver.
* Display branching point (LSN) for each non-root branch.
* Add tests for `zenith branch`.
2021-05-20 12:49:04 +03:00
Heikki Linnakangas
600e1a0080 Pass PageServerConf as static ref.
It's created once early in server startup, after parsing the
command-line options, and never modified afterwards. To simplify
things, pass it around as static ref, instead of making copies in all
the different structs. We still pass around a reference to it, rather
than putting it in a global variable, to allow unit testing with
different configs in the same process.
2021-05-20 09:11:36 +03:00
Stas Kelvich
9c0ac251df Describe BeMessage::ErrorResponse format in comments 2021-05-20 00:37:46 +03:00
Stas Kelvich
2f25d17e11 Set more error fields to satisfy rust-postgres parser 2021-05-20 00:37:46 +03:00
Stas Kelvich
8faa6fa392 Accept semicolon right after branch_create command 2021-05-20 00:37:46 +03:00
Stas Kelvich
4d5a41301d Support returning errors from page service 2021-05-20 00:37:46 +03:00
Heikki Linnakangas
e3e593f571 Don't send spurious ReadyForQuery messages in extended query protocol.
libpq tolerates and ignores them, but the Rust postgres client gets
confused by them in certain states. This explained the strange failure
I saw with the Copy Out protocol. I'm not sure what the condition was
exactly, but somehow the rust client got confused if it received a
ReadyForQuery message that it was not expecting.

Fixes https://github.com/zenithdb/zenith/issues/148.
2021-05-19 22:31:28 +03:00
Stas Kelvich
d59cb2ca7a clean up some leftovers after 746f66731 2021-05-19 22:17:48 +03:00
Heikki Linnakangas
e6a7241c3a Simplify construction of rocksdb keys and values.
I'm going nuts with the pattern:

    let k = iter.key().unwrap();
    buf.clear();
    buf.extend_from_slice(&k);
    let key = CacheKey::unpack(&mut buf);

Introduce helper functions to convert a CacheKey into BytesMut, and
from [u8] into CacheKey. Reduces the boilerplate code a lot.

The helper functions create a new BytesMut on each call, whereas the old
coding could reuse a single BytesMut, so this could be a bit slower. I
haven't tried measuring it, but at least it's not immediately noticeable,
and readability is much more imporatant at this point. We can optimize
later
2021-05-19 12:33:38 +03:00
Stas Kelvich
709b778904 Show help in CLI when no arguments provided 2021-05-19 12:32:57 +03:00
Heikki Linnakangas
aa8debf4e8 Add test for a relation that's larger than 1 GB.
This isn't very exciting with the current RocksDB implementation, because
it doesn't care about the PostgreSQL 1 GB segment boundaries at all.
But I think we will care about this in the future, and more tests is
generally better anyway.
2021-05-19 09:22:17 +03:00
Heikki Linnakangas
1912546e52 Change the meaning of PageServerConf.workdir
Commit 746f667311 added the 'workdir' field and the get_*_path()
functions, with the idea that we cd into the directory at page server
startup, so that the get_*_path() functions can always return paths
relative to '.', but 'workdir' shows the original path to it. Change it
so that 'conf.workdir' is always set to '.', too, and the get_*_path()
functions include 'workdir' in the returned paths. Why? Because that
allows writing unit tests without changing the current directory.

When I was working on commit 97992226d3, I initially wrote the test so
that it changed the current working directory, just like commit 746f667311
did. But that was problematic, when I tried to add another unit test that
*also* wants to change the current working dir, because they could then
not run concurrently. In fact, they could not even run serially, unless
the current directory was carefully reset after the test. So it is better
to avoid changing the current directory in tests.
2021-05-19 08:49:16 +03:00
Heikki Linnakangas
a6178c135f Fix starting page server in non-daemonize mode.
Commit 746f667311 moved the "chdir" earlier in the startup sequence,
before daemonizing. But it forgot to remove a corresponding chdir call
later in the sequence when not in daemonize mode. As a result, if you
tried to start the pageserver without the --daemonize option, it always
failed with "No such file or directory" error.
2021-05-19 08:49:09 +03:00
Heikki Linnakangas
66bced0f36 Fix leftover comment about async I/O 2021-05-18 20:47:35 +03:00
Patrick Insinger
f954d5c501 pageserver - separate pagestream messages 2021-05-17 17:17:08 -04:00
Heikki Linnakangas
e602807476 Be more lenient with branch names.
Notably, the "foo@0/12345678" syntax was not allowed, because '/' is not
a word character.
2021-05-17 20:44:00 +03:00
Eric Seppanen
398d522d88 cargo fmt 2021-05-17 09:29:58 -07:00
Stas Kelvich
746f667311 Refactor CLI and CLI<->pageserver interfaces to support remote pageserver
This patch started as an effort to support CLI working against remote
pageserver, but turned into a pretty big refactoring.

* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All tests-related code from local_env went into integration_tests/src/lib.rs as an
extension to PostgresNode trait.
* Paths-generating functions were concentrated around corresponding config
types (LocalEnv and PageserverConf).
2021-05-17 19:17:51 +03:00
Heikki Linnakangas
952424b78c Move save_decoded_record() function to Repository trait.
The function doesn't depend on the implementation of the Repository, it
only calls the public interface functions.
2021-05-17 15:16:28 +03:00
Konstantin Knizhnik
04dc698d4b Add support of twophase transactions 2021-05-16 00:03:20 +03:00
Heikki Linnakangas
6b11b4250e Fix compilation with older rust version.
Commit 9ece1e863d used `slice.fill`, which isn't available until Rust
v1.50.0. I have 1.48.0 installed, so it was failing to compile for me.

We haven't really standardized on any particular Rust version, and if
there's a good feature we need in a recent version, let's bump up the
minimum requirement. But this is simple enough to work around.
2021-05-15 01:42:33 +03:00
Konstantin Knizhnik
9ece1e863d Compute and restore pg_xact, pg_multixact and pg_filenode.map files 2021-05-14 16:35:09 +03:00
Heikki Linnakangas
97992226d3 Add some unit tests for the Repository/Timeline interface. 2021-05-14 12:44:52 +03:00
Heikki Linnakangas
270356ec38 Refactor WalRedoManager for easier testing.
Turn WalRedoManager into an abstract trait, so that it can be easily
mocked in unit tests.

One change here is that the WAL redo manager is no longer tied to a
specific zenith timeline. It didn't do anything with that information
aside from using it in the dummy datadir's name. We could use any
random string for that purpose, it's just to prevent two WAL redo
managers from stepping over each other. But this commit actually
changes things so that all timelines use the same WAL redo manager, so
that's not necessary. We will probably want to maintain a pool of WAL
redo processes in the future, but for now let's keep it simple.

In the passing, fix some comments.
2021-05-14 12:44:49 +03:00
Heikki Linnakangas
c2db828481 Create RocksDB databases under correct path.
We used to create them under .zenith/.zenith/<timelineid>. The double
.zenith was clearly not intentional. Change it to
.zenith/timelines/<timelineid>.

Fixes https://github.com/zenithdb/zenith/issues/127
2021-05-14 12:44:44 +03:00
anastasia
38c4b6f02f Move postgres code related to zenith pageserver to contrib/zenith.
- vendor/postgres changes
- Respective changes in RUST code: upload shared library, use new GUC names.
- Add contrib build to Makefile.
2021-05-13 16:23:21 +03:00
Eric Seppanen
6ff3f1b9fd don't open log files multiple times
Multiple fds writing to the same file doesn't work. One fd will
overwrite the output of the other fd. We were opening log files three
times (stdout, stderr, and slog).

The symptoms can be seen when the program panics; the final file will
have truncated or lost messages. After this change, all messages are
preserved. If panicking and logging are concurrent (and they definitely
can be), some of the messages may be interleaved in slightly
inconvenient ways.

File::try_clone() is essentially `dup` underneath, meaning the two will
share the same file offset.
2021-05-13 00:32:39 -07:00
Patrick Insinger
4c5e23d014 pageserver - fix ParameterStatus write call 2021-05-12 20:59:04 -04:00