Commit Graph

1650 Commits

Author SHA1 Message Date
Arseny Sher
54b75248ff s3 WAL offloading staging review.
- Uncomment accidently `self.keep_alive.abort()` commented line, due to this
  task never finished, which blocked launcher.
- Mess up with initialization one more time, to fix offloader trying to back up
  segment 0. Now we initialize all required LSNs in handle_elected,
  where we learn start LSN for the first time.
- Fix blind attempt to provide safekeeper service file with remote storage
  params.
2022-05-27 14:02:52 +04:00
Arseny Sher
0e1bd57c53 Add WAL offloading to s3 on safekeepers.
Separate task is launched for each timeline and stopped when timeline doesn't
need offloading. Decision who offloads is done through etcd leader election;
currently there is no pre condition for participating, that's a TODO.

neon_local and tests infrastructure for remote storage in safekeepers added,
along with the test itself.

ref #1009

Co-authored-by: Anton Shyrabokau <ahtoxa@Antons-MacBook-Pro.local>
2022-05-27 06:19:23 +04:00
bojanserafimov
1d71949c51 Change proxy welcome message (#1808)
Remove zenith sun and outdated instructions around .pgpass
2022-05-26 14:59:03 -04:00
Thang Pham
7d565aa4b9 Reduce the logging level when PG client disconnected to INFO (#1713)
Fixes #1683.
2022-05-26 12:21:15 -04:00
Dmitry Rodionov
72a7220dc8 Tidy up some log messages
* turn println into an info with proper message
* rename new_local_timeline to load_local_timeline because it does not
  create new timeline, it registers timeline that exists on disk in
  pageserver in-memory structures
2022-05-26 18:37:40 +03:00
Konstantin Knizhnik
b0d114ee3f Initialize last_freeze_at with disk consistent LSN to avoid creation of small L0 delta layer on startup
refer #1736
2022-05-26 15:42:18 +03:00
Dmitry Rodionov
38f2d165b7 allow TLS 1.2 in proxy to be compatible with older client libraries 2022-05-26 13:21:29 +03:00
Dmitry Rodionov
5a5737278e add simple metrics for remote storage operations
track number of operations and number of their failures
2022-05-26 01:24:52 +03:00
Kirill Bulatov
06f5e017a1 Move rustfmt check to GH Action 2022-05-26 01:03:48 +03:00
Kirill Bulatov
887b0e14d9 Run basic checks on PRs and pushes to main only 2022-05-26 01:03:48 +03:00
chaitanya sharma
c584d90bb9 initial commit, renamed znodeid to nodeid. 2022-05-25 20:11:26 +03:00
Heikki Linnakangas
7997fc2932 Fix error handling with 'basebackup' command.
If the 'basebackup' command failed in the middle of building the tar
archive, the client would not report the error, but would attempt to
to start up postgres with the partial contents of the data directory.
That fails because the control file is missing (it's added to the
archive last, precisly to make sure that you cannot start postgres
from a partial archive). But the client doesn't see the proper error
message that caused the basebackup to fail in the server, which is
confusing.

Two issues conspired to cause that:

1. The tar::Builder object that we use in the pageserver to construct
the tar stream has a Drop handler that automatically writes a valid
end-of-archive marker on drop. Because of that, the resulting tarball
looks complete, even if an error happens while we're building it. The
pageserver does send an ErrorResponse after the seemingly-valid
tarball, but:

2. The client stops reading the Copy stream, as soon as it sees the
tar end-of-archive marker. Therefore, it doesn't read the
ErrorResponse that comes after it.

We have two clients that call 'basebackup', one in `control_plane`
used by the `neon_local` binary, and another one in
`compute_tools`. Both had the same issue.

This PR fixes both issues, even though fixing either one would be
enough to fix the problem at hand. The pageserver now doesn't send the
end-of-archive marker on error, and the client now reads the copy
stream to the end, even if it sees an end-of-archive marker.

Fixes github issue #1715

In the passing, change Basebackup to use generic Write rather than
'dyn'.
2022-05-25 18:14:44 +03:00
Heikki Linnakangas
24d2313d0b Set --quota-backend-bytes when launching etcd in tests.
By default, etcd makes a huge 10 GB mmap() allocation when it starts up.
It doesn't actually use that much memory, it's just address space, but
it caused me grief when I tried to use 'rr' to debug a python test run.
Apparently, when you replay the 'rr' trace, it does allocate memory for
all that address space.

The size of the initial mmap depends on the --quota-backend-bytes setting.
Our etcd clusters are very small, so let's set --quota-backend-bytes to
keep the virtual memory size small, to make debugging with 'rr' easier.

See https://github.com/etcd-io/etcd/issues/7910 and
5e4b008106
2022-05-25 16:57:45 +03:00
Andrey Taranik
9ab52e2186 helm repository name fix for production proxy deploy (#1790) 2022-05-25 15:41:18 +03:00
Heikki Linnakangas
6f1f33ef42 Improve error messages on seccomp loading errors.
Bump vendor/postgres for https://github.com/neondatabase/postgres/pull/166
2022-05-25 14:33:06 +03:00
Andrey Taranik
703f691df8 production inventory update (#1779) 2022-05-25 14:30:50 +03:00
Arseny Sher
2b265fd6dc Disable restart_after_crash in neon_local.
It is pointless when basebackup is invalid.
2022-05-25 14:48:11 +04:00
Sergey Melnikov
d32b491a53 Add zenith-us-stage-sk-6 to deploy (#1728) 2022-05-25 10:31:10 +03:00
Kirill Bulatov
541ec25875 Properly shutdown test mock S3 server 2022-05-24 19:09:31 +03:00
KlimentSerafimov
8346aa3a29 Potential fix to #1626. Fixed typo is Makefile. (#1781)
* Potential fix to #1626. Fixed typo is Makefile.
* Completed fix to #1626.

Summary:
changed 'error' to 'bail' in start_pageserver and start_safekeeper.
2022-05-24 04:55:38 -04:00
Heikki Linnakangas
2aceb6a309 Fix garbage collection to not remove image layers that are still needed.
The logic would incorrectly remove an image layer, if a new image layer
existed, even though the older image layer was still needed by some
delta layers after it. See example given in the comment this adds.

Without this fix, I was getting a lot of "could not find data for key
010000000000000000000000000000000000" errors from GC, with the new test
case being added in PR #1735.

Fixes #707
2022-05-23 20:58:27 +03:00
KlimentSerafimov
3ff5caf786 Add to readme install protobuf etcd (#1777)
* Update installation instructions
* Added libprotobuf-dev etcd to apt install

Added "brew install protobuf etcd" to OSX installation instructions.
Added "sudo apt install libprotobuf-dev etcd" to Linux installation instructions.
Without these, cargo build complains. 
Figured out in collaboration with Bojan.
2022-05-23 13:11:59 -04:00
chaitanya sharma
fbedd535c0 Replace a bunch of zenith references with neon. 2022-05-23 13:16:00 +03:00
Egor Suvorov
89e5659f3f Replace COPYRIGHT file from the root with NOTICE file
The primary reason: make GitHub detect that we use Apache License 2.0
They do it via https://github.com/licensee/licensee Ruby library (gem).

Our COPYRIGHT file contains a part of the Apache License, which should
be added to a source file, not the license or copyright information itself,
which confuses the library.

Instead, the recommended way is to create a NOTICE file which references
license of the code and its bundled dependencies.
2022-05-23 01:03:03 +02:00
Egor Suvorov
ef7cdb13e2 Remove unused dependencies from poetry.lock via poetry lock --no-update
There were a bunch of dependencies for Python <3.9. They are not needed
after #1254. This commit makes it easier to add/remove dependencies because
lock file will be updated like this on any such operation.

Do not update dependencies yet to not break anything.
2022-05-21 12:21:45 +02:00
Egor Suvorov
73187bfef1 postgres_ffi: find_end_of_wal_segment: clarify code around xl_crc retrieval
It would be better to not update xl_crc/rec_hdr at all when skipping contrecord,
but I would prefer to keep PR #1574 small.
Better audit of `find_end_of_wal_segment` is coming anyway in #544.
2022-05-21 05:25:17 +02:00
Egor Suvorov
967eb38e81 postgres_ffi: find_end_of_wal_segment: fix contrecord skipping
Also enable corresponding test.
2022-05-21 05:25:17 +02:00
Egor Suvorov
a124e44866 postgres_ffi: find_end_of_wal_segment: add lots of trace 2022-05-21 05:25:17 +02:00
Egor Suvorov
c4b77084af utils: add const_assert! macro 2022-05-21 05:25:17 +02:00
Egor Suvorov
c9efdec8db postgres_ffi: find_end_of_wal_segment: improve name of wal_crc variable
Now it reflects the field it's mirroring.
2022-05-21 05:25:17 +02:00
Egor Suvorov
12b7c793b3 postgres_ffi: find_end_of_wal_segment: remove redundant CRC operations
Previous invariant: `crc` contains an "unfinalized" CRC32 value,
its one complement, like in postgres before FIN_CRC32C.

New invariant: `crc` always contains a "finalized" CRC32 value,
this is the semantics of crc32c_append, so we don't need to invert CRC manually.
2022-05-21 05:25:17 +02:00
Egor Suvorov
3c6890bf1d postgres_ffi: add complex WAL tests for find_end_of_wal
* Actual generation logic is in a separate crate `postgres_ffi/wal_generate`
* The create also provides a binary for debug purposes akin to `initdb`
* Two tests currently fail and are ignored
* There is no easy way to test this directly in Safekeeper as it starts restoring from commit_lsn.
  So testing would require disconnecting Safekeeper just after it has received the WAL,
  but before it is committed.
2022-05-21 05:25:17 +02:00
Andrey Taranik
d97617ed3a updated proxy and proxy scram deployment for prod and stress environments (#1758) 2022-05-20 23:12:30 +03:00
KlimentSerafimov
65cf1a3221 Added paths to openssl includes and libraries for OSX because make complained that it couldn't find them. (#1761) 2022-05-20 12:02:51 -04:00
bojanserafimov
a4aef5d8dc Compile psql with openssl (#1725) 2022-05-19 12:25:31 -04:00
Heikki Linnakangas
ffbb9dd155 Add a 5 minute timeout to python tests.
The CI times out after 10 minutes of no output. It's annoying if a
test hangs and is killed by the CI timeout, because you don't get
information about which test was running. Try to avoid that, by adding
a slightly smaller timeout in pytest itself. You can override it on a
per-test basis if needed, but let's try to keep our tests shorter than
that.

For the Postgres regression tests, use a longer 30 minute timeout.
They're not really a single test, but many tests wrapped in a single
pytest test. It's OK for them to run longer in aggregate, each
Postgres test is still fairly short.
2022-05-19 14:04:14 +03:00
Egor Suvorov
baf7a81dce git-upload: pass committer to 'git rebase' (fix #1749) (#1750)
No committer was specified, which resulted in failing `git rebase` if
the branch is not up-to-date.
2022-05-19 14:01:03 +03:00
Heikki Linnakangas
ee3bcf108d Fix compact_level0 for delta layers with overlap or gaps
We saw a case in staging, where there was a gap in the LSN ranges of
level 0 files, like this:

    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001696070-00000000016960E9
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000016960E9-00000000016E4DB9
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000016E4DB9-000000000BFCE3E1
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__000000000BFCE3E1-000000000BFD0FE9
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000060045901-000000007005EAC1
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__000000007005EAC1-0000000080062E99
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000080062E99-000000009007F481
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__000000009007F481-00000000A009F7C9
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000A009F7C9-00000000AA284EB9
    000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000AA286471-00000000AA2886B9

Note that gap between 000000000BFD0FE9 and 0000000060045901. I don't
know how that happened, but in general the pageserver should be robust
if there are gaps like that, or overlapping files etc. In theory they
could happen as result of crashes, partial downloads from S3 etc.,
although it is mystery what caused it in this case.

Looking at the compaction code, it was not safe in the face of gaps
like that. The compaction routine collected all the level 0 files, and
took their min(start)..max(end) as the range of the new files it
builds. That's wrong, if the level 0 files don't cover the whole LSN
range; the newly created files will miss any records in the gap. Fix
that, by only collecting contiguous sequences of level 0 files, so
that the end LSN of previous delta file is equal to the start of the
next one.

Fixes issue #1730
2022-05-19 10:19:38 +03:00
Heikki Linnakangas
0da4046704 Include traversal path in error message.
Previously, the path was printed to the log with separate error!() calls.
It's better to include the whole path in the error object and have it
printed to the log as one message.

Also print the path in the ValueReconstructResult::Missing case.

This is what it looks like now:

    2022-05-17T21:53:53.611801Z ERROR pagestream{timeline=5adcb4af3e95f00a31550d266aab7a37 tenant=74d9f9ad3293c030c6a6e196dd91c60f}: error reading relation or page version: could not find data for key 000000067F000032BE000000000000000001 at LSN 0/1698C48, for request at LSN 0/1698CF8

    Caused by:
        0: layer traversal: result Complete, cont_lsn 0/1698C48, layer: 000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__0000000001698C48-0000000001698CC1
        1: layer traversal: result Continue, cont_lsn 0/1698CC1, layer: inmem-0000000001698CC1-FFFFFFFFFFFFFFFF

    Stack backtrace:
2022-05-19 10:19:38 +03:00
Anastasia Lubennikova
cbd00d7ed9 Remove temp layer files during timeline initialization on pageserver start 2022-05-19 10:11:12 +03:00
Anastasia Lubennikova
4c30ae8ba3 Add random string as a part of tempfile name 2022-05-19 10:11:12 +03:00
Anastasia Lubennikova
3da4b3165e Fsync layer files before rename 2022-05-19 10:11:12 +03:00
Anastasia Lubennikova
c1b365fdf7 Use temp filename while writing ImageLayer file 2022-05-19 10:11:12 +03:00
Egor Suvorov
fab104d5f3 docs/sourcetree: add note about exact Python version used and how to choose it 2022-05-19 00:09:13 +02:00
Egor Suvorov
7dd27ecd20 Bump minimal supported Python version to 3.9
Most of the CI already run with Python 3.9 since https://github.com/neondatabase/docker-images/pull/1
2022-05-19 00:09:13 +02:00
Egor Suvorov
bd2979d02c CirleCI/check-codestyle-python: print versions 2022-05-19 00:09:13 +02:00
Dmitry Rodionov
5914aab78a add comments, use expect instead of unwrap 2022-05-19 00:54:14 +03:00
Heikki Linnakangas
4a36d89247 Avoid spawning a layer-flush thread when there's no work to do.
The check_checkpoint_distance() always spawned a new thread, even if
there is no frozen layer to flush. That was a thinko, as @knizhnik
pointed out.
2022-05-19 00:51:48 +03:00
Egor Suvorov
432907ff5f Safekeeper: avoid holding mutex when deleting a tenant (#1746)
Following discussion with @arssher after #1653
2022-05-18 23:02:17 +03:00
Arthur Petukhovsky
98da0aa159 Add _total suffix to metrics name (#1741) 2022-05-18 15:17:04 +03:00