Commit Graph

1472 Commits

Author SHA1 Message Date
Heikki Linnakangas
a4700c9bbe Use pprof to get flamegraph of get_page and get_relsize requests.
This depends on a hacked version of the 'pprof-rs' crate. Because of
that, it's under an optional 'profiling' feature. It is disabled by
default, but enabled for release builds in CircleCI config. It doesn't
currently work on macOS.

The flamegraph is written to 'flamegraph.svg' in the pageserver
workdir when the 'pageserver' process exits.

Add a performance test that runs the perf_pgbench test, with profiling
enabled.
2022-04-21 20:32:48 +03:00
Heikki Linnakangas
dafdf9b952 Handle EINTR 2022-04-21 16:37:36 +03:00
Heikki Linnakangas
263d60f12d Add prometheus metric for time spent waiting for WAL to arrive 2022-04-21 16:37:32 +03:00
Arseny Sher
abcd7a4b1f Insert less data in test_wal_restore.
Otherwise it sometimes hits 2m statement timeout in CI.
2022-04-21 16:00:15 +04:00
Kirill Bulatov
81cad6277a Move and library crates into a dedicated directory and rename them 2022-04-21 13:30:33 +03:00
Kirill Bulatov
629688fd6c Drop redundant resolver setting for 2021 edition 2022-04-21 13:30:33 +03:00
Heikki Linnakangas
9d3779c124 Add a counter for materialized page cache hits. 2022-04-20 21:26:03 +03:00
Heikki Linnakangas
334a1d6b5d Fix materialized page caching with delta layers.
We only checked the cache page version when collecting WAL records in
an in-memory layer, not in a delta layer. Refactor the code so that we
always stop collecting WAL records when we reach a cached materialized
page.

Fix the assertion on the LSN range in
InMemoryLayer::get_value_reconstruct_data. It was supposed to check
that the requested LSN range is within the layer's LSN range, but the
inequality was backwards. That went unnoticed before, because the
caller always passed the layer's start LSN as the requested LSN
range's start LSN, but now we might stop the search earlier, if we have
a cached page version.

Co-authored-by: Konstantin Knizhnik <knizhnik@zenith.tech>
2022-04-20 21:25:59 +03:00
Dmitry Rodionov
e41ad3be0f add more context to writeback error 2022-04-20 17:07:07 +03:00
Heikki Linnakangas
e113c6fa8d Print a warning if unlinking an ephemeral file fails.
Unlink failure isn't serious on its own, we were about to remove the
file anyway, but it shouldn't happen and could be a symptom of
something more serious.

We just saw "No such file or directory" errors happening from
ephemeral file writeback in staging, and I suspect if we had this
warning in place, we would have seen these warnings too, if the
problem was that the ephemeral file was removed before dropping the
EphemeralFile struct. Next time it happens, we'll have more
information.
2022-04-20 16:23:16 +03:00
Heikki Linnakangas
cbdfd8c719 Update 'routerify' dependency in proxy.
routerify version 3 is used in zenith_utils, use the same version in proxy
to avoid having to build two versions.
2022-04-20 14:42:05 +03:00
Heikki Linnakangas
86bf4301b7 Remove unnecessary dependency on 'webpki' 2022-04-20 14:36:54 +03:00
Heikki Linnakangas
9eaa21317c Update jsonwebtoken crate.
With this, we no longer need to build two versions of 'pem' and 'base64'
crates. Introduces a duplicate version of 'time' crate, though, but it's
still progress.
2022-04-20 14:27:49 +03:00
Heikki Linnakangas
e660e12f79 Update rustls-split and rustls versions.
All dependencies now use rustls 0.20.2, so we no longer need to build two
versions of it.
2022-04-20 14:07:55 +03:00
Konstantin Knizhnik
ac52f4f2d6 Set superuser when initializing database for wal recovery (#1544) 2022-04-20 13:24:38 +03:00
Heikki Linnakangas
5e95338ee9 Improve logging in test_wal_restore.py
- Capture the output of the restore_from_wal.sh in a log file
- Kill "restored" Postgres server on test failure
2022-04-20 11:18:40 +03:00
Heikki Linnakangas
170badd626 Capture the postgres log in all tests that start a vanilla Postgres. 2022-04-20 11:18:40 +03:00
Kirill Bulatov
91fb21225a Show more logs during S3 sync 2022-04-20 02:57:03 +03:00
Kirill Bulatov
3e6087a12f Remove S3 archiving 2022-04-19 23:13:52 +03:00
Kirill Bulatov
44bfc529f6 Require specifying the upload size in remote storage 2022-04-19 23:13:52 +03:00
bojanserafimov
ef72eb84cf Remove zenfixture (#1534) 2022-04-19 09:46:47 -04:00
Kirill Bulatov
a1e34772e5 Improve compute error logging 2022-04-19 00:20:08 +03:00
Stas Kelvich
389bd1faeb Support for SCRAM-SHA-256 in compute tools 2022-04-18 22:19:01 +03:00
Anastasia Lubennikova
c15aa04714 Move Cluster size limit RFC from rfcs repo 2022-04-18 18:11:31 +03:00
Kirill Bulatov
52e0816fa5 wal_acceptor -> safekeeper 2022-04-18 12:52:31 +03:00
Kirill Bulatov
81417788c8 walkeeper -> safekeeper 2022-04-18 12:52:31 +03:00
Kirill Bulatov
81879f8137 Restore missing cachepot env vars 2022-04-18 12:32:04 +03:00
Arseny Sher
5b29774532 Small refactoring after ec3bc74165.
Move record_safekeeper_info inside safekeeper.rs, fix commit_lsn update, sync
control file.
2022-04-18 13:11:34 +04:00
Kirill Bulatov
0ca2bd929b Remove log crate from pageserver 2022-04-18 00:00:36 +03:00
Kirill Bulatov
9b7dcc2bae Use proper cachepot bucket 2022-04-17 16:35:40 +03:00
Kirill Bulatov
3136a0754a Use mold in Docker images 2022-04-17 00:50:28 +03:00
Kirill Bulatov
787f0d33f0 Use another cachepot bucket for rust Docker build caches 2022-04-16 23:36:42 +03:00
Kirill Bulatov
ed5f9acca9 Revert "Revert libc upgrade" (#1527)
This reverts commit 4bc338babc.
2022-04-16 13:38:48 +03:00
Kirill Bulatov
4bc338babc Revert libc upgrade 2022-04-16 10:03:26 +03:00
Kirill Bulatov
3ab090b43a Fix compute tools build 2022-04-15 23:12:35 +03:00
Kirill Bulatov
7126979950 Remove custom neon Docker build image 2022-04-15 20:08:22 +03:00
Arseny Sher
9946cd1125 Bump vendor/postgres to add safekeeper connection timeout. 2022-04-15 20:44:56 +04:00
Dmitry Ivanov
ab20f2c491 Use the same version of rust-postgres everywhere. (#1516)
Turns out we still had a stale dep in `compute_tools`.
2022-04-15 18:36:11 +03:00
Dmitry Ivanov
c9d897f9b6 [proxy] Update rustls (#1510) 2022-04-15 12:06:25 +03:00
Kirill Bulatov
e97f94cc30 Bump rustc version 2022-04-14 23:01:06 +03:00
Dmitry Rodionov
2cb39a1624 add missing files, update workspace hack 2022-04-14 20:41:21 +03:00
Heikki Linnakangas
93e0ac2b7a Remove a couple of unused dependencies.
Found by "cargo-udeps"
2022-04-14 17:38:26 +03:00
bojanserafimov
d5ae9db997 Add s3 cost estimate to tests (#1478) 2022-04-14 10:09:03 -04:00
Heikki Linnakangas
9e4de6bed0 Use RwLock instad of Mutex for layer map lock.
For more concurrency
2022-04-14 13:34:01 +03:00
Heikki Linnakangas
4a8c663452 Refactor pgbench tests.
- Remove batch_others/test_pgbench.py. It was a quick check that pgbench
  works, without actually recording any performance numbers, but that
  doesn't seem very interesting anymore. Remove it to avoid confusing it
  with the actual pgbench benchmarks

- Run pgbench with "-n" and "-S" options, for two different workloads:
  simple-updates, and SELECT-only. Previously, we would only run it with
  the "default" TPCB-like workload. That's more or less the same as the
  simple-update (-n) workload, but I think the simple-upload workload
  is more relevant for testing storage performance. The SELECT-only
  workload is a new thing to measure.

- Merge test_perf_pgbench.py and test_perf_pgbench_remote.py. I added
  a new "remote" implementation of the PgCompare class, which allows
  running the same tests against an already-running Postgres instance.

- Make the PgBenchRunResult.parse_from_output function more
  flexible. pgbench can print different lines depending on the
  command-line options, but the parsing function expected a particular
  set of lines.
2022-04-14 13:31:42 +03:00
Heikki Linnakangas
a009fe912a Refactor connection option handling in python tests
The PgProtocol.connect() function took extra options for username,
database, etc. Remove those options, and have a generic way for each
subclass of PgProtocol to provide some default options, with the
capability override them in the connect() call.
2022-04-14 13:31:40 +03:00
Heikki Linnakangas
19954dfd8a Refactor proxy options test to not rely on the 'schema' argument.
It was the only test that used the 'schema' argument to the connect()
function. I'm about to refactor the option handling and will remove
the special 'schema' argument altogether, so rewrite the test to not
use it.
2022-04-14 13:31:37 +03:00
Heikki Linnakangas
570db6f168 Update README for Zenith -> Neon renaming.
There's a lot of renaming left to do in the code and docs, but this is
a start. Our binaries and many other things are still called "zenith",
but I didn't change those in the README, because otherwise the
examples won't work. I added a brief note at the top of the README to
explain that we're in the process of renaming, until we've renamed
everything.
2022-04-14 11:30:01 +03:00
Arthur Petukhovsky
cdf04b6a9f Fix control file updates in safekeeper (#1452)
Now control_file::Storage implements Deref for read-only access to the state. All updates should clone the state before modifying and persisting.
2022-04-14 09:31:35 +03:00
Dhammika Pathirana
a0781f229c Add ps compact command
Signed-off-by: Dhammika Pathirana <dhammika@gmail.com>

Add ps compact command to api (#707) (#1484)
2022-04-13 22:47:13 -07:00