This depends on a hacked version of the 'pprof-rs' crate. Because of
that, it's under an optional 'profiling' feature. It is disabled by
default, but enabled for release builds in CircleCI config. It doesn't
currently work on macOS.
The flamegraph is written to 'flamegraph.svg' in the pageserver
workdir when the 'pageserver' process exits.
Add a performance test that runs the perf_pgbench test, with profiling
enabled.
We only checked the cache page version when collecting WAL records in
an in-memory layer, not in a delta layer. Refactor the code so that we
always stop collecting WAL records when we reach a cached materialized
page.
Fix the assertion on the LSN range in
InMemoryLayer::get_value_reconstruct_data. It was supposed to check
that the requested LSN range is within the layer's LSN range, but the
inequality was backwards. That went unnoticed before, because the
caller always passed the layer's start LSN as the requested LSN
range's start LSN, but now we might stop the search earlier, if we have
a cached page version.
Co-authored-by: Konstantin Knizhnik <knizhnik@zenith.tech>
Unlink failure isn't serious on its own, we were about to remove the
file anyway, but it shouldn't happen and could be a symptom of
something more serious.
We just saw "No such file or directory" errors happening from
ephemeral file writeback in staging, and I suspect if we had this
warning in place, we would have seen these warnings too, if the
problem was that the ephemeral file was removed before dropping the
EphemeralFile struct. Next time it happens, we'll have more
information.
With this, we no longer need to build two versions of 'pem' and 'base64'
crates. Introduces a duplicate version of 'time' crate, though, but it's
still progress.
- Remove batch_others/test_pgbench.py. It was a quick check that pgbench
works, without actually recording any performance numbers, but that
doesn't seem very interesting anymore. Remove it to avoid confusing it
with the actual pgbench benchmarks
- Run pgbench with "-n" and "-S" options, for two different workloads:
simple-updates, and SELECT-only. Previously, we would only run it with
the "default" TPCB-like workload. That's more or less the same as the
simple-update (-n) workload, but I think the simple-upload workload
is more relevant for testing storage performance. The SELECT-only
workload is a new thing to measure.
- Merge test_perf_pgbench.py and test_perf_pgbench_remote.py. I added
a new "remote" implementation of the PgCompare class, which allows
running the same tests against an already-running Postgres instance.
- Make the PgBenchRunResult.parse_from_output function more
flexible. pgbench can print different lines depending on the
command-line options, but the parsing function expected a particular
set of lines.
The PgProtocol.connect() function took extra options for username,
database, etc. Remove those options, and have a generic way for each
subclass of PgProtocol to provide some default options, with the
capability override them in the connect() call.
It was the only test that used the 'schema' argument to the connect()
function. I'm about to refactor the option handling and will remove
the special 'schema' argument altogether, so rewrite the test to not
use it.
There's a lot of renaming left to do in the code and docs, but this is
a start. Our binaries and many other things are still called "zenith",
but I didn't change those in the README, because otherwise the
examples won't work. I added a brief note at the top of the README to
explain that we're in the process of renaming, until we've renamed
everything.