The previous version of the spec caused parsing errors in generated clients,
because the return type is an object, not an array, and one field was missing.
In passing, set `format: hex` on ancestor_id too, as its value conforms to
that format.
This change makes most parts of the code asynchronous, except
for the `mgmt` subsystem (we're going to drop it anyway).
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
If a heap UPDATE record modified two pages, and both pages needed to have
their VM bits cleared, and the VM bits were located on the same VM page,
we would emit two ZenithWalRecord::ClearVisibilityMapFlags records for
the same VM page. That produced warnings like this in the pageserver log:
Page version Wal(ClearVisibilityMapFlags { heap_blkno: 18, flags: 3 }) of rel 1663/13949/2619_vm blk 0 at 2A/346046A0 already exists
To fix, change ClearVisibilityMapFlags so that it can update the bits
for both pages as one operation.
This was already covered by several python tests, so no need to add a
new one. Fixes #1125.
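
A minimal sketch of the idea, not the actual pageserver code: the struct, field,
and function names below are hypothetical, and HEAPBLOCKS_PER_VM_PAGE assumes
the default 8 KB Postgres block size. When both heap pages map to the same VM
page, a single record carries both block numbers instead of two records being
emitted for that VM page.

```rust
// Hypothetical illustration; names and layout are assumptions, not the real code.

/// Heap blocks covered by one visibility-map page (assumes 8 KB blocks,
/// 2 VM bits per heap block).
const HEAPBLOCKS_PER_VM_PAGE: u32 = 32_672;

/// One "clear VM bits" record, addressed to a single VM page.
#[derive(Debug, PartialEq)]
struct ClearVmFlags {
    vm_blkno: u32,
    new_heap_blkno: Option<u32>,
    old_heap_blkno: Option<u32>,
    flags: u8,
}

/// VM page that holds the bits for the given heap block.
fn vm_page_of(heap_blkno: u32) -> u32 {
    heap_blkno / HEAPBLOCKS_PER_VM_PAGE
}

fn single(heap_blkno: u32, is_new: bool, flags: u8) -> ClearVmFlags {
    ClearVmFlags {
        vm_blkno: vm_page_of(heap_blkno),
        new_heap_blkno: if is_new { Some(heap_blkno) } else { None },
        old_heap_blkno: if is_new { None } else { Some(heap_blkno) },
        flags,
    }
}

/// Build the records for a heap UPDATE that touched up to two heap pages.
/// If both pages share a VM page, emit one combined record.
fn clear_vm_records(
    new_heap_blkno: Option<u32>,
    old_heap_blkno: Option<u32>,
    flags: u8,
) -> Vec<ClearVmFlags> {
    match (new_heap_blkno, old_heap_blkno) {
        (Some(new), Some(old)) if vm_page_of(new) == vm_page_of(old) => {
            vec![ClearVmFlags {
                vm_blkno: vm_page_of(new),
                new_heap_blkno: Some(new),
                old_heap_blkno: Some(old),
                flags,
            }]
        }
        _ => {
            // Different VM pages (or only one heap page): one record per page.
            let mut records = Vec::new();
            if let Some(new) = new_heap_blkno {
                records.push(single(new, true, flags));
            }
            if let Some(old) = old_heap_blkno {
                records.push(single(old, false, flags));
            }
            records
        }
    }
}
```

With this shape, at most one record is ever produced per VM page for a given
heap UPDATE, so the "already exists" collision cannot arise from this path.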
Co-authored-by: Konstantin Knizhnik <knizhnik@zenith.tech>
It was printing a lot of stuff to the log with INFO level, for routine
things like receiving or sending messages. Reduce the noise. The amount
of logging was excessive, and it was also consuming a fair amount of CPU
(about 20% of safekeeper's CPU usage in a little test I ran).
* Always initialize flush_lsn/commit_lsn metrics on a specific timeline, no more `n/a`
* Update flush_lsn metrics missing from cba4da3f4d
* Ensure that flush_lsn found on load is >= both commit_lsn and truncate_lsn
* Add some debug logging
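
One way the flush_lsn invariant could be enforced on load, as a sketch only:
the function name and the plain `u64` LSN type are assumptions, and the real
code may check or report the condition differently rather than clamp.

```rust
// Hypothetical illustration; `Lsn` as a bare u64 is an assumption.
type Lsn = u64;

/// Return a flush_lsn that is never behind commit_lsn or truncate_lsn.
fn flush_lsn_on_load(found_in_wal: Lsn, commit_lsn: Lsn, truncate_lsn: Lsn) -> Lsn {
    found_in_wal.max(commit_lsn).max(truncate_lsn)
}
```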
Use GUC zenith.max_cluster_size to set the limit.
If the limit is reached, extend requests will fail with an out-of-space error.
When the current size is too close to the limit, emit a warning.
Add new test: test_timeline_size_quota.
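
The check can be pictured roughly as follows. This is a sketch under
assumptions, not the actual implementation: the names are hypothetical, the
90% warning threshold is invented for illustration (the message above does not
say what "too close" means), and an unset limit is modelled as `None`.

```rust
// Hypothetical illustration of a size-quota check on relation extension.

/// Assumed warning threshold, as a percentage of the limit.
const WARN_PERCENT: u128 = 90;

#[derive(Debug, PartialEq)]
enum ExtendCheck {
    Ok,
    NearLimit,  // allowed, but a warning should be emitted
    OutOfSpace, // rejected with an out-of-space error
}

/// Decide whether an extend request of `extend_by` bytes may proceed.
/// `max_cluster_size` is the configured limit; `None` means no limit.
fn check_extend(current_size: u64, extend_by: u64, max_cluster_size: Option<u64>) -> ExtendCheck {
    let limit = match max_cluster_size {
        Some(limit) => limit,
        None => return ExtendCheck::Ok,
    };
    let new_size = current_size.saturating_add(extend_by);
    if new_size > limit {
        ExtendCheck::OutOfSpace
    } else if (new_size as u128) * 100 >= (limit as u128) * WARN_PERCENT {
        // Widened to u128 so the percentage comparison cannot overflow.
        ExtendCheck::NearLimit
    } else {
        ExtendCheck::Ok
    }
}
```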
Use log::error!() instead. I spotted a few of these "connection error"
lines in the logs, without timestamps and the other stuff we print for
all other log messages.
A timeline is active whenever there is at least one connection from compute or
the pageserver is not caught up. Currently, 'active' means that callmemaybes
are being sent.
Fixes a race: the suspend condition check and the callmemaybe unsubscribe now
happen under the same lock.
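
A minimal sketch of the locking pattern, with hypothetical type and method
names (the real safekeeper code is structured differently): the state used to
evaluate the suspend condition and the subscription flag live behind one
mutex, so no connection can arrive between the check and the unsubscribe.

```rust
use std::sync::Mutex;

// Hypothetical illustration; `subscribed` stands in for the callmemaybe subscription.
struct TimelineState {
    num_computes: u32,
    pageserver_caught_up: bool,
    subscribed: bool,
}

impl TimelineState {
    /// Active while any compute is connected or the pageserver lags behind.
    fn should_be_active(&self) -> bool {
        self.num_computes > 0 || !self.pageserver_caught_up
    }
}

struct Timeline {
    state: Mutex<TimelineState>,
}

impl Timeline {
    fn new() -> Self {
        Timeline {
            state: Mutex::new(TimelineState {
                num_computes: 0,
                pageserver_caught_up: true,
                subscribed: false,
            }),
        }
    }

    fn on_compute_connect(&self) {
        let mut st = self.state.lock().unwrap();
        st.num_computes += 1;
        st.subscribed = true;
    }

    fn on_compute_disconnect(&self) {
        // Check and unsubscribe under the same guard: no window for a race.
        let mut st = self.state.lock().unwrap();
        st.num_computes = st.num_computes.saturating_sub(1);
        if !st.should_be_active() {
            st.subscribed = false;
        }
    }

    fn set_pageserver_caught_up(&self, caught_up: bool) {
        let mut st = self.state.lock().unwrap();
        st.pageserver_caught_up = caught_up;
        st.subscribed = st.should_be_active();
    }

    fn is_subscribed(&self) -> bool {
        self.state.lock().unwrap().subscribed
    }
}
```

If the check and the unsubscribe took the lock separately, a compute could
connect in between and end up on a timeline that has just been unsubscribed.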