Now it's possible to call Fe{Startup,}Message in both
sync and async contexts, which is good for proxy.
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
* Fix checkpoint.nextXid update
* Add test for cehckpoint.nextXid
* Fix indentation of test_next_xid.py
* Fix mypy error in test_next_xid.py
* Tidy up the test case.
* Add a unit test
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Freeze vectors at the same end LSN
* Fix calculation of last LSN for inmem layer
* Do not advance disk_consistent_lsn is no open layer was evicted
* Fix calculation of freeze_end_lsn
* Let start_lsn be larger than oldest_pending_lsn
* Rename 'oldest_pending_lsn' and 'last_lsn', add comments.
* Fix future_layerfiles test
* Update comments conserning olest_lsn
* Update comments conserning olest_lsn
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Now zenith_cli handles wal_acceptors config internally, and if we
will append wal_acceptors to postgresql.conf in python tests, then
it will contain duplicate wal_acceptors config.
Currently ztimelineids are unique, but all APIs accept the pair, so let's keep
it everywhere for uniformity.
Carry around ZTTId containing both ZTenantId and ZTimelineId for simplicity.
(existing clusters on staging ought to be preprocessed for that)
* Reproduce github issue #1047.
* Use RwLock to protect gc_cuttof_lsn
* Eeduce number of updates in test_gc_aggressive
* Change test_prohibit_get_page_at_lsn_for_garbage_collected_pages test
* Change test_prohibit_get_page_at_lsn_for_garbage_collected_pages
* Lock latest_gc_cutoff_lsn in all operations accessing storage to prevent race conditions with GC
* Remove random sleep between wait_for_lsn and get_page_at_lsn
* Initialize latest_gc_cutoff with initdb_lsn and remove separate check that lsn >= initdb_lsn
* Update test_prohibit_branch_creation_on_pre_initdb_lsn test
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
to pass current_timeline_size to compute node
Put standby_status_update fields into ZenithFeedback and send them as one message.
Pass values sizes together with keys in ZenithFeedback message.
This patch includes attach/detach http endpoints in pageservers. Some
changes in callmemaybe handling inside safekeeper and an integrational
test to check migration with and without load. There are still some
rough edges that will be addressed in follow up patches
Mainly because it has better support for installing the packages from
different python versions.
It also has better dependency resolver than Pipenv. And supports modern
standard for python dependency management. This includes usage of
pyproject.toml for project specific configuration instead of per
tool conf files. See following links for details:
https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/https://www.python.org/dev/peps/pep-0518/
* Do not delete layers beyand cutoff LSN
* Update pageserver/src/layered_repository/layer_map.rs
Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
This introduces a new module to handle thread creation and shutdown.
All page server threads are now registered in a global hash map, and
there's a function to request individual threads to shut down gracefully.
Thread shutdown request is signalled to the thread with a flag, as well
as a Future that can be used to wake up async operations if shutdown is
requested. Use that facility to have the libpq listener thread respond
to pageserver shutdown, based on Kirill's earlier prototype
(https://github.com/zenithdb/zenith/pull/1088). That addresses
https://github.com/zenithdb/zenith/issues/1036, previously the libpq
listener thread would not exit until one more connection arrives.
This also eliminates a resource leak in the accept() loop. Previously,
we added the JoinHanlde of each new thread to a vector but old handles
for threads that had already exited were never removed.
Log the error and continue. Hopefully it's a transient failure.
This might have been happening in staging earlier, when the safekeeper
had a problem where it opened connections very frequently to issue
"callmemaybe" commands. If you launch too many threads too fast, you might
run out of file descriptors or something. It's not totally clear what
happened, but with commit, at least the page server will continue to run
and accept new connections, if a transient error happens.
'anyhow' crate can include a backtrace in all errors, when the
'backtrace' feature is enabled. Enable it, and change the places that used
'{:#}' or '{}' to '{:?}', so that the backtrace is printed.