## Problem
It will be useful to understand what kind of queries our clients are
executed.
And one of the most important characteristic of query is query execution
time - at least it allows to distinguish OLAP and OLTP queries. Also
monitoring query execution time can help to detect problem with
performance (assuming that workload is more or less stable).
## Summary of changes
Add query execution time histogram.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
`safekeepers_cmp` was added by #8840 to make changes of the safekeeper
set order independent: a `sk1,sk2,sk3` specifier changed to
`sk3,sk1,sk2` should not cause a walproposer restart. However, this
check didn't support generations, in the sense that it would see the
`g#123:` as part of the first safekeeper in the list, and if the first
safekeeper changes, it would also restart the walproposer.
Therefore, parse the generation properly and make it not be a part of
the generation.
This PR doesn't add a specific test, but I have confirmed locally that
`test_safekeepers_reconfigure_reorder` is fixed with the changes of PR
#11712 applied thanks to this PR.
Part of https://github.com/neondatabase/neon/issues/11670
Use that instead of the half-baked Adaptive Radix Tree
implementation. ART would probably be better in the long run, but more
complicated to implement.
In order to enable TLS connections between computes and safekeepers, we
need to provide the control plane with a way to configure the various
libpq keyword parameters, sslmode and sslrootcert. neon.safekeepers is a
comma separated list of safekeepers formatted as host:port, so isn't
available for extension in the same way that neon.pageserver_connstring
is. This could be remedied in a future PR.
Part-of: https://github.com/neondatabase/cloud/issues/25823
Link:
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
Signed-off-by: Tristan Partin <tristan@neon.tech>
There were conflicts because of the differences in the page_api
protocol that was merged to main vs what was on the branch. I adapted
the code for the protocol in main.
## Problem
See https://github.com/neondatabase/neon/issues/11997
This guard prevents race condition with pump prefetch state (initiated
by timeout).
Assert checks that prefetching is also done under guard.
But prewarm knows nothing about it.
## Summary of changes
Pump prefetch state only in regular backends.
Prewarming is done by background workers now.
Also it seems to have not sense to pump prefetch state in any other
background workers: parallel executors, vacuum,... because they are
short living and can not leave unconsumed responses in socket.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Prefetched and LFC results are not checked in DEBUG_COMPARE_LOCAL mode
## Summary of changes
Add check for this results as well.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
- Add another locking hash table to track which cached pages are currently being
modified, by smgrwrite() or smgrread() or by prefetch.
- Use single-value Leaf pages in the art tree. That seems simpler after all,
and it eliminates some corner cases where a Value needed to be cloned, which
made it tricky to use atomics or other interior mutability on the Values
With the 50ms timeouts of pumping state in connector.c, we need to
correctly handle these timeouts that also wake up pg_usleep.
This new approach makes the connection attempts re-start the wait
whenever it gets woken up early; and CHECK_FOR_INTERRUPTS() is called to
make sure we don't miss query cancellations.
## Problem
https://neondb.slack.com/archives/C04DGM6SMTM/p1746794528680269
## Summary of changes
Make sure we start sleeping again if pg_usleep got woken up ahead of
time.
## Problem
Some small cosmetic changes I made while reading the code. Should not
affect anything.
## Summary of changes
- Remove `n_votes` field because it's not used anymore
- Explicitly initialize `safekeepers_generation` with
`INVALID_GENERATION` if the generation is not present (the struct is
zero-initialized anyway, but the explicit initialization is better IMHO)
- Access SafekeeperId via pointer `sk_id` created above
## Problem
Undo unintended change 60b9fb1baf
## Summary of changes
Add assert that we are not storing fake LSN in LwLSN.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
See https://github.com/neondatabase/neon/issues/11790
The neon extension opens extensions to the pageservers, which consumes
file descriptors. Postgres has a mechanism to count how many FDs are in
use, but it doesn't know about those FDs. We should call
ReserveExternalFD() or AcquireExternalFD() to account for them.
## Summary of changes
Call `ReserveExternalFD()` for each shard
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Mikhail Kot <mikhail@neon.tech>
## Problem
`TermsCollectedMset` and `VotesCollectedMset` accept a MemberSet
argument to find a quorum in. It may be either `wp->mconf.members` or
`wp->mconf.new_members`. But the loops inside always use
`wp->mconf.members.len`.
If the sizes of member sets are different, it may lead to these
functions not scanning all the safekeepers from `mset`.
We are not planning to change the member set size dynamically now, but
it's worth fixing anyway.
- Part of https://github.com/neondatabase/neon/issues/11669
## Summary of changes
- Use proper size of member set in `TermsCollectedMset` and
`VotesCollectedMset`
## Problem
See https://neondb.slack.com/archives/C04DGM6SMTM/p1745599814030679
Assume the following scenario: prefetch_wait_for is doing
`CHECK_FOR_INTERRUPTS` which tries to load prefetch responses.
In case of error is calls pageserver_disconnect which aborts all
in-flight requests. But such failure is not detected by
`prefetch_wait_for` which returns true. As a result
`communicator_read_at_lsnv` assumes that slot is received, but as far as
asserts are disables at prod, it is not actually checked.
Then it tries to interpret response and ... *SIGSEGV*
## Summary of changes
Check target slot state in `prefetch_wait_for`.
Resolves https://github.com/neondatabase/cloud/issues/28258
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We have been running compute <-> sk protocol version 3 for a while on
staging with no issues observed, and want to fully migrate to it
eventually.
## Summary of changes
Let's make v3 the default.
ref https://github.com/neondatabase/neon/issues/10326
---------
Co-authored-by: Arpad Müller <arpad@neon.tech>