## Problem
Prefetched and LFC results are not checked in DEBUG_COMPARE_LOCAL mode
## Summary of changes
Add check for this results as well.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
With the 50ms timeouts of pumping state in connector.c, we need to
correctly handle these timeouts that also wake up pg_usleep.
This new approach makes the connection attempts re-start the wait
whenever it gets woken up early; and CHECK_FOR_INTERRUPTS() is called to
make sure we don't miss query cancellations.
## Problem
https://neondb.slack.com/archives/C04DGM6SMTM/p1746794528680269
## Summary of changes
Make sure we start sleeping again if pg_usleep got woken up ahead of
time.
## Problem
Some small cosmetic changes I made while reading the code. Should not
affect anything.
## Summary of changes
- Remove `n_votes` field because it's not used anymore
- Explicitly initialize `safekeepers_generation` with
`INVALID_GENERATION` if the generation is not present (the struct is
zero-initialized anyway, but the explicit initialization is better IMHO)
- Access SafekeeperId via pointer `sk_id` created above
## Problem
Undo unintended change 60b9fb1baf
## Summary of changes
Add assert that we are not storing fake LSN in LwLSN.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
See https://github.com/neondatabase/neon/issues/11790
The neon extension opens extensions to the pageservers, which consumes
file descriptors. Postgres has a mechanism to count how many FDs are in
use, but it doesn't know about those FDs. We should call
ReserveExternalFD() or AcquireExternalFD() to account for them.
## Summary of changes
Call `ReserveExternalFD()` for each shard
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Mikhail Kot <mikhail@neon.tech>
## Problem
`TermsCollectedMset` and `VotesCollectedMset` accept a MemberSet
argument to find a quorum in. It may be either `wp->mconf.members` or
`wp->mconf.new_members`. But the loops inside always use
`wp->mconf.members.len`.
If the sizes of member sets are different, it may lead to these
functions not scanning all the safekeepers from `mset`.
We are not planning to change the member set size dynamically now, but
it's worth fixing anyway.
- Part of https://github.com/neondatabase/neon/issues/11669
## Summary of changes
- Use proper size of member set in `TermsCollectedMset` and
`VotesCollectedMset`
## Problem
See https://neondb.slack.com/archives/C04DGM6SMTM/p1745599814030679
Assume the following scenario: prefetch_wait_for is doing
`CHECK_FOR_INTERRUPTS` which tries to load prefetch responses.
In case of error is calls pageserver_disconnect which aborts all
in-flight requests. But such failure is not detected by
`prefetch_wait_for` which returns true. As a result
`communicator_read_at_lsnv` assumes that slot is received, but as far as
asserts are disables at prod, it is not actually checked.
Then it tries to interpret response and ... *SIGSEGV*
## Summary of changes
Check target slot state in `prefetch_wait_for`.
Resolves https://github.com/neondatabase/cloud/issues/28258
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We have been running compute <-> sk protocol version 3 for a while on
staging with no issues observed, and want to fully migrate to it
eventually.
## Summary of changes
Let's make v3 the default.
ref https://github.com/neondatabase/neon/issues/10326
---------
Co-authored-by: Arpad Müller <arpad@neon.tech>
Postgres has a nice self-documenting macro called pg_unreachable() when
you want to assert that a location in code won't be hit.
Warning in question:
```
/home/tristan957/Projects/work/neon//pgxn/neon/libpagestore.c: In function ‘pageserver_connect’:
/home/tristan957/Projects/work/neon//pgxn/neon/libpagestore.c:739:1: warning: control reaches end of non-void function [-Wreturn-type]
739 | }
| ^
```
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
Init fork is used in DEBUG_COMPARE_LOCAL to determine unlogged relation
or unlogged build.
But it is created only after the relation is initialized and so can be
swapped out, producing `Page is evicted with zero LSN` error.
## Summary of changes
Create init fork together with main fork for unlogged relations in
DEBUG_COMPARE_LOCAL mode.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
clang produce warning about unused variable `n_synced` in
HandleSafekeeperResponse
## Summary of changes
Remove local variable.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Pass more neon ids to compute_ctl.
Expose them to postgres as neon extension GUCs:
neon.project_id, neon.branch_id, neon.endpoint_id.
This is the compute side PR, not yet supported by cplane.
## Problem
Walproposer should get elected and commit WAL on safekeepers specified
by the membership configuration.
## Summary of changes
- Add to wp `members_safekeepers` and `new_members_safekeepers` arrays
mapping configuration members to connection slots. Establish this
mapping (by node id) when safekeeper sends greeting, giving its id and
when mconf becomes known / changes.
- Add to TermsCollected, VotesCollected,
GetAcknowledgedByQuorumWALPosition membership aware logic. Currently it
partially duplicates existing one, but we'll drop the latter eventually.
- In python, rename Configuration to MembershipConfiguration for
clarity.
- Add test_quorum_sanity testing new logic.
ref https://github.com/neondatabase/neon/issues/10851
pagestore_smgr.c had grown pretty large. Split into two parts, such
that the smgr routines that PostgreSQL code calls stays in
pagestore_smgr.c, and all the prefetching logic and other lower-level
routines related to communicating with the pageserver are moved to a
new source file, "communicator.c".
There are plans to replace communicator parts with a new
implementation. See https://github.com/neondatabase/neon/pull/10799.
This commit doesn't implement any of the new things yet, but it is
good preparation for it. I'm imagining that the new implementation
will approximately replace the current "communicator.c" code, exposing
roughly the same functions to pagestore_smgr.c.
This commit doesn't change any functionality or behavior, or make any
other changes to the existing code: It just moves existing code
around.
## Problem
Support of unlogged build in DEBUG_COMPARE_LOCAL.
Neon SMGR treats present of local file as indicator of unlogged
relations.
But it doesn't work in DEBUG_COMPARE_LOCAL mode.
## Summary of changes
Use INIT_FORKNUM as indicator of unlogged file and create this file
while unlogged index build.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Also, move the call to the lfc_init() function. It was weird to have it
in libpagestore.c, when libpagestore.c otherwise had nothing to do with
the LFC. Move it directly into _PG_init()
## Problem
If the local file cache is shrunk, so that we punch some holes in the
underlying file, the local_cache view displays the holes incorrectly.
See https://github.com/neondatabase/neon/issues/10770
## Summary of changes
Skip hole tags in the local_cache view.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
The 'neon_read' function needs to have a different prototype on PG < 16,
because it's part of the smgr interface. But neon_read_at_lsn doesn't
have that restriction.
Remove useless and often wrong IDENTIFICATION comments. PostgreSQL
sources have them, mostly for historical reasons, but there's no need
for us to copy that style.
Remove unnecessary #includes in header files, putting the #includes
directly in the .c files that need them. The principle is that a header
file should #include other header files if they need definitions from
them, such that each header file can be compiled on its own, but not
other #includes. (There are tools to enforce that, but this was just a
manual clean up of violations that I happened to spot.)
## Problem
See https://neondb.slack.com/archives/C04DGM6SMTM/p1741594233757489
Consider the following scenario:
1. Backend A wants to prefetch some block B
2. Backend A checks that block B is not present in shared buffer
3. Backend A registers new prefetch request and calls
prefetch_do_request
4. prefetch_do_request calls neon_get_request_lsns
5. neon_get_request_lsns obtains LwLSN for block B
6. Backend B downloads B, updates and wallogs it (let say to Lsn1)
7. Block B is once again thrown from shared buffers, its LwLSN is set to
Lsn1
8. Backend A obtains current flush LSN, let's say that it is Lsn1
9. Backend A stores Lsn1 as effective_lsn in prefetch slot.
10. Backend A reads page B with LwLSN=Lsn1
11. Backend A finds in prefetch ring response for prefetch request for
block B with effective_lsn=Lsn1, so that it satisfies
neon_prefetch_response_usable condition
12. Backend A uses deteriorated version of the page!
## Summary of changes
Use `not_modified_since` as `effective_lsn`.
It should not cause some degrade of performance because we store LwLSN
when it was not found in LwLSN hash, so if page is not changed till
prefetch response is arrived, then LwLSN should not be changed.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Commit
3da70abfa5
cause noticeable performance regression (40% in update-with-prefetch in
test_bulk_update):
https://neondb.slack.com/archives/C04BLQ4LW7K/p1742633167580879
## Summary of changes
Remove loop from pageserver_try_receive to make it fetch not more than
one response. There is still loop in `pump_prefetch_state` which can
fetch as many responses as available.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Previously we had different meanings for the bitmask of vector IOps.
That has now been unified to "bit set = final result, no more
scribbling".
Furthermore, the LFC read path scribbled on pages that were already
read; that's probably not a good thing so that's been fixed too. In
passing, the read path of LFC has been updated to read only the
requested pages into the provided buffers, thus reducing the IO size of
vectorized IOs.
## Problem
## Summary of changes
## Problem
Macro IS_LOCAL_REL used for DEBUG_COMPARE_LOCAL mode use greater-than
rather than greater-or-equal comparison while first table really is
assigned FirstNormalObjectId.
## Summary of changes
Replace strict greater with greater-or-equal comparison.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Now that stuck connections are quickly terminated it's not easy to
quickly find the right port from the pid to correlate the connection
with the one seen on pageserver side.
## Summary of changes
Call getsockname() and include the local port number in the
no-response-from-pageserver log messages.
## Problem
See https://neondb.slack.com/archives/C04DGM6SMTM/p1741176713523469
The problem is that this function is using `PQgetCopyData(shard->conn,
&resp_buff.data, 1 /* async = true */)`
to try to fetch next message. But this function returns 0 if the whole
message is not present in the buffer.
And input buffer may contain only part of message so result is not
fetched.
## Summary of changes
Use `PQisBusy` + `WaitEventSetWait` to check if data is available and
`PQgetCopyData(shard->conn, &resp_buff.data, 0)` to read whole message
in this case.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We currently have this code duplicated across different PG versions.
Moving this to an extension would reduce duplication and simplify
maintenance.
## Summary of changes
Moving the LastWrittenLSN code from PG versions to the Neon extension
and linking it with hooks.
Related Postgres PR: https://github.com/neondatabase/postgres/pull/590
Closes: https://github.com/neondatabase/neon/issues/10973
---------
Co-authored-by: Tristan Partin <tristan@neon.tech>
## Problem
Unlogged build is used for GIST/SPGIST/GIN/HNSW indexes.
In this mode we first change relation class to `RELPERSISTENCE_UNLOGGED`
and save them on local disk.
But we do not save unlogged relations in LFC.
It may cause fetching incorrect value from LFC if relfilenode is reused.
## Summary of changes
Save modified pages in LFC on second stage of unlogged build (when
modified pages are walloged).
There is no need to save pages in LFC at first phase because the will be
in any case overwritten with assigned LSN at second phase.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
In unlogged index build (used fir GIST/SPGIST/GIN indexes) files is
created on disk and then removed at the end.
It contradicts to the logic of DEBUG_COMPARE_LOCAL mode.
## Summary of changes
Do not create and unlink files in unlogged build in DEBUG_COMPARE_LOCAL
mode.
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
DEBUG_COMPARE_LOCAL is not supported in neon_zeroextend added in PG16
## Summary of changes
Add support of DEBUG_COMPARE_LOCAL in neon_zeroextend
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
https://github.com/neondatabase/neon/issues/10851
## Summary of changes
Do some refactoring before making walproposer generations aware.
- Rename SS_VOTING to SS_WAIT_VOTING, SS_IDLE to SS_WAIT_ELECTED
- Continue to get rid of epochs: rename GetEpoch to GetLastLogTerm,
donorEpoch to donorLastLogTerm
- Instead of counting n_votes, n_connected, introduce explicit
WalProposerState (collecting terms / voting / elected). Refactor out
TermsCollected and VotesCollected; they will determine state transition
differently depending whether generations are enabled or not.
There is no new logic in this PR and thus no new tests.
## Problem
Async prefetch in LFC PR cause incorrect calculation of LFC
`used_pages`when page is overwritten
## Summary of changes
Decrement `used_pages` is page is overwritten.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
This allows for improved decoding of otherwise broken WAL.
## Problem
Currently, if (or when) a WAL record has a wrong prevptr, that breaks
decoding. With this, we don't have to break on that if we decide it's OK
to proceed after that.
## Summary of changes
Use a Neon GUC to allow the system to enable the NEON-specific
skip_lsn_checks option in XLogReader.
## Problem
See https://neondb.slack.com/archives/C08DE6Q9C3B
Sometimes compute is not able to receive responses from PS for a long
time (minutes).
I do not think that the problem is at compute side, but in order to
exclude this possibility I wan to see more information about connection
state at compute side, particularly amount of data cached in connection
buffer.
## Summary of changes
Right now we are dumping state of socket buffer.
This PR adds printing state of connection buffer.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>