We set it for neon replica, if primary is running.
Postgres uses this GUC at the start,
to determine if replica should wait for
RUNNING_XACTS from primary or not.
Corresponding cloud PR is
https://github.com/neondatabase/cloud/pull/10183
* Add test hot-standby replica startup.
* Extract oldest_running_xid from XlRunningXits WAL records.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
This PR contains the first version of a
[FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc)
simulation testing for safekeeper and walproposer.
### desim
This is a core "framework" for running determenistic simulation. It
operates on threads, allowing to test syncronous code (like walproposer).
`libs/desim/src/executor.rs` contains implementation of a determenistic
thread execution. This is achieved by blocking all threads, and each
time allowing only a single thread to make an execution step. All
executor's threads are blocked using `yield_me(after_ms)` function. This
function is called when a thread wants to sleep or wait for an external
notification (like blocking on a channel until it has a ready message).
`libs/desim/src/chan.rs` contains implementation of a channel (basic
sync primitive). It has unlimited capacity and any thread can push or
read messages to/from it.
`libs/desim/src/network.rs` has a very naive implementation of a network
(only reliable TCP-like connections are supported for now), that can
have arbitrary delays for each package and failure injections for
breaking connections with some probability.
`libs/desim/src/world.rs` ties everything together, to have a concept of
virtual nodes that can have network connections between them.
### walproposer_sim
Has everything to run walproposer and safekeepers in a simulation.
`safekeeper.rs` reimplements all necesary stuff from `receive_wal.rs`,
`send_wal.rs` and `timelines_global_map.rs`.
`walproposer_api.rs` implements all walproposer callback to use
simulation library.
`simulation.rs` defines a schedule – a set of events like `restart <sk>`
or `write_wal` that should happen at time `<ts>`. It also has code to
spawn walproposer/safekeeper threads and provide config to them.
### tests
`simple_test.rs` has tests that just start walproposer and 3 safekeepers
together in a simulation, and tests that they are not crashing right
away.
`misc_test.rs` has tests checking more advanced simulation cases, like
crashing or restarting threads, testing memory deallocation, etc.
`random_test.rs` is the main test, it checks thousands of random seeds
(schedules) for correctness. It roughly corresponds to running a real
python integration test in an environment with very unstable network and
cpu, but in a determenistic way (each seed results in the same execution
log) and much much faster.
Closes#547
---------
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
It hanged if file size is less than of a normal segment. Normally that doesn't
happen, but it might in case of crash during segment init. We're going to fix
that half initialized segment by durably renaming it after cooking, so this fix
won't be needed, but better avoid busy loop anyway.
fixes https://github.com/neondatabase/neon/issues/6401
## Problem
See https://neondb.slack.com/archives/C06F5UJH601/p1705731304237889
Adding 1 to xid in `update_next_xid` can cause overflow in debug mode.
0xffffffff is valid transaction ID.
## Summary of changes
Use `wrapping_add`
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
See #5468.
## Summary of changes
Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp
associated with the given lsn.
Fixes#5468.
---------
Co-authored-by: Shany Pozin <shany@neon.tech>
## Problem
close https://github.com/neondatabase/neon/issues/4266
## Summary of changes
With this PR, rust-analyzer should be able to give lints and auto
complete in `mod tests`, and this makes writing tests easier.
Previously, rust-analyzer cannot do auto completion.
---------
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Notes:
- This still needs UI support from the Console
- I've not tuned any GUCs for PostgreSQL to make this work better
- Safekeeper has gotten a tweak in which WAL is sent and how: It now
sends zero-ed WAL data from the start of the timeline's first segment up to
the first byte of the timeline to be compatible with normal PostgreSQL
WAL streaming.
- This includes the commits of #3714
Fixes one part of https://github.com/neondatabase/neon/issues/769
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
1.66 release speeds up compile times for over 10% according to tests.
Also its Clippy finds plenty of old nits in our code:
* useless conversion, `foo as u8` where `foo: u8` and similar, removed
`as u8` and similar
* useless references and dereferenced (that were automatically adjusted
by the compiler), removed various `&` and `*`
* bool -> u8 conversion via `if/else`, changed to `u8::from`
* Map `.iter()` calls where only values were used, changed to
`.values()` instead
Standing out lints:
* `Eq` is missing in our protoc generated structs. Silenced, does not
seem crucial for us.
* `fn default` looks like the one from `Default` trait, so I've
implemented that instead and replaced the `dummy_*` method in tests with
`::default()` invocation
* Clippy detected that
```
if retry_attempt < u32::MAX {
retry_attempt += 1;
}
```
is a saturating add and proposed to replace it.
- Split postgres_ffi into two version specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding.
- Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture.
To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.
Another preparatory commit for pg15 support:
* generate bindings for both pg14 and pg15;
* update Makefile and CI scripts: now neon build depends on both PostgreSQL versions;
* some code refactoring to decrease version-specific dependencies.
* Add submodule postgres-15
* Support pg_15 in pgxn/neon
* Renamed zenith -> neon in Makefile
* fix name of codestyle check
* Refactor build system to prepare for building multiple Postgres versions.
Rename "vendor/postgres" to "vendor/postgres-v14"
Change Postgres build and install directory paths to be version-specific:
- tmp_install/build -> pg_install/build/14
- tmp_install/* -> pg_install/14/*
And Makefile targets:
- "make postgres" -> "make postgres-v14"
- "make postgres-headers" -> "make postgres-v14-headers"
- etc.
Add Makefile aliases:
- "make postgres" to build "postgres-v14" and in future, "postgres-v15"
- similarly for "make postgres-headers"
Fix POSTGRES_DISTRIB_DIR path in pytest scripts
* Make postgres version a variable in codestyle workflow
* Support vendor/postgres-v15 in codestyle check workflow
* Support postgres-v15 building in Makefile
* fix pg version in Dockerfile.compute-node
* fix kaniko path
* Build neon extensions in version-specific directories
* fix obsolete mentions of vendor/postgres
* use vendor/postgres-v14 in Dockerfile.compute-node.legacy
* Use PG_VERSION_NUM to gate dependencies in inmem_smgr.c
* Use versioned ECR repositories and image names for compute-node.
The image name format is compute-node-vXX, where XX is postgres major version number.
For now only v14 is supported.
Old format unversioned name (compute-node) is left, because cloud repo depends on it.
* update vendor/postgres submodule url (zenith->neondatabase rename)
* Fix postgres path in python tests after rebase
* fix path in regress test
* Use separate dockerfiles to build compute-node:
Dockerfile.compute-node-v15 should be identical to Dockerfile.compute-node-v14 except for the version number.
This is a hack, because Kaniko doesn't support build ARGs properly
* bump vendor/postgres-v14 and vendor/postgres-v15
* Don't use Kaniko cache for v14 and v15 compute-node images
* Build compute-node images for different versions in different jobs
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Re-export only things that are used by other modules.
In the future, I'm imagining that we run bindgen twice, for Postgres
v14 and v15. The two sets of bindings would go into separate
'bindings_v14' and 'bindings_v15' modules.
Rearrange postgres_ffi modules.
Move function, to avoid Postgres version dependency in timelines.rs
Move function to generate a logical-message WAL record to postgres_ffi.
* Deduce `last_segment` automatically
* Get rid of local `wal_dir`/`wal_seg_size` variables
* Prepare to test parsing of WAL from multiple specific points, not just the start;
extract `check_end_of_wal` function to check both partial and non-partial WAL segments.
It would be better to not update xl_crc/rec_hdr at all when skipping contrecord,
but I would prefer to keep PR #1574 small.
Better audit of `find_end_of_wal_segment` is coming anyway in #544.
Previous invariant: `crc` contains an "unfinalized" CRC32 value,
its one complement, like in postgres before FIN_CRC32C.
New invariant: `crc` always contains a "finalized" CRC32 value,
this is the semantics of crc32c_append, so we don't need to invert CRC manually.
* Actual generation logic is in a separate crate `postgres_ffi/wal_generate`
* The create also provides a binary for debug purposes akin to `initdb`
* Two tests currently fail and are ignored
* There is no easy way to test this directly in Safekeeper as it starts restoring from commit_lsn.
So testing would require disconnecting Safekeeper just after it has received the WAL,
but before it is committed.
A new `get_lsn_by_timestamp` command is added to the libpq page service
API.
An extra timestamp field is now stored in an extra field after each
Clog page. It is the timestamp of the latest commit, among all the
transactions on the Clog page. To find the overall latest commit, we
need to scan all Clog pages, but this isn't a very frequent operation
so that's not too bad.
To find the LSN that corresponds to a timestamp, we perform a binary
search. The binary search starts with min = last LSN when GC ran, and
max = latest LSN on the timeline. On each iteration of the search we
check if there are any commits with a higher-than-requested timestamp
at that LSN.
Implements github issue 1361.