Heikki Linnakangas
e94a5ce360
Rename pg_control_ffi.h to bindgen_deps.h, for clarity.
...
The pg_control_ffi.h name implies that it only includes stuff related to
pg_control.h. That's mostly true currently, but really the point of the
file is to include everything that we need to generate Rust definitions
from.
2022-08-16 19:37:36 +03:00
Dmitry Rodionov
d5ec84b87b
reset rust cache for clippy run to avoid an ICE
...
additionally remove trailing whitespaces
2022-08-16 18:49:32 +03:00
Dmitry Rodionov
b21f7382cc
split out timeline metrics, track layer map loading and size calculation
2022-08-16 18:49:32 +03:00
Kirill Bulatov
648e8bbefe
Fix 1.63 clippy lints ( #2282 )
2022-08-16 18:49:22 +03:00
Rory de Zoete
9218426e41
Fix docker zombie process issue ( #2289 )
...
* Fix docker zombie process issue
* Init everywhere
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box >
2022-08-16 17:24:58 +02:00
Rory de Zoete
1d4114183c
Use main, not branch for ref check ( #2288 )
...
* Use main, not branch for ref check
* Add more debug
* Count main, not head
* Try new approach
* Conform to syntax
* Update approach
* Get full history
* Skip checkout
* Cleanup debug
* Remove more debug
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box >
2022-08-16 15:41:31 +02:00
Rory de Zoete
4cde0e7a37
Error for fatal not git repo ( #2286 )
...
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box >
2022-08-16 13:59:41 +02:00
Rory de Zoete
83f7b8ed22
Add missing step output, revert one deploy step ( #2285 )
...
* Add missing step output, revert one deploy step
* Conform to syntax
* Update approach
* Add missing value
* Add missing needs
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box >
2022-08-16 13:41:51 +02:00
Rory de Zoete
b8f0f37de2
Gen2 GH runner ( #2128 )
...
* Re-add rustup override
* Try s3 bucket
* Set git version
* Use v4 cache key to prevent problems
* Switch to v5 for key
* Add second rustup fix
* Rebase
* Add kaniko steps
* Fix typo and set compress level
* Disable global run default
* Specify shell for step
* Change approach with kaniko
* Try less verbose shell spec
* Add submodule pull
* Add promote step
* Adjust dependency chain
* Try default swap again
* Use env
* Don't override aws key
* Make kaniko build conditional
* Specify runs on
* Try without dependency link
* Try soft fail
* Use image with git
* Try passing to next step
* Fix duplicate
* Try other approach
* Try other approach
* Fix typo
* Try other syntax
* Set env
* Adjust setup
* Try step 1
* Add link
* Try global env
* Fix mistake
* Debug
* Try other syntax
* Try other approach
* Change order
* Move output one step down
* Put output up one level
* Try other syntax
* Skip build
* Try output
* Re-enable build
* Try other syntax
* Skip middle step
* Update check
* Try first step of dockerhub push
* Update needs dependency
* Try explicit dir
* Add missing package
* Try other approach
* Try other approach
* Specify region
* Use with
* Try other approach
* Add debug
* Try other approach
* Set region
* Follow AWS example
* Try github approach
* Skip Qemu
* Try stdin
* Missing steps
* Add missing close
* Add echo debug
* Try v2 endpoint
* Use v1 endpoint
* Try without quotes
* Revert
* Try crane
* Add debug
* Split steps
* Fix duplicate
* Add shell step
* Conform to options
* Add verbose flag
* Try single step
* Try workaround
* First request fails hunch
* Try bullseye image
* Try other approach
* Adjust verbose level
* Try previous step
* Add more debug
* Remove debug step
* Remove rogue indent
* Try with larger image
* Add build tag step
* Update workflow for testing
* Add tag step for test
* Remove unused
* Update dependency chain
* Add ownership fix
* Use matrix for promote
* Force update
* Force build
* Remove unused
* Add new image
* Add missing argument
* Update dockerfile copy
* Update Dockerfile
* Update clone
* Update dockerfile
* Go to correct folder
* Use correct format
* Update dockerfile
* Remove cd
* Debug find where we are
* Add debug on first step
* Changedir to postgres
* Set workdir
* Use v1 approach
* Use other dependency
* Try other approach
* Try other approach
* Update dockerfile
* Update approach
* Update dockerfile
* Update approach
* Update dockerfile
* Update dockerfile
* Add workspace hack
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Change last step
* Cleanup pull in prep for review
* Force build images
* Add condition for latest tagging
* Use pinned version
* Try without name value
* Remove more names
* Shorten names
* Add kaniko comments
* Pin kaniko
* Pin crane and ecr helper
* Up one level
* Switch to pinned tag for rust image
* Force update for test
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box >
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box >
2022-08-16 11:15:35 +02:00
Kirill Bulatov
18f251384d
Check for entire range during sasl validation ( #2281 )
2022-08-16 11:10:38 +03:00
Alexander Bayandin
4cddb0f1a4
Set up a workflow to run pgbench against captest ( #2077 )
2022-08-15 18:54:31 +01:00
Arseny Sher
7b12deead7
Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. ( #2274 )
2022-08-15 18:24:24 +03:00
Dmitry Rodionov
63a72d99bb
increase timeout in wait_for_upload to avoid spurious failures when testing with real s3
2022-08-15 18:02:27 +03:00
Arthur Petukhovsky
116ecdf87a
Improve walreceiver logic ( #2253 )
...
This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers.
- There was a bug where the filter `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` appears to have excluded all safekeepers in some edge cases. I removed this filter, which should probably help with #2237.
- Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down.
- Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second.
- `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with frequent reconnects when compute generates WAL very quickly.
- `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper is stuck.
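The reconnect conditions listed above can be sketched roughly as follows. This is an illustration only, not the pageserver code: `ConnectionStatus`, `Limits`, `ReconnectReason`, `reconnect_reason`, `keepalive_timeout` and `max_lsn_lag` are hypothetical names, LSNs are simplified to plain `u64`, and only `commit_lsn`, `LaggingWal`, `NoWalTimeout` and `lagging_wal_timeout` come from the commit message.

```rust
use std::time::{Duration, Instant};

/// Hypothetical snapshot of what the connected safekeeper last reported.
#[derive(Debug, Clone, Copy)]
struct ConnectionStatus {
    /// commit_lsn reported directly by the connected safekeeper.
    commit_lsn: u64,
    /// Last time any message arrived from the safekeeper.
    last_message_at: Instant,
    /// Last time the safekeeper streamed new WAL.
    last_wal_at: Instant,
}

/// Hypothetical tuning knobs; only `lagging_wal_timeout` is named in the commit.
#[derive(Debug, Clone, Copy)]
struct Limits {
    keepalive_timeout: Duration,
    max_lsn_lag: u64,
    lagging_wal_timeout: Duration,
}

#[derive(Debug, PartialEq, Eq)]
enum ReconnectReason {
    /// The safekeeper has been silent for too long.
    NoKeepAlives,
    /// Another safekeeper is far ahead of the connected one.
    LaggingWal,
    /// Newer WAL is known to exist, but the connected safekeeper streams none.
    NoWalTimeout,
}

/// Decide whether the current connection should be dropped, and why.
fn reconnect_reason(
    status: &ConnectionStatus,
    best_known_commit_lsn: u64,
    now: Instant,
    limits: &Limits,
) -> Option<ReconnectReason> {
    if now.saturating_duration_since(status.last_message_at) > limits.keepalive_timeout {
        return Some(ReconnectReason::NoKeepAlives);
    }
    if best_known_commit_lsn > status.commit_lsn.saturating_add(limits.max_lsn_lag) {
        return Some(ReconnectReason::LaggingWal);
    }
    let knows_newer_wal = best_known_commit_lsn > status.commit_lsn;
    let stalled =
        now.saturating_duration_since(status.last_wal_at) > limits.lagging_wal_timeout;
    if knows_newer_wal && stalled {
        return Some(ReconnectReason::NoWalTimeout);
    }
    None
}
```

Because the stall check only fires when newer WAL is known to exist elsewhere, an idle timeline does not cause reconnects, which is why a small timeout is tolerable in this sketch.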
2022-08-15 13:31:26 +03:00
Arseny Sher
431393e361
Find end of WAL on safekeepers using WalStreamDecoder.
...
We could make it inside wal_storage.rs, but taking into account that
- wal_storage.rs reading is async
- we don't need s3 here
- error handling is different; error during decoding is normal
I decided to put it separately.
The test
cargo test test_find_end_of_wal_last_crossing_segment
prepared earlier by @yeputons now passes.
Fixes https://github.com/neondatabase/neon/issues/544
https://github.com/neondatabase/cloud/issues/2004
Supersedes https://github.com/neondatabase/neon/pull/2066
2022-08-14 14:47:14 +03:00
Kirill Bulatov
f38f45b01d
Better storage sync logs ( #2268 )
2022-08-13 10:58:14 +03:00
Andrey Taranik
a5154dce3e
get_binaries script fix ( #2263 )
...
* get_binaries uses DOCKER_TAG taken from docker image build step
* remove docker tag discovery entirely and fix get_binaries for version variable
2022-08-12 20:35:26 +03:00
Alexander Bayandin
da5f8486ce
test_runner/pg_clients: collect docker logs ( #2259 )
2022-08-12 17:03:09 +01:00
Dmitry Ivanov
ad08c273d3
[proxy] Rework wire format of the password hack and some errors ( #2236 )
...
The new format has a few benefits: it's shorter, simpler and
human-readable as well. We don't use base64 anymore, since
URL encoding has us covered.
We also show a better error in case we couldn't parse the
payload; the users should know it's all about passing the
correct project name.
2022-08-12 17:38:43 +03:00
Andrey Taranik
7f97269277
get_binaries uses DOCKER_TAG taken from docker image build step ( #2260 )
2022-08-12 16:01:22 +03:00
Thang Pham
6d99b4f1d8
disable test_import_from_pageserver_multisegment ( #2258 )
...
This test now fails consistently on `main`. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause of the test failure.
See: #2255, #2256
2022-08-12 19:13:42 +07:00
Egor Suvorov
a7bf60631f
postgres_ffi/waldecoder: introduce explicit enum State
...
Previously it was emulated with a combination of nullable fields.
This change should make the logic more readable.
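To illustrate the general idea of replacing a combination of nullable fields with an explicit state enum, here is a toy sketch. It is not the waldecoder code: `State`, `ToyDecoder`, the variant names and the 4-byte length-prefixed framing are all hypothetical, chosen only to keep the example small.

```rust
/// Hypothetical decoder states; the real variants in waldecoder may differ.
#[derive(Debug)]
enum State {
    /// Collecting the 4-byte little-endian length prefix.
    ReadingLength { buf: Vec<u8> },
    /// Collecting a payload whose total length is already known.
    ReadingPayload { len: usize, buf: Vec<u8> },
}

/// Toy decoder for length-prefixed records fed in arbitrary chunks.
struct ToyDecoder {
    state: State,
}

impl ToyDecoder {
    fn new() -> Self {
        ToyDecoder { state: State::ReadingLength { buf: Vec::new() } }
    }

    /// Feed one byte; returns a complete record when one finishes.
    fn feed_byte(&mut self, byte: u8) -> Option<Vec<u8>> {
        match &mut self.state {
            State::ReadingLength { buf } => {
                buf.push(byte);
                if buf.len() == 4 {
                    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
                    if len == 0 {
                        // Empty record: emit it and wait for the next prefix.
                        self.state = State::ReadingLength { buf: Vec::new() };
                        return Some(Vec::new());
                    }
                    self.state = State::ReadingPayload { len, buf: Vec::new() };
                }
                None
            }
            State::ReadingPayload { len, buf } => {
                buf.push(byte);
                if buf.len() == *len {
                    let record = std::mem::take(buf);
                    self.state = State::ReadingLength { buf: Vec::new() };
                    return Some(record);
                }
                None
            }
        }
    }

    /// Feed a chunk of bytes; returns every record completed by this chunk.
    fn feed(&mut self, bytes: &[u8]) -> Vec<Vec<u8>> {
        bytes.iter().filter_map(|&b| self.feed_byte(b)).collect()
    }
}
```

With the enum, each state carries exactly the data that is valid in that state, so impossible combinations (for example a payload buffer without a known length) cannot be represented.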
2022-08-12 11:40:46 +03:00
Egor Suvorov
07bb7a2afe
postgres_ffi/waldecoder: remove unused startlsn
2022-08-12 11:40:46 +03:00
Egor Suvorov
142e247e85
postgres_ffi/waldecoder: validate more header fields
2022-08-12 11:40:46 +03:00
Thang Pham
7da47d8a0a
Fix timeline physical size flaky tests ( #2244 )
...
Resolves #2212 .
- use `wait_for_last_flush_lsn` in `test_timeline_physical_size_*` tests
## Context
We need to wait for the pageserver to catch up with the compute's last flush LSN because, during the timeline physical size API call, `LayerFlushThread` threads may still be running. These threads flush new layers to disk and hence update the physical size, which results in a mismatch between the physical size reported by the API and the actual physical size on disk.
### Note
The `LayerFlushThread` threads run **concurrently**, so the above error may still occur even with this patch. However, making the tests wait until all WAL has been processed (not flushed) before calculating the physical size should reduce the flakiness significantly.
2022-08-12 14:28:50 +07:00
Thang Pham
dc52436a8f
Fix bug when importing large (>1GB) relations ( #2172 )
...
Resolves #2097
- use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get`
- update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`.
+ `small` is the old test
+ `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build
2022-08-12 09:24:20 +07:00
Kirill Bulatov
995a2de21e
Share exponential backoff code and fix logic for delete task failure ( #2252 )
2022-08-11 23:21:06 +03:00
Arseny Sher
e593cbaaba
Add pageserver checkpoint_timeout option.
...
Flushes the in-memory layer eventually even when no new data arrives, which helps
safekeepers suspend activity (stop pushing to the broker). The default of 10m
should be ok.
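A minimal sketch of a time-based flush trigger like the one described here, under stated assumptions: `OpenLayer`, `should_flush`, `size_threshold` and the additional size-based trigger are hypothetical and are not taken from the pageserver; only the option name `checkpoint_timeout` and its 10m default come from this commit message.

```rust
use std::time::Duration;

/// Default taken from the commit message ("Default 10m").
const DEFAULT_CHECKPOINT_TIMEOUT: Duration = Duration::from_secs(10 * 60);

/// Hypothetical view of the open in-memory layer.
struct OpenLayer {
    /// Bytes written to the layer that are not yet on disk.
    unflushed_bytes: u64,
    /// Time elapsed since the layer was last flushed.
    time_since_last_flush: Duration,
}

/// Flush when the layer is large enough (assumed size trigger), or when it
/// holds any data and the timeout has expired even though nothing new arrives.
fn should_flush(layer: &OpenLayer, size_threshold: u64, checkpoint_timeout: Duration) -> bool {
    if layer.unflushed_bytes == 0 {
        // Nothing to flush, however long we have been idle.
        return false;
    }
    layer.unflushed_bytes >= size_threshold
        || layer.time_since_last_flush >= checkpoint_timeout
}
```

In this sketch an idle layer holding even a single byte is flushed once the timeout passes, which is the behaviour the option is meant to guarantee.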
2022-08-11 22:54:09 +03:00
Heikki Linnakangas
4b9e02be45
Update vendor/postgres back; it was changed accidentally. ( #2251 )
...
Commit 4227cfc96e accidentally reverted vendor/postgres to an older
version. Update it back.
2022-08-11 19:25:08 +03:00
Kirill Bulatov
7a36d06cc2
Fix exponential backoff values
2022-08-11 08:34:57 +03:00
Konstantin Knizhnik
4227cfc96e
Safe truncate ( #2218 )
...
* Move relation size cache to layered timeline
* Fix obtaining current LSN for relation size cache
* Resolve merge conflicts
* Resolve merge conflicts
* Restore 'lsn' field in DatadirModification
* adjust DatadirModification lsn in ingest_record
* Fix formatting
* Pass lsn to get_relsize
* Fix merge conflict
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech >
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech >
* Check if relation exists before trying to truncate it
refer #1932
* Add test reproducing FSM truncate problem
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech >
2022-08-09 22:45:33 +03:00
Dmitry Rodionov
1fc761983f
support node id and remote storage params in docker_entrypoint.sh
2022-08-09 18:59:00 +03:00
Stas Kelvich
227d47d2f3
Update CONTRIBUTING.md
2022-08-09 14:18:25 +03:00
Stas Kelvich
0290893bcc
Update CONTRIBUTING.md
2022-08-09 14:18:25 +03:00
Heikki Linnakangas
32fd709b34
Fix links to safekeeper protocol docs. ( #2188 )
...
safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in
commit 0b14fdb078, as part of reorganizing the docs into 'mdbook' format.
Fixes issue #1475. Thanks to @banks for spotting the outdated references.
In addition to fixing the above issue, this patch also fixes other links broken by 0b14fdb078. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech >
Co-authored-by: Thang Pham <thang@neon.tech >
2022-08-09 10:19:18 +07:00
Kirill Bulatov
3a9bff81db
Fix etcd typos
2022-08-08 19:04:46 +03:00
bojanserafimov
743370de98
Major migration script ( #2073 )
...
This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview.
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech >
2022-08-08 17:52:28 +02:00
Dmitry Rodionov
cdfa9fe705
avoid duplicate parameter, increase timeout
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
7cd68a0c27
increase timeout to pass test with real s3
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
beaa991f81
remove debug log
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
9430abae05
use event so it fires only if workload thread successfully finished
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
4da4c7f769
increase statement timeout
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
0d14d4a1a8
ignore record property warning to fix benchmarks
2022-08-08 12:15:16 +03:00
bojanserafimov
8c8431ebc6
Add more buckets to pageserver latency metrics ( #2225 )
2022-08-06 11:45:47 +02:00
Ankur Srivastava
84d1bc06a9
refactor: replace lazy-static with once-cell ( #2195 )
...
- Replacing all the occurrences of lazy-static with `once_cell::sync::Lazy`
- fixes #1147
Signed-off-by: Ankur Srivastava <best.ankur@gmail.com >
2022-08-05 19:34:04 +02:00
Konstantin Knizhnik
5133db44e1
Move relation size cache from WalIngest to DatadirTimeline ( #2094 )
...
* Move relation size cache to layered timeline
* Fix obtaining current LSN for relation size cache
* Resolve merge conflicts
* Resolve merge conflicts
* Restore 'lsn' field in DatadirModification
* adjust DatadirModification lsn in ingest_record
* Fix formatting
* Pass lsn to get_relsize
* Fix merge conflict
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech >
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech >
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech >
2022-08-05 16:28:59 +03:00
Alexander Bayandin
4cb1074fe5
github/workflows: Fix git dubious ownership ( #2223 )
2022-08-05 13:44:57 +01:00
Arthur Petukhovsky
0a958b0ea1
Check find_end_of_wal errors instead of unwrap
2022-08-04 17:56:19 +03:00
Vadim Kharitonov
1bbc8090f3
[issue #1591 ] Add neon_local pageserver status handler
2022-08-04 16:38:29 +03:00
Dmitry Rodionov
f7d8db7e39
silence https://github.com/neondatabase/neon/issues/2211
2022-08-04 16:32:19 +03:00