Kirill Bulatov
995a2de21e
Share exponential backoff code and fix logic for delete task failure (#2252)
2022-08-11 23:21:06 +03:00
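Several commits in this range touch retry backoff: "Fix exponential backoff values", "Back off when reenqueueing delete tasks", and "Tweak backoff numbers". The sketch below is a rough Python illustration of the general technique only, capped exponential backoff for retried tasks. `backoff_seconds`, `retry_with_backoff`, and the `base`/`max_seconds` defaults are hypothetical. They are not the pageserver's actual Rust code or its tuned numbers.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_seconds(attempt: int, base: float = 0.1, max_seconds: float = 3.0) -> float:
    """Delay before attempt number `attempt` (0 = first try, no delay), doubling up to a cap."""
    if attempt <= 0:
        return 0.0
    return min(base * 2 ** (attempt - 1), max_seconds)


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
    base: float = 0.1,
    max_seconds: float = 3.0,
) -> T:
    """Run `op`, sleeping with capped exponential backoff between failed attempts."""
    for attempt in range(max_attempts):
        # Attempt 0 sleeps 0 s; later attempts wait base, 2*base, 4*base, ... up to the cap.
        sleep(backoff_seconds(attempt, base, max_seconds))
        try:
            return op()
        except Exception:
            # Give up and propagate the error once the attempt budget is spent.
            if attempt == max_attempts - 1:
                raise
    raise ValueError("max_attempts must be at least 1")
```

The cap (`max_seconds`) limits how long any single wait can grow. Choosing it relative to other timeouts in the system appears to be the kind of tuning the "Tweak backoff numbers" commit addresses.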
Arseny Sher
e593cbaaba
Add pageserver checkpoint_timeout option.
To flush the in-memory layer eventually when no new data arrives, which helps
safekeepers suspend activity (stop pushing to the broker). The default of 10m
should be OK.
2022-08-11 22:54:09 +03:00
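The idea behind `checkpoint_timeout` can be sketched as a time-based flush condition. This is a minimal Python illustration under stated assumptions, not the pageserver's Rust implementation. `InMemoryLayerState` and `should_flush` are hypothetical. Measuring from the last flush (rather than, say, the last write) is also an assumption. Only the 10-minute default comes from the commit message.

```python
from dataclasses import dataclass

# Default suggested in the commit message: 10 minutes.
CHECKPOINT_TIMEOUT_SECS = 10 * 60


@dataclass
class InMemoryLayerState:
    """Hypothetical bookkeeping for an open in-memory layer."""

    last_flush: float  # monotonic timestamp (seconds) of the last flush
    has_unflushed_data: bool = False


def should_flush(
    state: InMemoryLayerState, now: float, timeout: float = CHECKPOINT_TIMEOUT_SECS
) -> bool:
    """Flush if buffered data exists and the timeout elapsed, even with no new writes."""
    return state.has_unflushed_data and now - state.last_flush >= timeout
```

The check depends only on elapsed time, so a periodic background task can call it and flush an idle layer. Per the commit message, that eventual flush is what lets safekeepers suspend activity.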
Heikki Linnakangas
4b9e02be45
Update vendor/postgres back; it was changed accidentally. (#2251)
Commit 4227cfc96e accidentally reverted vendor/postgres to an older
version. Update it back.
2022-08-11 19:25:08 +03:00
Kirill Bulatov
7a36d06cc2
Fix exponential backoff values
2022-08-11 08:34:57 +03:00
Konstantin Knizhnik
4227cfc96e
Safe truncate (#2218)
* Move relation size cache to layered timeline
* Fix obtaining current LSN for relation size cache
* Resolve merge conflicts
* Resolve merge conflicts
* Restore 'lsn' field in DatadirModification
* Adjust DatadirModification lsn in ingest_record
* Fix formatting
* Pass lsn to get_relsize
* Fix merge conflict
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Check if relation exists before trying to truncate it
refer #1932
* Add test reproducing FSM truncate problem
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
2022-08-09 22:45:33 +03:00
Dmitry Rodionov
1fc761983f
support node id and remote storage params in docker_entrypoint.sh
2022-08-09 18:59:00 +03:00
Stas Kelvich
227d47d2f3
Update CONTRIBUTING.md
2022-08-09 14:18:25 +03:00
Stas Kelvich
0290893bcc
Update CONTRIBUTING.md
2022-08-09 14:18:25 +03:00
Heikki Linnakangas
32fd709b34
Fix links to safekeeper protocol docs. (#2188)
safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in
commit 0b14fdb078, as part of reorganizing the docs into 'mdbook' format.
Fixes issue #1475. Thanks to @banks for spotting the outdated references.
In addition to fixing the above issue, this patch also fixes other links broken by 0b14fdb078. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480.
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
2022-08-09 10:19:18 +07:00
Kirill Bulatov
3a9bff81db
Fix etcd typos
2022-08-08 19:04:46 +03:00
bojanserafimov
743370de98
Major migration script (#2073)
This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview.
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2022-08-08 17:52:28 +02:00
Dmitry Rodionov
cdfa9fe705
avoid duplicate parameter, increase timeout
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
7cd68a0c27
increase timeout to pass test with real s3
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
beaa991f81
remove debug log
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
9430abae05
use event so it fires only if workload thread successfully finished
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
4da4c7f769
increase statement timeout
2022-08-08 12:15:16 +03:00
Dmitry Rodionov
0d14d4a1a8
ignore record property warning to fix benchmarks
2022-08-08 12:15:16 +03:00
bojanserafimov
8c8431ebc6
Add more buckets to pageserver latency metrics (#2225)
2022-08-06 11:45:47 +02:00
Ankur Srivastava
84d1bc06a9
refactor: replace lazy-static with once-cell (#2195)
- Replace all occurrences of lazy-static with `once_cell::sync::Lazy`
- fixes #1147
Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
2022-08-05 19:34:04 +02:00
Konstantin Knizhnik
5133db44e1
Move relation size cache from WalIngest to DatadirTimeline (#2094)
* Move relation size cache to layered timeline
* Fix obtaining current LSN for relation size cache
* Resolve merge conflicts
* Resolve merge conflicts
* Restore 'lsn' field in DatadirModification
* Adjust DatadirModification lsn in ingest_record
* Fix formatting
* Pass lsn to get_relsize
* Fix merge conflict
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Update pageserver/src/pgdatadir_mapping.rs
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
2022-08-05 16:28:59 +03:00
Alexander Bayandin
4cb1074fe5
github/workflows: Fix git dubious ownership (#2223)
2022-08-05 13:44:57 +01:00
Arthur Petukhovsky
0a958b0ea1
Check find_end_of_wal errors instead of unwrap
2022-08-04 17:56:19 +03:00
Vadim Kharitonov
1bbc8090f3
[issue #1591] Add neon_local pageserver status handler
2022-08-04 16:38:29 +03:00
Dmitry Rodionov
f7d8db7e39
silence https://github.com/neondatabase/neon/issues/2211
2022-08-04 16:32:19 +03:00
Dmitry Rodionov
e54941b811
treat pytest warnings as errors
2022-08-04 16:32:19 +03:00
Heikki Linnakangas
52ce1c9d53
Speed up test shutdown, by polling more frequently.
A fair amount of the time in our python tests is spent waiting for the
pageserver and safekeeper processes to shut down. It doesn't matter so
much when you're running a lot of tests in parallel, but it's quite
noticeable when running them sequentially.
A big part of the slowness is that after sending the SIGTERM
signal, we poll to see if the process is still running, and the
polling happened at a 1 s interval. Reduce it to 0.1 s.
2022-08-04 12:57:15 +03:00
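The change amounts to shortening the sleep in a stop-and-poll loop. Below is a minimal sketch that assumes the test harness holds a `subprocess.Popen` handle. The helper name `stop_process` and its `timeout` default are hypothetical. Only the SIGTERM-then-poll pattern and the 0.1 s interval come from the commit.

```python
import signal
import subprocess
import time


def stop_process(
    proc: subprocess.Popen, timeout: float = 10.0, poll_interval: float = 0.1
) -> float:
    """Send SIGTERM, then poll until the process exits. Returns seconds waited."""
    start = time.monotonic()
    proc.send_signal(signal.SIGTERM)
    # poll() returns None while the child is still running.
    while proc.poll() is None:
        if time.monotonic() - start > timeout:
            proc.kill()  # escalate if SIGTERM was ignored
            proc.wait()
            raise TimeoutError(f"pid {proc.pid} did not exit within {timeout}s")
        time.sleep(poll_interval)
    return time.monotonic() - start
```

The extra wait per shutdown is at most about one poll interval. With a 1 s interval, a process that exits 50 ms after SIGTERM is still noticed only after a full second. At 0.1 s, the worst-case overshoot drops to roughly 100 ms per stop.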
Dmitry Rodionov
bc2cb5382b
run real s3 tests in CI
2022-08-04 11:14:05 +03:00
Dmitry Rodionov
5f71aa09d3
support running tests against real s3 implementation without mocking
2022-08-04 11:14:05 +03:00
Dmitry Rodionov
b4f2c5b514
run benchmarks conditionally, on main or if run_benchmarks label is set
2022-08-03 01:36:14 +03:00
Alexander Bayandin
71f39bac3d
github/workflows: upload artifacts to S3 (#2071)
2022-08-02 13:57:26 +01:00
Stas Kelvich
177d5b1f22
Bump postgres to get uuid extension
2022-08-02 11:16:26 +03:00
dependabot[bot]
8ba41b8c18
Bump pywin32 from 227 to 301 (#2202)
2022-08-01 19:08:09 +01:00
Dmitry Rodionov
1edf3eb2c8
increase timeout so the macOS job can finish the build with all cache misses
2022-08-01 18:28:49 +03:00
Dmitry Rodionov
0ebb6bc4b0
Temporarily pin the Werkzeug version because moto hangs with a newer one. See https://github.com/spulec/moto/issues/5341
2022-08-01 18:28:49 +03:00
Dmitry Rodionov
092a9b74d3
use only s3 in boto3-stubs and update mypy
A newer version of mypy fixes a spurious error when trying to update only the boto3 stubs.
However, it brings new checks and starts to complain when we index into
cursor.fetchone() without checking for None first. So this introduces a wrapper
to simplify querying for scalar values. I tried to use the cursor_factory connection
argument, but without success. There may be a better way to do that,
but this looks like the simplest.
2022-08-01 18:28:49 +03:00
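A wrapper of the kind described might look like the sketch below. The name `query_scalar` and its exact signature are assumptions. It works with any DB-API 2.0 cursor, psycopg2's included. It checks `fetchone()` for `None` before indexing, which is the check the newer mypy insists on.

```python
from typing import Any, Optional, Sequence


def query_scalar(cur: Any, query: str, params: Optional[Sequence[Any]] = None) -> Any:
    """Execute `query` and return the first column of the first row.

    Raises LookupError if the query returns no rows, so callers never index into None.
    """
    if params is None:
        cur.execute(query)
    else:
        cur.execute(query, params)
    row = cur.fetchone()
    if row is None:
        raise LookupError(f"query returned no rows: {query!r}")
    return row[0]
```

A call such as `query_scalar(cur, "SELECT count(*) FROM t")` replaces the `cur.execute(...)` / `cur.fetchone()[0]` pair. Placeholder syntax still follows the driver: `%s` for psycopg2, `?` for sqlite3.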
Ankur Srivastava
e73b95a09d
docs: linked poetry related step in tests section
Added the link to the dependencies which should be installed
before running the tests.
2022-08-01 18:13:01 +03:00
Alexander Bayandin
539007c173
github/workflows: make bash more strict (#2197)
2022-08-01 12:54:39 +01:00
Heikki Linnakangas
d0494c391a
Remove wal_receiver mgmt API endpoint
Move all the fields that were returned by the wal_receiver endpoint into
timeline_detail. Internally, move those fields from the separate global
WAL_RECEIVERS hash into the LayeredTimeline struct. That way, all the
information about a timeline is kept in one place.
In passing, I noticed that the 'thread_id' field was removed from
WalReceiverEntry in commit e5cb727572, but that commit forgot to update
openapi_spec.yml. This commit removes it there too.
2022-07-29 20:51:37 +03:00
Kirill Bulatov
2af5a96f0d
Back off when re-enqueueing delete tasks
2022-07-29 19:04:40 +03:00
Vadim Kharitonov
9733b24f4a
Fix README.md: fix several typos and slightly update the documentation for OSX
2022-07-29 19:03:57 +03:00
Heikki Linnakangas
d865892a06
Print full error with stacktrace, if compute node startup fails.
It failed in the staging environment a few times, and all we got in the
logs was:
ERROR could not start the compute node: failed to get basebackup@0/2D6194F8 from pageserver host=zenith-us-stage-ps-2.local port=6400
giving control plane 30s to collect the error before shutdown
That's missing all the detail on *why* it failed.
2022-07-29 16:41:55 +03:00
Heikki Linnakangas
a0f76253f8
Bump Postgres version.
This brings in the 'uuid-ossp' extension.
2022-07-29 16:30:39 +03:00
Heikki Linnakangas
02afa2762c
Move Tenant- and TimelineInfo structs to models.rs.
They are part of the management API response structs. Let's try to
concentrate everything that's part of the API in models.rs.
2022-07-29 15:02:15 +03:00
Heikki Linnakangas
d903dd61bd
Rename 'wal_producer_connstr' to 'wal_source_connstr'.
What the WAL receiver really connects to is the safekeeper. The
"producer" term is a bit misleading, as the safekeeper doesn't produce
the WAL, the compute node does.
This change also applies to the name of the field used in the mgmt API
in the response of the
'/v1/tenant/:tenant_id/timeline/:timeline_id/wal_receiver' endpoint.
AFAICS that's not used anywhere else than one python test, so it
should be OK to change it.
2022-07-29 09:09:22 +03:00
Thang Pham
417d9e9db2
Add current physical size to tenant status endpoint (#2173)
Ref #1902
2022-07-28 13:59:20 -04:00
Alexander Bayandin
6ace347175
github/workflows: unpause stress env deployment (#2180)
This reverts commit 4446791397.
2022-07-28 18:37:21 +01:00
Alexander Bayandin
14a027cce5
Makefile: get openssl prefix dynamically (#2179)
2022-07-28 17:05:30 +01:00
Arthur Petukhovsky
09ddd34b2a
Fix checkpoints race condition in safekeeper tests (#2175)
We should wait for WAL to arrive at the pageserver before calling CHECKPOINT.
2022-07-28 15:44:02 +03:00
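The fix described here is an ordering constraint: make sure the pageserver has received WAL up to a target LSN before issuing CHECKPOINT. Below is a hedged Python sketch of that idea.

- `get_pageserver_lsn` is a hypothetical stand-in for whatever the test uses to read the pageserver's last received LSN.
- `parse_lsn` converts PostgreSQL's textual `X/Y` LSN form (hex high and low 32-bit halves) to an integer, so positions compare numerically.
- Both helper names are assumptions, not functions from the repository.

```python
import time
from typing import Callable


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's 'X/Y' LSN notation (hex high/low 32-bit halves) to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_for_lsn(
    get_pageserver_lsn: Callable[[], int],
    target: int,
    timeout: float = 30.0,
    poll_interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the pageserver has received WAL up to `target`; return the LSN seen."""
    deadline = time.monotonic() + timeout
    while True:
        current = get_pageserver_lsn()
        if current >= target:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pageserver at {current:#x}, waiting for {target:#x}")
        sleep(poll_interval)
```

Comparing the raw strings would be wrong. Lexically `'0/FF' > '0/1000'`, even though 0xFF < 0x1000.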
Arthur Petukhovsky
aeb3f0ea07
Refactor test_race_conditions (#2162)
Do not use python multiprocessing, make the test async
2022-07-28 14:38:37 +03:00
Kirill Bulatov
58b04438f0
Tweak backoff numbers to avoid no wal connection threshold trigger
2022-07-27 22:16:40 +03:00