rust/neon - neon - Gitea: Git with a cup of tea

mirror of https://github.com/neondatabase/neon.git synced 2026-06-04 05:50:38 +00:00

Author	SHA1	Message	Date
Arthur Petukhovsky	873347f977	Merge pull request #2275 from neondatabase/main
* github/workflows: Fix git dubious ownership (#2223)
* Move relation size cache from WalIngest to DatadirTimeline (#2094)
  * Move relation sie cache to layered timeline
  * Fix obtaining current LSN for relation size cache
  * Resolve merge conflicts
  * Resolve merge conflicts
  * Reestore 'lsn' field in DatadirModification
  * adjust DatadirModification lsn in ingest_record
  * Fix formatting
  * Pass lsn to get_relsize
  * Fix merge conflict
  * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
  * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
  Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* refactor: replace lazy-static with once-cell (#2195) - Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy` - fixes #1147 Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
* Add more buckets to pageserver latency metrics (#2225)
* ignore record property warning to fix benchmarks
* increase statement timeout
* use event so it fires only if workload thread successfully finished
* remove debug log
* increase timeout to pass test with real s3
* avoid duplicate parameter, increase timeout
* Major migration script (#2073) This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview. Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
* Fix etcd typos
* Fix links to safekeeper protocol docs. (#2188) safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in commit `0b14fdb078`, as part of reorganizing the docs into 'mdbook' format. Fixes issue #1475. Thanks to @banks for spotting the outdated references. In addition to fixing the above issue, this patch also fixes other broken links as a result of `0b14fdb078`. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480. Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Thang Pham <thang@neon.tech>
* Update CONTRIBUTING.md
* Update CONTRIBUTING.md
* support node id and remote storage params in docker_entrypoint.sh
* Safe truncate (#2218)
  * Move relation sie cache to layered timeline
  * Fix obtaining current LSN for relation size cache
  * Resolve merge conflicts
  * Resolve merge conflicts
  * Reestore 'lsn' field in DatadirModification
  * adjust DatadirModification lsn in ingest_record
  * Fix formatting
  * Pass lsn to get_relsize
  * Fix merge conflict
  * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
  * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
  * Check if relation exists before trying to truncat it refer #1932
  * Add test reporducing FSM truncate problem
  Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
* Fix exponential backoff values
* Update back `vendor/postgres` back; it was changed accidentally. (#2251) Commit `4227cfc96e` accidentally reverted vendor/postgres to an older version. Update it back.
* Add pageserver checkpoint_timeout option. To flush inmemory layer eventually when no new data arrives, which helps safekeepers to suspend activity (stop pushing to the broker). Default 10m should be ok.
* Share exponential backoff code and fix logic for delete task failure (#2252)
* Fix bug when import large (>1GB) relations (#2172) Resolves #2097 - use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get` - update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. + `small` is the old test + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build
* Fix timeline physical size flaky tests (#2244) Resolves #2212. - use `wait_for_last_flush_lsn` in `test_timeline_physical_size_` tests ## Context Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk. ### Note The `LayerFlushThread` threads are processed concurrently, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly
* postgres_ffi/waldecoder: validate more header fields
* postgres_ffi/waldecoder: remove unused startlsn
* postgres_ffi/waldecoder: introduce explicit `enum State` Previously it was emulated with a combination of nullable fields. This change should make the logic more readable.
* disable `test_import_from_pageserver_multisegment` (#2258) This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure. See: #2255, #2256
* get_binaries uses DOCKER_TAG taken from docker image build step (#2260)
* [proxy] Rework wire format of the password hack and some errors (#2236) The new format has a few benefits: it's shorter, simpler and human-readable as well. We don't use base64 anymore, since url encoding got us covered. We also show a better error in case we couldn't parse the payload; the users should know it's all about passing the correct project name.
* test_runner/pg_clients: collect docker logs (#2259)
* get_binaries script fix (#2263)
  * get_binaries uses DOCKER_TAG taken from docker image build step
  * remove docker tag discovery at all and fix get_binaries for version variable
* Better storage sync logs (#2268)
* Find end of WAL on safekeepers using WalStreamDecoder. We could make it inside wal_storage.rs, but taking into account that - wal_storage.rs reading is async - we don't need s3 here - error handling is different; error during decoding is normal I decided to put it separately. Test cargo test test_find_end_of_wal_last_crossing_segment prepared earlier by @yeputons passes now. Fixes https://github.com/neondatabase/neon/issues/544 https://github.com/neondatabase/cloud/issues/2004 Supersedes https://github.com/neondatabase/neon/pull/2066
* Improve walreceiver logic (#2253) This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers. - There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237 - Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down. - Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second. - `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast. - `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck.
* increase timeout in wait_for_upload to avoid spurious failures when testing with real s3
* Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. (#2274)
* Set up a workflow to run pgbench against captest (#2077)
Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Co-authored-by: Ankur Srivastava <ansrivas@users.noreply.github.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Egor Suvorov <egor@neon.tech>
Co-authored-by: Andrey Taranik <andrey@cicd.team>
Co-authored-by: Dmitry Ivanov <ivadmi5@gmail.com>	2022-08-15 21:30:45 +03:00
Alexander Bayandin	4cddb0f1a4	Set up a workflow to run pgbench against captest (#2077)	2022-08-15 18:54:31 +01:00
Arseny Sher	7b12deead7	Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. (#2274)	2022-08-15 18:24:24 +03:00
Dmitry Rodionov	63a72d99bb	increase timeout in wait_for_upload to avoid spurious failures when testing with real s3	2022-08-15 18:02:27 +03:00
Arthur Petukhovsky	116ecdf87a	Improve walreceiver logic (#2253) This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers. - There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237 - Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down. - Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second. - `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast. - `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck.	2022-08-15 13:31:26 +03:00
Arseny Sher	431393e361	Find end of WAL on safekeepers using WalStreamDecoder. We could make it inside wal_storage.rs, but taking into account that - wal_storage.rs reading is async - we don't need s3 here - error handling is different; error during decoding is normal I decided to put it separately. Test cargo test test_find_end_of_wal_last_crossing_segment prepared earlier by @yeputons passes now. Fixes https://github.com/neondatabase/neon/issues/544 https://github.com/neondatabase/cloud/issues/2004 Supersedes https://github.com/neondatabase/neon/pull/2066	2022-08-14 14:47:14 +03:00
Kirill Bulatov	f38f45b01d	Better storage sync logs (#2268)	2022-08-13 10:58:14 +03:00
Andrey Taranik	a5154dce3e	get_binaries script fix (#2263) * get_binaries uses DOCKER_TAG taken from docker image build step * remove docker tag discovery at all and fix get_binaries for version variable	2022-08-12 20:35:26 +03:00
Alexander Bayandin	da5f8486ce	test_runner/pg_clients: collect docker logs (#2259)	2022-08-12 17:03:09 +01:00
Dmitry Ivanov	ad08c273d3	[proxy] Rework wire format of the password hack and some errors (#2236) The new format has a few benefits: it's shorter, simpler and human-readable as well. We don't use base64 anymore, since url encoding got us covered. We also show a better error in case we couldn't parse the payload; the users should know it's all about passing the correct project name.	2022-08-12 17:38:43 +03:00
Andrey Taranik	7f97269277	get_binaries uses DOCKER_TAG taken from docker image build step (#2260)	2022-08-12 16:01:22 +03:00
Thang Pham	6d99b4f1d8	disable `test_import_from_pageserver_multisegment` (#2258) This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure. See: #2255, #2256	2022-08-12 19:13:42 +07:00
Egor Suvorov	a7bf60631f	postgres_ffi/waldecoder: introduce explicit `enum State` Previously it was emulated with a combination of nullable fields. This change should make the logic more readable.	2022-08-12 11:40:46 +03:00
Egor Suvorov	07bb7a2afe	postgres_ffi/waldecoder: remove unused startlsn	2022-08-12 11:40:46 +03:00
Egor Suvorov	142e247e85	postgres_ffi/waldecoder: validate more header fields	2022-08-12 11:40:46 +03:00
Thang Pham	7da47d8a0a	Fix timeline physical size flaky tests (#2244) Resolves #2212. - use `wait_for_last_flush_lsn` in `test_timeline_physical_size_` tests ## Context Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk. ### Note The `LayerFlushThread` threads are processed concurrently, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly	2022-08-12 14:28:50 +07:00
Thang Pham	dc52436a8f	Fix bug when import large (>1GB) relations (#2172) Resolves #2097 - use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get` - update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. + `small` is the old test + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build	2022-08-12 09:24:20 +07:00
Kirill Bulatov	995a2de21e	Share exponential backoff code and fix logic for delete task failure (#2252)	2022-08-11 23:21:06 +03:00
Arseny Sher	e593cbaaba	Add pageserver checkpoint_timeout option. To flush inmemory layer eventually when no new data arrives, which helps safekeepers to suspend activity (stop pushing to the broker). Default 10m should be ok.	2022-08-11 22:54:09 +03:00
Heikki Linnakangas	4b9e02be45	Update back `vendor/postgres` back; it was changed accidentally. (#2251) Commit `4227cfc96e` accidentally reverted vendor/postgres to an older version. Update it back.	2022-08-11 19:25:08 +03:00
Kirill Bulatov	7a36d06cc2	Fix exponential backoff values	2022-08-11 08:34:57 +03:00
Konstantin Knizhnik	4227cfc96e	Safe truncate (#2218) * Move relation sie cache to layered timeline * Fix obtaining current LSN for relation size cache * Resolve merge conflicts * Resolve merge conflicts * Reestore 'lsn' field in DatadirModification * adjust DatadirModification lsn in ingest_record * Fix formatting * Pass lsn to get_relsize * Fix merge conflict * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Check if relation exists before trying to truncat it refer #1932 * Add test reporducing FSM truncate problem Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-08-09 22:45:33 +03:00
Dmitry Rodionov	1fc761983f	support node id and remote storage params in docker_entrypoint.sh	2022-08-09 18:59:00 +03:00
Stas Kelvich	227d47d2f3	Update CONTRIBUTING.md	2022-08-09 14:18:25 +03:00
Stas Kelvich	0290893bcc	Update CONTRIBUTING.md	2022-08-09 14:18:25 +03:00
Heikki Linnakangas	32fd709b34	Fix links to safekeeper protocol docs. (#2188) safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in commit `0b14fdb078`, as part of reorganizing the docs into 'mdbook' format. Fixes issue #1475. Thanks to @banks for spotting the outdated references. In addition to fixing the above issue, this patch also fixes other broken links as a result of `0b14fdb078`. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480. Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Thang Pham <thang@neon.tech>	2022-08-09 10:19:18 +07:00
Kirill Bulatov	3a9bff81db	Fix etcd typos	2022-08-08 19:04:46 +03:00
bojanserafimov	743370de98	Major migration script (#2073) This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview. Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2022-08-08 17:52:28 +02:00
Dmitry Rodionov	cdfa9fe705	avoid duplicate parameter, increase timeout	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	7cd68a0c27	increase timeout to pass test with real s3	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	beaa991f81	remove debug log	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	9430abae05	use event so it fires only if workload thread successfully finished	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	4da4c7f769	increase statement timeout	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	0d14d4a1a8	ignore record property warning to fix benchmarks	2022-08-08 12:15:16 +03:00
bojanserafimov	8c8431ebc6	Add more buckets to pageserver latency metrics (#2225)	2022-08-06 11:45:47 +02:00
Ankur Srivastava	84d1bc06a9	refactor: replace lazy-static with once-cell (#2195) - Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy` - fixes #1147 Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>	2022-08-05 19:34:04 +02:00
Konstantin Knizhnik	5133db44e1	Move relation size cache from WalIngest to DatadirTimeline (#2094) * Move relation sie cache to layered timeline * Fix obtaining current LSN for relation size cache * Resolve merge conflicts * Resolve merge conflicts * Reestore 'lsn' field in DatadirModification * adjust DatadirModification lsn in ingest_record * Fix formatting * Pass lsn to get_relsize * Fix merge conflict * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-08-05 16:28:59 +03:00
Alexander Bayandin	4cb1074fe5	github/workflows: Fix git dubious ownership (#2223)	2022-08-05 13:44:57 +01:00
Arthur Petukhovsky	e814ac16f9	Merge pull request #2219 from neondatabase/main Release 2022-08-04	2022-08-04 20:06:34 +03:00
Arthur Petukhovsky	0a958b0ea1	Check find_end_of_wal errors instead of unwrap	2022-08-04 17:56:19 +03:00
Vadim Kharitonov	1bbc8090f3	[issue #1591] Add `neon_local pageserver status` handler	2022-08-04 16:38:29 +03:00
Dmitry Rodionov	f7d8db7e39	silence https://github.com/neondatabase/neon/issues/2211	2022-08-04 16:32:19 +03:00
Dmitry Rodionov	e54941b811	treat pytest warnings as errors	2022-08-04 16:32:19 +03:00
Heikki Linnakangas	52ce1c9d53	Speed up test shutdown, by polling more frequently. A fair amount of the time in our python tests is spent waiting for the pageserver and safekeeper processes to shut down. It doesn't matter so much when you're running a lot of tests in parallel, but it's quite noticeable when running them sequentially. A big part of the slowness is that is that after sending the SIGTERM signal, we poll to see if the process is still running, and the polling happened at 1 s interval. Reduce it to 0.1 s.	2022-08-04 12:57:15 +03:00
Dmitry Rodionov	bc2cb5382b	run real s3 tests in CI	2022-08-04 11:14:05 +03:00
Dmitry Rodionov	5f71aa09d3	support running tests against real s3 implementation without mocking	2022-08-04 11:14:05 +03:00
Dmitry Rodionov	b4f2c5b514	run benchmarks conditionally, on main or if run_benchmarks label is set	2022-08-03 01:36:14 +03:00
Alexander Bayandin	71f39bac3d	github/workflows: upload artifacts to S3 (#2071)	2022-08-02 13:57:26 +01:00
Heikki Linnakangas	ad3055d386	Merge pull request #2203 from neondatabase/release-uuid-ossp Deploy new storage and compute version to production Release 2022-08-02	2022-08-02 15:08:14 +03:00
Heikki Linnakangas	94e03eb452	Merge remote-tracking branch 'origin/main' into 'release' Release 2022-08-01	2022-08-02 12:43:49 +03:00
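
Commit `a7bf60631f` replaces a combination of nullable fields in `postgres_ffi/waldecoder` with an explicit `enum State`. The minimal Rust sketch below illustrates that pattern only; it is not the real `WalStreamDecoder`. The `Decoder` type, its variant names and the 4-byte length-prefixed toy record format are hypothetical.

```rust
/// Decoder state as an explicit enum (hypothetical sketch). Without it, the
/// same information would live in independent nullable fields such as
/// `recordbuf: Option<Vec<u8>>` and `remaining: Option<usize>`, which can
/// drift into inconsistent combinations.
#[derive(Debug, PartialEq)]
pub enum State {
    /// Between records: the next chunk starts with a record header.
    WaitingForRecord,
    /// A record spans several chunks: `buf` holds the bytes gathered so far,
    /// `remaining` the number of payload bytes still expected.
    ReassemblingRecord { buf: Vec<u8>, remaining: usize },
}

pub struct Decoder {
    state: State,
}

impl Decoder {
    pub fn new() -> Self {
        Decoder { state: State::WaitingForRecord }
    }

    pub fn state(&self) -> &State {
        &self.state
    }

    /// Feed one chunk; returns the record payload once it is complete.
    /// Toy format: a 4-byte little-endian payload length, then the payload.
    /// Simplification: a chunk carries at most one record, and bytes after
    /// the end of a record are ignored.
    pub fn feed(&mut self, chunk: &[u8]) -> Option<Vec<u8>> {
        // Move the current state out so each arm can take ownership of its data.
        match std::mem::replace(&mut self.state, State::WaitingForRecord) {
            State::WaitingForRecord => {
                if chunk.len() < 4 {
                    // Toy simplification: headers are never split, so a short chunk is dropped.
                    return None;
                }
                let len = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]) as usize;
                self.accumulate(Vec::with_capacity(len), len, &chunk[4..])
            }
            State::ReassemblingRecord { buf, remaining } => self.accumulate(buf, remaining, chunk),
        }
    }

    fn accumulate(&mut self, mut buf: Vec<u8>, remaining: usize, chunk: &[u8]) -> Option<Vec<u8>> {
        let take = remaining.min(chunk.len());
        buf.extend_from_slice(&chunk[..take]);
        if take == remaining {
            // Record complete; `feed` already reset the state to WaitingForRecord.
            Some(buf)
        } else {
            self.state = State::ReassemblingRecord { buf, remaining: remaining - take };
            None
        }
    }
}
```

Because each variant carries only the data that is valid in that state, a reassembly buffer cannot exist without its remaining length, an invariant that separate `Option` fields leave unchecked. `std::mem::replace` moves the old state out so its buffer is reused rather than cloned.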

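Commit `dc52436a8f` fixes importing relations larger than 1 GB, and its `multisegment` test variant creates multiple segment files of one relation. For context: with PostgreSQL's default build settings (8 KiB pages, 1 GiB segments), a relation's main fork is split into files `<relfilenode>`, `<relfilenode>.1`, `<relfilenode>.2`, and so on, each holding 131072 blocks. The sketch below shows only that block-to-segment mapping; it is not neon's import code, and the helper names are hypothetical.

```rust
/// PostgreSQL's default page size (BLCKSZ), in bytes.
pub const BLCKSZ: u64 = 8192;
/// PostgreSQL's default number of blocks per segment file (RELSEG_SIZE):
/// 131072 blocks * 8 KiB = 1 GiB per segment.
pub const RELSEG_SIZE: u32 = 131072;

/// Map a block number to (segment number, block index inside that segment).
pub fn block_location(blkno: u32) -> (u32, u32) {
    (blkno / RELSEG_SIZE, blkno % RELSEG_SIZE)
}

/// Byte offset of a block inside its segment file.
pub fn byte_offset_in_segment(blkno: u32) -> u64 {
    u64::from(blkno % RELSEG_SIZE) * BLCKSZ
}

/// On-disk name of a main-fork segment: "<relfilenode>" for segment 0,
/// "<relfilenode>.<segno>" for the following ones.
pub fn segment_file_name(relfilenode: u32, segno: u32) -> String {
    if segno == 0 {
        relfilenode.to_string()
    } else {
        format!("{}.{}", relfilenode, segno)
    }
}
```

For example, block 131072 is the first block of segment 1, so any code that assumes a single file per relation stops working once a relation grows past 1 GiB.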
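
Commits `995a2de21e` and `7a36d06cc2` share and adjust exponential backoff code, but the log does not show the formula or the values used. The sketch below assumes the common scheme of multiplying an initial delay by a constant factor per attempt and capping the result; `backoff_delay` and its parameters are hypothetical, not neon's API.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry `attempt` (0-based), computed as
/// `initial * factor^attempt` and capped at `max`.
pub fn backoff_delay(attempt: u32, initial: Duration, factor: f64, max: Duration) -> Duration {
    assert!(factor >= 1.0, "backoff factor must be at least 1.0");
    let secs = initial.as_secs_f64() * factor.powf(f64::from(attempt));
    // Compare in f64 first so very large attempts (even infinity) clamp to
    // `max` instead of overflowing Duration.
    if secs.is_finite() && secs < max.as_secs_f64() {
        Duration::from_secs_f64(secs)
    } else {
        max
    }
}
```

With `initial = 1s`, `factor = 2.0` and `max = 10s`, the delays run 1 s, 2 s, 4 s, 8 s and then stay at 10 s. The cap keeps a long-failing task from waiting unboundedly; production code often also adds random jitter so that many failing tasks do not retry in lockstep.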

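Commit `e593cbaaba` adds a pageserver `checkpoint_timeout` option so that the in-memory layer is flushed eventually even when no new data arrives (default 10 minutes), which lets safekeepers stop pushing to the broker. A minimal sketch of such a time-based trigger follows. It assumes the timer counts from the last write into the layer; the `OpenLayer` type and its methods are hypothetical, not the pageserver's actual code.

```rust
use std::time::{Duration, Instant};

/// Default from the commit message: 10 minutes.
pub const DEFAULT_CHECKPOINT_TIMEOUT: Duration = Duration::from_secs(10 * 60);

/// Hypothetical stand-in for an open in-memory layer.
pub struct OpenLayer {
    /// Time of the last write into the layer; `None` when nothing is buffered.
    last_write: Option<Instant>,
}

impl OpenLayer {
    pub fn new() -> Self {
        OpenLayer { last_write: None }
    }

    /// Called whenever new WAL is ingested into the layer.
    pub fn record_write(&mut self, now: Instant) {
        self.last_write = Some(now);
    }

    /// True when buffered data has seen no new writes for `checkpoint_timeout`.
    pub fn should_flush(&self, now: Instant, checkpoint_timeout: Duration) -> bool {
        match self.last_write {
            Some(t) => now.saturating_duration_since(t) >= checkpoint_timeout,
            None => false, // nothing buffered, nothing to flush
        }
    }

    /// Called after the layer has been written to disk.
    pub fn mark_flushed(&mut self) {
        self.last_write = None;
    }
}
```

A periodic background task would call `should_flush` and write the layer out when it returns true. Without such a timer, data ingested just before traffic stops could stay in memory indefinitely, because size-based flushing never triggers.
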
1935 Commits
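
Commit `52ce1c9d53` speeds up test shutdown by polling for pageserver and safekeeper exit every 0.1 s instead of every 1 s after sending SIGTERM. Assuming the exit time falls uniformly within a polling interval, a poller notices it about half an interval late on average, so the change cuts that overhead from roughly 0.5 s to roughly 0.05 s per wait. The test harness itself is Python; the Rust sketch below only illustrates the same poll-with-deadline pattern, and `poll_until` is a hypothetical name.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Poll `is_done` every `interval` until it returns true or `timeout` elapses.
/// Returns true if the condition was observed before the deadline.
pub fn poll_until(timeout: Duration, interval: Duration, mut is_done: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if is_done() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Never sleep past the deadline.
        thread::sleep(interval.min(deadline - now));
    }
}
```

A shorter interval means more checks, but a liveness check such as "is this PID still running" is cheap, so the extra polls cost far less than the idle waiting they remove.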