rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 14:02:55 +00:00

Author	SHA1	Message	Date
Arseny Sher	596d622a82	Fix test_prepare_snapshot. It should checkpoint pageserver after waiting for all data arrival, not before.	2022-10-28 22:12:31 +04:00
Alexander Bayandin	0cbae6e8f3	test_backward_compatibility: friendlier error message (#2707 )	2022-10-27 15:54:49 +00:00
Alexander Stanovoy	78e412b84b	The fix of #2650 . (#2686 ) * Wrappers and drop implementations for image and delta layer writers. * Two regression tests for the image and delta layer files.	2022-10-27 14:02:55 +00:00
Alexander Bayandin	834ffe1bac	Add data format backward compatibility tests (#2626 )	2022-10-25 16:41:50 +02:00
Andrés	71ef7b6663	Remove cached_property package (#2673 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-21 20:02:31 +03:00
Kirill Bulatov	5928cb33c5	Introduce timeline state (#2651 ) Similar to https://github.com/neondatabase/neon/pull/2395, introduces a state field in Timeline, that's possible to subscribe to. Adjusts * walreceiver to not to have any connections if timeline is not Active * remote storage sync to not to schedule uploads if timeline is Broken * not to create timelines if a tenant/timeline is broken * automatically switches timelines' states based on tenant state Does not adjust timeline's gc, checkpointing and layer flush behaviour much, since it's not safe to cancel these processes abruptly and there's task_mgr::shutdown_tasks that does similar thing.	2022-10-21 15:51:48 +00:00
Heikki Linnakangas	fc4ea3553e	test_gc_cutoff.py fixes (#2655 ) * Fix bogus early exit from GC. Commit `91411c415a` added this failpoint, but the early exit was not intentional. * Cleanup test_gc_cutoff.py test. - Remove the 'scale' parameter, this isn't a benchmark - Tweak pgbench and pageserver options to create garbage faster that the the GC can collect away. The test used to take just under 5 minutes, which was uncomfortably close to the default 5 minute test timeout, and annoyingly even without the hard limit. These changes bring it down to about 1-2 minutes. - Improve comments, fix typos - Rename the failpoint. The old name, 'gc-before-save-metadata' implied that the failpoint was before the metadata update, but it was in fact much later in the function. - Move the call to persist the metadata outside the lock, to avoid holding it for too long. To verify that this test still covers the original bug, https://github.com/neondatabase/neon/issues/2539, I commenting out updating the metadata file like this: ``` diff --git a/pageserver/src/tenant/timeline.rs b/pageserver/src/tenant/timeline.rs index 1e857a9a..f8a9f34a 100644 --- a/pageserver/src/tenant/timeline.rs +++ b/pageserver/src/tenant/timeline.rs @@ -1962,7 +1962,7 @@ impl Timeline { } // Persist the new GC cutoff value in the metadata file, before // we actually remove anything. - self.update_metadata_file(self.disk_consistent_lsn.load(), HashMap::new())?; + //self.update_metadata_file(self.disk_consistent_lsn.load(), HashMap::new())?; info!("GC starting"); ``` It doesn't fail every time with that, but it did fail after about 5 runs.	2022-10-21 02:39:55 +03:00
Arthur Petukhovsky	f5ab9f761b	Remove flaky checks in test_delete_force (#2567 )	2022-10-20 17:14:32 +04:00
Kirill Bulatov	306a47c4fa	Use uninit mark files during timeline init for atomic creation (#2489 ) Part of https://github.com/neondatabase/neon/pull/2239 Regular, from scratch, timeline creation involves initdb to be run in a separate directory, data from this directory to be imported into pageserver and, finally, timeline-related background tasks to start. This PR ensures we don't leave behind any directories that are not marked as temporary and that pageserver removes such directories on restart, allowing timeline creation to be retried with the same IDs, if needed. It would be good to later rewrite the logic to use a temporary directory, similar what tenant creation does. Yet currently it's harder than this change, so not done.	2022-10-20 14:19:17 +03:00
Kirill Bulatov	84c5f681b0	Fix test feature detection (#2659 ) Follow-up of #2636 and #2654 , fixing the test detection feature. Pageserver currently outputs features as ``` /target/debug/pageserver --version Neon page server git:7734929a8202c8cc41596a861ffbe0b51b5f3cb9 failpoints: true, features: ["testing", "profiling"] ```	2022-10-20 13:44:03 +03:00
Andrés	9211923bef	Pageserver Python tests should not fail if the server is built with no testing feature (#2636 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-20 10:46:57 +03:00
Heikki Linnakangas	bc5ec43056	Fix flaky physical-size tests in test_timeline_size.py. These two tests, test_timeline_physical_size_post_compaction and test_timeline_physical_size_post_gc, assumed that after you have waited for the WAL from a bulk insertion to arrive, and you run a cycle of checkpoint and compaction, no new layer files are created. Because if a new layer file is created while we are calculating the incremental and non-incremental physical sizes, they might differ. However, the tests used a very small checkpoint_distance, so even a small amount of WAL generated in PostgreSQL could cause a new layer file to be created. Autovacuum can kick in at any time, and do that. That caused occasional failues in the test. I was able to reproduce it reliably by adding a long delay between the incremental and non-incremental size calculations: ``` --- a/pageserver/src/http/routes.rs +++ b/pageserver/src/http/routes.rs @@ -129,6 +129,9 @@ async fn build_timeline_info( } }; let current_physical_size = Some(timeline.get_physical_size()); + if include_non_incremental_physical_size { + std:🧵:sleep(std::time::Duration::from_millis(60000)); + } let info = TimelineInfo { tenant_id: timeline.tenant_id, ``` To fix, disable autovacuum for the table. Autovacuum could still kick in for other tables, e.g. catalog tables, but that seems less likely to generate enough WAL to causea new layer file to be flushed. If this continues to be a problem in the future, we could simply retry the physical size call a few times, if there's a mismatch. A mismatch could happen every once in a while, but it's very unlikely to happen more than once or twice in a row. Fixes https://github.com/neondatabase/neon/issues/2212	2022-10-19 23:50:21 +03:00
Konstantin Knizhnik	91411c415a	Persists latest_gc_cutoff_lsn before performing GC (#2558 ) * Persists latest_gc_cutoff_lsn before performing GC * Peform some refactoring and code deduplication refer #2539 * Add test for persisting GC cutoff * Fix python test style warnings * Bump postgres version * Reduce number of iterations in test_gc_cutoff test * Bump postgres version * Undo bumping postgres version	2022-10-19 12:32:03 +03:00
bojanserafimov	8fbe437768	Improve pageserver IO metrics (#2629 )	2022-10-18 11:53:28 -04:00
Dmitry Rodionov	129f7c82b7	remove redundant expect_tenant_to_download_timeline	2022-10-18 11:21:48 +03:00
Joonas Koivunen	c709354579	Add layer sizes to index_part.json (#2582 ) This is the first step in verifying layer files. Next up on the road is hashing the files and verifying the hashes. The metadata additions do not require any migration. The idea is that the change is backward and forward-compatible with regard to `index_part.json` due to the softness of JSON schema and the deserialization options in use. New types added: - LayerFileMetadata for tracking the file metadata - starting with only the file size - in future hopefully a sha256 as well - IndexLayerMetadata, the serialized counterpart of LayerFileMetadata LayerFileMetadata needing to have all fields Option is a problem but that is not possible to handle without conflicting a lot more with other ongoing work. Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2022-10-17 12:21:04 +03:00
Heikki Linnakangas	538876650a	Merge 'local' and 'remote' parts of TimelineInfo into one struct. The 'local' part was always filled in, so that was easy to merge into into the TimelineInfo itself. 'remote' only contained two fields, 'remote_consistent_lsn' and 'awaits_download'. I made 'remote_consistent_lsn' an optional field, and 'awaits_download' is now false if the timeline is not present remotely. However, I kept stub versions of the 'local' and 'remote' structs for backwards-compatibility, with a few fields that are actively used by the control plane. They just duplicate the fields from TimelineInfo now. They can be removed later, once the control plane has been updated to use the new fields.	2022-10-14 18:37:14 +03:00
Heikki Linnakangas	500239176c	Make TimelineInfo.local field mandatory. It was only None when you queried the status of a timeline with 'timeline_detail' mgmt API call, and it was still being downloaded. You can check for that status with the 'tenant_status' API call instead, checking for has_in_progress_downloads field. Anothere case was if an error happened while trying to get the current logical size, in a 'timeline_detail' request. It might make sense to tolerate such errors, and leave the fields we cannot fill in as empty, None, 0 or similar, but it doesn't make sense to me to leave the whole 'local' struct empty in tht case.	2022-10-14 18:37:14 +03:00
Arseny Sher	9fe4548e13	Reimplement explicit timeline creation on safekeepers. With the ability to pass commit_lsn. This allows to perform project WAL recovery through different (from the original) set of safekeepers (or under different ttid) by 1) moving WAL files to s3 under proper ttid; 2) explicitly creating timeline on safekeepers, setting commit_lsn to the latest point; 3) putting the lastest .parital file to the timeline directory on safekeepers, if desired. Extend test_s3_wal_replay to exersise this behaviour. Also extends timeline_status endpoint to return postgres information.	2022-10-13 21:43:10 +04:00
Andrés	09dda35dac	Return broken tenants due to non existing timelines dir (#2552 ) (#2575 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-12 22:28:39 +03:00
Arseny Sher	6d0dacc4ce	Recreate timeline on pageserver in s3_wal_replay test. That's closer to real usage than switching to brand new pageserver.	2022-10-12 11:46:21 +04:00
Heikki Linnakangas	e5e40a31f4	Clean up terms "delete timeline" and "detach tenant". You cannot attach/detach an individual timeline, attach/detach always applies to the whole tenant. However, you can delete a single timeline from a tenant. Fix some comments and error messages that confused these two operations.	2022-10-11 17:47:41 +03:00
Lassi Pölönen	e520293090	Add build info metric to pageserver, safekeeper and proxy (#2596 ) * Test that we emit build info metric for pageserver, safekeeper and proxy with some non-zero length revision label * Emit libmetrics_build_info on startup of pageserver, safekeeper and proxy with label "revision" which tells the git revision.	2022-10-11 09:54:32 +03:00
Heikki Linnakangas	a22165d41e	Add tests for comparing root and child branch performance. Author: Thang Pham <thang@neon.tech>	2022-10-08 10:07:33 +03:00
Heikki Linnakangas	9e1eb69d55	Increase default compaction_period setting to 20 s. The previous default of 1 s caused excessive CPU usage when there were a lot of projects. Polling every timeline once a second was too aggressive so let's reduce it. Fixes https://github.com/neondatabase/neon/issues/2542, but we probably also want do to something so that we don't poll timelines that have received no new WAL or layers since last check.	2022-10-07 13:55:19 +03:00
Arthur Petukhovsky	687ba81366	Display sync safekeepers output in compute_ctl (#2571 ) Pipe postgres output to compute_ctl stdout and create a test to check that compute_ctl works and prints postgres logs.	2022-10-06 13:53:52 +00:00
Andrés	47bae68a2e	Make get_lsn_by_timestamp available in mgmt API (#2536 ) (#2560 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-06 12:42:50 +03:00
Konstantin Knizhnik	ff8c481777	Normalize last_record LSN in wal receiver (#2529 ) * Add test for branching on page boundary * Normalize start recovery point Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Thang Pham <thang@neon.tech>	2022-10-06 09:01:56 +03:00
Kirill Bulatov	d823e84ed5	Allow attaching tenants with zero timelines	2022-10-04 18:13:51 +03:00
Kirill Bulatov	231dfbaed6	Do not remove empty timelines/ directory for tenants	2022-10-04 18:13:51 +03:00
Dmitry Rodionov	5cf53786f9	Improve pytest ergonomics 1. Disable perf tests by default 2. Add instruction to run tests in parallel	2022-10-04 14:53:01 +03:00
Kirill Bulatov	7b2f9dc908	Reuse existing tenants during attach (#2540 )	2022-10-03 13:33:55 +03:00
Alexander Bayandin	ebab89ebd2	test_runner: pass password to pgbench via PGPASSWORD (#2468 )	2022-09-23 12:51:33 +00:00
Anastasia Lubennikova	d098542dde	Make test_timeline_size_metrics more stable: Compare size with Vanilla postgres size instead of hardcoded value	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	862902f9e5	Update readme and openapi spec	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	0fde59aa46	use pg_version in python tests	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	86bf491981	Support pg 15 - Split postgres_ffi into two version specific files. - Preserve pg_version in timeline metadata. - Use pg_version in safekeeper code. Check for postgres major version mismatch. - Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding. - Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture. To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable: 'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress' Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.	2022-09-22 14:15:13 +03:00
Konstantin Knizhnik	f3073a4db9	R-Tree layer map (#2317 ) Replace the layer array and linear search with R-tree So far, the in-memory layer map that holds information about layer files that exist, has used a simple Vec, in no particular order, to hold information about all the layers. That obviously doesn't scale very well; with thousands of layer files the linear search was consuming a lot of CPU. Replace it with a two-dimensional R-tree, with Key and LSN ranges as the dimensions. For the R-tree, use the 'rstar' crate. To be able to use that, we convert the Keys and LSNs into 256-bit integers. 64 bits would be enough to represent LSNs, and 128 bits would be enough to represent Keys. However, we use 256 bits, because rstar internally performs multiplication to calculate the area of rectangles, and the result of multiplying two 128 bit integers doesn't necessarily fit in 128 bits, causing integer overflow and, if overflow-checks are enabled, panic. To avoid that, we use 256 bit integers. Add a performance test that creates a lot of layer files, to demonstrate the benefit.	2022-09-22 08:35:06 +03:00
Alexander Bayandin	19fa410ff8	NeonCompare: switch to new pageserver HTTP API	2022-09-21 15:17:55 +01:00
sharnoff	4a3b3ff11d	Move testing pageserver libpq cmds to HTTP api (#2429 ) Closes #2422. The APIs have been feature gated with the `testing_api!` macro so that they return 400s when support hasn't been compiled in.	2022-09-20 11:28:12 -07:00
Heikki Linnakangas	a5019bf771	Use a simpler way to set extra options for benchmark test. Commit `43a4f7173e` fixed the case that there are extra options in the connection string, but broke it in the case when there are not. Fix that. But on second thoughts, it's more straightforward set the options with ALTER DATABASE, so change the workflow yaml file to do that instead.	2022-09-20 13:48:50 +03:00
Heikki Linnakangas	e4f775436f	Don't override other options than statement_timeout in test conn string. In commit `6985f6cd6c`, I tried passing extra GUCs in the 'options' part of the connection string, but it didn't work because the pgbench test overrode it with the statement_timeout. Change it so that it adds the statement_timeout to any other options, instead of replacing them.	2022-09-20 09:46:15 +03:00
Kirill Bulatov	b46c8b4ae0	Add an alias to build test images simply	2022-09-16 18:58:41 +03:00
Kirill Bulatov	031e57a973	Disable failpoints by default	2022-09-16 09:26:29 +03:00
bojanserafimov	96e867642f	Validate tenant create options (#2450 ) Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2022-09-15 18:20:23 -04:00
Egor Suvorov	e968b5e502	tests: do not set num_safekeepers = 1, it's the default (#2457 ) Also get rid if `with_safekeepers` parameter in tests. Its meaning has changed: `False` meant "no safekeepers" which is not supported anymore, so we assume it's always `True`. See #1648	2022-09-15 21:43:51 +03:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Alexander Bayandin	59d04ab66a	test_runner: redact passwords from log messages (#2434 )	2022-09-13 17:24:11 +00:00
Kirill Bulatov	1a8c8b04d7	Merge Repository and Tenant entities, rework tenant background jobs	2022-09-13 15:39:39 +03:00
Kirill Bulatov	2a837d7de7	Create tenants in temporary directory first (#2426 )	2022-09-12 21:04:33 +00:00

1 2 3 4 5 ...

460 Commits