
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-06 04:52:55 +00:00

Author	SHA1	Message	Date
Alexander Bayandin	ebab89ebd2	test_runner: pass password to pgbench via PGPASSWORD (#2468)	2022-09-23 12:51:33 +00:00
Konstantin Knizhnik	f3073a4db9	R-Tree layer map (#2317). Replace the layer array and linear search with R-tree. So far, the in-memory layer map that holds information about layer files that exist has used a simple Vec, in no particular order, to hold information about all the layers. That obviously doesn't scale very well; with thousands of layer files the linear search was consuming a lot of CPU. Replace it with a two-dimensional R-tree, with Key and LSN ranges as the dimensions. For the R-tree, use the 'rstar' crate. To be able to use that, we convert the Keys and LSNs into 256-bit integers. 64 bits would be enough to represent LSNs, and 128 bits would be enough to represent Keys. However, we use 256 bits, because rstar internally performs multiplication to calculate the area of rectangles, and the result of multiplying two 128-bit integers doesn't necessarily fit in 128 bits, causing integer overflow and, if overflow-checks are enabled, a panic. To avoid that, we use 256-bit integers. Add a performance test that creates a lot of layer files, to demonstrate the benefit.	2022-09-22 08:35:06 +03:00
Heikki Linnakangas	a5019bf771	Use a simpler way to set extra options for benchmark test. Commit `43a4f7173e` fixed the case that there are extra options in the connection string, but broke it in the case when there are not. Fix that. But on second thoughts, it's more straightforward to set the options with ALTER DATABASE, so change the workflow yaml file to do that instead.	2022-09-20 13:48:50 +03:00
Heikki Linnakangas	e4f775436f	Don't override other options than statement_timeout in test conn string. In commit `6985f6cd6c`, I tried passing extra GUCs in the 'options' part of the connection string, but it didn't work because the pgbench test overrode it with the statement_timeout. Change it so that it adds the statement_timeout to any other options, instead of replacing them.	2022-09-20 09:46:15 +03:00
Egor Suvorov	e968b5e502	tests: do not set num_safekeepers = 1, it's the default (#2457). Also get rid of the `with_safekeepers` parameter in tests. Its meaning has changed: `False` meant "no safekeepers", which is not supported anymore, so we assume it's always `True`. See #1648	2022-09-15 21:43:51 +03:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Konstantin Knizhnik	eef7475408	Add tests for measuring effect of lsn caching (#2384) * Add tests for measuring effect of lsn caching * Fix formatting of test_latency.py * Fix test_lsn_mapping test	2022-09-03 17:06:19 +03:00
Heikki Linnakangas	47bd307cb8	Add python types to represent LSNs, tenant IDs and timeline IDs. (#2351) For better ergonomics. I always found it weird that we used UUID to actually mean a tenant or timeline ID. It worked because it happened to have the same length, 16 bytes, but it was hacky.	2022-09-02 10:16:47 +03:00
Alexander Bayandin	39a3bcac36	test_runner: fix flake8 warnings	2022-08-22 14:57:09 +01:00
Alexander Bayandin	4c2bb43775	Reformat all python files by black & isort	2022-08-22 14:57:09 +01:00
Alexander Bayandin	4cddb0f1a4	Set up a workflow to run pgbench against captest (#2077)	2022-08-15 18:54:31 +01:00
Dmitry Rodionov	cdfa9fe705	avoid duplicate parameter, increase timeout	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	9430abae05	use event so it fires only if workload thread successfully finished	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	4da4c7f769	increase statement timeout	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	092a9b74d3	use only s3 in boto3-stubs and update mypy. A newer version of mypy fixes a buggy error when trying to update only the boto3 stubs. However, it brings new checks and starts to yell when we index into cursor.fetchone without checking for None first. So this introduces a wrapper to simplify querying for scalar values. I tried to use the cursor_factory connection argument, but without success. There may be a better way to do that, but this looks the simplest.	2022-08-01 18:28:49 +03:00
Alexander Bayandin	9dcb9ca3da	test/performance: ensure we don't have tables that we're creating (#2135)	2022-07-22 11:00:05 +01:00
Thang Pham	ed102f44d9	Reduce memory allocations for page server (#2010) ## Overview This patch reduces the number of memory allocations when running the page server under a heavy write workload. This mostly helps improve the speed of WAL record ingestion. ## Changes - modified `DatadirModification` to allow reusing the struct's allocated memory after each modification - modified `decode_wal_record` to allow passing a `DecodedWALRecord` reference. This helps reuse the struct in each `decode_wal_record` call - added a reusable buffer for serializing objects inside the `InMemoryLayer::put_value` function - added a performance test simulating a heavy write workload for testing the changes in this patch ### Semi-related changes - removed redundant serializations when calling `DeltaLayer::put_value` during the `InMemoryLayer::write_to_disk` function call [1] - removed the info span `info_span!("processing record", lsn = %lsn)` during each WAL ingestion [2] ## Notes - [1]: in `InMemoryLayer::write_to_disk`, a deserialization is called ``` let val = Value::des(&buf)?; delta_layer_writer.put_value(key, *lsn, val)?; ``` `DeltaLayer::put_value` then creates a serialization based on the previous deserialization ``` let off = self.blob_writer.write_blob(&Value::ser(&val)?)?; ``` - [2]: related: https://github.com/neondatabase/neon/issues/733	2022-07-21 12:08:26 -04:00
Konstantin Knizhnik	572ae74388	More precisely control size of inmem layer (#1927) * More precisely control size of inmem layer * Force recompaction of L0 layers if they contain large non-wallogged BLOBs to avoid too large layers * Add modified version of test_hot_update test (test_dup_key.py) which should generate large layers without large number of tables * Change test name in test_dup_key * Add Layer::get_max_key_range function * Add layer::key_iter method and implement new approach of splitting layers during compaction based on total size of all key values * Add test_large_schema test for checking layer file size after compaction * Make clippy happy * Restore checking LSN distance threshold for checkpoint in-memory layer * Optimize storage keys iterator * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Fix code style * Reduce number of tables in test_large_schema to make it fit in timeout with debug build * Fix style of test_large_schema.py * Fix handling of duplicate layers Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-07-21 07:45:11 +03:00
Thang Pham	160e52ec7e	Optimize branch creation (#2101) Resolves #2054 Context: branch creation needs to wait for GC to acquire the `gc_cs` lock, which prevents creating new timelines during GC. However, because an individual timeline GC iteration also requires the `compaction_cs` lock, branch creation may also need to wait for compactions of multiple timelines. This results in large latency when creating a new branch, which we advertised as "instant". This PR optimizes the latency of branch creation by separating GC into two phases: 1. Collect GC data (branching points, cutoff LSNs, etc) 2. Perform GC for each timeline The GC bottleneck comes from step 2, which must wait for compaction of multiple timelines. This PR modifies the branch creation and GC functions to allow GC to hold the GC lock only in step 1. As a result, branch creation doesn't need to wait for compaction to finish but only needs to wait for the GC data collection step, which is fast.	2022-07-19 14:56:25 -04:00
Alexander Bayandin	00c26ff3a3	Bring periodic perf tests on GitHub back (#2037) * test/fixtures: fix DeprecationWarning * workflows/benchmarking: increase timeout * test: switch pgbench to default(simple) query mode * test/performance: ensure we don't have tables that we're creating * workflows/pg_clients: remove unused env var * workflows/benchmarking: change platform name	2022-07-07 19:53:23 +01:00
bojanserafimov	84b9fcbbd5	Increase a few test timeouts (#1977)	2022-06-23 11:51:56 -04:00
Thang Pham	37465dafe3	Add wal backpressure tests (#1919) Resolves #1889. This PR adds new tests to measure the WAL backpressure's performance under different workloads. ## Changes - add new performance tests in `test_wal_backpressure.py` - allow safekeeper's fsync to be configurable when running tests	2022-06-20 11:40:55 -04:00
Thang Pham	6cfebc096f	Add read/write throughput performance tests (#1883) Part of #1467 This PR adds several performance tests that compare the [PG statistics](https://www.postgresql.org/docs/current/monitoring-stats.html) obtained when running PG benchmarks against Neon and vanilla PG to measure the read/write throughput of the DB.	2022-06-06 12:32:10 -04:00
bojanserafimov	90e2c9ee1f	Rename zenith to neon in python tests (#1871)	2022-06-02 16:21:28 -04:00
Kirill Bulatov	e5cb727572	Replace callmemaybe with etcd subscriptions on safekeeper timeline info	2022-06-01 16:07:04 +03:00
Heikki Linnakangas	ffbb9dd155	Add a 5 minute timeout to python tests. The CI times out after 10 minutes of no output. It's annoying if a test hangs and is killed by the CI timeout, because you don't get information about which test was running. Try to avoid that, by adding a slightly smaller timeout in pytest itself. You can override it on a per-test basis if needed, but let's try to keep our tests shorter than that. For the Postgres regression tests, use a longer 30 minute timeout. They're not really a single test, but many tests wrapped in a single pytest test. It's OK for them to run longer in aggregate, each Postgres test is still fairly short.	2022-05-19 14:04:14 +03:00
Anastasia Lubennikova	a2561f0a78	Use tenant's pitr_interval instead of hardcoded 0 in the command. Adjust python tests that use the	2022-05-13 18:32:14 +03:00
Thang Pham	ae20751724	update `ZenithCli::create_tenant` return signature (#1692) to include the initial timeline's ID in addition to the new tenant's ID. Context: follow-up of https://github.com/neondatabase/neon/pull/1689	2022-05-12 17:27:08 -04:00
Heikki Linnakangas	30a7598172	Some copy-editing.	2022-05-05 22:35:15 +03:00
Heikki Linnakangas	1ad5658d9c	Fix typos	2022-05-05 22:35:15 +03:00
Dmitry Rodionov	954859f6c5	add readme for performance tests with the current state of things	2022-05-05 22:35:15 +03:00
bojanserafimov	02e5083695	Add hot page test (#1479)	2022-05-04 12:45:01 -04:00
bojanserafimov	867aede715	Add idle compute restart time test (#1514)	2022-04-22 10:45:47 -04:00
Heikki Linnakangas	a4700c9bbe	Use pprof to get flamegraph of get_page and get_relsize requests. This depends on a hacked version of the 'pprof-rs' crate. Because of that, it's under an optional 'profiling' feature. It is disabled by default, but enabled for release builds in CircleCI config. It doesn't currently work on macOS. The flamegraph is written to 'flamegraph.svg' in the pageserver workdir when the 'pageserver' process exits. Add a performance test that runs the perf_pgbench test, with profiling enabled.	2022-04-21 20:32:48 +03:00
Kirill Bulatov	52e0816fa5	wal_acceptor -> safekeeper	2022-04-18 12:52:31 +03:00
Heikki Linnakangas	4a8c663452	Refactor pgbench tests. - Remove batch_others/test_pgbench.py. It was a quick check that pgbench works, without actually recording any performance numbers, but that doesn't seem very interesting anymore. Remove it to avoid confusing it with the actual pgbench benchmarks. - Run pgbench with "-N" and "-S" options, for two different workloads: simple-update, and SELECT-only. Previously, we would only run it with the "default" TPCB-like workload. That's more or less the same as the simple-update (-N) workload, but I think the simple-update workload is more relevant for testing storage performance. The SELECT-only workload is a new thing to measure. - Merge test_perf_pgbench.py and test_perf_pgbench_remote.py. I added a new "remote" implementation of the PgCompare class, which allows running the same tests against an already-running Postgres instance. - Make the PgBenchRunResult.parse_from_output function more flexible. pgbench can print different lines depending on the command-line options, but the parsing function expected a particular set of lines.	2022-04-14 13:31:42 +03:00
bojanserafimov	6fe443e239	Improve random_writes test (#1469) If you want to test with a 3GB database by tweaking some constants, you'll hit a query timeout. I fix that by batching the inserts.	2022-04-06 18:32:10 -04:00
Heikki Linnakangas	5e04dad360	Add more variants of the sequential scan performance tests. More rows, and test with serial and parallel plans. But fewer iterations, so that the tests run in < 1 minute, and we don't need to mark them as "slow".	2022-03-25 23:42:13 +02:00
Kirill Bulatov	dd74c66ef0	Do not create timeline along with tenant	2022-03-10 19:38:58 +02:00
Kirill Bulatov	7b5482bac0	Properly store the branch name mappings	2022-03-10 19:38:58 +02:00
Kirill Bulatov	4d0f7fd1e4	Update Zenith CLI config between runs	2022-03-10 19:38:58 +02:00
Kirill Bulatov	f49990ed43	Allow creating timelines by branching off ancestors	2022-03-10 19:38:58 +02:00
Dmitry Rodionov	1d90b1b205	add node id to pageserver (#1310) * Add --id argument to safekeeper setting its unique u64 id. In preparation for storage node messaging. IDs are supposed to be monotonically assigned by the console. In tests it is issued by ZenithEnv; at the zenith cli level and fixtures, string name is completely replaced by integer id. Example TOML configs are adjusted accordingly. Sequential ids are chosen over Zid mainly because they are compact and easy to type/remember. * add node id to pageserver This adds node id parameter to pageserver configuration. Also I use a simple builder to construct pageserver config struct to avoid setting node id to some temporary invalid value. Some of the changes in test fixtures are needed to split init and start operations for environment. Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2022-03-04 01:10:42 +03:00
bojanserafimov	fdc15de8b2	Add perf test: test_random_writes (#1292)	2022-02-18 15:46:29 -05:00
Bojan Serafimov	ad262a46ad	Remove redundant pytest_plugins assignment	2022-02-17 13:41:49 +02:00
Kirill Bulatov	e5bf520b18	Use types in zenith cli invocations in Python tests	2022-02-17 13:41:19 +02:00
bojanserafimov	335abfcc28	Add slow seqscan perf test (#1283)	2022-02-16 10:59:51 -05:00
bojanserafimov	afb3342e46	Add vanilla pg baseline tests (#1275)	2022-02-15 13:44:22 -05:00
bojanserafimov	ea13838be7	Add pgbench baseline test (#1204) Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2022-02-10 15:33:36 -05:00
Heikki Linnakangas	722667f189	Add test case for performance issue #941. The first COPY generates about 230 MB of write I/O, but the second COPY, after deleting most of the rows and vacuuming the rows away, generates 370 MB of writes. Both COPYs insert the same amount of data, so they should generate roughly the same amount of I/O. This commit doesn't try to fix the issue, just adds a test case to demonstrate it. Add a new 'checkpoint' command to the pageserver API. Previously, we've used 'do_gc' for that, but many tests, including this new one, really only want to perform a checkpoint and don't care about GC. For now, I only used the command in the new test, though, and didn't convert any existing tests to use it.	2022-01-04 11:26:37 +02:00
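
The integer-width argument in the R-Tree layer map commit (f3073a4db9) can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the repository: the helper names `max_unsigned` and `fits` are hypothetical, and Python's arbitrary-precision integers stand in for the fixed-width integers used on the Rust side. It only illustrates why a product of two 128-bit values may need up to 256 bits.

```python
# Illustrative only: checks the bit-width reasoning from the commit message.
# The helper names below are hypothetical and do not exist in the repository.

KEY_BITS = 128  # per the commit message, enough to represent a Key
LSN_BITS = 64   # per the commit message, enough to represent an LSN


def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << bits) - 1


def fits(value: int, bits: int) -> bool:
    """True if value is representable as an unsigned integer of that width."""
    return 0 <= value <= max_unsigned(bits)


# Worst case when both rectangle sides are stored in the same 128-bit type:
# the area is the product of two 128-bit values.
worst_case_area = max_unsigned(128) * max_unsigned(128)

# Even a Key-sized side times an LSN-sized side already exceeds 128 bits.
key_times_lsn = max_unsigned(KEY_BITS) * max_unsigned(LSN_BITS)

print(worst_case_area.bit_length())  # 256
print(key_times_lsn.bit_length())    # 192
print(fits(worst_case_area, 128))    # False
print(fits(worst_case_area, 256))    # True
```

In a fixed-width language the 128-bit multiplication would wrap or, with overflow checks enabled, panic, which is the failure the commit message describes avoiding by widening to 256 bits.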


221 Commits