rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 13:32:57 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	3aca717f3d	Reorganize python tests. Merge batch_others and batch_pg_regress. The original idea was to split all the python tests into multiple "batches" and run each batch in parallel as a separate CI job. However, the batch_pg_regress batch was pretty short compared to all the tests in batch_others. We could split batch_others into multiple batches, but it actually seems better to just treat them as one big pool of tests and let pytest handle the parallelism on its own. If we need to split them across multiple nodes in the future, we could use pytest-shard or something else, instead of managing the batches ourselves. Merge test_neon_regress.py, test_pg_regress.py and test_isolation.py into one file, test_pg_regress.py. Seems more clear to group all pg_regress-based tests into one file, now that they would all be in the same directory.	2022-08-30 18:25:38 +03:00
Dmitry Ivanov	96a50e99cf	Forward various connection params to compute nodes. (#2336 ) Previously, proxy didn't forward auxiliary `options` parameter and other ones to the client's compute node, e.g. ``` $ psql "user=john host=localhost dbname=postgres options='-cgeqo=off'" postgres=# show geqo; ┌──────┐ │ geqo │ ├──────┤ │ on │ └──────┘ (1 row) ``` With this patch we now forward `options`, `application_name` and `replication`. Further reading: https://www.postgresql.org/docs/current/libpq-connect.html Fixes #1287.	2022-08-30 17:36:21 +03:00
Konstantin Knizhnik	ee8b5f967d	Add fork_at_current_lsn function which creates branch at current LSN (#2344 ) * Add fork_at_current_lsn function which creates branch at current LSN * Undo use of fork_at_current_lsn in test_branching because of short GC period * Add missed return in fork_at_current_lsn * Add missed return in fork_at_current_lsn * Update test_runner/fixtures/neon_fixtures.py Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update test_runner/fixtures/neon_fixtures.py Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update test_runner/fixtures/neon_fixtures.py Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-08-29 17:59:04 +03:00
Heikki Linnakangas	ec20534173	Fix minor typos and leftover comments.	2022-08-27 17:54:56 +03:00
Dmitry Ivanov	6d30e21a32	Fix proxy tests (#2343 ) There might be different psql & locale configurations, therefore we should explicitly reset them to defaults.	2022-08-26 20:42:32 +03:00
KlimentSerafimov	b98fa5d6b0	Added a new test for making sure the proxy displays a session_id when using link auth. (#2039 ) Added pytest to check correctness of the link authentication pipeline. Context: this PR is the first step towards refactoring the link authentication pipeline to use https (instead of psql) to send the db info to the proxy. There was a test missing for this pipeline in this repo, so this PR adds that test as preparation for the actual change of psql -> https. Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Stas Kelvic <stas@neon.tech> Co-authored-by: Dimitrii Ivanov <dima@neon.tech>	2022-08-22 20:02:45 -04:00
Dmitry Rodionov	9dd19ec397	Remove interfering proc check. We do not need it anymore because ports_distributor checks whether the port can be used before giving it to the service	2022-08-22 20:59:32 +03:00
Alexander Bayandin	39a3bcac36	test_runner: fix flake8 warnings	2022-08-22 14:57:09 +01:00
Alexander Bayandin	4c2bb43775	Reformat all python files by black & isort	2022-08-22 14:57:09 +01:00
Alexander Bayandin	277f2d6d3d	Report test results to Allure (#2229 )	2022-08-22 11:21:50 +01:00
Heikki Linnakangas	d48177d0d8	Expose timeline logical size as a prometheus metric. Physical size was already exposed, and it'd be nice to show both logical and physical size side by side in our Grafana dashboards.	2022-08-19 22:21:33 +03:00
MMeent	f99ccb5041	Extract WalProposer into the neon extension (#2217 ) Including, but not limited to: * Fixes to neon management code to support walproposer-as-an-extension * Fix issue in expected output of pg settings serialization. * Show the logs of a failed --sync-safekeepers process in CI * Add compat layer for renamed GUCs in postgres.conf * Update vendor/postgres to the latest origin/main	2022-08-18 17:12:28 +02:00
Kirill Bulatov	67e091c906	Rework `init` in pageserver CLI (#2272 ) * Do not create initial tenant and timeline (adjust Python tests for that) * Rework config handling during init, add --update-config to manage local config updates	2022-08-17 23:24:47 +03:00
bojanserafimov	e9a3499e87	Fix flaky pageserver restarts in tests (#2261 )	2022-08-17 08:17:35 -04:00
Alexander Bayandin	4cddb0f1a4	Set up a workflow to run pgbench against captest (#2077 )	2022-08-15 18:54:31 +01:00
Dmitry Rodionov	63a72d99bb	increase timeout in wait_for_upload to avoid spurious failures when testing with real s3	2022-08-15 18:02:27 +03:00
Arthur Petukhovsky	116ecdf87a	Improve walreceiver logic (#2253) This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers. - There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237 - Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down. - Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second. - `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with frequent reconnects, when compute generates WAL really fast. - `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper is stuck.	2022-08-15 13:31:26 +03:00
Alexander Bayandin	da5f8486ce	test_runner/pg_clients: collect docker logs (#2259 )	2022-08-12 17:03:09 +01:00
Dmitry Ivanov	ad08c273d3	[proxy] Rework wire format of the password hack and some errors (#2236 ) The new format has a few benefits: it's shorter, simpler and human-readable as well. We don't use base64 anymore, since url encoding got us covered. We also show a better error in case we couldn't parse the payload; the users should know it's all about passing the correct project name.	2022-08-12 17:38:43 +03:00
Thang Pham	6d99b4f1d8	disable `test_import_from_pageserver_multisegment` (#2258 ) This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure. See: #2255, #2256	2022-08-12 19:13:42 +07:00
Thang Pham	7da47d8a0a	Fix timeline physical size flaky tests (#2244 ) Resolves #2212. - use `wait_for_last_flush_lsn` in `test_timeline_physical_size_` tests ## Context Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk. ### Note The `LayerFlushThread` threads are processed concurrently*, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly	2022-08-12 14:28:50 +07:00
Thang Pham	dc52436a8f	Fix bug when importing large (>1GB) relations (#2172) Resolves #2097 - use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get` - update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. + `small` is the old test + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build	2022-08-12 09:24:20 +07:00
Arseny Sher	e593cbaaba	Add pageserver checkpoint_timeout option. To flush inmemory layer eventually when no new data arrives, which helps safekeepers to suspend activity (stop pushing to the broker). Default 10m should be ok.	2022-08-11 22:54:09 +03:00
Konstantin Knizhnik	4227cfc96e	Safe truncate (#2218) * Move relation size cache to layered timeline * Fix obtaining current LSN for relation size cache * Resolve merge conflicts * Resolve merge conflicts * Restore 'lsn' field in DatadirModification * adjust DatadirModification lsn in ingest_record * Fix formatting * Pass lsn to get_relsize * Fix merge conflict * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Check if relation exists before trying to truncate it refer #1932 * Add test reproducing FSM truncate problem Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-08-09 22:45:33 +03:00
bojanserafimov	743370de98	Major migration script (#2073 ) This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview. Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2022-08-08 17:52:28 +02:00
Dmitry Rodionov	cdfa9fe705	avoid duplicate parameter, increase timeout	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	7cd68a0c27	increase timeout to pass test with real s3	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	9430abae05	use event so it fires only if workload thread successfully finished	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	4da4c7f769	increase statement timeout	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	f7d8db7e39	silence https://github.com/neondatabase/neon/issues/2211	2022-08-04 16:32:19 +03:00
Dmitry Rodionov	e54941b811	treat pytest warnings as errors	2022-08-04 16:32:19 +03:00
Dmitry Rodionov	bc2cb5382b	run real s3 tests in CI	2022-08-04 11:14:05 +03:00
Dmitry Rodionov	5f71aa09d3	support running tests against real s3 implementation without mocking	2022-08-04 11:14:05 +03:00
Dmitry Rodionov	092a9b74d3	use only s3 in boto3-stubs and update mypy. Newer version of mypy fixes buggy error when trying to update only boto3 stubs. However it brings new checks and starts to yell when we index into cursor.fetchone without checking for None first. So this introduces a wrapper to simplify querying for scalar values. I tried to use cursor_factory connection argument but without success. There can be a better way to do that, but this looks the simplest	2022-08-01 18:28:49 +03:00
Heikki Linnakangas	d0494c391a	Remove wal_receiver mgmt API endpoint. Move all the fields that were returned by the wal_receiver endpoint into timeline_detail. Internally, move those fields from the separate global WAL_RECEIVERS hash into the LayeredTimeline struct. That way, all the information about a timeline is kept in one place. In passing, I noted that the 'thread_id' field was removed from WalReceiverEntry in commit `e5cb727572`, but that commit forgot to update openapi_spec.yml. This commit removes that too.	2022-07-29 20:51:37 +03:00
Heikki Linnakangas	d903dd61bd	Rename 'wal_producer_connstr' to 'wal_source_connstr'. What the WAL receiver really connects to is the safekeeper. The "producer" term is a bit misleading, as the safekeeper doesn't produce the WAL, the compute node does. This change also applies to the name of the field used in the mgmt API in the response of the '/v1/tenant/:tenant_id/timeline/:timeline_id/wal_receiver' endpoint. AFAICS that's not used anywhere else than one python test, so it should be OK to change it.	2022-07-29 09:09:22 +03:00
Thang Pham	417d9e9db2	Add current physical size to tenant status endpoint (#2173 ) Ref #1902	2022-07-28 13:59:20 -04:00
Arthur Petukhovsky	09ddd34b2a	Fix checkpoints race condition in safekeeper tests (#2175 ) We should wait for WAL to arrive to pageserver before calling CHECKPOINT	2022-07-28 15:44:02 +03:00
Arthur Petukhovsky	aeb3f0ea07	Refactor test_race_conditions (#2162 ) Do not use python multiprocessing, make the test async	2022-07-28 14:38:37 +03:00
Alexey Kondratov	01f1f1c1bf	Add OpenAPI spec for safekeeper HTTP API (neondatabase/cloud#1264, #2061 ) This spec is used in the `cloud` repo to generate HTTP client.	2022-07-27 21:29:22 +03:00
Thang Pham	6a664629fa	Add timeline physical size tracking (#2126 ) Ref #1902. - Track the layered timeline's `physical_size` using `pageserver_current_physical_size` metric when updating the layer map. - Report the local timeline's `physical_size` in timeline GET APIs. - Add `include-non-incremental-physical-size` URL flag to also report the local timeline's `physical_size_non_incremental` (similar to `logical_size_non_incremental`) - Add a `UIntGaugeVec` and `UIntGauge` to represent `u64` prometheus metrics Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2022-07-27 12:36:46 -04:00
Kirill Bulatov	45680f9a2d	Drop CircleCI runs (#2082 )	2022-07-25 18:30:30 +03:00
Dmitry Ivanov	5f4ccae5c5	[proxy] Add the `password hack` authentication flow (#2095 ) [proxy] Add the `password hack` authentication flow This lets us authenticate users which can use neither SNI (due to old libpq) nor connection string `options` (due to restrictions in other client libraries). Note: `PasswordHack` will accept passwords which are not encoded in base64 via the "password" field. The assumption is that most user passwords will be valid utf-8 strings, and the rest may still be passed via "password_".	2022-07-25 17:23:10 +03:00
Thang Pham	39c59b8df5	Fix flaky test_branch_creation_before_gc test (#2142 )	2022-07-22 12:44:20 +01:00
Alexander Bayandin	9dcb9ca3da	test/performance: ensure we don't have tables that we're creating (#2135 )	2022-07-22 11:00:05 +01:00
Thang Pham	ed102f44d9	Reduce memory allocations for page server (#2010) ## Overview This patch reduces the number of memory allocations when running the page server under a heavy write workload. This mostly helps improve the speed of WAL record ingestion. ## Changes - modified `DatadirModification` to allow reusing the struct's allocated memory after each modification - modified `decode_wal_record` to allow passing a `DecodedWALRecord` reference. This helps reuse the struct in each `decode_wal_record` call - added a reusable buffer for serializing objects inside the `InMemoryLayer::put_value` function - added a performance test simulating a heavy write workload for testing the changes in this patch ### Semi-related changes - remove redundant serializations when calling `DeltaLayer::put_value` during `InMemoryLayer::write_to_disk` function call [1] - removed the info span `info_span!("processing record", lsn = %lsn)` during each WAL ingestion [2] ## Notes - [1]: in `InMemoryLayer::write_to_disk`, a deserialization is called ``` let val = Value::des(&buf)?; delta_layer_writer.put_value(key, *lsn, val)?; ``` `DeltaLayer::put_value` then creates a serialization based on the previous deserialization ``` let off = self.blob_writer.write_blob(&Value::ser(&val)?)?; ``` - [2]: related: https://github.com/neondatabase/neon/issues/733	2022-07-21 12:08:26 -04:00
Konstantin Knizhnik	572ae74388	More precisely control size of inmem layer (#1927) * More precisely control size of inmem layer * Force recompaction of L0 layers if they contain large non-wallogged BLOBs to avoid too large layers * Add modified version of test_hot_update test (test_dup_key.py) which should generate large layers without large number of tables * Change test name in test_dup_key * Add Layer::get_max_key_range function * Add layer::key_iter method and implement new approach of splitting layers during compaction based on total size of all key values * Add test_large_schema test for checking layer file size after compaction * Make clippy happy * Restore checking LSN distance threshold for checkpoint in-memory layer * Optimize storage keys iterator * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Fix code style * Reduce number of tables in test_large_schema to make it fit in timeout with debug build * Fix style of test_large_schema.py * Fix handling of duplicate layers Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-07-21 07:45:11 +03:00
Arthur Petukhovsky	b445cf7665	Refactor test_unavailability (#2134 ) Now test_unavailability uses async instead of Process. The test is refactored to fix a possible race condition.	2022-07-20 22:13:05 +03:00
Heikki Linnakangas	f4233fde39	Silence "Module already imported" warning in python tests We were getting a warning like this from the pg_regress tests: =================== warnings summary =================== /usr/lib/python3/dist-packages/_pytest/config/__init__.py:663 /usr/lib/python3/dist-packages/_pytest/config/__init__.py:663: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: fixtures.pg_stats self.import_plugin(import_spec) -- Docs: https://docs.pytest.org/en/stable/warnings.html ------------------ Benchmark results ------------------- To fix, reorder the imports in conftest.py. I'm not sure what exactly the problem was or why the order matters, but the warning is gone and that's good enough for me.	2022-07-20 16:55:41 +03:00
Heikki Linnakangas	abff15dd7c	Fix test to be more robust with slow pageserver. If the WAL arrives at the pageserver slowly, it's possible that the branch is created before all the data on the parent branch have arrived. That results in a failure: test_runner/batch_others/test_tenant_relocation.py:259: in test_tenant_relocation timeline_id_second, current_lsn_second = populate_branch(pg_second, create_table=False, expected_sum=1001000) test_runner/batch_others/test_tenant_relocation.py:133: in populate_branch assert cur.fetchone() == (expected_sum, ) E assert (500500,) == (1001000,) E At index 0 diff: 500500 != 1001000 E Full diff: E - (1001000,) E + (500500,) To fix, specify the LSN to branch at, so that the pageserver will wait for it to arrive. See https://github.com/neondatabase/neon/issues/2063	2022-07-20 15:59:46 +03:00

394 Commits