rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Arthur Petukhovsky	09ddd34b2a	Fix checkpoints race condition in safekeeper tests (#2175). We should wait for WAL to arrive at the pageserver before calling CHECKPOINT.	2022-07-28 15:44:02 +03:00
Arthur Petukhovsky	aeb3f0ea07	Refactor test_race_conditions (#2162). Do not use Python multiprocessing; make the test async.	2022-07-28 14:38:37 +03:00
Kirill Bulatov	58b04438f0	Tweak backoff numbers to avoid no wal connection threshold trigger	2022-07-27 22:16:40 +03:00
Alexey Kondratov	01f1f1c1bf	Add OpenAPI spec for safekeeper HTTP API (neondatabase/cloud#1264, #2061). This spec is used in the `cloud` repo to generate the HTTP client.	2022-07-27 21:29:22 +03:00
Thang Pham	6a664629fa	Add timeline physical size tracking (#2126). Ref #1902. - Track the layered timeline's `physical_size` using `pageserver_current_physical_size` metric when updating the layer map. - Report the local timeline's `physical_size` in timeline GET APIs. - Add `include-non-incremental-physical-size` URL flag to also report the local timeline's `physical_size_non_incremental` (similar to `logical_size_non_incremental`). - Add a `UIntGaugeVec` and `UIntGauge` to represent `u64` prometheus metrics. Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2022-07-27 12:36:46 -04:00
Sergey Melnikov	f6f29f58cd	Switch production storage to dedicated etcd (#2169)	2022-07-27 16:41:25 +03:00
Sergey Melnikov	fd46e52e00	Switch staging storage to dedicated etcd (#2164)	2022-07-27 12:28:05 +03:00
Heikki Linnakangas	d6f12cff8e	Make DatadirTimeline a trait, implemented by LayeredTimeline. Previously DatadirTimeline was a separate struct, and there was a 1:1 relationship between each DatadirTimeline and LayeredTimeline. That was a bit awkward; whenever you created a timeline, you also needed to create the DatadirTimeline wrapper around it, and if you only had a reference to the LayeredTimeline, you would need to look up the corresponding DatadirTimeline struct through tenant_mgr::get_local_timeline_with_load(). There were a couple of calls like that from LayeredTimeline itself. Refactor DatadirTimeline, so that it's a trait, and mark LayeredTimeline as implementing that trait. That way, there's only one object, LayeredTimeline, and you can call both Timeline and DatadirTimeline functions on that. You can now also call DatadirTimeline functions from LayeredTimeline itself. I considered just moving all the functions from DatadirTimeline directly to Timeline/LayeredTimeline, but I still like to have some separation. Timeline provides a simple key-value API, and handles durably storing key/value pairs, and branching. Whereas DatadirTimeline is stateless, and provides an abstraction over the key-value store, to present an interface with relations, databases, etc. Postgres concepts. This simplified the logical size calculation fast-path for branch creation, introduced in commit `28243d68e6`. LayerTimeline can now access the ancestor's logical size directly, so it doesn't need the caller to pass it to it. I moved the fast-path to init_logical_size() function itself. It now checks if the ancestor's last LSN is the same as the branch point, i.e. if there haven't been any changes on the ancestor after the branch, and copies the size from there. An additional bonus is that the optimization will now work any time you have a branch of another branch, with no changes from the ancestor, not only at a create-branch command.	2022-07-27 10:26:21 +03:00
Konstantin Knizhnik	5a4394a8df	Do not hold timelines lock while calling update_gc_info, to avoid a recursive mutex lock and thus a deadlock (#2163)	2022-07-26 22:21:05 +03:00
Heikki Linnakangas	d301b8364c	Move LayeredTimeline and related code to separate source file. The layered_repository.rs file had grown to be very large. Split off the LayeredTimeline struct and related code to a separate source file to make it more manageable. There are plans to move much of the code to track timelines from tenant_mgr.rs to LayeredRepository. That will make layered_repository.rs grow again, so now is a good time to split it. There's a lot more cleanup to do, but this commit intentionally only moves existing code and avoids doing anything else, for easier review.	2022-07-26 11:47:04 +03:00
Kirill Bulatov	172314155e	Compact only once on psql checkpoint call	2022-07-26 11:37:16 +03:00
Konstantin Knizhnik	28243d68e6	Yet another approach of copying logical timeline size during branch creation (#2139) * Yet another approach of copying logical timeline size during branch creation * Fix unit tests * Update pageserver/src/layered_repository.rs Co-authored-by: Thang Pham <thang@neon.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Thang Pham <thang@neon.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Thang Pham <thang@neon.tech> Co-authored-by: Thang Pham <thang@neon.tech>	2022-07-26 09:11:10 +03:00
Kirill Bulatov	45680f9a2d	Drop CircleCI runs (#2082)	2022-07-25 18:30:30 +03:00
Dmitry Ivanov	5f4ccae5c5	[proxy] Add the `password hack` authentication flow (#2095). This lets us authenticate users who can use neither SNI (due to old libpq) nor connection string `options` (due to restrictions in other client libraries). Note: `PasswordHack` will accept passwords which are not encoded in base64 via the "password" field. The assumption is that most user passwords will be valid utf-8 strings, and the rest may still be passed via "password_".	2022-07-25 17:23:10 +03:00
Thang Pham	39c59b8df5	Fix flaky test_branch_creation_before_gc test (#2142)	2022-07-22 12:44:20 +01:00
Alexander Bayandin	9dcb9ca3da	test/performance: ensure we don't have tables that we're creating (#2135)	2022-07-22 11:00:05 +01:00
Dmitry Rodionov	e308265e42	Register tenants task thread pool threads in thread_mgr. Needed to avoid this warning: is_shutdown_requested() called in an unexpected thread	2022-07-22 11:43:38 +03:00
Thang Pham	ed102f44d9	Reduce memory allocations for page server (#2010 ) ## Overview This patch reduces the number of memory allocations when running the page server under a heavy write workload. This mostly helps improve the speed of WAL record ingestion. ## Changes - modified `DatadirModification` to allow reuse the struct's allocated memory after each modification - modified `decode_wal_record` to allow passing a `DecodedWALRecord` reference. This helps reuse the struct in each `decode_wal_record` call - added a reusable buffer for serializing object inside the `InMemoryLayer::put_value` function - added a performance test simulating a heavy write workload for testing the changes in this patch ### Semi-related changes - remove redundant serializations when calling `DeltaLayer::put_value` during `InMemoryLayer::write_to_disk` function call [1] - removed the info span `info_span!("processing record", lsn = %lsn)` during each WAL ingestion [2] ## Notes - [1]: in `InMemoryLayer::write_to_disk`, a deserialization is called ``` let val = Value::des(&buf)?; delta_layer_writer.put_value(key, *lsn, val)?; ``` `DeltaLayer::put_value` then creates a serialization based on the previous deserialization ``` let off = self.blob_writer.write_blob(&Value::ser(&val)?)?; ``` - [2]: related: https://github.com/neondatabase/neon/issues/733	2022-07-21 12:08:26 -04:00
Konstantin Knizhnik	572ae74388	More precisely control size of inmem layer (#1927) * More precisely control size of inmem layer * Force recompaction of L0 layers if they contain large non-wallogged BLOBs to avoid too large layers * Add modified version of test_hot_update test (test_dup_key.py) which should generate large layers without large number of tables * Change test name in test_dup_key * Add Layer::get_max_key_range function * Add layer::key_iter method and implement new approach of splitting layers during compaction based on total size of all key values * Add test_large_schema test for checking layer file size after compaction * Make clippy happy * Restore checking LSN distance threshold for checkpoint in-memory layer * Optimize storage keys iterator * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/layered_repository.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Fix code style * Reduce number of tables in test_large_schema to make it fit in timeout with debug build * Fix style of test_large_schema.py * Fix handling of duplicate layers Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-07-21 07:45:11 +03:00
Arthur Petukhovsky	b445cf7665	Refactor test_unavailability (#2134). Now test_unavailability uses async instead of Process. The test is refactored to fix a possible race condition.	2022-07-20 22:13:05 +03:00
Kirill Bulatov	cc680dd81c	Explicitly enable cachepot in Docker builds only	2022-07-20 17:09:36 +03:00
Heikki Linnakangas	f4233fde39	Silence "Module already imported" warning in python tests We were getting a warning like this from the pg_regress tests: =================== warnings summary =================== /usr/lib/python3/dist-packages/_pytest/config/__init__.py:663 /usr/lib/python3/dist-packages/_pytest/config/__init__.py:663: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: fixtures.pg_stats self.import_plugin(import_spec) -- Docs: https://docs.pytest.org/en/stable/warnings.html ------------------ Benchmark results ------------------- To fix, reorder the imports in conftest.py. I'm not sure what exactly the problem was or why the order matters, but the warning is gone and that's good enough for me.	2022-07-20 16:55:41 +03:00
Heikki Linnakangas	b4c74c0ecd	Clean up unnecessary dependencies. Just to be tidy.	2022-07-20 16:31:25 +03:00
Heikki Linnakangas	abff15dd7c	Fix test to be more robust with slow pageserver. If the WAL arrives at the pageserver slowly, it's possible that the branch is created before all the data on the parent branch have arrived. That results in a failure: test_runner/batch_others/test_tenant_relocation.py:259: in test_tenant_relocation timeline_id_second, current_lsn_second = populate_branch(pg_second, create_table=False, expected_sum=1001000) test_runner/batch_others/test_tenant_relocation.py:133: in populate_branch assert cur.fetchone() == (expected_sum, ) E assert (500500,) == (1001000,) E At index 0 diff: 500500 != 1001000 E Full diff: E - (1001000,) E + (500500,) To fix, specify the LSN to branch at, so that the pageserver will wait for it to arrive. See https://github.com/neondatabase/neon/issues/2063	2022-07-20 15:59:46 +03:00
Thang Pham	160e52ec7e	Optimize branch creation (#2101). Resolves #2054. Context: branch creation needs to wait for GC to acquire the `gc_cs` lock, which prevents creating new timelines during GC. However, because each individual timeline GC iteration also requires the `compaction_cs` lock, branch creation may also need to wait for compactions of multiple timelines. This results in large latency when creating a new branch, which we advertised as "instant". This PR optimizes the latency of branch creation by separating GC into two phases: 1. Collect GC data (branching points, cutoff LSNs, etc). 2. Perform GC for each timeline. The GC bottleneck comes from step 2, which must wait for compaction of multiple timelines. This PR modifies the branch creation and GC functions to allow GC to hold the GC lock only in step 1. As a result, branch creation doesn't need to wait for compaction to finish but only needs to wait for the GC data collection step, which is fast.	2022-07-19 14:56:25 -04:00
Heikki Linnakangas	98dd2e4f52	Use zstd and multiple threads to compress artifact tarball. For faster and better compression.	2022-07-19 21:31:34 +03:00
Heikki Linnakangas	71753dd947	Remove github CI 'build_postgres' job, merging it with 'build_neon' Simplifies the workflow. Makes the overall build a little faster, as the build_postgres step doesn't need to upload the pg.tgz artifact, and the build_neon step doesn't need to download it again. This effectively reverts commit `a490f64a68`. That commit changed the workflow so that the Postgres binaries were not included in the neon.tgz artifact. With this commit, the pg.tgz artifact is gone, and the Postgres binaries are part of neon.tgz again.	2022-07-19 21:31:22 +03:00
Alexander Bayandin	4446791397	github/workflows: pause stress env deployment (#2122)	2022-07-19 17:40:58 +01:00
Alexander Bayandin	5ff7a7dd8b	github/workflows: run periodic benchmarks earlier (#2121)	2022-07-19 16:33:33 +01:00
Heikki Linnakangas	3dce394197	Use the same cargo options for every cargo call. The "cargo metadata" and "cargo test --no-run" are used in the workflow to just list names of the final binaries, but unless the same cargo options like --release or --debug are used in those calls, they will in fact recompile everything.	2022-07-19 16:36:59 +03:00
Heikki Linnakangas	df7f644822	Move things around in github yml file, for clarity. Also, this avoids building the list of test binaries in release mode. They are not included in the neon.tgz tarball in release mode.	2022-07-19 16:36:59 +03:00
Arthur Petukhovsky	bf5333544f	Fix missing quotes in GitHub Actions (#2116)	2022-07-19 10:57:24 +03:00
Heikki Linnakangas	0b8049c283	Update core_changes.md, describing Postgres changes. I went through "git diff REL_14_2" and updated the doc to list all the changes, categorized into what I think could form a logical set of patches.	2022-07-19 09:53:12 +03:00
Heikki Linnakangas	f384e20d78	Minor cleanup in layer_repository.rs.	2022-07-19 07:50:55 +03:00
Heikki Linnakangas	0b14fdb078	Reorganize, expand, improve internal documentation. Reorganize existing READMEs and other documentation files into mdbook format. The resulting Table of Contents is a mix of placeholders for docs that we should write, and documentation files that we already had, dropped into the most appropriate place. Update the Pageserver overview diagram. Add sections on thread management and WAL redo processes. Add all the RFCs to the mdbook Table of Contents too. Per GitHub issue #1979	2022-07-18 17:39:12 +03:00
Arseny Sher	a69fdb0e8e	Fix commit_lsn monotonicity violation. On receipt of a ProposerElected message, WAL is truncated at the streaming point; this code expected that, once the vote is given for the proposer / the term switch has happened, flush_lsn can be advanced only by this proposer (or a higher one). However, that didn't take into account the possibility of accumulating written WAL and flushing it after the vote is given -- flushing goes without term checks. This eventually led to the violation in question. ref #2048	2022-07-18 15:15:51 +03:00
Arseny Sher	eeff56aeb7	Make get_dir_size robust to concurrent deletions. ref #2055	2022-07-18 15:13:10 +03:00
Dmitry Rodionov	7987889cb3	keep successfully downloaded index parts	2022-07-18 12:27:04 +03:00
Dmitry Rodionov	912a08317b	do not ignore errors during downloading of tenant index parts	2022-07-18 12:27:04 +03:00
Kirill Bulatov	c4b2347e21	Use less restrictive lock guard during storage sync	2022-07-17 12:49:18 +03:00
dependabot[bot]	373bc59ebe	Bump pywin32 from 227 to 301 (#2102)	2022-07-16 16:05:12 +01:00
Egor Suvorov	94003e1ebc	postgres_ffi: test restoring from intermediate LSNs by wal_craft	2022-07-15 19:06:50 +03:00
Egor Suvorov	19ea486cde	postgres_ffi/xlog_utils: refactor find_end_of_wal test * Deduce `last_segment` automatically * Get rid of local `wal_dir`/`wal_seg_size` variables * Prepare to test parsing of WAL from multiple specific points, not just the start; extract `check_end_of_wal` function to check both partial and non-partial WAL segments.	2022-07-15 19:06:50 +03:00
Alexander Bayandin	95c40334b8	github/workflows: post periodic benchmark failures to slack (#2105)	2022-07-15 15:39:49 +01:00
Sergey Melnikov	a68d5a0173	Run workflow on release branch (#2085)	2022-07-15 13:18:55 +02:00
Alexey Kondratov	c690522870	[compute_tools] Change owner of the schema public only once (#2058). Otherwise, we will change it back to the db owner on each restart, even if the user has already changed the schema owner to some other user.	2022-07-15 12:25:07 +02:00
Heikki Linnakangas	eaa550afcc	Reduce size of cargo deps cache, by excluding ~/.cargo/registry/src.	2022-07-15 13:18:48 +03:00
Heikki Linnakangas	a490f64a68	Don't include Postgres binaries in neon.tgz neon.tgz artifact in the github workflow included the contents of 'tmp_install', but that seems pointless, because the same files are included earlier already in the pg.tgz artifact.	2022-07-15 12:33:13 +03:00
Thang Pham	fe65d1df74	reduce concurrent tasks in `test_branching_with_pgbench.py` - add thread limit - run `pgbench` with 1 client	2022-07-15 12:30:09 +03:00
Heikki Linnakangas	c68336a246	Strip debug symbols from test binaries, to make the artifact smaller. Uploading large artifacts is slow in github actions. To speed that up, make the artifact smaller. The code coverage tool doesn't require debug symbols, so remove them. We've discussed doing the same for all binaries, but it's nice to have debugging symbols for debugging purposes, and so that you get more complete stack traces. The discussion is ongoing, but let's at least do this for the test symbols now.	2022-07-14 23:08:57 +03:00
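
Commit `160e52ec7e` in the list above describes splitting GC into a short data-collection phase and a slow per-timeline phase, so that branch creation only contends with the first. The following is a minimal Rust sketch of that locking shape, not the pageserver's code: `Repo`, `GcPlan`, `gc_lock`, `collect_gc_plans`, `gc_timeline`, and the `horizon` parameter are all hypothetical names chosen for illustration, and the cutoff arithmetic is a stand-in for whatever the real GC computes.

```rust
use std::sync::Mutex;

/// Per-timeline GC input gathered in phase 1 (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct GcPlan {
    pub timeline: String,
    pub cutoff_lsn: u64,
}

/// Hypothetical repository holding (timeline name, last LSN) pairs.
pub struct Repo {
    /// Serializes branch creation against GC data collection only.
    gc_lock: Mutex<()>,
    timelines: Mutex<Vec<(String, u64)>>,
}

impl Repo {
    pub fn new() -> Self {
        Repo {
            gc_lock: Mutex::new(()),
            timelines: Mutex::new(Vec::new()),
        }
    }

    /// Branch creation takes the GC lock, so it can only wait for phase 1.
    pub fn create_branch(&self, name: &str, start_lsn: u64) {
        let _gc_guard = self.gc_lock.lock().unwrap();
        self.timelines
            .lock()
            .unwrap()
            .push((name.to_string(), start_lsn));
    }

    pub fn timeline_count(&self) -> usize {
        self.timelines.lock().unwrap().len()
    }

    /// Phase 1: compute cutoffs while holding the GC lock (the fast part).
    fn collect_gc_plans(&self, horizon: u64) -> Vec<GcPlan> {
        let _gc_guard = self.gc_lock.lock().unwrap();
        let timelines = self.timelines.lock().unwrap();
        let plans: Vec<GcPlan> = timelines
            .iter()
            .map(|(name, last_lsn)| GcPlan {
                timeline: name.clone(),
                cutoff_lsn: last_lsn.saturating_sub(horizon),
            })
            .collect();
        // Both guards are dropped on return, before any slow work starts.
        plans
    }

    /// Phase 2: per-timeline work runs without the GC lock held.
    pub fn run_gc(&self, horizon: u64) -> Vec<GcPlan> {
        let plans = self.collect_gc_plans(horizon);
        for plan in &plans {
            gc_timeline(plan);
        }
        plans
    }
}

/// Placeholder for the slow per-timeline step (assumed to wait on compaction).
fn gc_timeline(_plan: &GcPlan) {}
```

In this sketch a caller would build a `Repo`, add timelines with `create_branch`, and call `run_gc`; because the lock guard lives only inside `collect_gc_plans`, a concurrent `create_branch` never waits on `gc_timeline`.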

1855 Commits