rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 04:30:38 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	32f4a56327	Fix catching error in test_replication_lag	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	28475c660f	Trace received standby_horizon updates	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	69a097ef5e	Bump Postgres version	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	772a579d1f	Fix merge problems	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	1299a7cf8d	Bump Postgres version	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	535ea85082	Fix python formatting	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	e1fdab059f	Add rest read-only replica startup	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	3f04e644da	Bump Postgres version	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	b3d6910f87	Make ruff happy	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	f24b898bef	Prevent flip-flop of standby_horizon by ignoring 0 LSN and resetting it after each GC iteration	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	22a0a6ba0c	Catch recovery conflict in all queries	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	50309cfe83	Fix wait_replica_caughtup function	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	1e9dac2d0e	Ignore recovery conflict errors	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	6101e7d981	Ignore recovery conflict in test_replication_lag test	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	7fe1f4f9bf	Make ruff happy	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	df4b944906	Fix decoding of got standby feedback	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	f6fba416fa	Fix CombineHotStanbyFeedbacks	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	a6c71157e5	Add wait_replica_caughtup to neon_fixtures	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	6fa4068c51	Make ruff happy	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	a902b5472f	Add timeout to let replica start	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	bb7be2f22c	Use InvalidXLogRecPt for horizon=request_lsn	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	6533636123	Fix style	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	f15116fad1	Use Lsn::INVALID for horizon=request_lsn	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	9d003402e9	Use InvalidXlogRecPtr for horizon=request_lsn	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	a5e06a3ea7	Use GetXLogReplayRecPtr only for replicas	2024-02-07 08:57:08 +02:00
Konstantin Knizhnik	9d984edf54	Replace latest with horizon in get_page request	2024-02-07 08:57:03 +02:00
Konstantin Knizhnik	d499008b42	Set hot_standby_feedback option to ON	2024-02-07 08:54:11 +02:00
Konstantin Knizhnik	6be3a4507b	Fix python style	2024-02-07 08:54:11 +02:00
Konstantin Knizhnik	aac24f623b	Add test for delaying GC by replica	2024-02-07 08:54:11 +02:00
Konstantin Knizhnik	af08102718	Make clippy happy	2024-02-07 08:54:11 +02:00
Konstantin Knizhnik	32e38d5e86	Limit max replication lag to prevent blocking GC	2024-02-07 08:54:11 +02:00
Konstantin Knizhnik	8cfc5ade57	Propagate repply_flush_lsn from SK to PS to prevent GC from collecting objects which may be still requested by replica	2024-02-07 08:54:09 +02:00
Konstantin Knizhnik	f3d7d23805	Some small WAL records can write a lot of data to KV storage, so perform checkpoint check more frequently (#6639 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1707149618314539?thread_ts=1707081520.140049&cid=C04DGM6SMTM ## Summary of changes Perform checkpoint check after processing `ingest_batch_size` (default 100) WAL records. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-02-07 08:47:19 +02:00
Alexander Bayandin	9f75da7c0a	test_lazy_startup: fix statement_timeout setting (#6654 ) ## Problem Test `test_lazy_startup` is flaky[0], sometimes (pretty frequently) it fails with `canceling statement due to statement timeout`. - [0] https://neon-github-public-dev.s3.amazonaws.com/reports/main/7803316870/index.html#suites/355b1a7a5b1e740b23ea53728913b4fa/7263782d30986c50/history ## Summary of changes - Fix setting `statement_timeout` setting by reusing a connection for all queries. - Also fix label (`lazy`, `eager`) assignment - Split `test_lazy_startup` into two, by `slru` laziness and make tests smaller	2024-02-07 00:31:26 +00:00
Alexander Bayandin	f4cc7cae14	CI(build-tools): Update Python from 3.9.2 to 3.9.18 (#6615 ) ## Problem We use an outdated version of Python (3.9.2) ## Summary of changes - Update Python to the latest patch version (3.9.18) - Unify the usage of python caches where possible	2024-02-06 20:30:43 +00:00
John Spray	4f57dc6cc6	control_plane/attachment_service: take public key as value (#6651 ) It's awkward to point to a file when doing some kinds of ad-hoc deployment (like right now, when I'm hacking a helm chart having not quite hooked up secrets properly yet). We take all the rest of the secrets as CLI args directly, so let's do the same for public key.	2024-02-06 19:08:39 +00:00
Heikki Linnakangas	dc811d1923	Add a span to 'create_neon_superuser' for better OpenTelemetry traces (#6644 ) create_neon_superuser runs the first queries in the database after cold start. Traces suggest that those first queries can make up a significant fraction of the cold start time. Make it more visible by adding an explicit tracing span to it; currently you just have to deduce it by looking at the time spent in the parent 'apply_config' span subtracted by all the other child spans.	2024-02-06 20:37:35 +02:00
Alexander Bayandin	e65f0fe874	CI(benchmarks): make job split consistent across reruns (#6614 ) ## Problem We've got several issues with the current `benchmarks` job setup: - `benchmark_durations.json` file (that we generate at runtime to split tests into several jobs[0]) is not consistent between these jobs (and even less consistent with the file if we rerun the job). I.e. test selection for each job can be different, which could end up in missed tests in a test run. - `scripts/benchmark_durations` doesn't fetch all tests from the database (it doesn't expect any extra directories inside `test_runner/performance`) - For some reason, the current split into 4 groups leaves the 4th group with no tests to run, which fails the job[1] - [0] https://github.com/neondatabase/neon/pull/4683 - [1] https://github.com/neondatabase/neon/issues/6629 ## Summary of changes - Generate `benchmark_durations.json` file once before we start `benchmarks` jobs (this makes it consistent across the jobs) and pass the file content through the GitHub Actions input (this makes it consistent for reruns) - `scripts/benchmark_durations` fix SQL query for getting all required tests - Split benchmarks into 5 jobs instead of 4 jobs.	2024-02-06 17:00:55 +00:00
Joonas Koivunen	bb92721168	build: migrate check-style-rust to small runners (#6588 ) We have more small runners than large runners, and often a shortage of large runners. Migrate `check-style-rust` to run on small runners.	2024-02-06 15:53:04 +00:00
Christian Schwarz	d7b29aace7	refactor(walredo): don't create WalRedoManager for broken tenants (#6597 ) When we'll later introduce a global pool of pre-spawned walredo processes (https://github.com/neondatabase/neon/issues/6581), this refactoring avoids plumbing through the reference to the pool to all the places where we create a broken tenant. Builds atop the refactoring in #6583	2024-02-06 16:20:02 +01:00
Christian Schwarz	53a3ed0a7e	debug_assert presence of `shard_id` tracing field (#6572 ) also: fixes https://github.com/neondatabase/neon/issues/6638	2024-02-06 14:43:33 +00:00
dependabot[bot]	27a3c9ecbe	build(deps): bump cryptography from 41.0.6 to 42.0.0 (#6643 )	2024-02-06 13:15:07 +00:00
John Spray	6297843317	tests: flakiness fixes in pageserver tests (#6632 ) Fix several test flakes: - test_sharding_service_smoke had log failures on "Dropped LSN updates" - test_emergency_mode had log failures on a deletion queue shutdown check, where the check was incorrect because it was expecting channel receiver to stay alive after cancellation token was fired. - test_secondary_mode_eviction had racing heatmap uploads because the test was using a live migration hook to set up locations, where that migration was itself uploading heatmaps and generally making the situation more complex than it needed to be. These are the failure modes that I saw when spot checking the last few failures of each test. This will mostly/completely address #6511, but I'll leave that ticket open for a couple days and then check if either of the tests named in that ticket are flaky. Related #6511	2024-02-06 12:49:41 +00:00
Vadim Kharitonov	dae56ef60c	Do not suspend compute if there is an active logical replication subscription. (#6570 ) ## Problem The idea is to keep compute up and running if there are any active logical replication subscriptions. ### Rationale - The Write-Ahead Logging (WAL) files, which contain the data changes, will need to be retained on the publisher side until the subscriber is able to connect again and apply these changes. This could potentially lead to increased disk usage on the publisher - and we do not want to disrupt the source - I think it is more pain for our customer to resolve storage issues on the source than to pay for the compute at the target. - Upon resuming the compute resources, the subscriber will start consuming and applying the changes from the retained WAL files. The time taken to catch up will depend on the volume of changes and the configured vCPUs. We can avoid explaining complex situations where we lag behind (in extreme cases we could lag behind by hours, days or even months) - I think an important use case for logical replication from a source is a one-time migration or release upgrade. In this case the customer would not mind if we are not suspended for the duration of the migration. We need to document this in the release notes and the documentation in the context of logical replication where Neon is the target (subscriber) ### See internal discussion here https://neondb.slack.com/archives/C04DGM6SMTM/p1706793400746539?thread_ts=1706792628.701279&cid=C04DGM6SMTM	2024-02-06 12:15:42 +00:00
Christian Schwarz	0de46fd6f2	heavier_once_cell: switch to tokio::sync::RwLock (#6589 ) Using the RwLock reduces contention on the hot path. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-06 14:04:15 +02:00
Joonas Koivunen	53743991de	uploader: avoid cloning vecs just to get Bytes (#6645 ) Fix cloning the serialized heatmap on every attempt by just turning it into `bytes::Bytes` before clone, so the clone is refcounted instead of cloning the vec later on. Also fixes one cancellation token cloning I had missed in #6618. Cc: #6096	2024-02-06 11:34:13 +00:00
John Spray	431f4234d4	storage controller: embed database migrations in binary (#6637 ) ## Problem We don't have a neat way to carry around migration .sql files during deploy, and in any case would prefer to avoid depending on diesel CLI to deploy. ## Summary of changes - Use `diesel_migrations` crate to embed migrations in our binary - Run migrations on startup - Drop the diesel dependency in the `neon_local` binary, as the attachment_service binary just needs the database to exist. Do database creation with a simple `createdb`. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-06 10:07:10 +00:00
Christian Schwarz	edcde05c1c	refactor(walredo): split up the massive `walredo.rs` (#6583 ) Part of https://github.com/neondatabase/neon/issues/6581	2024-02-06 09:44:49 +00:00
Christian Schwarz	e196d974cc	pagebench: actually implement `--num_clients` (#6640 ) Will need this to validate per-tenant throttling in https://github.com/neondatabase/neon/issues/5899	2024-02-06 10:34:16 +01:00
Joonas Koivunen	947165788d	refactor: needless cancellation token cloning (#6618 ) The solution we ended up with for `backoff::retry` always requires cloning cancellation tokens even though there is just `.await`. Fix that, and also turn the return type into `Option<Result<T, E>>`, avoiding the need for the `E::cancelled()` fn passed in. Cc: #6096	2024-02-06 09:39:06 +02:00

4583 Commits