
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-14 11:40:38 +00:00

Author	SHA1	Message	Date
Alexey Masterov	8b04fc469d	Fix the complaints	2024-09-10 14:30:40 +02:00
a-masterov	f8b9ec0dd0	Merge branch 'main' into amasterov/regress-arm	2024-09-10 14:29:04 +02:00
Alexey Masterov	b45560db75	Fix the error	2024-09-10 13:18:47 +02:00
Alexey Masterov	c4d98915ff	Refactoring	2024-09-10 13:12:46 +02:00
Alexey Masterov	9ac06ea3d9	Debug	2024-09-10 13:04:04 +02:00
Alexey Masterov	841b39f7c5	Some refactoring	2024-09-10 12:52:46 +02:00
Alexey Masterov	fe8fee0b88	Add debug	2024-09-10 12:26:22 +02:00
Alexey Masterov	dbde226f38	Add debug	2024-09-10 12:21:09 +02:00
Alexey Masterov	01c37c6c6c	Refactor, delete roles accidentally left in a project	2024-09-10 12:04:15 +02:00
Alexey Masterov	e989bf1887	remove unused import os	2024-09-10 11:17:55 +02:00
Alexey Masterov	287e05f49d	Fix the error	2024-09-09 16:22:04 +02:00
Alexey Masterov	650fb7b2d7	Drop subscriptions if they exist	2024-09-09 16:18:26 +02:00
Heikki Linnakangas	723c0971e8	Don't create 'empty' branch in neon_simple_env (#8965) Now that we've given up hope on sharing the neon_simple_env between tests, there's no reason to not use the 'main' branch directly.	2024-09-09 12:38:34 +03:00
Heikki Linnakangas	c8f67eed8f	Remove TEST_SHARED_FIXTURES (#8965) I wish it worked, but it's been broken for a long time, so let's admit defeat and remove it. The idea of sharing the same pageserver and safekeeper environment between tests is still sound, and it could save a lot of time in our CI. We should perhaps put some time into doing that, but we're better off starting from scratch than trying to make TEST_SHARED_FIXTURES work in its current form.	2024-09-09 12:38:34 +03:00
Joonas Koivunen	3dbd34aa78	feat(storcon): forward gc blocking and unblocking (#8956) Currently, using gc blocking and unblocking with storage controller managed pageservers is painful. Implement the API on the storage controller. Fixes: #8893	2024-09-06 22:42:55 +01:00
Alex Chi Z.	ac5815b594	feat(storage-controller): add node shards api (#8896) For control-plane managed tenants, we have the page in the admin console that lists all tenants on a specific pageserver. But for storage-controller managed ones, we don't have that functionality for now. ## Summary of changes Adds an API that lists all shards on a given node (intention + observed) --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-06 14:14:21 -04:00
Arseny Sher	11cf16e3f3	safekeeper: add term_bump endpoint. When walproposer observes a higher term, it restarts instead of crashing the whole compute with PANIC; this avoids a compute crash after a term_bump call. After a successful election we're still checking last_log_term of the highest given vote to ensure basebackup is good, and PANIC otherwise. It will be used for migration per 035-safekeeper-dynamic-membership-change.md and https://github.com/neondatabase/docs/pull/21 ref https://github.com/neondatabase/neon/issues/8700	2024-09-06 19:13:50 +03:00
Arseny Sher	e287f36a05	safekeeper: fix endpoint restart immediately after xlog switch. Check that truncation point is not from the future by comparing it with write_record_lsn, not write_lsn, and explain that xlog switch changes their normal order. ref https://github.com/neondatabase/neon/issues/8911	2024-09-06 18:09:21 +03:00
Alexey Masterov	7469656b72	Add regression.out to allure reports	2024-09-06 15:49:43 +02:00
Alexey Masterov	e54f8bc5ff	Change the workdir to test_output_dir	2024-09-06 14:53:40 +02:00
Vlad Lazar	e86fef05dd	storcon: track preferred AZ for each tenant shard (#8937) ## Problem We want to do AZ-aware scheduling, but don't have enough metadata. ## Summary of changes Introduce a `preferred_az_id` concept for each managed tenant shard. In a future PR, the scheduler will use this as a soft preference. The idea is to try and keep the shard attachments within the same AZ. Under the assumption that the compute was placed in the correct AZ, this reduces the chances of cross-AZ traffic between compute and PS. In terms of code changes we: 1. Add a new nullable `preferred_az_id` column to the `tenant_shards` table. Also include an in-memory counterpart. 2. Populate the preferred az on tenant creation and shard splits. 3. Add an endpoint which allows bulk-setting preferred AZs. (3) gives us the migration path. I'll write a script which queries the cplane db in the region and sets the preferred az of all shards with an active compute to the AZ of said compute. For shards without an active compute, I'll use the AZ of the currently attached pageserver since this is what cplane uses now to schedule computes.	2024-09-06 13:11:17 +01:00
Alexey Masterov	2098184d67	Revert "Revert "Fix an error in the path"" This reverts commit `c7f2a26cb9`.	2024-09-06 13:56:20 +02:00
Alexey Masterov	c7f2a26cb9	Revert "Fix an error in the path" This reverts commit `ebdd187398`.	2024-09-06 13:51:15 +02:00
Alexey Masterov	ebdd187398	Fix an error in the path	2024-09-06 13:36:49 +02:00
Alexey Masterov	6c679f722c	Fix an error in the path	2024-09-06 13:27:05 +02:00
Alexey Masterov	d0cf670b76	Fix an error in the path	2024-09-06 13:19:06 +02:00
Alexey Masterov	6d66a2ebe7	Fix an error in the path	2024-09-06 13:01:43 +02:00
Alexey Masterov	a8d1cbe376	Change the directories calculation	2024-09-06 12:58:10 +02:00
Alexey Masterov	222f483ce8	Add a debug	2024-09-06 12:19:08 +02:00
Alexey Masterov	c7d9eda56a	Some refactoring	2024-09-06 11:25:59 +02:00
Alexey Masterov	195c7a359d	Some refactoring	2024-09-06 11:06:43 +02:00
Alexey Masterov	8bb0e97880	Some refactoring	2024-09-06 11:03:29 +02:00
Vlad Lazar	04f99a87bf	storcon: make pageserver AZ id mandatory (#8856) ## Problem https://github.com/neondatabase/neon/pull/8852 introduced a new nullable column for the `nodes` table: `availability_zone_id` ## Summary of changes * Make neon local and the test suite always provide an az id * Make the az id field in the ps registration request mandatory * Migrate the column to non-nullable and adjust in memory state accordingly * Remove the code that was used to populate the az id for pre-existing nodes	2024-09-05 19:14:21 +01:00
a-masterov	815d7d6ab1	Merge branch 'main' into amasterov/regress-arm	2024-09-05 15:30:05 +02:00
Joonas Koivunen	efe03d5a1c	build: sync between benchies (#8919) Sometimes, the benchmarks fail to start up pageserver in 10s without any obvious reason. Benchmarks run sequentially on otherwise idle runners. Try running `sync(2)` after each bench to force a cleaner slate. Implement this via: - SYNC_AFTER_EACH_TEST environment variable enabled autouse fixture - autouse fixture seems to be outermost fixture, so it works as expected - set SYNC_AFTER_EACH_TEST=true for benchmarks in build_and_test workflow Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10678984691/index.html#suites/5008d72a1ba3c0d618a030a938fc035c/1210266507534c0f/ --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-09-05 14:29:48 +01:00
Christian Schwarz	850421ec06	refactor(pageserver): rely on serde derive for toml deserialization (#7656) This PR simplifies the pageserver configuration parsing as follows: * introduce the `pageserver_api::config::ConfigToml` type * implement `Default` for `ConfigToml` * use serde derive to do the brain-dead leg-work of processing the toml document * use `serde(default)` to fill in default values * in `pageserver` crate: * use `toml_edit` to deserialize the pageserver.toml string into a `ConfigToml` * `PageServerConfig::parse_and_validate` then * consumes the `ConfigToml` * destructures it exhaustively into its constituent fields * constructs the `PageServerConfig` The rules are: * in `ConfigToml`, use `deny_unknown_fields` everywhere * static default values go in `pageserver_api` * if there cannot be a static default value (e.g. which default IO engine to use, because it depends on the runtime), make the field in `ConfigToml` an `Option` * if runtime-augmentation of a value is needed, do that in `parse_and_validate` * a good example is `virtual_file_io_engine` or `l0_flush`, both of which need to execute code to determine the effective value in `PageServerConf` The benefits: * massive amount of brain-dead repetitive code can be deleted * "unused variable" compile-time errors when removing a config value, due to the exhaustive destructuring in `parse_and_validate` * compile-time errors guide you when adding a new config field Drawbacks: * serde derive is sometimes a bit too magical * `deny_unknown_fields` is easy to miss Future Work / Benefits: * make `neon_local` use `pageserver_api` to construct `ConfigToml` and write it to `pageserver.toml` * This provides more type safety / compile-time errors than the current approach.
### Refs Fixes #3682 ### Future Work * `remote_storage` deser doesn't reject unknown fields https://github.com/neondatabase/neon/issues/8915 * clean up `libs/pageserver_api/src/config.rs` further * break up into multiple files, at least for tenant config * move `models` as appropriate / refine distinction between config and API models / be explicit about when it's the same * use `pub(crate)` visibility on `mod defaults` to detect stale values	2024-09-05 14:59:49 +02:00
Alexey Masterov	226464e6b5	Fix format	2024-09-05 12:53:39 +02:00
Alexey Masterov	7a324f84e4	Fix Line	2024-09-05 12:22:13 +02:00
Alexey Masterov	b54a919d51	Fix Line	2024-09-05 12:19:32 +02:00
Alexey Masterov	afd25c896c	Get rid of redundant local variables	2024-09-05 12:14:54 +02:00
Alexey Masterov	99f9ab2c07	Fix regex	2024-09-05 12:04:16 +02:00
Alexey Masterov	9e61284d10	fix mypy warnings	2024-09-05 11:55:23 +02:00
Alexey Masterov	bfb7bf92f2	fix linters' warnings	2024-09-05 11:07:51 +02:00
Alexey Masterov	9414976c4c	uncomment the extension creation	2024-09-04 17:36:48 +02:00
John Spray	1a9b54f1d9	storage controller: read from database in validate API (#8784) ## Problem The initial implementation of the validate API treats the in-memory generations as authoritative. - This is true when only one storage controller is running, but if a rogue controller was running that hadn't been shut down properly, and some pageserver requests were routed to that bad controller, it could incorrectly return valid=true for stale generations. - The generation in the main in-memory map gets out of date while a live migration is in flight, and if the origin location for the migration tries to do some deletions even though it is in AttachedStale (for example because it had already started compaction), these might be wrongly validated + executed. ## Summary of changes - Continue to do the in-memory check: if this returns valid=false it is sufficient to reject requests. - When valid=true, do an additional read from the database to confirm the generation is fresh. - Revise behavior for validation on missing shards: this used to always return valid=true as a convenience for deletions and shard splits, so that pageservers weren't prevented from completing any enqueued deletions for these shards after they're gone. However, this becomes unsafe when we consider split brain scenarios. We could reinstate this in future if we wanted to store some tombstones for deleted shards.
- Update test_scrubber_physical_gc to cope with the behavioral change: they must now explicitly flush the deletion queue before splits, to avoid tripping up on deletions that are enqueued at the time of the split (these tests assert "scrubber deletes nothing", which check fails if the split leaves behind some remote objects that are legitimately GC'able) - Add `test_storage_controller_validate_during_migration`, which uses failpoints to create a situation where incorrect generation validation during a live migration could result in a corruption The rate of validate calls for tenants is pretty low: it happens as a consequence of deletions from GC and compaction, which are both concurrency-limited on the pageserver side.	2024-09-04 15:00:40 +01:00
Alexey Masterov	777c01938d	fix	2024-09-04 15:42:19 +02:00
Alexey Masterov	302a2203a1	change path	2024-09-04 15:27:36 +02:00
Alexey Masterov	bc1697ab28	change path	2024-09-04 15:18:22 +02:00
Alexey Masterov	61f3ac3fbf	change path	2024-09-04 14:58:41 +02:00
Alexey Masterov	f7f0be8727	Temporarily disable the extension.	2024-09-04 14:55:02 +02:00


1599 Commits