rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-25 00:50:36 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	2753abc0d8	Remove leftover enums for configuring vectored get implementation The settings were removed in commit corb9d2c7b.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	a523548ed1	Remove unused cleanup_remaining_timeline_fs_traces function There's some more code that still checks for uninit and delete markers, see callers of is_delete_mark and is_uninit_mark, and github issue #5718. But these functions were outright dead.	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	2d4e5af18b	Remove unused code for parsing a postgresql.conf file	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	5da2340e74	Remove misc dead code in control_plane/	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	7b34c2d7af	Remove misc dead code in libs/	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	15ae1fc3df	Remove a few postgres constants that were not used Dead code is generally useless, but with Postgres constants in particular, I'm also worried that if they're not used anywhere, we might fail to update them at a Postgres version update, and get very confused later when they have wrong values.	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	728b79b9dd	Remove some unnecessary derives	2024-09-19 11:57:10 +03:00
Alex Chi Z.	9d1c6f23d3	fix(storage-scrubber): log version after initialize the logger (#9049 ) When I checked the log in Grafana I couldn't find the scrubber version. Then I realized that it should be logged after the logger gets initialized. ## Summary of changes Log after initializing the logger for the scrubber. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-18 14:13:57 -04:00
Christian Schwarz	035a49a6b2	`neon_local start`: parallel startup to break cyclic dependency (#8950 ) (Found this useful during investigation https://github.com/neondatabase/cloud/issues/16886.) Problem ------- Before this PR, `neon_local` sequentially does the following: 1. launch storcon process 2. wait for storcon to signal readiness [here](`75310fe441/control_plane/src/storage_controller.rs (L804-L808)`) 3. start pageserver 4. wait for pageserver to become ready [here](`c43e664ff5/control_plane/src/pageserver.rs (L343-L346)`) 5. etc The problem is that storcon's readiness waits for the [`startup_reconcile`](`cbcd4058ed/storage_controller/src/service.rs (L520-L523)`) to complete. But pageservers aren't started at this point. So, worst case we wait for `STARTUP_RECONCILE_TIMEOUT/2`, i.e., 15s. This is more than the 10s default timeout allowed by neon_local. So, the result is that `neon_local start` fails to start storcon and stops everything. Solution -------- In this PR I choose the radical solution to start everything in parallel. It junks up the output because we do stuff like `print!(".")` to indicate progress. We should just abandon that. And switch to `utils::logging` + `tracing` with separate spans for each component. I can do that in this PR or we leave it as a follow-up. Alternatives Considered ----------------------- The Pageserver's `/v1/status` or in fact any endpoint of the mgmt API will not `accept()` on the mgmt API socket until after the `re-attach` call to storcon returned success. So, it's insufficient to change the startup order to start Pageservers first. We cannot easily change Pageserver startup order because `init_tenant_mgr` must complete before we start serving the mgmt API. Otherwise tenant detach calls et al can race with `init_tenant_mgr`. We'd have to add a "loading" state to tenant mgr and make all API endpoints except `/v1/status` wait for _that_ to complete. Related ------- - https://github.com/neondatabase/neon/pull/6475	2024-09-18 18:17:55 +02:00
Folke Behrens	794bd4b866	proxy: mock cplane usable without allowed-ips table (#9046 )	2024-09-18 17:14:53 +02:00
Alexander Bayandin	ac6a1151ae	test_postgres_version: reenable version check for prereleased versions	2024-09-18 14:51:59 +01:00
Tristan Partin	2f37f0384c	Add v17 to revisions.json Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-18 14:51:59 +01:00
Alexander Bayandin	e161a2fa42	CI(deploy): fix deploy to staging and prod (#9030 ) ## Problem It turns out the previous approach (with `skip_if` input) doesn't work (from https://github.com/neondatabase/neon/pull/9017). Revert it and use more straightforward if-conditions ## Summary of changes - Revert `efbe8db7f1` - Add if-condition to `promote-compatibility-data` job and relevant comments	2024-09-18 14:26:47 +01:00
Folke Behrens	c5cd8577ff	proxy: make sql-over-http max request/response sizes configurable (#9029 )	2024-09-18 13:58:51 +02:00
Heikki Linnakangas	3454ef7507	Refactor ImageLayerWriter to avoid passing a Timeline to finish() (#9028 ) Commit `ca5390a89d` made a similar change to DeltaLayerWriter. We bumped into this with Stas with our hackathon project, to create a standalone program to create image layers directly from a Postgres data directory. It needs to create image layers without having a Timeline and other pageserver machinery. This downgrades the "created image layer {}" message from INFO to TRACE level. TRACE is used for the corresponding message on delta layer creation too. The path logged in the message is now the temporary path, before the file is renamed to its final name. Again commit `ca5390a89d` made the same change for the message on delta layer creation.	2024-09-18 13:16:51 +03:00
Christian Schwarz	135e7e4306	add `neon_local` subcommand for the broker & use that from regression tests (#8948 ) There's currently no way to just start/stop broker from `neon_local`. This PR * adds a sub-command * uses that sub-command from the test suite instead of the pre-existing Python `subprocess` based approach. Found this useful during investigation https://github.com/neondatabase/cloud/issues/16886.	2024-09-18 09:10:27 +02:00
Christian Schwarz	3cd2a3f931	refactor(walredo): process launch & kill-on-error machinery (#8951 ) Immediate benefit: easier to spot what's going on. Later benefit: use the extracted method in PR - https://github.com/neondatabase/neon/pull/8952 which adds a `ping` command to walredo. Found this useful during investigation https://github.com/neondatabase/cloud/issues/16886.	2024-09-17 19:16:33 +00:00
Alexander Bayandin	d78f5ce6da	CI: don't fetch the whole git history if it's not required (#9021 ) ## Problem We do use `actions/checkout` with `fetch-depth: 0` when it's not required ## Summary of changes - Remove unneeded `fetch-depth: 0` - Add a comment if `fetch-depth: 0` is required	2024-09-17 18:40:05 +01:00
Arpad Müller	a1b71b73fe	Rename some S3 usages to "remote storage" in exposed messages (#8999 ) In exposed messages like log messages we mentioned "S3", which is not entirely accurate as we support Azure blob storage now as well.	2024-09-17 19:15:01 +02:00
Tristan Partin	6138eb50e9	Fix test code related to migrations We added another migration in `5876c441ab`, but didn't bump this value. This had no effect, but best to fix it anyway. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-17 15:56:05 +01:00
Heikki Linnakangas	d211f00f05	Remove unnecessary dependencies (#9000 ) Found by "cargo machete"	2024-09-17 17:55:45 +03:00
Alexander Bayandin	cd4276fd65	CI: fix release pipeline (#9017 ) ## Problem We've got 2 non-blocking failures on the release pipeline: - `promote-compatibility-data` job got skipped _presumably_ because one of the dependencies of `deploy` job (`push-to-acr-dev`) got skipped (https://github.com/neondatabase/neon/pull/8940) - `coverage-report` job fails because we don't build debug artifacts in the release branch (https://github.com/neondatabase/neon/pull/8561) ## Summary of changes - Always run `push-to-acr-dev` / `push-to-acr-prod` jobs, but add `skip_if` parameter to the reusable workflow, which can skip the job internally, without skipping externally - Do not run `coverage-report` on release branches	2024-09-17 10:17:48 +01:00
Vlad Lazar	b719d58863	storcon: forward requests from stepped down instance to the current leader (#8954 ) ## Problem It turns out that we can't rely on external orchestration to promptly route traffic to the new leader. This is downtime inducing. Forwarding provides a safe way out. ## Safety We forward when: 1. Request is not one of ["/control/v1/step_down", "/status", "/ready", "/metrics"] 2. Current instance is in [`LeadershipStatus::SteppedDown`] state 3. There is a leader in the database to forward to 4. Leader from step (3) is not the current instance If a storcon instance is persisted in the database, then we know that it is the current leader. There's one exception: time between handling step-down request and the new leader updating the database. Let's treat the happy case first. The stepped down node does not produce any side effects, since all request handling happens on the leader. As for the edge case, we are guaranteed to always have a maximum of two running instances. Hence, if we are in the edge case scenario the leader persisted in the database is the stepped down instance that received the request. Condition (4) above covers this scenario. ## Summary of changes * Conversion utilities for reqwest <-> hyper. I'm not happy with these, but I don't see a better way. Open to suggestions. * Add request forwarding logic * Update each request handler. Again, not happy with this. If anyone knows a nice way to wrap the handlers, lmk. Me and Joonas tried :/ * Update each handler to maybe forward * Tweak tests to showcase new behaviour	2024-09-17 09:25:42 +01:00
Heikki Linnakangas	2db840d8b8	Move a few test functions related to auth tokens to separate file (#9018 ) For readability. neon_fixtures.py is huge.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	4295ff0f07	Mark a couple of test fixtures as session-scoped (#9018 ) pg_distrib_dir doesn't include the Postgres version and only depends on env variables which cannot change during a test run, so it can be marked as session-scoped. Similarly, the platform cannot change during a test run.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	c6f56b8462	Remove redundant get_dir_size() function (#9018 ) There was another copy of it in utils.py. The only difference is that the version in utils.py tolerates files that are concurrently removed. That seems fine for the few callers in neon_fixtures.py too.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	fec9321fc0	Use Path type in a few more places in neon_fixtures.py (#9018 ) This is in preparation of replacing neon_fixtures.get_dir_size with neon_fixtures.utils.get_dir_size() in next commit.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	3a52e356c1	Remove unused function (#9018 )	2024-09-17 06:53:18 +03:00
Tristan Partin	5e16c7bb0b	Generate pgbench data on the server for most tests This should generally be faster when running tests, especially those that run with higher scales. Ignoring test_lfc_resize since it seems like we are hitting a query timeout for some reason that I have yet to investigate. A little bit of improvement is better than none. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-16 23:37:36 +01:00
Heikki Linnakangas	2bbb4d3e1c	Remove misc unused code (#9014 )	2024-09-16 18:45:19 +00:00
Matthias van de Meent	c8bedca582	Fix PG17's extension modifications (#9010 ) This also reduces the GRANT statements to one per created _reset function	2024-09-16 17:06:31 +01:00
Tristan Partin	5876c441ab	Grant access to pg_show_replication_origin_status for neon_superuser Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-16 16:38:55 +01:00
Alexander Bayandin	b2c83db54d	CI(gather-rust-build-stats): set PQ_LIB_DIR to Postgres 17 (#9001 ) ## Problem `gather-rust-build-stats` extra CI job fails with ``` "PQ_LIB_DIR" doesn't exist in the configured path: "/__w/neon/neon/pg_install/v16/lib" ``` ## Summary of changes - Use the path to Postgres 17 for the `gather-rust-build-stats` job. The job uses Postgres built by `make walproposer-lib`	2024-09-16 12:44:26 +01:00
Matthias van de Meent	0a8c5e1214	Fix broken image for PG17 (#8998 ) Most extensions are not required to run Neon-based PostgreSQL, but the Neon extension is _quite_ critical, so let's make sure we include it. ## Problem Staging doesn't have working compute images for PG17 ## Summary of changes Disable some PG17 filters so that we get the critical components into the PG17 image	2024-09-13 15:10:52 +01:00
Matthias van de Meent	78938d1b59	[compute/postgres] feature: PostgreSQL 17 (#8573 ) This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04 `07b828e9d4` NOTICE: The data produced by the included version of the PostgreSQL fork may not be compatible with the future full release of PostgreSQL 17 due to expected or unexpected future changes in magic numbers and internals. DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0 RELEASE! Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-09-12 23:18:41 +01:00
Stefan Radig	fcab61bdcd	Prototype implementation for private access poc (#8976 ) ## Problem For the Private Access POC we want users to be able to disable access from the public proxy. To limit the number of changes this can be done by configuring an IP allowlist [ "255.255.255.255" ]. For the Private Access proxy a new commandline flag allows disabling the IP allowlist completely. See https://www.notion.so/neondatabase/Neon-Private-Access-POC-Proposal-8f707754e1ab4190ad5709da7832f020?d=887495c15e884aa4973f973a8a0a582a#7ac6ec249b524a74adbeddc4b84b8f5f for details about the POC. ## Summary of changes - Adding the commandline flag is_private_access_proxy=true will disable the IP allowlist	2024-09-12 15:55:12 +01:00
Tristan Partin	9e3ead3689	Collect the last of on-demand WAL download in CreateReplicationSlot reverts Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-12 11:31:38 +01:00
Heikki Linnakangas	8dc069037b	Remove NeonEnvBuilder.start() function It feels wrong to me to start() from the builder object. Surely the thing you start is the environment itself, not its configuration.	2024-09-12 01:28:56 +03:00
Heikki Linnakangas	0a363c3dce	Add --timeline-id option to "neon_local timeline branch" command Makes it consistent with the "timeline create" and "timeline import" commands, which allowed you to pass the timeline id as argument. This also makes it unnecessary to parse the timeline ID from the output in the python function that calls it.	2024-09-12 01:28:56 +03:00
Heikki Linnakangas	aeca15008c	Remove obsolete and misleading comment The tenant ID was not actually generated here but in NeonEnvBuilder. And the "neon_local init" command hasn't been able to generate the initial tenant since `8712e1899e` anyway.	2024-09-12 01:28:56 +03:00
Heikki Linnakangas	43846b72fa	Remove unused "neon_local init --pg-version" arg It has been unused since commit `8712e1899e`, when it stopped creating the initial timeline.	2024-09-12 01:28:56 +03:00
John Spray	cb060548fb	libs: tweak PageserverUtilization::is_overloaded (#8946 ) ## Problem Having run in production for a while, we see that nodes are generally safely oversubscribed by about a factor of 2. ## Summary of changes Tweak the is_overloaded method to check for utilization over 200% rather than over 100%	2024-09-11 18:45:34 +01:00
Folke Behrens	bae793ffcd	proxy: Handle all let underscore instances (#8898 ) * Most can be simply replaced * One instance renamed to _rtchk (return-type check)	2024-09-10 15:36:08 +02:00
John Spray	26b5fcdc50	reinstate write-path key check (#8973 ) ## Problem In https://github.com/neondatabase/neon/pull/8621, validation of keys during ingest was removed because the places where we actually store keys are now past the point where we have already converted them to CompactKey (i128) representation. ## Summary of changes Reinstate validation at an earlier stage in ingest. This doesn't cover literally every place we write a key, but it covers most cases where we're trusting postgres to give us a valid key (i.e. one that doesn't try and use a custom spacenode).	2024-09-10 12:54:25 +01:00
Arpad Müller	97582178cb	Remove async_trait from the Handler trait (#8958 ) Newest attempt to remove `async_trait` from the Handler trait. Earlier attempts were in #7301 and #8296 .	2024-09-10 02:40:00 +02:00
Matthias van de Meent	842be0ba74	Specialize WalIngest on PostgreSQL version (#8904 ) The current code assumes that most of this functionality is version-independent, which is only true up to v16 - PostgreSQL 17 has a new field in CheckPoint that we need to keep track of. This basically removes the file-level dependency on v14, and replaces it with switches that load the correct version dependencies where required.	2024-09-09 23:01:52 +01:00
Heikki Linnakangas	982b376ea2	Update parquet crate to a released version (#8961 ) PR #7782 set the dependency in Cargo.toml to 'master', and locked the version to commit that contained a specific fix, because we needed the fix before it was included in a versioned release. The fix was later included in parquet crate version 52.0.0, so we can now switch back to using a released version. The latest release is 53.0.0, switch straight to that. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-09-10 00:04:00 +03:00
Alex Chi Z.	e158df4e86	feat(pageserver): split delta writer automatically determines key range (#8850 ) close https://github.com/neondatabase/neon/issues/8838 ## Summary of changes This patch modifies the split delta layer writer to avoid taking start_key and end_key when creating/finishing the layer writer. The start_key for the delta layers will be the first key provided to the layer writer, and the end_key would be the `last_key.next()`. This simplifies the delta layer writer API. On that, the layer key hack is removed. Image layers now use the full key range, and delta layers use the first/last key provided by the user. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-09 22:03:27 +01:00
Heikki Linnakangas	723c0971e8	Don't create 'empty' branch in neon_simple_env (#8965 ) Now that we've given up hope on sharing the neon_simple_env between tests, there's no reason to not use the 'main' branch directly.	2024-09-09 12:38:34 +03:00
Heikki Linnakangas	c8f67eed8f	Remove TEST_SHARED_FIXTURES (#8965 ) I wish it worked, but it's been broken for a long time, so let's admit defeat and remove it. The idea of sharing the same pageserver and safekeeper environment between tests is still sound, and it could save a lot of time in our CI. We should perhaps put some time into doing that, but we're better off starting from scratch than trying to make TEST_SHARED_FIXTURES work in its current form.	2024-09-09 12:38:34 +03:00

6079 Commits