
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-14 17:02:56 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	f3f7d0d3f1	zero-copy jwt claim validation	2024-09-30 12:47:07 +01:00
Conrad Ludgate	0724df1d3f	stash	2024-09-29 20:29:26 +01:00
Conrad Ludgate	4d47049b00	split up jwt tests	2024-09-27 16:31:49 +01:00
Conrad Ludgate	5687384a8e	remove deref impl	2024-09-27 11:43:34 +01:00
Conrad Ludgate	249f5ea17d	cleaner local-proxy conn error code	2024-09-27 11:43:34 +01:00
Conrad Ludgate	6abcc1f298	add explicit panic reason	2024-09-27 11:43:34 +01:00
Conrad Ludgate	3e97cf0d6e	refine missing credentials error	2024-09-27 11:43:34 +01:00
Conrad Ludgate	054ef4988b	update certification comment	2024-09-27 11:43:34 +01:00
Conrad Ludgate	5202cd75b5	only forward expected headers	2024-09-27 11:43:34 +01:00
Conrad Ludgate	f475dac0e6	keepalive while idle	2024-09-27 11:43:34 +01:00
Conrad Ludgate	a4100373e5	fix common name parsing	2024-09-27 11:43:34 +01:00
Conrad Ludgate	040d8cf4f6	fix common name parsing	2024-09-27 11:43:34 +01:00
Conrad Ludgate	75bfd57e01	add authbroker cli flag and fix http2 ka	2024-09-27 11:43:34 +01:00
Conrad Ludgate	4bc2686dee	small tweaks	2024-09-27 11:43:34 +01:00
Conrad Ludgate	8e7d2aab76	put it all together	2024-09-27 11:43:34 +01:00
Conrad Ludgate	2703abccc7	start on http2 local proxy connection pool	2024-09-27 11:43:34 +01:00
Conrad Ludgate	76515cdae3	split out auth info from conn info, return the jwt as the auth keys	2024-09-27 11:43:34 +01:00
Conrad Ludgate	08c7f933a3	add support for console backend jwt	2024-09-27 11:43:34 +01:00
Conrad Ludgate	4ad3aa7c96	update doc comment for get_with_url	2024-09-27 10:24:50 +01:00
Conrad Ludgate	9c59e3b4b9	proxy: add jwks endpoint to control plane and mock providers	2024-09-27 10:24:43 +01:00
Arseny Sher	40f7930a7d	safekeeper: skip syncfs on start if --no-sync is specified. (#9166 ) https://neondb.slack.com/archives/C059ZC138NR/p1727350911890989?thread_ts=1727350211.370869&cid=C059ZC138NR	2024-09-27 09:59:38 +03:00
Conrad Ludgate	ec07a1ecc9	proxy: make local-proxy config by signal with PID, refine JWKS apis with role caching (#9164 )	2024-09-26 19:01:48 +01:00
Arseny Sher	c4cdfe66ac	Fix flakiness of test_timeline_copy. The timeline might not be initialized yet when timeline_start_lsn is queried. Spotted by CI.	2024-09-26 19:01:45 +03:00
Alex Chi Z.	42e19e952f	fix(pageserver): categorize client error in basebackup metrics (#9110 ) We separated client error from basebackup error log lines in https://github.com/neondatabase/neon/pull/7523, but we didn't do anything for the metrics. In this patch, we fixed it. ref https://github.com/neondatabase/neon/issues/8970 ## Summary of changes We use the same criteria as in `log_query_error` producing an info line (instead of error) for the metrics. We added a `client_error` category for the basebackup query time metrics. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-26 11:38:19 -04:00
John Spray	3d255d601b	pageserver: rename control plane client & chunk validation requests (#8997 ) ## Problem - In https://github.com/neondatabase/neon/pull/8784, the validate controller API is modified to check generations directly in the database. It batches tenants into separate queries to avoid generating a huge statement, but - While updating this, I realized that "control_plane_client" is a kind of confusing name for the client code now that it primarily talks to the storage controller (the case of talking to the control plane will go away in a few months). ## Summary of changes - Big rename to "ControllerUpcallClient" -- this reflects the storage controller's api naming, where the paths used by the pageserver are in `/upcall/` - When sending validate requests, break them up into chunks so that we avoid possible edge cases of generating any HTTP requests that require database I/O across many thousands of tenants. This PR mixes a functional change with a refactor, but the commits are cleanly separated -- only the last commit is a functional change. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-09-26 16:06:34 +01:00
Arthur Petukhovsky	80e974d05b	fix(compute_ctl): race condition in configurator (#9162 ) There was a tricky race condition in compute_ctl that sometimes made the configurator skip updates. It caused a deadlock because: - the control plane cannot configure the compute, because it's in the ConfigurationPending state - compute_ctl doesn't do any reconfiguration because `configurator_main_loop` missed the notification for it Full sequence that reproduces the issue: 1. `start_compute` finishes its work and changes the status `self.set_status(ComputeStatus::Running);` 2. the configurator received the update about the `Running` state and dropped the mutex lock in that iteration 3. a `/configure` request was triggered at the same time as step 1, and got the mutex lock 4. the same `/configure` request set the spec, updated the state to `ConfigurationPending`, and also sent a notification 5. the next iteration of the configurator got the mutex lock, but missed the notification There are more details in this slack thread: https://neondb.slack.com/archives/C03438W3FLZ/p1727281028478689?thread_ts=1727261220.483799&cid=C03438W3FLZ --------- Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2024-09-26 15:42:17 +01:00
Alexander Bayandin	7fdf1ab5b6	CI: run compatibility tests on Postgres 17 (#9145 ) ## Problem The latest storage release has generated artifacts for Postgres 17, so we can enable compatibility tests for this version ## Summary of changes - Unskip `test_backward_compatibility` / `test_forward_compatibility` on Postgres 17	2024-09-26 15:17:01 +01:00
Arpad Müller	7bae78186b	Forbid creation of child timelines of archived timeline (#9122 ) We don't want to allow any new child timelines of archived timelines. If you want any new child timelines, you should first un-archive the timeline. Part of #8088	2024-09-26 02:05:25 +02:00
Heikki Linnakangas	7e560dd00e	chore: Silence clippy warning with nightly (#9157 ) The warning: warning: first doc comment paragraph is too long --> pageserver/src/tenant/checks.rs:7:1 | 7 | / /// Checks whether a layer map is valid (i.e., is a valid result of the current compaction algorithm if no... 8 | | /// The function checks if we can split the LSN range of a delta layer only at the LSNs of the delta layer... 9 | | /// 10 | | /// ```plain | |_ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph = note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default help: add an empty line | 7 ~ /// Checks whether a layer map is valid (i.e., is a valid result of the current compaction algorithm if nothing goes wrong). 8 + /// | Fix by applying the suggestion.	2024-09-25 21:29:16 +00:00
Tristan Partin	684e924211	Fix compute_logical_snapshot_files for v14 The function, pg_ls_logicalsnapdir(), was added in version 15. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-25 16:25:17 -05:00
Tristan Partin	8ace9ea25f	Format long single DATA line in pgxn/Makefile This should be a little more readable. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-25 16:25:17 -05:00
Alex Chi Z.	6a4f49b08b	fix(pageserver): passthrough partition cancel error (#9154 ) close https://github.com/neondatabase/neon/issues/9142 ## Summary of changes Pass through `CollectKeyspaceError::Cancelled` as `CompactionError::ShuttingDown` Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-25 21:35:33 +01:00
Alexander Bayandin	c6e89445e2	CI(promote-images): fix prod ECR auth (#9146 ) A cherry-pick from the previous release (#9131) ## Problem Login to prod ECR doesn't work anymore: ``` Retrieving registries data through * SDK... * ECR detected with eu-central-1 region Error: The security token included in the request is invalid. ``` ## Summary of changes - Fix login to prod ECR by using `aws-actions/configure-aws-credentials`	2024-09-25 18:22:39 +01:00
Vlad Lazar	04f32b9526	tests: remove patching up of az id column (#8968 ) This was required since the compat tests used a snapshot generated from a version of neon local which didn't contain the availability_zone_id column.	2024-09-25 17:22:32 +01:00
Heikki Linnakangas	6f2333f52b	CI: Leave out unnecessary build files from binary artifact (#9135 ) The pg_install/build directory contains .o files and other intermediate results from the build, which are not needed in the final tarball, except for src/test/regress/regress.so and a few other .so files in that directory; keep those. This reduces the size of the neon-Linux-X64-release-artifact.tar.zst artifact from about 1.5 GB to 700 MB. (I attempted this a long time ago already, by moving the build/ directory out of pg_install altogether, see PR #2127. But I never got around to finishing that work.) Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-09-25 19:07:20 +03:00
Yuchen Liang	d447f49bc3	fix(pageserver): handle lsn lease requests for unnormalized lsns (#9137 ) Fixes https://github.com/neondatabase/neon/issues/9098. ## Problem See https://github.com/neondatabase/neon/issues/9098#issuecomment-2372484969. ### Related A similar problem happened with branch creation, which was discussed [here](https://github.com/neondatabase/neon/pull/2143#issuecomment-1199969052) and fixed by https://github.com/neondatabase/neon/pull/2529. ## Summary of changes - Normalize the LSN on the pageserver side upon an LSN lease request, and store the normalized LSN. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-25 14:57:38 +00:00
Vlad Lazar	c5972389aa	storcon: include timeline ID in LSN waiting logs (#9141 ) ## Problem Hard to tell which timeline is holding the migration. ## Summary of Changes Add timeline id to log.	2024-09-25 15:54:41 +01:00
Matthias van de Meent	c4f5736d5a	Build images for PG17 using Debian 12 "Bookworm" (#9132 ) This increases the support window of the OS used for PG17 by 2 years compared to the previous usage of Debian 11 "Bullseye".	2024-09-25 17:50:05 +03:00
Alexey Kondratov	518f598e2d	docs(rfc): Independent compute release flow (#8881 ) Related to https://github.com/neondatabase/cloud/issues/11698	2024-09-25 16:24:09 +02:00
John Spray	4b711caf5e	storage controller: make proxying of GETs to pageservers more robust (#9065 ) ## Problem These commits are split off from https://github.com/neondatabase/neon/pull/8971/commits where I was fixing this to make a better scale test pass -- Vlad also independently recognized these issues with cloudbench in https://github.com/neondatabase/neon/issues/9062. 1. The storage controller proxies GET requests to pageservers based on their intent, not the ground truth of where they're really attached. 2. Proxied requests can race with scheduling to tenants, resulting in 404 responses if the request hits the wrong pageserver. Closes: https://github.com/neondatabase/neon/issues/9062 ## Summary of changes 1. If a shard has a running reconciler, then use the database generation_pageserver to decide who to proxy the request to 2. If such a request gets a 404 response and its scheduled node has changed since the request was dispatched.	2024-09-25 13:56:39 +00:00
Vlad Lazar	2cf47b1477	storcon: do az aware scheduling (#9083 ) ## Problem Storage controller didn't previously consider AZ locality between compute and pageservers when scheduling nodes. Control plane has this feature, and, since we are migrating tenants away from it, we need feature parity to avoid perf degradations. ## Summary of changes The change itself is fairly simple: 1. Thread az info into the scheduler 2. Add an extra member to the scheduling scores Step (2) deserves some more discussion. Let's break it down by the shard type being scheduled: Attached Shards We wish for attached shards of a tenant to end up in the preferred AZ of the tenant since that is where the compute is likely to be. The AZ member for `NodeAttachmentSchedulingScore` has been placed below the affinity score (so it's got the second biggest weight for picking the node). The rationale for going below the affinity score is to avoid having all shards of a single tenant placed on the same node in 2-node regions, since that would mean that one tenant can drive the general workload of an entire pageserver. I'm not 100% sure this is the right decision, so open to discussing hoisting the AZ up to first place. Secondary Shards We wish for secondary shards of a tenant to be scheduled in a different AZ from the preferred one for HA purposes. The AZ member for `NodeSecondarySchedulingScore` has been placed first, so nodes in different AZs from the preferred one will always be considered first. On small clusters, this can mean that all the secondaries of a tenant are scheduled to the same pageserver, but secondaries don't use up as many resources as the attached location, so IMO the argument made for attached shards doesn't hold. Related: https://github.com/neondatabase/neon/issues/8848	2024-09-25 14:31:04 +01:00
Folke Behrens	7dcfcccf7c	Re-export git-version from utils and remove as direct dep (#9138 )	2024-09-25 14:38:35 +02:00
Vlad Lazar	a26cc29d92	storcon: add tags to scheduler logs (#9127 ) We log something at info level each time we schedule a shard to a non-secondary location. Might as well have context for it.	2024-09-25 10:16:06 +01:00
Alex Chi Z.	5f2f31e879	fix(test): storage scrubber should only log to stdout with info (#9067 ) As @koivunej mentioned in the storage channel, for regression tests we don't need to create a log file for the scrubber, and we should reduce noisy logs. ## Summary of changes * Disable log file creation for storage scrubber * Only log at info level --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-24 22:33:03 +00:00
Damian972	938b163b42	chore(docker-compose): fix typo in readme (#9133 ) Typo in the readme inside the docker-compose folder ## Summary of changes - Update the readme	2024-09-24 18:05:23 -04:00
Heikki Linnakangas	5cbf5b45ae	Remove TenantState::Loading (#9118 ) The last real use was removed in commit `de90bf4663`. It was still used in a few unit tests, but they can use Attaching too.	2024-09-24 20:58:54 +00:00
Heikki Linnakangas	af5c54ed14	test: Make test_lfc_resize more robust (#9117 ) 1. Increase statement_timeout. It defaults to 120 s, which is not quite enough on slow or busy systems with a debug build. On my laptop, the index creation takes about 100 s. On the buildfarm, we've seen failures, e.g.: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9084/10997888708/index.html#suites/821f97908a487f1d7d3a2a4dd1571e99/db1834bddfe8c5b9/ 2. Keep twiddling the LFC size through the whole test. Before, we would do it for the first 10 seconds, but that only covers a small part of the pgbench initialization phase. Change the loop so that the pgbench run time determines how long the test runs, and we keep changing the LFC for the whole time. In passing, also fix a bogus test description, copy-pasted from a completely unrelated test.	2024-09-24 23:38:16 +03:00
Alexander Bayandin	523cf71721	Fix compiler warnings on macOS (#9128 ) ## Problem Compilation of neon extension on macOS produces a warning ``` pgxn/neon/neon_perf_counters.c:50:1: error: non-void function does not return a value [-Werror,-Wreturn-type] ``` ## Summary of changes - Change the return type of `NeonPerfCountersShmemInit` to void	2024-09-24 18:11:31 +00:00
Arpad Müller	c47f355ec1	Catch Cancelled and don't print a warning for it (#9121 ) In the `imitate_synthetic_size_calculation_worker` function, we might obtain the `Cancelled` error variant instead of hitting the cancellation token based path. Therefore, catch `Cancelled` and handle it analogously to the cancellation case. Fixes #8886.	2024-09-24 17:28:56 +00:00
Yuchen Liang	4f67b0225b	pageserver: handle decompression outside vectored `read_blobs` (#8942 ) Part of #8130. ## Problem Currently, decompression is performed within the `read_blobs` implementation and the decompressed blob will be appended to the end of the `BytesMut` buffer. We will lose this flexibility of extending the buffer when we switch to using our own dio-aligned buffer (WIP in https://github.com/neondatabase/neon/pull/8730). To facilitate the adoption of the aligned buffer, we need to refactor the code to perform decompression outside `read_blobs`. ## Summary of changes - `VectoredBlobReader::read_blobs` will return `VectoredBlob` without performing decompression and appending the decompressed blob. It becomes the caller's responsibility to decompress the buffer. - Added a new `BufView` type that functions as `Cow<Bytes, &[u8]>`. - Perform decompression within `VectoredBlob::read` so that people don't have to think explicitly about compression when using the reader interface. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-24 16:41:38 +00:00
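
Commit 3d255d601b (#8997) above breaks validate requests into chunks so that no single HTTP request requires database I/O across many thousands of tenants. A minimal sketch of that batching using the standard library's `slice::chunks`, assuming tenants are identified by plain `u64` IDs; `MAX_TENANTS_PER_REQUEST` and `validate_batches` are hypothetical names, and the real request types and chunk size are not shown in this log.

```rust
/// Hypothetical upper bound on tenants per validate request.
const MAX_TENANTS_PER_REQUEST: usize = 1000;

/// Split tenant IDs into batches of at most `max` entries each, preserving
/// order. `slice::chunks` panics on a zero chunk size, so reject it up front.
fn validate_batches(tenants: &[u64], max: usize) -> Vec<Vec<u64>> {
    assert!(max > 0, "batch size must be non-zero");
    tenants.chunks(max).map(|batch| batch.to_vec()).collect()
}
```

Each batch would then be sent as its own request; the last batch simply holds the remainder.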
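
The compute_ctl deadlock described in commit 80e974d05b (#9162) above is a lost-wakeup race: a notification sent while the configurator is not waiting is simply dropped. A minimal sketch of the usual remedy, assuming a plain `std::sync::Mutex`/`Condvar` pair; `Status`, `Shared`, `request_configuration` and `configurator_iteration` are hypothetical stand-ins, not compute_ctl's real types, and this is not necessarily how #9162 fixed it. The waiter re-checks the state under the lock with `Condvar::wait_while`, so it never sleeps on a notification that has already fired.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical, simplified compute status (not compute_ctl's real enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Running,
    ConfigurationPending,
    Configured,
}

struct Shared {
    state: Mutex<Status>,
    changed: Condvar,
}

/// Plays the role of a `/configure` handler: update the state, then notify.
fn request_configuration(shared: &Shared) {
    *shared.state.lock().unwrap() = Status::ConfigurationPending;
    shared.changed.notify_all();
}

/// One configurator iteration. `wait_while` evaluates the predicate before
/// sleeping, so if the state is already `ConfigurationPending` (because the
/// notification was sent earlier) it returns immediately instead of blocking.
fn configurator_iteration(shared: &Shared) {
    let guard = shared.state.lock().unwrap();
    let mut status = shared
        .changed
        .wait_while(guard, |s| *s != Status::ConfigurationPending)
        .unwrap();
    // ... apply the new spec here ...
    *status = Status::Configured;
}
```

The key design point is that the condition lives in shared state guarded by the mutex, and the notification is only a hint to re-check it.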
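
Commit 6a4f49b08b (#9154) above maps a cancelled keyspace collection onto the shutdown error rather than a generic failure. One common way to express such a mapping in Rust is a `From` impl, so that `?` performs the conversion. The enums below are hypothetical, simplified stand-ins that reuse the variant names from the commit message, and `collect_keyspace`/`repartition` are illustrative helpers; the actual pageserver code may be structured differently.

```rust
// Hypothetical, simplified error types; the real pageserver enums carry more variants.
#[derive(Debug)]
enum CollectKeyspaceError {
    Cancelled,
    Other(String),
}

#[derive(Debug, PartialEq)]
enum CompactionError {
    ShuttingDown,
    Other(String),
}

impl From<CollectKeyspaceError> for CompactionError {
    fn from(err: CollectKeyspaceError) -> Self {
        match err {
            // Cancellation during shutdown is expected, not a compaction failure.
            CollectKeyspaceError::Cancelled => CompactionError::ShuttingDown,
            CollectKeyspaceError::Other(msg) => CompactionError::Other(msg),
        }
    }
}

/// Stand-in for keyspace collection; `cancelled` simulates a fired cancel token.
fn collect_keyspace(cancelled: bool) -> Result<Vec<u32>, CollectKeyspaceError> {
    if cancelled {
        Err(CollectKeyspaceError::Cancelled)
    } else {
        Ok(vec![1, 2, 3])
    }
}

/// `?` converts the error through the `From` impl above.
fn repartition(cancelled: bool) -> Result<usize, CompactionError> {
    let keys = collect_keyspace(cancelled)?;
    Ok(keys.len())
}
```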
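
The score ordering described in commit 2cf47b1477 (#9083) above (AZ second for attached shards, AZ first for secondaries) maps naturally onto Rust's derived `Ord`, which compares struct fields lexicographically in declaration order. A simplified sketch, assuming lower scores win and the node with the minimum score is chosen. `AttachedScore`, `SecondaryScore` and `pick_node` are hypothetical, not the storage controller's actual `NodeAttachmentSchedulingScore` / `NodeSecondarySchedulingScore` definitions.

```rust
/// Attached shards: affinity is compared first, AZ placement second.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct AttachedScore {
    /// Shards of the same tenant already on this node (lower is better).
    affinity: u32,
    /// `false` (node is in the tenant's preferred AZ) sorts before `true`.
    outside_preferred_az: bool,
    /// Tie-breaker: node utilization (lower is better).
    utilization: u32,
}

/// Secondary shards: AZ placement is compared first, for HA.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SecondaryScore {
    /// `false` (node is in a different AZ) sorts before `true`.
    in_preferred_az: bool,
    affinity: u32,
    utilization: u32,
}

/// Return the ID of the node with the lowest score, if any.
fn pick_node<S: Ord + Copy>(candidates: &[(u64, S)]) -> Option<u64> {
    candidates
        .iter()
        .min_by_key(|(_, score)| *score)
        .map(|(node_id, _)| *node_id)
}
```

Moving the AZ field to the top of `AttachedScore` would be the "hoisting" option mentioned in the commit message: a one-line reordering with derived `Ord`.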
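
Commit 4f67b0225b (#8942) above introduces a `BufView` that "functions as `Cow<Bytes, &[u8]>`" and moves decompression out of `read_blobs`. A rough sketch of that idea, assuming `Vec<u8>` in place of `bytes::Bytes` and a toy run-length format standing in for the real compression codec. The reader hands back a view that either borrows the raw bytes or owns a freshly decompressed buffer, and callers read both through `Deref<Target = [u8]>`. `BufView` here, `decompress_rle` and `read_blob` are hypothetical simplifications, not the pageserver API.

```rust
use std::ops::Deref;

/// Either borrows bytes from the read buffer or owns decompressed bytes.
/// (`Vec<u8>` stands in for `bytes::Bytes` to stay dependency-free.)
enum BufView<'a> {
    Slice(&'a [u8]),
    Owned(Vec<u8>),
}

impl Deref for BufView<'_> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        match self {
            BufView::Slice(s) => *s,
            BufView::Owned(v) => v.as_slice(),
        }
    }
}

/// Toy codec: input is (count, byte) pairs, e.g. [3, b'a'] -> "aaa".
fn decompress_rle(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Decompression happens here, at read time, outside the vectored reader:
/// uncompressed blobs are returned without copying, compressed ones are decoded.
fn read_blob(raw: &[u8], compressed: bool) -> BufView<'_> {
    if compressed {
        BufView::Owned(decompress_rle(raw))
    } else {
        BufView::Slice(raw)
    }
}
```

Because only the compressed path allocates, the reader's own buffer (for example, an aligned one) never needs to grow to hold decompressed output.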


6176 Commits