## Problem
Historically, the pageserver used an "uninit mark" file on disk for two
purposes:
- Track which timeline dirs are incomplete, so they can be handled on
restart
- Avoid trying to create the same timeline twice at the same time
The first purpose is now defunct: we use remote storage as the source of
truth and clean up any trash timeline dirs on startup. Using the file to
mutually exclude creation operations is error-prone compared with doing
it in memory, and the existing checks happened some way into the
creation operation, where they could surface errors as 500s
(`anyhow::Error`s) rather than something clean.
## Summary of changes
- Creations are now mutually excluded in memory (using
`Tenant::timelines_creating`), rather than relying on a file on disk for
coordination.
- Acquiring unique access to the timeline ID now happens earlier in the
request.
- Creating a timeline that already exists now returns 201: this
simplifies retry handling for clients.
- 409 is still returned if a timeline with the same ID is still being
created: if this happens, it is probably because the client timed out an
earlier request and retried.
- Colliding timeline creation requests should no longer return 500
errors.
This paves the way to entirely removing uninit markers in a subsequent
change.
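As a rough sketch of the in-memory exclusion, assuming `Tenant::timelines_creating` is something like a mutex-guarded set of IDs (the `TimelineId` alias and guard type below are illustrative, not the actual pageserver types):
```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Hypothetical stand-in for the real TimelineId type.
type TimelineId = u128;

struct Tenant {
    // IDs of timelines whose creation is currently in flight.
    timelines_creating: Mutex<HashSet<TimelineId>>,
}

/// Guard that releases the in-memory claim on drop, so a failed or
/// cancelled creation never leaves the ID permanently blocked.
struct CreateGuard<'a> {
    tenant: &'a Tenant,
    timeline_id: TimelineId,
}

impl Tenant {
    /// Claim a timeline ID for creation, or fail with a conflict that the
    /// HTTP layer can map to a 409 instead of a 500.
    fn start_creating(&self, timeline_id: TimelineId) -> Result<CreateGuard<'_>, &'static str> {
        let mut creating = self.timelines_creating.lock().unwrap();
        if !creating.insert(timeline_id) {
            return Err("creation already in progress (409)");
        }
        Ok(CreateGuard { tenant: self, timeline_id })
    }
}

impl Drop for CreateGuard<'_> {
    fn drop(&mut self) {
        self.tenant
            .timelines_creating
            .lock()
            .unwrap()
            .remove(&self.timeline_id);
    }
}
```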
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771
This PR moves the control plane's spread-all-over-the-place client for
the pageserver management API into a separate module within the
pageserver crate.
It also switches to the async version of reqwest, which I think is
generally the right direction; I also need an async client API for the
benchmark epic.
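As a rough illustration of the direction (not the actual client module), an async reqwest call might look like the following; the `MgmtClient` type, the `TenantInfo` fields, and the endpoint path are assumptions for the example:
```rust
use reqwest::Client;
use serde::Deserialize;

// Illustrative response type; the real management API returns richer structs.
#[derive(Deserialize)]
struct TenantInfo {
    id: String,
}

/// Hypothetical shape of an async management-API client method.
struct MgmtClient {
    http: Client,
    base_url: String,
}

impl MgmtClient {
    async fn list_tenants(&self) -> reqwest::Result<Vec<TenantInfo>> {
        self.http
            .get(format!("{}/v1/tenant", self.base_url))
            .send()
            .await?
            .error_for_status()?
            .json()
            .await
    }
}
```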
## Problem
The websockets gauge for active db connections seems to be growing more
than the gauge for client connections over websockets, which does not
make sense.
## Summary of changes
Refactor how our counter-pair gauges are represented. I am not sure this
will fix the problem on its own, but it should make it harder to mess up
the counters. The API is also much nicer now and doesn't require
`scopeguard::defer` hacks.
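A minimal sketch of the counter-pair-with-guard idea, using plain atomics for illustration rather than the proxy's actual metric types:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// A pair of monotonic counters; the "current" gauge value is opened - closed.
/// (Illustrative only; the proxy uses its metrics library's counter types.)
#[derive(Default)]
struct CounterPair {
    opened: AtomicU64,
    closed: AtomicU64,
}

/// RAII guard: `opened` is bumped when the guard is created and `closed` when
/// it is dropped, so every exit path (early return, error, panic unwind) is
/// counted without a scopeguard::defer at each call site.
struct CounterPairGuard<'a> {
    pair: &'a CounterPair,
}

impl CounterPair {
    fn guard(&self) -> CounterPairGuard<'_> {
        self.opened.fetch_add(1, Ordering::Relaxed);
        CounterPairGuard { pair: self }
    }

    fn current(&self) -> u64 {
        self.opened
            .load(Ordering::Relaxed)
            .saturating_sub(self.closed.load(Ordering::Relaxed))
    }
}

impl Drop for CounterPairGuard<'_> {
    fn drop(&mut self) {
        self.pair.closed.fetch_add(1, Ordering::Relaxed);
    }
}
```
Because the decrement lives in `Drop`, the two counters cannot drift apart the way manually paired inc/dec calls can.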
Dependency (commits inline):
https://github.com/neondatabase/neon/pull/5842
## Problem
Secondary mode tenants need a manifest of what to download. Ultimately
this will be some kind of heat-scored set of layers, but as a robust
first step we will simply use the set of resident layers: secondary
tenant locations will aim to match the on-disk content of the attached
location.
## Summary of changes
- Add heatmap types representing the remote structure (a rough sketch
follows this list)
- Add hooks to Tenant/Timeline for generating these heatmaps
- Create a new `HeatmapUploader` type that is external to `Tenant`, and
responsible for walking the list of attached tenants and scheduling
heatmap uploads.
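A hedged sketch of heatmap types of that shape (the type and field names here are illustrative, not the actual pageserver definitions):
```rust
use serde::{Deserialize, Serialize};

// Per-timeline lists of resident layers that a secondary location should
// try to keep on disk. Field names are assumptions for the example.
#[derive(Serialize, Deserialize)]
struct HeatMapLayer {
    name: String,
    size_bytes: u64,
}

#[derive(Serialize, Deserialize)]
struct HeatMapTimeline {
    timeline_id: String,
    layers: Vec<HeatMapLayer>,
}

#[derive(Serialize, Deserialize)]
struct HeatMapTenant {
    timelines: Vec<HeatMapTimeline>,
}
```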
Notes to reviewers:
- Putting the logic for uploads (and later, secondary mode downloads)
outside of `Tenant` is an opinionated choice, motivated by:
  - Enabling smarter scheduling of operations in future, e.g. uploading
the stalest tenant first rather than having all tenants compete for a
fair semaphore on a first-come-first-served basis. Similarly for
downloads, we may wish to schedule the tenants with the hottest
un-downloaded layers first.
  - Enabling access to upload-related state without synchronization (it
belongs to `HeatmapUploader`, rather than being some `Mutex<>`'d part of
`Tenant`).
  - Avoiding further expansion of the `Tenant`/`Timeline` types, which
are already among the largest in the codebase.
- You might reasonably wonder how much of the uploader code could be a
generic job manager thing. Probably some of it, but let's defer pulling
that out until we have at least two users (perhaps secondary downloads
will be the second one) to highlight which bits are really generic.
Compromises:
- Later, instead of using digests of heatmaps to decide whether anything
changed, I would prefer to avoid walking the layers in tenants that
don't have changes: tracking that will be a bit invasive, as it needs
input from both remote_timeline_client and Layer.
## Problem
A single rate bucket is of limited usefulness.
## Summary of changes
Introduce a secondary bucket allowing an average of 200 requests per
second over 1 minute, and a tertiary bucket allowing an average of 100
requests per second over 10 minutes.
The limits are configured using a format like:
```sh
proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 100@10s --endpoint-rps-limit 50@1m
```
If the bucket limits are inconsistent, an error is returned on startup:
```
$ proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 10@10s
Error: invalid endpoint RPS limits. 10@10s allows fewer requests per bucket than 300@1s (100 vs 300)
```
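As a rough sketch of how such a consistency check could work (the `RateBucketInfo` type and `validate` function below are illustrative, not the proxy's actual code):
```rust
use std::time::Duration;

/// Illustrative representation of one "rps@interval" bucket.
#[derive(Clone, Copy)]
struct RateBucketInfo {
    rps: u64,
    interval: Duration,
}

impl RateBucketInfo {
    /// Total requests one bucket admits over its interval.
    fn max_requests(&self) -> u64 {
        self.rps * self.interval.as_secs()
    }
}

/// When buckets are sorted by interval, each longer bucket must admit at
/// least as many requests per bucket as the shorter ones; otherwise the
/// shorter bucket can never be the binding constraint and the configuration
/// is rejected, mirroring the error shown above.
fn validate(buckets: &mut [RateBucketInfo]) -> Result<(), String> {
    buckets.sort_by_key(|b| b.interval);
    for pair in buckets.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        if next.max_requests() < prev.max_requests() {
            return Err(format!(
                "invalid endpoint RPS limits. {}@{:?} allows fewer requests per bucket than {}@{:?} ({} vs {})",
                next.rps, next.interval, prev.rps, prev.interval,
                next.max_requests(), prev.max_requests()
            ));
        }
    }
    Ok(())
}
```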
This is needed to allow use of batch queries from browsers.
## Problem
SQL-over-HTTP batch queries fail from web browsers because the relevant
headers, `Neon-Batch-Isolation-Level` and `Neon-Batch-Read-Only`, are
not included in the server's OPTIONS response. I think we simply forgot
to add them when implementing the batch query feature.
## Summary of changes
Added `Neon-Batch-Isolation-Level` and `Neon-Batch-Read-Only` to the
OPTIONS response.
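For illustration only, a CORS preflight response that allows those headers might be built like this (a sketch using a hyper 0.14-style response builder, not the proxy's actual handler; the full header list is an assumption):
```rust
use hyper::{Body, Response};

/// Sketch of an OPTIONS (CORS preflight) response that lists the batch-query
/// headers, so browsers will permit them on the actual request.
fn options_response() -> Response<Body> {
    Response::builder()
        .status(200)
        .header("Access-Control-Allow-Origin", "*")
        .header(
            "Access-Control-Allow-Headers",
            "Authorization, Content-Type, \
             Neon-Batch-Isolation-Level, Neon-Batch-Read-Only",
        )
        .body(Body::empty())
        .expect("static response headers are valid")
}
```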
Changes I wanted to make in #6106 but decided to leave out to keep that
commit clean for inclusion in #6090. Finally, remove
`PageReconstructionError::NeedsDownload`.
## Problem
The test deletes a tenant and then recreates it with the same ID. The
recreation bumps the generation number, which could lead to stale
generation warnings in the logs.
## Summary of changes
Handle this more gracefully by re-creating the tenant in the same
generation it was previously attached in.
We could also update the tenant delete path to have the attachment
service drop tenant state on delete, but I like keeping the state
there: it makes debugging easier, and the only time it's a problem is
when a test re-uses a tenant ID after deletion.
## Problem
1. Using chrono just for durations is wasteful
2. The `Arc`/`Mutex` was not being utilised
3. Locking every shard in the dashmap on every GC pass could cause
latency spikes
4. Multiple buckets need to be supported
## Summary of changes
1. Use `Instant` instead of `NaiveTime`.
2. Remove the `Arc<Mutex<_>>` wrapper, making use of the fact that
dashmap's entry API returns mutable access.
3. Clear only a single random shard per GC pass, and update the GC
interval accordingly.
4. Check multiple buckets before allowing access.
When I benchmarked the check function across 10 million multithreaded
checks, it took 811ns on average.
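A hedged sketch of the resulting shape: `Instant`-based buckets stored directly as the dashmap value, so the entry API's mutable access replaces the old `Arc<Mutex<_>>`, and the check walks every configured bucket. The type names and the fixed-window algorithm below are illustrative; the real limiter differs in detail.
```rust
use std::time::{Duration, Instant};

use dashmap::DashMap;

/// One fixed-window bucket: `count` requests seen since `start`.
struct RateBucket {
    start: Instant,
    count: u64,
}

struct BucketConfig {
    interval: Duration,
    max_requests: u64,
}

struct EndpointRateLimiter {
    configs: Vec<BucketConfig>,
    // One Vec<RateBucket> per endpoint; dashmap's entry API hands back
    // mutable access to the value, so no Arc<Mutex<_>> wrapper is needed.
    map: DashMap<String, Vec<RateBucket>>,
}

impl EndpointRateLimiter {
    fn check(&self, endpoint: &str) -> bool {
        let now = Instant::now();
        let mut buckets = self
            .map
            .entry(endpoint.to_string())
            .or_insert_with(|| {
                self.configs
                    .iter()
                    .map(|_| RateBucket { start: now, count: 0 })
                    .collect()
            });

        // Every configured bucket must have room; otherwise reject without
        // consuming capacity from any of them.
        for (bucket, config) in buckets.iter_mut().zip(&self.configs) {
            if now.duration_since(bucket.start) >= config.interval {
                // Window expired: start a fresh one.
                bucket.start = now;
                bucket.count = 0;
            }
            if bucket.count >= config.max_requests {
                return false;
            }
        }
        for bucket in buckets.iter_mut() {
            bucket.count += 1;
        }
        true
    }
}
```
Holding the entry's mutable reference for the duration of the check keeps the per-endpoint state consistent without any additional locking.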