
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 20:50:40 +00:00

Author	SHA1	Message	Date
Yuchen Liang	20e6a0c8a2	use open_with_options_v2 (O_DIRECT) for ephemeral file Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-12 01:26:29 +00:00
Yuchen Liang	ce7cd36100	add IoBufAligned marker Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-12 01:17:52 +00:00
Yuchen Liang	b0d7fc7564	fix IoBufferMut::extend_from_slice Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 23:49:52 +00:00
Yuchen Liang	7b34e73c15	fix tests Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 22:35:40 +00:00
Yuchen Liang	e5bb85d407	fix clippy Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 21:33:33 +00:00
Yuchen Liang	e0848c28d9	make InMemory read aware of mutable & maybe_flushed Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-11 03:22:01 +00:00
Yuchen Liang	bdffc352e7	use background flush for write path; read path broken Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-10 20:01:52 +00:00
Yuchen Liang	45998046f3	make flush handle & task generic Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-10 19:07:45 +00:00
Yuchen Liang	26c8b50451	implement non-generic flush handle & bg task Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-09 18:41:41 +00:00
Yuchen Liang	f0efc908d7	use Arc around W: OwnedAsyncWriter Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-08 15:19:36 +00:00
Yuchen Liang	224cbb4025	change OwnedAsyncWriter trait to use write_all_at Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-08 15:09:33 +00:00
Yuchen Liang	dd1c45e896	eliminate size_tracking_writer Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-07 20:44:21 +00:00
Peter Bendel	9132d80aa3	add pgcopydb tool to build tools image (#9658 ) ## Problem build-tools image does not provide superuser, so additional packages can not be installed during GitHub benchmarking workflows but need to be added to the image ## Summary of changes install pgcopydb version 0.17-1 or higher into build-tools bookworm image ```bash docker run -it neondatabase/build-tools:<tag>-bookworm-arm64 /bin/bash ... nonroot@c23c6f4901ce:~$ LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version; 13:58:19.768 8 INFO Running pgcopydb version 0.17 from "/pgcopydb/bin/pgcopydb" pgcopydb version 0.17 compiled with PostgreSQL 16.4 (Debian 16.4-1.pgdg120+2) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit compatible with Postgres 11, 12, 13, 14, 15, and 16 ``` Example usage of that image in a workflow https://github.com/neondatabase/neon/actions/runs/11725718371/job/32662681172#step:7:14	2024-11-07 19:00:25 +01:00
Conrad Ludgate	82e3f0ecba	[proxy/authorize]: improve JWKS reliability (#9676 ) While setting up some tests, I noticed that we didn't support keycloak. They make use of encryption JWKs as well as signature ones. Our current jwks crate does not support parsing encryption keys which caused the entire jwk set to fail to parse. Switching to lazy parsing fixes this. Also while setting up tests, I couldn't use localhost jwks server as we require HTTPS and we were using webpki so it was impossible to add a custom CA. Enabling native roots addresses this possibility. I saw some of our current e2e tests against our custom JWKS in s3 were taking a while to fetch. I've added a timeout + retries to address this.	2024-11-07 16:24:38 +00:00
Arpad Müller	75aa19aa2d	Don't attach is_archived to debug output (#9679 ) We are in branches where we know its value already.	2024-11-07 16:13:50 +00:00
Alex Chi Z.	a8d9939ea9	fix(pageserver): reduce aux compaction threshold (#9647 ) ref https://github.com/neondatabase/neon/issues/9441 The metrics from LR publisher testing project: ~300KB aux key deltas per 256MB files. Therefore, I think we can do compaction more aggressively as these deltas are small and compaction can reduce layer download latency. We also have a read path perf fix https://github.com/neondatabase/neon/pull/9631 but I'd still combine the read path fix with the reduction of the compaction threshold. ## Summary of changes * reduce metadata compaction threshold * use num of L1 delta layers as an indicator for metadata compaction * dump more logs Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-07 10:38:15 -05:00
Erik Grinaker	f18aa04b90	safekeeper: use `set_len()` to zero out segments (#9665 ) ## Problem When we create a new segment, we zero it out in order to avoid changing the length and fsyncing metadata on every write. However, we zeroed it out by writing 8 KB zero-pages, and Tokio file writes have non-trivial overhead. ## Summary of changes Zero out the segment using [`File::set_len()`](https://docs.rs/tokio/latest/i686-unknown-linux-gnu/tokio/fs/struct.File.html#method.set_len) instead. This will typically (depending on the filesystem) just write a sparse file and omit the 16 MB of data entirely. This improves WAL append throughput for large messages by over 400% with fsync disabled, and 100% with fsync enabled.	2024-11-07 15:09:57 +00:00
Erik Grinaker	01265b7bc6	safekeeper: add basic WAL ingestion benchmarks (#9531 ) ## Problem We don't have any benchmarks for Safekeeper WAL ingestion. ## Summary of changes Add some basic benchmarks for WAL ingestion, specifically for `SafeKeeper::process_msg()` (single append) and `WalAcceptor` (pipelined batch ingestion). Also add some baseline file write benchmarks.	2024-11-07 13:24:03 +00:00
Arseny Sher	f54f0e8e2d	Fix direct reading from WAL buffers. (#9639 ) Fix direct reading from WAL buffers. Pointer wasn't advanced which resulted in sending corrupted WAL if part of read used WAL buffers and part read from the file. Also move it to neon_walreader so that e.g. replication could also make use of it. ref https://github.com/neondatabase/cloud/issues/19567	2024-11-07 11:29:52 +00:00
Erik Grinaker	d6aa26a533	postgres_ffi: make `WalGenerator` generic over record generator (#9614 ) ## Problem Benchmarks need more control over the WAL generated by `WalGenerator`. In particular, they need to vary the size of logical messages. ## Summary of changes * Make `WalGenerator` generic over `RecordGenerator`, which constructs WAL records. * Add `LogicalMessageGenerator` which emits logical messages, with a configurable payload. * Minor tweaks and code reorganization. There are no changes to the core logic or emitted WAL.	2024-11-07 10:38:39 +00:00
Cheng Chen	e1d0b73824	chore(compute): Bump pg_mooncake to the latest version	2024-11-06 22:41:18 -06:00
Arpad Müller	011c0a175f	Support copying layers in detach_ancestor from before shard splits (#9669 ) We need to use the shard associated with the layer file, not the shard associated with our current tenant shard ID. Due to shard splits, the shard IDs can refer to older files. close https://github.com/neondatabase/neon/issues/9667	2024-11-07 01:53:58 +01:00
Alex Chi Z.	2a95a51a0d	refactor(pageserver): better pageservice command parsing (#9597 ) close https://github.com/neondatabase/neon/issues/9460 ## Summary of changes A full rewrite of pagestream cmdline parsing to make it more robust and readable. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-11-06 20:41:01 +00:00
Yuchen Liang	11fc1a4c12	fix(test): use layer map dump in `test_readonly_node_gc` to validate layers protected by leases (#9551 ) Fixes #9518. ## Problem After removing the assertion `layers_removed == 0` in #9506, we could miss breakage if we solely rely on the successful execution of the `SELECT` query to check if lease is properly protecting layers. Details listed in #9518. Also, in integration tests, we sometimes run into the race condition where a getpage request comes before the lease gets renewed (item 2 of #8817), even if compute_ctl sends a lease renewal as soon as it sees a `/configure` API call that updates the `pageserver_connstr`. In this case, we would observe a getpage request error stating that we `tried to request a page version that was garbage collected` (as seen in [Allure Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8613/11550393107/index.html#suites/3ccffb1d100105b98aed3dc19b717917/d1a1ba47bc180493)). ## Summary of changes - Use layer map dump to verify that the lease protects what it claims: Record all historical layers that have `start_lsn <= lease_lsn` before and after running timeline gc. This is the same check as `ad79f42460/pageserver/src/tenant/timeline.rs (L5025-L5027)` The set recorded after GC should contain every layer in the set recorded before GC. - Wait until log contains another successful lease request before running the `SELECT` query after GC. We argued in #8817 that the bad request can only exist within a short period after migration/restart, and our test shows that as long as a lease renewal is done before the first getpage request sent after reconfiguration, we will not have a bad request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-06 20:18:21 +00:00
Tristan Partin	93123f2623	Rename compute_backpressure_throttling_ms to compute_backpressure_throttling_seconds This is in line with the Prometheus guidance[0]. We also haven't started using this metric, so renaming is essentially free. Link: https://prometheus.io/docs/practices/naming/ [0] Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-06 13:28:23 -06:00
Alex Chi Z.	1d3559d4bc	feat(pageserver): add fast path for sparse keyspace read (#9631 ) In https://github.com/neondatabase/neon/issues/9441, the tenant has a lot of aux keys spread in multiple aux files. The perf tool shows that a significant amount of time is spent on remove_overlapping_keys. For sparse keyspaces, we don't need to report missing key errors anyways, and it's very likely that we will need to read all layers intersecting with the key range. Therefore, this patch adds a new fast path for sparse keyspace reads that we do not track `unmapped_keyspace` in a fine-grained way. We only modify it when we find an image layer. In debug mode, it was ~5min to read the aux files for a dump of the tenant, and now it's only 8s, that's a 60x speedup. ## Summary of changes * Do not add sparse keys into `keys_done` so that remove_overlapping does nothing. * Allow `ValueReconstructSituation::Complete` to be updated again in `ValuesReconstructState::update_key` for sparse keyspaces. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-06 18:17:02 +00:00
Conrad Ludgate	73bdc9a2d0	[proxy]: minor changes to endpoint-cache handling (#9666 ) I think I meant to make these changes over 6 months ago. alas, better late than never. 1. should_reject doesn't eagerly intern the endpoint string 2. Rate limiter uses a std Mutex instead of a tokio Mutex. 3. Recently I introduced a `-local-proxy` endpoint suffix. I forgot to add this to normalize. 4. Random but a small cleanup making the ControlPlaneEvent deser directly to the interned strings.	2024-11-06 17:40:40 +00:00
John Spray	d182ff294c	storcon: respect tenant scheduling policy in drain/fill (#9657 ) ## Problem Pinning a tenant by setting Pause scheduling policy doesn't work because drain/fill code moves the tenant around during deploys. Closes: https://github.com/neondatabase/neon/issues/9612 ## Summary of changes - In drain, only move a tenant if it is in Active or Essential mode - In fill, only move a tenant if it is in Active mode. The asymmetry is a bit annoying, but it faithfully respects the purposes of the modes: Essential is meant to endeavor to keep the tenant available, which means it needs to be drained but doesn't need to be migrated during fills.	2024-11-06 15:14:43 +00:00
Vlad Lazar	4dfa0c221b	pageserver: ingest pre-serialized batches of values (#9579 ) ## Problem https://github.com/neondatabase/neon/pull/9524 split the decoding and interpretation step from ingestion. The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`. Before this patch set that struct contained a list of `Value` instances. We wish to lift the decoding and interpretation step to the safekeeper, but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values. ## Summary of changes Main goal here is to make `InterpretedWalRecord` hold a raw buffer which contains pre-serialized Values. For this we do: 1. Add a `SerializedValueBatch` type. This is `inmemory_layer::SerializedBatch` with some extra functionality for extension, observing values for shard 0 and tests. 2. Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch` 3. Make `DatadirModification` maintain a `SerializedValueBatch`. ### `DatadirModification` changes `DatadirModification` now maintains a `SerializedValueBatch` and extends it as new WAL records come in (to avoid flushing to disk on every record). In turn, this cascaded into a number of modifications to `DatadirModification`: 1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`. 2. Removal of `pending_zero_data_pages` and its cousin `on_wal_record_end` 3. Rename `pending_bytes` to `pending_metadata_bytes` since this is what it tracks now. 4. Adapting of various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`. Removal of `pending_zero_data_pages` and the optimisation associated with it ((1) and (2)) deserves more detail. Previously all zero data pages went through `pending_zero_data_pages`. We wrote zero data pages when filling gaps caused by relation extension (case A) and when handling special wal records (case B). 
If it happened that the same WAL record contained a non-zero write for an entry in `pending_zero_data_pages` we skipped the zero write. Case A: We handle this differently now. When ingesting the `SerializedValueBatch` associated with one PG WAL record, we identify the gaps and fill them in one go. Essentially, we move from a per-key process (gaps were filled after each new key) to a per-record process. Hence, the optimisation is not required anymore. Case B: When the handling of a special record needs to zero out a key, it just adds that to the current batch. I inspected the code, and I don't think the optimisation kicked in here.	2024-11-06 14:10:32 +00:00
Folke Behrens	bdd492b1d8	proxy: Replace "web(auth)" with "console redirect" everywhere (#9655 )	2024-11-06 11:03:38 +00:00
Folke Behrens	5d8284c7fe	proxy: Read cplane JWT with clap arg (#9654 )	2024-11-06 10:27:55 +00:00
Folke Behrens	ebc43efebc	proxy: Refactor cplane types (#9643 ) The overall idea of the PR is to rename a few types to make their purpose clearer, reduce abstraction where not needed, and move types to better-suited modules.	2024-11-05 23:03:53 +01:00
Folke Behrens	754d2950a3	proxy: Revert ControlPlaneEvent back to struct (#9649 ) Due to neondatabase/cloud#19815 we need to be more tolerant when reading events.	2024-11-05 21:32:33 +00:00
Conrad Ludgate	fcde40d600	[proxy] use the proxy protocol v2 command to silence some logs (#9620 ) The PROXY Protocol V2 offers a "command" concept. It can be of two different values. "Local" and "Proxy". The spec suggests that "Local" be used for health-checks. We can thus use this to silence logging for such health checks such as those from NLB. This additionally refactors the flow to be a bit more type-safe, self documenting and using zerocopy deser.	2024-11-05 17:23:00 +00:00
Erik Grinaker	babfeb70ba	safekeeper: don't allocate send buffers on stack (#9644 ) ## Problem While experimenting with `MAX_SEND_SIZE` for benchmarking, I saw stack overflows when increasing it to 1 MB. Turns out a few buffers of this size are stack-allocated rather than heap-allocated. Even at the default 128 KB size, that's a bit large to allocate on the stack. ## Summary of changes Heap-allocate buffers of size `MAX_SEND_SIZE`.	2024-11-05 17:05:30 +00:00
Ivan Efremov	2f1a56c8f9	proxy: Unify local and remote conn pool client structures (#9604 ) Unify client, EndpointConnPool and DbUserConnPool for remote and local conn. - Use new ClientDataEnum for additional client data. - Add ClientInnerCommon client structure. - Remove Client and EndpointConnPool code from local_conn_pool.rs	2024-11-05 17:33:41 +02:00
John Spray	e30f5fb922	scrubber: remove AWS region assumption, tolerate negative max_project_size (#9636 ) ## Problem First issues noticed when trying to run scrubber find-garbage on Azure: - Azure staging contains projects with -1 set for max_project_size: apparently the control plane treats this as a signed field. - Scrubber code assumed that listing projects should filter to aws-$REGION. This is no longer needed (per comment in the code) because we now hit region-local APIs. This PR doesn't make it work all the way (`init_remote` still assumes S3), but these are necessary precursors. ## Summary of changes - Change max_project_size from unsigned to signed - Remove region filtering in favor of simply using the right region's API (which we already do)	2024-11-05 13:32:50 +00:00
Arpad Müller	70ae8c16da	Construct models::TenantConfig only once (#9630 ) Since 5f83c9290b482dc90006c400dfc68e85a17af785/#1504 we've had duplication in construction of models::TenantConfig, where both constructs contained the same code. This PR removes one of the two locations to avoid the duplication.	2024-11-05 13:02:49 +00:00
Erik Grinaker	8840f3858c	pageserver: return 503 during tenant shutdown (#9635 ) ## Problem Tenant operations may return `409 Conflict` if the tenant is shutting down. This status code is not retried by the control plane, causing user-facing errors during pageserver restarts. Operations should instead return `503 Service Unavailable`, which may be retried for idempotent operations. ## Summary of changes Convert `GetActiveTenantError::WillNotBecomeActive(TenantState::Stopping)` to `ApiError::ShuttingDown` rather than `ApiError::Conflict`. This error is returned by `Tenant::wait_to_become_active` in most (all?) tenant/timeline-related HTTP routes.	2024-11-05 13:16:55 +01:00
Tristan Partin	1e16221f82	Update psycopg2 to latest version for complete PG 17 support Update the types to match. Changes the cursor import to match the C bindings[0]. Link: https://github.com/python/typeshed/issues/12578 [0] Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-04 18:21:59 -06:00
Tristan Partin	34812a6aab	Improve some typing related to performance testing for LR Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-04 15:52:01 -06:00
Arpad Müller	ee68bbf6f5	Add tenant config option to allow timeline_offloading (#9598 ) Allow us to enable timeline offloading for single tenants without having to enable it for the entire pageserver. Part of #8088.	2024-11-04 21:01:18 +01:00
Folke Behrens	1085fe57d3	proxy: Rewrite ControlPlaneEvent as enum (#9627 )	2024-11-04 20:19:26 +01:00
Folke Behrens	59879985b4	proxy: Wrap JWT errors in separate AuthError variant (#9625 ) * Also rename `AuthFailed` variant to `PasswordFailed`. * Before this, all JWT errors ended up in `AuthError::AuthFailed()`, which expects a username and also causes cache invalidation.	2024-11-04 19:56:40 +01:00
Conrad Ludgate	81d1bb1941	quieten aws_config logs (#9626 ) logs during aws authentication are soooo noisy in staging 🙃	2024-11-04 17:28:10 +00:00
Christian Schwarz	06113e94e6	fix(test_regress): always use storcon virtual pageserver API to set tenant config (#9622 ) Problem ------- Tests that directly call the Pageserver Management API to set tenant config are flaky if the Pageserver is managed by Storcon because Storcon is the source of truth and may (theoretically) reconcile a tenant at any time. Solution -------- Switch all users of `set_tenant_config`/`patch_tenant_config_client_side` to use the `env.storage_controller.pageserver_api()` Future Work ----------- Prevent regressions from creeping in. And generally clean up tenant configuration. Maybe we can avoid the Pageserver having a default tenant config at all and put the default into Storcon instead? * => https://github.com/neondatabase/neon/issues/9621 Refs ---- fixes https://github.com/neondatabase/neon/issues/9522	2024-11-04 17:42:08 +01:00
Erik Grinaker	0d5a512825	safekeeper: add walreceiver metrics (#9450 ) ## Problem We don't have any observability for Safekeeper WAL receiver queues. ## Summary of changes Adds a few WAL receiver metrics: * `safekeeper_wal_receivers`: gauge of currently connected WAL receivers. * `safekeeper_wal_receiver_queue_depth`: histogram of queue depths per receiver, sampled every 5 seconds. * `safekeeper_wal_receiver_queue_depth_total`: gauge of total queued messages across all receivers. * `safekeeper_wal_receiver_queue_size_total`: gauge of total queued message sizes across all receivers. There are already metrics for ingested WAL volume: `written_wal_bytes` counter per timeline, and `safekeeper_write_wal_bytes` per-request histogram.	2024-11-04 15:22:46 +00:00
Conrad Ludgate	8ad1dbce72	[proxy]: parse proxy protocol TLVs with aws/azure support (#9610 ) AWS/azure private link shares extra information in the "TLV" values of the proxy protocol v2 header. This code doesn't action on it, but it parses it as appropriate.	2024-11-04 14:04:56 +00:00
Conrad Ludgate	3dcdbcc34d	remove aws-lc-rs dep and fix storage_broker tls (#9613 ) It seems the ecosystem is not so keen on moving to aws-lc-rs as its build setup is more complicated than ring (requiring cmake). Eventually I expect the ecosystem should pivot to https://github.com/ctz/graviola/tree/main/rustls-graviola as it stabilises (it has a very simple build step and license), but for now let's try not to have the headache of juggling two crypto libs. I also noticed that tonic will just fail with tls without a default provider, so I added some defensive code for that.	2024-11-04 13:29:13 +00:00
Matthias van de Meent	d5de63c6b8	Fix a time zone issue in a PG17 test case (#9618 ) The commit was cherry-picked and thus shouldn't cause issues once we merge the release tag for PostgreSQL 17.1	2024-11-04 12:10:32 +00:00
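
Commit f18aa04b90 above describes zeroing a new WAL segment with `File::set_len()` instead of writing 8 KB zero pages. The idea can be sketched roughly as follows. This is a minimal illustration, not the safekeeper's code: it uses the blocking `std::fs` API rather than Tokio, and the function name `create_zeroed_segment`, the constant `SEGMENT_SIZE`, and the file handling are assumptions made for the example.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical segment size for illustration; the commit message mentions 16 MB.
pub const SEGMENT_SIZE: u64 = 16 * 1024 * 1024;

/// Create (or truncate) a file and extend it to `size` bytes without writing
/// any data. `set_len` fills the extended range with zeros; depending on the
/// filesystem, the result is typically a sparse file.
pub fn create_zeroed_segment(path: &Path, size: u64) -> io::Result<()> {
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    // One metadata update replaces size / 8192 individual zero-page writes.
    file.set_len(size)?;
    // Persist the new length so later appends need not fsync metadata.
    file.sync_all()?;
    Ok(())
}
```

Calling `create_zeroed_segment(path, SEGMENT_SIZE)` should leave a file that reads back as all zeros at the full segment length, so subsequent writes land inside the preallocated range and do not change the file size.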


6502 Commits