rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-06 13:02:55 +00:00

Author	SHA1	Message	Date
Vlad Lazar	8b18d8b31b	safekeeper: add global disk usage utilization limit (#12605 ) N.B: No-op for the neon-env. ## Problem We added a per-timeline disk utilization protection circuit breaker, which will stop the safekeeper from accepting more WAL writes if the disk utilization by the timeline has exceeded a configured limit. We mainly designed the mechanism as a guard against WAL upload/backup bugs, and we assumed that as long as WAL uploads are proceeding as normal we will not run into disk pressure. This turned out to be not true. In one of our load tests where we have 500 PGs ingesting data at the same time, safekeeper disk utilization started to creep up even though WAL uploads were completely normal (we likely just maxed out our S3 upload bandwidth from the single SK). This means the per-timeline disk utilization protection won't be enough if too many timelines are ingesting data at the same time. ## Summary of changes Added a global disk utilization protection circuit breaker which will stop a safekeeper from accepting more WAL writes if the total disk usage on the safekeeper (across all tenants) exceeds a limit. We implemented this circuit breaker through two parts: 1. A "global disk usage watcher" background task that runs at a configured interval (default every minute) to see how much disk space is being used in the safekeeper's filesystem. This background task also performs the check against the limit and publishes the result to a global atomic boolean flag. 2. The `hadron_check_disk_usage()` routine (in `timeline.rs`) now also checks this global boolean flag published in the step above, and fails the `WalAcceptor` (triggers the circuit breaker) if the flag was raised. The disk usage limit is disabled by default. It can be tuned with the `--max-global-disk-usage-ratio` CLI arg. ## How is this tested? Added integration test `test_wal_acceptor.py::test_global_disk_usage_limit`. Also noticed that I haven't been using the `wait_until(f)` test function correctly (the `f` passed in is supposed to raise an exception if the condition is not met, instead of returning `False`...). Fixed it in both circuit breaker tests. --------- Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-16 14:43:17 +00:00
Arpad Müller	5c934efb29	Don't depend on the postgres_ffi just for one type (#12610 ) We don't want to depend on postgres_ffi in an API crate. If there is no such dependency, we can compile stuff like `storcon_cli` without needing a full working postgres build. Fixes regression of #12548 (before we could compile it).	2025-07-15 17:28:08 +00:00
Erik Grinaker	42ab34dc36	pageserver/client_grpc: don't pipeline GetPage requests (#12584 ) ## Problem The communicator gRPC client currently attempts to pipeline GetPage requests from multiple callers onto the same gRPC stream. This has a number of issues: * Head-of-line blocking: the request may block on e.g. layer download or LSN wait, delaying the next request. * Cancellation: we can't easily cancel in-progress requests (e.g. due to timeout or backend termination), so it may keep blocking the next request (even its own retry). * Complex stream scheduling: picking a stream becomes harder/slower, and additional Tokio tasks and synchronization is needed for stream management. Touches #11735. Requires #12579. ## Summary of changes This patch removes pipelining of gRPC stream requests, and instead prefers to scale out the number of streams to achieve the same throughput. Stream scheduling has been rewritten, and mostly follows the same pattern as the client pool with exclusive acquisition by a single caller. [Benchmarks](https://github.com/neondatabase/neon/pull/12583) show that the cost of an idle server-side GetPage worker task is about 26 KB (2.5 GB for 100,000), so we can afford to scale out. This has a number of advantages: * It (mostly) eliminates head-of-line blocking (except at the TCP level). * Cancellation becomes trivial, by closing the stream. * Stream scheduling becomes significantly simpler and cheaper. * Individual callers can still use client-side batching for pipelining.	2025-07-14 12:11:33 +00:00
Erik Grinaker	30b877074c	pagebench: add CPU profiling support (#12478 ) ## Problem The new communicator gRPC client has significantly worse Pagebench performance than a basic gRPC client. We need to find out why. ## Summary of changes Add a `pagebench --profile` flag which takes a client CPU profile of the benchmark and writes a flamegraph to `profile.svg`.	2025-07-14 11:44:53 +00:00
Conrad Ludgate	9bba31bf68	proxy: encode json as we parse rows (#11992 ) Serialize query row responses directly into JSON. Some of this code should be using the `json::value_as_object/list` macros, but I've avoided it for now to minimize the size of the diff.	2025-07-11 19:39:08 +00:00
Folke Behrens	380d167b7c	proxy: For cancellation data replace HSET+EXPIRE/HGET with SET..EX/GET (#12553 ) ## Problem To store cancellation data we send two commands to redis because the redis server version doesn't support HSET with EX. Also, HSET is not really needed. ## Summary of changes * Replace the HSET + EXPIRE command pair with one SET .. EX command. * Replace HGET with GET. * Leave a workaround for old keys set with HSET. * Replace some anyhow errors with specific errors to surface the WRONGTYPE error from redis.	2025-07-11 19:35:42 +00:00
Dmitrii Kovalkov	c34d36d8a2	storcon_cli: timeline-safekeeper-migrate and timeline-locate subcommands (#12548 ) ## Problem We have a `safekeeper_migrate` handler, but no subcommand in `storcon_cli`. Same for `/:timeline_id/locate` for identifying current set of safekeepers. - Closes: https://github.com/neondatabase/neon/issues/12395 ## Summary of changes - Add `timeline-safekeeper-migrate` and `timeline-locate` subcommands to `storcon_cli`	2025-07-11 10:49:37 +00:00
Erik Grinaker	dcdfe80bf0	pagebench: add support for rich gRPC client (#12477 ) ## Problem We need to benchmark the rich gRPC client `client_grpc::PageserverClient` against the basic, no-frills `page_api::Client` to determine how much overhead it adds. Touches #11735. Requires #12476. ## Summary of changes Add a `pagebench --rich-client` parameter to use `client_grpc::PageserverClient`. Also adds a compression parameter to the client.	2025-07-10 17:30:09 +00:00
Erik Grinaker	2fc77c836b	pageserver/client_grpc: add shard map updates (#12480 ) ## Problem The communicator gRPC client must support changing the shard map on splits. Touches #11735. Requires #12476. ## Summary of changes * Wrap the shard set in a `ArcSwap` to allow swapping it out. * Add a new `ShardSpec` parameter struct to pass validated shard info to the client. * Add `update_shards()` to change the shard set. In-flight requests are allowed to complete using the old shards. * Restructure `get_page` to use a stable view of the shard map, and retry errors at the top (pre-split) level to pick up shard map changes. * Also marks `tonic::Status::Internal` as non-retryable, so that we can use it for client-side invariant checks without continually retrying these.	2025-07-10 15:46:39 +00:00
HaoyuHuang	2c6b327be6	A few PS changes (#12540 ) # TLDR All changes are no-op except some metrics. ## Summary of changes I ### Pageserver Added a new global counter metric `pageserver_pagestream_handler_results_total` that categorizes pagestream request results according to their outcomes: 1. Success 2. Internal errors 3. Other errors Internal errors include: 1. Page reconstruction error: This probably indicates a pageserver bug/corruption 2. LSN timeout error: Could indicate overload or bugs with PS's ability to reach other components 3. Misrouted request error: Indicates bugs in the Storage Controller/HCC Other errors include transient errors that are expected during normal operation or errors indicating bugs with other parts of the system (e.g., malformed requests, errors due to cancelled operations during PS shutdown, etc.) ## Summary of changes II This PR adds a pageserver endpoint and its counterpart in storage controller to list visible size of all tenant shards. This will be a prerequisite of the tenant rebalance command. ## Problem III We need a way to download WAL segments/layerfiles from S3 and replay WAL records. We cannot access production S3 from our laptops directly, and we also can't transfer any user data out of production systems for GDPR compliance, so we need solutions. ## Summary of changes III This PR adds a couple of tools to support the debugging workflow in production: 1. A new `pagectl download-remote-object` command that can be used to download remote storage objects assuming the correct access is set up. ## Summary of changes IV This PR adds a command to list all visible delta and image layers from index_part. This is useful to debug compaction issues as index_part often contain a lot of covered layers due to PITR. --------- Co-authored-by: William Huang <william.huang@databricks.com> Co-authored-by: Chen Luo <chen.luo@databricks.com> Co-authored-by: Vlad Lazar <vlad@neon.tech>	2025-07-10 14:39:38 +00:00
Erik Grinaker	f4b03ddd7b	pageserver/client_grpc: reap idle pool resources (#12476 ) ## Problem The gRPC client pools don't reap idle resources. Touches #11735. Requires #12475. ## Summary of changes Reap idle pool resources (channels/clients/streams) after 3 minutes of inactivity. Also restructure the `StreamPool` to use a mutex rather than atomics for synchronization, for simplicity. This will be optimized later.	2025-07-10 10:18:37 +00:00
Erik Grinaker	8f3351fa91	pageserver/client_grpc: split GetPage batches across shards (#12469 ) ## Problem The rich gRPC Pageserver client needs to split GetPage batches that straddle multiple shards. Touches #11735. Requires #12462. ## Summary of changes Adds a `GetPageSplitter` which splits `GetPageRequest` that span multiple shards, and then reassembles the responses. Dispatches per-shard requests in parallel.	2025-07-09 14:17:22 +00:00
Mikhail	e7d18bc188	Replica promotion in compute_ctl (#12183 ) Add `/promote` method for `compute_ctl` promoting secondary replica to primary, depends on secondary being prewarmed. Add `compute-ctl` mode to `test_replica_promotes`, testing happy path only (no corner cases yet) Add openapi spec for `/promote` and `/lfc` handlers https://github.com/neondatabase/cloud/issues/19011 Resolves: https://github.com/neondatabase/cloud/issues/29807	2025-07-09 12:55:10 +00:00
Erik Grinaker	3915995530	pageserver/client_grpc: add rich Pageserver gRPC client (#12462 ) ## Problem For the communicator, we need a rich Pageserver gRPC client. Touches #11735. Requires #12434. ## Summary of changes This patch adds an initial rich Pageserver gRPC client. It supports: * Sharded tenants across multiple Pageservers. * Pooling of connections, clients, and streams for efficient resource use. * Concurrent use by many callers. * Internal handling of GetPage bidirectional streams, with pipelining and error handling. * Automatic retries. * Observability. The client is still under development. In particular, it needs GetPage batch splitting, shard map updates, and performance optimization. This will be addressed in follow-up PRs.	2025-07-09 11:42:46 +00:00
Erik Grinaker	8223c1ba9d	pageserver/client_grpc: add initial gRPC client pools (#12434 ) ## Problem The communicator will need gRPC channel/client/stream pools for efficient reuse across many backends. Touches #11735. Requires #12396. ## Summary of changes Adds three nested resource pools: * `ChannelPool` for gRPC channels (i.e. TCP connections). * `ClientPool` for gRPC clients (i.e. `page_api::Client`). Acquires channels from `ChannelPool`. * `StreamPool` for gRPC GetPage streams. Acquires clients from `ClientPool`. These are minimal functional implementations that will need further improvements and performance optimization. However, the overall structure is expected to be roughly final, so reviews should focus on that. The pools are not yet in use, but will form the foundation of a rich gRPC Pageserver client used by the communicator (see #12462). This PR also adds the initial crate scaffolding for that client. See doc comments for details.	2025-07-08 20:58:18 +00:00
HaoyuHuang	3dad4698ec	PS changes #1 (#12467 ) # TLDR All changes are no-op except 1. publishing additional metrics. 2. problem VI ## Problem I It has come to my attention that the Neon Storage Controller doesn't correctly update its "observed" state of tenants previously associated with PSs that has come back up after a local data loss. It would still think that the old tenants are still attached to page servers and won't ask more questions. The pageserver has enough information from the reattach request/response to tell that something is wrong, but it doesn't do anything about it either. We need to detect this situation in production while I work on a fix. (I think there is just some misunderstanding about how Neon manages their pageserver deployments which got me confused about all the invariants.) ## Summary of changes I Added a `pageserver_local_data_loss_suspected` gauge metric that will be set to 1 if we detect a problematic situation from the reattch response. The problematic situation is when the PS doesn't have any local tenants but received a reattach response containing tenants. We can set up an alert using this metric. The alert should be raised whenever this metric reports non-zero number. Also added a HTTP PUT `http://pageserver/hadron-internal/reset_alert_gauges` API on the pageserver that can be used to reset the gauge and the alert once we manually rectify the situation (by restarting the HCC). ## Problem II Azure upload is 3x slower than AWS. -> 3x slower ingestion. The reason for the slower upload is that Azure upload in page server is much slower => higher flush latency => higher disk consistent LSN => higher back pressure. ## Summary of changes II Use Azure put_block API to uploads a 1 GB layer file in 8 blocks in parallel. I set the put_block block size to be 128 MB by default in azure config. To minimize neon changes, upload function passes the layer file path to the azure upload code through the storage metadata. This allows the azure put block to use FileChunkStreamRead to stream read from one partition in the file instead of loading all file data in memory and split it into 8 128 MB chunks. ## How is this tested? II 1. rust test_real_azure tests the put_block change. 3. I deployed the change in azure dev and saw flush latency reduces from ~30 seconds to 10 seconds. 4. I also did a bunch of stress test using sqlsmith and 100 GB TPCDS runs. ## Problem III Currently Neon limits the compaction tasks as 3/4 * CPU cores. This limits the overall compaction throughput and it can easily cause head-of-the-line blocking problems when a few large tenants are compacting. ## Summary of changes III This PR increases the limit of compaction tasks as `BG_TASKS_PER_THREAD` (default 4) * CPU cores. Note that `CONCURRENT_BACKGROUND_TASKS` also limits some other tasks `logical_size_calculation` and `layer eviction` . But compaction should be the most frequent and time-consuming task. ## Summary of changes IV This PR adds the following PageServer metrics: 1. `pageserver_disk_usage_based_eviction_evicted_bytes_total`: captures the total amount of bytes evicted. It's more straightforward to see the bytes directly instead of layers. 2. `pageserver_active_storage_operations_count`: captures the active storage operation, e.g., flush, L0 compaction, image creation etc. It's useful to visualize these active operations to get a better idea of what PageServers are spending cycles on in the background. ## Summary of changes V When investigating data corruptions, it's useful to search the base image and all WAL records of a page up to an LSN, i.e., a breakdown of GetPage@LSN request. This PR implements this functionality with two tools: 1. Extended `pagectl` with a new command to search the layer files for a given key up to a given LSN from the `index_part.json` file. The output can be used to download the files from S3 and then search the file contents using the second tool. Example usage: ``` cargo run --bin pagectl index-part search --tenant-id 09b99ea3239bbb3b2d883a59f087659d --timeline-id 7bedf4a6995baff7c0421ff9aebbcdab --path ~/Downloads/corruption/index_part.json-0000000c-formatted --key 000000067F000080140000802100000D61BD --lsn 70C/BF3D61D8 ``` Example output: ``` tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F0000801400000B180000000002-000000067F0000801400008028000002FEFF__000007089F0B5381-0000070C7679EEB9-0000000c tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000000000000000000000000000000000-000000067F0000801400008028000002F3F1__000006DD95B6F609-000006E2BA14C369-0000000c tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F0000801400000B180000000002-000000067F000080140000802100001B0973__000006D33429F539-000006DD95B6F609-0000000c tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F0000801400000B180000000002-000000067F00008014000080210000164D81__000006C6343B2D31-000006D33429F539-0000000b tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F0000801400000B180000000002-000000067F0000801400008021000017687B__000006BA344FA7F1-000006C6343B2D31-0000000b tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F0000801400000B180000000002-000000067F00008014000080210000165BAB__000006AD34613D19-000006BA344FA7F1-0000000b tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F0000801400000B180000000002-000000067F00008014000080210000137A39__0000069F34773461-000006AD34613D19-0000000b tenants/09b99ea3239bbb3b2d883a59f087659d-0304/timelines/7bedf4a6995baff7c0421ff9aebbcdab/000000067F000080140000802100000D4000-000000067F000080140000802100000F0000__0000069F34773460-0000000b ``` 2. Added a unit test to search the layer file contents. It's not implemented part of `pagectl` because it depends on some test harness code, which can only be used by unit tests. Example usage: ``` cargo test --package pageserver --lib -- tenant::debug::test_search_key --exact --nocapture -- --tenant-id 09b99ea3239bbb3b2d883a59f087659d --timeline-id 7bedf4a6995baff7c0421ff9aebbcdab --data-dir /Users/chen.luo/Downloads/corruption --key 000000067F000080140000802100000D61BD --lsn 70C/BF3D61D8 ``` Example output: ``` # omitted image for brievity delta: 69F/769D8180: will_init: false, "OgAAALGkuwXwYp12nwYAAECGAAASIqLHAAAAAH8GAAAUgAAAIYAAAL1hDQD/DLGkuwUDAAAAEAAWAA==" delta: 69F/769CB6D8: will_init: false, "PQAAALGkuwXotZx2nwYAABAJAAAFk7tpACAGAH8GAAAUgAAAIYAAAL1hDQD/CQUAEAASALExuwUBAAAAAA==" ``` ## Problem VI Currently when page service resolves shards from page numbers, it doesn't fully support the case that the shard could be split in the middle. This will lead to query failures during the tenant split for either commit or abort cases (it's mostly for abort). ## Summary of changes VI This PR adds retry logic in `Cache::get()` to deal with shard resolution errors more gracefully. Specifically, it'll clear the cache and retry, instead of failing the query immediately. It also reduces the internal timeout to make retries faster. The PR also fixes a very obvious bug in `TenantManager::resolve_attached_shard` where the code tries to cache the computed the shard number, but forgot to recompute when the shard count is different. --------- Co-authored-by: William Huang <william.huang@databricks.com> Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com> Co-authored-by: Chen Luo <chen.luo@databricks.com> Co-authored-by: Vlad Lazar <vlad.lazar@databricks.com> Co-authored-by: Vlad Lazar <vlad@neon.tech>	2025-07-08 19:43:01 +00:00
Conrad Ludgate	55aef2993d	introduce a JSON serialization lib (#12417 ) See #11992 and #11961 for some examples of usecases. This introduces a JSON serialization lib, designed for more flexibility than serde_json offers. ## Dynamic construction Sometimes you have dynamic values you want to serialize, that are not already in a serde-aware model like a struct or a Vec etc. To achieve this with serde, you need to implement a lot of different traits on a lot of different new-types. Because of this, it's often easier to give-in and pull all the data into a serde-aware model (serde_json::Value or some intermediate struct), but that is often not very efficient. This crate allows full control over the JSON encoding without needing to implement any extra traits. Just call the relevant functions, and it will guarantee a correctly encoded JSON value. ## Async construction Similar to the above, sometimes the values arrive asynchronously. Often collecting those values in memory is more expensive than writing them as JSON, since the overheads of `Vec` and `String` is much higher, however there are exceptions. Serializing to JSON all in one go is also more CPU intensive and can cause lag spikes, whereas serializing values incrementally spreads out the CPU load and reduces lag.	2025-07-07 15:12:02 +00:00
Heikki Linnakangas	b568189f7b	Build dummy libcommunicator into the 'neon' extension (#12266 ) This doesn't do anything interesting yet, but demonstrates linking Rust code to the neon Postgres extension, so that we can review and test drive just the build process changes independently.	2025-07-04 23:27:28 +00:00
Lassi Pölönen	6d73cfa608	Support audit syslog over TLS (#12124 ) Add support to transport syslogs over TLS. Since TLS params essentially require passing host and port separately, add a boolean flag to the configuration template and also use the same `action` format for plaintext logs. This allows seamless transition. The plaintext host:port is picked from `AUDIT_LOGGING_ENDPOINT` (as earlier) and from `AUDIT_LOGGING_TLS_ENDPOINT`. The TLS host:port is used when defined and non-empty. `remote_endpoint` is split separately to hostname and port as required by `omfwd` module. Also the address parsing and config content generation are split to more testable functions with basic tests added.	2025-07-01 12:53:46 +00:00
Erik Grinaker	d0a4ae3e8f	pageserver: add gRPC LSN lease support (#12384 ) ## Problem The gRPC API does not provide LSN leases. ## Summary of changes * Add LSN lease support to the gRPC API. * Use gRPC LSN leases for static computes with `grpc://` connstrings. * Move `PageserverProtocol` into the `compute_api::spec` module and reuse it.	2025-06-30 12:44:17 +00:00
Erik Grinaker	a384d7d501	pageserver: assert no changes to shard identity (#12379 ) ## Problem Location config changes can currently result in changes to the shard identity. Such changes will cause data corruption, as seen with #12217. Resolves #12227. Requires #12377. ## Summary of changes Assert that the shard identity does not change on location config updates and on (re)attach. This is currently asserted with `critical!`, in case it misfires in production. Later, we should reject such requests with an error and turn this into a proper assertion.	2025-06-30 12:36:45 +00:00
Erik Grinaker	e50b914a8e	compute_tools: support gRPC base backups in `compute_ctl` (#12244 ) ## Problem `compute_ctl` should support gRPC base backups. Requires #12111. Requires #12243. Touches #11926. ## Summary of changes Support `grpc://` connstrings for `compute_ctl` base backups.	2025-06-27 16:39:00 +00:00
Alex Chi Z.	33c0d5e2f4	fix(pageserver): make posthog config parsing more robust (#12356 ) ## Problem In our infra config, we have to split server_api_key and other fields in two files: the former one in the sops file, and the latter one in the normal config. It creates the situation that we might misconfigure some regions that it only has part of the fields available, causing storcon/pageserver refuse to start. ## Summary of changes Allow PostHog config to have part of the fields available. Parse it later. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-06-26 15:49:08 +00:00
Alex Chi Z.	6c77638ea1	feat(storcon): retrieve feature flag and pass to pageservers (#12324 ) ## Problem part of https://github.com/neondatabase/neon/issues/11813 ## Summary of changes It costs $$$ to directly retrieve the feature flags from the pageserver. Therefore, this patch adds new APIs to retrieve the spec from the storcon and updates it via pageserver. * Storcon retrieves the feature flag and send it to the pageservers. * If the feature flag gets updated outside of the normal refresh loop of the pageserver, pageserver won't fetch the flags on its own as long as the last updated time <= refresh_period. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-06-25 14:58:18 +00:00
Matthias van de Meent	6c6de6382a	Use enum-typed PG versions (#12317 ) This makes it possible for the compiler to validate that a match block matched all PostgreSQL versions we support. ## Problem We did not have a complete picture about which places we had to test against PG versions, and what format these versions were: The full PG version ID format (Major/minor/bugfix `MMmmbb`) as transfered in protocol messages, or only the Major release version (`MM`). This meant type confusion was rampant. With this change, it becomes easier to develop new version-dependent features, by making type and niche confusion impossible. ## Summary of changes Every use of `pg_version` is now typed as either `PgVersionId` (u32, valued in decimal `MMmmbb`) or PgMajorVersion (an enum, with a value for every major version we support, serialized and stored like a u32 with the value of that major version) --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-06-24 17:25:31 +00:00
Erik Grinaker	15d079cd41	pagebench: improve `getpage-latest-lsn` gRPC support (#12293 ) This improves `pagebench getpage-latest-lsn` gRPC support by: * Using `page_api::Client`. * Removing `--protocol`, and using the `page-server-connstring` scheme instead. * Adding `--compression` to enable zstd compression.	2025-06-20 08:31:40 +00:00
Heikki Linnakangas	1950ccfe33	Eliminate dependency from pageserver_api to postgres_ffi (#12273 ) Introduce a separate `postgres_ffi_types` crate which contains a few types and functions that were used in the API. `postgres_ffi_types` is a much small crate than `postgres_ffi`, and it doesn't depend on bindgen or the Postgres C headers. Move NeonWalRecord and Value types to wal_decoder crate. They are only used in the pageserver-safekeeper "ingest" API. The rest of the ingest API types are defined in wal_decoder, so move these there as well.	2025-06-19 10:31:27 +00:00
Erik Grinaker	6f4ffdb48b	pageserver: add gRPC compression (#12280 ) ## Problem The gRPC page service should support compression. Requires #12111. Touches #11728. Touches https://github.com/neondatabase/cloud/issues/25679. ## Summary of changes Add support for gzip and zstd compression in the server, and a client parameter to enable compression. This will need further benchmarking under realistic network conditions.	2025-06-19 09:54:34 +00:00
Elizabeth Murray	830ef35ed3	Domain client for Pageserver GRPC. (#12111 ) Add domain client for new communicator GRPC types.	2025-06-18 15:51:49 +00:00
Folke Behrens	1dce65308d	Update base64 to 0.22 (#12215 ) ## Problem Base64 0.13 is outdated. ## Summary of changes Update base64 to 0.22. Affects mostly proxy and proxy libs. Also upgrade serde_with to remove another dep on base64 0.13 from dep tree.	2025-06-12 16:12:47 +00:00
Mikhail	1b935b1958	endpoint_storage: add ?from_endpoint= to /lfc/prewarm (#12195 ) Related: https://github.com/neondatabase/cloud/issues/24225 Add optional from_endpoint parameter to allow prewarming from other endpoint	2025-06-10 19:25:32 +00:00
Mikhail	7efd4554ab	endpoint_storage: allow bypassing s3 write check on startup (#12165 ) Related: https://github.com/neondatabase/cloud/issues/27195	2025-06-06 18:08:02 +00:00
Arpad Müller	24d7c37e6e	neon_local timeline import: create timelines on safekeepers (#12138 ) neon_local's timeline import subcommand creates timelines manually, but doesn't create them on the safekeepers. If a test then tries to open an endpoint to read from the timeline, it will error in the new world with `--timelines-onto-safekeepers`. Therefore, if that flag is enabled, create the timelines on the safekeepers. Note that this import functionality is different from the fast import feature (https://github.com/neondatabase/neon/issues/10188, #11801). Part of #11670 As well as part of #11712	2025-06-05 18:53:14 +00:00
Erik Grinaker	6a43f23eca	pagebench: add batch support (#12133 ) ## Problem The new gRPC page service protocol supports client-side batches. The current libpq protocol only does best-effort server-side batching. To compare these approaches, Pagebench should support submitting contiguous page batches, similar to how Postgres will submit them (e.g. with prefetches or vectored reads). ## Summary of changes Add a `--batch-size` parameter specifying the size of contiguous page batches. One batch counts as 1 RPS and 1 queue depth. For the libpq protocol, a batch is submitted as individual requests and we rely on the server to batch them for us. This will give a realistic comparison of how these would be processed in the wild (e.g. when Postgres sends 100 prefetch requests). This patch also adds some basic validation of responses.	2025-06-05 11:52:52 +00:00
Erik Grinaker	e00fd45bba	page_api: remove smallvec (#12095 ) ## Problem The gRPC `page_api` domain types used smallvecs to avoid heap allocations in the common case where a single page is requested. However, this is pointless: the Protobuf types use a normal vec, and converting a smallvec into a vec always causes a heap allocation anyway. ## Summary of changes Use a normal `Vec` instead of a `SmallVec` in `page_api` domain types.	2025-06-03 12:20:34 +00:00
Erik Grinaker	a21c1174ed	pagebench: add gRPC support for `get-page-latest-lsn` (#12077 ) ## Problem We need gRPC support in Pagebench to benchmark the new gRPC Pageserver implementation. Touches #11728. ## Summary of changes Adds a `Client` trait to make the client transport swappable, and a gRPC client via a `--protocol grpc` parameter. This must also specify the connstring with the gRPC port: ``` pagebench get-page-latest-lsn --protocol grpc --page-service-connstring grpc://localhost:51051 ``` The client is implemented using the raw Tonic-generated gRPC client, to minimize client overhead.	2025-06-02 14:50:49 +00:00
Erik Grinaker	8d7ed2a4ee	pageserver: add gRPC observability middleware (#12093 ) ## Problem The page service logic asserts that a tracing span is present with tenant/timeline/shard IDs. An initial gRPC page service implementation thus requires a tracing span. Touches https://github.com/neondatabase/neon/issues/11728. ## Summary of changes Adds an `ObservabilityLayer` middleware that generates a tracing span and decorates it with IDs from the gRPC metadata. This is a minimal implementation to address the tracing span assertion. It will be extended with additional observability in later PRs.	2025-06-02 11:46:50 +00:00
Alex Chi Z.	dc953de85d	feat(pageserver): integrate PostHog with gc-compaction rollout (#11917 ) ## Problem part of https://github.com/neondatabase/neon/issues/11813 ## Summary of changes * Integrate feature store with tenant structure. * gc-compaction picks up the current strategy from the feature store. * We only log them for now for testing purpose. They will not be used until we have more patches to support different strategies defined in PostHog. * We don't support property-based evaulation for now; it will be implemented later. * Evaluating result of the feature flag is not cached -- it's not efficient and cannot be used on hot path right now. * We don't report the evaluation result back to PostHog right now. I plan to enable it in staging once we get the patch merged. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-05-26 13:09:37 +00:00
Erik Grinaker	7cd0defaf0	page_api: add Rust domain types (#11999 ) ## Problem For the gRPC Pageserver API, we should convert the Protobuf types to stricter, canonical Rust types. Touches https://github.com/neondatabase/neon/issues/11728. ## Summary of changes Adds Rust domain types that mirror the Protobuf types, with conversion and validation.	2025-05-26 11:01:36 +00:00
Erik Grinaker	ec991877f4	pageserver: add gRPC server (#11972 ) ## Problem We want to expose the page service over gRPC, for use with the communicator. Requires #11995. Touches #11728. ## Summary of changes This patch wires up a gRPC server in the Pageserver, using Tonic. It does not yet implement the actual page service. * Adds `listen_grpc_addr` and `grpc_auth_type` config options (disabled by default). * Enables gRPC by default with `neon_local`. * Stub implementation of `page_api.PageService`, returning unimplemented errors. * gRPC reflection service for use with e.g. `grpcurl`. Subsequent PRs will implement the actual page service, including authentication and observability. Notably, TLS support is not yet implemented. Certificate reloading requires us to reimplement the entire Tonic gRPC server.	2025-05-26 08:27:48 +00:00
Erik Grinaker	06ce704041	Cargo.toml: upgrade Tonic to 0.13.1 (#11995 ) ## Problem We're about to implement a gRPC interface for Pageserver. Let's upgrade Tonic first, to avoid a more painful migration later. It's currently only used by storage-broker. Touches #11728. ## Summary of changes Upgrade Tonic 0.12.3 → 0.13.1. Also opportunistically upgrade Prost 0.13.3 → 0.13.5. This transitively pulls in Indexmap 2.0.1 → 2.9.0, but it doesn't appear to be used in any particularly critical code paths.	2025-05-23 08:57:35 +00:00
Konstantin Merenkov	5db20af8a7	Keep the conn info cache on max_client_conn from pgbouncer (#11986 ) ## Problem Hitting max_client_conn from pgbouncer would lead to invalidation of the conn info cache. Customers would hit the limit on wake_compute. ## Summary of changes `should_retry_wake_compute` detects this specific error from pgbouncer as non-retriable, meaning we won't try to wake up the compute again.	2025-05-21 15:27:30 +00:00
Konstantin Knizhnik	2e3dc9a8c2	Add rel_size_replica_cache (#11889 ) ## Problem See Discussion: https://neondb.slack.com/archives/C033RQ5SPDH/p1746645666075799 Issue: https://github.com/neondatabase/cloud/issues/28609 Relation size cache is not correctly updated at PS in case of replicas. ## Summary of changes 1. Have two caches for relation size in timeline: `rel_size_primary_cache` and `rel_size_replica_cache`. 2. `rel_size_primary_cache` is actually what we have now. The only difference is that it is not updated in `get_rel_size`, only by WAL ingestion 3. `rel_size_replica_cache` has limited size (LruCache) and it's key is `(Lsn,RelTag)` . It is updated in `get_rel_size`. Only strict LSN matches are accepted as cache hit. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-05-20 15:38:27 +00:00
Erik Grinaker	cdb6479c8a	pageserver: add gRPC page service schema (#11815 ) ## Problem For the [communicator project](https://github.com/neondatabase/company_projects/issues/352), we want to move to gRPC for the page service protocol. Touches #11728. ## Summary of changes This patch adds an experimental gRPC Protobuf schema for the page service. It is equivalent to the current page service, but with several improvements, e.g.: * Connection multiplexing. * Reduced head-of-line blocking. * Client-side batching. * Explicit tenant shard routing. * GetPage request classification (normal vs. prefetch). * Explicit rate limiting ("slow down" response status). The API is exposed as a new `pageserver/page_api` package. This is separate from the `pageserver_api` package to reduce the dependency footprint for the communicator. The longer-term plan is to also split out e.g. the WAL ingestion service to a separate gRPC package, e.g. `pageserver/wal_api`. Subsequent PRs will: add Rust domain types for the Protobuf types, expose a gRPC server, and implement the page service. Preliminary prototype benchmarks of this gRPC API is within 10% of baseline libpq performance. We'll do further benchmarking and optimization as the implementation lands in `main` and is deployed to staging.	2025-05-19 09:03:06 +00:00
Heikki Linnakangas	532d9b646e	Add simple facility for an extendable shared memory area (#11929 ) You still need to provide a max size up-front, but memory is only allocated for the portion that is in use. The module is currently unused, but will be used by the new compute communicator project, in the neon Postgres extension. See https://github.com/neondatabase/neon/issues/11729 --------- Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-05-16 21:22:36 +00:00
Heikki Linnakangas	55f91cf10b	Update 'nix' package (#11948 ) There were some incompatible changes. Most churn was from switching from the now-deprecated fcntl:flock() function to fcntl::Flock::lock(). The new function returns a guard object, while with the old function, the lock was associated directly with the file descriptor. It's good to stay up-to-date in general, but the impetus to do this now is that in https://github.com/neondatabase/neon/pull/11929, I want to use some functions that were added only in the latest version of 'nix', and it's nice to not have to build multiple versions. (Although, different versions of 'nix' are still pulled in as indirect dependencies from other packages)	2025-05-16 14:45:08 +00:00
Vlad Lazar	2621ce2daf	pageserver: checkpoint import progress in the storage controller (#11862 ) ## Problem Timeline imports do not have progress checkpointing. Any time that the tenant is shut-down, all progress is lost and the import restarts from the beginning when the tenant is re-attached. ## Summary of changes This PR adds progress checkpointing. ### Preliminaries The unit of work is a `ChunkProcessingJob`. Each `ChunkProcessingJob` deals with the import for a set of key ranges. The job split is done by using an estimation of how many pages each job will produce. The planning stage must be pure: given a fixed set of contents in the import bucket, it will always yield the same plan. This property is enforced by checking that the hash of the plan is identical when resuming from a checkpoint. The storage controller tracks the progress of each shard in the import in the database in the form of the latest job that has has completed. ### Flow This is the high level flow for the happy path: 1. On the first run of the import task, the import task queries storcon for the progress and sees that none is recorded. 2. Execute the preparatory stage of the import 3. Import jobs start running concurrently in a `FuturesOrdered`. Every time the checkpointing threshold of jobs has been reached, notify the storage controller. 4. Tenant is detached and re-attached 5. Import task starts up again and gets the latest progress checkpoint from the storage controller in the form of a job index. 6. The plan is computed again and we check that the hash matches with the original plan. 7. Jobs are spawned from where the previous import task left off. Note that we will not report progress after the completion of each job, so some jobs might run twice. Closes https://github.com/neondatabase/neon/issues/11568 Closes https://github.com/neondatabase/neon/issues/11664	2025-05-15 13:18:22 +00:00
Alex Chi Z.	34a42b00ca	feat(pageserver): add PostHog lite client (#11821 ) ## Problem part of https://github.com/neondatabase/neon/issues/11813 ## Summary of changes Add a lite PostHog client that only uses the local flag evaluation functionality. Added a test case that parses an example feature flag and gets the evaluation result. TODO: support boolean flag, remote config; implement all operators in PostHog. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-05-13 09:49:14 +00:00
Conrad Ludgate	5cd7f936f9	fix(neon-rls): optimistically assume role grants are already assigned for replicas (#11811 ) ## Problem Read replicas cannot grant permissions for roles for Neon RLS. Usually the permission is already granted, so we can optimistically check. See INC-509 ## Summary of changes Perform a permission lookup prior to actually executing any grants.	2025-05-09 07:48:30 +00:00
Mikhail	5c356c63eb	endpoint_storage compute_ctl integration (#11550 ) Add `/lfc/(prewarm\|offload)` routes to `compute_ctl` which interact with endpoint storage. Add `prewarm_lfc_on_startup` spec option which, if enabled, downloads LFC prewarm data on compute startup. Resolves: https://github.com/neondatabase/cloud/issues/26343	2025-05-06 22:02:12 +00:00

1 2 3 4 5 ...

727 Commits