Commit Graph

8772 Commits

Author SHA1 Message Date
Erik Grinaker
37e322438b pageserver: document gRPC compute accessibility (#12724)
Document that the Pageserver gRPC port is accessible by computes, and
should not provide internal services.

Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191).
2025-07-25 13:35:44 +00:00
Gustavo Bazan
fca2c32e59 [ci/docker] task: Apply some quick wins for tools dockerfile (#12740)
## Problem

The Dockerfile for build tools has some small issues that are easy to
fix so that it follows Docker best practices.

## Summary of changes

Apply some small quick wins on the Dockerfile for build tools

- Use `apt-get` instead of `apt`
- Use `--no-cache-dir` for `pip install`
2025-07-25 12:39:01 +00:00
Conrad Ludgate
d19aebcf12 [proxy] introduce moka for the project-info cache (#12710)
## Problem

LKB-2502 The garbage collection of the project info cache is garbage. 

What we observed: If we get unlucky, we might throw away a very hot
entry if the cache is full. The GC loop is dependent on getting a lucky
shard of the projects2ep table that clears a lot of cold entries. The GC
does not take into account active use, and the interval it runs at is
too sparse to do any good.

Can we switch to a proper cache implementation?

Complications:
1. We need to invalidate by project/account.
2. We need to expire based on `retry_delay_ms`.

## Summary of changes

1. Replace `retry_delay_ms: Duration` with `retry_at: Instant` when
deserializing.
2. Split the EndpointControls from the RoleControls into two different
caches.
3. Introduce an expiry policy based on error retry info.
4. Introduce `moka` as a dependency, replacing our `TimedLru`.

See the follow up PR for changing all TimedLru instances to use moka:
#12726.
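
As a rough illustration of points 3 and 4 above, a per-entry expiry based on the error's retry deadline could look something like this with `moka` (the `EndpointControls` shape and the default TTL here are assumptions, not the actual proxy types):

```rust
use std::time::{Duration, Instant};

use moka::{sync::Cache, Expiry};

// Hypothetical cache value: either usable endpoint info or an error that
// carries the `retry_at: Instant` deadline mentioned above.
#[derive(Clone)]
struct EndpointControls {
    retry_at: Option<Instant>,
    // ... connection info elided
}

struct RetryAwareExpiry;

impl Expiry<String, EndpointControls> for RetryAwareExpiry {
    fn expire_after_create(
        &self,
        _key: &String,
        value: &EndpointControls,
        created_at: Instant,
    ) -> Option<Duration> {
        match value.retry_at {
            // Error entries expire exactly when their retry window opens.
            Some(retry_at) => Some(retry_at.saturating_duration_since(created_at)),
            // Healthy entries fall back to an assumed default TTL.
            None => Some(Duration::from_secs(300)),
        }
    }
}

fn build_cache() -> Cache<String, EndpointControls> {
    Cache::builder()
        // Eviction is frequency/recency aware, unlike the old GC loop.
        .max_capacity(10_000)
        .expire_after(RetryAwareExpiry)
        .build()
}
```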
2025-07-25 11:40:47 +00:00
Conrad Ludgate
a70a5bccff move subzero_core to proxy libs (#12742)
We have a dedicated libs folder for proxy related libraries. Let's move
the subzero_core stub there.
2025-07-25 10:44:28 +00:00
Conrad Ludgate
d9cedb4a95 [tokio-postgres] fix regression in buffer reuse (#12739)
Follow-up to #12701, which introduced a new regression. When profiling
locally I noticed that writes tend to always reallocate. On
investigation I found that even if the `Connection`'s write buffer is
empty, if it still shares the same data pointer as the `Client`'s write
buffer then the client cannot reclaim it.

The best way I found to fix this is to just drop the `Connection`'s
write buffer each time we fully flush it.

Additionally, I remembered that `BytesMut` has an `unsplit` method, which
allows even better sharing than the previous optimisation I had when
'encoding'.
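
For reference, a minimal sketch of the `unsplit` trick (illustrative only; the payload and buffer handling are not the actual tokio-postgres codec code):

```rust
use bytes::{BufMut, BytesMut};

fn encode_and_reclaim(buf: &mut BytesMut) {
    // Split off the unused capacity at the end of the buffer...
    let mut tail = buf.split_off(buf.len());
    // ...encode into it (put_slice only grows `tail` if its capacity runs out)...
    tail.put_slice(b"message payload");
    // ...and merge it back. If the two halves are still contiguous this is an
    // O(1) pointer/index update with no copy or reallocation.
    buf.unsplit(tail);
}
```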
2025-07-25 09:03:21 +00:00
quantumish
991fb507c9 Merge branch 'communicator-rewrite' into quantumish/lfc-resize-static-shmem 2025-07-24 19:44:21 -07:00
quantumish
7f63cecd5f Add LFC resizing implementation and utilities for hole punching 2025-07-24 19:25:13 -07:00
Tristan Partin
b623fbae0c Cancel PG query if stuck at refreshing configuration (#12717)
## Problem

While configuring or reconfiguring PG due to PageServer movements, PG
may get stuck if a PageServer is moved around after the spec has been
fetched from the StorageController.
## Summary of changes

To fix this issue, this PR introduces two changes:
1. Fail the PG query directly if it cannot request a configuration
refresh after a certain number of attempts.
2. Introduce a new state `RefreshConfiguration` in compute tools to
differentiate it from `RefreshConfigurationPending`. If the compute tool
is already in the `RefreshConfiguration` state, it will not accept new
configuration refresh requests.
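
A minimal sketch of the state distinction in point 2 (names follow the PR text; the surrounding compute_ctl machinery is elided):

```rust
// Sketch only: the real compute_ctl state enum has more variants and fields.
enum ComputeStatus {
    Running,
    Failed,
    // A refresh has been requested but the configurator has not picked it up yet.
    RefreshConfigurationPending,
    // The configurator is actively fetching/applying a config; further refresh
    // requests are rejected instead of queued.
    RefreshConfiguration,
}

fn accept_refresh_request(status: &ComputeStatus) -> bool {
    !matches!(status, ComputeStatus::RefreshConfiguration)
}
```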

## How is this tested?
Chaos testing.

Co-authored-by: Chen Luo <chen.luo@databricks.com>
2025-07-25 00:01:59 +00:00
quantumish
6f3361b3de Make references all 'static 2025-07-24 14:08:26 -07:00
Tristan Partin
512210bb5a [BRC-2368] Add PS and compute_ctl metrics to report pagestream request errors (#12716)
## Problem

In our experience running the system so far, almost all of the "hang
compute" situations are due to the compute (postgres) pointing at the
wrong pageservers. We currently mainly rely on the prometheus exporter
(PGExporter) running on PG to detect and report any downtime, but this
can be unreliable because the read and write probes the PGExporter runs
do not always generate pageserver requests due to caching, even though
a real user might be experiencing downtime when touching uncached
pages.

We are also about to start disk-wiping node pool rotation operations in
prod clusters for our pageservers, and it is critical to have a
convenient way to monitor the impact of these node pool rotations so
that we can quickly respond to any issues. These metrics should provide
very clear signals to address this operational need.

## Summary of changes

Added a pair of metrics to detect issues in postgres' PageStream
protocol communications with pageservers (e.g. get_page_at_lsn,
get_base_backup):
* On the compute node (compute_ctl), exports a counter metric that is
incremented every time postgres requests a configuration refresh.
Postgres today only requests these configuration refreshes when it
cannot connect to a pageserver or if the pageserver rejects its request
by disconnecting.
* On the pageserver, exports a counter metric that is incremented every
time it receives a PageStream request that cannot be handled because the
tenant is not known or if the request was routed to the wrong shard
(e.g. secondary).
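
For the pageserver-side counter, a hedged sketch of what the registration could look like with the prometheus crate (the metric name and helper are assumptions, not the names actually added in this PR):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter, IntCounter};

// Hypothetical metric name; the actual name in the PR may differ.
static MISROUTED_PAGESTREAM_REQUESTS: Lazy<IntCounter> = Lazy::new(|| {
    register_int_counter!(
        "pageserver_misrouted_pagestream_requests_total",
        "PageStream requests that could not be handled because the tenant is \
         unknown or the request was routed to the wrong shard"
    )
    .expect("failed to register counter")
});

fn on_misrouted_pagestream_request() {
    // Called from the request handler when tenant lookup or shard routing fails.
    MISROUTED_PAGESTREAM_REQUESTS.inc();
}
```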

### How I plan to use metrics
I plan to use the metrics added here to create alerts. The alerts can
fire, for example, if these counters have been continuously increasing
for over a certain period of time. During rollouts, misrouted requests
may occasionally happen, but they should soon die down as
reconfigurations make progress. We can start with something like raising
the alert if the counters have been increasing continuously for over 5
minutes.

## How is this tested?

New integration tests in
`test_runner/regress/test_hadron_ps_connectivity_metrics.py`

Co-authored-by: William Huang <william.huang@databricks.com>
2025-07-24 19:05:00 +00:00
HaoyuHuang
9eebd6fc79 A few more compute_ctl changes (#12713)
## Summary of changes
A bunch of no-op changes. 

The only other thing is that the lock is released early in the terminate
func.
2025-07-24 19:01:30 +00:00
Tristan Partin
11527b9df7 [BRC-2951] Enforce PG backpressure parameters at the shard level (#12694)
## Problem
Currently PG backpressure parameters are enforced globally. With tenant
splitting, this makes it hard to balance small and large tenants. For
large tenants with more shards, we need to increase the lag limits
because each shard receives total/shard_count amount of data, but doing
so could be suboptimal for small tenants with fewer shards.

## Summary of changes
This PR makes these parameters enforced at the shard level, i.e.,
PG will compute the actual lag limit by multiplying by the shard count.
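
In other words, something along these lines (a sketch only; the actual GUC names and code are not shown here):

```rust
/// Effective backpressure lag limit enforced per shard: the configured
/// per-tenant limit scaled by the shard count, since each shard only
/// receives roughly total/shard_count of the data.
fn effective_lag_limit_bytes(configured_limit_bytes: u64, shard_count: u32) -> u64 {
    configured_limit_bytes.saturating_mul(shard_count as u64)
}
```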

## How is this tested?
Added regression test.

Co-authored-by: Chen Luo <chen.luo@databricks.com>
2025-07-24 18:41:29 +00:00
Tristan Partin
89554af1bd [BRC-1778] Have PG signal compute_ctl to refresh configuration if it suspects that it is talking to the wrong PSs (#12712)
## Problem

This is a follow-up to TODO, as part
of the effort to rewire the compute reconfiguration/notification
mechanism to make it more robust. Please refer to that commit or ticket
BRC-1778 for full context of the problem.

## Summary of changes

The previous change added a mechanism in `compute_ctl` that makes it
possible to refresh the configuration of PG on demand by having
`compute_ctl` go out to download a new config from the control
plane/HCC. This change wired this mechanism up with PG so that PG will
signal `compute_ctl` to refresh its configuration when it suspects that
it could be talking to incorrect pageservers due to a stale
configuration.

PG will become suspicious that it is talking to the wrong pageservers in
the following situations:
1. It cannot connect to a pageserver (e.g., getting a network-level
connection refused error)
2. It can connect to a pageserver, but the pageserver does not return
any data for the GetPage request
3. It can connect to a pageserver, but the pageserver returns a
malformed response
4. It can connect to a pageserver, but there is an error receiving the
GetPage request response for any other reason

This change also includes a minor tweak to `compute_ctl`'s config
refresh behavior. Upon receiving a request to refresh PG configuration,
`compute_ctl` will reach out to download a config, but it will not
attempt to apply the configuration if the config is the same as the old
config it is replacing. This optimization is added because the act of
reconfiguring itself requires working pageserver connections. In many
failure situations it is likely that PG detects an issue with a
pageserver before the control plane can detect the issue, migrate
tenants, and update the compute config. In this case even the latest
compute config won't point PG to working pageservers, causing the
configuration attempt to hang and negatively impact PG's
time-to-recovery. With this change, `compute_ctl` only attempts
reconfiguration if the refreshed config points PG to different
pageservers.
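
A minimal sketch of that "skip no-op refresh" check (the spec type and field are placeholders, not the actual compute_ctl types):

```rust
// Placeholder for the pageserver-related part of the compute spec.
#[derive(PartialEq)]
struct PageserverShards {
    connstrings: Vec<String>, // one connstring per shard
}

fn should_apply_refreshed_config(current: &PageserverShards, refreshed: &PageserverShards) -> bool {
    // Only reconfigure when the refreshed config points PG at different
    // pageservers; otherwise applying it would just hang on the same broken
    // connections and hurt time-to-recovery.
    current != refreshed
}
```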

## How is this tested?

The new code paths are exercised in all existing tests because this
mechanism is on by default.

Explicitly tested in `test_runner/regress/test_change_pageserver.py`.

Co-authored-by: William Huang <william.huang@databricks.com>
2025-07-24 16:44:45 +00:00
Peter Bendel
f391186aa7 TPC-C like periodic benchmark using benchbase (#12665)
## Problem

We don't have a well-documented, periodic benchmark for a TPC-C-like OLTP
workload.

## Summary of changes

# Benchbase TPC-C-like Performance Results

Runs TPC-C-like benchmarks on Neon databases using
[Benchbase](https://github.com/cmu-db/benchbase).
Docker images are built
[here](https://github.com/neondatabase-labs/benchbase-docker-images)

We run the benchmarks at different scale factors aligned with different
compute sizes we offer to customers.
For each scale factor, we determine a max rate (see Throughput in warmup
phase) and then run the benchmark at a target rate of approx. 70 % of
the max rate.
We use different warehouse sizes which determine the working set size -
it is optimized for the LFC size of the respective pricing tier.
Usually we should get LFC hit rates above 70 % for this setup and quite
good, consistent (non-flaky) latencies.

## Expected performance as of first testing this

| Tier | CU | Warehouses | Terminals | Max TPS | LFC size | Working set size | LFC hit rate | Median latency | p95 latency |
|------------|--------|---------------|-----------|---------|----------|------------------|--------------|----------------|-------------|
| free | 0.25-2 | 50 - 5 GB | 150 | 800 | 5 GB | 6.3 GB | 95 % | 170 ms | 600 ms |
| serverless | 2-8 | 500 - 50 GB | 230 | 2000 | 26 GB | ?? GB | 91 % | 50 ms | 200 ms |
| business | 2-16 | 1000 - 100 GB | 330 | 2900 | 51 GB | 50 GB | 72 % | 40 ms | 180 ms |

Each run
- first loads the database (not shown in the dashboard),
- then runs a warmup phase for 20 minutes at an unlimited target rate
(max rate) to warm up the database and the LFC (highest throughput but
flaky latencies). The warmup phase can be used to determine the max rate
and adjust it in the GitHub workflow in case Neon is faster in the future,
- then runs the benchmark at a target rate of approx. 70 % of the max
rate for 1 hour (expecting consistent latencies and throughput).

## Important notes on implementation:
- We want to eventually publish the process for reproducing these
benchmarks.
- Thus we want to reduce all dependencies necessary to run the
benchmark; the only things needed are
   - docker
   - the docker images referenced above for benchbase
   - python >= 3.9 to run some config generation steps and create diagrams
- To reduce dependencies we deliberately do NOT use some of our python
fixture test infrastructure, to keep the dependency chain really small -
so please don't add a review comment "should reuse fixture xy".
- We also upload all generator python scripts, generated bash shell
scripts and configs, as well as raw results, to an S3 bucket that we later
want to publish once this benchmark is reviewed and approved.
2025-07-24 16:26:54 +00:00
Paul Banks
94b41b531b storecon: Fix panic due to race with chaos migration on staging (#12727)
## Problem

* Fixes LKB-743

We get regular assertion failures on staging caused by a race with the
chaos injector. If the chaos injector decides to migrate a tenant shard
between the background optimisation planning and the application of the
optimisations, then we attempt to migrate an already migrated shard and
hit an assertion failure.

## Summary of changes

@VladLazar fixed a variant of this issue by adding
`validate_optimization` recently; however, it didn't validate the
specific property this other assertion requires. The fix is just to
update it to cover all the expected properties.
2025-07-24 16:14:47 +00:00
Erik Grinaker
d793088225 pgxn: set MACOSX_DEPLOYMENT_TARGET (#12723)
## Problem

Compiling `neon-pg-ext-v17` results in these linker warnings for
`libcommunicator.a`:

```
$ make -j`nproc` -s neon-pg-ext-v17
Installing PostgreSQL v17 headers
Compiling PostgreSQL v17
Compiling neon-specific Postgres extensions for v17
ld: warning: object file (/Users/erik.grinaker/Projects/neon/target/debug/libcommunicator.a[1159](25ac62e5b3c53843-curve25519.o)) was built for newer 'macOS' version (15.5) than being linked (15.0)
ld: warning: object file (/Users/erik.grinaker/Projects/neon/target/debug/libcommunicator.a[1160](0bbbd18bda93c05b-aes_nohw.o)) was built for newer 'macOS' version (15.5) than being linked (15.0)
ld: warning: object file (/Users/erik.grinaker/Projects/neon/target/debug/libcommunicator.a[1161](00c879ee3285a50d-montgomery.o)) was built for newer 'macOS' version (15.5) than being linked (15.0)
[...]
```

## Summary of changes

Set `MACOSX_DEPLOYMENT_TARGET` to the current local SDK version (15.5 in
this case), which links against object files for that version.
2025-07-24 14:48:35 +00:00
John Spray
67ad420e26 tests: turn down error rate in test_compute_pageserver_connection_stress (#12721)
## Problem

Compute retries are finite (e.g. 5x in a basebackup) -- with a 50%
failure rate we have a pretty good chance of exceeding that and the test
failing.

Fixes: https://databricks.atlassian.net/browse/LKB-2278

## Summary of changes

- Turn connection error rate down to 20%

Co-authored-by: John Spray <john.spray@databricks.com>
2025-07-24 14:42:39 +00:00
Tristan Partin
90cd5a5be8 [BRC-1778] Add mechanism to compute_ctl to pull a new config (#12711)
## Problem

We have been dealing with a number of issues with the SC compute
notification mechanism. Various race conditions exist in the
PG/HCC/cplane/PS distributed system, and relying on the SC to send
notifications to the compute node to notify it of PS changes is not
robust. We decided to pursue a more robust option where the compute node
itself discovers whether it may be pointing to the incorrect PSs and
proactively reconfigures itself if issues are suspected.

## Summary of changes

To support this self-healing reconfiguration mechanism several pieces
are needed. This PR adds a mechanism to `compute_ctl` called "refresh
configuration", where the compute node reaches out to the control plane
to pull a new config and reconfigure PG using the new config, instead of
listening for a notification message containing a config to arrive from
the control plane. Main changes to compute_ctl:

1. The `compute_ctl` state machine now has a new State,
`RefreshConfigurationPending`. The compute node may enter this state
upon receiving a signal that it may be using the incorrect page servers.
2. Upon entering the `RefreshConfigurationPending` state, the background
configurator thread in `compute_ctl` wakes up, pulls a new config from
the control plane, and reconfigures PG (with `pg_ctl reload`) according
to the new config.
3. The compute node may enter the new `RefreshConfigurationPending`
state from `Running` or `Failed` states. If the configurator managed to
configure the compute node successfully, it will enter the `Running`
state, otherwise, it stays in `RefreshConfigurationPending` and the
configurator thread will wait for the next notification if an incorrect
config is still suspected.
4. Added various plumbing in `compute_ctl` data structures to allow the
configurator thread to perform the config fetch.

The "incorrect config suspected" notification is delivered using a HTTP
endpoint, `/refresh_configuration`, on `compute_ctl`. This endpoint is
currently not called by anyone other than the tests. In a follow up PR I
will set up some code in the PG extension/libpagestore to call this HTTP
endpoint whenever PG suspects that it is pointing to the wrong page
servers.
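
As an illustration only, the endpoint could be wired up roughly like this with an axum-style router (the real compute_ctl HTTP server, state handling, and signalling are not shown and may differ):

```rust
use axum::{http::StatusCode, routing::post, Router};

// Illustrative handler: flip the state machine to RefreshConfigurationPending
// and wake the configurator thread (elided here), then reply immediately.
async fn refresh_configuration() -> StatusCode {
    StatusCode::ACCEPTED
}

fn router() -> Router {
    Router::new().route("/refresh_configuration", post(refresh_configuration))
}
```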

## How is this tested?

Modified `test_runner/regress/test_change_pageserver.py` to add a
scenario where we use the new `/refresh_configuration` mechanism instead
of the existing `/configure` mechanism (which requires us sending a full
config to compute_ctl) to have the compute node reload and reconfigure
its pageservers.

I took one shortcut to reduce the scope of this change when it comes to
testing: the compute node uses a local config file instead of pulling a
config over the network from the HCC. This simplifies the test setup in
the following ways:
* The existing test framework is set up to use local config files for
compute nodes only, so it's convenient if I just stick with it.
* The HCC today generates a compute config with production settings
(e.g., assuming 4 CPUs, 16GB RAM, with local file caches), which is
probably not suitable in tests. We may need to add another test-only
endpoint config to the control plane to make this work.

The config-fetch part of the code is relatively straightforward (and
well-covered in both production and the KIND test) so it is probably
fine to replace it with loading from the local config file for these
integration tests.

In addition to making sure that the tests pass, I also manually
inspected the logs to make sure that the compute node is indeed
reloading the config using the new mechanism instead of going down the
old `/configure` path (it turns out the test has bugs which cause
compute `/configure` messages to be sent despite the test intending to
disable/blackhole them).

```test
2024-09-24T18:53:29.573650Z  INFO http request{otel.name=/refresh_configuration http.method=POST}: serving /refresh_configuration POST request
2024-09-24T18:53:29.573689Z  INFO configurator_main_loop: compute node suspects its configuration is out of date, now refreshing configuration
2024-09-24T18:53:29.573706Z  INFO configurator_main_loop: reloading config.json from path: /workspaces/hadron/test_output/test_change_pageserver_using_refresh[release-pg16]/repo/endpoints/ep-1/spec.json
PG:2024-09-24 18:53:29.574 GMT [52799] LOG:  received SIGHUP, reloading configuration files
PG:2024-09-24 18:53:29.575 GMT [52799] LOG:  parameter "neon.extension_server_port" cannot be changed without restarting the server
PG:2024-09-24 18:53:29.575 GMT [52799] LOG:  parameter "neon.pageserver_connstring" changed to "postgresql://no_user@localhost:15008"
...
```

Co-authored-by: William Huang <william.huang@databricks.com>
2025-07-24 14:26:21 +00:00
Christian Schwarz
643448b1a2 test_hot_standby_gc: work around standby_horizon-related flakiness/raciness uncovered by #12431 (#12704)
PR #12431 set initial lease deadline = 0s for tests.
This made test_hot_standby_gc flaky because it now runs GC: it started
failing with `tried to request a page version that was garbage
collected`, because the replica reads below the applied GC cutoff.

The leading theory is that we run timeline_gc() before the first
standby_horizon push arrives at the PS. That is definitely something that
can happen with the current standby_horizon mechanism, and it's now
tracked as such in https://databricks.atlassian.net/browse/LKB-2499.

We don't have logs to confirm this theory, but regardless, let's try the
fix in this PR and see if it stabilizes things.

Refs
- flaky test issue: https://databricks.atlassian.net/browse/LKB-2465

2025-07-24 14:00:22 +00:00
Conrad Ludgate
8daebb6ed4 [proxy] remove TokioMechanism and HyperMechanism (#12672)
Another go at #12341. LKB-2497

We now only need 1 connect mechanism (and 1 more for testing) which
saves us some code and complexity. We should be able to remove the final
connect mechanism when we create a separate worker task for
pglb->compute connections - either via QUIC streams or via in-memory
channels.

This also now ensures that connect_once always returns a ConnectionError
type - something simple enough we can probably define a serialisation
for in pglb.

* I've abstracted connect_to_compute to always use TcpMechanism and the
ProxyConfig.
* I've abstracted connect_to_compute_and_auth to perform authentication,
managing any retries for stale computes
* I had to introduce a separate `managed` function for taking ownership
of the compute connection into the Client/Connection pair
2025-07-24 12:37:04 +00:00
Alexey Kondratov
ab14521ea5 fix(compute): Turn off database collector in postgres_exporter (#12684)
## Problem

`postgres_exporter` has the database collector enabled by default, and it
doesn't filter out invalid databases, see

06a553c816/collector/pg_database.go (L67)

so if it hits one, it starts spamming logs:
```
ERROR:  [NEON_SMGR] [reqid d9700000018] could not read db size of db 705302 from page server at lsn 5/A2457EB0
```

## Summary of changes

We don't use `pg_database_size_bytes` metric anyway, see

5e19b3fd89/apps/base/compute-metrics/scrape-compute-pg-exporter-neon.yaml (L29)
so just turn it off by passing `--no-collector.database`.
2025-07-24 11:52:31 +00:00
dependabot[bot]
e82021d6fe build(deps): bump the npm_and_yarn group across 1 directory with 2 updates (#12678)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-24 10:51:09 +00:00
Conrad Ludgate
9997661138 [proxy/tokio-postgres] garbage collection for codec buffers (#12701)
## Problem

A large insert or a large row will cause the codec to allocate a large
buffer. The codec never shrinks the buffer however. LKB-2496

## Summary of changes

1. Introduce a naive GC system for codec buffers
2. Try and reduce copies as much as possible
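
A naive sketch of the GC idea in point 1 (the threshold value is an assumption; the real heuristics may differ):

```rust
use bytes::BytesMut;

// After a flush, if the now-empty codec buffer has grown far beyond some
// threshold, drop it so the large allocation is released instead of being
// kept alive forever.
const MAX_IDLE_BUF_CAPACITY: usize = 64 * 1024;

fn gc_codec_buffer(buf: &mut BytesMut) {
    if buf.is_empty() && buf.capacity() > MAX_IDLE_BUF_CAPACITY {
        *buf = BytesMut::new(); // frees the oversized allocation
    }
}
```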
2025-07-24 10:30:02 +00:00
Ivan Efremov
0e427fc117 Update proxy-bench workflow to use bare-metal script (#12703)
Pass the params for run.sh in proxy-bench repo to use bare-metal config.
Fix the paths and cleanup procedure.
2025-07-24 08:23:07 +00:00
Heikki Linnakangas
0e0aff7b8c fix metrics when not using the new communicator 2025-07-24 01:40:32 +03:00
Tristan Partin
9b2e6f862a Set an upper limit on PG backpressure throttling (#12675)
## Problem
Tenant split test revealed another bug with PG backpressure throttling
that under some cases PS may never report its progress back to SK (e.g.,
observed when aborting tenant shard where the old shard needs to
re-establish SK connection and re-ingest WALs from a much older LSN). In
this case, PG may get stuck forever.

## Summary of changes
As a general precaution, since the PS feedback mechanism may not always be
reliable, this PR uses the previously introduced WAL write rate limit
mechanism to slow down write rates instead of completely pausing them. The
idea is to introduce a new
`databricks_effective_max_wal_bytes_per_second`, which is set to
`databricks_max_wal_mb_per_second` when there is no PS backpressure and is
set to `10KB` when there is backpressure. This way, PG can still write to
the SK, though at a very low speed.

The PR also fixes the problem that the current WAL rate limiting
mechanism is too coarse-grained and cannot enforce limits < 1MB. This is
because it always resets the rate limiter after 1 second, even if PG has
written more data than the limit allows in the past second. The fix is to
introduce a `batch_end_time_us` which records the expected end time of the
current batch. For example, if PG writes 10MB of data in a single batch,
and the max WAL write rate is set to `1MB/s`, then `batch_end_time_us`
will be set to 10 seconds later.
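
A rough sketch of that finer-grained limiter (field and function names are illustrative, not the actual GUC or code names):

```rust
use std::time::{Duration, Instant};

struct WalRateLimiter {
    max_bytes_per_second: u64, // assumed > 0
    batch_end_time: Instant,
}

impl WalRateLimiter {
    fn new(max_bytes_per_second: u64) -> Self {
        assert!(max_bytes_per_second > 0);
        Self { max_bytes_per_second, batch_end_time: Instant::now() }
    }

    /// Record a batch of `bytes` written and return how long the caller should
    /// wait before writing more. A 10 MB batch at 1 MB/s pushes the end time
    /// 10 seconds into the future instead of resetting after 1 second.
    fn record_batch(&mut self, bytes: u64) -> Duration {
        let batch_duration =
            Duration::from_secs_f64(bytes as f64 / self.max_bytes_per_second as f64);
        let now = Instant::now();
        let start = self.batch_end_time.max(now);
        self.batch_end_time = start + batch_duration;
        self.batch_end_time.saturating_duration_since(now)
    }
}
```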

## How is this tested?
Tweaked the existing test, and also did manual testing on dev. I set
`max_replication_flush_lag` to 1GB and loaded 500GB of pgbench tables.
PG is expected to get throttled periodically because the PS will
accumulate 4GB of data before flushing.

Results:
when PG is throttled:
```
9500000 of 3300000000 tuples (0%) done (elapsed 10.36 s, remaining 3587.62 s)
9600000 of 3300000000 tuples (0%) done (elapsed 124.07 s, remaining 42523.59 s)
9700000 of 3300000000 tuples (0%) done (elapsed 255.79 s, remaining 86763.97 s)
9800000 of 3300000000 tuples (0%) done (elapsed 315.89 s, remaining 106056.52 s)
9900000 of 3300000000 tuples (0%) done (elapsed 412.75 s, remaining 137170.58 s)
```

when PS just flushed:
```
18100000 of 3300000000 tuples (0%) done (elapsed 433.80 s, remaining 78655.96 s)
18200000 of 3300000000 tuples (0%) done (elapsed 433.85 s, remaining 78231.71 s)
18300000 of 3300000000 tuples (0%) done (elapsed 433.90 s, remaining 77810.62 s)
18400000 of 3300000000 tuples (0%) done (elapsed 433.96 s, remaining 77395.86 s)
18500000 of 3300000000 tuples (0%) done (elapsed 434.03 s, remaining 76987.27 s)
18600000 of 3300000000 tuples (0%) done (elapsed 434.08 s, remaining 76579.59 s)
18700000 of 3300000000 tuples (0%) done (elapsed 434.13 s, remaining 76177.12 s)
18800000 of 3300000000 tuples (0%) done (elapsed 434.19 s, remaining 75779.45 s)
18900000 of 3300000000 tuples (0%) done (elapsed 434.84 s, remaining 75489.40 s)
19000000 of 3300000000 tuples (0%) done (elapsed 434.89 s, remaining 75097.90 s)
19100000 of 3300000000 tuples (0%) done (elapsed 434.94 s, remaining 74712.56 s)
19200000 of 3300000000 tuples (0%) done (elapsed 498.93 s, remaining 85254.20 s)
19300000 of 3300000000 tuples (0%) done (elapsed 498.97 s, remaining 84817.95 s)
19400000 of 3300000000 tuples (0%) done (elapsed 623.80 s, remaining 105486.76 s)
19500000 of 3300000000 tuples (0%) done (elapsed 745.86 s, remaining 125476.51 s)
```

Co-authored-by: Chen Luo <chen.luo@databricks.com>
2025-07-23 22:37:27 +00:00
Tristan Partin
12e87d7a9f Add neon.lakebase_mode boolean GUC (#12714)
This GUC will become useful for temporarily disabling Lakebase-specific
features during the code merge.

Signed-off-by: Tristan Partin <tristan.partin@databricks.com>
2025-07-23 22:37:20 +00:00
Heikki Linnakangas
5a5ea9cb9f cargo fmt 2025-07-24 01:33:02 +03:00
Heikki Linnakangas
3d209dcaae Minor changes to minimize diff against 'main'
The `pgxn/neon/communicator/Cargo.lock` file was not used, since the
package is part of the workspace.
2025-07-24 00:42:00 +03:00
Heikki Linnakangas
f939691f6a remove leftover empty file 2025-07-24 00:27:49 +03:00
Mikhail
a56afee269 Accept primary compute spec in /promote, promotion corner cases testing (#12574)
https://github.com/neondatabase/cloud/issues/19011
- Accept `ComputeSpec` in `/promote` instead of just passing safekeepers
and LSN. Update API spec
- Add corner case tests for promotion when promotion or prewarm fails
(using failpoints)
- Print root error for prewarm and promotion in status handlers
2025-07-23 20:11:34 +00:00
Alex Chi Z.
9e6ca2932f fix(test): convert bool to lowercase when invoking neon-cli (#12688)
## Problem

There have been some inconsistencies in providing tenant config via
`tenant_create` versus other tenant config APIs, due to how the
properties are processed: in `tenant_create`, the test framework calls
neon-cli and therefore puts those properties on the cmdline. In other
cases, it's done via the HTTP API by directly serializing to JSON. When
using the cmdline, the program only accepts a serde bool, i.e.
true/false.

## Summary of changes

Convert Python bool into `true`/`false` when using neon-cli.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-07-23 18:56:37 +00:00
HaoyuHuang
63ea4b0579 A few more compute_tool changes (#12687)
## Summary of changes
All changes are no-ops except that the tracing-appender lib is upgraded
from 0.2.2 to 0.2.3.
2025-07-23 18:30:33 +00:00
Folke Behrens
20881ef65e otel: Use blocking reqwest in dedicated thread (#12699)
## Problem

OTel 0.28+ by default uses blocking operations in a dedicated thread and
doesn't start a tokio runtime. Reqwest as currently configured wants to
spawn tokio tasks.

## Summary of changes

Use blocking reqwest.

This PR just mitigates the current issue.
2025-07-23 18:21:36 +00:00
Conrad Ludgate
a695713727 [sql-over-http] Reset session state between pooled connection re-use (#12681)
Session variables can be set during one sql-over-http query and observed
on another when that pooled connection is re-used. To address this we
can use `RESET ALL;` before re-using the connection. LKB-2495

To be on the safe side, we can opt for a full `DISCARD ALL;`, but that
might have performance regressions since it also clears any query plans.
See pgbouncer docs
https://www.pgbouncer.org/config.html#server_reset_query.

`DISCARD ALL` is currently defined as:
```
CLOSE ALL;
SET SESSION AUTHORIZATION DEFAULT;
RESET ALL;
DEALLOCATE ALL;
UNLISTEN *;
SELECT pg_advisory_unlock_all();
DISCARD PLANS;
DISCARD TEMP;
DISCARD SEQUENCES;
```

I've opted to keep everything here except the `DISCARD PLANS`. I've
modified the code so that this query is executed in the background when
a connection is returned to the pool, rather than when taken from the
pool.
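
A minimal sketch of that background reset (the reset script mirrors `DISCARD ALL` minus `DISCARD PLANS`; how the connection is handed back to the pool afterwards is elided):

```rust
use tokio_postgres::Client;

// Everything from DISCARD ALL except DISCARD PLANS, per the description above.
const RESET_QUERY: &str = "CLOSE ALL; \
    SET SESSION AUTHORIZATION DEFAULT; \
    RESET ALL; \
    DEALLOCATE ALL; \
    UNLISTEN *; \
    SELECT pg_advisory_unlock_all(); \
    DISCARD TEMP; \
    DISCARD SEQUENCES;";

// Intended to be spawned as a background task when a connection is returned
// to the pool, so no caller waits on the (localhost) round trip.
async fn reset_session_state(client: &Client) -> Result<(), tokio_postgres::Error> {
    // batch_execute runs the whole script in one simple-query round trip.
    client.batch_execute(RESET_QUERY).await
}
```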

This should marginally improve performance for Neon RLS by removing 1
(localhost) round trip. I don't believe that keeping query plans could
be a security concern. It's a potential side channel, but I can't
imagine what you could extract from it.

---

Thanks to
https://github.com/neondatabase/neon/pull/12659#discussion_r2219016205
for probing the idea in my head.
2025-07-23 17:43:43 +00:00
Alex Chi Z.
5c57e8a11b feat(pageserver): rework reldirv2 rollout (#12576)
## Problem

LKB-197, #9516 

To make sure the migration path is smooth.

The previous plan was to store new relations in the new keyspace and old
ones in the old keyspace until they get dropped. This makes the migration
path hard, as we can't validate v2 writes and can't roll back. This patch
gives us a smoother migration path:

- The first time we enable reldirv2 for a tenant, we copy over
everything in the old keyspace to the new one. This might create a short
latency spike for the create-relation operation, but it's a one-off.
- After that, we have identical v1/v2 keyspaces and read/write both of
them. We validate reads every time we list the reldirs.
- If we are in `migrating` mode, use v1 as the source of truth and log a
warning for failed v2 operations. If we are in `migrated` mode, use v2
as the source of truth and error when writes fail.
- One compatibility test uses a dataset from the time when we enabled
reldirv2 (under the original rollout plan), which only has relations
written to the v2 keyspace instead of the v1 keyspace. We had to adjust
it accordingly.
- Add `migrated_at` in index_part to indicate the LSN where we did the
initialization.
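
A rough sketch of the dual-keyspace read described above (all types and the warning/error policy are simplified placeholders):

```rust
// Simplified placeholders for the real pageserver types.
#[derive(Clone, Debug, PartialEq)]
struct RelTag(String);

enum RelDirMode {
    Migrating, // v1 is the source of truth; v2 failures only log a warning
    Migrated,  // v2 is the source of truth; v2 failures are hard errors
}

fn list_rel_dirs(
    mode: RelDirMode,
    v1: Vec<RelTag>,
    v2: Result<Vec<RelTag>, String>,
) -> Result<Vec<RelTag>, String> {
    match (mode, v2) {
        (RelDirMode::Migrating, Ok(v2_dirs)) => {
            if v2_dirs != v1 {
                eprintln!("reldirv2 validation mismatch: v1={v1:?} v2={v2_dirs:?}");
            }
            Ok(v1)
        }
        (RelDirMode::Migrating, Err(e)) => {
            eprintln!("reldirv2 read failed: {e}");
            Ok(v1)
        }
        // In migrated mode, v2 is authoritative and errors propagate.
        (RelDirMode::Migrated, result) => result,
    }
}
```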

TODOs:

- Test if relv1 can be read below the migrated_at LSN.
- Move the initialization process to L0 compaction instead of doing it
on the write path.
- Disable relcache in the relv2 test case so that all code paths get
fully tested.

## Summary of changes

- New behavior of reldirv2 migration flags as described above.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-07-23 16:12:46 +00:00
Alexander Bayandin
84a2556c9f compute-node.Dockerfile: update bullseye-backports backports url (#12700)
## Problem

> bullseye-backports has reached end-of-life and is no longer supported
or updated

From: https://backports.debian.org/Instructions/

This causes the compute-node image build to fail with the following
error:
```
0.099 Err:5 http://deb.debian.org/debian bullseye-backports Release
0.099   404  Not Found [IP: 146.75.122.132 80]
...
1.293 E: The repository 'http://deb.debian.org/debian bullseye-backports Release' does not have a Release file.
```

## Summary of changes
- Use archive version of `bullseye-backports`
2025-07-23 14:45:52 +00:00
Erik Grinaker
f96c8f63c2 pageserver: route gRPC requests to child shards 2025-07-23 16:38:22 +02:00
Erik Grinaker
c8cdd25da4 Pass stripe size during shard map updates 2025-07-23 16:38:20 +02:00
Folke Behrens
90242416a6 otel: Use blocking reqwest in dedicated thread
OTel 0.28+ by default uses blocking operations in a dedicated thread.
2025-07-23 16:36:27 +02:00
Conrad Ludgate
761e9e0e1d [proxy] move read_info from the compute connection to be as late as possible (#12660)
Second attempt at #12130, now with a smaller diff.

This allows us to skip allocating for things like parameter status and
notices that we will either just forward untouched, or discard.

LKB-2494
2025-07-23 13:33:21 +00:00
Dmitrii Kovalkov
94cb9a79d9 safekeeper: generation aware timeline tombstones (#12482)
## Problem
With safekeeper migration in mind, we can now pull/exclude the timeline
multiple times within the same safekeeper. To avoid races between
out-of-order requests, we need to ignore pull/exclude requests if we have
already seen a higher generation.

- Closes: https://github.com/neondatabase/neon/issues/12186
- Closes: [LKB-949](https://databricks.atlassian.net/browse/LKB-949)

## Summary of changes
- Annotate timeline tombstones in safekeeper with request generation.
- Replace `ignore_tombstone` option with `mconf` in
`PullTimelineRequest`
- Switch membership in `pull_timeline` if the existing/pulled timeline
has an older generation.
- Refuse to switch membership if the timeline is being deleted
(`is_canceled`).
- Refuse to switch membership in compute greeting request if the
safekeeper is not a member of `mconf`.
- Pass `mconf` in `PullTimelineRequest` in safekeeper_service

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-07-23 11:01:04 +00:00
Heikki Linnakangas
6d8b1cc754 silence compiler warning about unused variable 2025-07-23 13:47:35 +03:00
Heikki Linnakangas
35da660200 more work on exposing LFC stats 2025-07-23 13:39:32 +03:00
Heikki Linnakangas
bfdd37b54e Fix segfault in unimplemented function
We need to implement this eventually, but for now let's at least
silence the segfault.

See also https://github.com/neondatabase/neon/pull/12696
2025-07-23 13:08:59 +03:00
Heikki Linnakangas
6cd1295d9f Refactor communicator process initialization when new communicator is not used
This should fix the 'cargo test' failures on xlog_utils tests, which
launch Postgres in stand-alone mode, i.e. without setting 'neon_tenant'
2025-07-23 13:01:19 +03:00
Erik Grinaker
eaec6e2fb4 Fix notify_local shard count 2025-07-23 11:16:35 +02:00
Heikki Linnakangas
f7e403eea1 Fix broken link in doc comment 2025-07-23 11:37:27 +03:00
Erik Grinaker
464ed0cbc7 rustfmt 2025-07-23 09:41:01 +02:00
Erik Grinaker
f55ccd2c17 Fix lints 2025-07-23 08:17:06 +02:00