rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-14 17:02:56 +00:00

Author	SHA1	Message	Date
Ivan Efremov	e27d674b8e	Update anstream package and deps Update because of cargo deny check error: error[unsound]: Unsoundness in anstream	2024-12-04 18:22:31 +02:00
a-masterov	dec2e2fb29	Create a branch for compute release (#9637 ) ## Problem We practice a manual release flow for the compute module. This will allow automation of the compute release process. ## Summary of changes The workflow was modified to make a compute release automatically on the branch release-compute. ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-12-04 13:10:00 +00:00
Erik Grinaker	699a213c5d	Display reqwest error source (#10004 ) ## Problem Reqwest errors don't include details about the inner source error. This means that we get opaque errors like: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config) ``` Instead of the more helpful: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out ``` Touches #9801. ## Summary of changes Include the source error for `reqwest::Error` wherever it's displayed.	2024-12-04 13:05:53 +00:00
Alexey Kondratov	9a4157dadb	feat(compute): Set default application_name for pgbouncer connections (#9973 ) ## Problem When a client specifies `application_name`, pgbouncer propagates it to Postgres. Yet, if the client doesn't do it, we have a hard time figuring out who opens a lot of Postgres connections (including the `cloud_admin` ones). See this investigation as an example: https://neondb.slack.com/archives/C0836R0RZ0D ## Summary of changes I haven't found this documented, but it looks like pgbouncer accepts standard Postgres connstring parameters in the connstring in the `[databases]` section, so put the default `application_name=pgbouncer` there. That way, we will always see who opens Postgres connections. I did tests, and if a client specifies an `application_name`, pgbouncer overrides this default, so it only works if it's not specified or set to blank `&application_name=` in the connection string. This is the last place we could potentially open some Postgres connections without `application_name`. Everything else should be either of two: 1. Direct client connections without `application_name`, but these should be strictly non-`cloud_admin` ones 2. Some ad-hoc internal connections, so if we see spikes of unidentified `cloud_admin` connections, we will need to investigate it again. Fixes neondatabase/cloud#20948	2024-12-04 13:05:31 +00:00
Conrad Ludgate	bd52822e14	feat(proxy): add option to forward startup params (#9979 ) (stacked on #9990 and #9995) Partially fixes #1287 with a custom option field to enable the fixed behaviour. This allows us to gradually roll out the fix without silently changing the observed behaviour for our customers. related to https://github.com/neondatabase/cloud/issues/15284	2024-12-04 12:58:35 +00:00
Folke Behrens	dcd016bbfc	Assign /libs/proxy/ to proxy team (#10003 )	2024-12-04 12:58:31 +00:00
Erik Grinaker	7b18e33997	pageserver: return proper status code for heatmap_upload errors (#9991 ) ## Problem During deploys, we see a lot of 500 errors due to heatmap uploads for inactive tenants. These should be 503s instead. Resolves #9574. ## Summary of changes Make the secondary tenant scheduler use `ApiError` rather than `anyhow::Error`, to propagate the tenant error and convert it to an appropriate status code.	2024-12-04 12:53:52 +00:00
Peter Bendel	9d75218ba7	fix parsing human time output like "50m37s" (#10001 ) ## Problem In ingest_benchmark.yml workflow we use pgcopydb tool to migrate project. pgcopydb logs human time. Our parsing of the human time doesn't work for times like "50m37s". [Example workflow](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:479) contains "57m45s" but we [reported](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:500) only the seconds part: 45.000 s ## Summary of changes add a regex pattern for Minute/Second combination	2024-12-04 11:37:24 +00:00
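The minute/second parsing described in the commit above can be sketched as follows. This is only an illustration, not the pattern used in the workflow: the function name `human_time_to_seconds` and the regex are assumptions, and the sketch deliberately rejects other suffixes such as `ms`.

```python
import re

# Hypothetical pattern (not the one from the repository): optional hours,
# minutes and seconds, e.g. "1h2m3s", "50m37s" or "45.5s".
_HUMAN_TIME = re.compile(
    r"^(?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>\d+(?:\.\d+)?)s)?$"
)


def human_time_to_seconds(text: str) -> float:
    """Convert a human-readable duration such as "50m37s" to seconds."""
    match = _HUMAN_TIME.match(text.strip())
    # Reject input that does not match, or that matches with no component at all.
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours = int(match["h"] or 0)
    minutes = int(match["m"] or 0)
    seconds = float(match["s"] or 0)
    return hours * 3600 + minutes * 60 + seconds


print(human_time_to_seconds("57m45s"))
```

With a pattern like this, the "57m45s" value from the linked workflow run contributes both the minutes and the seconds part instead of only the trailing 45 seconds.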
Peter Bendel	1b3558df7a	optimize parms for ingest bench (#9999 ) ## Problem we tried different parallelism settings for ingest bench ## Summary of changes the following settings seem optimal after merging - SK side Wal filtering - batched getpages Settings: - effective_io_concurrency 100 - concurrency limit 200 (different from Prod!) - jobs 4, maintenance workers 7 - 10 GB chunk size	2024-12-04 11:07:22 +00:00
Vlad Lazar	68205c48ed	storcon: return an error for drain attempts while paused (#9997 ) ## Problem We currently allow drain operations to proceed while the node policy is paused. ## Summary of changes Return a precondition failed error in such cases. The orchestrator is updated in https://github.com/neondatabase/infra/pull/2544 to skip drain and fills if the pageserver is paused. Closes: https://github.com/neondatabase/neon/issues/9907	2024-12-04 09:25:29 +00:00
Christian Schwarz	8d93d02c2f	page_service: enable batching in Rust & Python Tests + Python benchmarks (#9993 ) This is the first step towards batching rollout. Refs - rollout plan: https://github.com/neondatabase/cloud/issues/20620 - task https://github.com/neondatabase/neon/issues/9377 - uber-epic: https://github.com/neondatabase/neon/issues/9376	2024-12-04 00:07:49 +00:00
Alexander Bayandin	023821a80c	test_page_service_batching: fix non-numeric metrics (#9998 ) ## Problem ``` 2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local 2024-12-03T15:42:49.5325077Z Traceback (most recent call last): 2024-12-03T15:42:49.5325603Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module> 2024-12-03T15:42:49.5326029Z main() 2024-12-03T15:42:49.5326316Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main 2024-12-03T15:42:49.5326739Z ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp) 2024-12-03T15:42:49.5327488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2024-12-03T15:42:49.5327914Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result 2024-12-03T15:42:49.5328321Z psycopg2.extras.execute_values( 2024-12-03T15:42:49.5328940Z File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values 2024-12-03T15:42:49.5335618Z cur.execute(b''.join(parts)) 2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures" 2024-12-03T15:42:49.5336287Z LINE 57: 'concurrent-futures', 2024-12-03T15:42:49.5336462Z ^ ``` ## Summary of changes - `test_page_service_batching`: save non-numeric params as `labels` - Add a runtime check that `metric_value` is NUMERIC	2024-12-03 22:46:18 +00:00
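A minimal sketch of the idea in the commit above: keep numeric values as metrics and move everything else into labels before anything is written to a NUMERIC column. The helper name `split_params` and the parameter names in the usage line are hypothetical and are not taken from `ingest_perf_test_result.py`.

```python
from numbers import Real


def split_params(params: dict) -> tuple[dict, dict]:
    """Separate numeric values (metrics) from non-numeric ones (labels)."""
    metrics, labels = {}, {}
    for name, value in params.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, Real) and not isinstance(value, bool):
            metrics[name] = value
        else:
            labels[name] = str(value)
    return metrics, labels


metrics, labels = split_params(
    {"max_batch_size": 32, "execution": "concurrent-futures"}
)
print(metrics, labels)
```

A string such as "concurrent-futures" then ends up in `labels` and never reaches the numeric insert that failed in the traceback.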
Christian Schwarz	944c1adc4c	tests & benchmarks: unify the way we customize the default tenant config (#9992 ) Before this PR, some override callbacks used `.default()`, others used `.setdefault()`. As of this PR, all callbacks use `.setdefault()` which I think is least prone to failure. Aligning on a single way will set the right example for future tests that need such customization. The `test_pageserver_getpage_throttle.py` technically is a change in behavior: before, it replaced the `tenant_config` field, now it just configures the throttle. This is what I believe is intended anyway.	2024-12-03 22:07:03 +00:00
Arpad Müller	ca85f364ba	Support tenant manifests in the scrubber (#9942 ) Support tenant manifests in the storage scrubber: * list the manifests, order them by generation * delete all manifests except for the two most recent generations * for the latest manifest: try parsing it. I've tested this patch by running it against a staging bucket and it successfully deleted stuff (and avoided deleting the latest two generations). In follow-up work, we might want to also check some invariants of the manifest, as mentioned in #8088. Part of #9386 Part of #8088 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-12-03 20:39:10 +00:00
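The retention rule from the commit above (order by generation, keep the two most recent, delete the rest) can be sketched as below. The scrubber itself is written in Rust; this Python helper, its name `manifests_to_delete`, and the use of plain integers for generations are assumptions made for illustration.

```python
def manifests_to_delete(generations: list[int], keep: int = 2) -> list[int]:
    """Return the generations whose manifests may be deleted.

    Manifests are ordered by generation and the `keep` most recent
    generations are retained.
    """
    ordered = sorted(set(generations))
    if keep <= 0:
        return ordered
    return ordered[:-keep]


print(manifests_to_delete([3, 1, 5, 4]))
```

When two or fewer generations exist, the slice is empty and nothing is deleted.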
Conrad Ludgate	9ef0662a42	chore(proxy): enforce single host+port (#9995 ) proxy doesn't ever provide multiple hosts/ports, so this code adds a lot of complexity of error handling for no good reason. (stacked on #9990)	2024-12-03 20:00:14 +00:00
Alexey Immoreev	3baef0bca3	Improvement: add console redirect timeout warning (#9985 ) ## Problem Currently there is no information about the session being cancelled after 2 minutes ## Summary of changes The timeout is now logged for the user	2024-12-03 18:59:44 +00:00
Erik Grinaker	f312c6571f	pageserver: respond to multiple shutdown signals (#9982 ) ## Problem The Pageserver signal handler would only respond to a single signal and initiate shutdown. Subsequent signals were ignored. This meant that a `SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a slow or stalled shutdown). The `test_runner` uses this to force shutdown if graceful shutdown is slow. Touches #9740. ## Summary of changes Keep responding to signals after the initial shutdown signal has been received. Arguably, the `test_runner` should also use `SIGKILL` rather than `SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT` regardless.	2024-12-03 18:47:17 +00:00
Conrad Ludgate	27a42d0f96	chore(proxy): remove postgres config parser and md5 support (#9990 ) Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres allows us to remove a lot of dead weight from our actual postgres connection logic.	2024-12-03 18:39:23 +00:00
John Spray	b04ab468ee	pageserver: more detailed logs when calling re-attach (#9996 ) ## Problem We saw a peculiar case where a pageserver apparently got a 0-tenant response to `/re-attach` but we couldn't see the request landing on a storage controller. It was hard to confirm retrospectively that the pageserver was configured properly at the moment it sent the request. ## Summary of changes - Log the URL to which we are sending the request - Log the NodeId and metadata that we sent	2024-12-03 18:36:37 +00:00
John Spray	dcb629532b	pageserver: only store SLRUs & aux files on shard zero (#9786 ) ## Problem Since https://github.com/neondatabase/neon/pull/9423 the non-zero shards no longer need SLRU content in order to do GC. This data is now redundant on shards >0. One release cycle after merging that PR, we may merge this one, which also stops writing those pages to shards > 0, reaping the efficiency benefit. Closes: https://github.com/neondatabase/neon/issues/7512 Closes: https://github.com/neondatabase/neon/issues/9641 ## Summary of changes - Avoid storing SLRUs on non-zero shards - Bonus: avoid storing aux files on non-zero shards	2024-12-03 17:22:49 +00:00
John Spray	71d004289c	storcon: in shard splits, inherit parent's AZ (#9946 ) ## Problem Sharded tenants should be run in a single AZ for best performance, so that computes have AZ-local latency to all the shards. Part of https://github.com/neondatabase/neon/issues/8264 ## Summary of changes - When we split a tenant, instead of updating each shard's preferred AZ to wherever it is scheduled, propagate the preferred AZ from the parent. - Drop the check in `test_shard_preferred_azs` that asserts shards end up in their preferred AZ: this will not be true again until the optimize_attachment logic is updated to make this so. The existing check wasn't testing anything about scheduling, it was just asserting that we set preferred AZ in a way that matches the way things happen to be scheduled at time of split.	2024-12-03 16:55:00 +00:00
Christian Schwarz	4d422b937c	pageserver: only throttle pagestream requests & bring back throttling deduction for smgr latency metrics (#9962 ) ## Problem In the batching PR - https://github.com/neondatabase/neon/pull/9870 I stopped deducting the time-spent-in-throttle from latency metrics, i.e., - smgr latency metrics (`SmgrOpTimer`) - basebackup latency (+scan latency, which I think is part of basebackup). The reason for stopping the deduction was that with the introduction of batching, the trick with tracking time-spent-in-throttle inside RequestContext and swap-replacing it from the `impl Drop for SmgrOpTimer` no longer worked with >1 requests in a batch. However, deducting time-spent-in-throttle is desirable because our internal latency SLO definition does not account for throttling. ## Summary of changes - Redefine throttling to be a page_service pagestream request throttle instead of a throttle for repository `Key` reads through `Timeline::get` / `Timeline::get_vectored`. - This means reads done by `basebackup` are no longer subject to any throttle. - The throttle applies after batching, before handling of the request. - Drive-by fix: make throttle sensitive to cancellation. - Rename metric label `kind` from `timeline_get` to `pagestream` to reflect the new scope of throttling. To avoid config format breakage, we leave the config field named `timeline_get_throttle` and ignore the `task_kinds` field. This will be cleaned up in a future PR. ## Trade-Offs Ideally, we would apply the throttle before reading a request off the connection, so that we queue the minimal amount of work inside the process. However, that's not possible because we need to do shard routing. The redefinition of the throttle to limit pagestream request rate instead of repository `Key` rate comes with several downsides: - We're no longer able to use the throttle mechanism for other tasks, e.g. image layer creation. However, in practice, we never used that capability anyways. 
- We no longer throttle basebackup.	2024-12-03 15:25:58 +00:00
Erik Grinaker	bbe4dfa991	test_runner: use immediate shutdown in `test_sharded_ingest` (#9984 ) ## Problem `test_sharded_ingest` ingests a lot of data, which can cause shutdown to be slow e.g. due to local "S3 uploads" or compactions. This can cause test flakes during teardown. Resolves #9740. ## Summary of changes Perform an immediate shutdown of the cluster.	2024-12-03 14:33:31 +00:00
Erik Grinaker	dcb24ce170	safekeeper,pageserver: add heap profiling (#9778 ) ## Problem We don't have good observability for memory usage. This would be useful e.g. to debug OOM incidents or optimize performance or resource usage. We would also like to use continuous profiling with e.g. [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/) (see https://github.com/neondatabase/cloud/issues/14888). This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches https://github.com/neondatabase/neon/issues/9534. Touches https://github.com/neondatabase/cloud/issues/14888. Depends on #9779. Depends on #9780. ## Summary of changes Adds an HTTP route `/profile/heap` that takes a heap profile and returns it. Query parameters: * `format`: output format (`jemalloc` or `pprof`; default `pprof`). Unlike CPU profiles (see #9764), heap profiles are not symbolized and require the original binary to translate addresses to function names. To make this work with Grafana, we'll probably have to symbolize the process server-side -- this is left as future work, as are other output formats like SVG. Heap profiles don't work on macOS due to limitations in jemalloc.	2024-12-03 11:35:59 +00:00
a-masterov	a2a942f93c	Add support for the extensions test for Postgres v17 (#9748 ) ## Problem The extensions for Postgres v17 are ready but we do not test the extensions shipped with v17 ## Summary of changes Build the test image based on Postgres v17. Run the tests for v17. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2024-12-03 11:25:29 +00:00
Christian Schwarz	cb10be710d	page_service: batching observability & include throttled time in smgr metrics (#9870 ) This PR - fixes smgr metrics https://github.com/neondatabase/neon/issues/9925 - adds an additional startup log line logging the current batching config - adds a histogram of batch sizes global and per-tenant - adds a metric exposing the current batching config The issue described in #9925 is that before this PR, request latency was only observed after batching. This means that smgr latency metrics (most importantly getpage latency) don't account for - `wait_lsn` time - time spent waiting for batch to fill up / the executor stage to pick up the batch. The fix is to use a per-request batching timer, like we did before the initial batching PR. We funnel those timers through the entire request lifecycle. I noticed that even before the initial batching changes, we weren't accounting for the time spent writing & flushing the response to the wire. This PR drive-by fixes that deficiency by dropping the timers at the very end of processing the batch, i.e., after the `pgb.flush()` call. I was unable to maintain the behavior that we deduct time-spent-in-throttle from various latency metrics. The reason is that we're using a single counter in `RequestContext` to track micros spent in throttle. But there are N metrics timers in the batch, one per request. As a consequence, the practice of consuming the counter in the drop handler of each timer no longer works because all but the first timer will encounter error `close() called on closed state`. A failed attempt to maintain the current behavior can be found in https://github.com/neondatabase/neon/pull/9951. So, this PR removes the deduction behavior from all metrics. 
I started a discussion on Slack about the implications this has for our internal SLO calculation: https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029 # Refs - fixes https://github.com/neondatabase/neon/issues/9925 - sub-issue https://github.com/neondatabase/neon/issues/9377 - epic: https://github.com/neondatabase/neon/issues/9376	2024-12-03 11:03:23 +00:00
Christian Schwarz	15d01b257a	storcon_cli tenant-describe: include tenant-wide information in output (#9899 ) Before this PR, the storcon_cli didn't have a way to show the tenant-wide information of the TenantDescribeResponse. Sadly, the `Serialize` impl for the tenant config doesn't skip on `None`, so, the output becomes a bit bloated. Maybe we can use `skip_serializing_if(Option::is_none)` in the future. => https://github.com/neondatabase/neon/issues/9983	2024-12-03 10:55:13 +00:00
John Spray	aaee713e53	storcon: use proper schedule context during node delete (#9958 ) ## Problem I was touching `test_storage_controller_node_deletion` because for AZ scheduling work I was adding a change to the storage controller (kick secondaries during optimisation) that made a FIXME in this test defunct. While looking at it I also realized that we can easily fix the way node deletion currently doesn't use a proper ScheduleContext, using the iterator type recently added for that purpose. ## Summary of changes - A testing-only behavior in storage controller where if a secondary location isn't yet ready during optimisation, it will be actively polled. - Remove workaround in `test_storage_controller_node_deletion` that previously was needed because optimisation would get stuck on cold secondaries. - Update node deletion code to use a `TenantShardContextIterator` and thereby a proper ScheduleContext	2024-12-03 08:59:38 +00:00
Alexey Kondratov	2e9207fdf3	fix(testing): Use 1 MB shared_buffers even with LFC (#9969 ) ## Problem After enabling LFC in tests and lowering `shared_buffers` we started having more problems with `test_pg_regress`. ## Summary of changes Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and still have enough room for Postgres to operate. Anything smaller might not be enough for Postgres under load, and can cause errors like 'no unpinned buffers available'. See Konstantin's comment [1] as well. Fixes #9956 [1]: https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097	2024-12-02 18:46:06 +00:00
Tristan Partin	d8ebd33fe6	Stop changing the value of neon.extension_server_port at runtime (#9972 ) On reconfigure, we no longer passed a port for the extension server which caused us to not write out the neon.extension_server_port line. Thus, Postgres thought we were setting the port to the default value of 0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the following log messages: > LOG: parameter "neon.extension_server_port" cannot be changed without restarting the server > LOG: configuration file "/var/db/postgres/compute/pgdata/postgresql.conf" contains errors; unaffected changes were applied Fixes: https://github.com/neondatabase/neon/issues/9945 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-02 18:06:19 +00:00
Conrad Ludgate	2dc238e5b3	feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971 ) Fix the HTTP AuthMethod to accommodate the JWT authorization method. Introduces the JWT issuer as an additional field in the parquet logs	2024-12-02 17:54:32 +00:00
Folke Behrens	243bca1c49	Bump OTel, tracing, reqwest crates (#9970 )	2024-12-02 17:24:48 +00:00
Arseny Sher	fa909c27fc	Update consensus protocol spec (#9607 ) The spec was written for the buggy protocol which we had before the one more similar to Raft was implemented. Update the spec with what we currently have. ref https://github.com/neondatabase/neon/issues/8699	2024-12-02 16:10:44 +00:00
Folke Behrens	1b60571636	proxy: Create Elasticache credentials provider lazily (#9967 ) ## Problem The credentials provider tries to connect to AWS STS even when we use plain Redis connections. ## Summary of changes * Construct the CredentialsProvider only when needed ("irsa").	2024-12-02 15:38:12 +00:00
Alexander Bayandin	c18716bb3f	CI(replication-tests): fix notifications about replication-tests failures (#9950 ) ## Problem `if: ${{ github.event.schedule }}` gets skipped if a previous step has failed, but we want to run the step for both `success` and `failure` ## Summary of changes - Add `!cancelled()` to notification step if-condition, to skip only cancelled jobs	2024-12-02 12:46:07 +00:00
Conrad Ludgate	cd1d2d1996	fix(proxy): forward notifications from authentication (#9948 ) Fixes https://github.com/neondatabase/cloud/issues/20973. This refactors `connect_raw` in order to return direct access to the delayed notices. I cannot find a way to test this with psycopg2 unfortunately, although testing it with psql does return the expected results.	2024-12-02 12:29:57 +00:00
John Spray	bd09369198	storcon: add metric for AZ scheduling violations (#9949 ) ## Problem We can't easily tell how far the state of shards is from their AZ preferences. This can be a cause of performance issues, so it's important for diagnosability that we can tell easily if there are significant numbers of shards that aren't running in their preferred AZ. Related: https://github.com/neondatabase/cloud/issues/15413 ## Summary of changes - In reconcile_all, count shards that are scheduled into the wrong AZ (if they have a preference), and publish it as a prometheus gauge. - Also calculate a statistic for how many shards wanted to reconcile but couldn't. This is clearly a lazy calculation: reconcile all only runs periodically. But that's okay: shards in the wrong AZ is something that only matters if it stays that way for some period of time.	2024-12-02 11:50:22 +00:00
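The counting step described in the commit above can be sketched in a few lines. The storage controller is written in Rust; the function name `count_az_violations` and the `(preferred_az, actual_az)` tuple representation are hypothetical stand-ins for its shard state.

```python
from typing import Iterable, Optional, Tuple


def count_az_violations(shards: Iterable[Tuple[Optional[str], str]]) -> int:
    """Count shards scheduled outside their preferred AZ.

    Each shard is a (preferred_az, actual_az) pair; shards without a
    preference (None) are never counted as violations.
    """
    return sum(
        1
        for preferred, actual in shards
        if preferred is not None and preferred != actual
    )


print(count_az_violations([("az-a", "az-a"), ("az-a", "az-b"), (None, "az-c")]))
```

The result is what would be published as the gauge value on each periodic pass.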
Erik Grinaker	5330122049	test_runner: improve `wait_until` (#9936 ) Improves `wait_until` by: * Use `timeout` instead of `iterations`. This allows changing the timeout/interval parameters independently. * Make `timeout` and `interval` optional (default 20s and 0.5s). Most callers don't care. * Only output status every 1s by default, and add optional `status_interval` parameter. * Remove `show_intermediate_error`, this was always emitted anyway. Most callers have been updated to use the defaults, except where they had good reason otherwise.	2024-12-02 10:26:15 +00:00
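A rough sketch of a `wait_until` helper with the parameters listed in the commit above (`timeout`, `interval`, `status_interval`, with the stated defaults of 20s, 0.5s and 1s). The body is an assumption about how such a helper could be written, not the `test_runner` implementation.

```python
import time


def wait_until(func, timeout=20.0, interval=0.5, status_interval=1.0):
    """Call `func` until it stops raising, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    last_status = float("-inf")
    while True:
        try:
            return func()
        except Exception as exc:
            now = time.monotonic()
            if now >= deadline:
                raise TimeoutError(f"timed out after {timeout}s") from exc
            # Report the intermediate error at most once per status_interval.
            if now - last_status >= status_interval:
                print(f"waiting: {exc}")
                last_status = now
            time.sleep(interval)
```

Because the deadline is tracked as wall-clock time rather than an iteration count, `interval` can be changed without affecting how long the caller waits overall.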
Anastasia Lubennikova	45658ccccb	Update pgvector to 0.8.0 (#9733 )	2024-12-02 10:10:51 +00:00
John Spray	14853a3284	storcon: don't take any Service locks in /status and /ready (#9944 ) ## Problem We saw unexpected container terminations when running in k8s with small CPU resource requests. The /status and /ready handlers called `maybe_forward`, which always takes the lock on Service::inner. If there is a lot of writer lock contention, and the container is starved of CPU, this increases the likelihood that we will get killed by the kubelet. It isn't certain that this was a cause of issues, but it is a potential source that we can eliminate. ## Summary of changes - Revise logic to return immediately if the URL is in the non-forwarded list, rather than calling maybe_forward	2024-12-01 18:09:58 +00:00
Konstantin Knizhnik	aad809b048	Fix issues with prefetch ring buffer resize (#9847 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479 We observe the following error in the logs ``` [XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128 ``` most likely caused by changing `neon.readahead_buffer_size` ## Summary of changes 1. Copy shard state 2. Do not use prefetch_set_unused in readahead_buffer_resize 3. Change prefetch buffer overflow criteria --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-01 15:47:28 +00:00
Alexander Bayandin	fae8e7ba76	Compute image: prepare Postgres v14-v16 for Debian 12 (#9954 ) ## Problem Current compute images for Postgres 14-16 don't build on Debian 12 because of issues with extensions. This PR fixes that, but for the current setup, it is mostly a no-op change. ## Summary of changes - Use `/bin/bash -euo pipefail` as SHELL to fail earlier - Fix `plv8` build: backport a trivial patch for v8 - Fix `postgis` build: depend `sfgal` version on Debian version instead of Postgres version Tested in: https://github.com/neondatabase/neon/pull/9849	2024-12-01 13:04:37 +00:00
Konstantin Knizhnik	97a9abd181	Add GUC controlling whether to pause recovery if some critical GUCs at replica have smaller value than on primary (#9057 ) ## Problem See https://github.com/neondatabase/neon/issues/9023 ## Summary of changes Add GUC `recovery_pause_on_misconfig` allowing not to pause in case of replica and primary configuration mismatch See https://github.com/neondatabase/postgres/pull/501 See https://github.com/neondatabase/postgres/pull/502 See https://github.com/neondatabase/postgres/pull/503 See https://github.com/neondatabase/postgres/pull/504 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-12-01 12:23:10 +00:00
Folke Behrens	4abc8e5282	Merge the consumption metric pushes (#9939 ) #8564 ## Problem The main and backup consumption metric pushes are completely independent, resulting in different event time windows and different idempotency keys. ## Summary of changes * Merge the push tasks, but keep chunks the same size.	2024-11-30 10:11:37 +00:00
Christian Schwarz	aa4ec11af9	page_service: rewrite batching to work without a timeout (#9851 ) # Problem The timeout-based batching adds latency to unbatchable workloads. We can choose a short batching timeout (e.g. 10us) but that requires high-resolution timers, which tokio doesn't have. I thoroughly explored options to use OS timers (see [this](https://github.com/neondatabase/neon/pull/9822) abandoned PR). In short, it's not an attractive option because any timer implementation adds non-trivial overheads. # Solution The insight is that, in the steady state of a batchable workload, the time we spend in `get_vectored` will be hundreds of microseconds anyway. If we prepare the next batch concurrently to `get_vectored`, we will have a sizeable batch ready once `get_vectored` of the current batch is done and do not need an explicit timeout. This can be reasonably described as pipelining of the protocol handler. # Implementation We model the sub-protocol handler for pagestream requests (`handle_pagerequests`) as two futures that form a pipeline: 1. Batching: read requests from the connection and fill the current batch 2. Execution: `take` the current batch, execute it using `get_vectored`, and send the response. The Reading and Batching stage are connected through a new type of channel called `spsc_fold`. See the long comment in the `handle_pagerequests_pipelined` for details. 
# Changes - Refactor `handle_pagerequests` - separate functions for - reading one protocol message; produces a `BatchedFeMessage` with just one page request in it - batching; tries to merge an incoming `BatchedFeMessage` into an existing `BatchedFeMessage`; returns `None` on success and returns back the incoming message in case merging isn't possible - execution of a batched message - unify the timeline handle acquisition & request span construction; it now happens in the function that reads the protocol message - Implement serial and pipelined model - serial: what we had before any of the batching changes - read one protocol message - execute protocol messages - pipelined: the design described above - optionality for execution of the pipeline: either via concurrent futures vs tokio tasks - Pageserver config - remove batching timeout field - add ability to configure pipelining mode - add ability to limit max batch size for pipelined configurations (required for the rollout, cf https://github.com/neondatabase/cloud/issues/20620 ) - ability to configure execution mode - Tests - remove `batch_timeout` parametrization - rename `test_getpage_merge_smoke` to `test_throughput` - add parametrization to test different max batch sizes and execution modes - rename `test_timer_precision` to `test_latency` - rename the test case file to `test_page_service_batching.py` - better descriptions of what the tests actually do ## On holding the `TimelineHandle` in the pending batch While batching, we hold the `TimelineHandle` in the pending batch. Therefore, the timeline will not finish shutting down while we're batching. This is not a problem in practice because the concurrently ongoing `get_vectored` call will fail quickly with an error indicating that the timeline is shutting down. This results in the Execution stage returning a `QueryError::Shutdown`, which causes the pipeline / entire page service connection to shut down. 
This drops all references to the `Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the contained `TimelineHandle`s. - => fixes https://github.com/neondatabase/neon/issues/9850 # Performance Local run of the benchmarks, results in [this empty commit](`1cf5b1463f`) in the PR branch. Key take-aways: * `concurrent-futures` and `tasks` deliver identical `batching_factor` * tail latency impact unknown, cf https://github.com/neondatabase/neon/issues/9837 * `concurrent-futures` has higher throughput than `tasks` in all workloads (=lower `time` metric) * In unbatchable workloads, `concurrent-futures` has 5% higher `CPU-per-throughput` than that of `tasks`, and 15% higher than that of `serial`. * In batchable-32 workload, `concurrent-futures` has 8% lower `CPU-per-throughput` than that of `tasks` (comparison to tput of `serial` is irrelevant) * in unbatchable workloads, mean and tail latencies of `concurrent-futures` is practically identical to `serial`, whereas `tasks` adds 20-30us of overhead Overall, `concurrent-futures` seems like a slightly more attractive choice. # Rollout This change is disabled-by-default. Rollout plan: - https://github.com/neondatabase/cloud/issues/20620 # Refs - epic: https://github.com/neondatabase/neon/issues/9376 - this sub-task: https://github.com/neondatabase/neon/issues/9377 - the abandoned attempt to improve batching timeout resolution: https://github.com/neondatabase/neon/pull/9820 - closes https://github.com/neondatabase/neon/issues/9850 - fixes https://github.com/neondatabase/neon/issues/9835	2024-11-30 00:16:24 +00:00
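The timeout-free batching idea from the commit above can be illustrated with a small `asyncio` sketch: while one batch executes, new requests keep accumulating, and the next batch is simply whatever is pending when execution finishes. This is a simplified model under stated assumptions. It uses a plain `asyncio.Queue` rather than the `spsc_fold` channel, and `read_stage`, `execute_stage`, `run_pipeline` and the sleep standing in for `get_vectored` are all hypothetical names, not pageserver code.

```python
import asyncio

_DONE = object()  # sentinel marking the end of the request stream


async def read_stage(requests, queue):
    """Read requests one by one and hand them to the execution stage."""
    for req in requests:
        await queue.put(req)
    await queue.put(_DONE)


async def execute_stage(queue, max_batch_size, execute):
    """Take whatever accumulated while the previous batch ran; no timer."""
    batches = []
    finished = False
    while not finished:
        item = await queue.get()  # block only when nothing is pending
        if item is _DONE:
            break
        batch = [item]
        while len(batch) < max_batch_size:
            try:
                item = queue.get_nowait()
            except asyncio.QueueEmpty:
                break
            if item is _DONE:
                finished = True
                break
            batch.append(item)
        await execute(batch)  # stands in for the get_vectored call
        batches.append(batch)
    return batches


async def run_pipeline(requests, max_batch_size):
    queue = asyncio.Queue()

    async def execute(batch):
        await asyncio.sleep(0.001)  # pretend to do the batched read

    _, batches = await asyncio.gather(
        read_stage(requests, queue),
        execute_stage(queue, max_batch_size, execute),
    )
    return batches


batches = asyncio.run(run_pipeline(list(range(10)), max_batch_size=4))
print([len(b) for b in batches])
```

In this model an unbatchable workload (one request in flight at a time) degenerates to batches of size one with no added wait, which is the property the commit is after.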
Matthias van de Meent	973a8d2680	Fix timeout value used in XLogWaitForReplayOf (#9937 ) The previous value assumed usec precision, while the timeout used is in milliseconds, causing replica backends to wait for (potentially) many hours for WAL replay without the expected progress reports in logs. This fixes the issue. Reported-By: Alexander Lakhin <exclusion@gmail.com> ## Problem https://github.com/neondatabase/postgres/pull/279#issuecomment-2507671817 The timeout value was configured with the assumption the indicated value would be microseconds, where it's actually milliseconds. That causes the backend to wait for much longer (2h46m40s) before it emits the "I'm waiting for recovery" message. While we do have wait events configured on this, it's not great to have stuck backends without clear logs, so this fixes the timeout value in all our PostgreSQL branches. ## PG PRs * PG14: https://github.com/neondatabase/postgres/pull/542 * PG15: https://github.com/neondatabase/postgres/pull/543 * PG16: https://github.com/neondatabase/postgres/pull/544 * PG17: https://github.com/neondatabase/postgres/pull/545	2024-11-29 19:10:26 +00:00
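The 2h46m40s figure in the commit above is what a unit mix-up of this kind produces. A small arithmetic sketch, assuming the raw constant was 10,000,000 (a value inferred here from the quoted wait time, not taken from the PostgreSQL patches; `RAW_TIMEOUT` and the helper names are hypothetical):

```rust
use std::time::Duration;

/// Assumed raw constant: the value that yields the quoted 2h46m40s wait when
/// read as milliseconds. Inferred for illustration, not copied from the patch.
const RAW_TIMEOUT: u64 = 10_000_000;

/// What the value means if it is read as microseconds (the mistaken assumption).
fn as_intended() -> Duration {
    Duration::from_micros(RAW_TIMEOUT)
}

/// What the value means when the API actually takes milliseconds.
fn as_interpreted() -> Duration {
    Duration::from_millis(RAW_TIMEOUT)
}

/// Render a duration as `<hours>h<minutes>m<seconds>s`.
fn format_hms(d: Duration) -> String {
    let s = d.as_secs();
    format!("{}h{}m{}s", s / 3600, (s % 3600) / 60, s % 60)
}
```

Under that assumption, the intended wait is 10 seconds, while the millisecond reading gives 10,000 seconds, which `format_hms` renders as `2h46m40s`.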
Gleb Novikov	c848f25ec2	Fixed fast_import pgbin in calling get_pg_version (#9933 ) Was working on https://github.com/neondatabase/cloud/pull/20795 and discovered that fast_import is not working normally.	2024-11-29 17:58:36 +00:00
John Spray	d5624cc505	pageserver: download small objects using a smaller timeout (#9938 ) ## Problem It appears that the Azure storage API tends to hang TCP connections more than S3 does. Currently we use a 2 minute timeout for all downloads. This is large because sometimes the objects we download are large. However, waiting 2 minutes when doing something like downloading a manifest on tenant attach is problematic, because when someone is doing a "create tenant, create timeline" workflow, that 2 minutes is long enough for them reasonably to give up creating that timeline. Rather than propagate oversized timeouts further up the stack, we should use a different timeout for objects that we expect to be small. Closes: https://github.com/neondatabase/neon/issues/9836 ## Summary of changes - Add a `small_timeout` configuration attribute to remote storage, defaulting to 30 seconds (still a very generous period to do something like download an index) - Add a DownloadKind parameter to DownloadOpts, so that callers can indicate whether they expect the object to be small or large. - In the azure client, use small timeout for HEAD requests, and for GET requests if DownloadKind::Small is used. - Use DownloadKind::Small for manifests, indices, and heatmap downloads. This PR intentionally does not make the equivalent change to the S3 client, to reduce blast radius in case this has unexpected consequences (we could accomplish the same thing by editing lots of configs, but just skipping the code is simpler for right now)	2024-11-29 15:11:44 +00:00
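The timeout selection described in the commit above might look roughly like the sketch below. `DownloadKind` and `small_timeout` are named in the commit; the `Large` variant, the `Method` enum, the `Timeouts` struct, and `for_request` are hypothetical names introduced here, and the real remote-storage code is likely structured differently.

```rust
use std::time::Duration;

/// Whether the caller expects the object to be small or large.
/// (`DownloadKind::Small` appears in the commit; `Large` is an assumed name.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum DownloadKind {
    Small,
    Large,
}

/// Hypothetical request type, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Head,
    Get,
}

/// Hypothetical holder for the two configured timeouts.
struct Timeouts {
    /// General download timeout (2 minutes per the commit message).
    timeout: Duration,
    /// Timeout for objects expected to be small (30 seconds by default).
    small_timeout: Duration,
}

impl Timeouts {
    /// HEAD requests and GETs of small objects use the short timeout;
    /// GETs of large objects keep the long one.
    fn for_request(&self, method: Method, kind: DownloadKind) -> Duration {
        match (method, kind) {
            (Method::Head, _) | (Method::Get, DownloadKind::Small) => self.small_timeout,
            (Method::Get, DownloadKind::Large) => self.timeout,
        }
    }
}
```

Making the caller state the expected size keeps the decision next to the code that knows what is being fetched (manifest, index, heatmap, or layer), rather than guessing from the object key.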
Alexey Kondratov	538e2312a6	feat(compute_ctl): Always set application_name (#9934 ) ## Problem It was not always possible to judge what exactly some `cloud_admin` connections were doing because we didn't consistently set `application_name` everywhere. ## Summary of changes Unify the way we connect to Postgres: 1. Switch to building configs everywhere 2. Always set `application_name` and make naming consistent Follow-up for #9919 Part of neondatabase/cloud#20948	2024-11-29 13:55:56 +00:00
Erik Grinaker	a6073b5013	safekeeper: use jemalloc (#9780 ) ## Problem To add Safekeeper heap profiling in #9778, we need to switch to an allocator that supports it. Pageserver and proxy already use jemalloc. Touches #9534. ## Summary of changes Use jemalloc in Safekeeper.	2024-11-29 13:38:04 +00:00

6698 Commits