rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-16 18:02:56 +00:00

Author	SHA1	Message	Date
Gleb Novikov	047b986f7f	Removed version limitation from fast import tests	2025-01-16 18:03:01 +00:00
Gleb Novikov	ed189d733f	Merge branch 'main' into 22037-basic-fast-import-e2e	2025-01-16 16:46:53 +00:00
Alex Chi Z.	cccc196848	refactor(pageserver): make partitioning an ArcSwap (#10377 ) ## Problem gc-compaction needs the partitioning data to decide the job split. This refactor allows concurrent access/computing the partitioning. ## Summary of changes Make `partitioning` an ArcSwap so that others can access the partitioning while we compute it. Fully eliminate the `repartition is called concurrently` warning when gc-compaction is going on. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-16 15:33:37 +00:00
Arpad Müller	e436dcad57	Rename "disabled" safekeeper scheduling policy to "pause" (#10410 ) Rename the safekeeper scheduling policy "disabled" to "pause". A rename was requested in https://github.com/neondatabase/neon/pull/10400#discussion_r1916259124, as the "disabled" policy is meant to be analogous to the "pause" policy for pageservers. Also simplify the `SkSchedulingPolicyArg::from_str` function, relying on the `from_str` implementation of `SkSchedulingPolicy`. Latter is used for the database format as well, so it is quite stable. If we ever want to change the UI, we'll need to duplicate the function again but this is cheap.	2025-01-16 14:30:49 +00:00
John Spray	21d7b6a258	tests: refactor test_tenant_delete_races_timeline_creation (#10425 ) ## Problem Threads spawned in `test_tenant_delete_races_timeline_creation` are not joined before the test ends, and can generate `PytestUnhandledThreadExceptionWarning` in other tests. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10419/12805365523/index.html#/testresult/53a72568acd04dbd ## Summary of changes - Wrap threads in ThreadPoolExecutor which will join them before the test ends - Remove a spurious deletion call -- the background thread doing deletion ought to succeed.	2025-01-16 14:11:33 +00:00
JC Grünhage	86dbc44db1	CI: Run check-codestyle-rust as part of pre-merge-checks (#10387 ) ## Problem When multiple changes are grouped in a merge group to be merged as part of the merge queue, the changes might individually pass `check-codestyle-rust` but not in their combined form. ## Summary of changes - Move `check-codestyle-rust` into a reusable workflow that is called from its previous location in `build_and_test.yml`, and additionally call it from `pre_merge_checks.yml`. The additional call does not run on ARM, only x86, to ensure the merge queue continues being responsive. - Trigger `pre_merge_checks.yml` on PRs that change any of the workflows running in `pre_merge_checks.yml`, so that we get feedback on those early and not only after trying to merge those changes.	2025-01-16 09:20:24 +00:00
Tristan Partin	58f6af6c9a	Clean up compute_ctl extension server code (#10417 )	2025-01-16 08:35:36 +00:00
Matthias van de Meent	7be971081a	Make sure we request pages with a known-flushed LSN. (#10413 ) This should fix the largest source of flakiness of test_nbtree_pagesplit_cycleid. ## Problem https://github.com/neondatabase/neon/issues/10390 ## Summary of changes By using a guaranteed-flushed LSN, we ensure that PS won't have to wait forever. (If it does wait forever, we know the issue can't be with Compute's WAL)	2025-01-16 08:34:11 +00:00
Arseny Sher	6fe4c6798f	Add START_WAL_PUSH proto_version and allow_timeline_creation options. (#10406 ) ## Problem As part of https://github.com/neondatabase/neon/issues/8614 we need to pass options to START_WAL_PUSH. ## Summary of changes Add two options. `allow_timeline_creation`, default true, disables implicit timeline creation in the connection from compute. Eventually such creation will be forbidden completely, but as we migrate to configurations we need to support both: current mode and configurations enabled where creation by compute is disabled. `proto_version` specifies compute <-> sk protocol version. We have it currently in the first greeting package also, but I plan to change tag size from u64 to u8, which would make it hard to use. Command is more appropriate place for it anyway.	2025-01-16 08:01:19 +00:00
Matthias van de Meent	2eda484ef6	prefetch: Read more frequently from TCP buffer (#10394 ) This reduces pressure on the OS TCP read buffer by increasing the moments we read data out of the receive buffer, and increasing the number of bytes we can pull from that buffer when we do reads. ## Problem A backend may not always consume its prefetch data quick enough ## Summary of changes We add a new function `prefetch_pump_state` which pulls as many prefetch requests from the OS TCP receive buffer as possible, but without blocking. This thus reduces pressure on OS-level TCP buffers, thus increasing throughput by limiting throttling caused by full TCP buffers.	2025-01-16 02:43:47 +00:00
Mikhail Kot	c7429af8a0	Enable dblink (#10358 ) Update compute image to include dblink #3720	2025-01-15 22:29:18 +00:00
Alex Chi Z.	a753349cb0	feat(pageserver): validate data integrity during gc-compaction (#10131 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 part of investigation of https://github.com/neondatabase/neon/issues/10049 ## Summary of changes * If `cfg!(test) or cfg!(feature = testing)`, then we will always try generating an image to ensure the history is replayable, but not put the image layer into the final layer results, therefore discovering wrong key history before we hit a read error. * I suspect it's easier to trigger some races if gc-compaction is continuously run on a timeline, so I increased the frequency to twice per 10 churns. * Also, create branches in gc-compaction smoke tests to get more test coverage. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad@neon.tech>	2025-01-15 22:04:06 +00:00
Gleb Novikov	55a68b28a2	fast import: restore to neondb (not postgres) database (#10251 ) ## Problem `postgres` is a system database at neon, so we need to do `pg_restore` into `neondb` instead https://github.com/neondatabase/cloud/issues/22100 ## Summary of changes Changed fast_import a little bit: 1. After a successful connection, create `neondb` in the postgres instance 2. Changed restore connstring to use new db 3. Added optional `source_connection_string`, which allows to skip `s3_prefix` and just connect directly. 4. Added `-i` that stops process until sigterm ## TODO - [x] test image in cplane e2e - [ ] Change import job image back to latest after this merged (partial revert of https://github.com/neondatabase/cloud/pull/22338)	2025-01-15 20:51:09 +00:00
John Spray	fb0e2acb2f	pageserver: add `page_trace` API for debugging (#10293 ) ## Problem When a pageserver is receiving high rates of requests, we don't have a good way to efficiently discover what the client's access pattern is. Closes: https://github.com/neondatabase/neon/issues/10275 ## Summary of changes - Add `/v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=...` API, which returns a binary buffer. - Add `pagectl page-trace` tool to decode and analyze the output. --------- Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-01-15 19:07:22 +00:00
Gleb Novikov	14318afcc0	Merge branch '22100-change-fastimport-db-name' into 22037-basic-fast-import-e2e	2025-01-15 19:02:03 +00:00
Gleb Novikov	ebc4735bf4	postgres waiting timeout & retry as constants	2025-01-15 18:56:07 +00:00
Arpad Müller	efaec6cdf8	Add endpoint and storcon cli cmd to set sk scheduling policy (#10400 ) Implementing the last missing endpoint of #9981, this adds support to set the scheduling policy of an individual safekeeper, as specified in the RFC. However, unlike in the RFC we call the endpoint `scheduling_policy` not `status` Closes #9981. As for why not use the upsert endpoint for this: we want to have the safekeeper upsert endpoint be used for testing and for deploying new safekeepers, but not for changes of the scheduling policy. We don't want to change any of the other fields when marking a safekeeper as decommissioned for example, so we'd have to first fetch them only to then specify them again. Of course one can also design an endpoint where one can omit any field and it doesn't get modified, but it's still not great for observability to put everything into one big "change something about this safekeeper" endpoint.	2025-01-15 18:15:30 +00:00
Gleb Novikov	4c2ee6a011	added 10 min timeout on waiting loop	2025-01-15 17:19:50 +00:00
Gleb Novikov	c09d817c98	review comments	2025-01-15 17:13:52 +00:00
Tristan Partin	3d41069dc4	Update pgrx in extension builds to 0.12.9 (#10372 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-15 16:26:58 +00:00
Vlad Lazar	dbebede7bf	safekeeper: fan out from single wal reader to multiple shards (#10190 ) ## Problem Safekeepers currently decode and interpret WAL for each shard separately. This is wasteful in terms of CPU memory usage - we've seen this in profiles. ## Summary of changes Fan-out interpreted WAL to multiple shards. The basic idea is that wal decoding and interpretation happens in a separate tokio task and senders attach to it. Senders only receive batches concerning their shard and only past the Lsn they've last seen. Fan-out is gated behind the `wal_reader_fanout` safekeeper flag (disabled by default for now). When fan-out is enabled, it might be desirable to control the absolute delta between the current position and a new shard's desired position (i.e. how far behind or ahead a shard may be). `max_delta_for_fanout` is a new optional safekeeper flag which dictates whether to create a new WAL reader or attach to the existing one. By default, this behaviour is disabled. Let's consider enabling it if we spot the need for it in the field. ## Testing Tests passed [here](https://github.com/neondatabase/neon/pull/10301) with wal reader fanout enabled as of `34f6a71718`. Related: https://github.com/neondatabase/neon/issues/9337 Epic: https://github.com/neondatabase/neon/issues/9329	2025-01-15 15:33:54 +00:00
Tristan Partin	3e529f124f	Remove leading slashes when downloading remote files (#10396 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-15 15:29:52 +00:00
Arseny Sher	05a71c7d6a	safekeeper: add membership configuration switch endpoint (#10241 ) ## Problem https://github.com/neondatabase/neon/issues/9965 ## Summary of changes Add to safekeeper http endpoint to switch membership configuration. Also add it to python client for tests, and add simple test itself.	2025-01-15 14:16:04 +00:00
Alexander Bayandin	b9464865b6	benchmarks: report successful runs to slack as well (#10393 ) ## Problem Successful `benchmarks` runs don't have enough visibility Ref https://neondb.slack.com/archives/C069Z2199DL/p1736868055094539 ## Summary of changes - Report both successful and failed `benchmarks` to Slack - Update `slackapi/slack-github-action` action	2025-01-15 13:05:05 +00:00
Vlad Lazar	1577430408	safekeeper: decode and interpret for multiple shards in one go (#10201 ) ## Problem Currently, we call `InterpretedWalRecord::from_bytes_filtered` from each shard. To serve multiple shards at the same time, the API needs to allow for enquiring about multiple shards. ## Summary of changes This commit tweaks it in a pretty brute force way. Naively, we could just generate the shard for a key, but pre and post split shards may be subscribed at the same time, so doing it efficiently is more complex.	2025-01-15 11:10:24 +00:00
Erik Grinaker	05d17a10ae	rfc: add CPU and heap profiling RFC (#10085 ) This document proposes a standard cross-team pattern for CPU and memory profiling across applications and languages, using the [pprof](https://github.com/google/pprof) profile format. It enables both ad hoc profiles via HTTP endpoints, and continuous profiling across the fleet via [Grafana Cloud Profiles](https://grafana.com/docs/grafana-cloud/monitor-applications/profiles/). Continuous profiling incurs an overhead of about 0.1% CPU usage and 3% slower heap allocations. [Rendered](https://github.com/neondatabase/neon/blob/erik/profiling-rfc/docs/rfcs/040-profiling.md) Touches #9534. Touches https://github.com/neondatabase/cloud/issues/14888.	2025-01-15 10:35:38 +00:00
Arseny Sher	2d0ea08524	Add safekeeper membership conf to control file. (#10196 ) ## Problem https://github.com/neondatabase/neon/issues/9965 ## Summary of changes Add safekeeper membership configuration struct itself and storing it in the control file. In passing also add creation timestamp to the control file (there were cases where I wanted it in the past). Remove obsolete unused PersistedPeerInfo struct from control file (still keep it in control_file_upgrade.rs to have it in old upgrade code). Remove the binary representation of cfile in the roundtrip test. Updating it is annoying, and we still test the actual roundtrip. Also add configuration to timeline creation http request, currently used only in one python test. In passing, slightly change LSNs meaning in the request: normally start_lsn is passed (the same as ancestor_start_lsn in similar pageserver call), but we allow specifying higher commit_lsn for manual intervention if needed. Also when given LSN initialize term_history with it.	2025-01-15 09:45:58 +00:00
Arseny Sher	c98cbbeac1	Add migration details to safekeeper membership RFC. (#10272 ) ## Problem https://github.com/neondatabase/neon/pull/8455 wasn't specific enough on migration from current situation to enabling generations. ## Summary of changes Describe the missing parts, including control plane pushing generation to compute, which also defines whether generations are enabled -- non zero value does it.	2025-01-15 09:41:49 +00:00
John Spray	47c1640acc	storage controller: pagination for tenant listing API (#10365 ) ## Problem For large deployments, the `control/v1/tenant` listing API can time out transmitting a monolithic serialized response. ## Summary of changes - Add `limit` and `start_after` parameters to listing API - Update storcon_cli to use these parameters and limit requests to 1000 items at a time	2025-01-14 21:37:32 +00:00
Erik Grinaker	6debb49b87	pageserver: coalesce index uploads when possible (#10248 ) ## Problem With upload queue reordering in #10218, we can easily get into a situation where multiple index uploads are queued back to back, which can't be parallelized. This will happen e.g. when multiple layer flushes enqueue layer/index/layer/index/... and the layers skip the queue and are uploaded in parallel. These index uploads will incur serial S3 roundtrip latencies, and may block later operations. Touches #10096. ## Summary of changes When multiple back-to-back index uploads are ready to upload, only upload the most recent index and drop the rest.	2025-01-14 21:10:17 +00:00
Erik Grinaker	e58e29e639	pageserver: limit number of upload queue tasks (#10384 ) ## Problem The upload queue can currently schedule an arbitrary number of tasks. This can both spawn an unbounded number of Tokio tasks, and also significantly slow down upload queue scheduling as it's quadratic in number of operations. Touches #10096. ## Summary of changes Limit the number of inprogress tasks to the remote storage upload concurrency. While this concurrency limit is shared across all tenants, there's certainly no point in scheduling more than this -- we could even consider setting the limit lower, but don't for now to avoid artificially constraining tenants.	2025-01-14 18:01:14 +00:00
Heikki Linnakangas	d36112d20f	Simplify compute dockerfile by setting PATH just once (#10357 ) By setting PATH in the 'pg-build' layer, all the extension build layers will inherit. No need to pass PG_CONFIG to all the various make invocations either: once pg_config is in PATH, the Makefiles will pick it up from there.	2025-01-14 17:02:35 +00:00
Erik Grinaker	ffaa52ff5d	pageserver: reorder upload queue when possible (#10218 ) ## Problem The upload queue currently sees significant head-of-line blocking. For example, index uploads act as upload barriers, and for every layer flush we schedule a layer and index upload, which effectively serializes layer uploads. Resolves #10096. ## Summary of changes Allow upload queue operations to bypass the queue if they don't conflict with preceding operations, increasing parallelism. NB: the upload queue currently schedules an explicit barrier after every layer flush as well (see #8550). This must be removed to enable parallelism. This will require a better mechanism for compaction backpressure, see e.g. #8390 or #5415.	2025-01-14 16:31:59 +00:00
John Spray	aa7323a384	storage controller: quality of life improvements for AZ handling (#10379 ) ## Problem Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ of a tenant is much more impactful, and we would like to make it more visible in tooling. ## Summary of changes - Include AZ in node describe API - Include AZ info in node & tenant outputs in CLI - Add metrics for per-node shard counts, labelled by AZ - Add a CLI for setting preferred AZ on a tenant - Extend AZ-setting API+CLI to handle None for clearing preferred AZ	2025-01-14 15:30:43 +00:00
Christian Schwarz	2466a2f977	page_service: throttle individual requests instead of the batched request (#10353 ) ## Problem Before this PR, the pagestream throttle was applied weighted on a per-batch basis. This had several problems: 1. The throttle occurrence counters were only bumped by `1` instead of `batch_size`. 2. The throttle wait time aggregator metric only counted one wait time, irrespective of `batch_size`. That makes sense in some ways of looking at it but not in others. 3. If the last request in the batch runs into the throttle, the other requests in the batch are also throttled, i.e., over-throttling happens (theoretical, didn't measure it in practice). ## Solution It occurred to me that we can simply push the throttling upwards into `pagestream_read_message`. This has the added benefit that in pipeline mode, the `executor` stage will, if it is idle, steal whatever requests already made it into the `spsc_fold` and execute them; before this change, that was not the case - the throttling happened in the `executor` stage instead of the `batcher` stage. ## Code Changes There are four changes in this PR: 1. Lifting up the throttling into the `pagestream_read_message` method. 2. Move the throttling metrics out of the `Throttle` type into `SmgrOpMetrics`. Unlike the other smgr metrics, throttling is per-tenant, hence the Arc. 3. Refactor the `SmgrOpTimer` implementation to account for the new observation states, and simplify its design. 4. Drive-by-fix flush time metrics. It was using the same `now` in the `observe_guard` every time. The `SmgrOpTimer` is now a state machine. Each observation point moves the state machine forward. If a timer object is dropped early some "pair"-like metrics still require an increment or observation. That's done in the Drop implementation, by driving the state machine to completion.	2025-01-14 15:28:01 +00:00
Alex Chi Z.	9bdb14c1c0	fix(pageserver): ensure initial image layers have correct key ranges (#10374 ) ## Problem Discovered during the relation dir refactor work. If we do not create images as in this patch, we would get two set of image layers: ``` 0000...METADATA_KEYS 0000...REL_KEYS ``` They overlap at the same LSN and would cause data loss for relation keys. This doesn't happen in prod because initial image layer generation is never called, but better to be fixed to avoid future issues with the reldir refactors. ## Summary of changes * Consolidate create_image_layers call into a single one. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-14 15:27:48 +00:00
Conrad Ludgate	df4abd8b14	fix: force-refresh azure identity token (#10378 ) ## Problem Because of https://github.com/Azure/azure-sdk-for-rust/issues/1739, our identity token file was not being refreshed. This caused our uploads to start failing when the storage token expired. ## Summary of changes Drop and recreate the remote storage config every time we upload in order to force reload the identity token file.	2025-01-14 12:53:32 +00:00
Gleb Novikov	a80dcfa544	Capture LD_LIBRARY_PATH from pytest env	2025-01-14 11:45:34 +00:00
Gleb Novikov	337ad52c37	Fixed initdb locale	2025-01-14 11:45:33 +00:00
Gleb Novikov	fd0acb6195	poetry run ruff check --fix .	2025-01-14 11:45:33 +00:00
Gleb Novikov	a7f8b9f6b5	poetry run ruff format .	2025-01-14 11:45:33 +00:00
Gleb Novikov	1e9707f7ee	Added todo on full import test with pageserver	2025-01-14 11:45:33 +00:00
Gleb Novikov	6d297857a7	Moved test_fast_import to test_import_pgdata	2025-01-14 11:45:33 +00:00
Gleb Novikov	6e5a0add43	Implemented basic test of fast import	2025-01-14 11:45:33 +00:00
Gleb Novikov	e291fb7edc	Fixture for fast_import binary is working	2025-01-14 11:45:33 +00:00
Gleb Novikov	ebe26e218b	cargo fmt --all	2025-01-14 11:45:18 +00:00
Gleb Novikov	2f0a127e0c	Create neondb database and restore into it	2025-01-14 11:45:18 +00:00
Gleb Novikov	131ab74be8	effective_io_concurrency=0 on macos	2025-01-14 11:45:18 +00:00
Gleb Novikov	3a66eebf4e	Made fast_import testable locally (made s3 prefix optional, added source_connection_string param)	2025-01-14 11:45:18 +00:00
Konstantin Knizhnik	a039f8381f	Optimize vector get last written LSN (#10360 ) ## Problem See https://github.com/neondatabase/neon/issues/10281 pg17 performs extra lock/unlock operation when fetching LwLSN. ## Summary of changes Perform all lookups under one lock, moving initialization of not found keys to separate loop. Related Postgres PR: https://github.com/neondatabase/postgres/pull/553 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-14 05:54:30 +00:00
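
Commit 6debb49b87 in the list above describes coalescing index uploads: when several index uploads are queued back to back, only the most recent one is uploaded and the rest are dropped. As a rough illustration of that idea only, here is a minimal Rust sketch. The `UploadOp` enum and the `coalesce_index_uploads` function are hypothetical names invented for this example; they are not the pageserver's actual types, and the real upload queue tracks considerably more state.

```rust
// Hypothetical, simplified queue entry; not the pageserver's real type.
#[derive(Debug, Clone, PartialEq)]
enum UploadOp {
    // A layer file upload, identified here by an illustrative name.
    Layer(String),
    // An index upload, identified here by an illustrative version number.
    Index(u64),
}

// Collapse every run of back-to-back index uploads into its last entry.
// Layer uploads, and index uploads separated by other operations, are kept.
fn coalesce_index_uploads(queue: Vec<UploadOp>) -> Vec<UploadOp> {
    let mut out: Vec<UploadOp> = Vec::with_capacity(queue.len());
    for op in queue {
        let prev_is_index = matches!(out.last(), Some(UploadOp::Index(_)));
        if prev_is_index && matches!(op, UploadOp::Index(_)) {
            // The previously queued index is superseded by this newer one.
            out.pop();
        }
        out.push(op);
    }
    out
}
```

Under these assumptions, a queue of layer, index 1, index 2, layer, index 3 would be reduced to layer, index 2, layer, index 3: only the adjacent pair of index uploads is merged, so ordering relative to layer uploads is preserved.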

6963 Commits