rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 17:32:56 +00:00

Author	SHA1	Message	Date
Christian Schwarz	27a35331c0	pagebench: add a 'basebackup' benchmark	2023-12-15 17:09:23 +00:00
Christian Schwarz	0b9f0e72ac	pagebench: scaffold	2023-12-15 17:09:21 +00:00
Christian Schwarz	300d6c38ad	Merge remote-tracking branch 'origin/problame/benchmarking/pr/keyspace-in-mgmt-api' into problame/benchmarking/pr/page_service_api_client	2023-12-15 17:06:41 +00:00
Christian Schwarz	8985331533	Merge branch 'problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/keyspace-in-mgmt-api	2023-12-15 15:50:15 +00:00
Christian Schwarz	a6abcbe454	hakari manage-deps	2023-12-15 15:49:08 +00:00
Christian Schwarz	888a7311f4	preseed rng for display_fromstr_bijection test case	2023-12-15 15:44:09 +00:00
Christian Schwarz	46889d768e	move client to separate crate	2023-12-15 15:21:05 +00:00
Christian Schwarz	7ac9ef8291	remove unused dep	2023-12-15 13:38:34 +00:00
Christian Schwarz	d7a8e0b1ae	eliminate one workaround, convert sk stuff to async as well	2023-12-15 13:34:32 +00:00
Christian Schwarz	83bdebb4af	WIP	2023-12-15 12:22:33 +00:00
Christian Schwarz	a5214b203d	Merge remote-tracking branch 'origin/problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/page_service_api_client	2023-12-15 09:59:32 +00:00
Christian Schwarz	9e70c213f7	Merge remote-tracking branch 'origin/main' into problame/benchmarking/pr/mgmt-api-client	2023-12-15 09:40:55 +00:00
Christian Schwarz	4c5b7cff49	add a Rust client for pageserver mgmt api Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771 This PR moves the control plane's spread-all-over-the-place client for the pageserver management API into a separate module within the pageserver crate. It also switches to the async version of reqwest, which I think is generally the right direction, and I need an async client API in the benchmark epic.	2023-12-14 19:47:26 +00:00
John Spray	c4e0ef507f	pageserver: heatmap uploads (#6050 ) Dependency (commits inline): https://github.com/neondatabase/neon/pull/5842 ## Problem Secondary mode tenants need a manifest of what to download. Ultimately this will be some kind of heat-scored set of layers, but as a robust first step we will simply use the set of resident layers: secondary tenant locations will aim to match the on-disk content of the attached location. ## Summary of changes - Add heatmap types representing the remote structure - Add hooks to Tenant/Timeline for generating these heatmaps - Create a new `HeatmapUploader` type that is external to `Tenant`, and responsible for walking the list of attached tenants and scheduling heatmap uploads. Notes to reviewers: - Putting the logic for uploads (and later, secondary mode downloads) outside of `Tenant` is an opinionated choice, motivated by: - Enable future smarter scheduling of operations, e.g. uploading the stalest tenant first, rather than having all tenants compete for a fair semaphore on a first-come-first-served basis. Similarly for downloads, we may wish to schedule the tenants with the hottest un-downloaded layers first. - Enable accessing upload-related state without synchronization (it belongs to HeatmapUploader, rather than being some Mutex<>'d part of Tenant) - Avoid further expanding the scope of Tenant/Timeline types, which are already among the largest in the codebase - You might reasonably wonder how much of the uploader code could be a generic job manager thing. Probably some of it: but let's defer pulling that out until we have at least two users (perhaps secondary downloads will be the second one) to highlight which bits are really generic. Compromises: - Later, instead of using digests of heatmaps to decide whether anything changed, I would prefer to avoid walking the layers in tenants that don't have changes: tracking that will be a bit invasive, as it needs input from both remote_timeline_client and Layer.	2023-12-14 13:09:24 +00:00
Christian Schwarz	a1143cbcfe	Merge branch 'problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/page_service_api_client	2023-12-13 17:13:28 +01:00
Christian Schwarz	e744cb05e6	build fix & run clippy	2023-12-13 15:51:01 +00:00
Christian Schwarz	e7449cf77f	add a Rust client for Pageserver's page_service Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771	2023-12-13 15:29:24 +00:00
Arpad Müller	7c2c87a5ab	Update azure SDK to 0.18 and use open range support (#6103 ) * Update `azure-` crates to 0.18 Use new open ranges support added by upstream in https://github.com/Azure/azure-sdk-for-rust/pull/1482 Part of #5567. Prior update PR: #6081	2023-12-12 18:20:12 +01:00
Joonas Koivunen	f0d15cee6f	build: update azure-* to 0.17 (#6081 ) this is a drive-by upgrade while we refresh the access tokens at the same time.	2023-12-11 12:21:02 +01:00
Andrew Rudenko	df1f8e13c4	proxy: pass neon options in deep object format (#6068 ) --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2023-12-08 19:58:36 +01:00
Conrad Ludgate	e1a564ace2	proxy simplify cancellation (#5916 ) ## Problem The cancellation code was confusing and error prone (as seen before in our memory leaks). ## Summary of changes * Use the new `TaskTracker` primitive instead of JoinSet to gracefully wait for tasks to shut down. * Updated libs/utils/completion to use `TaskTracker` * Remove `tokio::select` in favour of `futures::future::select` in a specialised `run_until_cancelled()` helper function	2023-12-08 16:21:17 +00:00
Joonas Koivunen	b492cedf51	fix(remote_storage): buffering, by using streams for upload and download (#5446 ) There is double buffering in remote_storage and in pageserver for 8KiB in using `tokio::io::copy` to read `BufReader<ReaderStream<_>>`. Switches downloads and uploads to use `Stream<Item = std::io::Result<Bytes>>`. Caller and only caller now handles setting up buffering. For reading, `Stream<Item = ...>` is also a `AsyncBufRead`, so when writing to a file, we now have `tokio::io::copy_buf` reading full buffers and writing them to `tokio::io::BufWriter` which handles the buffering before dispatching over to `tokio::fs::File`. Additionally implements streaming uploads for azure. With azure downloads are a bit nicer than before, but not much; instead of one huge vec they just hold on to N allocations we got over the wire. This PR will also make it trivial to switch reading and writing to io-uring based methods. Cc: #5563.	2023-12-07 15:52:22 +00:00
Joonas Koivunen	b7ffe24426	build: update tokio to 1.34.0, tokio-utils 0.7.10 (#6061 ) We should still remember to bump minimum crates for libraries beginning to use task tracker.	2023-12-07 11:31:38 +00:00
Conrad Ludgate	f39fca0049	proxy: chore: replace strings with SmolStr (#5786 ) ## Problem no problem ## Summary of changes replaces boxstr with arcstr as it's cheaper to clone. mild perf improvement. probably should look into other smallstring optimisations tbh, they will likely be even better. The longest endpoint name I was able to construct is something like `ep-weathered-wildflower-12345678` which is 32 bytes. Most string optimisations top out at 23 bytes	2023-11-30 20:52:30 +00:00
Anna Khanova	e12e2681e9	IP allowlist on the proxy side (#5906 ) ## Problem Per-project IP allowlist: https://github.com/neondatabase/cloud/issues/8116 ## Summary of changes Implemented IP filtering on the proxy side. To retrieve ip allowlist for all scenarios, added `get_auth_info` call to the control plane for: * sql-over-http * password_hack * cleartext_hack Added cache with ttl for sql-over-http path This might slow down a bit, consider using redis in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2023-11-30 13:14:33 +00:00
John Khvatov	3e094e90d7	update aws sdk to 1.0.x (#5976 ) This change will be useful for experimenting with S3 performance. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 14:17:58 +02:00
Rahul Modpur	50d959fddc	refactor: use serde for TenantConf deserialization Fixes: #5300 (#5310 ) Remove handcrafted TenantConf deserialization code. Use `serde_path_to_error` to include the field which failed parsing. Leaves the duplicated TenantConf in pageserver and models, does not touch PageserverConf handcrafted deserialization. Error change: - before change: "configure option `checkpoint_distance` cannot be negative" - after change: "`checkpoint_distance`: invalid value: integer `-1`, expected u64" Fixes: #5300 Cc: #3682 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 12:47:13 +02:00
Arseny Sher	78e73b20e1	Notify safekeeper readiness with systemd. To avoid downtime during deploy, as in busy regions initial load can currently take ~30s.	2023-11-29 14:07:06 +04:00
dependabot[bot]	0d16874960	build(deps): bump openssl from 0.10.55 to 0.10.60 (#5965 ) Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.55 to 0.10.60. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-29 01:24:02 +01:00
John Spray	1ab0cfc8cb	pageserver: add sharding metadata to `LocationConf` (#5932 ) ## Problem The TenantShardId in API URLs is sufficient to uniquely identify a tenant shard, but not for it to function: it also needs to know its full sharding configuration (stripe size, layout version) in order to map keys to shards. ## Summary of changes - Introduce ShardIdentity: this is the superset of ShardIndex (#5924 ) that is required for translating keys to shard numbers. - Include ShardIdentity as an optional attribute of LocationConf - Extend the public `LocationConfig` API structure with a flat representation of shard attributes. The net result is that at the point we construct a `Tenant`, we have a `ShardIdentity` (inside LocationConf). This enables the next steps to actually use the ShardIdentity to split WAL and validate that page service requests are reaching the correct shard.	2023-11-28 13:14:51 +00:00
Conrad Ludgate	316309c85b	channel binding (#5683 ) ## Problem channel binding protects scram from sophisticated MITM attacks where the attacker is able to produce 'valid' TLS certificates. ## Summary of changes get the tls-server-end-point channel binding, and verify it is correct for the SCRAM-SHA-256-PLUS authentication flow	2023-11-27 21:45:15 +00:00
Anastasia Lubennikova	92bc2bb132	Refactor remote extensions feature to request extensions from proxy (#5836 ) instead of direct S3 request. Pros: - simplify code a lot (no need to provide AWS credentials and paths); - reduce latency of downloading extension data as proxy resides near computes; - reduce AWS costs as proxy has cache and 1000 computes asking the same extension will not generate 1000 downloads from S3. - we can use only one S3 bucket to store extensions (and get rid of regional buckets which were introduced to reduce latency); Changes: - deprecate remote-ext-config compute_ctl parameter, use http://pg-ext-s3-gateway if any old format remote-ext-config is provided; - refactor tests to use mock http server;	2023-11-27 12:10:23 +00:00
Arpad Müller	54327bbeec	Upload initdb results to S3 (#5390 ) ## Problem See #2592 ## Summary of changes Compresses the results of initdb into a .tar.zst file and uploads them to S3, to enable usage in recovery from lsn. Generations should not be involved I think because we do this only once at the very beginning of a timeline. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-23 18:11:52 +00:00
Christian Schwarz	a0e61145c8	fix: cleanup of layers from the future can race with their re-creation (#5890 ) fixes https://github.com/neondatabase/neon/issues/5878 obsoletes https://github.com/neondatabase/neon/issues/5879 Before this PR, it could happen that `load_layer_map` schedules removal of the future image layer. Then a later compaction run could re-create the same image layer, scheduling a PUT. Due to lack of an upload queue barrier, the PUT and DELETE could be re-ordered. The result was IndexPart referencing a non-existent object. ## Summary of changes * Add support to `pagectl` / Python tests to decode `IndexPart` * Rust * new `pagectl` Subcommand * `IndexPart::{from,to}_s3_bytes()` methods to internalize knowledge about encoding of `IndexPart` * Python * new `NeonCli` subclass * Add regression test * Rust * Ability to force repartitioning; required to ensure image layer creation at last_record_lsn * Python * The regression test. * Fix the issue * Insert an `UploadOp::Barrier` after scheduling the deletions.	2023-11-23 13:33:41 +00:00
Christian Schwarz	d353fa1998	refer to our rust-postgres.git fork by branch name (#5894 ) This way, `cargo update -p tokio-postgres` just works. The `Cargo.toml` communicates more clearly that we're referring to the `main` branch. And the git revision is still pinned in `Cargo.lock`.	2023-11-22 10:58:27 +00:00
khanova	0c243faf96	Proxy log pid hack (#5869 ) ## Problem Improve observability for the compute node. ## Summary of changes Log pid from the compute node. Doesn't work with pgbouncer.	2023-11-16 20:46:23 +00:00
John Spray	ab631e6792	pageserver: make TenantsMap shard-aware (#5819 ) ## Problem When using TenantId as the key, we are unable to handle multiple tenant shards attached to the same pageserver for the same tenant ID. This is an expected scenario if we have e.g. 8 shards and 5 pageservers. ## Summary of changes - TenantsMap is now a BTreeMap instead of a HashMap: this enables looking up by range. In future, we will need this for page_service, as incoming requests will just specify the Key, and we'll have to figure out which shard to route it to. - A new key type TenantShardId is introduced, to act as the key in TenantsMap, and as the id type in external APIs. Its human readable serialization is backward compatible with TenantId, and also forward-compatible as long as sharding is not actually used (when we construct a TenantShardId with ShardCount(0), it serializes to an old-fashioned TenantId). - Essential tenant APIs are updated to accept TenantShardIds: tenant/timeline create, tenant delete, and /location_conf. These are the APIs that will enable driving sharded tenants. Other apis like /attach /detach /load /ignore will not work with sharding: those will soon be deprecated and replaced with /location_conf as part of the live migration work. Closes: #5787	2023-11-15 23:20:21 +02:00
khanova	2f0d245c2a	Proxy control plane rate limiter (#5785 ) ## Problem Proxy might overload the control plane. ## Summary of changes Implement rate limiter for proxy<->control plane connection. Resolves https://github.com/neondatabase/neon/issues/5707 Used implementation ideas from https://github.com/conradludgate/squeeze/	2023-11-15 09:15:59 +00:00
Joonas Koivunen	74d150ba45	build: upgrade ahash (#5851 ) `cargo deny` was complaining the version 0.8.3 was yanked (for possible DoS attack [wiki]), but the latest version (0.8.5) also includes aarch64 fixes which may or may not be relevant. Our usage of ahash limits to proxy, but I don't think we are at any risk. [wiki]: https://github.com/tkaitchuck/aHash/wiki/Yanked-versions	2023-11-10 19:10:54 +00:00
Joonas Koivunen	a05f104cce	build: remove async-std dependency (#5848 ) Introduced by accident (missing `default-features = false`) in `e09d5ada6a`. We directly need only `http_types::StatusCode`.	2023-11-10 16:05:21 +02:00
Conrad Ludgate	7cdde285a5	proxy: limit concurrent wake_compute requests per endpoint (#5799 ) ## Problem A user can perform many database connections at the same instant of time - these will all cache miss and materialise as requests to the control plane. #5705 ## Summary of changes I am using a `DashMap` (a sharded `RwLock<HashMap>`) of endpoints -> semaphores to apply a limiter. If the limiter is enabled (permits > 0), the semaphore will be retrieved per endpoint and a permit will be awaited before continuing to call the wake_compute endpoint. ### Important details This dashmap would grow uncontrollably without maintenance. It's not a cache so I don't think an LRU-based reclamation makes sense. Instead, I've made use of the sharding functionality of DashMap to lock a single shard and clear out unused semaphores periodically. I ran a test in release, using 128 tokio tasks among 12 threads each pushing 1000 entries into the map per second, clearing a shard every 2 seconds (64 second epoch with 32 shards). The endpoint names were sampled from a gamma distribution to make sure some overlap would occur, and each permit was held for 1ms. The histogram for time to clear each shard settled between 256-512us without any variance in my testing. Holding a lock for under a millisecond for 1 of the shards does not concern me as blocking	2023-11-09 14:14:30 +00:00
John Spray	9c30883c4b	remote_storage: use S3 SDK's adaptive retry policy (#5813 ) ## Problem Currently, we aren't doing any explicit slowdown in response to 429 responses. Recently, as we hit remote storage a bit harder (pageserver does more ListObjectsv2 requests than it used to since #5580 ), we're seeing storms of 429 responses that may be the result of not just doing too many requests, but continuing to do those extra requests without backing off any more than our usual backoff::exponential. ## Summary of changes Switch from AWS's "Standard" retry policy to "Adaptive" -- docs describe this as experimental but it has been around for a long time. The main difference between Standard and Adaptive is that Adaptive rate-limits the client in response to feedback from the server, which is meant to avoid scenarios where the client would otherwise repeatedly hit throttling responses.	2023-11-09 13:50:13 +00:00
Em Sharnoff	acef742a6e	vm-monitor: Remove dependency on workspace_hack (#5752 ) neondatabase/autoscaling builds libs/vm-monitor during CI because it's a necessary component of autoscaling. workspace_hack includes a lot of crates that are not necessary for vm-monitor, which artificially inflates the build time on the autoscaling side, so hopefully removing the dependency should speed things up. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 09:41:20 -08:00
Arpad Müller	e310533ed3	Support JWT key reload in pageserver (#5594 ) ## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 15:43:29 +01:00
duguorong009	b3d3a2587d	feat: improve the serde impl for several types(`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
duguorong009	09b5954526	refactor: use streaming in safekeeper `/v1/debug_dump` http response (#5731 ) - Update the handler for `/v1/debug_dump` http response in safekeeper - Update the `debug_dump::build()` to use the streaming in JSON build process	2023-11-05 10:16:54 +00:00
John Spray	306c4f9967	s3_scrubber: prepare for scrubbing buckets with generation-aware content (#5700 ) ## Problem The scrubber didn't know how to find the latest index_part when generations were in use. ## Summary of changes - Teach the scrubber to do the same dance that pageserver does when finding the latest index_part.json - Teach the scrubber how to understand layer files with generation suffixes. - General improvement to testability: scan_metadata has a machine readable output that the testing `S3Scrubber` wrapper can read. - Existing test coverage of scrubber was false-passing because it just didn't see any data due to prefixing of data in the bucket. Fix that. This is incremental improvement: the more confidence we can have in the scrubber, the more we can use it in integration tests to validate the state of remote storage. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-11-03 17:36:02 +00:00
Conrad Ludgate	cdcaa329bf	proxy: no more statements (#5747 ) ## Problem my prepared statements change in tokio-postgres landed in the latest release. it didn't work as we intended ## Summary of changes https://github.com/neondatabase/rust-postgres/pull/24	2023-11-03 08:30:58 +00:00
Conrad Ludgate	d8c21ec70d	fix nightly 1.75 (#5719 ) ## Problem Neon doesn't compile on nightly and had numerous clippy complaints. ## Summary of changes 1. Fixed troublesome dependency 2. Fixed or ignored the lints where appropriate	2023-10-30 16:43:06 +00:00
Arthur Petukhovsky	66f8f5f1c8	Call walproposer from Rust (#5403 ) Create Rust bindings for C functions from walproposer. This allows writing better tests with real walproposer code without spawning multiple processes and starting up the whole environment. `make walproposer-lib` stage was added to build static libraries `libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can be statically linked to any executable to call walproposer functions. `libs/walproposer/src/walproposer.rs` contains `test_simple_sync_safekeepers` to test that walproposer can be called from Rust to emulate sync_safekeepers logic. It can also be used as a usage example.	2023-10-19 14:17:15 +01:00

394 Commits