rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Arpad Müller	045bc6af8b	Add new compaction abstraction, simulator, and implementation. (#6830 ) Rebased version of #5234, part of #6768 This consists of three parts: 1. A refactoring and new contract for implementing and testing compaction. The logic is now in a separate crate, with no dependency on the 'pageserver' crate. It defines an interface that the real pageserver must implement, in order to call the compaction algorithm. The interface models things like delta and image layers, but just the parts that the compaction algorithm needs to make decisions. That makes it easier to unit test the algorithm and experiment with different implementations. I did not convert the current code to the new abstraction, however. When the compaction algorithm is set to "Legacy", we just use the old code. It might be worthwhile to convert the old code to the new abstraction, so that we can compare the behavior of the new algorithm against the old one, using the same simulated cases. If we do that, we have to be careful that the converted code really is equivalent to the old. This includes only trivial changes to the main pageserver code. All the new code is behind a tenant config option. So this should be pretty safe to merge, even if the new implementation is buggy, as long as we don't enable it. 2. A new compaction algorithm, implemented using the new abstraction. The new algorithm is tiered compaction. It is inspired by the PoC at PR #4539, although I did not use that code directly, as I needed the new implementation to fit the new abstraction. The algorithm here is less advanced, I did not implement partial image layers, for example. I wanted to keep it simple on purpose, so that as we add bells and whistles, we can see the effects using the included simulator. One difference to #4539 and your typical LSM tree implementations is how we keep track of the LSM tree levels. This PR doesn't have a permanent concept of a level, tier or sorted run at all. 
There are just delta and image layers. However, when compaction starts, we look at the layers that exist, and arrange them into levels, depending on their shapes. That is ephemeral: when the compaction finishes, we forget that information. This allows the new algorithm to work without any extra bookkeeping. That makes it easier to transition from the old algorithm to new, and back again. There is just a new tenant config option to choose the compaction algorithm. The default is "Legacy", meaning the current algorithm in 'main'. If you set it to "Tiered", the new algorithm is used. 3. A simulator, which implements the new abstraction. The simulator can be used to analyze write and storage amplification, without running a test with the full pageserver. It can also draw an SVG animation of the simulation, to visualize how layers are created and deleted. To run the simulator: cargo run --bin compaction-simulator run-suite --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-27 17:15:46 +01:00
Christian Schwarz	ca07fa5f8b	per-TenantShard read throttling (#6706 )	2024-02-16 21:26:59 +01:00
Anna Khanova	331935df91	Proxy: send cancel notifications to all instances (#6719 ) ## Problem If a cancel request ends up on the wrong proxy instance, it has no effect. ## Summary of changes Send redis notifications to all proxy pods about the cancel request. Related issue: https://github.com/neondatabase/neon/issues/5839, https://github.com/neondatabase/cloud/issues/10262	2024-02-13 17:58:58 +01:00
Arthur Petukhovsky	4be2223a4c	Discrete event simulation for safekeepers (#5804 ) This PR contains the first version of a [FoundationDB-like](https://www.youtube.com/watch?v=4fFDFbi3toc) simulation testing for safekeeper and walproposer. ### desim This is a core "framework" for running deterministic simulation. It operates on threads, allowing synchronous code (like walproposer) to be tested. `libs/desim/src/executor.rs` contains implementation of a deterministic thread execution. This is achieved by blocking all threads, and each time allowing only a single thread to make an execution step. All executor's threads are blocked using `yield_me(after_ms)` function. This function is called when a thread wants to sleep or wait for an external notification (like blocking on a channel until it has a ready message). `libs/desim/src/chan.rs` contains implementation of a channel (basic sync primitive). It has unlimited capacity and any thread can push or read messages to/from it. `libs/desim/src/network.rs` has a very naive implementation of a network (only reliable TCP-like connections are supported for now), that can have arbitrary delays for each package and failure injections for breaking connections with some probability. `libs/desim/src/world.rs` ties everything together, to have a concept of virtual nodes that can have network connections between them. ### walproposer_sim Has everything to run walproposer and safekeepers in a simulation. `safekeeper.rs` reimplements all necessary stuff from `receive_wal.rs`, `send_wal.rs` and `timelines_global_map.rs`. `walproposer_api.rs` implements all walproposer callbacks to use simulation library. `simulation.rs` defines a schedule – a set of events like `restart <sk>` or `write_wal` that should happen at time `<ts>`. It also has code to spawn walproposer/safekeeper threads and provide config to them. 
### tests `simple_test.rs` has tests that just start walproposer and 3 safekeepers together in a simulation, and tests that they are not crashing right away. `misc_test.rs` has tests checking more advanced simulation cases, like crashing or restarting threads, testing memory deallocation, etc. `random_test.rs` is the main test, it checks thousands of random seeds (schedules) for correctness. It roughly corresponds to running a real python integration test in an environment with very unstable network and cpu, but in a deterministic way (each seed results in the same execution log) and much much faster. Closes #547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2024-02-12 20:29:57 +00:00
Conrad Ludgate	cbd3a32d4d	proxy: decode username and password (#6700 ) ## Problem usernames and passwords can be URL 'percent' encoded in the connection string URL provided by serverless driver. ## Summary of changes Decode the parameters when getting conn info	2024-02-09 19:22:23 +00:00
Christian Schwarz	c561ad4e2e	feat: expose locked memory in pageserver `/metrics` (#6669 ) context: https://github.com/neondatabase/neon/issues/6667	2024-02-07 19:39:52 +00:00
Conrad Ludgate	74c5e3d9b8	use string interner for project cache (#6578 ) ## Problem Running some memory profiling with high concurrent request rate shows seemingly some memory fragmentation. ## Summary of changes Eventually, we will want to separate global memory (caches) from local memory (per connection handshake and per passthrough). Using a string interner for project info cache helps reduce some of the fragmentation of the global cache by having a single heap dedicated to project strings, and not scattering them throughout all requests. At the same time, the interned key is 4 bytes vs the 24 bytes that `SmolStr` offers. Important: we should only store verified strings in the interner because there's no way to remove them afterwards. Good for caching responses from console.	2024-02-05 14:27:25 +00:00
John Spray	7e2436695d	storage controller: use AWS Secrets Manager for database URL, etc (#6585 ) ## Problem Passing secrets in via CLI/environment is awkward when using helm for deployment, and not ideal for security (secrets may show up in ps, /proc). We can bypass these issues by simply connecting directly to the AWS Secrets Manager service at runtime. ## Summary of changes - Add dependency on aws-sdk-secretsmanager - Update other aws dependencies to latest, to match transitive dependency versions - Add `Secrets` type in attachment service, using AWS SDK to load if secrets are not provided on the command line.	2024-02-02 16:57:11 +00:00
Christian Schwarz	1be5e564ce	feat(walredo): use posix_spawn by moving close_fds() work to walredo C code (#6574 ) The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually using `gdb` that we're now forking out the walredo process using `posix_spawn`. refs https://github.com/neondatabase/neon/issues/6565	2024-02-01 22:38:34 +01:00
Conrad Ludgate	c7b02ce8ec	proxy: use jemalloc (#6531 ) ## Summary of changes Experiment with jemalloc in proxy	2024-01-31 14:51:11 +01:00
Conrad Ludgate	511e730cc0	hll experiment (#6312 ) ## Problem Measuring cardinality using logs is expensive and slow. ## Summary of changes Implement a pre-aggregated HyperLogLog-based cardinality estimate. HyperLogLog estimates the cardinality of a set by using the fact that the probability that the uniform hash of a value ends in a run of `n` 0s is `1/2^n`; therefore, having observed a run of `n` 0s suggests we have measured `2^n` distinct values. By using multiple shards, we can use the harmonic mean to get a more accurate estimate. We record this into a Prometheus time-series. HyperLogLog counts can be merged by taking the `max` of each shard. We can apply a `max_over_time` in order to find the estimate of cardinality of distinct values over time	2024-01-29 07:26:20 +00:00
Christian Schwarz	918b03b3b0	integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824 )	2024-01-26 09:25:07 +01:00
Conrad Ludgate	72de1cb511	remove some duped deps (#6422 ) ## Problem duplicated deps ## Summary of changes little bit of fiddling with deps to reduce duplicates needs consideration: https://github.com/notify-rs/notify/blob/main/CHANGELOG.md#notify-600-2023-05-17	2024-01-23 11:17:15 +00:00
John Spray	b6ec11ad78	control_plane: generalize attachment_service to handle sharding (#6251 ) ## Problem To test sharding, we need something to control it. We could write python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with large number of shards/tenants, Rust is a better fit efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high level such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs. 
These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-01-17 18:01:08 +00:00
Anna Khanova	76372ce002	Added auth info cache with notifications to redis. (#6208 ) ## Problem Current cache doesn't support any updates from the cplane. ## Summary of changes * Added redis notifier listener. * Added cache which can be invalidated with the notifier. If the notifier is not available, it's just a normal ttl cache. * Updated cplane api. The motivation behind this organization of the data is the following: * In the Neon data model there are projects. Projects could have multiple branches and each branch could have more than one endpoint. * Also there is one special `main` branch. * Password reset works per branch. * Allowed IPs are the same for every branch in the project (except, maybe, the main one). * The main branch can be changed to the other branch. * The endpoint can be moved between branches. Every event described above requires some special processing on the proxy (or cplane) side. The idea of invalidating for the project is that whenever one of the events above is happening with the project, proxy can invalidate all entries for the entire project. This approach also requires some additional API change (returning project_id inside the auth info).	2024-01-10 11:51:05 +00:00
Conrad Ludgate	8a646cb750	proxy: add request context for observability and blocking (#6160 ) ## Summary of changes ### RequestMonitoring We want to add an event stream with information on each request for easier analysis than what we can do with diagnostic logs alone (https://github.com/neondatabase/cloud/issues/8807). This RequestMonitoring will keep a record of the final state of a request. On drop it will be pushed into a queue to be uploaded. Because this context is a bag of data, I don't want this information to impact logic of request handling. I personally think that weakly typed data (such as all these options) makes for spaghetti code. I will however allow for this data to impact rate-limiting and blocking of requests, as this does not _really_ change how a request is handled. ### Parquet Each `RequestMonitoring` is flushed into a channel where it is converted into `RequestData`, which is accumulated into parquet files. Each file will have a certain number of rows per row group, and several row groups will eventually fill up the file, which we then upload to S3. We will also upload smaller files if they take too long to construct.	2024-01-08 11:42:43 +00:00
Christian Schwarz	5385791ca6	add pageserver component-level benchmark (`pagebench`) (#6174 ) This PR adds a component-level benchmarking utility for pageserver. Its name is `pagebench`. The problem solved by `pagebench` is that we want to put Pageserver under high load. This isn't easily achieved with `pgbench` because it needs to go through a compute, which has significant performance overhead compared to accessing Pageserver directly. Further, compute has its own performance optimizations (most importantly: caches). Instead of designing a compute-facing workload that defeats those internal optimizations, `pagebench` simply bypasses them by accessing pageserver directly. Supported benchmarks: * getpage@latest_lsn * basebackup * triggering logical size calculation This code has no automated users yet. A performance regression test for getpage@latest_lsn will be added in a later PR. part of https://github.com/neondatabase/neon/issues/5771	2023-12-21 13:07:23 +01:00
Arpad Müller	8b91bbc38e	Update jsonwebtoken to 9 and sct to 0.7.1 (#6189 ) This increases the list of crates that base on `ring` 0.17.	2023-12-19 15:45:17 +00:00
Christian Schwarz	1a9854bfb7	add a Rust client for Pageserver management API (#6127 ) Part of getpage@lsn benchmark epic: https://github.com/neondatabase/neon/issues/5771 This PR moves the control plane's spread-all-over-the-place client for the pageserver management API into a separate module within the pageserver crate. I need that client to be async in my benchmarking work, so, this PR switches to the async version of `reqwest`. That is also the right direction generally IMO. The switch to async in turn mandated converting most of the `control_plane/` code to async. Note that some of the client methods should be taking `TenantShardId` instead of `TenantId`, but, none of the callers seem to be sharding-aware. Leaving that for another time: https://github.com/neondatabase/neon/issues/6154	2023-12-15 18:33:45 +01:00
Arpad Müller	7c2c87a5ab	Update azure SDK to 0.18 and use open range support (#6103 ) * Update `azure-` crates to 0.18 Use new open ranges support added by upstream in https://github.com/Azure/azure-sdk-for-rust/pull/1482 Part of #5567. Prior update PR: #6081	2023-12-12 18:20:12 +01:00
Joonas Koivunen	f0d15cee6f	build: update azure-* to 0.17 (#6081 ) this is a drive-by upgrade while we refresh the access tokens at the same time.	2023-12-11 12:21:02 +01:00
Andrew Rudenko	df1f8e13c4	proxy: pass neon options in deep object format (#6068 ) --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2023-12-08 19:58:36 +01:00
Conrad Ludgate	e1a564ace2	proxy simplify cancellation (#5916 ) ## Problem The cancellation code was confusing and error prone (as seen before in our memory leaks). ## Summary of changes * Use the new `TaskTracker` primitive instead of JoinSet to gracefully wait for tasks to shutdown. * Updated libs/utils/completion to use `TaskTracker` * Remove `tokio::select` in favour of `futures::future::select` in a specialised `run_until_cancelled()` helper function	2023-12-08 16:21:17 +00:00
Conrad Ludgate	f39fca0049	proxy: chore: replace strings with SmolStr (#5786 ) ## Problem no problem ## Summary of changes replaces boxstr with arcstr as it's cheaper to clone. mild perf improvement. probably should look into other smallstring optimisations tbh, they will likely be even better. The longest endpoint name I was able to construct is something like `ep-weathered-wildflower-12345678` which is 32 bytes. Most string optimisations top out at 23 bytes	2023-11-30 20:52:30 +00:00
Anna Khanova	e12e2681e9	IP allowlist on the proxy side (#5906 ) ## Problem Per-project IP allowlist: https://github.com/neondatabase/cloud/issues/8116 ## Summary of changes Implemented IP filtering on the proxy side. To retrieve ip allowlist for all scenarios, added `get_auth_info` call to the control plane for: * sql-over-http * password_hack * cleartext_hack Added cache with ttl for sql-over-http path This might slow down a bit, consider using redis in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2023-11-30 13:14:33 +00:00
John Khvatov	3e094e90d7	update aws sdk to 1.0.x (#5976 ) This change will be useful for experimenting with S3 performance. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 14:17:58 +02:00
Rahul Modpur	50d959fddc	refactor: use serde for TenantConf deserialization Fixes: #5300 (#5310 ) Remove handcrafted TenantConf deserialization code. Use `serde_path_to_error` to include the field which failed parsing. Leaves the duplicated TenantConf in pageserver and models, does not touch PageserverConf handcrafted deserialization. Error change: - before change: "configure option `checkpoint_distance` cannot be negative" - after change: "`checkpoint_distance`: invalid value: integer `-1`, expected u64" Fixes: #5300 Cc: #3682 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 12:47:13 +02:00
Arseny Sher	78e73b20e1	Notify safekeeper readiness with systemd. To avoid downtime during deploy, as in busy regions initial load can currently take ~30s.	2023-11-29 14:07:06 +04:00
Arpad Müller	54327bbeec	Upload initdb results to S3 (#5390 ) ## Problem See #2592 ## Summary of changes Compresses the results of initdb into a .tar.zst file and uploads them to S3, to enable usage in recovery from lsn. Generations should not be involved I think because we do this only once at the very beginning of a timeline. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-23 18:11:52 +00:00
Christian Schwarz	d353fa1998	refer to our rust-postgres.git fork by branch name (#5894 ) This way, `cargo update -p tokio-postgres` just works. The `Cargo.toml` communicates more clearly that we're referring to the `main` branch. And the git revision is still pinned in `Cargo.lock`.	2023-11-22 10:58:27 +00:00
khanova	0c243faf96	Proxy log pid hack (#5869 ) ## Problem Improve observability for the compute node. ## Summary of changes Log pid from the compute node. Doesn't work with pgbouncer.	2023-11-16 20:46:23 +00:00
khanova	2f0d245c2a	Proxy control plane rate limiter (#5785 ) ## Problem Proxy might overload the control plane. ## Summary of changes Implement rate limiter for proxy<->control plane connection. Resolves https://github.com/neondatabase/neon/issues/5707 Used implementation ideas from https://github.com/conradludgate/squeeze/	2023-11-15 09:15:59 +00:00
Joonas Koivunen	a05f104cce	build: remove async-std dependency (#5848 ) Introduced by accident (missing `default-features = false`) in `e09d5ada6a`. We directly need only `http_types::StatusCode`.	2023-11-10 16:05:21 +02:00
Conrad Ludgate	7cdde285a5	proxy: limit concurrent wake_compute requests per endpoint (#5799 ) ## Problem A user can perform many database connections at the same instant of time - these will all cache miss and materialise as requests to the control plane. #5705 ## Summary of changes I am using a `DashMap` (a sharded `RwLock<HashMap>`) of endpoints -> semaphores to apply a limiter. If the limiter is enabled (permits > 0), the semaphore will be retrieved per endpoint and a permit will be awaited before continuing to call the wake_compute endpoint. ### Important details This dashmap would grow uncontrollably without maintenance. It's not a cache so I don't think an LRU-based reclamation makes sense. Instead, I've made use of the sharding functionality of DashMap to lock a single shard and clear out unused semaphores periodically. I ran a test in release, using 128 tokio tasks among 12 threads each pushing 1000 entries into the map per second, clearing a shard every 2 seconds (64 second epoch with 32 shards). The endpoint names were sampled from a gamma distribution to make sure some overlap would occur, and each permit was held for 1ms. The histogram for time to clear each shard settled between 256-512us without any variance in my testing. Holding a lock for under a millisecond for 1 of the shards does not concern me as blocking	2023-11-09 14:14:30 +00:00
John Spray	9c30883c4b	remote_storage: use S3 SDK's adaptive retry policy (#5813 ) ## Problem Currently, we aren't doing any explicit slowdown in response to 429 responses. Recently, as we hit remote storage a bit harder (pageserver does more ListObjectsv2 requests than it used to since #5580 ), we're seeing storms of 429 responses that may be the result of not just doing too many requests, but continuing to do those extra requests without backing off any more than our usual backoff::exponential. ## Summary of changes Switch from AWS's "Standard" retry policy to "Adaptive" -- docs describe this as experimental but it has been around for a long time. The main difference between Standard and Adaptive is that Adaptive rate-limits the client in response to feedback from the server, which is meant to avoid scenarios where the client would otherwise repeatedly hit throttling responses.	2023-11-09 13:50:13 +00:00
Arpad Müller	e310533ed3	Support JWT key reload in pageserver (#5594 ) ## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 15:43:29 +01:00
duguorong009	b3d3a2587d	feat: improve the serde impl for several types(`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
Conrad Ludgate	cdcaa329bf	proxy: no more statements (#5747 ) ## Problem my prepared statements change in tokio-postgres landed in the latest release. it didn't work as we intended ## Summary of changes https://github.com/neondatabase/rust-postgres/pull/24	2023-11-03 08:30:58 +00:00
Arthur Petukhovsky	66f8f5f1c8	Call walproposer from Rust (#5403 ) Create Rust bindings for C functions from walproposer. This allows writing better tests with real walproposer code without spawning multiple processes and starting up the whole environment. `make walproposer-lib` stage was added to build static libraries `libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can be statically linked to any executable to call walproposer functions. `libs/walproposer/src/walproposer.rs` contains `test_simple_sync_safekeepers` to test that walproposer can be called from Rust to emulate sync_safekeepers logic. It can also be used as a usage example.	2023-10-19 14:17:15 +01:00
Conrad Ludgate	572eda44ee	update tokio-postgres (#5597 ) https://github.com/neondatabase/rust-postgres/pull/23	2023-10-19 14:32:19 +02:00
Conrad Ludgate	f775928dfc	proxy: refactor how and when connections are returned to the pool (#5095 ) ## Problem Transactions break connections in the pool fixes #4698 ## Summary of changes * Pool `Client`s are smart objects that return themselves to the pool * Pool `Client`s can be 'discard'ed * Pool `Client`s are discarded when certain errors are encountered. * Pool `Client`s are discarded when ReadyForQuery returns a non-idle state.	2023-10-17 13:55:52 +00:00
Arpad Müller	00c71bb93a	Also try to login to Azure via SDK provided methods (#5573 ) ## Problem We ideally use the Azure SDK's way of obtaining authorization, as pointed out in https://github.com/neondatabase/neon/pull/5546#discussion_r1360619178 . ## Summary of changes This PR adds support for Azure SDK based authentication, using [DefaultAzureCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.DefaultAzureCredential.html), which tries the following credentials: * [EnvironmentCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.EnvironmentCredential.html), reading from various env vars * [ImdsManagedIdentityCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.ImdsManagedIdentityCredential.html), using managed identity * [AzureCliCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.AzureCliCredential.html), using Azure CLI closes #5566.	2023-10-17 11:59:57 +01:00
Arpad Müller	e09d5ada6a	Azure blob storage support (#5546 ) Adds prototype-level support for [Azure blob storage](https://azure.microsoft.com/en-us/products/storage/blobs). Some corners were cut, see the TODOs and the followup issue #5567 for details. Steps to try it out: * Create a storage account with block blobs (this is a per-storage account setting). * Create a container inside that storage account. * Set the appropriate env vars: `AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_ACCESS_KEY, REMOTE_STORAGE_AZURE_CONTAINER, REMOTE_STORAGE_AZURE_REGION` * Set the env var `ENABLE_REAL_AZURE_REMOTE_STORAGE=y` and run `cargo test -p remote_storage azure` Fixes #5562	2023-10-16 17:37:09 +02:00
duguorong009	25a37215f3	fix: replace all `std::PathBuf`s with `camino::Utf8PathBuf` (#5352 ) Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with `camino::Utf8Path`, `camino::Utf8PathBuf` in - pageserver - safekeeper - control_plane - libs/remote_storage Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 17:52:23 +03:00
Conrad Ludgate	528fb1bd81	proxy: metrics2 (#5179 ) ## Problem We need to count metrics always when a connection is open. Not only when the transfer is 0. We also need to count bytes usage for HTTP. ## Summary of changes New structure for usage metrics. A `DashMap<Ids, Arc<Counters>>`. If the arc has 1 owner (the map) then I can conclude that no connections are open. If the counters has "open_connections" non zero, then I can conclude a new connection was opened in the last interval and should be reported on. Also, keep count of how many bytes processed for HTTP and report it here.	2023-09-28 11:38:26 +01:00
Alexander Bayandin	211f882428	Update hyper-tungstenite to 0.11 (#5361 )	2023-09-23 18:06:25 +01:00
Joonas Koivunen	720d59737a	rust-1.72.0 changes (#5255 ) Prepare to upgrade rust version to latest stable. - `rustfmt` has learned to format `let irrefutable = $expr else { ... };` blocks - There's a new warning about virtual (workspace) crate resolver, picked the latest resolver as I suspect everyone would expect it to be the latest; should not matter anyways - Some new clippies, which seem alright	2023-09-08 16:28:41 +03:00
John Spray	616e7046c7	s3_scrubber: import into the main `neon` repository (#5141 ) ## Problem The S3 scrubber currently lives at https://github.com/neondatabase/s3-scrubber We don't have tests that use it, and it has copies of some data structures that can get stale. ## Summary of changes - Import the s3-scrubber as `s3_scrubber/` - Replace copied_definitions/ in the scrubber with direct access to the `utils` and `pageserver` crates - Modify visibility of a few definitions in `pageserver` to allow the scrubber to use them - Update scrubber code for recent changes to `IndexPart` - Update `KNOWN_VERSIONS` for IndexPart and move the definition into index.rs so that it is easier to keep up to date As a future refinement, it would be good to pull the remote persistence types (like IndexPart) out of `pageserver` into a separate library so that the scrubber doesn't have to link against the whole pageserver, and so that it's clearer which types need to be public. Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-08-31 19:01:39 +01:00
Conrad Ludgate	3b81e0c86d	chore: remove webpki (#5069 ) ## Problem webpki is unmaintained Closes https://github.com/neondatabase/neon/security/dependabot/33 ## Summary of changes Update all dependents of webpki.	2023-08-30 15:14:03 +01:00
Felix Prasanna	3128eeff01	compute_ctl: add vm-monitor (#4946 ) Co-authored-by: Em Sharnoff <sharnoff@neon.tech>	2023-08-24 15:54:37 -04:00
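
Note on commit 511e730cc0 (hll experiment): the estimate described in that message (a run of `n` trailing 0s suggests about `2^n` distinct values, shards combined with a harmonic mean, counts merged by taking the `max` of each shard) can be illustrated with a minimal, std-only sketch. This is not the code from the PR. The type `Hll`, the shard count, the use of `DefaultHasher`, and the bias-correction constants (the usual ones from the HyperLogLog literature) are all assumptions chosen for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical parameters: 2^10 shards gives roughly 3% standard error.
const SHARD_BITS: u32 = 10;
const SHARDS: usize = 1 << SHARD_BITS;

/// Illustrative HyperLogLog counter; one small register per shard.
struct Hll {
    shards: Vec<u8>,
}

impl Hll {
    fn new() -> Self {
        Hll { shards: vec![0; SHARDS] }
    }

    /// Record one value: the low hash bits pick the shard, and the shard
    /// remembers the longest run of trailing zeros (plus one) seen so far.
    fn insert<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();
        let shard = (hash as usize) & (SHARDS - 1);
        let rest = hash >> SHARD_BITS;
        let rank = (rest.trailing_zeros().min(64 - SHARD_BITS) + 1) as u8;
        if rank > self.shards[shard] {
            self.shards[shard] = rank;
        }
    }

    /// Merge another counter into this one by taking the max of each shard.
    fn merge(&mut self, other: &Hll) {
        for (mine, theirs) in self.shards.iter_mut().zip(&other.shards) {
            *mine = (*mine).max(*theirs);
        }
    }

    /// Harmonic-mean estimate with the standard small-range correction.
    fn estimate(&self) -> f64 {
        let m = SHARDS as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.shards.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let empty = self.shards.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && empty > 0 {
            // Few values seen: fall back to linear counting on empty shards.
            m * (m / empty as f64).ln()
        } else {
            raw
        }
    }
}
```

Because merging is a per-shard `max`, inserting the same value twice or merging overlapping counters never inflates the estimate, which is the property that lets the commit message apply `max_over_time` to the recorded series.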

132 Commits