rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 17:32:56 +00:00

Author	SHA1	Message	Date
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Arseny Sher	7627d85345	Move async postgres_backend to its own crate. To untie cyclic dependency between sync and async versions of postgres_backend, copy QueryError and some logging/error routines to postgres_backend.rs. This is temporal glue to make commits smaller, sync version will be dropped by the upcoming commit completely.	2023-03-09 20:45:56 +03:00
Christian Schwarz	175a577ad4	automatic layer eviction This patch adds a per-timeline periodic task that executes an eviction policy. The eviction policy is configurable per tenant. Two policies exist: - NoEviction (the default one) - LayerAccessThreshold The LayerAccessThreshold policy examines the last access timestamp per layer in the layer map and evicts the layer if that last access is further in the past than a configurable threshold value. This policy kind is evaluated periodically at a configurable period. It logs a summary statistic at `info!()` or `warn!()` level, depending on whether any evictions failed. This feature has no explicit killswitch since it's off by default.	2023-02-09 13:33:55 +01:00
Christian Schwarz	58fa4f0eb7	maintain access stats for historic layers This patch adds basic access statistics for historic layers and exposes them in the management API's `LayerMapInfo`. We record the accesses in the `{Delta,Image}Layer::load()` function because it's the common path of * page_service (`Timline::get_reconstruct_data()`) * Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`) The stats survive residence status changes, and record these as well. When scraping the layer map endpoint to record its evolution over time, one must account for stat resets because they are in-memory only and will reset on pageserver restart. Use the launch timestamp header added by (#3527) to identify pageserver restarts. This is PR https://github.com/neondatabase/neon/pull/3496	2023-02-06 17:01:38 +01:00
bojanserafimov	a3d7ad2d52	Implement layer map using immutable BST (#2998 )	2023-01-20 16:10:12 -05:00
Anastasia Lubennikova	2cbe84b78f	Proxy metrics (#3290 ) Implement proxy metrics collection. Only collect metric for outbound traffic. Add proxy CLI parameters: - metric-collection-endpoint - metric-collection-interval. Add test_proxy_metric_collection test. Move shared consumption metrics code to libs/consumption_metrics. Refactor the code.	2023-01-16 15:17:28 +00:00
Kirill Bulatov	bce4233d3a	Rework Cargo.toml dependencies (#3322 ) * Use workspace variables from cargo, coming with rustc [1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22) See https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table and https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table sections. Now, all dependencies in all non-root `Cargo.toml` files are defined as ``` clap.workspace = true ``` sometimes, when extra features are needed, as ``` bytes = {workspace = true, features = ['serde'] } ``` With the actual declarations (with shared features and version numbers/file paths/etc.) in the root Cargo.toml. Features are additive: https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace * Uses the mechanism above to set common, 2021, edition and license across the workspace * Mechanically bumps a few dependencies * Updates hakari format, as it suggested: ``` work/neon/neon kb/cargo-templated ❯ cargo hakari generate info: no changes detected info: new hakari format version available: 3 (current: 2) (add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`) ```	2023-01-13 18:13:34 +02:00
Heikki Linnakangas	e9583db73b	Remove code and test to generate flamegraph on GetPage requests. (#3257 ) It was nice to have and useful at the time, but unfortunately the method used to gather the profiling data doesn't play nicely with 'async'. PR #3228 will turn 'get_page_at_lsn' function async, which will break the profiling support. Let's remove it, and re-introduce some kind of profiling later, using some different method, if we feel like we need it again.	2023-01-03 20:11:32 +02:00
Vadim Kharitonov	0b428f7c41	Enable licenses check for 3rd-parties	2023-01-03 15:11:50 +01:00
Heikki Linnakangas	0a0e55c3d0	Replace 'tar' crate with 'tokio-tar' (#3202 ) The synchronous 'tar' crate has required us to use block_in_place and SyncIoBridge to work together with the async I/O in the client connection. Switch to 'tokio-tar' crate that uses async I/O natively. As part of this, move the CopyDataWriter implementation to postgres_backend_async.rs. Even though it's only used in one place currently, it's in principle generally applicable whenever you want to use COPY out. Unfortunately we cannot use the 'tokio-tar' as it is: the Builder implementation requires the writer to have 'static lifetime. So we have to use a modified version without that requirement. The 'static lifetime was required just for the Drop implementation that writes the end-of-archive sections if the Builder is dropped without calling `finish`. But we don't actually want that behavior anyway; in fact we had to jump through some hoops with the AbortableWrite hack to skip those. With the modified version of 'tokio-tar' without that Drop implementation, we don't need AbortableWrite either. Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2023-01-03 12:39:11 +02:00
Heikki Linnakangas	81afd7011c	Use rustls for everything. I looked at "cargo tree" output and noticed that through various dependencies, we are depending on both native-tls and rustls. We have tried to standardize on rustls for everything, but dependencies on native-tls have crept in recently. One such dependency came from 'reqwest' with default features in pageserver, used for consumption_metrics. Another dependency was from 'sentry'. Both 'reqwest' and 'sentry' use native-tls by default, but can use 'rustls' if compiled with the right feature flags.	2023-01-02 11:14:35 +02:00
Anastasia Lubennikova	4235f97c6a	Implement consumption metrics collection. Add new background job to collect billing metrics for each tenant and send them to the HTTP endpoint. Metrics are cached, so we don't send non-changed metrics. Add metric collection config parameters: metric_collection_endpoint (default None, i.e. disabled) metric_collection_interval (default 60s) Add test_metric_collection.py to test metric collection and sending to the mocked HTTP endpoint. Use port distributor in metric_collection test review fixes: only update cache after metrics were send successfully, simplify code disable metric collection if metric_collection_endpoint is not provided in config	2022-12-20 22:59:52 +02:00
Dmitry Ivanov	61194ab2f4	Update rust-postgres everywhere I've rebased[1] Neon's fork of rust-postgres to incorporate latest upstream changes (including dependabot's fixes), so we need to advance revs here as well. [1] https://github.com/neondatabase/rust-postgres/commits/neon	2022-12-17 00:26:10 +03:00
Dmitry Ivanov	83baf49487	[proxy] Forward compute connection params to client This fixes all kinds of problems related to missing params, like broken timestamps (due to `integer_datetimes`). This solution is not ideal, but it will help. Meanwhile, I'm going to dedicate some time to improving connection machinery. Note that this does not fix problems with passing certain parameters in a reverse direction, i.e. from client to compute. This is a separate matter and will be dealt with in an upcoming PR.	2022-12-16 21:37:50 +03:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Alexander Bayandin	61825dfb57	Update chrono to 0.4.23; use only clock feature from it	2022-12-06 15:45:58 +01:00
Heikki Linnakangas	9a6c0be823	storage_sync2 The code in this change was extracted from PR #2595, i.e., Heikki’s draft PR for on-demand download. High-Level Changes - storage_sync module rewrite - Changes to Tenant Loading - Changes to Timeline States - Crash-safe & Resumable Tenant Attach There are several follow-up work items planned. Refer to the Epic issue on GitHub: https://github.com/neondatabase/neon/issues/2029 Metadata: closes https://github.com/neondatabase/neon/pull/2785 unsquashed history of this patch: archive/pr-2785-storage-sync2/pre-squash Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> =============================================================================== storage_sync module rewrite =========================== The storage_sync code is rewritten. New module name is storage_sync2, mostly to make a more reasonable git diff. The updated block comment in storage_sync2.rs describes the changes quite well, so, we will not reproduce that comment here. TL;DR: - Global sync queue and RemoteIndex are replaced with per-timeline `RemoteTimelineClient` structure that contains a queue for UploadOperations to ensure proper ordering and necessary metadata. - Before deleting local layer files, wait for ongoing UploadOps to finish (wait_completion()). - Download operations are not queued and executed immediately. Changes to Tenant Loading ========================= Initial sync part was rewritten as well and represents the other major change that serves as a foundation for on-demand downloads. Routines for attaching and loading shifted directly to Tenant struct and now are asynchronous and spawned into the background. Since this patch doesn’t introduce on-demand download of layers we fully synchronize with the remote during pageserver startup. See details in `Timeline::reconcile_with_remote` and `Timeline::download_missing`. Changes to Tenant States ======================== The “Active” state has lost its “background_jobs_running: bool” member. That variable indicated whether the GC & Compaction background loops are spawned or not. With this patch, they are now always spawned. Unit tests (#[test]) use the TenantConf::{gc_period,compaction_period} to disable their effect (`15db566`). This patch introduces a new tenant state, “Attaching”. A tenant that is being attached starts in this state and transitions to “Active” once it finishes download. The `GET /tenant` endpoints returns `TenantInfo::has_in_progress_downloads`. We derive the value for that field from the tenant state now, to remain backwards-compatible with cloud.git. We will remove that field when we switch to on-demand downloads. Changes to Timeline States ========================== The TimelineInfo::awaits_download field is now equivalent to the tenant being in Attaching state. Previously, download progress was tracked per timeline. With this change, it’s only tracked per tenant. When on-demand downloads arrive, the field will be completely obsolete. Deprecation is tracked in isuse #2930. Crash-safe & Resumable Tenant Attach ==================================== Previously, the attach operation was not persistent. I.e., when tenant attach was interrupted by a crash, the pageserver would not continue attaching after pageserver restart. In fact, the half-finished tenant directory on disk would simply be skipped by tenant_mgr because it lacked the metadata file (it’s written last). This patch introduces an “attaching” marker file inside that is present inside the tenant directory while the tenant is attaching. During pageserver startup, tenant_mgr will resume attach if that file is present. If not, it assumes that the local tenant state is consistent and tries to load the tenant. If that fails, the tenant transitions into Broken state.	2022-11-29 18:55:20 +01:00
Egor Suvorov	b6989e8928	pageserver: make `wal_source_connstring: String` a 'wal_source_connconf: PgConnectionConfig`	2022-11-24 14:02:23 +03:00
Joonas Koivunen	1d105727cb	perf: simple walredo bench (#2816 ) adds a simple walredo bench to allow some comparison of the walredo throughput. Cc: #1339, #2778	2022-11-16 11:13:56 +02:00
Dmitry Ivanov	c38f38dab7	Move pq_proto to its own crate	2022-11-03 22:56:04 +03:00
Joonas Koivunen	cf68963b18	Add initial tenant sizing model and a http route to query it (#2714 ) Tenant size information is gathered by using existing parts of `Tenant::gc_iteration` which are now separated as `Tenant::refresh_gc_info`. `Tenant::refresh_gc_info` collects branch points, and invokes `Timeline::update_gc_info`; nothing was supposed to be changed there. The gathered branch points (through Timeline's `GcInfo::retain_lsns`), `GcInfo::horizon_cutoff`, and `GcInfo::pitr_cutoff` are used to build up a Vec of updates fed into the `libs/tenant_size_model` to calculate the history size. The gathered information is now exposed using `GET /v1/tenant/{tenant_id}/size`, which which will respond with the actual calculated size. Initially the idea was to have this delivered as tenant background task and exported via metric, but it might be too computationally expensive to run it periodically as we don't yet know if the returned values are any good. Adds one new metric: - pageserver_storage_operations_seconds with label `logical_size` - separating from original `init_logical_size` Adds a pageserver wide configuration variable: - `concurrent_tenant_size_logical_size_queries` with default 1 This leaves a lot of TODO's, tracked on issue #2748.	2022-11-03 12:39:19 +00:00
Kirill Bulatov	d42700280f	Remove daemonize from storage components (#2677 ) Move daemonization logic into `control_plane`. Storage binaries now only crate a lockfile to avoid concurrent services running in the same directory.	2022-11-02 02:26:37 +02:00
bojanserafimov	9fb2287f87	Add draw_timeline binary (#2688 )	2022-10-25 11:25:22 -04:00
Heikki Linnakangas	59bc7e67e0	Use an optimized version of amplify_num. Speeds up layer_map::search somewhat. I also opened a PR in the upstream rust-amplify repository with these changes, see https://github.com/rust-amplify/rust-amplify/pull/148. We can switch back to upstream version when that's merged.	2022-10-18 15:00:10 +03:00
Heikki Linnakangas	80746b1c7a	Add micro-benchmark for layer map search function The test data was extracted from our pgbench benchmark project on the captest environment, the one we use for the 'neon-captest-reuse' test.	2022-10-18 15:00:10 +03:00
Kirill Bulatov	c4ee62d427	Bump clap and other minor dependencies (#2623 )	2022-10-17 12:58:40 +03:00
Kirill Bulatov	f03b7c3458	Bump regular dependencies (#2618 ) * etcd-client is not updated, since we plan to replace it with another client and the new version errors with some missing prost library error * clap had released another major update that requires changing every CLI declaration again, deserves a separate PR	2022-10-15 01:55:31 +03:00
sharnoff	580584c8fc	Remove control_plane deps on pageserver/safekeeper (#2513 ) Creates new `pageserver_api` and `safekeeper_api` crates to serve as the shared dependencies. Should reduce both recompile times and cold compile times. Decreases the size of the optimized `neon_local` binary: 380M -> 179M. No significant changes for anything else (mostly as expected).	2022-10-04 11:14:45 -07:00
Konstantin Knizhnik	f3073a4db9	R-Tree layer map (#2317 ) Replace the layer array and linear search with R-tree So far, the in-memory layer map that holds information about layer files that exist, has used a simple Vec, in no particular order, to hold information about all the layers. That obviously doesn't scale very well; with thousands of layer files the linear search was consuming a lot of CPU. Replace it with a two-dimensional R-tree, with Key and LSN ranges as the dimensions. For the R-tree, use the 'rstar' crate. To be able to use that, we convert the Keys and LSNs into 256-bit integers. 64 bits would be enough to represent LSNs, and 128 bits would be enough to represent Keys. However, we use 256 bits, because rstar internally performs multiplication to calculate the area of rectangles, and the result of multiplying two 128 bit integers doesn't necessarily fit in 128 bits, causing integer overflow and, if overflow-checks are enabled, panic. To avoid that, we use 256 bit integers. Add a performance test that creates a lot of layer files, to demonstrate the benefit.	2022-09-22 08:35:06 +03:00
sharnoff	4a3b3ff11d	Move testing pageserver libpq cmds to HTTP api (#2429 ) Closes #2422. The APIs have been feature gated with the `testing_api!` macro so that they return 400s when support hasn't been compiled in.	2022-09-20 11:28:12 -07:00
Kirill Bulatov	031e57a973	Disable failpoints by default	2022-09-16 09:26:29 +03:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Heikki Linnakangas	40c845e57d	Switch to async for all concurrency in the pageserver. Instead of spawning helper threads, we now use Tokio tasks. There are multiple Tokio runtimes, for different kinds of tasks. One for serving libpq client connections, another for background operations like GC and compaction, and so on. That's not strictly required, we could use just one runtime, but with this you can still get an overview of what's happening with "top -H". There's one subtle behavior in how TenantState is updated. Before this patch, if you deleted all timelines from a tenant, its GC and compaction loops were stopped, and the tenant went back to Idle state. We no longer do that. The empty tenant stays Active. The changes to test_tenant_tasks.py are related to that. There's still plenty of synchronous code and blocking. For example, we still use blocking std::io functions for all file I/O, and the communication with WAL redo processes is still uses low-level unix poll(). We might want to rewrite those later, but this will do for now. The model is that local file I/O is considered to be fast enough that blocking - and preventing other tasks running in the same thread - is acceptable.	2022-09-12 14:21:00 +03:00
Heikki Linnakangas	34b5d7aa9f	Remove unused dependency	2022-08-27 18:14:33 +03:00
Ankur Srivastava	84d1bc06a9	refactor: replace lazy-static with once-cell (#2195 ) - Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy` - fixes #1147 Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>	2022-08-05 19:34:04 +02:00
Heikki Linnakangas	b4c74c0ecd	Clean up unnecessary dependencies. Just to be tidy.	2022-07-20 16:31:25 +03:00
bojanserafimov	1ca28e6f3c	Import basebackup into pageserver (#1925 ) Allow importing basebackup taken from vanilla postgres or another pageserver via psql copy in protocol.	2022-06-21 11:04:10 -04:00
Kirill Bulatov	e5cb727572	Replace callmemaybe with etcd subscriptions on safekeeper timeline info	2022-06-01 16:07:04 +03:00
bojanserafimov	ca10cc12c1	Close file descriptors for redo process (#1834 )	2022-05-31 14:14:09 -04:00
Kirill Bulatov	a884f4cf6b	Add etcd to neon_local	2022-05-17 01:17:44 +03:00
Kirill Bulatov	51c0f9ab2b	Force git version to be up to date via decl macro	2022-05-13 16:34:32 +03:00
Kirill Bulatov	de37f982db	Share the remote storage as a crate	2022-05-07 00:30:36 +03:00
Dmitry Rodionov	05f8e6a050	Use fsync+rename for atomic downloads from remote storage Use failpoint in test_remote_storage to check the behavior	2022-04-29 15:53:56 +03:00
Anastasia Lubennikova	5c5c3c64f3	Fix tenant config parsing. Add a test	2022-04-28 11:49:19 +03:00
Dmitry Ivanov	d3f356e7a8	Update `rust-postgres` project-wide (#1525 ) * Update `rust-postgres` project-wide This commit points to https://github.com/neondatabase/rust-postgres/commits/neon in order to test our patches on top of the latest version of this crate. * [proxy] Update `hmac` and `sha2`	2022-04-22 17:31:58 +03:00
Heikki Linnakangas	a4700c9bbe	Use pprof to get flamegraph of get_page and get_relsize requests. This depends on a hacked version of the 'pprof-rs' crate. Because of that, it's under an optional 'profiling' feature. It is disabled by default, but enabled for release builds in CircleCI config. It doesn't currently work on macOS. The flamegraph is written to 'flamegraph.svg' in the pageserver workdir when the 'pageserver' process exits. Add a performance test that runs the perf_pgbench test, with profiling enabled.	2022-04-21 20:32:48 +03:00
Kirill Bulatov	81cad6277a	Move and library crates into a dedicated directory and rename them	2022-04-21 13:30:33 +03:00
Kirill Bulatov	3e6087a12f	Remove S3 archiving	2022-04-19 23:13:52 +03:00
Kirill Bulatov	0ca2bd929b	Remove log crate from pageserver	2022-04-18 00:00:36 +03:00
Heikki Linnakangas	93e0ac2b7a	Remove a couple of unused dependencies. Found by "cargo-udeps"	2022-04-14 17:38:26 +03:00

1 2 3

121 Commits