Commit Graph

278 Commits

Author SHA1 Message Date
Heikki Linnakangas
f0b2e076d9 Move compute_ctl structs used in HTTP API and spec file to separate crate.
This is in preparation for using compute_ctl to launch postgres nodes
in the neon_local control plane. It also seems like a good idea to
separate the public interfaces anyway.

One non-mechanical change here is that the 'metrics' field is moved
under the Mutex, instead of using atomics. We were not using atomics
for performance but for convenience here, and it seems clearer to
not use atomics in the model for the HTTP response type.
2023-04-09 21:52:28 +03:00
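As a sketch of the shape this implies (hypothetical field names, not the actual compute_ctl types):

```
use std::sync::Mutex;

// Counters are plain integers inside the state guarded by the existing
// Mutex, instead of standalone AtomicU64s; the HTTP response model stays
// a plain struct.
#[derive(Clone, Default)]
pub struct ComputeMetrics {
    pub wait_for_spec_ms: u64,
    pub start_postgres_ms: u64,
}

pub struct ComputeState {
    pub metrics: ComputeMetrics,
}

pub struct ComputeNode {
    pub state: Mutex<ComputeState>,
}
```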
Christian Schwarz
a64dd3ecb5 disk-usage-based layer eviction (#3809)
This patch adds a pageserver-global background loop that evicts layers
in response to a shortage of available bytes in the $repo/tenants
directory's filesystem.

The loop runs periodically at a configurable `period`.

Each loop iteration uses `statvfs` to determine filesystem-level space
usage. It compares the returned usage data against two different types
of thresholds. The iteration tries to evict layers until app-internal
accounting says we should be below the thresholds. We cross-check this
internal accounting with the real world by making another `statvfs` call at
the end of the iteration. We're good if that second statvfs shows that
we're _actually_ below the configured thresholds. If we're still above
one or more thresholds, we emit a warning log message, leaving it to the
operator to investigate further.

There are two thresholds:
- `max_usage_pct` is a cap on used space, expressed in percent of the
total filesystem space. If the actual usage is higher, the
threshold is exceeded.
- `min_avail_bytes` is a floor on the absolute available space in bytes.
If the actual available space is lower, the threshold is exceeded.

The iteration evicts layers in LRU fashion with a reservation of up to
`tenant_min_resident_size` bytes of the most recent layers per tenant.
The layers not part of the per-tenant reservation are evicted
least-recently-used first until we're below all thresholds. The
`tenant_min_resident_size` can be overridden per tenant as
`min_resident_size_override` (bytes).

In addition to the loop, there is also an HTTP endpoint to perform one
loop iteration synchronously with the request. The endpoint takes an
absolute number of bytes that the iteration needs to evict before
pressure is relieved. The tests use this endpoint, which is a great
simplification over setting up loopback-mounts in the tests, which would
be required to test the statvfs part of the implementation. We will rely
on manual testing in staging to test the statvfs parts.

The HTTP endpoint is also handy in emergencies where an operator wants
the pageserver to evict a given amount of space _now_. Hence, its
arguments are documented in openapi_spec.yml. The response type isn't
documented, though, because we don't consider it stable. The endpoint
should _not_ be used by Console, but it could be used by on-call.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-31 14:47:57 +03:00
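A minimal sketch of the two threshold checks (names are assumptions, not the actual pageserver types):

```
struct Usage {
    avail_bytes: u64,
    total_bytes: u64,
}

/// True while at least one threshold is still exceeded, i.e. while the
/// eviction iteration must keep evicting layers.
fn pressure(u: &Usage, max_usage_pct: u64, min_avail_bytes: u64) -> bool {
    let used_pct = 100 * (u.total_bytes - u.avail_bytes) / u.total_bytes;
    used_pct > max_usage_pct // relative cap on used space
        || u.avail_bytes < min_avail_bytes // absolute floor on available space
}
```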
Kirill Bulatov
1300dc9239 Replace Python IT test with the Rust one 2023-03-29 00:08:30 +03:00
Arthur Petukhovsky
d9a1329834 Make postgres_backend use generic IO type (#3789)
- Support measuring inbound and outbound traffic in MeasuredStream
- Start using MeasuredStream in safekeepers code
2023-03-13 12:18:10 +03:00
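A minimal sketch of the byte-counting idea, showing only the read side (the write side is symmetric); an assumed shape, not the actual MeasuredStream:

```
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};

pub struct Measured<S> {
    inner: S,
    pub read_bytes: u64, // inbound traffic counter
}

impl<S: AsyncRead + Unpin> AsyncRead for Measured<S> {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        let this = self.get_mut();
        let before = buf.filled().len();
        let res = Pin::new(&mut this.inner).poll_read(cx, buf);
        // whatever poll_read appended to the buffer counts as inbound bytes
        this.read_bytes += (buf.filled().len() - before) as u64;
        res
    }
}
```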
Vadim Kharitonov
1401021b21 Be able to get number of CPUs (#3774)
After enabling autoscaling, we faced the issue that customers were not
able to see the number of CPUs they are using at a given moment.
Therefore I've added these two options:

1. A PostgreSQL function that customers can call whenever they want
2. A `compute_ctl` endpoint to show this number in the console
2023-03-10 19:00:20 +02:00
Arseny Sher
965837df53 Log connection ids in safekeeper instead of thread ids.
Fixes build on macOS (which doesn't have nix gettid) after 0d8ced8534.
2023-03-10 10:50:03 +03:00
Arseny Sher
b80fe41af3 Refactor postgres protocol parsing.
1) Remove the allocation and data copy on each message read. Instead, parsing
functions now accept a BytesMut from whose data they form messages, with
pointers (e.g. in CopyData) pointing directly into the BytesMut buffer.
Accordingly, move ConnectionError (which contains the IO error subtype) into
framed.rs, which provides this, and leave only ProtocolError in pq_proto.

2) Remove anyhow from pq_proto.

3) Move FeStartupPacket out of FeMessage. Now FeStartupPacket::parse returns it
directly, eliminating dead code where the caller wants a startup packet but has
to match on the other message variants.

Proxy's stream.rs is adapted to framed.rs with minimal changes. It also
benefits from the framed.rs improvements described above.
2023-03-09 20:45:56 +03:00
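A sketch of the zero-copy framing idea (assumed shapes, not the actual pq_proto types): the message body is split out of the read buffer and shares its allocation instead of being copied:

```
use bytes::{Buf, Bytes, BytesMut};

pub enum FeMessage {
    CopyData(Bytes), // points into the read buffer's allocation
    // ... other message types elided
}

fn parse(buf: &mut BytesMut) -> Option<FeMessage> {
    if buf.len() < 5 {
        return None; // need tag byte + 4-byte length
    }
    let tag = buf[0];
    // the length field counts itself plus the body
    let len = u32::from_be_bytes(buf[1..5].try_into().unwrap()) as usize;
    if len < 4 || buf.len() < 1 + len {
        return None; // malformed length, or whole frame not buffered yet
    }
    buf.advance(5); // consume tag + length
    let body = buf.split_to(len - 4).freeze(); // refcounted slice, no copy
    match tag {
        b'd' => Some(FeMessage::CopyData(body)),
        _ => None,
    }
}
```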
Arseny Sher
0d8ced8534 Remove sync postgres_backend, tidy up its split usage.
- Add support for splitting async postgres_backend into read and write halves.
  Safekeeper needs this for bidirectional streams. To this end, encapsulate
  reading and writing of postgres messages in framed.rs with split support,
  without any additional changes (relying on BufRead for reading and a BytesMut
  out buffer for writing).
- Use async postgres_backend throughout safekeeper (and in proxy auth link
  part).
- In both safekeeper COPY streams, do read-write from the same thread/task with
  select! for easier error handling.
- Tidy up finishing CopyBoth streams in safekeeper WAL sending and receiving
  -- join the split parts back together, catching errors from them before returning.

Initially I hoped to do this read-write without split at all, through polling
IO:
https://github.com/neondatabase/neon/pull/3522
However, that turned out to be more complicated than I initially expected
due to 1) borrow checking and 2) anonymous Future types. 1) required an
Rc<RefCell<...>>-that-is-Send construct just to satisfy the checker; 2) can be
worked around with transmute. But this is so messy that I decided to keep the split.
2023-03-09 20:45:56 +03:00
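A minimal sketch of the single-task read-write pattern (channels stand in for the split halves; not the safekeeper's actual types):

```
use anyhow::Result;
use tokio::sync::mpsc;

async fn drive(
    mut inbound: mpsc::Receiver<Vec<u8>>, // stands in for the read half
    outbound: mpsc::Sender<Vec<u8>>,      // stands in for the write half
    mut wal: mpsc::Receiver<Vec<u8>>,     // WAL records to push to the peer
) -> Result<()> {
    loop {
        tokio::select! {
            msg = inbound.recv() => match msg {
                Some(feedback) => handle_feedback(feedback)?,
                None => return Ok(()), // peer finished the stream
            },
            rec = wal.recv() => match rec {
                Some(rec) => outbound.send(rec).await?,
                None => return Ok(()), // no more WAL to send
            },
        }
    }
}

fn handle_feedback(_msg: Vec<u8>) -> Result<()> {
    Ok(())
}
```

Errors from either direction surface in the same place, which is what makes error handling easier than with two separate tasks.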
Arseny Sher
7627d85345 Move async postgres_backend to its own crate.
To untie the cyclic dependency between the sync and async versions of
postgres_backend, copy QueryError and some logging/error routines to
postgres_backend.rs. This is temporary glue to make commits smaller; the sync
version will be dropped completely by the upcoming commit.
2023-03-09 20:45:56 +03:00
Arthur Petukhovsky
b23742e09c Create /v1/debug_dump safekeepers endpoint (#3710)
Add an HTTP endpoint to get the full safekeeper state of all existing timelines
(all in-memory values plus info about all files stored on disk).

Example:
https://gist.github.com/petuhovskiy/3cbb8f870401e9f486731d145161c286
2023-03-03 14:01:05 +03:00
Shany Pozin
d19c5248c9 Add UUID header to mgmt API (#3708)
Issue: #3479
2023-03-01 18:09:08 +02:00
Alexander Bayandin
000eb1b069 Bump tempfile from 3.3.0 to 3.4.0 (#3709)
Update `tempfile` crate to get rid of `remove_dir_all` dependency
Ref https://github.com/neondatabase/neon/security/dependabot/15
2023-02-27 12:44:08 +00:00
Sam Kleinman
c79dd8d458 compute_ctl: support for fetching spec from control plane (#3610) 2023-02-23 13:19:39 -05:00
Joonas Koivunen
b8b8c19fb4 fix: hold permit until GetObject eof (#3663)
Previously we applied the rate limiting only up to receiving the headers
from S3, or somewhere near it. This commit adds an adapter which carries
the permit until the AsyncRead has been dropped.

fixes #3662.
2023-02-21 21:14:08 +02:00
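A minimal sketch of such an adapter (assumed names): the permit lives next to the body stream, so it is released only when the whole download is dropped, not when the headers arrive:

```
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};
use tokio::sync::OwnedSemaphorePermit;

pub struct PermittedRead<R> {
    inner: R,
    _permit: OwnedSemaphorePermit, // released only when this struct drops
}

impl<R: AsyncRead + Unpin> AsyncRead for PermittedRead<R> {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        // delegate to the wrapped body stream; the permit rides along
        Pin::new(&mut self.get_mut().inner).poll_read(cx, buf)
    }
}
```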
Heikki Linnakangas
6f9af0aa8c [proxy] Enable OpenTelemetry tracing.
This commit sets up the OpenTelemetry tracing machinery and exporter, so that
the proxy's tracing spans can be exported as OpenTelemetry traces as well.

All outgoing HTTP requests will be traced. A separate (child)
span is created for each outgoing HTTP request, and the tracing
context is also propagated to the server in the HTTP headers.

If tracing is enabled in the control plane and compute node too, you
can now get an end-to-end distributed trace of what happens when a new
connection is established, starting from the handshake with the
client, creating the 'start_compute' operation in the control plane,
starting the compute node, all the way down to fetching the base
backup and the availability checks in compute_ctl.

Co-authored-by: Dmitry Ivanov <dima@neon.tech>
2023-02-17 15:32:14 +03:00
Heikki Linnakangas
ddbdcdddd7 Tenant size calculation: refactor, rewrite, and add SVG (#2817)
Refactor the tenant_size_model code. Segment now contains just the
minimum amount of information needed to calculate the size. Other
information that is useful for building up the segment tree, and for
display purposes, is now kept elsewhere. The code in 'main.rs' has a new
ScenarioBuilder struct for that.

Calculating which Segments are "needed" is now the responsibility of the
caller of tenant_size_model, not part of the calculation itself. So it's
up to the caller to make all the decisions with retention periods for
each branch.

The output of the sizing calculation is now a Vec of SizeResults, rather
than a tree. It uses a tree representation internally, when doing the
calculation, but it's not exposed to the caller anymore.

Refactor the way the recursive calculation is performed.

Rewrite the code in size.rs that builds the Segment model. Get rid of
the intermediate representation with Update structs. Build the Segments
directly, with some local HashMaps and Vecs to track branch points to
help with that.

retention_period is now an input to gather_inputs(), rather than an
output.

Update the pageserver HTTP API: rename the /size endpoint to /synthetic_size,
with the following parameters:
- /synthetic_size?inputs_only to get debug info;
- /synthetic_size?retention_period=0 to override the cutoff that is used to
calculate the size.
Pass the header -H "Accept: text/html" to get HTML output; otherwise JSON is
returned.

Update python tests and openapi spec.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-02-16 10:53:46 +02:00
Dmitry Ivanov
9657459d80 [proxy] Fix possible unsoundness in the websocket machinery (#3569)
This PR replaces the ill-advised `unsafe Sync` impl with a de-facto
standard way to solve the underlying problem.

TLDR:
- tokio::task::spawn requires future to be Send
- ∀t. (t : Sync) <=> (&t : Send)
- ∀t. (t : Send + !Sync) => (&t : !Send)
2023-02-10 12:45:38 +03:00
Christian Schwarz
175a577ad4 automatic layer eviction
This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.

Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold

The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.

This feature has no explicit killswitch since it's off by default.
2023-02-09 13:33:55 +01:00
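A sketch of the policy type this describes (variant and field names are assumptions based on the commit message):

```
use std::time::Duration;

pub enum EvictionPolicy {
    NoEviction, // the default
    LayerAccessThreshold {
        period: Duration,    // how often the policy is evaluated
        threshold: Duration, // evict layers last accessed longer ago than this
    },
}
```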
Christian Schwarz
58fa4f0eb7 maintain access stats for historic layers
This patch adds basic access statistics for historic layers
and exposes them in the management API's `LayerMapInfo`.

We record the accesses in the `{Delta,Image}Layer::load()` function
because it's the common path of
* page_service (`Timline::get_reconstruct_data()`)
* Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`)

The stats survive residence status changes, and record these as well.

When scraping the layer map endpoint to record its evolution over time,
one must account for stat resets because they are in-memory only and
will reset on pageserver restart.
Use the launch timestamp header added in #3527 to identify pageserver restarts.

This is PR https://github.com/neondatabase/neon/pull/3496
2023-02-06 17:01:38 +01:00
Christian Schwarz
87cd2bae77 introduce LaunchTimestamp to identify process restarts
This patch adds a LaunchTimestamp type to the `metrics` crate,
along with a `libmetric_` Prometheus metric.

The initial user is pageserver.
In addition to exposing the Prometheus metric, it also returns
the launch timestamp as a header in the API responses.

The motivation for this is that we plan to scrape the pageserver's
/v1/tenant/:tenant_id/timeline/:timeline_id/layer
HTTP endpoint over time. It will soon expose access metrics (#3496)
which reset upon process restart. We will use the pageserver's launch
ID to identify a restart between two scrape points.

However, there are other potential uses. For example, we could use
the Prometheus metric to annotate Grafana plots whenever the launch
timestamp changes.
2023-02-03 18:12:17 +01:00
bojanserafimov
ada933eb42 Pageserver read trace utils (#2795)
List, dump, and analyze read traces.
2023-02-02 15:33:40 -05:00
Dmitry Ivanov
ea0278cf27 [proxy] Implement compute node info cache (#3331)
This patch adds a timed LRU cache implementation and a compute node info cache on top of that.
Cache entries may expire on their own (default TTL = 5 min) or become invalid due to real-world events,
e.g. a compute node scale-to-zero event, so we add a connection retry loop with a wake-up call.

Solved problems:
- [x] Find a decent LRU implementation.
- [x] Implement timed LRU on top of that.
- [x] Cache results of `proxy_wake_compute` API call.
- [x] Don't invalidate newer cache entries for the same key.
- [x] Add cmdline configuration knobs (requires some refactoring).
- [x] Add failed connection estab metric.
- [x] Refactor auth backends to make things simpler (retries, cache
placement, etc).
- [x] Address review comments (add code comments + cleanup).
- [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute
(e.g. stalled cache entry).
- [x] Add high-level description for `TimedLru`.

TODOs (will be addressed later):
- [ ] Add cache metrics (hit, spurious hit, miss).
- [ ] Synchronize http requests across concurrent per-client tasks
(https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069).
- [ ] Cache results of `proxy_get_role_secret` API call.
2023-02-01 17:11:41 +03:00
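The "timed" part can be sketched minimally (a plain HashMap stands in for the LRU here; the real TimedLru also tracks recency and capacity, and these names are assumptions):

```
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

struct TimedCache<K, V> {
    ttl: Duration,
    map: HashMap<K, (V, Instant)>, // value plus its expiry deadline
}

impl<K: Eq + Hash, V> TimedCache<K, V> {
    fn get(&mut self, key: &K) -> Option<&V> {
        let expired = match self.map.get(key) {
            Some((_, deadline)) => *deadline <= Instant::now(),
            None => return None,
        };
        if expired {
            self.map.remove(key); // entry outlived its TTL: drop it
            return None;
        }
        self.map.get(key).map(|(v, _)| v)
    }

    fn put(&mut self, key: K, value: V) {
        self.map.insert(key, (value, Instant::now() + self.ttl));
    }
}
```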
Christian Schwarz
590695e845 improve query param parsing
- add parse_query_param()
- use Cow<> where possible
- move param parsing code to utils::http::request

This was originally PR https://github.com/neondatabase/neon/pull/3502
which targeted a different branch.

closes #3510
2023-02-01 14:11:12 +01:00
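A hedged sketch of what such a helper can look like (hypothetical signature, not necessarily the actual utils::http::request API): fetch one query parameter and parse it, distinguishing "absent" from "malformed":

```
use std::str::FromStr;

fn parse_query_param<T: FromStr>(query: &str, name: &str) -> Result<Option<T>, String> {
    for pair in query.split('&') {
        let mut it = pair.splitn(2, '=');
        if it.next() == Some(name) {
            let raw = it.next().unwrap_or("");
            return raw
                .parse::<T>()
                .map(Some)
                .map_err(|_| format!("malformed query param '{name}'"));
        }
    }
    Ok(None) // parameter absent
}
```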
Heikki Linnakangas
0c0e15b81d compute_ctl: Extract tracing context from incoming HTTP requests.
This allows tracing the handling of HTTP requests as part of the caller's
trace.
2023-01-26 15:20:03 +02:00
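The usual propagation pattern with the opentelemetry crates looks roughly like this (a sketch; treat the exact calls as an assumption about this commit):

```
use opentelemetry::global;
use opentelemetry_http::HeaderExtractor;
use tracing_opentelemetry::OpenTelemetrySpanExt;

// Pull the remote tracing context out of the incoming request's headers
// and attach it as the parent of the current span.
fn attach_parent(headers: &http::HeaderMap) {
    let parent =
        global::get_text_map_propagator(|prop| prop.extract(&HeaderExtractor(headers)));
    tracing::Span::current().set_parent(parent);
}
```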
Heikki Linnakangas
3e94fd5af3 Inherit OpenTelemetry context for compute startup from cloud console.
This allows fine-grained distributed tracing of the 'start_compute'
operation from the cloud console. The startup actions performed by
'compute_ctl' are now performed in a child of the 'start_compute'
context, so you can trace through the whole compute start operation.

This needs a corresponding change in the cloud console to fill in the
'startup_tracing_context' field in the json spec. If it's missing, the
startup operations are simply traced as a separate trace, without
a parent.
2023-01-26 15:20:03 +02:00
Heikki Linnakangas
006ee5f94a Configure 'compute_ctl' to use OpenTelemetry exporter.
This allows tracing the startup actions e.g. with Jaeger
(https://www.jaegertracing.io/). We use the "tracing-opentelemetry"
crate, which turns tracing spans into OpenTelemetry spans, so you can
use the usual "#[instrument]" directives to add tracing.

I put the tracing initialization code into a separate crate,
`tracing-utils`, so that we can reuse it in other programs. We
probably want to set up tracing in the same way in all our programs.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-01-26 15:20:03 +02:00
Kirill Bulatov
fd18692dfb Output coloured pageserver logs for tty stdout 2023-01-24 09:58:08 +02:00
bojanserafimov
a3d7ad2d52 Implement layer map using immutable BST (#2998) 2023-01-20 16:10:12 -05:00
Heikki Linnakangas
e5cc2f92c4 Switch to 'tracing' for logging, restructure code to make use of spans.
Refactors Compute::prepare_and_run. It's split into subroutines
differently, to make it easier to attach tracing spans to the
different stages. The high-level logic for waiting for Postgres to
exit is moved to the caller.

Replace 'env_logger' with 'tracing', and add `#[instrument]` directives
to the different stages of the startup process. This is a fairly
mechanical change, except for the changes in 'spec.rs'. 'spec.rs'
contained some complicated formatting, where parts of log messages
were printed directly to stdout with `print`s. That was a bit messed
up because the log normally goes to stderr, but those lines were
printed to stdout. In our docker images, stderr and stdout both go to
the same place so you wouldn't notice, but I don't think it was
intentional.

This changes the log format to the default
'tracing_subscriber::format' format. It's different from the Postgres
log format, however, and because both compute_tools and Postgres print
to the same log, it's now a mix of two different formats.  I'm not
sure how the Grafana log parsing pipeline can handle that. If it's a
problem, we can build a custom formatter to change the compute_tools log
format to be the same as Postgres's, like it was before this commit,
or we can change the Postgres log format to match tracing_formatter's,
or we can start printing compute_tool's log output to a different
destination than Postgres.
2023-01-18 19:42:47 +02:00
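For illustration, the kind of `#[instrument]` usage this enables (hypothetical stage name):

```
use tracing::{info, instrument};

// Each startup stage becomes its own span, so every log line emitted
// inside carries the stage's context.
#[instrument]
fn handle_roles(spec_name: &str) {
    info!("applying role changes from spec {spec_name}");
}
```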
Anastasia Lubennikova
2cbe84b78f Proxy metrics (#3290)
Implement proxy metrics collection.
Only collect metrics for outbound traffic.

Add proxy CLI parameters:
- metric-collection-endpoint
- metric-collection-interval.

Add test_proxy_metric_collection test.

Move shared consumption metrics code to libs/consumption_metrics.
Refactor the code.
2023-01-16 15:17:28 +00:00
Kirill Bulatov
bce4233d3a Rework Cargo.toml dependencies (#3322)
* Use workspace variables from cargo, coming with rustc
[1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22)

See
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table
and
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table
sections.

Now, all dependencies in all non-root `Cargo.toml` files are defined as 
```
clap.workspace = true
```

Sometimes, when extra features are needed, as
```
bytes = { workspace = true, features = ["serde"] }
```

With the actual declarations (with shared features and version
numbers/file paths/etc.) in the root Cargo.toml.
Features are additive:

https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace

* Uses the mechanism above to set a common edition (2021) and license across the
workspace

* Mechanically bumps a few dependencies

* Updates hakari format, as it suggested:
```
work/neon/neon kb/cargo-templated ❯ cargo hakari generate
info: no changes detected
info: new hakari format version available: 3 (current: 2)
(add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`)
```
2023-01-13 18:13:34 +02:00
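For reference, the root-side counterpart of the snippets above looks like this (illustrative versions):

```
# Root Cargo.toml: the one place where versions and shared features are
# declared. Members opt in with `clap.workspace = true` and inherit
# package fields with `edition.workspace = true`.
[workspace.package]
edition = "2021"
license = "Apache-2.0"

[workspace.dependencies]
clap = { version = "4.0", features = ["derive"] }
bytes = "1.3"
```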
Anastasia Lubennikova
5e3e0fbf6f remove unneeded Cargo.lock changes 2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
26f39c03f2 review code cleanup:
- handle errors in calculate_synthetic_size_worker; don't exit the bgworker if one tenant fails
- add cached_synthetic_tenant_size to cache values calculated by the bgworker
- code cleanup: remove unneeded info! messages, clean up comments
- handle collect_metrics_task() errors; don't exit the collect_metrics worker if one task fails
- add a unit test to cover the case when we have multiple branches at the same LSN
2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
ba0190e3e8 Handle errors in tenant_size_model code 2023-01-13 11:51:28 +02:00
Vadim Kharitonov
80d4afab0c Update tokio version (RUSTSEC-2023-0001) 2023-01-10 09:02:00 +01:00
Arthur Petukhovsky
debd134b15 Implement wss support in proxy (#3247)
This is a hacky implementation of a WebSocket server, embedded into our
postgres proxy. The server is used to allow https://github.com/neondatabase/serverless
to connect to our postgres from browsers and serverless JavaScript functions.

How it will work (general schema):
- browser opens a websocket connection to
`wss://ep-abc-xyz-123.xx-central-1.aws.neon.tech/`
- proxy accepts this connection and terminates TLS (https)
- inside encrypted tunnel (HTTPS), browser initiates plain
(non-encrypted) postgres connection
- proxy performs auth as in a usual plain pg connection and forwards the
connection to the compute

Related issue: #3225
2023-01-06 18:34:18 +03:00
Kirill Bulatov
10dae79c6d Tone down safekeeper and pageserver walreceiver errors (#3227)
Closes https://github.com/neondatabase/neon/issues/3114

Adds stronger typing to the errors that appear while handling protocol messages (`FeMessage`) and in postgres and walreceiver connections.

Socket IO errors are now better detected and logged at a lower level (INFO, DEBUG), without the backtraces they previously carried when they were wrapped in an anyhow context.
2023-01-03 20:42:04 +00:00
Heikki Linnakangas
e9583db73b Remove code and test to generate flamegraph on GetPage requests. (#3257)
It was nice to have and useful at the time, but unfortunately the method
used to gather the profiling data doesn't play nicely with 'async'. PR
#3228 will turn the 'get_page_at_lsn' function async, which will break the
profiling support. Let's remove it, and re-introduce some kind of
profiling later, using some different method, if we feel like we need it
again.
2023-01-03 20:11:32 +02:00
Heikki Linnakangas
0a0e55c3d0 Replace 'tar' crate with 'tokio-tar' (#3202)
The synchronous 'tar' crate has required us to use block_in_place and
SyncIoBridge to work together with the async I/O in the client
connection. Switch to 'tokio-tar' crate that uses async I/O natively.

As part of this, move the CopyDataWriter implementation to
postgres_backend_async.rs. Even though it's only used in one place
currently, it's in principle generally applicable whenever you want to
use COPY out.

Unfortunately we cannot use the 'tokio-tar' as it is: the Builder
implementation requires the writer to have 'static lifetime. So we
have to use a modified version without that requirement. The 'static
lifetime was required just for the Drop implementation that writes
the end-of-archive sections if the Builder is dropped without calling
`finish`. But we don't actually want that behavior anyway; in fact
we had to jump through some hoops with the AbortableWrite hack to skip
those. With the modified version of 'tokio-tar' without that Drop
implementation, we don't need AbortableWrite either.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-01-03 12:39:11 +02:00
Arseny Sher
41b8e67305 Fix 81afd7011 by enabling the reqwest feature for sentry.
That commit had disabled the transport altogether.
2023-01-02 15:29:33 +03:00
Heikki Linnakangas
81afd7011c Use rustls for everything.
I looked at "cargo tree" output and noticed that through various
dependencies, we are depending on both native-tls and rustls. We have
tried to standardize on rustls for everything, but dependencies on
native-tls have crept in recently. One such dependency came from
'reqwest' with default features in pageserver, used for
consumption_metrics. Another dependency was from 'sentry'. Both
'reqwest' and 'sentry' use native-tls by default, but can use 'rustls'
if compiled with the right feature flags.
2023-01-02 11:14:35 +02:00
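The fix amounts to feature selection in Cargo.toml, along these lines (illustrative versions and feature lists, not necessarily the exact ones used):

```
# Opt out of each crate's default native-tls and opt into rustls instead.
reqwest = { version = "0.11", default-features = false, features = ["rustls-tls", "json"] }
sentry = { version = "0.29", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest"] }
```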
Anastasia Lubennikova
4235f97c6a Implement consumption metrics collection.
Add new background job to collect billing metrics for each tenant and
send them to the HTTP endpoint.
Metrics are cached, so we don't send unchanged metrics.

Add metric collection config parameters:
metric_collection_endpoint (default None, i.e. disabled)
metric_collection_interval (default 60s)

Add test_metric_collection.py to test metric collection
and sending to the mocked HTTP endpoint.

Use the port distributor in the metric_collection test.

Review fixes: only update the cache after metrics were sent successfully; simplify code.

Disable metric collection if metric_collection_endpoint is not provided in the config.
2022-12-20 22:59:52 +02:00
Dmitry Ivanov
61194ab2f4 Update rust-postgres everywhere
I've rebased[1] Neon's fork of rust-postgres to incorporate the
latest upstream changes (including dependabot's fixes),
so we need to advance revs here as well.

[1] https://github.com/neondatabase/rust-postgres/commits/neon
2022-12-17 00:26:10 +03:00
Dmitry Ivanov
83baf49487 [proxy] Forward compute connection params to client
This fixes all kinds of problems related to missing params,
like broken timestamps (due to `integer_datetimes`).

This solution is not ideal, but it will help. Meanwhile,
I'm going to dedicate some time to improving connection machinery.

Note that this **does not** fix problems with passing certain parameters
in a reverse direction, i.e. **from client to compute**. This is a
separate matter and will be dealt with in an upcoming PR.
2022-12-16 21:37:50 +03:00
Arseny Sher
32662ff1c4 Replace etcd with storage_broker.
This is the replacement itself; the binary landed earlier. See
docs/storage_broker.md.

ref
https://github.com/neondatabase/neon/pull/2466
https://github.com/neondatabase/neon/issues/2394
2022-12-12 13:30:16 +03:00
Kliment Serafimov
8f2b3cbded Sentry integration for storage. (#2926)
Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes.
Currently there are three projects in Sentry, one for each process. The Sentry URL is passed to each process separately via CLI args.
2022-12-06 18:57:54 +00:00
Alexander Bayandin
61825dfb57 Update chrono to 0.4.23; use only clock feature from it 2022-12-06 15:45:58 +01:00
Andrés
ebb51f16e0 Re-introduce aws-sdk-rust as rusoto S3 replacement (#2841)
Part of https://github.com/neondatabase/neon/issues/2683
Initial PR: https://github.com/neondatabase/neon/pull/2802, revert: https://github.com/neondatabase/neon/pull/2837

Co-authored-by: andres <andres.rodriguez@outlook.es>
2022-11-30 17:47:32 +02:00
Heikki Linnakangas
9a6c0be823 storage_sync2
The code in this change was extracted from PR #2595, i.e., Heikki’s draft
PR for on-demand download.

High-Level Changes

- storage_sync module rewrite
- Changes to Tenant Loading
- Changes to Tenant States
- Changes to Timeline States
- Crash-safe & Resumable Tenant Attach

There are several follow-up work items planned.
Refer to the Epic issue on GitHub:
https://github.com/neondatabase/neon/issues/2029

Metadata:

closes https://github.com/neondatabase/neon/pull/2785

unsquashed history of this patch: archive/pr-2785-storage-sync2/pre-squash

Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>

===============================================================================

storage_sync module rewrite
===========================

The storage_sync code is rewritten. New module name is storage_sync2, mostly to
make a more reasonable git diff.

The updated block comment in storage_sync2.rs describes the changes quite well,
so, we will not reproduce that comment here. TL;DR:
- Global sync queue and RemoteIndex are replaced with per-timeline
  `RemoteTimelineClient` structure that contains a queue for UploadOperations
  to ensure proper ordering and necessary metadata.
- Before deleting local layer files, wait for ongoing UploadOps to finish
  (wait_completion()).
- Download operations are not queued; they are executed immediately.

Changes to Tenant Loading
=========================

The initial sync part was rewritten as well and represents the other major
change that serves as a foundation for on-demand downloads. Routines for
attaching and loading moved directly into the Tenant struct and are now
asynchronous and spawned into the background.

Since this patch doesn’t introduce on-demand download of layers we fully
synchronize with the remote during pageserver startup. See details in
`Timeline::reconcile_with_remote` and `Timeline::download_missing`.

Changes to Tenant States
========================

The “Active” state has lost its “background_jobs_running: bool” member. That
variable indicated whether the GC & Compaction background loops are spawned or
not. With this patch, they are now always spawned. Unit tests (#[test]) use the
TenantConf::{gc_period,compaction_period} to disable their effect (15db566).

This patch introduces a new tenant state, “Attaching”. A tenant that is being
attached starts in this state and transitions to “Active” once it finishes
download.

The `GET /tenant` endpoint returns `TenantInfo::has_in_progress_downloads`. We
derive the value for that field from the tenant state now, to remain
backwards-compatible with cloud.git. We will remove that field when we switch
to on-demand downloads.

Changes to Timeline States
==========================

The TimelineInfo::awaits_download field is now equivalent to the tenant being
in Attaching state.  Previously, download progress was tracked per timeline.
With this change, it’s only tracked per tenant. When on-demand downloads
arrive, the field will be completely obsolete. Deprecation is tracked in
issue #2930.

Crash-safe & Resumable Tenant Attach
====================================

Previously, the attach operation was not persistent. I.e., when tenant attach
was interrupted by a crash, the pageserver would not continue attaching after
pageserver restart. In fact, the half-finished tenant directory on disk would
simply be skipped by tenant_mgr because it lacked the metadata file (it’s
written last). This patch introduces an “attaching” marker file inside that is
present inside the tenant directory while the tenant is attaching. During
pageserver startup, tenant_mgr will resume attach if that file is present. If
not, it assumes that the local tenant state is consistent and tries to load the
tenant. If that fails, the tenant transitions into Broken state.
2022-11-29 18:55:20 +01:00
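A sketch of that startup decision (names are assumptions):

```
use std::path::Path;

fn startup(tenant_dir: &Path) {
    if tenant_dir.join("attaching").exists() {
        // crash-interrupted attach: resume downloading from remote storage
        resume_attach(tenant_dir);
    } else if !load_local_tenant(tenant_dir) {
        // local state looked complete but failed to load
        mark_broken(tenant_dir);
    }
}

fn resume_attach(_dir: &Path) {}
fn load_local_tenant(_dir: &Path) -> bool { true }
fn mark_broken(_dir: &Path) {}
```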
Egor Suvorov
b6989e8928 pageserver: make `wal_source_connstring: String` a `wal_source_connconf: PgConnectionConfig` 2022-11-24 14:02:23 +03:00