rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-18 21:50:37 +00:00

Author	SHA1	Message	Date
Kirill Bulatov	ebea298415	Update most of the dependencies to their latest versions (#4026 ) See https://github.com/neondatabase/neon/pull/3991 Brings the changes back with the right way to use new `toml_edit` to deserialize values and a unit test for this. All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated. 3 sets of crates remain unupdated: * `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13). * `opentelemetry` and `opentelemetry-` crates ``` error[E0308]: mismatched types --> libs/tracing-utils/src/http.rs:65:21 \| 65 \| span.set_parent(parent_ctx); \| ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context` \| \| \| arguments to this method are incorrect \| = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `opentelemetry_api` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8 \| 43 \| fn set_parent(&self, cx: Context); \| ^^^^^^^^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `tracing-utils` due to previous error warning: build failed, waiting for other jobs to finish... error: could not compile `tracing-utils` due to previous error ``` `tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need. similarly, `rustls`, `tokio-rustls`, `rustls-` and `tls-listener` crates have similar issue: ``` error[E0308]: mismatched types --> libs/postgres_backend/tests/simple_select.rs:112:78 \| 112 \| let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg); \| --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig` \| \| \| arguments to this function are incorrect \| = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types note: struct `ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1 \| 125 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1 \| 91 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `rustls` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12 \| 23 \| pub fn new(config: ClientConfig) -> Self { \| ^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `postgres_backend` due to previous error warning: build failed, waiting for other jobs to finish... ``` aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed. Other our tests passed, further investigation is worth to be done in https://github.com/neondatabase/neon/issues/4008	2023-04-14 18:28:54 +03:00
Christian Schwarz	8895f28dae	make evictions_low_residence_duration_metric_threshold per-tenant (#3949 ) Before this patch, if a tenant would override its eviction_policy setting to use a lower LayerAccessThreshold::threshold than the `evictions_low_residence_duration_metric_threshold`, the evictions done for that tenant would count towards the `evictions_with_low_residence_duration` metric. That metric is used to identify pre-mature evictions, commonly triggered by disk-usage-based eviction under disk pressure. We don't want that to happen for the legitimate evictions of the tenant that overrides its eviction_policy. So, this patch - moves the setting into TenantConf - adds test coverage - updates the staging & prod yamls Forward Compatibility: Software before this patch will ignore the new tenant conf field and use the global one instead. So we can roll back safely. Backward Compatibility: Parsing old configs with software as of this patch will fail in `PageServerConf::parse_and_validate` with error `unrecognized pageserver option 'evictions_low_residence_duration_metric_threshold'` if the option is still present in the global section. We deal with this by updating the configs in Ansible. fixes https://github.com/neondatabase/neon/issues/3940	2023-04-14 13:25:45 +03:00
Kirill Bulatov	f7995b3c70	Revert "Update most of the dependencies to their latest versions (#3991 )" (#4013 ) This reverts commit `a64044a7a9`. See https://neondb.slack.com/archives/C03H1K0PGKH/p1681306682795559	2023-04-12 14:51:59 +00:00
Kirill Bulatov	a64044a7a9	Update most of the dependencies to their latest versions (#3991 ) All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated. 3 sets of crates remain unupdated: * `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13). * `opentelemetry` and `opentelemetry-` crates ``` error[E0308]: mismatched types --> libs/tracing-utils/src/http.rs:65:21 \| 65 \| span.set_parent(parent_ctx); \| ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context` \| \| \| arguments to this method are incorrect \| = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `opentelemetry_api` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8 \| 43 \| fn set_parent(&self, cx: Context); \| ^^^^^^^^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `tracing-utils` due to previous error warning: build failed, waiting for other jobs to finish... error: could not compile `tracing-utils` due to previous error ``` `tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need. similarly, `rustls`, `tokio-rustls`, `rustls-` and `tls-listener` crates have similar issue: ``` error[E0308]: mismatched types --> libs/postgres_backend/tests/simple_select.rs:112:78 \| 112 \| let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg); \| --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig` \| \| \| arguments to this function are incorrect \| = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types note: struct `ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1 \| 125 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1 \| 91 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `rustls` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12 \| 23 \| pub fn new(config: ClientConfig) -> Self { \| ^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `postgres_backend` due to previous error warning: build failed, waiting for other jobs to finish... ``` aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed. Other our tests passed, further investigation is worth to be done in https://github.com/neondatabase/neon/issues/4008	2023-04-12 15:32:38 +03:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Kirill Bulatov	1300dc9239	Replace Python IT test with the Rust one	2023-03-29 00:08:30 +03:00
Joonas Koivunen	f14895b48e	eviction: avoid post-restart download by synthetic_size (#3871 ) As of #3867, we do artificial layer accesses to layers that will be needed after the next restart, but not until then because of caches. With this patch, we also do that for the accesses that the synthetic size calculation worker does if consumption metrics are enabled. The actual size calculation is not of importance, but we need to calculate all of the sizes, so we only call tenant::size::gather_inputs. Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-03-27 19:20:23 +02:00
Christian Schwarz	881356c417	add metrics to detect eviction-induced thrashing (#3837 ) This patch adds two metrics that will enable us to detect thrashing of layers, i.e., repetitions of `eviction, on-demand-download, eviction, ... ` for a given layer. The first metric counts all layer evictions per timeline. It requires no further explanation. The second metric counts the layer evictions where the layer was resident for less than a given threshold. We can alert on increments to the second metric. The first metric will serve as a baseline, and further, it's generally interesting, outside of thrashing. The second metric's threshold is configurable in PageServerConf and defaults to 24h. The threshold value is reproduced as a label in the metric because the counter's value is semantically tied to that threshold. Since changes to the config and hence the label value are infrequent, this will have low storage overhead in the metrics storage. The data source to determine the time that the layer was resident is the file's `mtime`. Using `mtime` is more of a crutch. It would be better if Pageserver did its own persistent bookkeeping of residence change events instead of relying on the filesystem. We had some discussion about this: https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900 My position is that `mtime` is good enough for now. It can theoretically jump forward if someone copies files without resetting `mtime`. But that shouldn't happen in practice. Note that moving files back and forth doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`, which is typically used for filesystem-level backup / restore, correctly syncs `mtime`. I've added a label that identifies the data source to keep options open for a future, better data source than `mtime`. Since this value will stay the same for the time being, it's not a problem for metrics storage. refs https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:11:36 +01:00
Arthur Petukhovsky	b067378d0d	Measure cross-AZ traffic in safekeepers (#3806 ) Create `safekeeper_pg_io_bytes_total` metric to track total amount of bytes written/read in a postgres connections to safekeepers. This metric has the following labels: - `client_az` – availability zone of the connection initiator, or `"unknown"` - `sk_az` – availability zone of the safekeeper, or `"unknown"` - `app_name` – `application_name` of the postgres client - `dir` – data direction, either `"read"` or `"write"` - `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from `client_az` and `sk_az`, exists purely for convenience. This is implemented by passing availability zone in the connection string, like this: `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`. Update ansible deployment scripts to add availability_zone argument to safekeeper and pageserver in systemd service files.	2023-03-16 17:24:01 +03:00
Heikki Linnakangas	10a5d36af8	Separate mgmt and libpq authentication configs in pageserver. (#3773 ) This makes it possible to enable authentication only for the mgmt HTTP API or the compute API. The HTTP API doesn't need to be directly accessible from compute nodes, and it can be secured through network policies. This also allows rolling out authentication in a piecemeal fashion.	2023-03-15 13:52:29 +02:00
Arseny Sher	0d8ced8534	Remove sync postgres_backend, tidy up its split usage. - Add support for splitting async postgres_backend into read and write halfes. Safekeeper needs this for bidirectional streams. To this end, encapsulate reading-writing postgres messages to framed.rs with split support without any additional changes (relying on BufRead for reading and BytesMut out buffer for writing). - Use async postgres_backend throughout safekeeper (and in proxy auth link part). - In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling. - Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL -- join split parts back catching errors from them before returning. Initially I hoped to do that read-write without split at all, through polling IO: https://github.com/neondatabase/neon/pull/3522 However that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>> which is Send construct just to satisfy the checker; 2) can be workaround with transmute. But this is so messy that I decided to leave split.	2023-03-09 20:45:56 +03:00
Heikki Linnakangas	fb1581d0b9	Fix setting "image_creation_threshold" setting in tenant config. (#3762 ) We have a few tests that try to set image_creation_threshold, but it didn't actually have any effect because we were missing some critical code to load the setting from config file into memory. The two modified tests in `test_remote_storage.py perform compaction and GC, and assert that GC removes some layers. That only happens if new image layers are created by the compaction. The tests explicitly disabled image layer creation by setting image_creation_threshold to a high value, but it didn't take effect because reading image_creation_threshold from config file was broken, which is why the test worked. Fix the test to set image_creation_threshold low, instead, so that GC has work to do. Change 'test_tenant_conf.py' so that it exercises the added code. This might explain why we're apparently missing test coverage for GC (issue #3415), although I didn't try to address that here, nor did I check if this improves the it.	2023-03-08 11:39:30 +02:00
Christian Schwarz	175a577ad4	automatic layer eviction This patch adds a per-timeline periodic task that executes an eviction policy. The eviction policy is configurable per tenant. Two policies exist: - NoEviction (the default one) - LayerAccessThreshold The LayerAccessThreshold policy examines the last access timestamp per layer in the layer map and evicts the layer if that last access is further in the past than a configurable threshold value. This policy kind is evaluated periodically at a configurable period. It logs a summary statistic at `info!()` or `warn!()` level, depending on whether any evictions failed. This feature has no explicit killswitch since it's off by default.	2023-02-09 13:33:55 +01:00
Anastasia Lubennikova	877a2d70e3	Periodically send cached consumption metrics (#3520 ) Add new pageserver config setting `cached_metric_collection_interval` with default `1 hour`. This setting controls how often unchanged cached consumption metrics are sent to the HTTP endpoint. This is a workaround for billing service limitations. fixes #3485	2023-02-06 17:53:10 +02:00
Christian Schwarz	01b4b0c2f3	Introduce RequestContext Motivation ========== Layer Eviction Needs Context ---------------------------- Before we start implementing layer eviction, we need to collect some access statistics per layer file or maybe even page. Part of these statistics should be the initiator of a page read request to answer the question of whether it was page_service vs. one of the background loops, and if the latter, which of them? Further, it would be nice to learn more about what activity in the pageserver initiated an on-demand download of a layer file. We will use this information to test out layer eviction policies. Read more about the current plan for layer eviction here: https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104 task_mgr problems + cancellation + tenant/timeline lifecycle ------------------------------------------------------------ Apart from layer eviction, we have long-standing problems with task_mgr, task cancellation, and various races around tenant / timeline lifecycle transitions. One approach to solve these is to abandon task_mgr in favor of a mechanism similar to Golang's context.Context, albeit extended to support waiting for completion, and specialized to the needs in the pageserver. Heikki solves all of the above at once in PR https://github.com/neondatabase/neon/pull/3228 , which is not yet merged at the time of writing. What Is This Patch About ======================== This patch addresses the immediate needs of layer eviction by introducing a `RequestContext` structure that is plumbed through the pageserver - all the way from the various entrypoints (page_service, management API, tenant background loops) down to Timeline::{get,get_reconstruct_data}. The struct carries a description of the kind of activity that initiated the call. We re-use task_mgr::TaskKind for this. Also, it carries the desired on-demand download behavior of the entrypoint. Timeline::get_reconstruct_data can then log the TaskKind that initiated the on-demand download. I developed this patch by git-checking-out Heikki's big RequestContext PR https://github.com/neondatabase/neon/pull/3228 , then deleting all the functionality that we do not need to address the needs for layer eviction. After that, I added a few things on top: 1. The concept of attached_child and detached_child in preparation for cancellation signalling through RequestContext, which will be added in a future patch. 2. A kill switch to turn DownloadBehavior::Error into a warning. 3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and added an additional TaskKind WalReceiverConnectionHandler.These were necessary to create proper detached_child-type RequestContexts for the various tasks that walreceiver starts. How To Review This Patch ======================== Start your review with the module-level comment in context.rs. It explains the idea of RequestContext, what parts of it are implemented in this patch, and the future plans for RequestContext. Then review the various `task_mgr::spawn` call sites. At each of them, we should be creating a new detached_child RequestContext. Then review the (few) RequestContext::attached_child call sites and ensure that the spawned tasks do not outlive the task that spawns them. If they do, these call sites should use detached_child() instead. Then review the todo_child() call sites and judge whether it's worth the trouble of plumbing through a parent context from the caller(s). Lastly, go through the bulk of mechanical changes that simply forwards the &ctx.	2023-01-25 14:53:30 +01:00
Christian Schwarz	8ba1699937	Revert "Use actual temporary dir for pageserver unit tests" This reverts commit `826e89b9ce`. The problem with that commit was that it deletes the TempDir while there are still EphemeralFile instances open. At first I thought this could be fixed by simply adding Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None)) to TenantHarness::drop, but it turned out to be insufficient. So, reverting the commit until we find a proper solution. refs https://github.com/neondatabase/neon/issues/3385	2023-01-19 20:16:56 +01:00
Kirill Bulatov	826e89b9ce	Use actual temporary dir for pageserver unit tests	2023-01-18 17:43:27 +02:00
Anastasia Lubennikova	0675859bb0	Add background worker that periodically spawns synthetic size calculation. Add new pageserver config param calculate_synthetic_size_interval	2023-01-13 11:51:28 +02:00
Heikki Linnakangas	e9583db73b	Remove code and test to generate flamegraph on GetPage requests. (#3257 ) It was nice to have and useful at the time, but unfortunately the method used to gather the profiling data doesn't play nicely with 'async'. PR #3228 will turn 'get_page_at_lsn' function async, which will break the profiling support. Let's remove it, and re-introduce some kind of profiling later, using some different method, if we feel like we need it again.	2023-01-03 20:11:32 +02:00
Shany Pozin	0c7b02ebc3	Move tenant related files to tenant directory (#3214 ) Related to https://github.com/neondatabase/neon/issues/3208	2022-12-28 09:20:01 +02:00
Anastasia Lubennikova	63eb87bde3	Set default metric_collection_interval to 10 min, which is more reasonable for real usage	2022-12-22 16:24:09 +02:00
Kirill Bulatov	fca25edae8	Fix 1.66 Clippy warnings (#3178 ) 1.66 release speeds up compile times for over 10% according to tests. Also its Clippy finds plenty of old nits in our code: * useless conversion, `foo as u8` where `foo: u8` and similar, removed `as u8` and similar * useless references and dereferenced (that were automatically adjusted by the compiler), removed various `&` and `` bool -> u8 conversion via `if/else`, changed to `u8::from` * Map `.iter()` calls where only values were used, changed to `.values()` instead Standing out lints: * `Eq` is missing in our protoc generated structs. Silenced, does not seem crucial for us. * `fn default` looks like the one from `Default` trait, so I've implemented that instead and replaced the `dummy_` method in tests with `::default()` invocation Clippy detected that ``` if retry_attempt < u32::MAX { retry_attempt += 1; } ``` is a saturating add and proposed to replace it.	2022-12-22 14:27:48 +02:00
Anastasia Lubennikova	4235f97c6a	Implement consumption metrics collection. Add new background job to collect billing metrics for each tenant and send them to the HTTP endpoint. Metrics are cached, so we don't send non-changed metrics. Add metric collection config parameters: metric_collection_endpoint (default None, i.e. disabled) metric_collection_interval (default 60s) Add test_metric_collection.py to test metric collection and sending to the mocked HTTP endpoint. Use port distributor in metric_collection test review fixes: only update cache after metrics were send successfully, simplify code disable metric collection if metric_collection_endpoint is not provided in config	2022-12-20 22:59:52 +02:00
Heikki Linnakangas	8e2edfcf39	Retry remote downloads. Remote operations fail sometimes due to network failures or other external reasons. Add retry logic to all the remote downloads, so that a transient failure at pageserver startup or tenant attach doesn't cause the whole tenant to be marked as Broken. Like in the uploads retry logic, we print the failure to the log as a WARNing after three retries, but keep retrying. We will retry up to 10 times now, before returning the error to the caller. To test the retries, I created a new RemoteStorage wrapper that simulates failures, by returning an error for the first N times that a remote operation is performed. It can be enabled by setting a new "test_remote_failures" option in the pageserver config file. Fixes #3112	2022-12-20 14:27:24 +02:00
Arseny Sher	e14bbb889a	Enable broker client keepalives. (#3127 ) Should fix stale connections. ref https://github.com/neondatabase/neon/issues/3108	2022-12-16 11:55:12 +02:00
Kirill Bulatov	02c1c351dc	Create initial timeline without remote storage (#3077 ) Removes the race during pageserver initial timeline creation that lead to partial layer uploads. This race is only reproducible in test code, we do not create initial timelines in cloud (yet, at least), but still nice to remove the non-deterministic behavior.	2022-12-13 15:42:59 +02:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Kirill Bulatov	b50e0793cf	Rework remote_storage interface (#2993 ) Changes: * Remove `RemoteObjectId` concept from remote_storage. Operate directly on /-separated names instead. These names are now represented by struct `RemotePath` which was renamed from struct `RelativePath` * Require remote storage to operate on relative paths for its contents, thus simplifying the way to derive them in pageserver and safekeeper * Make `IndexPart` to use `String` instead of `RelativePath` for its entries, since those are just the layer names	2022-12-07 23:11:02 +02:00
Kirill Bulatov	09393279c6	Fix tenant config parsing	2022-12-06 23:52:16 +02:00
Kliment Serafimov	8f2b3cbded	Sentry integration for storage. (#2926 ) Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes. Currently in sentry there are three projects, one for each process. Sentry url is sent to all three processes separately via cli args.	2022-12-06 18:57:54 +00:00
Kirill Bulatov	d6bfe955c6	Add commands to unload and load the tenant in memory (#2977 ) Closes https://github.com/neondatabase/neon/issues/2537 Follow-up of https://github.com/neondatabase/neon/pull/2950 With the new model that prevents attaching without the remote storage, it has started to be even more odd to add attach-with-files functionality (in addition to the issues raised previously). Adds two separate commands: * `POST {tenant_id}/ignore` that places a mark file to skip such tenant on every start and removes it from memory * `POST {tenant_id}/schedule_load` that tries to load a tenant from local FS similar to what pageserver does now on startup, but without directory removals	2022-12-06 15:30:02 +00:00
Heikki Linnakangas	9a6c0be823	storage_sync2 The code in this change was extracted from PR #2595, i.e., Heikki’s draft PR for on-demand download. High-Level Changes - storage_sync module rewrite - Changes to Tenant Loading - Changes to Timeline States - Crash-safe & Resumable Tenant Attach There are several follow-up work items planned. Refer to the Epic issue on GitHub: https://github.com/neondatabase/neon/issues/2029 Metadata: closes https://github.com/neondatabase/neon/pull/2785 unsquashed history of this patch: archive/pr-2785-storage-sync2/pre-squash Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> =============================================================================== storage_sync module rewrite =========================== The storage_sync code is rewritten. New module name is storage_sync2, mostly to make a more reasonable git diff. The updated block comment in storage_sync2.rs describes the changes quite well, so, we will not reproduce that comment here. TL;DR: - Global sync queue and RemoteIndex are replaced with per-timeline `RemoteTimelineClient` structure that contains a queue for UploadOperations to ensure proper ordering and necessary metadata. - Before deleting local layer files, wait for ongoing UploadOps to finish (wait_completion()). - Download operations are not queued and executed immediately. Changes to Tenant Loading ========================= Initial sync part was rewritten as well and represents the other major change that serves as a foundation for on-demand downloads. Routines for attaching and loading shifted directly to Tenant struct and now are asynchronous and spawned into the background. Since this patch doesn’t introduce on-demand download of layers we fully synchronize with the remote during pageserver startup. See details in `Timeline::reconcile_with_remote` and `Timeline::download_missing`. Changes to Tenant States ======================== The “Active” state has lost its “background_jobs_running: bool” member. That variable indicated whether the GC & Compaction background loops are spawned or not. With this patch, they are now always spawned. Unit tests (#[test]) use the TenantConf::{gc_period,compaction_period} to disable their effect (`15db566`). This patch introduces a new tenant state, “Attaching”. A tenant that is being attached starts in this state and transitions to “Active” once it finishes download. The `GET /tenant` endpoints returns `TenantInfo::has_in_progress_downloads`. We derive the value for that field from the tenant state now, to remain backwards-compatible with cloud.git. We will remove that field when we switch to on-demand downloads. Changes to Timeline States ========================== The TimelineInfo::awaits_download field is now equivalent to the tenant being in Attaching state. Previously, download progress was tracked per timeline. With this change, it’s only tracked per tenant. When on-demand downloads arrive, the field will be completely obsolete. Deprecation is tracked in isuse #2930. Crash-safe & Resumable Tenant Attach ==================================== Previously, the attach operation was not persistent. I.e., when tenant attach was interrupted by a crash, the pageserver would not continue attaching after pageserver restart. In fact, the half-finished tenant directory on disk would simply be skipped by tenant_mgr because it lacked the metadata file (it’s written last). This patch introduces an “attaching” marker file inside that is present inside the tenant directory while the tenant is attaching. During pageserver startup, tenant_mgr will resume attach if that file is present. If not, it assumes that the local tenant state is consistent and tries to load the tenant. If that fails, the tenant transitions into Broken state.	2022-11-29 18:55:20 +01:00
Egor Suvorov	ae53dc3326	Add authentication between Safekeeper and Pageserver/Compute * Fix https://github.com/neondatabase/neon/issues/1854 * Never log Safekeeper::conninfo in walproposer as it now contains a secret token * control_panel, test_runner: generate and pass JWT tokens for Safekeeper to compute and pageserver * Compute: load JWT token for Safekepeer from the environment variable. Do not reuse the token from pageserver_connstring because it's embedded in there weirdly. * Pageserver: load JWT token for Safekeeper from the environment variable. * Rewrite docs/authentication.md	2022-11-25 04:17:42 +03:00
Heikki Linnakangas	99d9c23df5	Gather path-related consts and functions to one place. Feels more organized this way.	2022-11-24 12:26:15 +02:00
Joonas Koivunen	1d105727cb	perf: simple walredo bench (#2816 ) adds a simple walredo bench to allow some comparison of the walredo throughput. Cc: #1339, #2778	2022-11-16 11:13:56 +02:00
bojanserafimov	7fd88fab59	Trace read requests (#2762 )	2022-11-10 16:43:04 -05:00
Joonas Koivunen	cf68963b18	Add initial tenant sizing model and a http route to query it (#2714 ) Tenant size information is gathered by using existing parts of `Tenant::gc_iteration` which are now separated as `Tenant::refresh_gc_info`. `Tenant::refresh_gc_info` collects branch points, and invokes `Timeline::update_gc_info`; nothing was supposed to be changed there. The gathered branch points (through Timeline's `GcInfo::retain_lsns`), `GcInfo::horizon_cutoff`, and `GcInfo::pitr_cutoff` are used to build up a Vec of updates fed into the `libs/tenant_size_model` to calculate the history size. The gathered information is now exposed using `GET /v1/tenant/{tenant_id}/size`, which which will respond with the actual calculated size. Initially the idea was to have this delivered as tenant background task and exported via metric, but it might be too computationally expensive to run it periodically as we don't yet know if the returned values are any good. Adds one new metric: - pageserver_storage_operations_seconds with label `logical_size` - separating from original `init_logical_size` Adds a pageserver wide configuration variable: - `concurrent_tenant_size_logical_size_queries` with default 1 This leaves a lot of TODO's, tracked on issue #2748.	2022-11-03 12:39:19 +00:00
Lassi Pölönen	321aeac3d4	Json logging capability (#2624 ) * Support configuring the log format as json or plain. Separately test json and plain logger. They would be competing on the same global subscriber otherwise. * Implement log_format for pageserver config * Implement configurable log format for safekeeper.	2022-10-21 17:30:20 +00:00
Anastasia Lubennikova	52e75fead9	Use anyhow::Result explicitly	2022-10-21 12:47:06 +03:00
Anastasia Lubennikova	a347d2b6ac	#2616 handle 'Unsupported pg_version' error properly	2022-10-21 12:47:06 +03:00
Kirill Bulatov	306a47c4fa	Use uninit mark files during timeline init for atomic creation (#2489 ) Part of https://github.com/neondatabase/neon/pull/2239 Regular, from scratch, timeline creation involves initdb to be run in a separate directory, data from this directory to be imported into pageserver and, finally, timeline-related background tasks to start. This PR ensures we don't leave behind any directories that are not marked as temporary and that pageserver removes such directories on restart, allowing timeline creation to be retried with the same IDs, if needed. It would be good to later rewrite the logic to use a temporary directory, similar what tenant creation does. Yet currently it's harder than this change, so not done.	2022-10-20 14:19:17 +03:00
sharnoff	580584c8fc	Remove control_plane deps on pageserver/safekeeper (#2513 ) Creates new `pageserver_api` and `safekeeper_api` crates to serve as the shared dependencies. Should reduce both recompile times and cold compile times. Decreases the size of the optimized `neon_local` binary: 380M -> 179M. No significant changes for anything else (mostly as expected).	2022-10-04 11:14:45 -07:00
Anastasia Lubennikova	8d890b3cbb	fix clippy warnings	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	5dddeb8d88	Use non-versioned pg_distrib dir	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	86bf491981	Support pg 15 - Split postgres_ffi into two version specific files. - Preserve pg_version in timeline metadata. - Use pg_version in safekeeper code. Check for postgres major version mismatch. - Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding. - Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture. To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable: 'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress' Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.	2022-09-22 14:15:13 +03:00
Kirill Bulatov	310c507303	Merge path retrieval methods in config.rs	2022-09-20 23:43:52 +03:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Kirill Bulatov	1a8c8b04d7	Merge Repository and Tenant entities, rework tenant background jobs	2022-09-13 15:39:39 +03:00
Anastasia Lubennikova	05e263d0d3	Prepare pg 15 support (build system and submodules) (#2337 ) * Add submodule postgres-15 * Support pg_15 in pgxn/neon * Renamed zenith -> neon in Makefile * fix name of codestyle check * Refactor build system to prepare for building multiple Postgres versions. Rename "vendor/postgres" to "vendor/postgres-v14" Change Postgres build and install directory paths to be version-specific: - tmp_install/build -> pg_install/build/14 - tmp_install/* -> pg_install/14/* And Makefile targets: - "make postgres" -> "make postgres-v14" - "make postgres-headers" -> "make postgres-v14-headers" - etc. Add Makefile aliases: - "make postgres" to build "postgres-v14" and in future, "postgres-v15" - similarly for "make postgres-headers" Fix POSTGRES_DISTRIB_DIR path in pytest scripts * Make postgres version a variable in codestyle workflow * Support vendor/postgres-v15 in codestyle check workflow * Support postgres-v15 building in Makefile * fix pg version in Dockerfile.compute-node * fix kaniko path * Build neon extensions in version-specific directories * fix obsolete mentions of vendor/postgres * use vendor/postgres-v14 in Dockerfile.compute-node.legacy * Use PG_VERSION_NUM to gate dependencies in inmem_smgr.c * Use versioned ECR repositories and image names for compute-node. The image name format is compute-node-vXX, where XX is postgres major version number. For now only v14 is supported. Old format unversioned name (compute-node) is left, because cloud repo depends on it. * update vendor/postgres submodule url (zenith->neondatabase rename) * Fix postgres path in python tests after rebase * fix path in regress test * Use separate dockerfiles to build compute-node: Dockerfile.compute-node-v15 should be identical to Dockerfile.compute-node-v14 except for the version number. This is a hack, because Kaniko doesn't support build ARGs properly * bump vendor/postgres-v14 and vendor/postgres-v15 * Don't use Kaniko cache for v14 and v15 compute-node images * Build compute-node images for different versions in different jobs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2022-09-05 18:30:54 +03:00
Arseny Sher	e593cbaaba	Add pageserver checkpoint_timeout option. To flush inmemory layer eventually when no new data arrives, which helps safekeepers to suspend activity (stop pushing to the broker). Default 10m should be ok.	2022-08-11 22:54:09 +03:00

1 2

77 Commits