rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 21:20:37 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	0cf7fd0fb8	Compaction with on-demand download (#3598) Repeatedly (twice) try to download the compaction targeted layers before actual compaction. Adds tests for both L0 compaction downloading layers and image creation downloading layers. Image creation support existed already. Fixes #3591 Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-02-16 15:36:13 +02:00
Kirill Bulatov	f0b41e7750	Propose less verbose way to build neon (#3624) Closes https://github.com/neondatabase/neon/issues/3518 and might help https://github.com/neondatabase/neon/issues/3611 and the future build attempts. Propose `-s` flag in the Readme when building via `make` command, to help people spot build errors more easily.	2023-02-16 14:25:35 +02:00
Vadim Kharitonov	5082d84f5b	Compile pgjwt extension	2023-02-16 10:54:22 +01:00
Anastasia Lubennikova	7991bd3b69	Fix periodic metric sending: don't reset timer on every iteration (#3617) Previously the timer was reset on every collect_metrics_iteration, so sending of cached metrics was never triggered. This is a follow-up for `a69da4a7`.	2023-02-16 10:56:42 +02:00
Heikki Linnakangas	ddbdcdddd7	Tenant size calculation: refactor, rewrite, and add SVG (#2817) Refactor the tenant_size_model code. Segment now contains just the minimum amount of information needed to calculate the size. Other information that is useful for building up the segment tree, and for display purposes, is now kept elsewhere. The code in 'main.rs' has a new ScenarioBuilder struct for that. Calculating which Segments are "needed" is now the responsibility of the caller of tenant_size_model, not part of the calculation itself. So it's up to the caller to make all the decisions with retention periods for each branch. The output of the sizing calculation is now a Vec of SizeResults, rather than a tree. It uses a tree representation internally, when doing the calculation, but it's not exposed to the caller anymore. Refactor the way the recursive calculation is performed. Rewrite the code in size.rs that builds the Segment model. Get rid of the intermediate representation with Update structs. Build the Segments directly, with some local HashMaps and Vecs to track branch points to help with that. retention_period is now an input to gather_inputs(), rather than an output. Update pageserver http API: rename /size endpoint to /synthetic_size with following parameters: - /synthetic_size?inputs_only to get debug info; - /synthetic_size?retention_period=0 to override cutoff that is used to calculate the size; pass header -H "Accept: text/html" to get HTML output, otherwise JSON is returned Update python tests and openapi spec. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-02-16 10:53:46 +02:00
Shany Pozin	7b182e2605	Update settings.md with latest PITR and gc period values (#3618) Updates PITR and GC_PERIOD default value doc	2023-02-16 10:33:04 +02:00
Dmitry Ivanov	1d9d7c02db	[proxy] Don't forward empty `options` to compute nodes Clients may specify endpoint/project name via `options=project=...`, so we should not only remove `project=` from `options` but also drop `options` entirely, because connection pools don't support it. Discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1676464382670119	2023-02-15 22:05:03 +03:00
Anna Stepanyan	a974602f9f	fix the logical size term definition (#3609) a size of a database cannot be a sum of the sizes of all databases indicating that a logical size is calculated for a branch ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] i checked the suggested changes - [x] this is not a core feature - [x] this is just a docs update, does not require analytics - [x] this PR does not require a public announcement	2023-02-15 15:11:06 +01:00
Anastasia Lubennikova	a839860c2e	Add debug messages around sending cached metrics	2023-02-15 16:02:02 +02:00
Anastasia Lubennikova	a5ce2b5330	Add debug messages around timeline.get_current_logical_size	2023-02-15 16:02:02 +02:00
Dmitry Ivanov	3569c1bacd	[proxy] Fix: don't cache user & dbname in node info cache Upstream proxy erroneously stores user & dbname in compute node info cache entries, thus causing "funny" connection problems if such an entry is reused while connecting to e.g. a different DB on the same compute node. This PR fixes the problem but doesn't eliminate the root cause just yet. I'll revisit this code and make it more type-safe in the upcoming PR.	2023-02-14 17:54:01 +03:00
Vadim Kharitonov	86681b92aa	Enable plls and plcoffee extensions	2023-02-14 13:33:27 +01:00
Sergey Melnikov	eb21d9969d	Add pageserver-3.us-west-2.aws.neon.tech (#3603)	2023-02-14 12:56:03 +01:00
Joonas Koivunen	e6618f1cc0	Update current logical size gauge (#3592) Alternative to #3586. Introduces usage of current_logical_size.current_size as a boundary after which we start to update the metric gauge on ingested wal. Previously any incremented value (ingested wal) would have updated the gauge, but this would have left the metric at zero for timelines which never receive any wal even if size had been calculated. Now the gauge is updated right away as the calculation completes, not requiring any wal to be received.	2023-02-14 13:17:34 +02:00
Dmitry Ivanov	eaff14da5f	[proxy] Restore INFO as the default tracing level Also move tracing init to its own function.	2023-02-13 17:09:43 +03:00
Arthur Petukhovsky	f383b4d540	Enable TCP_NODELAY for wss connections	2023-02-10 21:40:28 +03:00
Dmitry Ivanov	694150ce40	[proxy] Respect the magic `RUST_LOG` env variable Usage: `RUST_LOG=trace proxy ...`	2023-02-10 18:49:32 +03:00
Vadim Kharitonov	f4359b688c	Backport `cargo fmt` diff from `release` branch into `main`	2023-02-10 14:20:55 +01:00
Vadim Kharitonov	948f047f0a	Compile pgvector extension	2023-02-10 13:37:52 +01:00
Vadim Kharitonov	4175cfbdac	Create folder for file_cache	2023-02-10 13:03:12 +01:00
Dmitry Ivanov	9657459d80	[proxy] Fix possible unsoundness in the websocket machinery (#3569) This PR replaces the ill-advised `unsafe Sync` impl with a de-facto standard way to solve the underlying problem. TLDR: - tokio::task::spawn requires future to be Send - ∀t. (t : Sync) <=> (&t : Send) - ∀t. (t : Send + !Sync) => (&t : !Send)	2023-02-10 12:45:38 +03:00
Christian Schwarz	a4256b3250	allow on-demand downloads in walreceiver connection handler Without this patch, basebackup fails if we evict all layers before that. This slipped in as part of commit `01b4b0c2f3` Author: Christian Schwarz <christian@neon.tech> Date: Fri Jan 13 17:02:22 2023 +0100 Introduce RequestContext	2023-02-09 13:39:04 +01:00
Christian Schwarz	175a577ad4	automatic layer eviction This patch adds a per-timeline periodic task that executes an eviction policy. The eviction policy is configurable per tenant. Two policies exist: - NoEviction (the default one) - LayerAccessThreshold The LayerAccessThreshold policy examines the last access timestamp per layer in the layer map and evicts the layer if that last access is further in the past than a configurable threshold value. This policy kind is evaluated periodically at a configurable period. It logs a summary statistic at `info!()` or `warn!()` level, depending on whether any evictions failed. This feature has no explicit killswitch since it's off by default.	2023-02-09 13:33:55 +01:00
Joonas Koivunen	1fdf01e3bc	fix: readable Debug for Layers (#3575) #3536 added the custom Debug implementations, but using derived Debug on Key led to overly verbose output. Instead of making `Key`'s `Debug` unconditionally or conditionally do the `Display` variant (for table space'd keys), opted to build a newtype to provide `Debug` for `Range<Key>` via `Display` which seemed to work unconditionally. Also orders Key to have: 1. comment, 2. derive, 3. `struct Key`.	2023-02-09 13:55:37 +02:00
Christian Schwarz	446a39e969	make LayerAccesStatFullDetails Copy Method to_api_model renamed to as_api_model because of Clippy complaint: https://rust-lang.github.io/rust-clippy/master/index.html#wrong_self_convention	2023-02-09 12:35:45 +01:00
Arthur Petukhovsky	7ed9eb4a56	Add script for safekeeper tenants cleanup (#3452) This script can be used to remove tenant directories on safekeepers for projects which no longer exist (deleted in the console). To run this script you need to upload it to safekeeper (e.g. with SSH), and run it with python3. Ansible can be used to run this script on multiple safekeepers. Fixes https://github.com/neondatabase/cloud/issues/3356	2023-02-09 13:28:20 +02:00
Joonas Koivunen	f07d6433b6	fix: one leftover Arc::ptr_eq (#3573) @knizhnik noticed that one instance of `Arc::<dyn PersistentLayer>::ptr_eq` was missed in #3558. Now all `ptr_eq` which remain are in comments.	2023-02-09 13:02:07 +02:00
Heikki Linnakangas	2040db98ef	Add docs for synthetic size calculation (#3328) --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-02-09 11:20:10 +02:00
dependabot[bot]	371493ae32	Bump cryptography from 38.0.3 to 39.0.1 (#3565)	2023-02-08 16:08:01 +00:00
Rory de Zoete	1b9e5e84aa	Add new storage hosts for placement group test (#3561) To test the placement group setup	2023-02-08 16:48:29 +01:00
Christian Schwarz	7ed93fff06	refactor: allow for eviction of layers in a batch The auto-eviction PR (#3552) operates in two phases: 1. find candidate layers 2. evict them. For (2), a batch API like the one added in this commit is useful. Note that this PR requires #3558 to be merged first. Otherwise, the tests won't pass.	2023-02-08 14:40:47 +01:00
Joonas Koivunen	a6dffb6ef9	fix: stop using Arc::ptr_eq with dyn Trait (#3558) This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's `LayerMap` not to use `Arc::ptr_eq` which has been witnessed in development of #3557 to yield wrong results. It gives wrong results because it compares fat pointers, which are `(object, vtable)` tuples for `dyn Trait` and there are no guarantees that the `vtable`s are unique. As in there were multiple vtables for `RemoteLayer` which is why the comparison failed in #3557. This is a known issue in rust, clippy warns against it and rust std might be moving to the solution which has been reproduced on this PR: compare only object pointers by "casting out" the vtable pointer.	2023-02-08 12:25:25 +00:00
Sergey Melnikov	c5c14368e3	Fix deploy-prod.yml syntax (#3556)	2023-02-07 15:27:31 +01:00
Sergey Melnikov	1254dc7ee2	Fix production deploy: run as root to access docker (#3555)	2023-02-07 15:21:15 +01:00
Joonas Koivunen	fcb905f519	Use LayerMap::replace in eviction (#3544) Follow-up to #3536, to actually use the new `Debug` in replacing the layers, and use replacement with manual eviction endpoint. Turns out the two paths share a lot of handling of `Replacement` but didn't unify the two (need 3). There are also upcoming refactorings from other PRs to this.	2023-02-07 11:08:55 +02:00
Christian Schwarz	58fa4f0eb7	maintain access stats for historic layers This patch adds basic access statistics for historic layers and exposes them in the management API's `LayerMapInfo`. We record the accesses in the `{Delta,Image}Layer::load()` function because it's the common path of * page_service (`Timline::get_reconstruct_data()`) * Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`) The stats survive residence status changes, and record these as well. When scraping the layer map endpoint to record its evolution over time, one must account for stat resets because they are in-memory only and will reset on pageserver restart. Use the launch timestamp header added by (#3527) to identify pageserver restarts. This is PR https://github.com/neondatabase/neon/pull/3496	2023-02-06 17:01:38 +01:00
Anastasia Lubennikova	877a2d70e3	Periodically send cached consumption metrics (#3520) Add new pageserver config setting `cached_metric_collection_interval` with default `1 hour`. This setting controls how often unchanged cached consumption metrics are sent to the HTTP endpoint. This is a workaround for billing service limitations. fixes #3485	2023-02-06 17:53:10 +02:00
Sergey Melnikov	959f5c6f40	Do not deploy legacy scram proxy (*.cloud.neon.tech) to the old account (#3546) We have migrated to the new proxy, which was set up in https://github.com/neondatabase/neon/pull/3461	2023-02-06 15:51:20 +01:00
Joonas Koivunen	678fe0684f	std::fmt::Debug for Layer implementations (#3536) Follow-up to #3513. This removes the old blanket `std::fmt::Debug` impl on `dyn Layer` which did not seem to be used from anywhere (no compilation errors after removing). Adds `std::fmt::Debug` requirement and implementations for `trait Layer` implementors: - LayerDescriptor (derived) - RemoteLayer (manual) - DeltaLayer (manual) - ImageLayer (manual) Manual implementations are used to skip PageserverConf, tenant and timeline ids, large collections. Adds and adjusts some doc comments to be more rustdoc alike.	2023-02-06 14:21:51 +02:00
Shany Pozin	c9821f13e0	Expose the tenant calculated synthetic size as a Prometheus metric (#3541) ## Describe your changes Expose the currently calculated synthetic size as a Prometheus metric ## Issue ticket number and link #3509 ## Checklist before requesting a review - [X] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-02-06 09:25:15 +02:00
Heikki Linnakangas	121d535068	Add a test for the PostgresNIO Swift client library. It supports SNI. See discussion at https://community.neon.tech/t/postgresnio-swift-sni-support/419/4	2023-02-04 15:56:11 +01:00
Kirill Bulatov	ec3a3aed37	Dump current tenant config (#3534) The PR adds an endpoint to show tenant's current config: `GET /v1/tenant/:tenant_id/config` Tenant's config consists of two parts: tenant overrides (could be changed via other management API requests) and the default part, substituting all missing overrides (constant, hardcoded in pageserver). The API returns the custom overrides and the final tenant config, after applying all the defaults. Along the way, it had to fix two things in the config: * allow shortening the JSON version by omitting all `null`s (as the TOML serializer does by default), and understand this shortened format when deserializing. A unit test is added * fix a bug where the `PUT /v1/tenant/config` endpoint rewrote the local file with what had come in the request, but only updated (rather than replaced the old values in) the in-memory state. This was uncovered while adjusting the e2e test and is fixed by doing the replacement everywhere, otherwise there's no way to revert existing overrides. Fixes #3471 (commit `dc688affe8`) * fixes https://github.com/neondatabase/neon/issues/3472 by reordering the config saving operations	2023-02-04 01:32:29 +02:00
Christian Schwarz	87cd2bae77	introduce LaunchTimestamp to identify process restarts This patch adds a LaunchTimestamp type to the `metrics` crate, along with a `libmetric_` Prometheus metric. The initial user is pageserver. In addition to exposing the Prometheus metric, it also reproduces the launch timestamp as a header in the API responses. The motivation for this is that we plan to scrape the pageserver's /v1/tenant/:tenant_id/timeline/:timeline_id/layer HTTP endpoint over time. It will soon expose access metrics (#3496) which reset upon process restart. We will use the pageserver's launch ID to identify a restart between two scrape points. However, there are other potential uses. For example, we could use the Prometheus metric to annotate Grafana plots whenever the launch timestamp changes.	2023-02-03 18:12:17 +01:00
bojanserafimov	be81db21b9	Revert accidental change (#3538)	2023-02-03 17:54:12 +02:00
Joonas Koivunen	f2d89761c2	feat: LayerMap::replace (#3513) Cc: #3486 Adds a method to replace a particular layer from the LayerMap for the purposes of remote layer download and layer eviction. In those use cases read lock on layer map needs to be released after initial search, but other operations could modify layermap before replacing thread gets to run. Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-02-03 15:33:46 +02:00
Sergey Melnikov	a0372158a0	Add build_info metric to storage broker (#3525) ## Describe your changes Add libmetrics_build_info metrics with commit sha to storage_broker /metrics, to match behaviour of proxy, pageserver and safekeeper.	2023-02-03 12:23:27 +01:00
Anastasia Lubennikova	83048a4adc	Handle errors during metric collection. (#3521) Don't exit the loop if one of the tenants failed to scrape its metrics. fixes #3490	2023-02-03 12:37:34 +02:00
Konstantin Knizhnik	f71b1b174d	Check correctness of file_cache_size_limit (#3530)	2023-02-03 08:01:26 +02:00
Dmitry Ivanov	96e78394f5	[proxy] Fix project (aka endpoint) init in the password hack handler (#3529) The project/endpoint should be set in the original (non-as_ref'd) creds, because we call `wake_compute` not only in `try_password_hack` but also later in the connection retry logic. This PR also removes the obsolete `as_ref` method and makes the code simpler because we no longer need this complication after a recent refactoring. Further action points: finally introduce typestate in creds (planned).	2023-02-02 22:56:15 +02:00
bojanserafimov	ada933eb42	Pageserver read trace utils (#2795) List, dump, and analyze read traces.	2023-02-02 15:33:40 -05:00
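
Commit a6dffb6ef9 in the list above describes comparing `Arc<dyn PersistentLayer>` values by object pointer only, "casting out" the vtable pointer. The following is a minimal sketch of that idea, not the code from the PR: the `PersistentLayer` trait body, the `RemoteLayer` fields, and the `same_object` helper are hypothetical stand-ins introduced for illustration.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the pageserver trait named in the commit message.
trait PersistentLayer {
    fn name(&self) -> &str;
}

/// Hypothetical implementor; the real `RemoteLayer` has different fields.
struct RemoteLayer {
    name: String,
}

impl PersistentLayer for RemoteLayer {
    fn name(&self) -> &str {
        &self.name
    }
}

/// Returns true when both `Arc`s point at the same allocation.
///
/// `Arc::as_ptr` yields a fat `*const dyn PersistentLayer`. Casting it to a
/// thin `*const ()` drops the vtable half, so only the data address is compared.
fn same_object(a: &Arc<dyn PersistentLayer>, b: &Arc<dyn PersistentLayer>) -> bool {
    let a_data = Arc::as_ptr(a) as *const ();
    let b_data = Arc::as_ptr(b) as *const ();
    a_data == b_data
}
```

With this helper, a clone of an `Arc` compares equal to its source, while a separately allocated layer with identical contents does not, regardless of which vtable either fat pointer carries.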

4041 Commits
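
Commit 175a577ad4 in the list above describes two eviction policies, NoEviction and LayerAccessThreshold, where the latter evicts a layer whose last access is older than a configurable threshold. The sketch below illustrates only the candidate-selection step under assumptions: the `LayerInfo` struct, the `threshold` field name, and the `eviction_candidates` function are hypothetical and are not taken from the pageserver code.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical mirror of the two policies named in the commit message.
#[derive(Debug, Clone, PartialEq)]
enum EvictionPolicy {
    NoEviction,
    LayerAccessThreshold { threshold: Duration },
}

/// Hypothetical per-layer record: a name plus the last access timestamp.
#[derive(Debug, Clone, PartialEq)]
struct LayerInfo {
    name: String,
    last_access: SystemTime,
}

/// Returns the names of layers that the policy would evict at time `now`.
fn eviction_candidates(
    policy: &EvictionPolicy,
    layers: &[LayerInfo],
    now: SystemTime,
) -> Vec<String> {
    match policy {
        // The default policy never selects anything.
        EvictionPolicy::NoEviction => Vec::new(),
        EvictionPolicy::LayerAccessThreshold { threshold } => layers
            .iter()
            .filter(|layer| match now.duration_since(layer.last_access) {
                // Evict only if the layer has been idle longer than the threshold.
                Ok(idle) => idle > *threshold,
                // A last access later than `now` (clock skew) is never evicted.
                Err(_) => false,
            })
            .map(|layer| layer.name.clone())
            .collect(),
    }
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the selection step deterministic, which makes it straightforward to test separately from the periodic task that would call it.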