rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 12:40:37 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	81425123ee	wip: deserialize and serialize nice bytes this is nice, but fails in tests. looks like: { "Finished": { "before": { "config": { "evict_bytes": "92.23MiB" }, "freed_bytes": "0B" }, "planned": { "respecting_tenant_min_resident_size": { "config": { "evict_bytes": "92.23MiB" }, "freed_bytes": "95.28MiB" }, "fallback_to_global_lru": null }, "assumed": { "projected_after": { "config": { "evict_bytes": "92.23MiB" }, "freed_bytes": "95.28MiB" }, "failed": { "file_sizes": 0, "count": 0 } } } }	2023-03-21 12:10:34 +02:00
Joonas Koivunen	d364f5936b	refactor: ubyte has From<usize>	2023-03-21 12:10:03 +02:00
Christian Schwarz	ae1b2f78b3	chore: cargo fmt	2023-03-20 17:34:07 +01:00
Christian Schwarz	834e94f5e3	fix: rebase fallout	2023-03-20 17:24:46 +01:00
Christian Schwarz	751c93c8f7	fix: http handler: use RequestSpan	2023-03-20 17:18:15 +01:00
Christian Schwarz	68b3e68642	refactor: PageserverHttpClient method for breaking tenant from tests	2023-03-20 17:18:15 +01:00
Christian Schwarz	4aa528ce45	chore: python format	2023-03-20 16:51:11 +01:00
Christian Schwarz	bdd502aff7	refactor: handler: re-use `Config` struct in `Usage` struct	2023-03-20 16:51:11 +01:00
Christian Schwarz	5015194b40	refactor: rename wanted_trimmed_bytes to evict_bytes	2023-03-20 16:51:11 +01:00
Christian Schwarz	5aef192bf2	disk-usage-based layer eviction This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. If the above doesn't relieve enough pressure, we fall back to Global LRU. In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console. Co-authored-by: Joonas Koivunen <joonas@neon.tech> fixes https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:51:09 +01:00
Christian Schwarz	881356c417	add metrics to detect eviction-induced thrashing (#3837 ) This patch adds two metrics that will enable us to detect thrashing of layers, i.e., repetitions of `eviction, on-demand-download, eviction, ... ` for a given layer. The first metric counts all layer evictions per timeline. It requires no further explanation. The second metric counts the layer evictions where the layer was resident for less than a given threshold. We can alert on increments to the second metric. The first metric will serve as a baseline, and further, it's generally interesting, outside of thrashing. The second metric's threshold is configurable in PageServerConf and defaults to 24h. The threshold value is reproduced as a label in the metric because the counter's value is semantically tied to that threshold. Since changes to the config and hence the label value are infrequent, this will have low storage overhead in the metrics storage. The data source to determine the time that the layer was resident is the file's `mtime`. Using `mtime` is more of a crutch. It would be better if Pageserver did its own persistent bookkeeping of residence change events instead of relying on the filesystem. We had some discussion about this: https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900 My position is that `mtime` is good enough for now. It can theoretically jump forward if someone copies files without resetting `mtime`. But that shouldn't happen in practice. Note that moving files back and forth doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`, which is typically used for filesystem-level backup / restore, correctly syncs `mtime`. I've added a label that identifies the data source to keep options open for a future, better data source than `mtime`. Since this value will stay the same for the time being, it's not a problem for metrics storage. refs https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:11:36 +01:00
Heikki Linnakangas	fea4b5f551	Switch to EdDSA algorithm for the storage JWT authentication tokens. The control plane currently only supports EdDSA. We need to either teach the storage to use EdDSA, or the control plane to use RSA. EdDSA is more modern, so let's use that. We could support both, but it would require a little more code and tests, and we don't really need the flexibility since we control both sides.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	77107607f3	Allow JWT key generation to fail if authentication is not enabled. This allows you to run without the 'openssl' binary as long as you don't enable authentication. This becomes more important with the next commit, which switches the JWT algorithm to EdDSA. LibreSSL does not support EdDSA, and LibreSSL comes with macOS, so the next commit makes it much more likely for the key generation to fail for macOS users. To allow running without a keypair, don't generate the authentication token in the 'neon_local init' step. Instead, generate a new token on every request that needs one, using the private key.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	1da963b2f9	Remove some unused code in control plane.	2023-03-20 16:28:01 +02:00
Heikki Linnakangas	1ddb9249aa	Reduce the # of histogram buckets in metrics. (#3850 ) Shrinks the total number of metrics collected for each timeline by about 50%. See https://github.com/neondatabase/neon/issues/2848. This doesn't fully solve the problem, we still collect a lot of metrics even with this, but this gives us a lot of headroom.	2023-03-20 15:49:16 +02:00
Joonas Koivunen	0c1228c37a	feat: store initial timeline in env fixture (#3839 ) minor change, but will allow more use in future for the default tenants. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-20 11:57:27 +02:00
Christian Schwarz	3c15874c48	allow specifying eviction_policy in TenantCreateRequest This was on oversight from `175a577ad4`. Nothing uses this AFAIK, but, let's fix it anyways. Noticed while working on https://github.com/neondatabase/neon/issues/3728	2023-03-20 10:43:53 +01:00
Shany Pozin	93f3f4ab5f	Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818 ) ## Describe your changes Add Error enum for tenant state response to allow better error handling in mgmt api ## Issue ticket number and link #2238 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-19 10:44:42 +02:00
sharnoff	f6e2e0042d	Fix + re-enable VM cgroup creation + running (#3820 ) Re-enable cgroup shenanigans in VMs, with some special care taken to make sure that our version of cgroup-tools supports cgroup v2 (debian bullseye does not, and probably won't because it requires a breaking change in libcgroup). This involves manually building libcgroup / cgroup-tools from source, then copying the output into the final build stage. We originally considered pulling the package from debian's testing repo (which is up-to-date), but decided against it. Refer to the PR for more details. Prior work, for reference: * `2153d2e0` - Run compute_ctl in a cgroup in VMs * `1360361f` - Fix missing VM cgconfig.conf * `8dae8799` - Disable VM cgroup shenanigans	2023-03-16 17:09:45 -07:00
Christian Schwarz	b917270c67	remove unused TenantConfig::update function	2023-03-16 16:25:35 +01:00
Arthur Petukhovsky	b067378d0d	Measure cross-AZ traffic in safekeepers (#3806 ) Create `safekeeper_pg_io_bytes_total` metric to track total amount of bytes written/read in a postgres connections to safekeepers. This metric has the following labels: - `client_az` – availability zone of the connection initiator, or `"unknown"` - `sk_az` – availability zone of the safekeeper, or `"unknown"` - `app_name` – `application_name` of the postgres client - `dir` – data direction, either `"read"` or `"write"` - `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from `client_az` and `sk_az`, exists purely for convenience. This is implemented by passing availability zone in the connection string, like this: `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`. Update ansible deployment scripts to add availability_zone argument to safekeeper and pageserver in systemd service files.	2023-03-16 17:24:01 +03:00
Joonas Koivunen	768c8d9972	test: allow gc to get unlucky (#3826 ) this failure case was probably introduced by `b220ba6`, because earlier the gc would always have run fast enough for restart every 1s. however, test got added later, so we have just been lucky. fixes #3824 by allowing this error to happen.	2023-03-16 11:03:21 +02:00
Rahul Patil	f1d960d2c2	Add new pageserver to eu-central-1 (#3829 ) ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-15 16:58:28 +01:00
Arthur Petukhovsky	cd17802b1f	Add pg_clients test for serverless driver (#3827 ) Fixes #3819	2023-03-15 16:18:38 +03:00
Heikki Linnakangas	10a5d36af8	Separate mgmt and libpq authentication configs in pageserver. (#3773 ) This makes it possible to enable authentication only for the mgmt HTTP API or the compute API. The HTTP API doesn't need to be directly accessible from compute nodes, and it can be secured through network policies. This also allows rolling out authentication in a piecemeal fashion.	2023-03-15 13:52:29 +02:00
Arseny Sher	a7ab53c80c	Forward framed read buf contents to compute before proxy pass. Otherwise they get lost. Normally buffer is empty before proxy pass, but this is not the case with pipeline mode of out npm driver; fixes connection hangup introduced by `b80fe41af3` for it. fixes https://github.com/neondatabase/neon/issues/3822	2023-03-15 14:32:41 +03:00
Heikki Linnakangas	2672fd09d8	Make test independent of the order of config lines.	2023-03-14 20:10:34 +02:00
Heikki Linnakangas	4a92799f24	Fix check for trailing garbage in basebackup import. There was a warning for trailing garbage after end-of-tar archive, but it didn't always work. The reason is that we created a StreamReader over the original copyin-stream, but performed the check for garbage on the copyin-stream. There could be some garbage bytes buffered in the StreamReader, which were not caught by the warning. I considered turning the the warning into a fatal error, aborting the import, but I wasn't sure if we handle aborting the import properly. Do we clean up the timeline directory on error? If we don't, we should make that more robust, but that's a different story. Also, normally a valid tar archive ends with two 512-byte blocks of zeros. The tokio_tar crate stops at the first all-zeros block. Read and check the second all-zeros block, and error out if it's not there, or contains something unexpected.	2023-03-14 19:45:33 +02:00
Konstantin Knizhnik	5396273541	Avoid holes between generated image layers (#3771 ) ## Describe your changes When we perform partitioning of the whole key space, we take in account actual ranges of relation present in the database. So if we have relation with relid=1 and size 100 and relation with relid=2 with size 200 then result of KeySpace::partition may contain partitions <100000000..100000099> and <200000000..200000199>. Generated image layers will contain the same boundaries. But when GC is checking image coverage to find out of old layer is fully covered by newer image layer and so can be deleted, it takes in account only full key range. I.e. if there is delta layer <100000000..300000000> then it never be garbage collected because image layers <100000000..100000099> and <200000000..200000199> are not completely covering it.This is how it looks in practice: 000000067F000032AC00000A300000000000-000000067F000032AC00000A330000000000__000000000F761828 000000067F000032AC00000A31000000001F-000000067F000032AC00000A620000000005__0000000001696070-000000000442A551 000000067F000032AC00000A3300FFFFFFFF-000000067F000032AC00000A650100000000__000000000F761828 So there are two image layers covering delta layer but ... there is a hole: A330000000000...A3300FFFFFFFF and as a result delta layer is not collected. ## Issue ticket number and link This PR is deeply related with #3673 because it is addressing the same problem: old layers are not utilized by GC. The test test_gc_old_layers.py in #3673 can be used to see effect of this patch. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-03-14 16:15:07 +02:00
Joonas Koivunen	c23c8946a3	chore: clippies introduced with rust 1.68 (#3781 ) - handle automatically fixable future clippies - tune run-clippy.sh to remove macos specifics which we no longer have Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-14 15:29:02 +02:00
Joonas Koivunen	15b692ccc9	test: more strict finding of WARN, ERROR lines (#3798 ) this prevents flakyness when `WARN\|ERROR` appears in some other part of the line, for example in a random filename.	2023-03-14 15:27:52 +02:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Lassi Pölönen	68ae020b37	Use RollingUpdate strategy also for legacy proxy (#3814 ) ## Describe your changes We have previously changed the neon-proxy to use RollingUpdate. This should be enabled in legacy proxy too in order to avoid breaking connections for the clients and allow for example backups to run even during deployment. (https://github.com/neondatabase/neon/pull/3683) ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-03-14 13:23:46 +00:00
Joonas Koivunen	d6bb8caad4	refactor: correct return value for not found L0's on LayerMap::replace (#3805 ) in prev implementation, an `ok_or_else(...)?` is used to cause a "precondition error" on LayerMap::replace, however we only see this particular error if an L0 for which replace fails is not in the layermap because it is not in `l0_delta_layers`. changes or fixes this to be Replacement::NotFound instead, making it more clear that an error would only be raised for actual preconditions, like trying to replace layer with completly unrelated layer.	2023-03-14 13:18:26 +02:00
Alexander Bayandin	319402fc74	postgres_ffi: restore POSTGRES_INSTALL_DIR support (#3811 ) Fix path construction to `pg_config`: `pg_install_dir_versioned` already includes `pg_version`	2023-03-14 10:25:48 +00:00
Dmitry Ivanov	2e4bf7cee4	[proxy] Immediately log all compute node connection errors.	2023-03-14 01:45:57 +03:00
Dmitry Ivanov	15ed6af5f2	Add descriptions to proxy's python tests.	2023-03-14 01:32:37 +03:00
Dmitry Rodionov	50476a7cc7	test: update to match current interfaces	2023-03-13 17:50:10 +02:00
andres	d7ab69f303	add test for getting branchpoints from an inactive timeline	2023-03-13 17:50:10 +02:00
sharnoff	582620274a	Enable file cache handling by vm-informant (#3794 ) Enables the VM informant's file cache integration. See also: https://github.com/neondatabase/autoscaling/pull/47	2023-03-13 07:16:39 -07:00
Vadim Kharitonov	daeaa767c4	Add `neondatabase/release` team as a default reviewers for storage releases	2023-03-13 13:40:15 +01:00
Konstantin Knizhnik	f0573f5991	Remove block cursor cache (#3740 ) ## Describe your changes Do not pin current block in BlockCursor ## Issue ticket number and link See #3712 There are places (see get_reconstruct_data) in our code when thread is holding read layers lock and then try to read file and so lock page cache slot. So we have edge in dependency graph layers->page cache slot. At the same time (as Christian noticed) we can lock page cache slot in BlockCursor and then try obtain shard lock on layers. So there is backward edge in dependency graph page cache slot>layers which forms loop and may cause deadlock. There are three possible fixes of the problem: 1. Perform compaction under `layers` shared lock. See PR #3732. It fixes the problem but make it not possible to append any data to pageserver until compaction is completed. 2. Do not hold `layers` lock while accessing layers (not sure if it is possible to do because it definitely introduce some new race conditions). 3. Do not pin current pages in BockCursor (this PR). My experiments shows that this cache in BlockCursor is not so useful: the number of hits/misses for cursor cache on pgbench workload (-i -s 10/-c 10 -T 100/-c 10 -S -T 100): ``` hits: 163011 misses: 1023602 ``` So number of cache misses is 10x times larger. And results for read-only pgbench are mostly the same: ``` with cache: 14581 w/out cache: 14429 ``` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-13 14:07:35 +02:00
Nikita Kalyanov	07dcf679de	set content type explicitly (#3799 ) I moved management API v2 to ogen and the generated code seems to be more strict about content type. Let's set it properly as it is json after all ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-13 14:00:01 +02:00
Heikki Linnakangas	e0ee138a8b	Add a test for tokio-postgres client to the driver test suite. It is fully supported. To enable TLS, though, it requires some extra glue code, and a dependency to a TLS library.	2023-03-13 12:14:37 +02:00
Arthur Petukhovsky	d9a1329834	Make postgres_backend use generic IO type (#3789 ) - Support measuring inbound and outbound traffic in MeasuredStream - Start using MeasuredStream in safekeepers code	2023-03-13 12:18:10 +03:00
Joonas Koivunen	8699342249	Ondemand rx bytes and layer count (#3777 ) Adds two new global metrics: - pageserver_remote_ondemand_downloaded_layers_total - pageserver_remote_ondemand_downloaded_bytes_total An existing test is repurposed once more to check that we do get some reasonable counts. These are to replace guessing from the nic RX bytes metric how much was on-demand downloaded. First part of #3745: This does not add the "(un)?avoidable" metric, which I plan to add as a new metric, which will be a subset of the counts of the metrics added here.	2023-03-13 09:26:49 +02:00
Joonas Koivunen	ce8fbbd910	Fix allowed error again (#3790 ) Fixes #3360 again, this time checking all other "Error processing HTTP request" messages and aligning the regex with the two others.	2023-03-10 17:44:12 +00:00
Vadim Kharitonov	1401021b21	Be able to get number of CPUs (#3774 ) After enabling autoscaling, we faced the issue that customers are not able to get the number of CPUs they use at this moment. Therefore I've added these two options: 1. Postgresql function to allow customers to call it whenever they want 2. `compute_ctl` endpoint to show these number in console	2023-03-10 19:00:20 +02:00
Stas Kelvich	252b3685a2	Use `unsafe-postgres` feature to build pgx extension Recently added `unsafe-postgres` feature allows to build pgx extensions against postgres forks that decided to change their ABI name (like us). With that we can build extensions without forking them and using stock pgx. As this feature is new few manual version bumps were required.	2023-03-10 17:40:45 +02:00
Heikki Linnakangas	34d3385b2e	Add unit tests for JWT encoding and decoding.	2023-03-10 16:09:32 +02:00

1 2 3 4 5 ...

2940 Commits