rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 01:12:56 +00:00

Author	SHA1	Message	Date
Gleb Novikov	9db70f6232	Added disk_size and instance_type to payload (#3918 ) In https://github.com/neondatabase/cloud/issues/4354 we are scheduling projects based on available disk space and overcommit, so we need to know the disk size and, just in case, the instance type of the pageserver. Issue: https://github.com/neondatabase/cloud/issues/4354	2023-04-06 14:02:56 +04:00
Christian Schwarz	45bf76eb05	enable layer eviction by default in prod (#3933 ) Leave disk_usage_based_eviction above the current max usage in prod (82%ish), so that deploying this commit won't trigger disk_usage_based_eviction. As indicated in the TODO, we'll decrease the value to 80% later. Also update the staging YAMLs to use the anchor syntax for `evictions_low_residence_duration_metric_threshold` like we do in the prod YAMLs as of this patch.	2023-04-03 14:57:36 +02:00
Joonas Koivunen	cf5cfe6d71	fix: metric used for alerting threshold on staging (#3932 ) This should remove the too eager alerts from staging.	2023-04-03 13:26:45 +03:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the maximum relative usage, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the minimum absolute available space in bytes. If the actual available space is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. 
We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now_. Hence, its arguments are documented in openapi_spec.yml. The response type isn't documented, though, because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
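The threshold check and the LRU selection described in the commit above can be sketched roughly as follows. This is an illustrative sketch only, not the pageserver's code: the `Usage`, `Thresholds`, and `Layer` types and the `bytes_to_free` and `plan_eviction` functions are hypothetical names, the filesystem numbers are passed in rather than read via `statvfs`, and the reservation is assumed to be the longest run of a tenant's most recent layers that fits within the minimum resident size.

```rust
use std::collections::HashMap;

/// Filesystem numbers as a `statvfs` call might report them (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Usage {
    total_bytes: u64,
    avail_bytes: u64,
}

/// The two thresholds named in the commit message.
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    max_usage_pct: u64,
    min_avail_bytes: u64,
}

impl Usage {
    /// Bytes that must be freed to get back under both thresholds; 0 means no pressure.
    fn bytes_to_free(&self, t: &Thresholds) -> u64 {
        let used = self.total_bytes.saturating_sub(self.avail_bytes);
        // Largest amount of used space that `max_usage_pct` still allows.
        let max_used = (self.total_bytes as u128 * t.max_usage_pct as u128 / 100) as u64;
        let over_pct = used.saturating_sub(max_used);
        let under_avail = t.min_avail_bytes.saturating_sub(self.avail_bytes);
        over_pct.max(under_avail)
    }
}

/// A resident layer (hypothetical type); a larger `last_access` means more recent.
#[derive(Debug, Clone, PartialEq)]
struct Layer {
    tenant: String,
    size: u64,
    last_access: u64,
}

/// Choose layers to evict: keep each tenant's most recent layers up to
/// `min_resident_size` bytes, then evict the rest least-recently-used first
/// until at least `need` bytes are covered.
fn plan_eviction(layers: &[Layer], min_resident_size: u64, need: u64) -> Vec<Layer> {
    let mut by_recency: Vec<&Layer> = layers.iter().collect();
    by_recency.sort_by(|a, b| b.last_access.cmp(&a.last_access));

    // Per tenant: (reserved bytes, reservation closed?).
    let mut reserved: HashMap<&str, (u64, bool)> = HashMap::new();
    let mut candidates: Vec<&Layer> = Vec::new();
    for l in by_recency {
        let r = reserved.entry(l.tenant.as_str()).or_insert((0, false));
        if !r.1 && r.0 + l.size <= min_resident_size {
            r.0 += l.size; // part of the reservation, stays resident
        } else {
            r.1 = true; // older layers of this tenant are all candidates
            candidates.push(l);
        }
    }

    candidates.sort_by_key(|l| l.last_access);
    let mut freed = 0;
    let mut plan = Vec::new();
    for l in candidates {
        if freed >= need {
            break;
        }
        freed += l.size;
        plan.push(l.clone());
    }
    plan
}
```

A caller would compute `bytes_to_free` from the first `statvfs` result, pass it as `need` to `plan_eviction`, evict the returned layers, and then re-check with a second `statvfs`, as the commit message describes.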
Arthur Petukhovsky	b067378d0d	Measure cross-AZ traffic in safekeepers (#3806 ) Create `safekeeper_pg_io_bytes_total` metric to track the total number of bytes written/read in postgres connections to safekeepers. This metric has the following labels: - `client_az` – availability zone of the connection initiator, or `"unknown"` - `sk_az` – availability zone of the safekeeper, or `"unknown"` - `app_name` – `application_name` of the postgres client - `dir` – data direction, either `"read"` or `"write"` - `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from `client_az` and `sk_az`, exists purely for convenience. This is implemented by passing the availability zone in the connection string, like this: `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`. Update ansible deployment scripts to add the availability_zone argument to safekeeper and pageserver in systemd service files.	2023-03-16 17:24:01 +03:00
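As a rough illustration of the labels described in the commit above, the sketch below extracts the zone from an options string of the shape shown and derives `same_az`. The function names are hypothetical, and the rule that any `"unknown"` side yields `"unknown"` is an assumption; the commit message does not show how the safekeeper actually computes it.

```rust
/// Extract the `availability-zone=...` value from a connection options string,
/// e.g. `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`.
/// (Hypothetical helper; not the safekeeper's parser.)
fn parse_client_az(options: &str) -> Option<&str> {
    options
        .split_whitespace()
        .find_map(|tok| tok.strip_prefix("availability-zone="))
        .filter(|az| !az.is_empty())
}

/// Value for the `client_az` label: the parsed zone, or "unknown".
fn client_az_label(options: &str) -> &str {
    parse_client_az(options).unwrap_or("unknown")
}

/// Value for the `same_az` label, derived from the other two labels.
/// Assumption: if either side is "unknown", the result is "unknown".
fn same_az(client_az: &str, sk_az: &str) -> &'static str {
    if client_az == "unknown" || sk_az == "unknown" {
        "unknown"
    } else if client_az == sk_az {
        "true"
    } else {
        "false"
    }
}
```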
Rahul Patil	f1d960d2c2	Add new pageserver to eu-central-1 (#3829 )	2023-03-15 16:58:28 +01:00
Nikita Kalyanov	07dcf679de	set content type explicitly (#3799 ) I moved management API v2 to ogen, and the generated code seems to be stricter about content type. Let's set it properly, as it is JSON after all.	2023-03-13 14:00:01 +02:00
Sergey Melnikov	9f906ff236	Add pageserver-2.us-east-2.aws.neon.tech (#3701 )	2023-02-23 19:56:21 +01:00
Arthur Petukhovsky	95018672fa	Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671 ) We migrated all timelines to `safekeeper-3.ap-southeast-1.aws.neon.tech`, so the old instance can now be removed.	2023-02-22 11:55:41 +02:00
Sergey Melnikov	e3d75879c0	Use fqdn to access console management API on production (#3651 ) console-release.local is a legacy manual CNAME to neon-internal-api.aws.neon.tech in r53. We could use the neon-internal-api.aws.neon.tech name directly. This was already deployed to staging in https://github.com/neondatabase/neon/pull/3642	2023-02-20 18:11:06 +01:00
Sergey Melnikov	d5d690c044	Use fqdn for staging console management API (#3642 ) `console-staging.local` is a legacy manual CNAME to `neon-internal-api.aws.neon.build` in r53. We could use the `neon-internal-api.aws.neon.build` name directly.	2023-02-20 16:05:21 +01:00
Arthur Petukhovsky	8f557477c6	Add new safekeeper to ap-southeast-1 prod (#3645 )	2023-02-20 17:51:27 +03:00
Christian Schwarz	8d28a24b26	staging: enable automatic layer eviction at 20m threshold + period (#3636 ) What it says on the tin. Part of #2476	2023-02-17 18:32:01 +02:00
Sergey Melnikov	a1b062123b	Do not deploy storage to old account (#3630 ) It's gone	2023-02-16 20:28:53 +00:00
Sergey Melnikov	eb21d9969d	Add pageserver-3.us-west-2.aws.neon.tech (#3603 )	2023-02-14 12:56:03 +01:00
Rory de Zoete	1b9e5e84aa	Add new storage hosts for placement group test (#3561 ) To test the placement group setup	2023-02-08 16:48:29 +01:00
Sergey Melnikov	aabca55d7e	Migrate update version to management APIv2 (#3430 )	2023-01-24 17:18:16 +01:00
Shany Pozin	a44e5eda14	Adding pageserver3 to staging (#3403 )	2023-01-23 14:08:48 +01:00
Vadim Kharitonov	e59d32ac5d	Change SENTRY_ENVIRONMENT from "development" to "staging"	2023-01-18 16:34:49 +01:00
Anastasia Lubennikova	506086a3e2	Fix metric_collection_endpoint for prod. It was incorrectly set to staging url	2023-01-18 16:35:43 +02:00
Anastasia Lubennikova	63e3b815a2	Enable metric_collection_endpoint for pageserver on prod in all regions	2023-01-17 13:43:50 +02:00
Sergey Melnikov	4c4d3dc87a	Add new pageserver to us-east-2 staging (#3248 )	2023-01-02 22:14:05 +04:00
Dmitry Rodionov	7c7d225d98	add pageserver to new region see https://github.com/neondatabase/aws/pull/116	2022-12-29 20:29:42 +03:00
Anastasia Lubennikova	1468c65ffb	Enable billing metric_collection_endpoint on staging	2022-12-23 18:04:45 +02:00
Sergey Melnikov	c01f92c081	Fully remove old staging deploy (#3191 )	2022-12-22 20:09:45 +01:00
Arthur Petukhovsky	72ab104733	Move zenith-1-sk-3 to zenith-1-sk-4 (#3164 )	2022-12-22 15:21:53 +00:00
Sergey Melnikov	cd7fdf2587	Remove neon-stress configs (#3121 )	2022-12-20 12:03:42 +01:00
Arseny Sher	70ce01d84d	Deploy broker with L4 LB in new env. (#3125 ) Seems to be fixing issue with missing keepalives.	2022-12-15 22:42:30 +01:00
Arseny Sher	f8ab5ef3b5	Update broker endpoint for prod-us-west-2. (#3095 )	2022-12-14 12:58:12 +01:00
Sergey Melnikov	228f9e4322	Use default folder for ansible collections (#3092 )	2022-12-13 23:59:49 +01:00
Vadim Kharitonov	0bc488b723	Add sentry environment for pageserver and safekeepers in new region (us-west-2)	2022-12-13 16:26:28 +01:00
Vadim Kharitonov	4603a4cbb5	Bypass SENTRY_ENVIRONMENT variable in order to filter panics in sentry by environment.	2022-12-13 14:52:04 +01:00
Sergey Melnikov	e5d523c86a	Add new us-west-2 region (#3071 )	2022-12-13 14:11:40 +01:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Nikita Kalyanov	634d0eab68	pass availability zone to console during pageserver registration (#2991 ) This is safe because unknown fields are ignored. After the corresponding PR in control plane is merged, this field is going to be required. Part of https://github.com/neondatabase/cloud/issues/3131	2022-12-06 21:09:54 +02:00
Kliment Serafimov	8f2b3cbded	Sentry integration for storage. (#2926 ) Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes. Currently in sentry there are three projects, one for each process. Sentry url is sent to all three processes separately via cli args.	2022-12-06 18:57:54 +00:00
Arseny Sher	0b0cb77da4	Fix deploy after `2d42f84389`.	2022-11-24 20:07:41 +04:00
Arseny Sher	2d42f84389	Add storage_broker binary. Which ought to replace etcd. This patch only adds the binary and adjusts Dockerfile to include it; subsequent ones will add deploy of helm chart and the actual replacement. It is a simple and fast pub-sub message bus. In this patch only safekeeper message is supported, but others can be easily added. Compilation now requires protoc to be installed. Installing protobuf-compiler package is fine for Debian/Ubuntu. ref https://github.com/neondatabase/neon/pull/2733 https://github.com/neondatabase/neon/issues/2394	2022-11-23 22:05:59 +04:00
Sergey Melnikov	85f0975c5a	Setup eu-west-1 as region for PR testing (#2757 )	2022-11-23 10:54:39 +01:00
Heikki Linnakangas	e9f4ca5972	Remove references to obsolete files in .gitignore	2022-11-23 00:40:17 +02:00
Sergey Melnikov	74ec36a1bf	Add pageserver-1.us-east-2.aws.neon.build (#2881 )	2022-11-22 10:55:02 +01:00
Sergey Melnikov	aca221ac8b	Switch old staging to new etcd (#2834 )	2022-11-16 16:54:55 +04:00
Sergey Melnikov	ac3ccac56c	Add zenith-1-ps-4 and zenith-1-ps-5 (#2815 )	2022-11-16 11:25:24 +04:00
Sergey Melnikov	e86a9105a4	Deploy storage to new prod regions (#2709 )	2022-10-28 10:17:27 +00:00
Sergey Melnikov	6ff2c61ae0	Refactor safekeeper s3 config and change it for new account (#2672 )	2022-10-21 13:44:08 +00:00
Sergey Melnikov	546e9bdbec	Deploy storage into new account and migrate to management API v2 (#2619 ) Deploy storage into new account. Migrate safekeeper and pageserver initialisation to management API v2.	2022-10-18 15:52:15 +03:00
Lassi Pölönen	5d6553d41d	Fix pageserver configuration generation bug (#2584 ) * We had an issue with `lineinfile` usage for the pageserver configuration file: if the S3 bucket related values were changed, it would have resulted in duplicate keys, and therefore invalid toml. To fix the issue, we should keep the configuration in a structured format (yaml in this case) so we can always generate syntactically correct toml. Inventories are converted to yaml just so that it's easier to maintain the configuration there. Another alternative would have been separate variable files. * Keep the ansible collections dir, but locally installed collections should not be tracked.	2022-10-16 11:37:10 +00:00
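The idea behind the fix above, rendering the whole file from a structured map instead of patching lines in place, can be illustrated with a small sketch. It is not the ansible change itself: the `Value` type, the `render_toml` function, and the key names used in the test are hypothetical, and keys are assumed to be valid bare TOML keys. Because a map holds each key at most once, a changed value replaces the old one and the output cannot contain duplicate keys.

```rust
use std::collections::BTreeMap;

/// A minimal config value (hypothetical type; real TOML has more kinds).
enum Value {
    Str(String),
    Int(i64),
}

/// Render a flat config map as TOML `key = value` lines.
/// Assumption: every key is a valid bare TOML key (letters, digits, `_`, `-`).
fn render_toml(cfg: &BTreeMap<String, Value>) -> String {
    let mut out = String::new();
    for (key, value) in cfg {
        let rendered = match value {
            // Basic TOML string: escape backslashes and double quotes.
            Value::Str(s) => format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\"")),
            Value::Int(i) => i.to_string(),
        };
        out.push_str(&format!("{} = {}\n", key, rendered));
    }
    out
}
```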
Sergey Melnikov	241e549757	Switch neon-stress etcd to dedicated instance (#2602 )	2022-10-10 22:07:19 +00:00
Kirill Bulatov	13f0e7a5b4	Deploy pageserver_binutils to the envs	2022-10-09 08:21:11 +03:00
Stas Kelvich	367cc01290	Fix deploy paths	2022-09-26 10:08:01 +03:00

67 Commits