rust/neon - neon - Gitea: Git with a cup of tea


Mirror of https://github.com/neondatabase/neon.git, synced 2026-01-08 22:12:56 +00:00

Author	SHA1	Message	Date
Christian Schwarz	45bf76eb05	enable layer eviction by default in prod (#3933 ) Leave the disk_usage_based_eviction threshold above the current max usage in prod (82%ish), so that deploying this commit won't trigger disk_usage_based_eviction. As indicated in the TODO, we'll decrease the value to 80% later. Also update the staging YAMLs to use the anchor syntax for `evictions_low_residence_duration_metric_threshold`, like we do in the prod YAMLs as of this patch.	2023-04-03 14:57:36 +02:00
Joonas Koivunen	cf5cfe6d71	fix: metric used for alerting threshold on staging (#3932 ) This should remove the overly eager alerts from staging.	2023-04-03 13:26:45 +03:00
Alexander Bayandin	75ffe34b17	check-macos-build: fix cache key (#3926 ) We don't have `${{ matrix.build_type }}` there, so it resolves to an empty string and the key looks like this: [`v1-macOS--pg-f8a650e49b06d39ad131b860117504044b01f312-dcccd010ff851b9f72bb451f28243fa3a341f07028034bbb46ea802413b36d80`](https://github.com/neondatabase/neon/actions/runs/4575422427/jobs/8078231907#step:26:2)	2023-03-31 21:45:59 +03:00
Dmitry Rodionov	22f9ea5fe2	Remind people to clean up merge commit message in PR template (#3920 )	2023-03-31 16:11:34 +03:00
Joonas Koivunen	d0711d0896	build: fix git perms for deploy job (#3921 ) Copy-pasted from the `build-neon` job. Interestingly, this is only needed by `build-neon` and `deploy`. Fixes: https://github.com/neondatabase/neon/actions/runs/4568077915/jobs/8070960178 which seems to have been happening for a while.	2023-03-31 16:05:15 +03:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the maximum relative space usage, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the minimum absolute available space in bytes. If the actual available space is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronously with the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now_. Hence, its arguments are documented in openapi_spec.yml. The response type isn't documented, though, because we don't consider it stable. The endpoint should _not_ be used by Console, but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Lassi Pölönen	1c1bb904ed	Rename zenith_* labels to neon_* (#3911 ) ## Describe your changes Get rid of the legacy labeling. Also, `neon_region_slug` with the same value as `neon_region` doesn't make much sense, so just drop it. This allows us to drop the relabeling from zenith to neon in the log collector.	2023-03-30 16:24:47 +03:00
Kirill Bulatov	9d714a8413	Split $CARGO_FLAGS and $CARGO_FEATURES to make e2e tests work	2023-03-29 00:08:30 +03:00
Kirill Bulatov	6c84cbbb58	Run new Rust IT test in CI	2023-03-29 00:08:30 +03:00
Vadim Kharitonov	e3cbcc2ea7	Revert "Add `neondatabase/release` team as default reviewers for storage" This reverts commit `daeaa767c4`.	2023-03-27 14:10:18 +04:00
Shany Pozin	809acb5fa9	Move neon-image-depot to a larger runner (#3860 ) ## Describe your changes https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059 ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-21 19:32:36 +02:00
Arthur Petukhovsky	b067378d0d	Measure cross-AZ traffic in safekeepers (#3806 ) Create `safekeeper_pg_io_bytes_total` metric to track the total number of bytes written/read in postgres connections to safekeepers. This metric has the following labels: - `client_az` – availability zone of the connection initiator, or `"unknown"` - `sk_az` – availability zone of the safekeeper, or `"unknown"` - `app_name` – `application_name` of the postgres client - `dir` – data direction, either `"read"` or `"write"` - `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from `client_az` and `sk_az`, exists purely for convenience. This is implemented by passing the availability zone in the connection string, like this: `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`. Also update the ansible deployment scripts to add the availability_zone argument to the safekeeper and pageserver systemd service files.	2023-03-16 17:24:01 +03:00
Rahul Patil	f1d960d2c2	Add new pageserver to eu-central-1 (#3829 ) ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-15 16:58:28 +01:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Lassi Pölönen	68ae020b37	Use RollingUpdate strategy also for legacy proxy (#3814 ) ## Describe your changes We have previously changed the neon-proxy to use RollingUpdate. This should be enabled in the legacy proxy too, to avoid breaking client connections and to allow, for example, backups to run even during deployment. (https://github.com/neondatabase/neon/pull/3683) ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-03-14 13:23:46 +00:00
Vadim Kharitonov	daeaa767c4	Add `neondatabase/release` team as default reviewers for storage releases	2023-03-13 13:40:15 +01:00
Nikita Kalyanov	07dcf679de	set content type explicitly (#3799 ) I moved management API v2 to ogen, and the generated code seems to be stricter about content type. Let's set it properly, as it is JSON after all. ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-13 14:00:01 +02:00
Rory de Zoete	3c4f5af1b9	Try depot.dev for image building (#3768 ) To see if it is faster. Run side-by-side for a while so we can gather enough data.	2023-03-10 11:11:39 +01:00
Sergey Melnikov	9f906ff236	Add pageserver-2.us-east-2.aws.neon.tech (#3701 )	2023-02-23 19:56:21 +01:00
Lassi Pölönen	b0311cfdeb	Change the production neon-proxy-scram update strategy to RollingUpdate (#3683 ) ## Describe your changes The same change in production as was done in staging by https://github.com/neondatabase/neon/pull/3678 ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-02-22 20:15:37 +02:00
Lassi Pölönen	965b4f4ae2	Change the staging neon-proxy-scram update strategy to RollingUpdate (#3678 ) ## Describe your changes When we deploy the proxy with the default Recreate strategy, there's always some downtime and existing connections will be shut down. Change the strategy to RollingUpdate and delay the kill signal by one week. The AWS Network Load Balancer keeps the existing connections alive for as long as the pods are alive, but directs new connections to new pods. ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-02-22 16:50:07 +02:00
Arthur Petukhovsky	95018672fa	Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671 ) We migrated all timelines to `safekeeper-3.ap-southeast-1.aws.neon.tech`, so the old instance can now be removed.	2023-02-22 11:55:41 +02:00
Sergey Melnikov	2caece2077	Add -v to ansible invocations (#3670 ) To get more debug output on failures	2023-02-21 23:11:52 +03:00
Sergey Melnikov	e3d75879c0	Use fqdn to access console management API on production (#3651 ) `console-release.local` is a legacy manual CNAME to `neon-internal-api.aws.neon.tech` in r53. We can use the `neon-internal-api.aws.neon.tech` name directly. This was already deployed to staging in https://github.com/neondatabase/neon/pull/3642	2023-02-20 18:11:06 +01:00
Sergey Melnikov	d5d690c044	Use fqdn for staging console management API (#3642 ) `console-staging.local` is a legacy manual CNAME to `neon-internal-api.aws.neon.build` in r53. We can use the `neon-internal-api.aws.neon.build` name directly.	2023-02-20 16:05:21 +01:00
Arthur Petukhovsky	8f557477c6	Add new safekeeper to ap-southeast-1 prod (#3645 )	2023-02-20 17:51:27 +03:00
sharnoff	2153d2e00a	Run compute_ctl in a cgroup in VMs (#3577 )	2023-02-17 14:14:41 -08:00
Christian Schwarz	8d28a24b26	staging: enable automatic layer eviction at 20m threshold + period (#3636 ) What it says on the tin. Part of #2476	2023-02-17 18:32:01 +02:00
Sergey Melnikov	a1b062123b	Do not deploy storage to old account (#3630 ) It's gone	2023-02-16 20:28:53 +00:00
Sergey Melnikov	eb21d9969d	Add pageserver-3.us-west-2.aws.neon.tech (#3603 )	2023-02-14 12:56:03 +01:00
Rory de Zoete	1b9e5e84aa	Add new storage hosts for placement group test (#3561 ) To test the placement group setup	2023-02-08 16:48:29 +01:00
Sergey Melnikov	c5c14368e3	Fix deploy-prod.yml syntax (#3556 )	2023-02-07 15:27:31 +01:00
Sergey Melnikov	1254dc7ee2	Fix production deploy: run as root to access docker (#3555 )	2023-02-07 15:21:15 +01:00
Sergey Melnikov	959f5c6f40	Do not deploy legacy scram proxy (*.cloud.neon.tech) to the old account (#3546 ) We have migrated to the new proxy, which was set up in https://github.com/neondatabase/neon/pull/3461	2023-02-06 15:51:20 +01:00
Kirill Bulatov	f474495ba0	Publish build stats that are easy to browse (#3514 ) Adds two new tags, `run-extra-build-macos` and `run-extra-build-stats`, to trigger the corresponding build jobs on any PR. On every build for `main`, or for a PR with the `run-extra-build-stats` tag, publish a GitHub commit status with the link to the `cargo build --all --release --timings` report.	2023-02-02 11:18:42 +02:00
Shany Pozin	bf1c36a30c	Moving the template file location (#3523 ) see https://github.com/appsmithorg/appsmith/issus/826#issuecomment-703093426 for details	2023-02-02 11:02:47 +02:00
Alexander Bayandin	567b71c1d2	Require poetry 1.3; regenerate poetry.lock (#3508 ) Ref https://python-poetry.org/blog/announcing-poetry-1.3.0/#new-lock-file-format	2023-02-01 18:11:00 +00:00
Sergey Melnikov	f3dadfb3d0	Confirm that there is an emergency before manual execution of prod deploy workflow (#3507 ) ![image](https://user-images.githubusercontent.com/7127190/215840037-69eda3ee-920e-4b90-bf7d-aa58f0bdfb50.png)	2023-02-01 16:01:27 +01:00
Sergey Melnikov	847fc566fd	Use the same runners/container for old prod deployments as for new prod	2023-01-31 17:40:24 +01:00
Vadim Kharitonov	a7d8bfa631	Fix create release PR	2023-01-31 14:36:04 +01:00
Sergey Melnikov	0806a46c0c	Fix production deploy (#3498 ) `get_binaries.sh` no longer uses the `RELEASE` environment variable; it just uses `DOCKER_TAG`.	2023-01-31 13:36:25 +01:00
Sergey Melnikov	5e08b35f53	Fix new deploy workflow (#3492 ) Add a 'branch' input to specify the commit for deploy scripts/configs. A commit can't be passed to a workflow as a ref, and we need to pin configs to a specific commit for main/release deploys. Update deploy input descriptions to match the GH interface.	2023-01-30 22:08:00 +01:00
Sergey Melnikov	82cbcb36ab	Extract neon deploy jobs into separate workflows (#3424 ) Extract deploy jobs from build_and_test.yml into the deploy-dev and deploy-prod workflows. Add a trigger to run these workflows after Neon is built and tested on the main and release branches. This will allow us to redeploy/rollback/patch config without a full rebuild.	2023-01-30 20:10:54 +01:00
Vadim Kharitonov	ec0e641578	Create Release PR: review fixes	2023-01-30 16:15:22 +01:00
Rory de Zoete	7bb13569b3	Switch more jobs to small runner (#3483 ) As these jobs don't benefit from additional cores Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>	2023-01-30 14:00:44 +01:00
Vadim Kharitonov	5fc233964a	Create release PR	2023-01-30 12:44:48 +01:00
Rory de Zoete	4d291d0e90	Prevent assume error (#3476 ) To fix `Error: The requested DurationSeconds exceeds the MaxSessionDuration set for this role.` Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>	2023-01-27 19:27:23 +01:00
Rory de Zoete	4718c67c17	Update deploy steps (#3470 ) First one isn't optimal, but as it was requested to run the runner as nonroot -> https://github.com/neondatabase/runner/pull/1#discussion_r1069909593 this job will need more significant refactoring. This should unblock the deployment process. --------- Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>	2023-01-27 18:05:49 +01:00
Rory de Zoete	8342e9ea6f	Update helm job (#3467 ) As a follow-up to https://github.com/neondatabase/build/pull/47 Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>	2023-01-27 13:28:26 +01:00
Rory de Zoete	2388981311	Add cleanup tasks for ansible and helm (#3465 ) To fix: https://github.com/neondatabase/neon/actions/runs/4023027504/jobs/6913421070 https://github.com/neondatabase/neon/actions/runs/4023027504/jobs/6913421268 Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>	2023-01-27 11:20:51 +01:00
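The eviction policy described in commit a64dd3ecb5 (#3809) can be illustrated with a small, self-contained sketch. This is not the pageserver's code: `FsUsage`, `Thresholds`, `Layer`, and `plan_eviction` are hypothetical names, the `statvfs` result is replaced by fixed numbers, and the "internal accounting" simply assumes that evicting a layer frees exactly its size. It shows the two threshold checks (`max_usage_pct`, `min_avail_bytes`), the per-tenant reservation of the most recent layers up to a minimum resident size, and global least-recently-used eviction of everything outside those reservations.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Simplified filesystem usage snapshot (hypothetical stand-in for a statvfs result).
#[derive(Clone, Copy, Debug)]
struct FsUsage {
    total_bytes: u64,
    avail_bytes: u64,
}

/// The two thresholds, with the semantics described in the commit message.
struct Thresholds {
    max_usage_pct: u64,   // exceeded if used space is above this percentage of total
    min_avail_bytes: u64, // exceeded if available space is below this many bytes
}

impl Thresholds {
    fn exceeded(&self, u: &FsUsage) -> bool {
        let used = u.total_bytes.saturating_sub(u.avail_bytes);
        // used / total > pct / 100, compared in integers (u128 avoids overflow).
        let pct_exceeded = used as u128 * 100 > self.max_usage_pct as u128 * u.total_bytes as u128;
        pct_exceeded || u.avail_bytes < self.min_avail_bytes
    }
}

/// Hypothetical layer descriptor; a larger `last_access` means more recently used.
struct Layer {
    tenant: String,
    size: u64,
    last_access: u64,
}

/// Returns the indices of layers to evict, least recently used first.
fn plan_eviction(layers: &[Layer], min_resident_size: u64, mut usage: FsUsage, t: &Thresholds) -> Vec<usize> {
    // 1. Group layer indices by tenant.
    let mut by_tenant: HashMap<&str, Vec<usize>> = HashMap::new();
    for (i, l) in layers.iter().enumerate() {
        by_tenant.entry(l.tenant.as_str()).or_default().push(i);
    }
    // 2. Per tenant, reserve the most recent layers up to `min_resident_size` bytes;
    //    every older layer becomes an eviction candidate.
    let mut candidates = Vec::new();
    for idxs in by_tenant.values_mut() {
        idxs.sort_by_key(|&i| Reverse(layers[i].last_access));
        let mut reserved = 0u64;
        let mut reserving = true;
        for &i in idxs.iter() {
            if reserving && reserved + layers[i].size <= min_resident_size {
                reserved += layers[i].size;
            } else {
                reserving = false;
                candidates.push(i);
            }
        }
    }
    // 3. Evict candidates globally in LRU order until accounting says we are below all thresholds.
    candidates.sort_by_key(|&i| (layers[i].last_access, i));
    let mut evicted = Vec::new();
    for i in candidates {
        if !t.exceeded(&usage) {
            break;
        }
        usage.avail_bytes = (usage.avail_bytes + layers[i].size).min(usage.total_bytes);
        evicted.push(i);
    }
    evicted
}

fn main() {
    let layer = |tenant: &str, size: u64, last_access: u64| Layer { tenant: tenant.to_string(), size, last_access };
    let layers = vec![layer("a", 10, 1), layer("a", 10, 5), layer("b", 10, 2), layer("b", 10, 6)];
    let usage = FsUsage { total_bytes: 100, avail_bytes: 10 }; // 90% used
    let t = Thresholds { max_usage_pct: 85, min_avail_bytes: 0 };
    println!("{:?}", plan_eviction(&layers, 10, usage, &t)); // prints [0]
}
```

In the example, each tenant keeps its newest 10-byte layer resident, and evicting the single oldest candidate already brings usage down to 80%, below the 85% threshold. In the real loop, as the commit message explains, a second `statvfs` call would then confirm (or contradict) this internal accounting, and a warning would be logged if usage were still above a threshold.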


323 Commits
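Commit b067378d0d (#3806) says the `same_az` label of `safekeeper_pg_io_bytes_total` "can be derived from `client_az` and `sk_az`", and that the client's zone arrives in the connection options as `availability-zone=AZ-1`. A minimal sketch of that derivation follows. `parse_options` and `az_labels` are hypothetical helpers, not the safekeeper's actual functions, and the parsing assumes the simple `-c key=value key=value` shape shown in the commit message (no quoting or escaping).

```rust
use std::collections::HashMap;

/// Parses an options string such as "-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1"
/// into key/value pairs (hypothetical helper; assumes whitespace-separated key=value items).
fn parse_options(options: &str) -> HashMap<&str, &str> {
    let trimmed = options.trim();
    trimmed
        .strip_prefix("-c")
        .unwrap_or(trimmed)
        .split_whitespace()
        .filter_map(|kv| kv.split_once('='))
        .collect()
}

/// Returns (client_az, sk_az, same_az) label values; missing zones become "unknown",
/// and same_az is "unknown" unless both zones are known.
fn az_labels<'a>(client_az: Option<&'a str>, sk_az: Option<&'a str>) -> (&'a str, &'a str, &'static str) {
    let same_az = match (client_az, sk_az) {
        (Some(c), Some(s)) if c == s => "true",
        (Some(_), Some(_)) => "false",
        _ => "unknown",
    };
    (client_az.unwrap_or("unknown"), sk_az.unwrap_or("unknown"), same_az)
}

fn main() {
    let opts = parse_options("-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1");
    let client_az = opts.get("availability-zone").copied();
    println!("{:?}", az_labels(client_az, Some("AZ-1")));
    println!("{:?}", az_labels(None, Some("AZ-1")));
}
```

Because `same_az` is a pure function of the other two labels, it adds no new information; as the commit message notes, it exists only so that cross-AZ traffic can be filtered without comparing two label values in every query.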