rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Alexander Bayandin	218062ceba	GitHub Workflows: use ref_name instead of ref	2023-04-12 15:22:18 +01:00
Stas Kelvich	de99ee2c0d	Add more proxy cnames	2023-04-11 14:54:09 +03:00
Alexander Bayandin	c79d5a947c	Nightly Benchmarks: run third-party benchmarks once a week (#3987 )	2023-04-11 10:58:04 +01:00
Stas Kelvich	22c890b71c	Add more cnames to proxies	2023-04-11 01:55:25 +03:00
Alexander Bayandin	818e341af0	Nightly Benchmarks: replace neon-captest-prefetch with -new/-reuse (#3970 ) We have enabled prefetch by default, let's use this in Nightly Benchmarks: - effective_io_concurrency=100 by default (instead of 32) - maintenance_io_concurrency=100 by default (instead of 32) Rename `neon-captest-prefetch` to `neon-captest-new` (for pgbench with initialisation) and `neon-captest-reuse` (for OLAP scenarios)	2023-04-09 12:52:49 +01:00
Stas Kelvich	0bf70e113f	Add extra cnames to staging proxy	2023-04-07 19:18:19 +03:00
Dmitry Rodionov	b45c92e533	tests: exclude compatibility tests by default (#3975 ) This allows to skip compatibility tests based on `CHECK_ONDISK_DATA_COMPATIBILITY` environment variable. When the variable is missing (default) compatibility tests wont be run.	2023-04-06 21:21:39 +03:00
Alexander Bayandin	4d64edf8a5	Nightly Benchmarks: Add free tier sized compute (#3969 ) - Add support for VMs and CU - Add free tier limited benchmark (0.25 CU) - Ensure we use 1 CU by default for pgbench workload	2023-04-06 19:18:24 +03:00
Gleb Novikov	9db70f6232	Added disk_size and instance_type to payload (#3918 ) ## Describe your changes In https://github.com/neondatabase/cloud/issues/4354 we are making scheduling of projects based on available disk space and overcommit, so we need to know disk size and just in case instance type of the pageserver ## Issue ticket number and link https://github.com/neondatabase/cloud/issues/4354 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] ~If it is a core feature, I have added thorough tests.~ - [ ] ~Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?~ - [ ] ~If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.~	2023-04-06 14:02:56 +04:00
Alexander Bayandin	9310949b44	GitHub Autocomment: Retry on server errors (#3958 ) Retry posting/updating a comment in case of 5XX errors from GitHub API	2023-04-05 22:08:06 +03:00
Alexander Bayandin	1d23b5d1de	Comment PR with test results (#3907 ) This PR adds posting a comment with test results. Each workflow run updates the comment with new results. The layout and the information that we post can be changed to our needs, right now, it contains failed tests and test which changes status after rerun (i.e. flaky tests)	2023-04-04 12:22:47 +01:00
Alexander Bayandin	105b8bb9d3	test_runner: automatically rerun flaky tests (#3880 ) This PR adds a plugin that automatically reruns (up to 3 times) flaky tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and `pytest-rerunfailures` plugin. As the first approximation we consider the test flaky if it has failed on the main branch in the last 10 days. Flaky tests are fetched by `scripts/flaky_tests.py` script (it's possible to use it in a standalone mode to learn which tests are flaky), stored to a JSON file, and then the file is passed to the pytest plugin.	2023-04-04 12:21:54 +01:00
Christian Schwarz	45bf76eb05	enable layer eviction by default in prod (#3933 ) Leave disk_usage_based_eviction above the current max usage in prod (82%ish), so that deploying this commit won't trigger disk_usage_based_eviction. As indicated in the TODO, we'll decrease the value to 80% later. Also update the staging YAMLs to use the anchor syntax for `evictions_low_residence_duration_metric_threshold` like we do in the prod YAMLs as of this patch.	2023-04-03 14:57:36 +02:00
Joonas Koivunen	cf5cfe6d71	fix: metric used for alerting threshold on staging (#3932 ) This should remove the too eager alerts from staging.	2023-04-03 13:26:45 +03:00
Alexander Bayandin	75ffe34b17	check-macos-build: fix cache key (#3926 ) We don't have `${{ matrix.build_type }}` there, so it gets resolved to an empty substring and looks like this [`v1-macOS--pg-f8a650e49b06d39ad131b860117504044b01f312-dcccd010ff851b9f72bb451f28243fa3a341f07028034bbb46ea802413b36d80`](https://github.com/neondatabase/neon/actions/runs/4575422427/jobs/8078231907#step:26:2)	2023-03-31 21:45:59 +03:00
Dmitry Rodionov	22f9ea5fe2	Remind people to clean up merge commit message in PR template (#3920 )	2023-03-31 16:11:34 +03:00
Joonas Koivunen	d0711d0896	build: fix git perms for deploy job (#3921 ) copy pasted from `build-neon` job. it is interesting that this is only needed by `build-neon` and `deploy`. Fixes: https://github.com/neondatabase/neon/actions/runs/4568077915/jobs/8070960178 which seems to have been going for a while.	2023-03-31 16:05:15 +03:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Lassi Pölönen	1c1bb904ed	Rename zenith_* labels to neon_* (#3911 ) ## Describe your changes Get rid of the legacy labeling. Aslo `neon_region_slug` with the same value as `neon_region` doesn't make much sense, so just drop it. This allows us to drop the relabeling from zenith to neon in the log collector.	2023-03-30 16:24:47 +03:00
Kirill Bulatov	9d714a8413	Split $CARGO_FLAGS and $CARGO_FEATURES to make e2e tests work	2023-03-29 00:08:30 +03:00
Kirill Bulatov	6c84cbbb58	Run new Rust IT test in CI	2023-03-29 00:08:30 +03:00
Vadim Kharitonov	e3cbcc2ea7	Revert "Add `neondatabase/release` team as a default reviewers for storage" This reverts commit `daeaa767c4`.	2023-03-27 14:10:18 +04:00
Shany Pozin	809acb5fa9	Move neon-image-depot to a larger runner (#3860 ) ## Describe your changes https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059 ## Issue ticket number and link ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-21 19:32:36 +02:00
Arthur Petukhovsky	b067378d0d	Measure cross-AZ traffic in safekeepers (#3806 ) Create `safekeeper_pg_io_bytes_total` metric to track total amount of bytes written/read in a postgres connections to safekeepers. This metric has the following labels: - `client_az` – availability zone of the connection initiator, or `"unknown"` - `sk_az` – availability zone of the safekeeper, or `"unknown"` - `app_name` – `application_name` of the postgres client - `dir` – data direction, either `"read"` or `"write"` - `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from `client_az` and `sk_az`, exists purely for convenience. This is implemented by passing availability zone in the connection string, like this: `-c tenant_id=AAA timeline_id=BBB availability-zone=AZ-1`. Update ansible deployment scripts to add availability_zone argument to safekeeper and pageserver in systemd service files.	2023-03-16 17:24:01 +03:00
Rahul Patil	f1d960d2c2	Add new pageserver to eu-central-1 (#3829 ) ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-15 16:58:28 +01:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Lassi Pölönen	68ae020b37	Use RollingUpdate strategy also for legacy proxy (#3814 ) ## Describe your changes We have previously changed the neon-proxy to use RollingUpdate. This should be enabled in legacy proxy too in order to avoid breaking connections for the clients and allow for example backups to run even during deployment. (https://github.com/neondatabase/neon/pull/3683) ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-03-14 13:23:46 +00:00
Vadim Kharitonov	daeaa767c4	Add `neondatabase/release` team as a default reviewers for storage releases	2023-03-13 13:40:15 +01:00
Nikita Kalyanov	07dcf679de	set content type explicitly (#3799 ) I moved management API v2 to ogen and the generated code seems to be more strict about content type. Let's set it properly as it is json after all ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-13 14:00:01 +02:00
Rory de Zoete	3c4f5af1b9	Try depot.dev for image building (#3768 ) To see if it is faster. Run side-by-side for a while so we can gather enough data.	2023-03-10 11:11:39 +01:00
Sergey Melnikov	9f906ff236	Add pageserver-2.us-east-2.aws.neon.tech (#3701 )	2023-02-23 19:56:21 +01:00
Lassi Pölönen	b0311cfdeb	Change the production neon-proxy-scram update strategy to RollingUpdate (#3683 ) ## Describe your changes The same change in production as was done in staging by https://github.com/neondatabase/neon/pull/3678 ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-02-22 20:15:37 +02:00
Lassi Pölönen	965b4f4ae2	Change the staging neon-proxy-scram update strategy to RollingUpdate (#3678 ) ## Describe your changes When we deploy the proxy with the default Recreate strategy, there's always some downtime and existing connections will be shut down. Change the strategy to RollingUpdate and delay the kill signal by one week. AWS Network Loadbalancer keeps the existing connections alive for as long as the pods are alive, but will direct new connections to new pods. ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-02-22 16:50:07 +02:00
Arthur Petukhovsky	95018672fa	Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671 ) We migrated all timelines to `safekeeper-3.ap-southeast-1.aws.neon.tech`, now old instance can be removed.	2023-02-22 11:55:41 +02:00
Sergey Melnikov	2caece2077	Add -v to ansible invocations (#3670 ) To get more debug output on failures	2023-02-21 23:11:52 +03:00
Sergey Melnikov	e3d75879c0	Use fqdn to access console management API on production (#3651 ) console-release.local is legacy manual CNAME to neon-internal-api.aws.neon.tech in r53 We could use neon-internal-api.aws.neon.tech name directly This already was deployed to staging in https://github.com/neondatabase/neon/pull/3642	2023-02-20 18:11:06 +01:00
Sergey Melnikov	d5d690c044	Use fqdn for staging console management API (#3642 ) `console-staging.local` is legacy manual CNAME to `neon-internal-api.aws.neon.build` in r53 We could use `neon-internal-api.aws.neon.build` name directly	2023-02-20 16:05:21 +01:00
Arthur Petukhovsky	8f557477c6	Add new safekeeper to ap-southeast-1 prod (#3645 )	2023-02-20 17:51:27 +03:00
sharnoff	2153d2e00a	Run compute_ctl in a cgroup in VMs (#3577 )	2023-02-17 14:14:41 -08:00
Christian Schwarz	8d28a24b26	staging: enable automatic layer eviction at 20m threshold + period (#3636 ) What it says on the tin. Part of #2476	2023-02-17 18:32:01 +02:00
Sergey Melnikov	a1b062123b	Do not deploy storage to old account (#3630 ) It's gone	2023-02-16 20:28:53 +00:00
Sergey Melnikov	eb21d9969d	Add pageserver-3.us-west-2.aws.neon.tech (#3603 )	2023-02-14 12:56:03 +01:00
Rory de Zoete	1b9e5e84aa	Add new storage hosts for placement group test (#3561 ) To test the placement group setup	2023-02-08 16:48:29 +01:00
Sergey Melnikov	c5c14368e3	Fix deploy-prod.yml syntax (#3556 )	2023-02-07 15:27:31 +01:00
Sergey Melnikov	1254dc7ee2	Fix production deploy: run as root to access docker (#3555 )	2023-02-07 15:21:15 +01:00
Sergey Melnikov	959f5c6f40	Do not deploy legacy scram proxy (*.cloud.neon.tech) to the old account (#3546 ) We have migrated to the new proxy, which was setup in https://github.com/neondatabase/neon/pull/3461	2023-02-06 15:51:20 +01:00
Kirill Bulatov	f474495ba0	Publish builds stats that are easy to browse (#3514 ) Adds two new tags, `run-extra-build-macos` and `run-extra-build-stats` to trigger corresponding build jobs on any PR. On every build for `main` or PR with `run-extra-build-stats` tag, publish a GitHub commit status with the link to the `cargo build --all --release --timings` report.	2023-02-02 11:18:42 +02:00
Shany Pozin	bf1c36a30c	Moving the template file location (#3523 ) see https://github.com/appsmithorg/appsmith/issus/826#issuecomment-703093426 for details	2023-02-02 11:02:47 +02:00
Alexander Bayandin	567b71c1d2	Require poetry 1.3; regenerate poetry.lock (#3508 ) Ref https://python-poetry.org/blog/announcing-poetry-1.3.0/#new-lock-file-format	2023-02-01 18:11:00 +00:00
Sergey Melnikov	f3dadfb3d0	Confirm that there is an emergency before manual execution of prod deploy workflow (#3507 ) ![image](https://user-images.githubusercontent.com/7127190/215840037-69eda3ee-920e-4b90-bf7d-aa58f0bdfb50.png)	2023-02-01 16:01:27 +01:00

1 2 3 4 5 ...

335 Commits