## Problem
We want to regularly verify the performance of pgvector HNSW parallel
index builds and parallel similarity search using HNSW indexes.
The first release that considerably improved the index-build parallelism
was pgvector 0.7.0 and we want to make sure that we do not regress by
our neon compute VM settings (swap, memory over commit, pg conf etc.)
## Summary of changes
Prepare a Neon project with 1 million openAI vector embeddings (vector
size 1536).
Run HNSW indexing operations in the regression test for the various
distance metrics.
Run similarity queries using pgbench with 100 concurrent clients.
I have also added the relevant metrics to the grafana dashboards pgbench
and olape
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
We have no regression tests for websocket flow
## Summary of changes
Add a hacky implementation of the postgres protocol over websockets just
to verify the protocol behaviour does not regress over time.
## Problem
A set of small changes that are too small to open a separate for each.
A notable change is adding `pytest-repeat` plugin, which can help to
ensure that a flaky test is fixed by running such a test several times.
## Summary of changes
- Update Allure from 2.24.0 to 2.27.0
- Update Ruff from 0.1.11 to 0.2.2 (update `[tool.ruff]` section of
`pyproject.toml` for it)
- Install pytest-repeat plugin
## Problem
Proxy already supported HTTP2, but I expect no one is using it because
we don't advertise it in the TLS handshake.
## Summary of changes
#6335 without the websocket changes.
## Problem
`black` is slow sometimes, we can replace it with `ruff format` (a new
feature in 0.1.2 [0]), which produces pretty similar to black style [1].
On my local machine (MacBook M1 Pro 16GB):
```
# `black` on main
$ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ."
Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black .
Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s]
Range (min … max): 3.047 s … 3.354 s 10 runs
```
```
# `ruff format` on the current PR
$ hyperfine "RUFF_NO_CACHE=true poetry run ruff format"
Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format
Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms]
Range (min … max): 267.5 ms … 420.2 ms 10 runs
```
## Summary of changes
- Replace `black` with `ruff format` everywhere
- [0] https://docs.astral.sh/ruff/formatter/
- [1] https://docs.astral.sh/ruff/formatter/#black-compatibility
## Problem
The version of pytest we were using emits a number of
DeprecationWarnings on latest python: these are fixed in latest release.
boto3 and python-dateutil also have deprecation warnings, but
unfortunately these aren't fixed upstream yet.
## Summary of changes
- Update pytest
- Update boto3 (this doesn't fix deprecation warnings, but by the time I
figured that out I had already done the update, and it's good hygiene
anyway)
## Problem
While investigating https://github.com/neondatabase/neon/issues/5854, we
hypothesised that logs/repo-dir from the initial failure might leak into
reruns. Use different directories for each run to avoid such a
possibility.
## Summary of changes
- make each test rerun use different directories
- update `pytest-rerunfailure` plugin from 11.1.2 to 13.0
## Problem
- Because we compress artifacts file by file, we don't need to put them
into `tar` containers (ie instead of `tar.gz` we can use just `gz`).
- Pythons gz single-threaded and pretty slow.
A benchmark has shown ~20 times speedup (19.876176291 vs
0.8748335830000009) on my laptop (for a pageserver.log size is 1.3M)
## Summary of changes
- Replace tarfile with zstandart
- Update allure to 2.24.0
## Problem
Benchmarks run takes about an hour on main branch (in a single job),
which delays pipeline results. And it takes another hour if we want to
restart the job due to some failures.
## Summary of changes
- Use `pytest-split` plugin to run benchmarks on separate CI runners in
4 parallel jobs
- Add `scripts/benchmark_durations.py` for getting benchmark durations
from the database to help `pytest-split` schedule tests more evenly. It
uses p99 for the last 10 days' results (durations).
The current distribution could be better; each worker's durations vary
from 9m to 35m, but this could be improved in consequent PRs.
## Problem
1. The local endpoints provision 2 ports (postgres and HTTP) which means
the migration_check endpoint has a different port than what the setup
implies
2. psycopg2-binary 2.9.3 has a deprecated poetry config and doesn't
install.
## Summary of changes
Update psycopg2-binary and update the endpoint ports in the readme
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
`pytest` 6 truncates error messages and this is not configured.
It's fixed in `pytest` 7, it prints the whole message (truncating limit
is higher) if `--verbose` is set (it's set on CI).
## Summary of changes
- `pytest` and `pytest` plugins are updated to their latest versions
- linters (`black` and `ruff`) are updated to their latest versions
- `mypy` and types are updated to their latest versions, new warnings
are fixed
- while we're here, allure updated its latest version as well
This PR adds a plugin that automatically reruns (up to 3 times) flaky
tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and
`pytest-rerunfailures` plugin.
As the first approximation we consider the test flaky if it has failed on
the main branch in the last 10 days.
Flaky tests are fetched by `scripts/flaky_tests.py` script (it's
possible to use it in a standalone mode to learn which tests are flaky),
stored to a JSON file, and then the file is passed to the pytest plugin.
## Describe your changes
```
$ poetry add werkzeug@latest "moto[server]@latest"
Using version ^2.2.3 for werkzeug
Using version ^4.1.2 for moto
Updating dependencies
Resolving dependencies... (1.6s)
Writing lock file
Package operations: 0 installs, 2 updates, 1 removal
• Removing pytz (2022.1)
• Updating werkzeug (2.1.2 -> 2.2.3)
• Updating moto (3.1.18 -> 4.1.2)
```
Resolves:
- https://github.com/neondatabase/neon/security/dependabot/14
- https://github.com/neondatabase/neon/security/dependabot/13
`@dependabot` failed to create a PR for some reason (I guess because it
also needed to handle `moto` dependency)
## Issue ticket number and link
N/A
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
For use in production in case on-demand download turns out to be
problematic during tenant_attach, or when we eventually introduce layer
eviction.
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Add new background job to collect billing metrics for each tenant and
send them to the HTTP endpoint.
Metrics are cached, so we don't send non-changed metrics.
Add metric collection config parameters:
metric_collection_endpoint (default None, i.e. disabled)
metric_collection_interval (default 60s)
Add test_metric_collection.py to test metric collection
and sending to the mocked HTTP endpoint.
Use port distributor in metric_collection test
review fixes: only update cache after metrics were send successfully, simplify code
disable metric collection if metric_collection_endpoint is not provided in config
**NB**: this PR doesn't update Python to 3.11; it makes tests
compatible with it and fixes a couple of warnings by updating
dependencies.
- `poetry add asyncpg@latest` to fix `./scripts/pysync`
- `poetry add boto3@latest "boto3-stubs[s3]@latest"` to fix
```
DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13
```
- `poetry update certifi` to fix
```
DeprecationWarning: path is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice.
```
- Move `types-toml` from `dep-dependencies` to `dependencies` to keep it
aligned with other `types-*` deps
This change wraps the std::process:Child that we spawn for WAL redo
into a type that ensures that we try to SIGKILL + waitpid() on it.
If there is no explicit call to kill_and_wait(), the Drop implementation
will spawns a task that does it in the BACKGROUND_RUNTIME.
That's an ugly hack but I think it's better than doing kill+wait
synchronously from Drop, since I think the general assumption in the
Rust ecosystem is that Drop doesn't block.
Especially since the drop sites can be _any_ place that drops the last
Arc<PostgresRedoManager>, e.g., compaction or GC.
The benefit of having the new type over just adding a Drop impl to
PostgresRedoProcess is that we can construct it earlier than the full
PostgresRedoProcess in PostgresRedoProcess::launch().
That allows us to correctly kill+wait the child if there is an error in
PostgresRedoProcess::launch() after spawning it.
I also took a stab at a regression test. I manually verified
that it fails before the fix to walredo.rs.
fixes https://github.com/neondatabase/neon/issues/2761
closes https://github.com/neondatabase/neon/pull/2776
Add `test_forward_compatibility`, which checks if it's going to
be possible to roll back a release to the previous version.
The test uses artifacts (Neon & Postgres binaries) from the previous
release to start Neon on the repo created by the current version. It
performs exactly the same checks as `test_backward_compatibility` does.
Single `ALLOW_BREAKING_CHANGES` env var got replaced by
`ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` &
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE` and can be set by `backward
compatibility breakage` and `forward compatibility breakage` labels
respectively.
Added pytest to check correctness of the link authentication pipeline.
Context: this PR is the first step towards refactoring the link authentication pipeline to use https (instead of psql) to send the db info to the proxy. There was a test missing for this pipeline in this repo, so this PR adds that test as preparation for the actual change of psql -> https.
Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Stas Kelvic <stas@neon.tech>
Co-authored-by: Dimitrii Ivanov <dima@neon.tech>
Newer version of mypy fixes buggy error when trying to update only boto3 stubs.
However it brings new checks and starts to yell when we index into
cusror.fetchone without checking for None first. So this introduces a wrapper
to simplify quering for scalar values. I tried to use cursor_factory connection
argument but without success. There can be a better way to do that,
but this looks the simplest
The CI times out after 10 minutes of no output. It's annoying if a
test hangs and is killed by the CI timeout, because you don't get
information about which test was running. Try to avoid that, by adding
a slightly smaller timeout in pytest itself. You can override it on a
per-test basis if needed, but let's try to keep our tests shorter than
that.
For the Postgres regression tests, use a longer 30 minute timeout.
They're not really a single test, but many tests wrapped in a single
pytest test. It's OK for them to run longer in aggregate, each
Postgres test is still fairly short.
- Enabled process exporter for storage services
- Changed zenith_proxy prefix to just proxy
- Removed old `monitoring` directory
- Removed common prefix for metrics, now our common metrics have `libmetrics_` prefix, for example `libmetrics_serve_metrics_count`
- Added `test_metrics_normal_work`