- Split postgres_ffi into two version specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding.
- Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture.
To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.
Also get rid if `with_safekeepers` parameter in tests.
Its meaning has changed: `False` meant "no safekeepers" which is not
supported anymore, so we assume it's always `True`.
See #1648
Instead of spawning helper threads, we now use Tokio tasks. There
are multiple Tokio runtimes, for different kinds of tasks. One for
serving libpq client connections, another for background operations
like GC and compaction, and so on. That's not strictly required, we
could use just one runtime, but with this you can still get an
overview of what's happening with "top -H".
There's one subtle behavior in how TenantState is updated. Before this
patch, if you deleted all timelines from a tenant, its GC and
compaction loops were stopped, and the tenant went back to Idle
state. We no longer do that. The empty tenant stays Active. The
changes to test_tenant_tasks.py are related to that.
There's still plenty of synchronous code and blocking. For example, we
still use blocking std::io functions for all file I/O, and the
communication with WAL redo processes is still uses low-level unix
poll(). We might want to rewrite those later, but this will do for
now. The model is that local file I/O is considered to be fast enough
that blocking - and preventing other tasks running in the same thread -
is acceptable.
This caught or reproduced several bugs when I originally wrote this test
back in May, including #1731, #1740, #1751, and #707. I believe all the
issues have been fixed now, but since this was a very fruitful test,
let's add it to the test suite.
We didn't commit this earlier, because the test was very slow especially
with a debug build. We've since changed the build options so that even
the debug builds are not quite so slow anymore.
* Add test for pageserver metric cleanup once a tenant is detached.
* Remove tenant specific timeline metrics on detach.
* Use definitions from timeline_metrics in page service.
* Move metrics to own file from layered_repository/timeline.rs
* TIMELINE_METRICS: define smgr metrics
* REMOVE SMGR cleanup from timeline_metrics. Doesn't seem to work as
expected.
* Vritual file centralized metrics, except for evicted file as there's no
tenat id or timeline id.
* Use STORAGE_TIME from timeline_metrics in layered_repository.
* Remove timelineless gc metrics for tenant on detach.
* Rename timeline metrics -> metrics as it's more generic.
* Don't create a TimelineMetrics instance for VirtualFile
* Move the rest of the metric definitions to metrics.rs too.
* UUID -> ZTenantId
* Use consistent style for dict.
* Use Repository's Drop trait for dropping STORAGE_TIME metrics.
* No need for Arc, TimelineMetrics is used in just one place. Due to that,
we can fall back using ZTenantId and ZTimelineId too to avoid additional
string allocation.
* Add submodule postgres-15
* Support pg_15 in pgxn/neon
* Renamed zenith -> neon in Makefile
* fix name of codestyle check
* Refactor build system to prepare for building multiple Postgres versions.
Rename "vendor/postgres" to "vendor/postgres-v14"
Change Postgres build and install directory paths to be version-specific:
- tmp_install/build -> pg_install/build/14
- tmp_install/* -> pg_install/14/*
And Makefile targets:
- "make postgres" -> "make postgres-v14"
- "make postgres-headers" -> "make postgres-v14-headers"
- etc.
Add Makefile aliases:
- "make postgres" to build "postgres-v14" and in future, "postgres-v15"
- similarly for "make postgres-headers"
Fix POSTGRES_DISTRIB_DIR path in pytest scripts
* Make postgres version a variable in codestyle workflow
* Support vendor/postgres-v15 in codestyle check workflow
* Support postgres-v15 building in Makefile
* fix pg version in Dockerfile.compute-node
* fix kaniko path
* Build neon extensions in version-specific directories
* fix obsolete mentions of vendor/postgres
* use vendor/postgres-v14 in Dockerfile.compute-node.legacy
* Use PG_VERSION_NUM to gate dependencies in inmem_smgr.c
* Use versioned ECR repositories and image names for compute-node.
The image name format is compute-node-vXX, where XX is postgres major version number.
For now only v14 is supported.
Old format unversioned name (compute-node) is left, because cloud repo depends on it.
* update vendor/postgres submodule url (zenith->neondatabase rename)
* Fix postgres path in python tests after rebase
* fix path in regress test
* Use separate dockerfiles to build compute-node:
Dockerfile.compute-node-v15 should be identical to Dockerfile.compute-node-v14 except for the version number.
This is a hack, because Kaniko doesn't support build ARGs properly
* bump vendor/postgres-v14 and vendor/postgres-v15
* Don't use Kaniko cache for v14 and v15 compute-node images
* Build compute-node images for different versions in different jobs
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
* Update relation size cache only when latest LSN is requested
* Fix tests
* Add a test case for timetravel query after pageserver restart.
This test is currently failing, the queries return incorrect results.
I don't know why, needs to be investigated.
FAILED test_runner/batch_others/test_readonly_node.py::test_timetravel - assert 85 == 100000
If you remove the pageserver restart from the test, it passes.
* yapf3 test_readonly_node.py
* Add comment about cache correction in case of setting incorrect latest flag
* Fix formatting for test_readonly_node.py
* Remove unused imports
* Fix mypy warning for test_readonly_node.py
* Fix formatting of test_readonly_node.py
* Bump postgres version
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Start the calculation on the first size request, return
partially calculated size during calculation, retry if failed.
Remove "fast" size init through the ancestor: the current approach is
fast enough for now and there are better ways to optimize the
calculation via incremental ancestor size computation
For better ergonomics. I always found it weird that we used UUID to
actually mean a tenant or timeline ID. It worked because it happened
to have the same length, 16 bytes, but it was hacky.
Merge batch_others and batch_pg_regress. The original idea was to
split all the python tests into multiple "batches" and run each batch
in parallel as a separate CI job. However, the batch_pg_regress batch
was pretty short compared to all the tests in batch_others. We could
split batch_others into multiple batches, but it actually seems better
to just treat them as one big pool of tests and use pytest's handle
the parallelism on its own. If we need to split them across multiple
nodes in the future, we could use pytest-shard or something else,
instead of managing the batches ourselves.
Merge test_neon_regress.py, test_pg_regress.py and test_isolation.py
into one file, test_pg_regress.py. Seems more clear to group all
pg_regress-based tests into one file, now that they would all be in
the same directory.