feat(pageserver): store reldir in sparse keyspace (#10593)

## Problem

Part of https://github.com/neondatabase/neon/issues/9516

## Summary of changes

This patch adds support for storing reldir in the sparse keyspace.
All of the new logic is guarded by the `rel_size_v2_enabled` flag, so if it is
set to false, the code path is exactly the same as what is currently in
prod.

Note that this patch does not persist the `rel_size_v2_enabled` flag; the
logic around that will be implemented in the next patch. (For example: what if
we enable it, restart the pageserver, and then it gets set to false? We
should still read from v2, using the `rel_size_v2_migration_status` in the
index_part.) The persistence logic I'll implement in the next patch will
disallow switching from v2->v1 via the config item.
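
As a rough illustration of the intended follow-up behavior (not code from this patch), the decision could be sketched as below. The `MigrationStatus` values and the `use_reldir_v2` helper are hypothetical names invented for this sketch; only `rel_size_v2_enabled` and `rel_size_v2_migration_status` come from the description above.

```python
from enum import Enum


class MigrationStatus(Enum):
    """Hypothetical stand-in for the persisted rel_size_v2_migration_status."""

    LEGACY = "legacy"        # assumption: nothing has been written to v2 yet
    MIGRATING = "migrating"  # assumption: some reldir data already lives in v2
    MIGRATED = "migrated"    # assumption: all reldir data lives in v2


def use_reldir_v2(rel_size_v2_enabled: bool, status: MigrationStatus) -> bool:
    """Decide whether the v2 (sparse keyspace) path should be used."""
    # Once v2 data may exist, keep using v2 even if the config flag is
    # later set back to false: switching v2->v1 via config is disallowed.
    if status is not MigrationStatus.LEGACY:
        return True
    # Otherwise the config flag alone decides.
    return rel_size_v2_enabled
```

The point of the sketch is that the persisted status, not the config flag, wins once v2 has been used.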

I also refactored the metrics so that they work with the new reldir
store. However, this metric was not correctly computed for reldirs before
(see the comments). With the refactor, the value is computed only
when we have an initial value for the reldir size. The refactor keeps
the existing incorrectness of the computation when there is more than
one database.
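
The "computed only when we have an initial value" rule can be sketched as follows. This is a minimal illustration under assumptions, not the pageserver implementation: `DirectoryEntryCounter` and its methods are hypothetical names.

```python
from typing import Optional


class DirectoryEntryCounter:
    """Hypothetical counter that stays unset until an initial value is known."""

    def __init__(self) -> None:
        # None means "no initial value yet", so no metric can be reported.
        self._count: Optional[int] = None

    def set_initial(self, count: int) -> None:
        """Record the initial directory size, e.g. after reading the reldir."""
        self._count = count

    def add(self, delta: int) -> None:
        """Apply an incremental change; ignored until an initial value exists."""
        if self._count is not None:
            self._count += delta

    def get(self) -> Optional[int]:
        """Return the current count, or None if it was never initialized."""
        return self._count
```

Updates that arrive before `set_initial` are dropped rather than applied to a guessed baseline, so the reported value is either absent or based on a real starting point.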

For the tests, we currently run all the tests with v2. I'll set the flag
to false and add some v2-specific tests before merging, probably also
v1->v2 migration tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Alex Chi Z.
2025-02-14 15:31:54 -05:00
committed by GitHub
parent a32e8871ac
commit ae091c6913
8 changed files with 507 additions and 69 deletions

@@ -0,0 +1,68 @@
from __future__ import annotations

from fixtures.neon_fixtures import (
    NeonEnvBuilder,
)


def test_pageserver_reldir_v2(
    neon_env_builder: NeonEnvBuilder,
):
    env = neon_env_builder.init_start(
        initial_tenant_conf={
            "rel_size_v2_enabled": "false",
        }
    )

    endpoint = env.endpoints.create_start("main")
    # Create a relation in v1
    endpoint.safe_psql("CREATE TABLE foo1 (id INTEGER PRIMARY KEY, val text)")
    endpoint.safe_psql("CREATE TABLE foo2 (id INTEGER PRIMARY KEY, val text)")

    # Switch to v2
    env.pageserver.http_client().update_tenant_config(
        env.initial_tenant,
        {
            "rel_size_v2_enabled": True,
        },
    )

    # Check if both relations are still accessible
    endpoint.safe_psql("SELECT * FROM foo1")
    endpoint.safe_psql("SELECT * FROM foo2")

    # Restart the endpoint
    endpoint.stop()
    endpoint.start()

    # Check if both relations are still accessible again after restart
    endpoint.safe_psql("SELECT * FROM foo1")
    endpoint.safe_psql("SELECT * FROM foo2")

    # Create a relation in v2
    endpoint.safe_psql("CREATE TABLE foo3 (id INTEGER PRIMARY KEY, val text)")
    # Delete a relation in v1
    endpoint.safe_psql("DROP TABLE foo1")

    # Check if both relations are still accessible
    endpoint.safe_psql("SELECT * FROM foo2")
    endpoint.safe_psql("SELECT * FROM foo3")

    # Restart the endpoint
    endpoint.stop()
    # This will acquire a basebackup, which lists all relations.
    endpoint.start()

    # Check if both relations are still accessible
    endpoint.safe_psql("DROP TABLE IF EXISTS foo1")
    endpoint.safe_psql("SELECT * FROM foo2")
    endpoint.safe_psql("SELECT * FROM foo3")

    endpoint.safe_psql("DROP TABLE foo3")
    endpoint.stop()
    endpoint.start()

    # Check if relations are still accessible
    endpoint.safe_psql("DROP TABLE IF EXISTS foo1")
    endpoint.safe_psql("SELECT * FROM foo2")
    endpoint.safe_psql("DROP TABLE IF EXISTS foo3")

@@ -481,7 +481,8 @@ def test_pageserver_metrics_many_relations(neon_env_builder: NeonEnvBuilder):
     counts = timeline_detail["directory_entries_counts"]
     assert counts
     log.info(f"directory counts: {counts}")
-    assert counts[2] > COUNT_AT_LEAST_EXPECTED
+    # We need to add up reldir v1 + v2 counts
+    assert counts[2] + counts[7] > COUNT_AT_LEAST_EXPECTED


 def test_timelines_parallel_endpoints(neon_simple_env: NeonEnv):