Commit Graph

6442 Commits

Author SHA1 Message Date
Conrad Ludgate
897cffb9d8 auth_broker: fix local_proxy conn count (#9593)
our current metrics for http pool opened connections is always negative
:D oops
2024-10-31 14:57:55 +00:00
John Spray
552088ac16 pageserver: fix spurious error logs in timeline lifecycle (#9589)
## Problem

The final part of https://github.com/neondatabase/neon/issues/9543 will
be a chaos test that creates/deletes/archives/offloads timelines while
restarting pageservers and migrating tenants. Developing that test
showed up a few places where we log errors during normal shutdown.

## Summary of changes

- UninitializedTimeline's drop should log at info severity: this is a
normal code path when some part of timeline creation encounters a
cancellation `?` path.
- When offloading and finding a `RemoteTimelineClient` in a
non-initialized state, this is not an error and should not be logged as
such.
- The `offload_timeline` function returned an anyhow error, so callers
couldn't gracefully pick out cancellation errors from real errors:
update this to have a structured error type and use it throughout.
2024-10-31 14:44:59 +00:00
Peter Bendel
51fda118f6 increase lifetime of AWS session token to 12 hours (#9590)
## Problem

clickbench regression causes clickbench to run >9 hours and the AWS
session token is expired before the run completes

## Summary of changes

extend lifetime of session token for this job to 12 hours
2024-10-31 13:34:50 +00:00
Anastasia Lubennikova
e96398a552 Add support of extensions for v17 (part 4) (#9568)
- pg_jsonschema 0.3.3
- pg_graphql 1.5.9
- rum 65e0a752
- pg_tiktoken a5bc447e

update support of extensions for v14-v16:
- pg_jsonschema 0.3.1 -> 0.3.3
- pg_graphql 1.5.7 -> 1.5.9
- rum 6ab37053 -> 65e0a752
- pg_tiktoken e64e55aa -> a5bc447e
2024-10-31 15:05:24 +02:00
Erik Grinaker
f9d8256d55 pageserver: don't return option from DeletionQueue::new (#9588)
`DeletionQueue::new()` always returns deletion workers, so the returned
`Option` is redundant.
2024-10-31 10:51:58 +00:00
Vlad Lazar
411c3aa0d6 pageserver: lift decoding and interpreting of wal into wal_decoder (#9524)
## Problem

Decoding and ingestion are still coupled in `pageserver::WalIngest`.

## Summary of changes

A new type is added to `wal_decoder::models`, InterpretedWalRecord. This
type contains everything that the pageserver requires in order to ingest
a WAL record. The highlights are the `metadata_record` which is an
optional special record type to be handled and `blocks` which stores
key, value pairs to be persisted to storage.

This type is produced by
`wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG
wal record.

The rest of this commit separates decoding and interpretation of the PG
WAL record from its application in `WalIngest::ingest_record`.

Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
2024-10-31 10:47:43 +00:00
Arpad Müller
65b69392ea Disallow offloaded children during timeline deletion (#9582)
If we delete a timeline that has childen, those children will have their
data corrupted. Therefore, extend the already existing safety check to
offloaded timelines as well.

Part of #8088
2024-10-30 19:37:09 +01:00
Alex Chi Z.
8d70f88b37 refactor(pageserver): use JSON field encoding for consumption metrics cache (#9470)
In https://github.com/neondatabase/neon/issues/9032, I would like to
eventually add a `generation` field to the consumption metrics cache.
The current encoding is not backward compatible and it is hard to add
another field into the cache. Therefore, this patch refactors the format
to store "field -> value", and it's easier to maintain backward/forward
compatibility with this new format.

## Summary of changes

* Add `NewRawMetric` as the new format.
* Add upgrade path. When opening the disk cache, the codepath first
inspects the `version` field, and decide how to decode.
* Refactor metrics generation code and tests.
* Add tests on upgrade / compatibility with the old format.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-30 18:13:11 +00:00
Arpad Müller
bcfe013094 Don't keep around the timeline's remote_client (#9583)
Constructing a remote client is no big deal. Yes, it means an extra
download from S3 but it's not that expensive. This simplifies code paths
and scenarios to test. This unifies timelines that have been recently
offloaded with timelines that have been offloaded in an earlier
invocation of the process.

Part of #8088
2024-10-30 18:44:29 +01:00
Arpad Müller
d0a02f3649 Disallow archived timelines to be detached or reparented (#9578)
Disallow a request for timeline ancestor detach if either the to be
detached timeline, or any of the to be reparented timelines are
offloaded or archived.

In theory we could support timelines that are archived but not
offloaded, but archived timelines are at the risk of being offloaded, so
we treat them like offloaded timelines. As for offloaded timelines, any
code to "support" them would amount to unoffloading them, at which point
we can just demand to have the timelines be unarchived.

Part of #8088
2024-10-30 17:04:57 +01:00
Tristan Partin
8af9412eb2 Collect compute backpressure throttling time
This will tell us how much time the compute has spent throttled if
pageserver/safekeeper cannot keep up with WAL generation.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-30 09:58:29 -05:00
Erik Grinaker
96e35e11a6 postgres_ffi: add WAL generator for tests/benchmarks (#9503)
## Problem

We don't have a convenient way to generate WAL records for benchmarks
and tests.

## Summary of changes

Adds a WAL generator, exposed as an iterator. It currently only
generates logical messages (noops), but will be extended to write actual
table rows later.

Some existing code for WAL generation has been replaced with this
generator, to reduce duplication.
2024-10-30 14:46:39 +03:00
Alexey Kondratov
745061ddf8 chore(compute): Bump pg_mooncake to the latest version (#9576)
## Problem

There were some critical breaking changes made in the upstream since Oct
29th morning.

## Summary of changes

Point it to the topmost commit in the `neon` branch at the time of
writing this
https://github.com/Mooncake-Labs/pg_mooncake/commits/neon/
c495cd17d6
2024-10-30 11:07:02 +01:00
Tristan Partin
0c828c57e2 Remove non-gzipped basebackup code path
In July of 2023, Bojan and Chi authored
92aee7e07f. Our in production pageservers
are most definitely at a version where they all support gzipped
basebackups.
2024-10-29 23:03:45 -05:00
John Spray
8e2e9f0fed pageserver: generation-aware storage for TenantManifest (#9555)
## Problem

When tenant manifest objects are written without a generation suffix,
concurrently attached pageservers may stamp on each others writes of the
manifest and cause undefined behavior.

Closes: #9543 

## Summary of changes

- Use download_generation_object helper when reading manifests, to
search for the most recent generation
- Use Tenant::generation as the generation suffix when writing
manifests.
2024-10-29 23:24:04 +01:00
Tristan Partin
b77b9bdc9f Add tests for sql-exporter metrics
Should help us keep non-working metrics from hitting staging or
production.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Fixes: https://github.com/neondatabase/neon/issues/8569
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-29 15:13:06 -05:00
Alex Chi Z.
81f9aba005 fix(pagectl): layer parsing and image layer dump (#9571)
This patch contains various improvements for the pagectl tool.

## Summary of changes

* Rewrite layer name parsing: LayerName now supports all variants we use
now.
* Drop pagectl's own layer parsing function, use LayerName in the
pageserver crate.
* Support image layer dumping in the layer dump command using
ImageLayer::dump, drop the original implementation.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-29 15:16:23 -04:00
Alex Chi Z.
88ff8a7803 feat(pageserver): support partial gc-compaction for lowest retain lsn (#9134)
part of https://github.com/neondatabase/neon/issues/8921,
https://github.com/neondatabase/neon/issues/9114

## Summary of changes

We start the partial compaction implementation with the image layer
partial generation. The partial compaction API now takes a key range. We
will only generate images for that key range for now, and remove layers
fully included in the key range after compaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-10-29 18:25:32 +00:00
Konstantin Knizhnik
0c075fab3a Add --replica parameter to basebackup (#9553)
## Problem

See https://github.com/neondatabase/neon/pull/9458
This PR separates PS related changes in #9458 from compute_ctl changes
to enforce that PS is deployed before compute.

## Summary of changes

This PR adds handlings of `--replica` parameters of backebackup to page
server.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-29 18:40:10 +02:00
Anastasia Lubennikova
80e1630042 Use pg_mooncake from our fork. (#9565)
Switch to main repo once
https://github.com/Mooncake-Labs/pg_mooncake/pull/3 is merged
2024-10-29 15:57:52 +00:00
Jakub Kołodziejczak
57499640c5 proxy: more granular http status codes for sql-over-http errors (#9549)
closes #9532
2024-10-29 15:44:45 +00:00
Anastasia Lubennikova
793ad50b7d fix allow_unstable_extensions GUC - make it USERSET (#9563)
fix message wording
2024-10-29 14:25:23 +00:00
John Spray
7a1331eee5 pageserver: make concurrent offloaded timeline operations safe wrt manifest uploads (#9557)
## Problem

Uploads of the tenant manifest could race between different tasks,
resulting in unexpected results in remote storage.

Closes: https://github.com/neondatabase/neon/issues/9556

## Summary of changes

- Create a central function for uploads that takes a tokio::sync::Mutex
- Store the latest upload in that Mutex, so that when there is lots of
concurrency (e.g. archive 20 timelines at once) we can coalesce their
manifest writes somewhat.
2024-10-29 13:54:48 +00:00
John Spray
4ef74215e1 pageserver: refactor generation-aware loading code into generic (#9545)
## Problem

Indices used to be the only kind of object where we had to search across
generations to find the most recent one. As of
https://github.com/neondatabase/neon/issues/9543, manifests will need
the same treatment.

## Summary of changes

- Refactor download_index_part to a generic download_generation_object
function, which will be usable for downloading manifest objects as well.
2024-10-29 13:00:03 +00:00
Conrad Ludgate
d4cbc8cfeb [auth_broker]: regress test (#9541)
python based regression test setup for auth_broker. This uses a http
mock for cplane as well as the JWKs url.

complications:
1. We cannot just use local_proxy binary, as that requires the
pg_session_jwt extension which we don't have available in the current
test suite
2. We cannot use just any old http mock for local_proxy, as auth_broker
requires http2 to local_proxy

as such, I used the h2 library to implement an echo server - copied from
the examples in the h2 docs.
2024-10-29 11:39:09 +00:00
Conrad Ludgate
47c35f67c3 [proxy]: fix JWT handling for AWS cognito. (#9536)
In the base64 payload of an aws cognito jwt, I saw the following:

```
"iss":"https:\/\/cognito-idp.us-west-2.amazonaws.com\/us-west-2_redacted"
```

issuers are supposed to be URLs, and URLs are always valid un-escaped
JSON. However, `\/` is a valid escape character so what AWS is doing is
technically correct... sigh...

This PR refactors the test suite and adds a new regression test for
cognito.
2024-10-29 11:01:09 +00:00
Peter Bendel
45b558f480 temporarily increase timeout for clickbench benchmark until regression is resolved (#9554)
## Problem

click bench job in benchmarking workflow has a performance regression
causing it to run in timeout of max job run.

Suspected root cause:
Project has been migrated from single pageserver to storage controller
managed project on Oct 14th.
Since then the regression shows.

## Summary of changes

Increase timeout of pytest to 12 hours.
Increase job timeout to 12 hours
2024-10-29 10:53:28 +00:00
Arpad Müller
a73402e646 Offloaded timeline deletion (#9519)
As pointed out in
https://github.com/neondatabase/neon/pull/9489#discussion_r1814699683 ,
we currently didn't support deletion for offloaded timelines after the
timeline has been loaded from the manifest instead of having been
offloaded.

This was because the upload queue hasn't been initialized yet. This PR
thus initializes the timeline and shuts it down immediately.

Part of #8088
2024-10-29 10:41:53 +00:00
Vlad Lazar
07b974480c pageserver: move things around to prepare for decoding logic (#9504)
## Problem

We wish to have high level WAL decoding logic in `wal_decoder::decoder`
module.

## Summary of Changes

For this we need the `Value` and `NeonWalRecord` types accessible there, so:
1. Move `Value` and `NeonWalRecord` to `pageserver::value` and
`pageserver::record` respectively.
2. Get rid of `pageserver::repository` (follow up from (1))
3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In
theory they could live in `wal_decoder`, but it would create a circular
dependency between `wal_decoder` and `postgres_ffi`. Long term it makes
sense for those types to be PG version specific, so that will work out nicely.
4. Move higher level WAL record types (to be ingested by pageserver)
into `wal_decoder::models`

Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
2024-10-29 10:00:34 +00:00
Arpad Müller
62f5d484d9 Assert the tenant to be active in unoffload_timeline (#9539)
Currently, all callers of `unoffload_timeline` ensure that the tenant
the unoffload operation is called on is active. We rely on it being
active as we activate the timeline below and don't want to race with the
activation code of the tenant (in the worst case, activating a timeline
twice).

Therefore, add this assertion.

Part of #8088
2024-10-29 00:36:05 +00:00
Tristan Partin
4df3987054 Get role name when not a C string
We will only have a C string if the specified role is a string.
Otherwise, we need to resolve references to public, current_role,
current_user, and session_user.

Fixes: https://github.com/neondatabase/cloud/issues/19323
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-28 18:21:45 -05:00
Konstantin Knizhnik
0624565617 Create the notion of unstable extensions
As a DBaaS provider, Neon needs to provide a stable platform for
customers to build applications upon. At the same time however, we also
need to enable customers to use the latest and greatest technology, so
they can prototype their work, and we can solicit feedback. If all
extensions are treated the same in terms of stability, it is hard to
meet that goal.

There are now two new GUCs created by the Neon extension:

neon.allow_unstable_extensions: This is a session GUC which allows
a session to install and load unstable extensions.

neon.unstable_extensions: This is a comma-separated list of extension
names. We can check if a CREATE EXTENSION statement is attempting to
install an unstable extension, and if so, deny the request if
neon.allow_unstable_extensions is not set to true.

Signed-off-by: Tristan Partin <tristan@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-28 17:47:15 -05:00
George MacKerron
7d5f6b6a52 Build pgrag extensions x3 (#8486)
Build the pgrag extensions (rag, rag_bge_small_en_v15, and
rag_jina_reranker_v1_tiny_en) as part of the compute node Dockerfile.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-10-28 20:06:36 +00:00
Alex Chi Z.
f7c61e856f fix(pageserver): bump tokio-epoll-uring (#9546)
Includes https://github.com/neondatabase/tokio-epoll-uring/pull/58 that
fixes the clippy error.

## Summary of changes

Update the version of tokio-epoll-uring

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-28 20:03:02 +00:00
Alex Chi Z.
57c21aff9f refactor(pageserver): remove aux v1 configs (#9494)
## Problem

Part of https://github.com/neondatabase/neon/issues/8623

## Summary of changes

Removed all aux-v1 config processing code. Note that we persisted it
into the index part file, so we cannot really remove the field from
index part. I also kept the config item within the tenant config, but we
will not read it any more.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-28 19:51:14 +00:00
Erik Grinaker
248558dee8 safekeeper: refactor WalAcceptor to be event-driven (#9462)
## Problem

The `WalAcceptor` main loop currently uses two nested loops to consume
inbound messages. This makes it hard to slot in periodic events like
metrics collection. It also duplicates the event processing code, and assumes
all messages in steady state are AppendRequests (other messages types may
be dropped if following an AppendRequest).

## Summary of changes

Refactor the `WalAcceptor` loop to be event driven.
2024-10-28 17:18:37 +00:00
Sergey Melnikov
3bad52543f We don't have legacy proxies anymore (#9544)
We don't have legacy scram proxies anymore:
cc: https://github.com/neondatabase/cloud/issues/9745
2024-10-28 16:42:35 +00:00
Tristan Partin
3d64a7ddcd Add pg_mooncake to compute-node.Dockerfile
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-28 11:23:30 -05:00
Conrad Ludgate
25f1e5cfeb [proxy] demote warnings and remove dead-argument (#9512)
fixes https://github.com/neondatabase/cloud/issues/19000
2024-10-28 15:02:20 +00:00
Rahul Patil
8dd555d396 ci(proxy): Update GH action flag on proxy deployment (#9535)
## Problem

Based on a recent proxy deployment issue, we deployed another proxy
version (proxy-scram), which was not needed when deploying a specific
proxy type. we have
[PR](https://github.com/neondatabase/infra/pull/2142) to update on the
infra branch and need to update CI in this repo which triggers proxy
deployment.

## Summary of changes

- Update proxy deployment flag 

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-10-28 13:17:09 +01:00
Arthur Petukhovsky
01b6843e12 Route pgbouncer logs to virtio-serial (#9488)
virtio-serial is much more performant than /dev/console emulation,
therefore, is much more suitable for the verbose logs inside vm. This
commit changes routing for pgbouncer logs, since we've recently noticed
it can emit large volumes of logs.

Manually tested on staging by pinning a compute image to my test
project.

Should help with https://github.com/neondatabase/cloud/issues/19072
2024-10-28 12:09:47 +00:00
John Spray
93987b5a4a tests: add test_storage_controller_onboard_detached (#9431)
## Problem

We haven't historically taken this API route where we would onboard a
tenant to the controller in detached state. It worked, but we didn't
have test coverage.

## Summary of changes

- Add a test that onboards a tenant to the storage controller in
Detached mode, and checks that deleting it without attaching it works as
expected.
2024-10-28 11:11:12 +00:00
John Spray
33baca07b6 storcon: add an API to cancel ongoing reconciler (#9520)
## Problem

If something goes wrong with a live migration, we currently only have
awkward ways to interrupt that:
- Restart the storage controller
- Ask it to do some other modification/migration on the shard, which we
don't really want.

## Summary of changes

- Add a new `/cancel` control API, and storcon_cli wrapper for it, which
fires the Reconciler's cancellation token. This is just for on-call use
and we do not expect it to be used by any other services.
2024-10-28 09:26:01 +00:00
John Spray
923974d4da safekeeper: don't un-evict timelines during snapshot API handler (#9428)
## Problem

When we use pull_timeline API on an evicted timeline, it gets downloaded
to serve the snapshot API request. That means that to evacuate all the
timelines from a node, the node needs enough disk space to download
partial segments from all timelines, which may not be physically the
case.

Closes: #8833 

## Summary of changes

- Add a "try" variant of acquiring a residence guard, that returns None
if the timeline is offloaded
- During snapshot API handler, take a different code path if the
timeline isn't resident, where we just read the checkpoint and don't try
to read any segments.
2024-10-28 08:47:12 +00:00
Arpad Müller
e7277885b3 Don't consider archived timelines for synthetic size calculation (#9497)
Archived timelines should not count towards synthetic size.

Closes #9384.

Part of #8088.
2024-10-26 13:27:57 +00:00
dependabot[bot]
80262e724f build(deps): bump werkzeug from 3.0.3 to 3.0.6 (#9527) 2024-10-26 08:24:15 +01:00
Yuchen Liang
85b954f449 pageserver: add tokio-epoll-uring slots waiters queue depth metrics (#9482)
In complement to
https://github.com/neondatabase/tokio-epoll-uring/pull/56.

## Problem

We want to make tokio-epoll-uring slots waiters queue depth observable
via Prometheus.

## Summary of changes

- Add `pageserver_tokio_epoll_uring_slots_submission_queue_depth`
metrics as a `Histogram`.
- Each thread-local tokio-epoll-uring system is given a `LocalHistogram`
to observe the metrics.
- Keep a list of `Arc<ThreadLocalMetrics>` used on-demand to flush data
to the shared histogram.
- Extend `Collector::collect` to report
`pageserver_tokio_epoll_uring_slots_submission_queue_depth`.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-10-25 21:30:57 +01:00
Arpad Müller
76328ada05 Fix unoffload_timeline races with creation (#9525)
This PR does two things:

1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This
prevents two unoffload tasks from racing with each other. While they
already obtain locks for `timelines` and `offloaded_timelines`, they
aren't sufficient, as we have already constructed an entire timeline at
that point. We shouldn't ever have two `Timeline` objects in the same
process at the same time.
2. don't allow timeline creations for timelines that have been
offloaded. Obviously they already exist, so we should not allow
creation. the previous logic only looked at the timelines list.

Part of #8088
2024-10-25 20:06:27 +00:00
Erik Grinaker
b54b632c6a safekeeper: don't pass conf into storage constructors (#9523)
## Problem

The storage components take an entire `SafekeeperConf` during
construction, but only actually use the `no_sync` field. This makes it
hard to understand the storage inputs (which fields do they actually
care about?), and is also inconvenient for tests and benchmarks that
need to set up a lot of unnecessary boilerplate.

## Summary of changes

* Don't take the entire config, but pass in the `no_sync` field
explicitly.
* Take the timeline dir instead of `ttid` as an input, since it's the
only thing it cares about.
* Fix a couple of tests to not leak tempdirs.
* Various minor tweaks.
2024-10-25 18:19:52 +01:00
Erik Grinaker
9909551f47 safekeeper: fix version in TimelinePersistentState::empty() (#9521)
## Problem

The Postgres version in `TimelinePersistentState::empty()` is incorrect:
the major version should be multiplied by 10000.

## Summary of changes

Multiply the version by 10000.
2024-10-25 16:22:35 +01:00