Compare commits

...

347 Commits

Author SHA1 Message Date
Konstantin Knizhnik
d40272f783 Prevent LFC overflow by skipping write when not unpinned pages are available 2025-01-20 18:19:06 +02:00
Christian Schwarz
02fc58b878 impr(timeline handles): add more tests covering reference cyle (#10446)
The other test focus on the external interface usage while the tests
added in this PR add some testing around HandleInner's lifecycle,
ensuring we don't leak it once either connection gets dropped or
per-timeline-state is shut down explicitly.
2025-01-20 14:37:24 +00:00
Arpad Müller
b312a3c320 Move DeleteTimelineFlow::prepare to separate function and use enum (#10334)
It was requested by review in #10305 to use an enum or something like it
for distinguishing the different modes instead of two parameters,
because two flags allow four combinations, and two of them don't really
make sense/ aren't used.

follow-up of #10305
2025-01-20 12:50:44 +00:00
John Spray
7d761a9d22 storage controller: make chaos less disruptive to AZ locality (#10438)
## Problem

Since #9916 , the chaos code is actively fighting the optimizer: tenants
tend to be attached in their preferred AZ, so most chaos migrations were
moving them to a non-preferred AZ.

## Summary of changes

- When picking migrations, prefer to migrate things _toward_ their
preferred AZ when possible. Then pick shards to move the other way when
necessary.

The resulting behavior should be an alternating "back and forth" where
the chaos code migrates thiings away from home, and then migrates them
back on the next iteration.

The side effect will be that the chaos code actively helps to push
things into their home AZ. That's not contrary to its purpose though: we
mainly just want it to continuously migrate things to exercise
migration+notification code.
2025-01-20 09:47:23 +00:00
John Spray
8bdaee35f3 pageserver: safety checks on validity of uploaded indices (#10403)
## Problem

Occasionally, we encounter bugs in test environments that can be
detected at the point of uploading an index, but we proceed to upload it
anyway and leave a tenant in a broken state that's awkward to handle.

## Summary of changes

- Validate index when submitting it for upload, so that we can see the
issue quickly e.g. in an API invoking compaction
- Validate index before executing the upload, so that we have a hard
enforcement that any code path that tries to upload an index will not
overwrite a valid index with an invalid one.
2025-01-20 09:20:31 +00:00
Arpad Müller
b0f34099f9 Add safekeeper utilization endpoint (#10429)
Add an endpoint to obtain the utilization of a safekeeper. Future
changes to the storage controller can use this endpoint to find the most
suitable safekeepers for newly created timelines, analogously to how
it's done for pageservers already.

Initially we just want to assign by timeline count, then we can iterate
from there.

Part of https://github.com/neondatabase/neon/issues/9011
2025-01-17 21:43:52 +00:00
Vlad Lazar
6975228a76 pageserver: add initdb metrics (#10434)
## Problem

Initdb observability is poor.

## Summary of changes

Add some metrics so we can figure out which part, if any, is slow.

Closes https://github.com/neondatabase/neon/issues/10423
2025-01-17 14:51:33 +00:00
JC Grünhage
053abff71f Fix dependency on neon-image in promote-images-dev (#10437)
## Problem
871e8b325f failed CI on main because a job
ran to soon. This was caused by
ea84ec357f. While `promote-images-dev`
does not inherently need `neon-image`, a few jobs depending on
`promote-images-dev` do need it, and previously had it when it was
`promote-images`, which depended on `test-images`, which in turn
depended on `neon-image`.

## Summary of changes
To ensure jobs depending `docker.io/neondatabase/neon` images get them,
`promote-images-dev` gets the dependency to `neon-image` back which it
previously had transitively through `test-images`.
2025-01-17 14:21:30 +00:00
Tristan Partin
871e8b325f Use the request ID given by the control plane in compute_ctl (#10418)
Instead of generating our own request ID, we can just use the one
provided by the control plane. In the event, we get a request from a
client which doesn't set X-Request-ID, then we just generate one which
is useful for tracing purposes.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-16 20:46:53 +00:00
Christian Schwarz
c47c5f4ace fix(page_service pipelining): tenant cannot shut down because gate kept open while flushing responses (#10386)
# Refs

- fixes https://github.com/neondatabase/neon/issues/10309
- fixup of batching design, first introduced in
https://github.com/neondatabase/neon/pull/9851
- refinement of https://github.com/neondatabase/neon/pull/8339

# Problem

`Tenant::shutdown` was occasionally taking many minutes (sometimes up to
20) in staging and prod if the
`page_service_pipelining.mode="concurrent-futures"` is enabled.

# Symptoms

The issue happens during shard migration between pageservers.
There is page_service unavailability and hence effectively downtime for
customers in the following case:
1. The source (state `AttachedStale`) gets stuck in `Tenant::shutdown`,
waiting for the gate to close.
2. Cplane/Storcon decides to transition the target `AttachedMulti` to
`AttachedSingle`.
3. That transition comes with a bump of the generation number, causing
the `PUT .../location_config` endpoint to do a full `Tenant::shutdown` /
`Tenant::attach` cycle for the target location.
4. That `Tenant::shutdown` on the target gets stuck, waiting for the
gate to close.
5. Eventually the gate closes (`close completed`), correlating with a
`page_service` connection handler logging that it's exiting because of a
network error (`Connection reset by peer` or `Broken pipe`).

While in (4):
- `Tenant::shutdown` is stuck waiting for all `Timeline::shutdown` calls
to complete.
  So, really, this is a `Timeline::shutdown` bug.
- retries from Cplane/Storcon to complete above state transitions, fail
with errors related to the tenant mgr slot being in state
`TenantSlot::InProgress`, the tenant state being
`TenantState::Stopping`, and the timelines being in
`TimelineState::Stopping`, and the `Timeline::cancel` being cancelled.
- Existing (and/or new?) page_service connections log errors `error
reading relation or page version: Not found: Timed out waiting 30s for
tenant active state. Latest state: None`

# Root-Cause

After a lengthy investigation ([internal
write-up](https://www.notion.so/neondatabase/2025-01-09-batching-deadlock-Slow-Log-Analysis-in-Staging-176f189e00478050bc21c1a072157ca4?pvs=4))
I arrived at the following root cause.

The `spsc_fold` channel (`batch_tx`/`batch_rx`) that connects the
Batcher and Executor stages of the pipelined mode was storing a `Handle`
and thus `GateGuard` of the Timeline that was not shutting down.
The design assumption with pipelining was that this would always be a
short transient state.
However, that was incorrect: the Executor was stuck on writing/flushing
an earlier response into the connection to the client, i.e., socket
write being slow because of TCP backpressure.

The probable scenario of how we end up in that case:
1. Compute backend process sends a continuous stream of getpage prefetch
requests into the connection, but never reads the responses (why this
happens: see Appendix section).
2. Batch N is processed by Batcher and Executor, up to the point where
Executor starts flushing the response.
3. Batch N+1 is procssed by Batcher and queued in the `spsc_fold`.
4. Executor is still waiting for batch N flush to finish.
5. Batcher eventually hits the `TimeoutReader` error (10min).
From here on it waits on the
`spsc_fold.send(Err(QueryError(TimeoutReader_error)))`
which will never finish because the batch already inside the `spsc_fold`
is not
being read by the Executor, because the Executor is still stuck in the
flush.
   (This state is not observable at our default `info` log level)
6. Eventually, Compute backend process is killed (`close()` on the
socket) or Compute as a whole gets killed (probably no clean TCP
shutdown happening in that case).
7. Eventually, Pageserver TCP stack learns about (6) through RST packets
and the Executor's flush() call fails with an error.
8. The Executor exits, dropping `cancel_batcher` and its end of the
spsc_fold.
   This wakes Batcher, causing the `spsc_fold.send` to fail.
   Batcher exits.
   The pipeline shuts down as intended.
We return from `process_query` and log the `Connection reset by peer` or
`Broken pipe` error.

The following diagram visualizes the wait-for graph at (5)

```mermaid
flowchart TD
   Batcher --spsc_fold.send(TimeoutReader_error)--> Executor
   Executor --flush batch N responses--> socket.write_end
   socket.write_end --wait for TCP window to move forward--> Compute
```

# Analysis

By holding the GateGuard inside the `spsc_fold` open, the pipelining
implementation
violated the principle established in
(https://github.com/neondatabase/neon/pull/8339).
That is, that `Handle`s must only be held across an await point if that
await point
is sensitive to the `<Handle as Deref<Target=Timeline>>::cancel` token.

In this case, we were holding the Handle inside the `spsc_fold` while
awaiting the
`pgb_writer.flush()` future.

One may jump to the conclusion that we should simply peek into the
spsc_fold to get
that Timeline cancel token and be sensitive to it during flush, then.

But that violates another principle of the design from
https://github.com/neondatabase/neon/pull/8339.
That is, that the page_service connection lifecycle and the Timeline
lifecycles must be completely decoupled.
Tt must be possible to shut down one shard without shutting down the
page_service connection, because on that single connection we might be
serving other shards attached to this pageserver.
(The current compute client opens separate connections per shard, but,
there are plans to change that.)

# Solution

This PR adds a `handle::WeakHandle` struct that does _not_ hold the
timeline gate open.
It must be `upgrade()`d to get a `handle::Handle`.
That `handle::Handle` _does_ hold the timeline gate open.

The batch queued inside the `spsc_fold` only holds a `WeakHandle`.
We only upgrade it while calling into the various `handle_` methods,
i.e., while interacting with the `Timeline` via `<Handle as
Deref<Target=Timeline>>`.
All that code has always been required to be (and is!) sensitive to
`Timeline::cancel`, and therefore we're guaranteed to bail from it
quickly when `Timeline::shutdown` starts.
We will drop the `Handle` immediately, before we start
`pgb_writer.flush()`ing the responses.
Thereby letting go of our hold on the `GateGuard`, allowing the timeline
shutdown to complete while the page_service handler remains intact.

# Code Changes

* Reproducer & Regression Test
* Developed and proven to reproduce the issue in
https://github.com/neondatabase/neon/pull/10399
* Add a `Test` message to the pagestream protocol (`cfg(feature =
"testing")`).
* Drive-by minimal improvement to the parsing code, we now have a
`PagestreamFeMessageTag`.
* Refactor `pageserver/client` to allow sending and receiving
`page_service` requests independently.
  * Add a Rust helper binary to produce situation (4) from above
* Rationale: (4) and (5) are the same bug class, we're holding a gate
open while `flush()`ing.
* Add a Python regression test that uses the helper binary to
demonstrate the problem.
* Fix
   * Introduce and use `WeakHandle` as explained earlier.
* Replace the `shut_down` atomic with two enum states for `HandleInner`,
wrapped in a `Mutex`.
* To make `WeakHandle::upgrade()` and `Handle::downgrade()`
cache-efficient:
     * Wrap the `Types::Timeline` in an `Arc`
     * Wrap the `GateGuard` in an `Arc`
* The separate `Arc`s enable uncontended cloning of the timeline
reference in `upgrade()` and `downgrade()`.
If instead we were `Arc<Timeline>::clone`, different connection handlers
would be hitting the same cache line on every upgrade()/downgrade(),
causing contention.
* Please read the udpated module-level comment in `mod handle`
module-level comment for details.

# Testing & Performance

The reproducer test that failed before the changes now passes, and
obviously other tests are passing as well.

We'll do more testing in staging, where the issue happens every ~4h if
chaos migrations are enabled in storcon.

Existing perf testing will be sufficient, no perf degradation is
expected.
It's a few more alloctations due to the added Arc's, but, they're low
frequency.

# Appendix: Why Compute Sometimes Doesn't Read Responses

Remember, the whole problem surfaced because flush() was slow because
Compute was not reading responses. Why is that?

In short, the way the compute works, it only advances the page_service
protocol processing when it has an interest in data, i.e., when the
pagestore smgr is called to return pages.

Thus, if compute issues a bunch of requests as part of prefetch but then
it turns out it can service the query without reading those pages, it
may very well happen that these messages stay in the TCP until the next
smgr read happens, either in that session, or possibly in another
session.

If there’s too many unread responses in the TCP, the pageserver kernel
is going to backpressure into userspace, resulting in our stuck flush().

All of this stems from the way vanilla Postgres does prefetching and
"async IO":
it issues `fadvise()` to make the kernel do the IO in the background,
buffering results in the kernel page cache.
It then consumes the results through synchronous `read()` system calls,
which hopefully will be fast because of the `fadvise()`.

If it turns out that some / all of the prefetch results are not needed,
Postgres will not be issuing those `read()` system calls.
The kernel will eventually react to that by reusing page cache pages
that hold completed prefetched data.
Uncompleted prefetch requests may or may not be processed -- it's up to
the kernel.

In Neon, the smgr + Pageserver together take on the role of the kernel
in above paragraphs.
In the current implementation, all prefetches are sent as GetPage
requests to Pageserver.
The responses are only processed in the places where vanilla Postgres
would do the synchronous `read()` system call.
If we never get to that, the responses are queued inside the TCP
connection, which, once buffers run full, will backpressure into
Pageserver's sending code, i.e., the `pgb_writer.flush()` that was the
root cause of the problems we're fixing in this PR.
2025-01-16 20:34:02 +00:00
Tristan Partin
b0838a68e5 Enable pgx_ulid on Postgres 17 (#10397)
The extension now supports Postgres 17. The release also seems to be
binary compatible with the previous version.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-16 19:49:04 +00:00
John Spray
8f2ebc0684 tests: stabilize test_storage_controller_node_deletion (#10420)
## Problem

`test_storage_controller_node_deletion` sometimes failed because shards
were moving around during timeline creation, and neon_local isn't
tolerant of that. The movements were unexpected because the shards had
only just been created.

This was a regression from #9916

Closes: #10383 

## Summary of changes

- Make this test use multiple AZs -- this makes the storage controller's
scheduling reliably stable

Why this works: in #9916 , I made a simplifying assumption that we would
have multiple AZs to get nice stable scheduling -- it's much easier,
because each tenant has a well defined primary+secondary location when
they have an AZ preference and nodes have different AZs. Everything
still works if you don't have multiple AZs, but you just have this quirk
that sometimes the optimizer can disagree with initial scheduling, so
once in a while a shard moves after being created -- annoying for tests,
harmless IRL.
2025-01-16 19:00:16 +00:00
Vlad Lazar
3a285a046b pageserver: include node id when subscribing to SK (#10432)
## Problem

All pageserver have the same application name which makes it hard to
distinguish them.

## Summary of changes

Include the node id in the application name sent to the safekeeper. This
should gives us
more visibility in logs. There's a few metrics that will increase in
cardinality by `pageserver_count`,
but that's fine.
2025-01-16 18:51:56 +00:00
John Spray
da13154791 storcon: revise fill logic to prioritize AZ (#10411)
## Problem

Node fills were limited to moving (total shards / node_count) shards. In
systems that aren't perfectly balanced already, that leads us to skip
migrating some of the shards that belong on this node, generating work
for the optimizer later to gradually move them back.

## Summary of changes

- Where a shard has a preferred AZ and is currently attached outside
this AZ, then always promote it during fill, irrespective of target fill
count
2025-01-16 17:33:46 +00:00
John Spray
2e13a3aa7a storage controller: handle legacy TenantConf in consistency_check (#10422)
## Problem

We were comparing serialized configs from the database with serialized
configs from memory. If fields have been added/removed to TenantConfig,
this generates spurious consistency errors. This is fine in test
environments, but limits the usefulness of this debug API in the field.

Closes: https://github.com/neondatabase/neon/issues/10369

## Summary of changes

- Do a decode/encode cycle on the config before comparing it, so that it
will have exactly the expected fields.
2025-01-16 16:56:44 +00:00
Alex Chi Z.
cccc196848 refactor(pageserver): make partitioning an ArcSwap (#10377)
## Problem

gc-compaction needs the partitioning data to decide the job split. This
refactor allows concurrent access/computing the partitioning.

## Summary of changes

Make `partitioning` an ArcSwap so that others can access the
partitioning while we compute it. Fully eliminate the `repartition is
called concurrently` warning when gc-compaction is going on.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-16 15:33:37 +00:00
Arpad Müller
e436dcad57 Rename "disabled" safekeeper scheduling policy to "pause" (#10410)
Rename the safekeeper scheduling policy "disabled" to "pause".

A rename was requested in
https://github.com/neondatabase/neon/pull/10400#discussion_r1916259124,
as the "disabled" policy is meant to be analogous to the "pause" policy
for pageservers.

Also simplify the `SkSchedulingPolicyArg::from_str` function, relying on
the `from_str` implementation of `SkSchedulingPolicy`. Latter is used
for the database format as well, so it is quite stable. If we ever want
to change the UI, we'll need to duplicate the function again but this is
cheap.
2025-01-16 14:30:49 +00:00
John Spray
21d7b6a258 tests: refactor test_tenant_delete_races_timeline_creation (#10425)
## Problem

Threads spawned in `test_tenant_delete_races_timeline_creation` are not
joined before the test ends, and can generate
`PytestUnhandledThreadExceptionWarning` in other tests.


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10419/12805365523/index.html#/testresult/53a72568acd04dbd

## Summary of changes

- Wrap threads in ThreadPoolExecutor which will join them before the
test ends
- Remove a spurious deletion call -- the background thread doing
deletion ought to succeed.
2025-01-16 14:11:33 +00:00
JC Grünhage
86dbc44db1 CI: Run check-codestyle-rust as part of pre-merge-checks (#10387)
## Problem

When multiple changes are grouped in a merge group to be merged as part
of the merge queue, the changes might individually pass
`check-codestyle-rust` but not in their combined form.

## Summary of changes

- Move `check-codestyle-rust` into a reusable workflow that is called
from it's previous location in `build_and_test.yml`, and additionally
call it from `pre_merge_checks.yml`. The additional call does not run on
ARM, only x86, to ensure the merge queue continues being responsive.
- Trigger `pre_merge_checks.yml` on PRs that change any of the workflows
running in `pre_merge_checks.yml`, so that we get feedback on those
early an not only after trying to merge those changes.
2025-01-16 09:20:24 +00:00
Tristan Partin
58f6af6c9a Clean up compute_ctl extension server code (#10417) 2025-01-16 08:35:36 +00:00
Matthias van de Meent
7be971081a Make sure we request pages with a known-flushed LSN. (#10413)
This should fix the largest source of flakyness of
test_nbtree_pagesplit_cycleid.

## Problem

https://github.com/neondatabase/neon/issues/10390

## Summary of changes

By using a guaranteed-flushed LSN, we ensure that PS won't have to wait
forever.

(If it does wait forever, we know the issue can't be with Compute's WAL)
2025-01-16 08:34:11 +00:00
Arseny Sher
6fe4c6798f Add START_WAL_PUSH proto_version and allow_timeline_creation options. (#10406)
## Problem

As part of https://github.com/neondatabase/neon/issues/8614 we need to
pass options to START_WAL_PUSH.

## Summary of changes

Add two options. `allow_timeline_creation`, default true, disables
implicit timeline creation in the connection from compute. Eventually
such creation will be forbidden completely, but as we migrate to
configurations we need to support both: current mode and configurations
enabled where creation by compute is disabled.

`proto_version` specifies compute <-> sk protocol version. We have it
currently in the first greeting package also, but I plan to change tag
size from u64 to u8, which would make it hard to use. Command is more
appropriate place for it anyway.
2025-01-16 08:01:19 +00:00
Matthias van de Meent
2eda484ef6 prefetch: Read more frequently from TCP buffer (#10394)
This reduces pressure on the OS TCP read buffer by increasing the
moments we read data out of the receive buffer, and increasing the
number of bytes we can pull from that buffer when we do reads.

## Problem

A backend may not always consume its prefetch data quick enough

## Summary of changes

We add a new function `prefetch_pump_state` which pulls as many prefetch
requests from the OS TCP receive buffer as possible, but without
blocking.

This thus reduces pressure on OS-level TCP buffers, thus increasing
throughput by limiting throttling caused by full TCP buffers.
2025-01-16 02:43:47 +00:00
Mikhail Kot
c7429af8a0 Enable dblink (#10358)
Update compute image to include dblink #3720
2025-01-15 22:29:18 +00:00
Alex Chi Z.
a753349cb0 feat(pageserver): validate data integrity during gc-compaction (#10131)
## Problem

part of https://github.com/neondatabase/neon/issues/9114
part of investigation of
https://github.com/neondatabase/neon/issues/10049

## Summary of changes

* If `cfg!(test) or cfg!(feature = testing)`, then we will always try
generating an image to ensure the history is replayable, but not put the
image layer into the final layer results, therefore discovering wrong
key history before we hit a read error.
* I suspect it's easier to trigger some races if gc-compaction is
continuously run on a timeline, so I increased the frequency to twice
per 10 churns.
* Also, create branches in gc-compaction smoke tests to get more test
coverage.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad@neon.tech>
2025-01-15 22:04:06 +00:00
Gleb Novikov
55a68b28a2 fast import: restore to neondb (not postgres) database (#10251)
## Problem

`postgres` is system database at neon, so we need to do `pg_restore`
into `neondb` instead

https://github.com/neondatabase/cloud/issues/22100

## Summary of changes

Changed fast_import a little bit:
1. After succesfull connection creating `neondb` in postgres instance
2. Changed restore connstring to use new db
3. Added optional `source_connection_string`, which allows to skip
`s3_prefix` and just connect directly.
4. Added `-i` that stops process until sigterm 

## TODO
- [x] test image in cplane e2e
- [ ] Change import job image back to latest after this merged (partial
revert of https://github.com/neondatabase/cloud/pull/22338)
2025-01-15 20:51:09 +00:00
John Spray
fb0e2acb2f pageserver: add page_trace API for debugging (#10293)
## Problem

When a pageserver is receiving high rates of requests, we don't have a
good way to efficiently discover what the client's access pattern is.

Closes: https://github.com/neondatabase/neon/issues/10275

## Summary of changes

- Add
`/v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=...`
API, which returns a binary buffer.
- Add `pagectl page-trace` tool to decode and analyze the output.

---------

Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-01-15 19:07:22 +00:00
Arpad Müller
efaec6cdf8 Add endpoint and storcon cli cmd to set sk scheduling policy (#10400)
Implementing the last missing endpoint of #9981, this adds support to
set the scheduling policy of an individual safekeeper, as specified in
the RFC. However, unlike in the RFC we call the endpoint
`scheduling_policy` not `status`

Closes #9981.

As for why not use the upsert endpoint for this: we want to have the
safekeeper upsert endpoint be used for testing and for deploying new
safekeepers, but not for changes of the scheduling policy. We don't want
to change any of the other fields when marking a safekeeper as
decommissioned for example, so we'd have to first fetch them only to
then specify them again. Of course one can also design an endpoint where
one can omit any field and it doesn't get modified, but it's still not
great for observability to put everything into one big "change something
about this safekeeper" endpoint.
2025-01-15 18:15:30 +00:00
Tristan Partin
3d41069dc4 Update pgrx in extension builds to 0.12.9 (#10372)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-15 16:26:58 +00:00
Vlad Lazar
dbebede7bf safekeeper: fan out from single wal reader to multiple shards (#10190)
## Problem

Safekeepers currently decode and interpret WAL for each shard
separately.
This is wasteful in terms of CPU memory usage - we've seen this in
profiles.

## Summary of changes

Fan-out interpreted WAL to multiple shards. 
The basic is that wal decoding and interpretation happens in a separate
tokio task and senders
attach to it. Senders only receive batches concerning their shard and
only past the Lsn they've last seen.

Fan-out is gated behind the `wal_reader_fanout` safekeeper flag
(disabled by default for now).

When fan-out is enabled, it might be desirable to control the absolute
delta between the
current position and a new shard's desired position (i.e. how far behind
or ahead a shard may be).
`max_delta_for_fanout` is a new optional safekeeper flag which dictates
whether to create a new
WAL reader or attach to the existing one. By default, this behaviour is
disabled. Let's consider enabling
it if we spot the need for it in the field.

## Testing

Tests passed [here](https://github.com/neondatabase/neon/pull/10301)
with wal reader fanout enabled
as of
34f6a71718.

Related: https://github.com/neondatabase/neon/issues/9337
Epic: https://github.com/neondatabase/neon/issues/9329
2025-01-15 15:33:54 +00:00
Tristan Partin
3e529f124f Remove leading slashes when downloading remote files (#10396)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-15 15:29:52 +00:00
Arseny Sher
05a71c7d6a safekeeper: add membership configuration switch endpoint (#10241)
## Problem

https://github.com/neondatabase/neon/issues/9965

## Summary of changes

Add to safekeeper http endpoint to switch membership configuration. Also
add it to python client for tests, and add simple test itself.
2025-01-15 14:16:04 +00:00
Alexander Bayandin
b9464865b6 benchmarks: report successful runs to slack as well (#10393)
## Problem

Successful `benchmarks` runs doesn't have enough visibility

Ref https://neondb.slack.com/archives/C069Z2199DL/p1736868055094539

## Summary of changes
- Report both successful and failed `benchmarks` to Slack
- Update `slackapi/slack-github-action` action
2025-01-15 13:05:05 +00:00
Vlad Lazar
1577430408 safekeeper: decode and interpret for multiple shards in one go (#10201)
## Problem

Currently, we call `InterpretedWalRecord::from_bytes_filtered`
from each shard. To serve multiple shards at the same time,
the API needs to allow for enquiring about multiple shards.

## Summary of changes

This commit tweaks it a pretty brute force way. Naively, we could
just generate the shard for a key, but pre and post split shards
may be subscribed at the same time, so doing it efficiently is more
complex.
2025-01-15 11:10:24 +00:00
Erik Grinaker
05d17a10ae rfc: add CPU and heap profiling RFC (#10085)
This document proposes a standard cross-team pattern for CPU and memory
profiling across applications and languages, using the
[pprof](https://github.com/google/pprof) profile format.

It enables both ad hoc profiles via HTTP endpoints, and continuous
profiling across the fleet via [Grafana Cloud
Profiles](https://grafana.com/docs/grafana-cloud/monitor-applications/profiles/).
Continuous profiling incurs an overhead of about 0.1% CPU usage and 3%
slower heap allocations.


[Rendered](https://github.com/neondatabase/neon/blob/erik/profiling-rfc/docs/rfcs/040-profiling.md)

Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
2025-01-15 10:35:38 +00:00
Arseny Sher
2d0ea08524 Add safekeeper membership conf to control file. (#10196)
## Problem

https://github.com/neondatabase/neon/issues/9965

## Summary of changes

Add safekeeper membership configuration struct itself and storing it in
the control file. In passing also add creation timestamp to the control
file (there were cases where I wanted it in the past).

Remove obsolete unused PersistedPeerInfo struct from control file (still
keep it control_file_upgrade.rs to have it in old upgrade code).

Remove the binary representation of cfile in the roundtrip test.
Updating it is annoying, and we still test the actual roundtrip.

Also add configuration to timeline creation http request, currently used
only in one python test. In passing, slightly change LSNs meaning in the
request: normally start_lsn is passed (the same as ancestor_start_lsn in
similar pageserver call), but we allow specifying higher commit_lsn for
manual intervention if needed. Also when given LSN initialize
term_history with it.
2025-01-15 09:45:58 +00:00
Arseny Sher
c98cbbeac1 Add migration details to safekeeper membership RFC. (#10272)
## Problem

https://github.com/neondatabase/neon/pull/8455 wasn't specific enough on
migration from current situation to enabling generations.

## Summary of changes

Describe the missing parts, including control plane pushing generation
to compute, which also defines whether generations are enabled -- non
zero value does it.
2025-01-15 09:41:49 +00:00
John Spray
47c1640acc storage controller: pagination for tenant listing API (#10365)
## Problem

For large deployments, the `control/v1/tenant` listing API can time out
transmitting a monolithic serialized response.

## Summary of changes

- Add `limit` and `start_after` parameters to listing API
- Update storcon_cli to use these parameters and limit requests to 1000
items at a time
2025-01-14 21:37:32 +00:00
Erik Grinaker
6debb49b87 pageserver: coalesce index uploads when possible (#10248)
## Problem

With upload queue reordering in #10218, we can easily get into a
situation where multiple index uploads are queued back to back, which
can't be parallelized. This will happen e.g. when multiple layer flushes
enqueue layer/index/layer/index/... and the layers skip the queue and
are uploaded in parallel.

These index uploads will incur serial S3 roundtrip latencies, and may
block later operations.

Touches #10096.

## Summary of changes

When multiple back-to-back index uploads are ready to upload, only
upload the most recent index and drop the rest.
2025-01-14 21:10:17 +00:00
Erik Grinaker
e58e29e639 pageserver: limit number of upload queue tasks (#10384)
## Problem

The upload queue can currently schedule an arbitrary number of tasks.
This can both spawn an unbounded number of Tokio tasks, and also
significantly slow down upload queue scheduling as it's quadratic in
number of operations.

Touches #10096.

## Summary of changes

Limit the number of inprogress tasks to the remote storage upload
concurrency. While this concurrency limit is shared across all tenants,
there's certainly no point in scheduling more than this -- we could even
consider setting the limit lower, but don't for now to avoid
artificially constraining tenants.
2025-01-14 18:01:14 +00:00
Heikki Linnakangas
d36112d20f Simplify compute dockerfile by setting PATH just once (#10357)
By setting PATH in the 'pg-build' layer, all the extension build layers
will inherit. No need to pass PG_CONFIG to all the various make
invocations either: once pg_config is in PATH, the Makefiles will pick
it up from there.
2025-01-14 17:02:35 +00:00
Erik Grinaker
ffaa52ff5d pageserver: reorder upload queue when possible (#10218)
## Problem

The upload queue currently sees significant head-of-line blocking. For
example, index uploads act as upload barriers, and for every layer flush
we schedule a layer and index upload, which effectively serializes layer
uploads.

Resolves #10096.

## Summary of changes

Allow upload queue operations to bypass the queue if they don't conflict
with preceding operations, increasing parallelism.

NB: the upload queue currently schedules an explicit barrier after every
layer flush as well (see #8550). This must be removed to enable
parallelism. This will require a better mechanism for compaction
backpressure, see e.g. #8390 or #5415.
2025-01-14 16:31:59 +00:00
John Spray
aa7323a384 storage controller: quality of life improvements for AZ handling (#10379)
## Problem

Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ
of a tenant is much more impactful, and we would like to make it more
visible in tooling.

## Summary of changes

- Include AZ in node describe API
- Include AZ info in node & tenant outputs in CLI
- Add metrics for per-node shard counts, labelled by AZ
- Add a CLI for setting preferred AZ on a tenant
- Extend AZ-setting API+CLI to handle None for clearing preferred AZ
2025-01-14 15:30:43 +00:00
Christian Schwarz
2466a2f977 page_service: throttle individual requests instead of the batched request (#10353)
## Problem

Before this PR, the pagestream throttle was applied weighted on a
per-batch basis.
This had several problems:

1. The throttle occurence counters were only bumped by `1` instead of
`batch_size`.
2. The throttle wait time aggregator metric only counted one wait time,
irrespective
of `batch_size`. That makes sense in some ways of looking at it but not
in others.
3. If the last request in the batch runs into the throttle, the other
requests in the
batch are also throttled, i.e., over-throttling happens (theoretical,
didn't measure
   it in practice).

## Solution

It occured to me that we can simply push the throttling upwards into
`pagestream_read_message`.

This has the added benefit that in pipeline mode, the `executor` stage
will, if it is idle,
steal whatever requests already made it into the `spsc_fold` and execute
them; before this
change, that was not the case - the throttling happened in the
`executor` stage instead of
the `batcher` stage.
   
## Code Changes

There are two changes in this PR:

1. Lifting up the throttling into the `pagestream_read_message` method.
2. Move the throttling metrics out of the `Throttle` type into
`SmgrOpMetrics`.
Unlike the other smgr metrics, throttling is per-tenant, hence the Arc.
3. Refactor the `SmgrOpTimer` implementation to account for the new
observation states,
   and simplify its design.
4. Drive-by-fix flush time metrics. It was using the same `now` in the
`observe_guard` every time.

The `SmgrOpTimer` is now a state machine.
Each observation point moves the state machine forward.
If a timer object is dropped early some "pair"-like metrics still
require an increment or observation.
That's done in the Drop implementation, by driving the state machine to
completion.
2025-01-14 15:28:01 +00:00
Alex Chi Z.
9bdb14c1c0 fix(pageserver): ensure initial image layers have correct key ranges (#10374)
## Problem

Discovered during the relation dir refactor work.

If we do not create images as in this patch, we would get two set of
image layers:

```
0000...METADATA_KEYS
0000...REL_KEYS
```

They overlap at the same LSN and would cause data loss for relation
keys. This doesn't happen in prod because initial image layer generation
is never called, but better to be fixed to avoid future issues with the
reldir refactors.

## Summary of changes

* Consolidate create_image_layers call into a single one.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-14 15:27:48 +00:00
Conrad Ludgate
df4abd8b14 fix: force-refresh azure identity token (#10378)
## Problem

Because of https://github.com/Azure/azure-sdk-for-rust/issues/1739, our
identity token file was not being refreshed. This caused our uploads to
start failing when the storage token expired.

## Summary of changes

Drop and recreate the remote storage config every time we upload in
order to force reload the identity token file.
2025-01-14 12:53:32 +00:00
Konstantin Knizhnik
a039f8381f Optimize vector get last written LSN (#10360)
## Problem

See https://github.com/neondatabase/neon/issues/10281

pg17 performs extra lock/unlock operation when fetching LwLSN.

## Summary of changes

Perform all lookups under one lock, moving initialization of not found
keys to separate loop.

Related Postgres PR:
https://github.com/neondatabase/postgres/pull/553

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-14 05:54:30 +00:00
Tristan Partin
430b556b34 Update postgres-exporter and sql_exporter in computes (#10349)
The postgres-exporter was much further out of date, but let's just bump
both.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-14 00:44:39 +00:00
Konstantin Knizhnik
1783501eaa Increase max connection for replica to prevent test flukyness (#10306)
## Problem

See https://github.com/neondatabase/neon/issues/10167
Too small number of `max_connections` (2) can cause failures of
test_physical_replication_config_mismatch_too_many_known_xids test

## Summary of changes

Increase `max_connections` to 5

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-13 20:01:03 +00:00
John Spray
fd1368d31e storcon: rework scheduler optimisation, prioritize AZ (#9916)
## Problem

We want to do a more robust job of scheduling tenants into their home
AZ: https://github.com/neondatabase/neon/issues/8264.

Closes:  https://github.com/neondatabase/neon/issues/8969

## Summary of changes

### Scope

This PR combines prioritizing AZ with a larger rework of how we do
optimisation. The rationale is that just bumping AZ in the order of
Score attributes is a very tiny change: the interesting part is lining
up all the optimisation logic to respect this properly, which means
rewriting it to use the same scores as the scheduler, rather than the
fragile hand-crafted logic that we had before. Separating these changes
out is possible, but would involve doing two rounds of test updates
instead of one.

### Scheduling optimisation

`TenantShard`'s `optimize_attachment` and `optimize_secondary` methods
now both use the scheduler to pick a new "favourite" location. Then
there is some refined logic for whether + how to migrate to it:
- To decide if a new location is sufficiently "better", we generate
scores using some projected ScheduleContexts that exclude the shard
under consideration, so that we avoid migrating from a node with
AffinityScore(2) to a node with AffinityScore(1), only to migrate back
later.
- Score types get a `for_optimization` method so that when we compare
scores, we will only do an optimisation if the scores differ by their
highest-ranking attributes, not just because one pageserver is lower in
utilization. Eventually we _will_ want a mode that does this, but doing
it here would make scheduling logic unstable and harder to test, and to
do this correctly one needs to know the size of the tenant that one is
migrating.
- When we find a new attached location that we would like to move to, we
will create a new secondary location there, even if we already had one
on some other node. This handles the case where we have a home AZ A, and
want to migrate the attachment between pageservers in that AZ while
retaining a secondary location in some other AZ as well.
- A unit test is added for
https://github.com/neondatabase/neon/issues/8969, which is implicitly
fixed by reworking optimisation to use the same scheduling scores as
scheduling.
2025-01-13 19:33:00 +00:00
Alex Chi Z.
e9ed53b14f feat(pageserver): support inherited sparse keyspace (#10313)
## Problem

In preparation to https://github.com/neondatabase/neon/issues/9516. We
need to store rel size and directory data in the sparse keyspace, but it
does not support inheritance yet.

## Summary of changes

Add a new type of keyspace "sparse but inherited" into the system.

On the read path: we don't remove the key range when we descend into the
ancestor. The search will stop when (1) the full key range is covered by
image layers (which has already been implemented before), or (2) we
reach the end of the ancestor chain.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-13 15:43:01 +00:00
Conrad Ludgate
a338aee132 feat(local_proxy): use ed25519 signatures with pg_session_jwt (#10290)
Generally ed25519 seems to be much preferred for cryptographic strength
to P256 nowadays, and it is NIST approved finally. We should use it
where we can as it's also faster than p256.

This PR makes the re-signed JWTs between local_proxy and pg_session_jwt
use ed25519.

This does introduce a new dependency on ed25519, but I do recall some
Neon Authorise customers asking for support for ed25519, so I am
justifying this dependency addition in the context that we can then
introduce support for customer ed25519 keys

sources:
* https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA)
* https://datatracker.ietf.org/doc/html/rfc8037#section-3.1
2025-01-13 15:20:46 +00:00
Heikki Linnakangas
96243af651 Stop building unnecessary extension tarballs (#10355)
We build "custom extensions" from a different repository nowadays.
2025-01-13 15:01:13 +00:00
John Spray
ef8bfacd6b storage controller: API + CLI for migrating secondary locations (#10284)
## Problem

Currently, if we want to move a secondary there isn't a neat way to do
that: we just have migration API for the attached location, and it is
only clean to use that if you've manually created a secondary via
pageserver API in the place you're going to move it to.

Secondary migration API enables:
- Moving the secondary somewhere because we would like to later move the
attached location there.
- Move the secondary location because we just want to reclaim some disk
space from its current location.

## Summary of changes

- Add `/migrate_secondary` API
- Add `tenant-shard-migrate-secondary` CLI
- Add tests for above
2025-01-13 14:52:43 +00:00
Konstantin Knizhnik
ceacc29609 Start with minimal prefetch distance to minimize prefetch overhead for exact or limited index scans (#10359)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1736526089437179

In case of queries index scan with LIMIT clause, multiple backends can
concurrently send larger number of duplicated prefetch requests which
are not stored in LFC and so actually do useless job.

Current implementation of index prefetch starts with maximal prefetch
distance (10 by default now) when there are no key bounds, so in queries
with LIMIT clause like `select * from T order by pk limit 1` compute can
send a lot of useless prefetch requests to page server.

## Summary of changes

Always start with minimal prefetch distance even if there are not key
boundaries.

Related Postgres PRs:
https://github.com/neondatabase/postgres/pull/552
https://github.com/neondatabase/postgres/pull/551
https://github.com/neondatabase/postgres/pull/550
https://github.com/neondatabase/postgres/pull/549

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-13 14:26:11 +00:00
Erik Grinaker
b31ed0acd1 utils: add ?force=true hint for CPU profiler (#10368)
This makes it less annoying to try to take a CPU profile when a
continuous profile is already running.
2025-01-13 14:23:42 +00:00
Alexander Bayandin
b2d0e1a519 Link OpenSSL dynamically (#10302)
## Problem
Statically linked OpenSSL is buggy in multithreaded environment:
- https://github.com/neondatabase/cloud/issues/16155
- https://github.com/neondatabase/neon/issues/8275

## Summary of changes
- Link OpenSSL dynamically (revert OpenSSL part from
https://github.com/neondatabase/neon/pull/8074)

Before:
```
ldd /usr/local/v17/lib/libpq.so
        linux-vdso.so.1 (0x0000ffffb5ce4000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb5c10000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb5650000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffb5ca7000)
```

After:
```
ldd /usr/local/v17/lib/libpq.so
        linux-vdso.so.1 (0x0000ffffbf3e8000)
        libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3 (0x0000ffffbf260000)
        libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3 (0x0000ffffbec00000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffbf1c0000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbea50000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffbf3ab000)
```
2025-01-13 14:13:02 +00:00
John Spray
d1bc36f536 storage controller: fix retries of compute hook notifications while a secondary node is offline (#10352)
## Problem

We would sometimes fail to retry compute notifications:
1. Try and send, set compute_notify_failure if we can't
2. On next reconcile, reconcile() fails for some other reason (e.g.
tried to talk to an offline node), and we fail the `result.is_ok() &&
must_notify` condition around the re-sending.

Closes: https://github.com/neondatabase/cloud/issues/22612

## Summary of changes

- Clarify the meaning of the reconcile result: it should be Ok(()) if
configuring attached location worked, even if secondary or detach
locations cannot be reached.
- Skip trying to talk to secondaries if they're offline
- Even if reconcile fails and we can't send the compute notification (we
can't send it because we're not sure if it's really attached), make sure
we save the `compute_notify_failure` flag so that subsequent reconciler
runs will try again
- Add a regression test for the above
2025-01-13 13:31:57 +00:00
Erik Grinaker
0b9032065e utils: allow 60-second CPU profiles (#10367)
Taking continuous profiles every 20 seconds is likely too expensive (in
dollar terms). Let's try 60-second profiles. We can now interrupt
running profiles via `?force=true`, so this should be fine.
2025-01-13 13:14:23 +00:00
Heikki Linnakangas
09fe3b025c Add a websockets tunnel and a test for the proxy's websockets support. (#3823)
For testing the proxy's websockets support.

I wrote this to test https://github.com/neondatabase/neon/issues/3822.
Unfortunately, that bug can *not* be reproduced with this tunnel. The
bug only appears when the client pipelines the first query with the
authentication messages. The tunnel doesn't do that.

---

Update (@conradludgate 2025-01-10):

We have since added some websocket tests, but they manually implemented
a very simplistic setup of the postgres protocol. Introducing the tunnel
would make more complex testing simpler in the future.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2025-01-13 11:35:39 +00:00
John Spray
12053cf832 storage controller: improve consistency_check_api (#10363)
## Problem

Limitations found while using this to investigate
https://github.com/neondatabase/neon/issues/10234:
- If we hit a node consistency issue, we drop out and don't check shards
for consistency
- The messages printed after a shard consistency issue are huge, and
grafana appears to drop them.

## Summary of changes

- Defer node consistency errors until the end of the function, so that
we always proceed to check shards for consistency
- Print out smaller log lines that just point out the diffs between
expected and persistent state
2025-01-13 11:18:14 +00:00
Conrad Ludgate
de199d71e1 chore: Address lints introduced in rust 1.85.0 beta (#10340)
With a new beta build of the rust compiler, it's good to check out the
new lints. Either to find false positives, or find flaws in our code.
Additionally, it helps reduce the effort required to update to 1.85 in 6
weeks.
2025-01-13 10:34:36 +00:00
Erik Grinaker
22a6460010 libs/utils: add force parameter for /profile/cpu (#10361)
## Problem

It's only possible to take one CPU profile at a time. With Grafana
continuous profiling, a (low-frequency) CPU profile will always be
running, making it hard to take an ad hoc CPU profile at the same time.

Resolves #10072.

## Summary of changes

Add a `force` parameter for `/profile/cpu` which will end and return an
already running CPU profile, starting a new one for the current caller.
2025-01-13 10:01:18 +00:00
Erik Grinaker
cd982a82ec pageserver,safekeeper: increase heap profiling frequency to 2 MB (#10362)
## Problem

Currently, the heap profiling frequency is every 1 MB allocated. Taking
a profile stack trace takes about 1 µs, and allocating 1 MB takes about
15 µs, so the overhead is about 6.7% which is a bit high. This is a
fixed cost regardless of whether heap profiles are actually accessed.

## Summary of changes

Increase the heap profiling sample frequency from 1 MB to 2 MB, which
reduces the overhead to about 3.3%. This seems acceptable, considering
performance-sensitive code will avoid allocations as far as possible
anyway.
2025-01-13 09:44:59 +00:00
Heikki Linnakangas
8327f68043 Minor cleanup of extension build commands (#10356)
There used to be some pg version dependencies in these extensions, but
now that there isn't, follow the simpler pattern used in other
extensions. No change in the produced images.
2025-01-11 17:39:27 +00:00
Heikki Linnakangas
846e8fdce4 Remove obsolete hnsw extension (#8008)
This has been deprecated and disabled for new installations for a long
time. Let's remove it for good.
2025-01-11 14:20:50 +00:00
Heikki Linnakangas
70a3bf37a0 Stop building 'compute-tools' image (#10333)
It's been unused from time immemorial.

---------

Co-authored-by: Matthias van de Meent <matthias@neon.tech>
2025-01-11 13:09:55 +00:00
Arpad Müller
23c0748cdd Remove active column (#10335)
We don't need or want the `active` column. Remove it. Vlad pointed out
that this is safe.

Thanks to the separation of the schemata in earlier PRs, this is easy.

follow-up of #10205

Part of https://github.com/neondatabase/neon/issues/9981
2025-01-11 02:52:45 +00:00
Alex Chi Z.
b5d54ba52a refactor(pageserver): move queue logic to compaction.rs (#10330)
## Problem

close https://github.com/neondatabase/neon/issues/10031, part of
https://github.com/neondatabase/neon/issues/9114

## Summary of changes

Move the compaction job generation to `compaction.rs`, thus making the
code more readable and debuggable. We now also return running job
through the get compaction job API, versus before we only return
scheduled jobs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-10 20:53:00 +00:00
Christian Schwarz
58332cb361 pageserver: remove unused metric pageserver_layers_visited_per_read_global (#10141)
As of commit "pageserver: remove legacy read path" (#8601) we always use
vectored get, which has a separate metric.
2025-01-10 20:35:50 +00:00
Christian Schwarz
9b43204893 fix(page_service): Timeline::gate held open while throttling (#10314)
When we moved throttling up from Timeline::get into page_service,
we stopped being sensitive to `Timeline::cancel`, even though we're
holding a Handle and thus a guard on the `Timeline::gate` open.

This PR rectifies the situation.

Refs

- Found while investigating #10309 (hung detach because gate kept open),
  but not expected to be the root cause of that issue because the
  affected tenants are not being throttled according to their metrics.
2025-01-10 19:21:01 +00:00
Christian Schwarz
cdd34dfc12 impr(utils/spsc_fold): add another test case (#10319)
Wondered about the case covered here while investigating #10309.
2025-01-10 19:19:48 +00:00
Vlad Lazar
3cd5034eac storcon: don't assume host:port format in storcon client (#10347)
## Problem

https://github.com/neondatabase/infra/pull/2725 updated the scrubber to
use a non-host
port endpoint for storcon. That breaks when unwrapping the port.

## Summary of changes

Support both `host:port` and `host` formats for the storcon api.
2025-01-10 18:35:16 +00:00
Cheng Chen
425b777840 chore(compute): pg_mooncake v0.1.0 (#10337)
## Problem
Upgrade pg_mooncake to v0.1.0

## Summary of changes
2025-01-10 16:38:13 +00:00
John Spray
4398051385 tests: smaller datasets in LFC tests (#10346)
## Problem

These two tests came up in #9537 as doing multi-gigabyte I/O, and from
inspection of the tests it doesn't seem like they need that to fulfil
their purpose.

## Summary of changes

- In test_local_file_cache_unlink, run fewer background threads with a
smaller number of rows. These background threads AFAICT exist to make
sure some I/O is going on while we unlink the LFC directory, but 5
threads should be enough for "some".
- In test_lfc_resize, tweak the test to validate that the cache size is
larger than the final size before resizing it, so that we're sure we're
writing enough data to really be doing something. Then decrease the
pgbench scale.
2025-01-10 15:53:23 +00:00
Folke Behrens
71bca6f580 poetry: Update packaging for poetry v2 (#10344)
## Problem

When poetry v2 (released Jan 5) is used it needs `packaging.metadata`
module, but we downgrade `packaging` to 23.0. `packaging==23.1`
introduced the metadata submodule.

## Summary of changes

Update `packaging` to 24.2.
2025-01-10 14:32:26 +00:00
John Spray
105f66c4ce tests: move test_parallel_copy into performance tree (#10343)
## Problem

This test writes ~5GB of data. It is not suitable to run in parallel
with all the other small tests in test_runner/regress.

via #9537 

## Summary of changes

- Move test_parallel_copy into the performance directory, so that it
does not run in parallel with other tests
2025-01-10 13:57:26 +00:00
John Spray
0d4fce2d35 tests: refine how compat snapshot is generated (#10342)
## Problem

I noticed in https://github.com/neondatabase/neon/pull/9537 that tests
which work with compat snapshots were writing several hundred MB of
data, which isn't really necessary.

Also, the snapshots are large but don't have the proper variety of
storage format features, e.g. they could just have L0 deltas.

## Summary of changes

- Use smaller scale factor and runtime to generate less data
- Configure a small layer size and use force image layer generation so
that our output contains L1 deltas and image layers, and has a decent
number of entries in the layer map
2025-01-10 13:57:23 +00:00
Erik Grinaker
2b8ea1e768 utils: add flamegraph for heap profiles (#10223)
## Problem

Unlike CPU profiles, the `/profile/heap` endpoint can't automatically
generate SVG flamegraphs. This requires the user to install and use
`pprof` tooling, which is unnecessary and annoying.

Resolves #10203.

## Summary of changes

Add `format=svg` for the `/profile/heap` route, and generate an SVG
flamegraph using the `inferno` crate, similarly to what `pprof-rs`
already does for CPU profiles.
2025-01-10 12:14:29 +00:00
Christian Schwarz
db00eb41a1 fix(spsc_fold): potentially missing wakeup when send()ing in state SenderWaitsForReceiverToConsume (#10318)
# Problem

Before this PR, there were cases where send() in state
SenderWaitsForReceiverToConsume would never be woken up
by the receiver, because it never registered with `wake_sender`.

Example Scenario 1: we stop polling a send() future A that was waiting
for the receiver to consume. We drop A and create a new send() future B.
B would return Poll::Pending and never regsister a waker.

Example Scenario 2: a send() future A transitions from HasData
to SenderWaitsForReceiverToConsume. This registers the context X
with `wake_sender`. But before the Receiver consumes the data,
we poll A from a different context Y.
The state is still SenderWaitsForReceiverToConsume, but we wouldn't
register the new context with `wake_sender`.
When the Receiver comes around to consume and `wake_sender.notify()`s,
it wakes the old context X instead of Y.

# Fix

Register the waker in the case where we're polled in
state `SenderWaitsForReceiverToConsume`.

# Relation to #10309

I found this bug while investigating #10309.
There was never proof that this bug here is the root cause for #10309.
In the meantime we found a more probably hypothesis
for the root cause than what is being fixed here.
Regardless, let's walk through my thought process about
how it might have been relevant:

There (in page_service), Scenario 1 does not apply because
we poll the send() future to completion.

Scenario 2 (`tokio::join!`) also does not apply with the
current `tokio::join!()` impl, because it will just poll each
future every time, each with the same context.
Although if we ever used something like a FuturesUnordered anywhere,
that will be using a different context, so, in that case,
the bug might materialize.

Regarding tokio & spurious poll in general:
@conradludgate is not aware of any spurious wakeup cases in current
tokio,
but within a `tokio::join!()`, any wake meant for one future will poll
all
the futures, so that can appear as a spurious wake up to the N-1 futures
of the `tokio::join!()`.
2025-01-10 11:06:03 +00:00
Conrad Ludgate
735c66dc65 fix(proxy): propagate the existing ComputeUserInfo to connect for cancellation (#10322)
## Problem

We were incorrectly constructing the ComputeUserInfo, used for
cancellation checks, based on the return parameters from postgres. This
didn't contain the correct info.

## Summary of changes

Propagate down the existing ComputeUserInfo.
2025-01-10 09:36:51 +00:00
Folke Behrens
77660f3d88 proxy: Fix parsing of UnknownTopic with payload (#10339)
## Problem

When the proxy receives a `Notification` with an unknown topic it's
supposed to use the `UnknownTopic` unit variant. Unfortunately, in
adjacently tagged enums serde will not simply ignore the configured
content if found and try to deserialize a map/object instead.

## Summary of changes

* Use a custom deserialize function to ignore variant content.
* Add a little unit test covering both cases.
2025-01-10 09:12:31 +00:00
Folke Behrens
b6205af4a5 Update tracing/otel crates (#10311)
Update the tracing(-x) and opentelemetry(-x) crates.

Some breaking changes require updating our code:
* Initialization is done via builders now

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/CHANGELOG.md#0270
* Errors from OTel SDK are logged via tracing crate as well.

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry/CHANGELOG.md#0270
2025-01-10 08:48:03 +00:00
Arpad Müller
6149ac8834 Handle race between auto-offload and unarchival (#10305)
## Problem

Auto-offloading as requested by the compaction task is racy with
unarchival, in that the compaction task might attempt to offload an
unarchived timeline. By that point it will already have set the timeline
to the `Stopping` state however, which makes it unusable for any
purpose. For example:

1. compaction task decides to offload timeline
2. timeline gets unarchived
3. `offload_timeline` gets called by compaction task
  * sets timeline's state to `Stopping`
  * realizes that the timeline can't be unarchived, errors out
6. endpoint can't be started as the timeline is `Stopping` and thus
'can't be found'.

A future iteration of the compaction task can't "heal" this state either
as the timeline will still not be archived, same goes for other
automatic stuff. The only way to heal this is a tenant detach+attach, or
alternatively a pageserver restart.

Furthermore, the compaction task is especially amenable for such races
as it first stores `can_offload` into a variable, figures out whether
compaction is needed (which takes some time), and only then does it
attempt an offload operation: the time difference between "check" and
"use" is non-trivially small.

To make it even worse, we start the compaction task right after attach
of a tenant, and it is a common pattern by pageserver users to attach a
tenant to then immediately unarchive a timeline, so that an endpoint can
be started.

## Solutions not adopted

The simplest solution is to move the `can_offload` check to right before
attempting of the offload. But this is not a good solution, as no lock
is held between that check and timeline shutdown. So races would still
be possible, just become less likely.

I explored using the timeline state for this, as in adding an additional
enum variant. But `Timeline::set_state` is racy (#10297).

## Adopted solution

We use the lock on the timeline's upload queue as an arbiter: either
unarchival gets to it first and sours the state for auto-offloading, or
auto-offloading shuts it down, which stops any parallel unarchival in
its tracks. The key part is not releasing the upload queue's lock
between the check whether the timeline is archived or not, and shutting
it down (the actual implementation only sets `shutting_down` but it has
the same effect on `initialized_mut()` as a full shutdown). The rest of
the patch is stuff that follows from this.

We also move the part where we set the state to `Stopping` to after that
arbiter has decided the fate of the timeline. For deletions, we do keep
it inside `DeleteTimelineFlow::prepare` however, so that it is called
with all of the the timelines locks held that the function allocates
(timelines lock most importantly). This is only a precautionary measure
however, as I didn't want to analyze deletion related code for possible
races.

## Future changes

It might make sense to move `can_offload` to right before the offload
attempt. Maybe some other properties might have changed as well.
Although this will not be perfect either as no lock is held. I want to
keep it out of this change to emphasize that this move wasn't the main
reason we are race free now.

Fixes #10220
2025-01-09 20:41:49 +00:00
Tristan Partin
49756a0d01 Implement compute_ctl management API in Axum (#10099)
This is a refactor to create better abstractions related to our
management server. It cleans up the code, and prepares everything for
authorized communication to and from the control plane.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-09 20:08:26 +00:00
Arpad Müller
99b5a6705f Update rust to 1.84.0 (#10328)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://releases.rs/docs/1.84.0/).

Prior update was in #9926.
2025-01-09 18:29:09 +00:00
Alexey Kondratov
f37eeb56ad fix(compute_ctl): Resolve issues with dropping roles having dangling permissions (#10299)
## Problem

In Postgres, one cannot drop a role if it has any dependent objects in
the DB. In `compute_ctl`, we automatically reassign all dependent
objects in every DB to the corresponding DB owner. Yet, it seems that it
doesn't help with some implicit permissions. The issue is reproduced by
installing a `postgis` extension because it creates some views and
tables in the public schema.

## Summary of changes

Added a repro test without using a `postgis`: i) create a role via
`compute_ctl` (with `neon_superuser` grant); ii) create a test role, a
table in schema public, and grant permissions via the role in
`neon_superuser`.

To fix the issue, I added a new `compute_ctl` code that removes such
dangling permissions before dropping the role. It's done in the least
invasive way, i.e., only touches the schema public, because i) that's
the problem we had with PostGIS; ii) it creates a smaller chance of
messing anything up and getting a stuck operation again, just for a
different reason.

Properly, any API-based catalog operations should fail gracefully and
provide an actionable error and status code to the control plane,
allowing the latter to unwind the operation and propagate an error
message and hint to the user. In this sense, it's aligned with another
feature request https://github.com/neondatabase/cloud/issues/21611

Resolve neondatabase/cloud#13582
2025-01-09 16:39:53 +00:00
Arpad Müller
bebc46e713 Add scheduling_policy column to safekeepers table (#10205)
Add a `scheduling_policy` column to the safekeepers table of the storage
controller.

Part of #9981
2025-01-09 15:55:02 +00:00
John Spray
ad51622568 remote_storage: enable Azure connection pooling by default (#10324)
## Problem

Initially we defaulted this to zero to reduce risk. We have now been
using pooling in staging for some time without issues, so let's make it
the default for anyone using this software without setting the config
explicitly.

Closes: https://github.com/neondatabase/cloud/issues/20971

## Summary of changes

- Set Azure blob storage connection pool size to 8 by default
2025-01-09 15:34:06 +00:00
John Spray
ac6cca17ac storcon: don't log a heartbeat error during shutdown (#10325)
## Problem

Occasionally we see an unexpected error like:
```
ERROR spawn_heartbeat_driver: Failed to update node state 1 after heartbeat round: Shutting down\n')
Hint: use scripts/check_allowed_errors.sh to test any new allowed_error you add
```


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10324/12690404952/index.html#/testresult/63406a0687bf6eca

## Summary of changes

- Explicitly handle ApiError::ShuttingDown as a no-op when mutating node
status
2025-01-09 15:33:44 +00:00
Alex Chi Z.
640ac4fc9e fix(pageserver): report timestamp is in the past if the key is missing (#10210)
## Problem

If for some reasons we already garbage-collected the data under an LSN
but the caller uses a past LSN for the find_time_cutoff function, now we
will report a missing key error and GC will never proceed.

Note that missing key error can also happen if the key is really missing
(i.e., during the past offload incidents)

## Summary of changes

Make sure GC proceeds by bumping the LSN. When time_cutoff=None, we will
not increase the time_cutoff (it will be set to latest_gc_cutoff). If we
really need to bump the GC LSN for maintenance purpose, we need a
separate API to do that.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-09 14:43:20 +00:00
Konstantin Knizhnik
20c40eb733 Add response tag to getpage request in V3 protocol version (#8686)
## Problem

We have several serious data corruption incidents caused by mismatch of
get-age requests:
https://neondb.slack.com/archives/C07FJS4QF7V/p1723032720164359

We hope that the problem is fixed now. But it is better to prevent such
kind of problems in future.

Part of https://github.com/neondatabase/cloud/issues/16472

## Summary of changes

This PR introduce new V3 version of compute<->pageserver protocol,
adding tag to getpage response.
So now compute is able to check if it really gets response to the
requested page.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2025-01-09 13:12:04 +00:00
Vlad Lazar
f4739d49e3 pageserver: tweak interpreted ingest record metrics (#10291)
## Problem
The filtered record metric doesn't make sense for interpreted ingest. 

## Summary of changes
While of dubious utility in the first place, this patch replaces them
with records received and records observed metrics for interpreted
ingest:
* received records cause the pageserver to do _something_: write a key,
value pair to storage, update some metadata or flush pending
modifications
* observed records are a shard 0 concept and contain only key metadata
used in tracking relation sizes (received records include observed
records)
2025-01-09 12:31:02 +00:00
Arseny Sher
030ab1c0e8 TLA+ spec for safekeeper membership change (#9966)
## Problem

We want to define the algorithm for safekeeper membership change.

## Summary of changes

Add spec for it, several models and logs of checking them.

ref https://github.com/neondatabase/neon/issues/8699
2025-01-09 12:26:17 +00:00
John Spray
5baa4e7f0a docker: don't set LD_LIBRARY_PATH (#10321)
## Problem

This was causing storage controller to still use neon-built libpq
instead of vanilla libpq.

Since https://github.com/neondatabase/neon/pull/10269 we have a vanilla
postgres in the system path -- anything that wants a postgres library
will use that.

## Summary of changes

- Remove LD_LIBRARY_PATH assignment in Dockerfile
2025-01-09 11:47:55 +00:00
Tristan Partin
5b2751397d Refactor MigrationRunner::run_migrations() to call a helper (#10232)
This will make it easier to add per-db migrations, such as that for
CVE-2024-4317.

Link: https://www.postgresql.org/support/security/CVE-2024-4317/
Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-09 07:05:07 +00:00
Ivan Efremov
fcfff72454 impr(proxy): Decouple ip_allowlist from the CancelClosure (#10199)
This PR removes the direct dependency of the IP allowlist from
CancelClosure, allowing for more scalable and flexible IP restrictions
and enabling the future use of Redis-based CancelMap storage.

Changes:
- Introduce a new BackendAuth async trait that retrieves the IP
allowlist through existing authentication methods;
- Improve cancellation error handling by instrument() async
cancel_sesion() rather than dropping it.
- Set and store IP allowlist for SCRAM Proxy to consistently perform IP
allowance check
 
 Relates to #9660
2025-01-08 19:34:53 +00:00
Anastasia Lubennikova
0ad0db6ff8 compute: dropdb DROP SUBSCRIPTION fix (#10066)
## Problem
Project gets stuck if database with subscriptions was deleted via API /
UI.

https://github.com/neondatabase/cloud/issues/18646

## Summary of changes
Before dropping the database, drop all the subscriptions in it.
Do not drop slot on publisher, because we have no guarantee that the
slot still exists or that the publisher is reachable.

Add `DropSubscriptionsForDeletedDatabases` phase to run these operations
in all databases, we're about to delete.
Ignore the error if the database does not exist.
2025-01-08 18:55:04 +00:00
John Spray
68d8acfd05 storage controller: don't hold detached tenants in memory (#10264)
## Problem

Typical deployments of neon have some tenants that stay in use
continuously, and a background churning population of tenants that are
created and then fall idle, and are configured to Detached state.
Currently, this churn of short lived tenants results in an
ever-increasing memory footprint.

Closes: https://github.com/neondatabase/neon/issues/9712

## Summary of changes

- At startup, filter to only load shards that don't have Detached policy
- In process_result, check if a tenant's shards are all Detached and
observed=={}, and if so drop them from memory
- In tenant_location_conf and other tenant mutators, load the tenants'
shards on-demand if they are not present
2025-01-08 18:12:09 +00:00
Vlad Lazar
dc284247a5 storage_controller: fix node flap detach race (#10298)
## Problem

The observed state removal may race with the inline updates of the
observed state done from `Service::node_activate_reconcile`.

This was intended to work as follows:
1. Detaches while the node is unavailable remove the entry from the
   observed state.
2. `Service::node_activate_reconcile` diffs the locations returned
   by the pageserver with the observed state and detaches in-line
   when required.

## Summary of changes

This PR removes step (1) and lets background reconciliations
deal with the mismatch between the intent and observed state.
A follow up will attempt to remove `Service::node_activate_reconcile`
altogether.

Closes https://github.com/neondatabase/neon/issues/10253
2025-01-08 10:26:53 +00:00
Alex Chi Z.
5c76e2a983 fix(storage-scrubber): ignore errors if index_part is not consistent (#10304)
## Problem

Consider the pageserver is doing the following sequence of operations:

* upload X files
* update index_part to add X and remove Y
* delete Y files

When storage scrubber obtains the initial timeline snapshot before
"update index_part" (that is the old version that contains Y but not X),
and then obtains the index_part file after it gets updated, it will
report all Y files are missing.

## Summary of changes

Do not report layer file missing if index_part listed and downloaded are
not the same (i.e. different last_modified times)

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-07 23:24:17 +00:00
Erik Grinaker
237dae71a1 Revert "pageserver,safekeeper: disable heap profiling (#10268)" (#10303)
This reverts commit b33299dc37.

Heap profiles weren't the culprit after all.

Touches #10225.
2025-01-07 22:49:00 +00:00
Fedor Dikarev
43a5e575d6 ci: use reusable workflow for MacOs build (#9889)
Closes: https://github.com/neondatabase/cloud/issues/17784

## Problem
Currently, we run the whole CI pipeline for any changes. It's slow and
expensive.

## Suggestion
Starting with MacOs builds:
- check what files were changed
- rebuild only needed parts
- reuse results from previous builds when available
- run builds in parallel when possible

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-01-07 20:00:56 +00:00
Folke Behrens
0a117fb1f1 proxy: Parse Notification twice only for unknown topic (#10296)
## Problem

We currently parse Notification twice even in the happy path.

## Summary of changes

Use `#[serde(other)]` to catch unknown topics and defer the second
parsing.
2025-01-07 15:24:54 +00:00
JC Grünhage
4aa9786c6b Fix promote-images-prod after splitting it out (#10292)
## Problem
`promote-images` was split into `promote-images-dev` and
`promote-images-prod` in
https://github.com/neondatabase/neon/pull/10267.

`dev` credentials were loaded in `promote-images-dev` and `prod`
credentials were loaded in `promote-images-prod`, but
`promote-images-prod` needs `dev` credentials as well to access the
`dev` images to replicate them from `dev` to `prod`.

## Summary of changes
Load `dev` credentials in `promote-images-prod` as well.
2025-01-07 13:45:18 +00:00
Matthias van de Meent
be38123e62 Fix accounting of dropped prefetched GetPage requests (#10276)
Apparently, we failed to do this bookkeeping in quite a few places...

## Problem

Fixes https://github.com/neondatabase/cloud/issues/22364

## Summary of changes

Add accounting of dropped requests. Note that this includes prefetches
dropped due to things like "PS connection dropped unexpectedly" or
"prefetch queue is already full", but *not* (yet?) "dropped due to
backend shutdown".
2025-01-07 10:41:52 +00:00
JC Grünhage
ea84ec357f Split promote-images into promote-images-dev and promote-images-prod (#10267)
## Problem
`trigger-e2e-tests` waits half an hour before starting to run. Nearly
half of that time can be saved by promoting images before tests on them
are complete, so the e2e tests can run in parallel.

On `main` and `release{,-proxy,-compute}`, `promote-images` updates
`latest` and pushes things to prod ecr, so we want to run
`promote-images` only after `test-images` is done, but on other
branches, there is no harm in promoting images that aren't tested yet.

## Summary of changes

To promote images into dev container registries sooner, `promote-images`
is split into `promote-images-dev` and `promote-images-prod`. The former
pushes to dev container registries, the latter to prod ones. The latter
also waits for `test-images`, while the former doesn't. This allows to
run `trigger-e2e-tests` sooner.
2025-01-07 10:36:05 +00:00
Matthias van de Meent
30863c0104 libpagestore: timeout = max(0, difference), not min(0, difference) (#10274)
Using `min(0, ...)` causes us to fail to wait in most situations, so a
lack of data would be a hot wait loop, which is bad.

## Problem

We noticed high CPU usage in some situations
2025-01-07 09:07:38 +00:00
Alexander Bayandin
02f81b6469 Fix clippy warning on macOS (#10282)
## Problem

On macOS:

```
error: unused variable: `disable_lfc_resizing`
   --> compute_tools/src/bin/compute_ctl.rs:431:9
    |
431 |         disable_lfc_resizing,
    |         ^^^^^^^^^^^^^^^^^^^^ help: try ignoring the field: `disable_lfc_resizing: _`
    |
    = note: `-D unused-variables` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(unused_variables)]`
```

## Summary of changes
- Initialise `disable_lfc_resizing` only on Linux (because it's used on
Linux only in further bloc)
2025-01-06 20:28:33 +00:00
Alexander Bayandin
ad7f14d526 test_runner: update packages for Python 3.13 (#10285)
## Problem

It's impossible to run regression tests with Python 3.13 as some
dependencies don't support it (some of them are outdated, and `jsonnet`
doesn't support it at all yet)

## Summary of changes
- Update dependencies for Python 3.13
- Install `jsonnet` only on Python < 3.13 and skip relevant tests on
Python 3.13

Closes #10237
2025-01-06 20:25:31 +00:00
Erik Grinaker
b342a02b1c Dockerfile: build with force-frame-pointers=yes (#10286)
See
https://github.com/neondatabase/neon/pull/10226#issuecomment-2573725182.
2025-01-06 20:17:43 +00:00
Alex Chi Z.
4a6556e269 fix(pageserver): ensure GC computes time cutoff using the same start time (#10193)
## Problem

close https://github.com/neondatabase/neon/issues/10192

## Summary of changes

* `find_gc_time_cutoff` takes `now` parameter so that all branches
compute the cutoff based on the same start time, avoiding races.
* gc-compaction uses a single `get_gc_compaction_watermark` function to
get the safe LSN to compact.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-01-06 19:29:18 +00:00
Erik Grinaker
95f1920231 cargo: build with frame pointers (#10226)
## Problem

Frame pointers are typically disabled by default (depending on CPU
architecture), to improve performance. This frees up a CPU register, and
avoids a couple of instructions per function call. However, it makes
stack unwinding much more inefficient, since it has to use DWARF debug
information instead, and gives worse results with e.g. `perf` and eBPF
profiles. The `backtrace` implementation of `libunwind` is also
suspected to cause seg faults.

The performance benefit of frame pointer omission doesn't appear to
matter that much on modern 64-bit CPU architectures (which have plenty
of registers and optimized instruction execution), and benchmarks did
not show measurable overhead.

The Rust standard library and jemalloc already enable frame pointers by
default.

For more information, see
https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html.

Resolves #10224.
Touches #10225.

## Summary of changes

Enable frame pointers in all builds, and use frame pointers for pprof-rs
stack sampling.
2025-01-06 17:27:08 +00:00
Conrad Ludgate
fda52a0005 feat(proxy): dont trigger error alerts for unknown topics (#10266)
## Problem

Before the holidays, and just before our code freeze, a change to cplane
was made that started publishing the topics from #10197. This triggered
our alerts and put us in a sticky situation as it was not an error, and
we didn't want to silence the alert for the entire holidays, and we
didn't want to release proxy 2 days in a row if it was not essential.

We fixed it eventually by rewriting the alert based on logs, but this is
not a good solution.

## Summary of changes

Introduces an intermediate parsing step to check the topic name first,
to allow us to ignore parsing errors for any topics we do not know
about.
2025-01-06 13:05:35 +00:00
Busra Kugler
406cca643b Update neon_fixtures.py - remove logs (#10219)
We need to remove this line to prevent aws keys exposing in the public
s3 buckets
2025-01-06 10:44:23 +00:00
dependabot[bot]
b368e62cfc build(deps): bump jinja2 from 3.1.4 to 3.1.5 in the pip group (#10236)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-04 15:40:50 +00:00
John Spray
4b2f56862d docker: include vanilla debian postgres client (#10269)
## Problem

We are chasing down segfaults in the storage controller
https://github.com/neondatabase/cloud/issues/21010

This is for use by the storage controller, which links dynamically with
`libpq`. We currently use the neon-built libpq, but this may be unsafe
for use from multi-threaded programs like the controller, as it uses a
statically linked openssl

Precursor to https://github.com/neondatabase/neon/pull/10258

## Summary of changes

- Include `postgresql-15` in container builds.

The reason for using version 15 is simply because that is what's
available in Debian 12 without adding any extra repositories, and we
don't have any special need for latest version in our libpq usage.
2025-01-03 16:16:04 +00:00
Erik Grinaker
a77e87a48a pageserver: assert that uploads don't modify indexed layers (#10228)
## Problem

It's not legal to modify layers that are referenced by the current layer
index. Assert this in the upload queue, as preparation for upload queue
reordering.

Touches #10096.

## Summary of changes

Add a debug assertion that the upload queue does not modify layers
referenced by the current index.

I could be convinced that this should be a plain assertion, but will be
conservative for now.
2025-01-03 16:03:19 +00:00
Erik Grinaker
1393cc668b Revert "pageserver: revert flush backpressure (#8550) (#10135)" (#10270)
This reverts commit f3ecd5d76a.

It is
[suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759)
to have caused significant read amplification in the [ingest
benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs)
(specifically during index creation).

We will revisit an intermediate improvement here to unblock [upload
parallelism](https://github.com/neondatabase/neon/issues/10096) before
properly addressing [compaction
backpressure](https://github.com/neondatabase/neon/issues/8390).
2025-01-03 15:38:51 +00:00
Erik Grinaker
b33299dc37 pageserver,safekeeper: disable heap profiling (#10268)
## Problem

Since enabling continuous profiling in staging, we've seen frequent seg
faults. This is suspected to be because jemalloc and pprof-rs take a
stack trace at the same time, and the handlers aren't signal safe.
jemalloc does this probabilistically on every allocation, regardless of
whether someone is taking a heap profile, which means that any CPU
profile has a chance to cause a seg fault.

Touches #10225.

## Summary of changes

For now, just disable heap profiles -- CPU profiles are more important,
and we need to be able to take them without risking a crash.
2025-01-03 15:21:31 +00:00
John Spray
e9d30edc7f pageserver: fix a 500 during timeline creation + shutdown (#10259)
## Problem

The test_create_churn_during_restart test fails if timeline creation
calls return 500 errors (because the API shouldn't do it), and it's
sometimes failing, for example:

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/3ce2e7045465012e

## Summary of changes

- Avoid handling UploadQueueShutDownOrStopped case as an Other (i.e.
500)
2025-01-03 13:13:22 +00:00
Arpad Müller
1303cd5d05 Fix defusing race between Tenant::shutdown and offload_timeline (#10150)
There is a race condition between `Tenant::shutdown`'s `defuse_for_drop`
loop and `offload_timeline`, where timeline offloading can insert into a
tenant that is in the process of shutting down, in fact so far
progressed that the `defuse_for_drop` has already been called.

This prevents warn log lines of the form:

```
offloaded timeline <hash> was dropped without having cleaned it up at the ancestor
```

The solution piggybacks on the `offloaded_timelines` lock: both the
defuse loop and the offloaded timeline insertion need to acquire the
lock, and we know that the defuse loop only runs after the tenant has
set its `TenantState` to `Stopping`.

So if we hold the `offloaded_timelines` lock, and know that the
`TenantState` is not `Stopping`, then we know that the defuse loop has
not ran yet, and holding the lock ensures that it doesn't start running
while we are inserting the offloaded timeline.

Fixes #10070
2025-01-03 12:36:01 +00:00
John Spray
c08759f367 storcon: verbose logs in rare case of shards not attached yet (#10262)
## Problem

When we do a timeline CRUD operation, we check that the shards we need
to mutate are currently attached to a pageserver, by reading
`generation` and `generation_pageserver` from the database.

If any don't appear to be attached, we respond with a a 503 and "One or
more shards in tenant is not yet attached".

This is happening more often than expected, and it's not obvious with
current logging what's going on: specifically which shard has a problem,
and exactly what we're seeing in these persistent generation columns.

(Aside: it's possible that we broke something with the change in #10011
which clears generation_pageserver when we detach a shard, although if
so the mechanism isn't trivial: what should happen is that if we stamp
on generation_pageserver if a reconciler is running, then it shouldn't
matter because we're about to

## Summary of changes

- When we are in Attached mode but find that
generation_pageserver/generation are unset, output details while looping
over shards.
2025-01-03 10:55:15 +00:00
John Spray
ba9722a2fd tests: add upload wait in test_scrubber_physical_gc_ancestors (#10260)
## Problem

We see periodic failures in `test_scrubber_physical_gc_ancestors`, where
the logs show that the pageserver is creating image layers that should
cause child shards to no longer reference their parents' layers, but
then the scrubber runs and doesn't find any unreferenced layers.[


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/78ea06dea6ba8dd3

From inspecting the code & test, it seems like this could be as simple
as the test failing to wait for uploads before running the scrubber. It
had a 2 second delay built in to satisfy the scrubbers time threshold
checks, which on a lightly loaded machine would also have been easily
enough for uploads to complete, but our test machines are more heavily
loaded all the time.

## Summary of changes

- Wait for uploads to complete after generating images layers in
test_scrubber_physical_gc_ancestors, so that the scrubber should
reliably see the post-compaction metadata.
2025-01-03 10:55:07 +00:00
John Spray
2d4f267983 cargo: update diesel, pq-sys (#10256)
## Problem

Versions of `diesel` and `pq-sys` were somewhat stale. I was checking on
libpq->openssl versions while investigating a segfault via
https://github.com/neondatabase/cloud/issues/21010. I don't think these
rust bindings are likely to be the source of issues, but we might as
well freshen them as a precaution.

## Summary of changes

- Update diesel to 2.2.6
- Update pq-sys to 0.6.3
2025-01-03 10:20:18 +00:00
Ivan Efremov
7a598b9842 [proxy/docs]imprv: Add local testing section to proxy README (#10230)
Add commands to run proxy locally with the mocked control plane
2025-01-03 10:04:58 +00:00
Tristan Partin
eefad27538 Inline various migration queries (#10231)
There was no value in saving them off to temporary variables.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-02 22:12:56 +00:00
Em Sharnoff
cd10c719f9 compute: Add spec support for disabling LFC resizing (#10132)
ref neondatabase/cloud#21731

## Problem

When we manually override the LFC size for particular computes,
autoscaling will typically undo that because vm-monitor will resize LFC
itself.

So, we'd like a way to make vm-monitor not set LFC size — this actually
already exists, if we just don't give vm-monitor a postgres connection
string.

## Summary of changes

Add a new field to the compute spec, `disable_lfc_resizing`. When set to
`true`, we pass in `None` for its postgres connection string. That
matches the configuration tested in `neondatabase/autoscaling` CI.
2025-01-02 19:45:59 +00:00
Tristan Partin
363ea97f69 Add more substantial tests for compute migrations (#9811)
The previous tests really didn't do much. This set should be quite a bit
more encompassing.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-02 18:37:50 +00:00
Conrad Ludgate
56e6ebfe17 chore: building compute_tools and local_proxy together (#10257)
## Problem

Building local_proxy and compute_tools features the same dependency
tree, but as they are currently built in separate clean layers all that
progress is wasted. For our arm builds that's an extra 10 minutes.

## Summary of changes

Combines the compute_tools and local_proxy build layers.
2025-01-02 16:05:14 +00:00
Raphael 'kena' Poss
1622fd8bda proxy: recognize but ignore the 3 new redis message types (#10197)
## Problem

https://neondb.slack.com/archives/C085MBDUSS2/p1734604792755369

## Summary of changes

Recognize and ignore the 3 new broadcast messages:
- `/block_public_or_vpc_access_updated`
- `/allowed_vpc_endpoints_updated_for_org`
- `/allowed_vpc_endpoints_updated_for_projects`
2025-01-02 16:02:48 +00:00
Konstantin Knizhnik
8c7dcd2598 Set heartbeat interval for chaos test (#10222)
## Problem

See https://neondb.slack.com/archives/C033RQ5SPDH/p1734707873215729

test_timeline_archival_chaos becomes more flaky with increased heartbeat
interval

Resolves #10250.

## Summary of changes

Override heatbeat interval for `test_timelirn_archival_chaos.py`

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-02 14:14:18 +00:00
Folke Behrens
ee22d4c9ef proxy: Set TCP_NODELAY for compute connections (#10240)
neondatabase/cloud#19184
2025-01-02 13:32:24 +00:00
JC Grünhage
26600f2973 Skip running clippy without default features (#10098)
## Problem

Running clippy with `cargo hack --feature-powerset` in CI isn't
particularly fast. This PR follows-up on
https://github.com/neondatabase/neon/pull/8912 to improve the speed of
our clippy runs.

Parallelism as suggested in
https://github.com/neondatabase/neon/issues/9901 was tested, but didn't
show consistent enough improvements to be worth it. It actually
increased the amount of work done, as there's less cache hits when
clippy runs are spread out over multiple target directories.
Additionally, parallelism makes it so caching needs to be thought about
more actively and copying around target directories to enable
parallelism eats the rest of the performance gains from parallel
execution.

After some discussion, the decision was to instead cut down on the
number of jobs that are running further. The easiest way to do this is
to not run clippy *without* default features. The list of default
features is empty for all crates, and I haven't found anything using
`cfg(feature = "default")` either, so this is likely not going to change
anything except speeding the runs up.

## Summary of changes

Reduce the amount of feature combinations tried by `cargo hack` (as
suggested in
https://github.com/neondatabase/neon/pull/8912#pullrequestreview-2286482368)
by never disabling default features.

## Alternatives

- We can split things out into different jobs which reduces the time
until everything is finished by running more things in parallel. This
does however decreases the amount of cache hits and increases the amount
of time spent on overhead tasks like repo cloning and restoring caches
by doing those multiple times instead of once.
- We could replace `cargo hack [...] clippy` with `cargo clippy [...];
cargo clippy --features testing`. I'm not 100% sure how this compares to
the change here in the PR, but it does seem to run a bit faster. That
likely means it's doing less work, but without understanding what
exactly we loose by that I'd rather not do that for now. I'd appreciate
input on this though.
2025-01-02 11:33:42 +00:00
Konstantin Knizhnik
b3cd883f93 Unlock LFC mutex when LFC cache is disabled (#10235)
## Problem

See https://github.com/neondatabase/neon/issues/10233
`lfc_containsv` returns with holding lock when LFC was disabled.
This bug was introduced in commit  78938d1b59

## Summary of changes

Release lock before return.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-02 11:28:15 +00:00
Conrad Ludgate
38c7a2abfc chore(proxy): pre-load native tls certificates and propagate compute client config (#10182)
Now that we construct the TLS client config for cancellation as well as
connect, it feels appropriate to construct the same config once and
re-use it elsewhere. It might also help should #7500 require any extra
setup, so we can easily add it to all the appropriate call sites.
2025-01-02 09:36:13 +00:00
Conrad Ludgate
f94248a594 chore(libs/proxy): refactor tokio-postgres connection control flow (#10247)
In #10207 it was clear there was some confusion with the current
connection logic. To analyse the flow to make sure there was no poll
stalling, I ended up with the following refactor.

Notable changes:
1. Now all functions called `poll_xyz` and that have a `cx: &mut
Context` argument must return a `Poll<_>` type, and can only return
`Pending` iff an internal poll call also returned `Pending`
2. State management is handled entirely by `poll_messages`. There are
now only 2 states which makes it much easier to keep track of.

Each commit should be self-reviewable and should be simple to verify
that it keeps the same behaviour
2025-01-02 09:35:28 +00:00
Alex Chi Z.
9c53b41245 fix(pageserver): update remote latest_gc_cutoff after gc-compaction (#10209)
## Problem

close https://github.com/neondatabase/neon/issues/10208
part of #9114 

## Summary of changes

* Ensure remote `latest_gc_cutoff` is up-to-date before removing any
files for gc-compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-19 18:40:20 +00:00
Konstantin Knizhnik
197a89ab3d Increase default stotrage controller heartbeat interval from 100msec … (#10206)
## Problem

Currently default value of storage controller heartbeat interval is
100msec. It means that 10 times per second it establish connection to
PS. And it seems to be quite expensive.
At MacOS right now storage_controller consumes 70% CPU and trusts - 30%.
So together they completely utilize one core.
A lot of us has Macs. Let's save environment a little bit and do not
waste electricity and contribute to global warming.

By the way, on prod we have interval  10seconds 

## Summary of changes

Increase heartbeat interval from 100msec to 1 second.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-19 18:32:32 +00:00
Alex Chi Z.
b89e02f3e8 fix(pageserver): consider partial compaction layer map in layer check (#10044)
## Problem

In https://github.com/neondatabase/neon/pull/9897 we temporarily
disabled the layer valid check because the current one only considers
the end result of all compaction algorithms, but partial gc-compaction
would temporarily produce an "invalid" layer map.

part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

Allow LSN splits to overlap in the slow path check. Currently, the valid
check is only used in storage scrubber (background job) and during
gc-compaction (without taking layer lock). Therefore, it's fine for such
checks to be a little bit inefficient but more accurate.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-12-19 18:04:53 +00:00
Konstantin Knizhnik
04517c6ff3 Do not reload config file on PS reconnect (#10204)
## Problem

See https://github.com/neondatabase/neon/issues/10184
and
https://neondb.slack.com/archives/C04DGM6SMTM/p1733997259898819

Reloading config file inside parallel worker cause it's termination

## Summary of changes

Remove call of `HandleMainLoopInterrupts()` 
Update of page server URL is propagated by postmaster through shared
memory and we should not reload config for it.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-19 15:22:39 +00:00
Vlad Lazar
628451d68e safekeeper: short-circuit interpreted wal sender (#10202)
## Problem

Safekeeper may currently send a batch to the pageserver even if it
hasn't decoded a new record.
I think this is quite unlikely in the field, but worth adressing.

## Summary of changes

Don't send anything if we haven't decoded a full record. Once this
merges and releases, the `InterpretedWalRecords` struct can be updated
to remove the Option wrapper for `next_record_lsn`.
2024-12-19 14:04:46 +00:00
Vlad Lazar
502d512fe2 safekeeper: lift benchmarking utils into safekeeper crate (#10200)
## Problem

The benchmarking utilities are also useful for testing. We want to write
tests in the safekeeper crate.

## Summary of changes

This commit lifts the utils to the safekeeper crate. They are compiled
if the benchmarking features is enabled or if in test mode.
2024-12-19 14:04:42 +00:00
John Spray
afda6d4700 storage_scrubber: don't report half-created timelines as corruption (#10198)
## Problem

test_timeline_archival_chaos does timeline creation with failure
injection, and thereby sometimes leaves timelines in a part created
state. This was being reported as corruption by the scrubber on test
teardown, because it considered a layer without an index to be an
invalid state. This was incorrect: the scrubber should accept this
state, it occurs legitimately during timeline creation.

Closes: https://github.com/neondatabase/neon/issues/9988

## Summary of changes

- Report a timeline with layers but no index as Relic rather than
MissingIndexPart.
- We retain the MissingIndexPart variant for the case where an index
_was_ found in the listing, but was not found by a subsequent GET, i.e.
racing with deletion.
2024-12-19 12:55:05 +00:00
John Spray
65042cbadd tests: use high IO concurrency in test_pgdata_import_smoke, use effective_io_concurrency=2 in tests by default (#10114)
## Problem

`test_pgdata_import_smoke` writes two gigabytes of pages and then reads
them back serially. This is CPU bottlenecked and results in a long
runtime, and sensitivity to CPU load from other tests on the same
machine.

Closes: https://github.com/neondatabase/neon/issues/10071

## Summary of changes

- Use effective_io_concurrency=32 when doing sequential scans through
2GiB of pages in test_pgdata_import_smoke. This is a ~10x runtime
decrease in the parts of the test that do sequential scans.
- Also set `effective_io_concurrency=2` for tests, as I noticed while
debugging that we were doing all getpage requests serially, which is bad
for checking the stability of the batching code.
2024-12-19 10:58:49 +00:00
Folke Behrens
b135194090 proxy: Delay SASL complete message until auth is done (#10189)
The final SASL complete message can be bundled with the remainder of the
auth flow messages until ReadyForQuery.

neondatabase/cloud#19184
2024-12-19 10:37:08 +00:00
Peter Bendel
43dc03459d Run pgbench on 10 GB scale factor on database with n relations (e.g. 10k) (#10172)
## Problem

We want to verify how much / if pgbench throughput and latency on Neon
suffers if the database contains many other relations, too.

## Summary of changes

Modify the benchmarking.yml pgbench-compare job to
- create an addiitional project at scale factor 10 GiB
- before running pgbench add n tables (initially 10k) to the database
- then compare the pgbench throughput and latency to the existing
pgbench-compare at 10 Gib scale factor

We use a realistic template for the n relations that is a partitioned
table with some realistic data types, indexes and constraints - similar
to a table that we use internally.

Example run:
https://github.com/neondatabase/neon/actions/runs/12377565956/job/34547386959
2024-12-19 10:25:44 +00:00
Christian Schwarz
a1b0558493 fast import: importer: use aws s3 cli (#10162)
## Problem

s5cmd doesn't pick up the pod service account

```
2024/12/16 16:26:01 Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed. <nil>
ERROR "ls s3://neon-dev-bulk-import-us-east-2/import-pgdata/fast-import/v1/br-wandering-hall-w2xobawv": NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
```

## Summary of changes

Switch to offical CLI.


## Testing

Tested the pre-merge image in staging, using `job_image` override in
project settings.


https://neondb.slack.com/archives/C033RQ5SPDH/p1734554944391949?thread_ts=1734368383.258759&cid=C033RQ5SPDH

## Future Work

Switch back to s5cmd once https://github.com/peak/s5cmd/pull/769 gets
merged.

## Refs

- fixes https://github.com/neondatabase/cloud/issues/21876

---------

Co-authored-by: Gleb Novikov <NanoBjorn@users.noreply.github.com>
2024-12-19 10:04:17 +00:00
Alex Chi Z.
cc138b56f9 fix(pageserver): run psql in thread to avoid blocking (#10177)
## Problem

ref https://github.com/neondatabase/neon/issues/10170
ref https://github.com/neondatabase/neon/issues/9994

The psql command will block the main thread, causing other async tasks
to timeout (i.e., HTTP connect). Therefore, we need to move it to an I/O
executor thread.

## Summary of changes

* run psql connection in a thread

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
2024-12-19 09:45:06 +00:00
Konstantin Knizhnik
61fcf64c22 Fix flukyness of test_physical_and_logical_replicaiton.py (#10176)
## Problem

See https://github.com/neondatabase/neon/issues/10037
test_physical_and_logical_replication.py sometimes failed.

## Summary of changes

Add `wait_replica_caughtup` to wait for replica sync

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-18 19:15:38 +00:00
Alex Chi Z.
6d3e8096fc refactor(test): tighten up test_gc_feedback (#10126)
## Problem

In https://github.com/neondatabase/neon/pull/8103 we changed the test
case to have more test coverage of gc_compaction. Now that we have
`test_gc_compaction_smoke`, we can revert this test case to serve its
original purpose and revert the parameter changes.

part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

* Revert pitr_interval from 60s to 10s.
* Assert the physical/logical size ratio in the benchmark.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-12-18 18:10:05 +00:00
Alex Chi Z.
3d1c3a80ae feat(pageserver): add compact queue http endpoint (#10173)
## Problem

We cannot get the size of the compaction queue and access the info.

Part of #9114 

## Summary of changes

* Add an API endpoint to get the compaction queue.
* gc_compaction test case now waits until the compaction finishes.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-18 18:09:02 +00:00
John Spray
835287ba3a neon_local: add a flock to protect against concurrent execution (#10185)
## Problem

`neon_local` has always been unsafe to run concurrently with itself: it
uses simple text files for persistent state, and concurrent runs will
step on each other.

In some test environments we intentionally handle this with mutexes in
python land, but it's fragile to try and always remember to do that.

## Summary of changes

- Add a `flock` based mutex around the `main` function of neon_local,
using the repo directory as the file to lock
- Clean up an Option<> around control_plane_api, this is a drive-by
change because it was one of the fields that had a weird effect when
previous concurrent stuff stamped on it.
2024-12-18 16:29:47 +00:00
Conrad Ludgate
d63602cc78 chore(proxy): fully remove allow-self-signed-compute flag (#10168)
When https://github.com/neondatabase/cloud/pull/21856 is merged, this
flag is no longer necessary.
2024-12-18 16:03:14 +00:00
Erik Grinaker
1668d39b7c safekeeper: fix typo in allowlist for /profile/heap (#10186) 2024-12-18 15:51:53 +00:00
Alex Chi Z.
1d12efc428 fix(pageserver): allow repartition errors during gc-compaction smoke tests (#10164)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

In https://github.com/neondatabase/neon/pull/10127 we fixed the race,
but we didn't add the errors to the allowlist.

## Summary of changes

* Allow repartition errors in the gc-compaction smoke test.

I think it might be worth to refactor the code to allow multiple threads
getting a copy of repartition status (i.e., using Rcu) in the future.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-18 15:37:26 +00:00
Arpad Müller
85696297c5 Add safekeepers command to storcon_cli for listing (#10151)
Add a `safekeepers` subcommand to `storcon_cli` that allows listing the
safekeepers.

```
$ curl -X POST --url http://localhost:1234/control/v1/safekeeper/42 --data \
  '{"active":true, "id":42, "created_at":"2023-10-25T09:11:25Z", "updated_at":"2024-08-28T11:32:43Z","region_id":"neon_local","host":"localhost","port":5454,"http_port":0,"version":123,"availability_zone_id":"us-east-2b"}'
$ cargo run --bin storcon_cli  -- --api http://localhost:1234 safekeepers
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.38s
     Running `target/debug/storcon_cli --api 'http://localhost:1234' safekeepers`
+----+---------+-----------+------+-----------+------------+
| Id | Version | Host      | Port | Http Port | AZ Id      |
+==========================================================+
| 42 | 123     | localhost | 5454 | 0         | us-east-2b |
+----+---------+-----------+------+-----------+------------+
```

Also:

* Don't return the raw `SafekeeperPersistence` struct that contains the
raw database presentation, but instead a new
`SafekeeperDescribeResponse` struct.
* The `SafekeeperPersistence` struct leaves out the `active` field on
purpose because we want to deprecate it and replace it with a
`scheduling_policy` one.

Part of https://github.com/neondatabase/neon/issues/9981
2024-12-18 12:47:56 +00:00
Konstantin Knizhnik
aaf980f70d Online checkpoint replication state (#9976)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1733180965970089

Replication state is checkpointed only by shutdown checkpoint.
It means that replication snapshots are not removed till compute
shutdown.

## Summary of changes

Checkpoint replication state during online checkpoint

Related Postgres PR:
https://github.com/neondatabase/postgres/pull/546

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-18 09:34:38 +00:00
a-masterov
c52514ab02 Fix allure report creation on periodic pg_regress testing (#10171)
## Problem
The allure report finishes with the error `HttpError: Resource not
accessible by integration` while running the `pg_regress` test against a
cloud staging project due to a lack of permissions.
## Summary of changes
The permissions are added.
2024-12-17 20:47:44 +00:00
Conrad Ludgate
2ee6bc5ec4 chore(proxy): update vendored postgres libs to edition 2021 (#10139)
I ran `cargo fix --edition` in each project prior, and it found nothing
that needed fixing.
2024-12-17 20:06:18 +00:00
John Spray
fd230227f2 storcon: include preferred AZ in compute notifications (#9953)
## Problem

It is unreliable for the control plane to infer the AZ for computes from
where the tenant is currently attached, because if a tenant happens to
be in a degraded state or a release is ongoing while a compute starts,
then the tenant's attached AZ can be a different one to where it will
run long-term, and the control plane doesn't check back later to restart
the compute.

This can land in parallel with
https://github.com/neondatabase/neon/pull/9947

## Summary of changes

- Thread through the preferred AZ into the compute hook code via the
reconciler
- Include the preferred AZ in the body of compute hook notifications
2024-12-17 20:04:09 +00:00
Ivan Efremov
93e958341f [proxy]: Use TLS for cancellation queries (#10152)
## Problem
pg_sni_router assumes that all the streams are upgradable to TLS.
Cancellation requests were declined because of using NoTls config.

## Summary of changes
Provide TLS client config for cancellation requests.

Fixes
[#21789](https://github.com/orgs/neondatabase/projects/65/views/1?pane=issue&itemId=90911361&issue=neondatabase%7Ccloud%7C21789)
2024-12-17 19:26:54 +00:00
Tristan Partin
7dddbb9570 Add pg_repack extension (#10100)
Our solutions engineers and some customers would like to have this
extension available.

Link: https://github.com/neondatabase/cloud/issues/18890

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-17 18:36:55 +00:00
Erik Grinaker
a55853f67f utils: symbolize heap profiles (#10153)
## Problem

Jemalloc heap profiles aren't symbolized. This is inconvenient, and
doesn't work with Grafana Cloud Profiles.

Resolves #9964.

## Summary of changes

Symbolize the heap profiles in-process, and strip unnecessary cruft.

This uses about 100 MB additional memory to cache the DWARF information,
but I believe this is already the case with CPU profiles, which use the
same library for symbolization. With cached DWARF information, the
symbolization CPU overhead is negligible.

Example profiles:

*
[pageserver.pb.gz](https://github.com/user-attachments/files/18141395/pageserver.pb.gz)
*
[safekeeper.pb.gz](https://github.com/user-attachments/files/18141396/safekeeper.pb.gz)
2024-12-17 16:51:58 +00:00
Mikhail Kot
007b13b79a Don't build tests in compute image, use ninja (#10149)
Don't build tests in h3 and rdkit: ~15 min speedup.
Use Ninja as cmake generator where possible: ~10 min speedup.
Clean apt cache for smaller images: around 250mb size loss for
intermediate layers
2024-12-17 16:43:54 +00:00
Alexey Kondratov
2dfd3cab8c fix(compute): Report compute_backpressure_throttling_seconds as counter (#10125)
## Problem

It was reported as `gauge`, but it's actually a `counter`.

Also add `_total` suffix as that's the convention for counters.

The corresponding flux-fleet PR:
https://github.com/neondatabase/flux-fleet/pull/386
2024-12-17 16:14:07 +00:00
John Spray
b5833ef259 remote_storage: configurable connection pooling for ABS (#10169)
## Problem

The ABS SDK's default behavior is to do no connection pooling, i.e. open
and close a fresh connection for each request. Under high request rates,
this can result in an accumulation of TCP connections in TIME_WAIT or
CLOSE_WAIT state, and in extreme cases exhaustion of client ports.

Related: https://github.com/neondatabase/cloud/issues/20971

## Summary of changes

- Add a configurable `conn_pool_size` parameter for Azure storage,
defaulting to zero (current behavior)
- Construct a custom reqwest client using this connection pool size.
2024-12-17 12:24:51 +00:00
Erik Grinaker
b0e43c2f88 postgres_ffi: add WalStreamDecoder::complete_record() benchmark (#10158)
Touches #10097.
2024-12-17 10:35:00 +00:00
a-masterov
e226d7a3d1 Fix docker compose with PG17 (#10165)
## Problem
It's impossible to run docker compose with compute v17 due to `pg_anon`
extension which is not supported under PG17.
## Summary of changes
The auto-loading of `pg_anon` is disabled by default
2024-12-17 08:16:54 +00:00
Folke Behrens
aa7ab9b3ac proxy: Allow dumping TLS session keys for debugging (#10163)
## Problem

To debug issues with TLS connections there's no easy way to decrypt
packets unless a client has special support for logging the keys.

## Summary of changes

Add TLS session keys logging to proxy via `SSLKEYLOGFILE` env var gated
by flag.
2024-12-16 18:56:24 +00:00
Erik Grinaker
28ccda0a63 test_runner: ignore error in test_timeline_archival_chaos (#10161)
Resolves #10159.
2024-12-16 17:10:55 +00:00
Conrad Ludgate
59b7ff8988 chore(proxy): disallow unwrap and unimplemented (#10142)
As the title says, I updated the lint rules to no longer allow unwrap or
unimplemented.

Three special cases:
* Tests are allowed to use them
* std::sync::Mutex lock().unwrap() is common because it's usually
correct to continue panicking on poison
* `tokio::spawn_blocking(...).await.unwrap()` is common because it will
only error if the blocking fn panics, so continuing the panic is also
correct

I've introduced two extension traits to help with these last two, that
are a bit more explicit so they don't need an expect message every time.
2024-12-16 16:37:15 +00:00
Conrad Ludgate
2e4c9c5704 chore(proxy): remove allow_self_signed from regular proxy (#10157)
I noticed that the only place we use this flag is for testing console
redirect proxy. Makes sense to me to make this assumption more explicit.
2024-12-16 16:11:39 +00:00
Erik Grinaker
3d30a7a934 pageserver: make RemoteTimelineClient::schedule_index_upload infallible (#10155)
Remove an unnecessary `Result` and address a `FIXME`.
2024-12-16 15:54:47 +00:00
Conrad Ludgate
6565fd4056 chore: fix clippy lints 2024-12-06 (#10138) 2024-12-16 15:33:21 +00:00
Arseny Sher
c5e3314c6e Add test restarting compute at WAL page boundary (#10111)
## Problem

We've had similar test in test_logical_replication, but then removed it
because it wasn't needed to trigger LR related bug. Restarting at WAL
page boundary is still a useful test, so add it separately back.

## Summary of changes

Add the test.
2024-12-16 14:53:04 +00:00
Arseny Sher
1ed0e52bc8 Extract safekeeper http client to separate crate. (#10140)
## Problem

We want to use safekeeper http client in storage controller and
neon_local.

## Summary of changes

Extract it to separate crate. No functional changes.
2024-12-16 12:07:24 +00:00
Conrad Ludgate
24d6587914 chore(proxy): refactor self-signed config (#10154)
## Problem

While reviewing #10152 I found it tricky to actually determine whether
the connection used `allow_self_signed_compute` or not.

I've tried to remove this setting in the past:
* https://github.com/neondatabase/neon/pull/7884
* https://github.com/neondatabase/neon/pull/7437
* https://github.com/neondatabase/cloud/pull/13702

But each time it seems it is used by e2e tests

## Summary of changes

The `node_info.allow_self_signed_computes` is always initialised to
false, and then sometimes inherits the proxy config value. There's no
need this needs to be in the node_info, so removing it and propagating
it via `TcpMechansim` is simpler.
2024-12-16 11:15:25 +00:00
John Spray
ebcbc1a482 pageserver: tighten up code around SLRU dir key handling (#10082)
## Problem

Changes in #9786 were functionally complete but missed some edges that
made testing less robust than it should have been:
- `is_key_disposable` didn't consider SLRU dir keys disposable
- Timeline `init_empty` was always creating SLRU dir keys on all shards

The result was that when we had a bug
(https://github.com/neondatabase/neon/pull/10080), it wasn't apparent in
tests, because one would only encounter the issue if running on a
long-lived timeline with enough compaction to drop the initially created
empty SLRU dir keys, _and_ some CLog truncation going on.

Closes: https://github.com/neondatabase/cloud/issues/21516

## Summary of changes

- Update is_key_global and init_empty to handle SLRU dir keys properly
-- the only functional impact is that we avoid writing some spurious
keys in shards >0, but this makes testing much more robust.
- Make `test_clog_truncate` explicitly use a sharded tenant

The net result is that if one reverts #10080, then tests fail (i.e. this
PR is a reproducer for the issue)
2024-12-16 10:06:08 +00:00
Konstantin Knizhnik
117c1b5dde Do not perform prefetch for temp relations (#10146)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1734002916827019

With recent prefetch fixes for pg17 and `effective_io_concurrency=100` 
pg_regress test stats.sql is failed when set temp_buffers to 100.
Stream API will try to lock all this 100 buffers for prefetch.

## Summary of changes

Disable such behaviour for temp relations.
Postgres PR: https://github.com/neondatabase/postgres/pull/548

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-16 06:03:53 +00:00
Erik Grinaker
f3ecd5d76a pageserver: revert flush backpressure (#8550) (#10135)
## Problem

In #8550, we made the flush loop wait for uploads after every layer.
This was to avoid unbounded buildup of uploads, and to reduce compaction
debt. However, the approach has several problems:

* It prevents upload parallelism.
* It prevents flush and upload pipelining.
* It slows down ingestion even when there is no need to backpressure.
* It does not directly backpressure WAL ingestion (only via
`disk_consistent_lsn`), and will build up in-memory layers.
* It does not directly backpressure based on compaction debt and read
amplification.

An alternative solution to these problems is proposed in #8390.

In the meanwhile, we revert the change to reduce the impact on ingest
throughput. This does reintroduce some risk of unbounded
upload/compaction buildup. Until
https://github.com/neondatabase/neon/issues/8390, this can be addressed
in other ways:

* Use `max_replication_apply_lag` (aka `remote_consistent_lsn`), which
will more directly limit upload debt.
* Shard the tenant, which will spread the flush/upload work across more
Pageservers and move the bottleneck to Safekeeper.

Touches #10095.

## Summary of changes

Remove waiting on the upload queue in the flush loop.
2024-12-15 09:45:12 +00:00
Mikhail Kot
cf161e1556 fix(adapter): password not set in role drop (#10130)
## Problem

When entry was dropped and password wasn't set, new entry
had uninitialized memory in controlplane adapter

Resolves: https://github.com/neondatabase/cloud/issues/14914

## Summary of changes

Initialize password in all cases, add tests.
Minor formatting for less indentation
2024-12-14 17:37:13 +00:00
Konstantin Knizhnik
2521eba674 Check for invalid down link while prefetching B-Tree leave pages for index-only scan (#9867)
## Problem

See #9866

Index-only scan prefetch implementation doesn't take in account that
down link may be invalid

## Summary of changes

Check that downlink is valid block number


Correspondent Postgres PRs:
https://github.com/neondatabase/postgres/pull/534
https://github.com/neondatabase/postgres/pull/535
https://github.com/neondatabase/postgres/pull/536
https://github.com/neondatabase/postgres/pull/537

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-13 20:46:41 +00:00
Alexander Bayandin
d56fea680e CI: always require aws-oicd-role-arn input to be set (#10145)
## Problem
`benchmarking` job fails because `aws-oicd-role-arn` input is not set

## Summary of changes:
- Set `aws-oicd-role-arn` for `benchmarking job
- Always require `aws-oicd-role-arn` to be set
- Rename `aws_oicd_role_arn` to `aws-oicd-role-arn` for consistency
2024-12-13 19:56:32 +00:00
Alex Chi Z.
7ee5dca752 fix(pageserver): race between gc-compaction and repartition (#10127)
## Problem

close https://github.com/neondatabase/neon/issues/10124

gc-compaction split_gc_jobs is holding the repartition lock for too long
time.

## Summary of changes

* Ensure split_gc_compaction_jobs drops the repartition lock once it
finishes cloning the structures.
* Update comments.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-13 18:22:25 +00:00
Tristan Partin
07d1db54b3 Improve comments and log messages in the logical replication monitor (#9974)
Improved comments will help others when they read the code, and the log
messages will help others understand why the logical replication monitor
works the way it does.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-13 18:10:42 +00:00
Konstantin Knizhnik
eeabecd89f Correctly update LFC used_pages in case of LFC resize (#10128)
## Problem

LFC used_pages statistic is not updated in case of LFC resize (shrinking
`neon.file_cache_size_limit`)

## Summary of changes

Update `lfc_ctl->used_pages` in `lfc_change_limit_hook`

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-13 17:40:26 +00:00
Christian Schwarz
fcff752851 fix(test_timeline_archival_chaos): flakiness caused by orphan layers (#10083)
The test was failing with the scary but generic message `Remote storage
metadata corrupted`.

The underlying scrubber error is `Orphan layer detected: ...`.

The test kills pageserver at random points, hence it's expected that we
leak layers if we're killed in the window after layer upload but before
it's referenced from index part.

Refer to generation numbers RFC for details.

Refs:
- fixes https://github.com/neondatabase/neon/issues/9988
- root-cause analysis
https://github.com/neondatabase/neon/issues/9988#issuecomment-2520673167
2024-12-13 16:28:21 +00:00
Alexander Bayandin
2c91062828 test_prefetch: reduce timeout to default 5m from 10m (#10105)
## Problem

`test_prefetch` is flaky
(https://github.com/neondatabase/neon/issues/9961), but if it passes,
the run time is less than 30 seconds — we don't need an extended timeout
for it.

## Summary of changes
- Remove extended test timeout for `test_prefetch`
2024-12-13 14:52:54 +00:00
Arseny Sher
ce8eb089f3 Extract public sk types to safekeeper_api (#10137)
## Problem

We want to extract safekeeper http client to separate crate for use in
storage controller and neon_local. However, many types used in the API
are internal to safekeeper.

## Summary of changes

Move them to safekeeper_api crate. No functional changes.

ref https://github.com/neondatabase/neon/issues/9011
2024-12-13 14:06:27 +00:00
a-masterov
7dc382601c Fix pg_regress tests on a cloud staging instance (#10134)
## Problem
pg_regress tests start failing due to unique ids added to Neon error
messages
## Summary of changes
Patches updated
2024-12-13 13:59:04 +00:00
Rahul Patil
2451969d5c fix(ci): Allow github-action-script to post reports (#10136)
Allow github-action-script to post reports.

Failed CI:
https://github.com/neondatabase/neon/actions/runs/12304655364/job/34342554049#step:13:514
2024-12-13 12:22:15 +00:00
JC Grünhage
59ef701925 CI(deploy): fix git tag/release creation (#10119)
## Problem

When moving the comment on proxy-releases from the yaml doc into a
javascript code block, I missed converting the comment marker from `#`
to `//`.

## Summary of changes

Correctly convert comment marker.
2024-12-12 23:38:20 +00:00
Alexander Bayandin
ac04bad457 CI: don't run debug builds with LFC (#10123)
## Problem

I've noticed that debug builds with LFC fail more frequently and for
some reason ,their failure do block merging (but it should not)

## Summary of changes
- Do not run Debug builds with LFC
2024-12-12 22:55:38 +00:00
Peter Bendel
2f3f98a319 use OIDC role instead of AWS access keys for managing test runner (#10117)
in periodic pagebench workflow

## Problem

for background see https://github.com/neondatabase/cloud/issues/21545

## Summary of changes

use OIDC role to manage runners instead of AWS access key which needs to
be periodically rotated

## logs

seems to work in
https://github.com/neondatabase/neon/actions/runs/12298575888/job/34322306127#step:6:1
2024-12-12 20:25:39 +00:00
Alex Chi Z.
5ff4b991c7 feat(pageserver): gc-compaction split over LSN (#9900)
## Problem

part of https://github.com/neondatabase/neon/issues/9114, stacked PR
over https://github.com/neondatabase/neon/pull/9897, partially
refactored to help with
https://github.com/neondatabase/neon/issues/10031

## Summary of changes

* gc-compaction takes `above_lsn` parameter. We only compact the layers
above this LSN, and all data below the LSN are treated as if they are on
the ancestor branch.
* refactored gc-compaction to take `GcCompactJob` that describes the
rectangular range to be compacted.
* Added unit test for this case.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-12 20:23:24 +00:00
John Spray
a93e3d31cc storcon: refine logic for choosing AZ on tenant creation (#10054)
## Problem

When we update our scheduler/optimization code to respect AZs properly
(https://github.com/neondatabase/neon/pull/9916), the choice of AZ
becomes a much higher-stakes decision. We will pretty much always run a
tenant in its preferred AZ, and that AZ is fixed for the lifetime of the
tenant (unless a human intervenes)

Eventually, when we do auto-balancing based on utilization, I anticipate
that part of that will be to automatically change the AZ of tenants if
our original scheduling decisions have caused imbalance, but as an
interim measure, we can at least avoid making this scheduling decision
based purely on which AZ contains the emptiest node.

This is a precursor to https://github.com/neondatabase/neon/pull/9947

## Summary of changes

- When creating a tenant, instead of scheduling a shard and then reading
its preferred AZ back, make the AZ decision first.
- Instead of choosing AZ based on which node is emptiest, use the median
utilization of nodes in each AZ to pick the AZ to use. This avoids bad
AZ decisions during periods when some node has very low utilization
(such as after replacing a dead node)

I considered also making the selection a weighted pseudo-random choice
based on utilization, but wanted to avoid destabilising tests with that
for now.
2024-12-12 19:35:38 +00:00
Rahul Patil
6d5687521b fix(ci): Allow github-script to post test reports (#10120)
Allow github-script to post test reports
2024-12-12 18:53:35 +00:00
Heikki Linnakangas
53721266f1 Disable connection logging in pgbouncer by default (#10118)
It can produce a lot of logs, making pgbouncer itself consume all CPU in
extreme cases. We saw that happen in stress testing.
2024-12-12 17:05:58 +00:00
a-masterov
2f3433876f Change the channel for notification. (#10112)
## Problem
Now notifications about failures in `pg_regress` tests run on the
staging cloud instance, reach the channel `on-call-staging-stream`,
while they should reach `on-call-qa-staging-stream`
## Summary of changes
The channel changed.
2024-12-12 16:34:07 +00:00
Rahul Patil
58d45c6e86 ci(fix): Use OIDC auth to login on ECR (#10055)
## Problem

CI currently uses static credentials in some places. These are less
secure and hard to maintain, so we are going to deprecate them and use
OIDC auth.

## Summary of changes
- ci(fix): Use OIDC auth to upload artifact on s3
- ci(fix): Use OIDC auth to login on ECR
2024-12-12 15:13:08 +00:00
Conrad Ludgate
e502e880b5 chore(proxy): remove code for old API (#10109)
## Problem

Now that https://github.com/neondatabase/cloud/issues/15245 is done, we
can remove the old code.

## Summary of changes

Removes support for the ManagementV2 API, in favour of the ProxyV1 API.
2024-12-12 13:42:50 +00:00
Arseny Sher
c9a773af37 Fix test_subscriber_synchronous_commit flakiness. (#10057)
6f7aeaa configured LFC for USE_LFC case, but omitted setting
shared_buffers for non USE_LFC, causing flakiness.

ref https://github.com/neondatabase/neon/issues/9989
2024-12-12 11:57:00 +00:00
Vlad Lazar
ec0ce06c16 tests: default interpreted proto in tests (#10079)
## Problem

We aren't using the sharded interpreted wal receiver protocol in all
tests.

## Summary of changes

Default to the interpreted protocol.
2024-12-12 10:53:10 +00:00
Alexander Bayandin
0bd8eca9ca Storage: create release PRs On Fridays (#10017)
## Problem

To give Storage more time on preprod — create a release branch on Friday

## Summary of changes
- Automatically create Storage release PR on Friday instead of Monday
2024-12-12 09:18:50 +00:00
Misha Sakhnov
739f627b96 Bump vm-builder v0.35.0 -> v0.37.1 (#10015)
Bump version to pick up changes introduced in the neonvm-daemon to
support sys fs based CPU scaling
(https://github.com/neondatabase/autoscaling/issues/1082).

Previous update: https://github.com/neondatabase/neon/pull/9208
2024-12-12 08:45:52 +00:00
Arpad Müller
342cbea255 storcon: add safekeeper list API (#10089)
This adds an API to the storage controller to list safekeepers
registered to it.

This PR does a `diesel print-schema > storage_controller/src/schema.rs`
because of an inconsistency between up.sql and schema.rs, introduced by
[this](2c142f14f7)
commit, so there is some updates of `schema.rs` due to that. As a
followup to this, we should maybe think about running `diesel
print-schema` in CI.

Part of #9981
2024-12-12 01:09:24 +00:00
Tristan Partin
b391b29bdc Improve typing in test_runner/fixtures/httpserver.py (#10103)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-11 22:21:42 +00:00
Erik Grinaker
5126ebbfed test_runner: bump test_check_visibility_map timeout (#10091)
## Problem

`test_check_visibility_map` has been seen to time out in debug tests.

## Summary of changes

Bump the timeout to 10 minutes (test reports indicate 7 minutes is
sufficient).

We don't want to disable the test entirely in debug builds, to exercise
this with debug assertions enabled.

Resolves #10069.
2024-12-11 21:37:25 +00:00
Arpad Müller
7fa986bc92 Do tenant manifest validation with index-part (#10007)
This adds some validation of invariants that we want to uphold wrt the
tenant manifest and `index_part.json`:

* the data the manifest has about a timeline must match with the data in
`index_part.json`. It might actually change, e.g. when we do reparenting
during detach ancestor, but that requires the timeline to be
unoffloaded, i.e. removed from the manifest.
* any timeline mentioned in index part, must, if present, be archived.
If we unarchive, we first update the tenant manifest to unoffload, and
only then update index part. And one needs to archive before offloading.
* it is legal for timelines to be mentioned in the manifest but have no
`index_part`: this is a temporary state visible during deletion of the
timeline. if the pageserver crashed, an attach of the tenant will clean
the state up.
* it is also legal for offloaded timelines to have an
`ancestor_retain_lsn` of None while having an `ancestor_timeline_id`.
This is for the to-be-added flattening functionality: the plan is to set
former to None if we have flattened a timeline.

follow-up of #9942
part of #8088
2024-12-11 20:10:22 +00:00
Vlad Lazar
e8395807a5 storcon: allow for more concurrency in drain/fill operations (#10093)
## Problem

We saw the drain/fill operations not drain fast enough in ap-southeast.

## Summary of changes

These are some quick changes to speed it up:
* double reconcile concurrency - this is now half of the available
reconcile bandwidth
* reduce the waiter polling timeout - this way we can spawn new
reconciliations faster
2024-12-11 19:43:40 +00:00
Vlad Lazar
a3e80448e8 pageserver/storcon: add patch endpoints for tenant config metrics (#10020)
## Problem

Cplane and storage controller tenant config changes are not additive.
Any change overrides all existing tenant configs. This would be fine if
both did client side patching, but that's not the case.

Once this merges, we must update cplane to use the PATCH endpoint.

## Summary of changes

### High Level

Allow for patching of tenant configuration with a `PATCH
/v1/tenant/config` endpoint.
It takes the same data as it's PUT counterpart. For example the payload
below will update `gc_period` and unset `compaction_period`. All other
fields are left in their original state.
```
{
  "tenant_id": "1234",
  "gc_period": "10s",
  "compaction_period": null
}
```

### Low Level
* PS and storcon gain `PATCH /v1/tenant/config` endpoints. PS endpoint
is only used for cplane managed instances.
* `storcon_cli` is updated to have separate commands for
`set-tenant-config` and `patch-tenant-config`

Related https://github.com/neondatabase/cloud/issues/21043
2024-12-11 19:16:33 +00:00
Anastasia Lubennikova
ef233e91ef Update compute_installed_extensions metric: (#9891)
add owned_by_superuser field to filter out system extensions.

While on it, also correct related code:
- fix the metric setting: use set() instead of inc() in a loop.
inc() is not idempotent and can lead to incorrect results
if the function called multiple times. Currently it is only called at
compute start, but this will change soon.
- fix the return type of the installed_extensions endpoint
to match the metric. Currently it is only used in the test.
2024-12-11 16:43:26 +00:00
Mikhail Kot
dee2041cd3 walproposer: fix link error on debian 12 / ubuntu 22 (#10090)
## Problem

Linking walproposer library (e.g. `cargo t`) produces linker errors:
/home/myrrc/neon/pgxn/neon/walproposer_compat.c:169: undefined reference
to `pg_snprintf'

The library with these symbols (libpgcommon.a) is present

## Summary of changes

Changed order of libraries resolution for linker
2024-12-11 16:23:59 +00:00
Arseny Sher
e4bb1ca7d8 Increase neon_local http client to compute timeout in reconfigure. (#10088)
Seems like 30s sometimes not enough when CI runners are overloaded,
causing pull_timeline flakiness.

ref
https://github.com/neondatabase/neon/issues/9731#issuecomment-2535946443
2024-12-11 15:46:50 +00:00
a-masterov
b987648e71 Enable LFC for all the PG versions. (#10068)
## Problem
We added support for LFC for tests but are still using it only for the
PG17 release.

## Summary of changes
LFC is enabled for all PG versions. Errors in tests with LFC enabled now
block merging as usual. We keep tests with disabled LFC for PG17
release. Tests on debug builds with LFC enabled still don't affect
permission to merge.
2024-12-11 15:28:10 +00:00
Mikhail Kot
c79c1dd8e9 compute_ctl: don't panic if control plane can't be reached (#10078)
## Problem

If the control plane cannot be reached for some reason, compute_ctl
panics

## Summary of changes

panic is removed in favour of returning an error.
Code is reformatted a bit for more flat control flow

Resolves: #5391
2024-12-11 15:03:11 +00:00
Vlad Lazar
a53db73851 pageserver: don't drop multixact slrus on non zero shards (#10086)
## Problem

We get slru truncation commands on non-zero shards.
Compaction will drop the slru dir keys and ingest will fail when
receiving such records.
https://github.com/neondatabase/neon/pull/10080 fixed it for clog, but
not for multixact.

## Summary of changes

Only truncate multixact slrus on shard zero. I audited the rest of the
ingest code and it looks
fine from this pov.
2024-12-11 14:28:18 +00:00
Christian Schwarz
9ae980bf4f page_service: don't count time spent in Batcher towards smgr latency metrics (#10075)
## Problem

With pipelining enabled, the time a request spends in the batcher stage
counts towards the smgr op latency.

If pipelining is disabled, that time is not accounted for.

In practice, this results in a jump in smgr getpage latencies in various
dashboards and degrades the internal SLO.

## Solution

In a similar vein to #10042 and with a similar rationale, this PR stops
counting the time spent in batcher stage towards smgr op latency.

The smgr op latency metric is reduced to the actual execution time.

Time spent in batcher stage is tracked in a separate histogram.
I expect to remove that histogram after batching rollout is complete,
but it will be helpful in the meantime to reason about the rollout.
2024-12-11 13:37:08 +00:00
Vlad Lazar
665369c439 wal_decoder: fix compact key protobuf encoding (#10074)
## Problem

Protobuf doesn't support 128 bit integers, so we encode the keys as two
64 bit integers. Issue is that when we split the 128 bit compact key we
use signed 64 bit integers to represent the two halves. This may result
in a negative lower half when relnode is larger than `0x00800000`. When
we convert the lower half to an i128 we get a negative `CompactKey`.

## Summary of Changes

Use unsigned integers when encoding into Protobuf.

## Deployment

* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then
release the fixed version.
We do this because a negative int32 will convert to a large uint32 value
and could give
a key in the actual pageserver space. In production we would around this
by adding new
fields to the proto and deprecating the old ones, but we can make our
lives easy here.
* Pre-prod: Same as staging
2024-12-11 12:35:02 +00:00
JC Grünhage
d7aeca2f34 CI(deploy): create git tags/releases before triggering deploy workflows (#10022)
## Problem

When dev deployments are disabled (or fail), the tags for releases
aren't created. It makes more sense to have tag and release creation
before the deployment to prevent situations like
[this](https://github.com/neondatabase/neon/pull/9959).

It is not enough to move the tag creation before the deployment. If the
deployment fails, re-running the job isn't possible because the API call
to create the tag will fail.

## Summary of changes

- Tag/Release creation now happens before the deployment
- The two steps for tag and release have been merged into a bigger one
- There's new checks to ensure the that if the tags/releases already
exist as expected, things will continue just fine.
2024-12-11 09:41:34 +00:00
John Spray
38415a9816 pageserver: fix ingest handling of CLog truncate (#10080)
## Problem

In #9786 we stop storing SLRUs on non-zero shards.

However, there was one code path during ingest that still tries to
enumerate SLRU relations on all shards. This fails if it sees a tenant
who has never seen any write to an SLRU, or who has done such thorough
compaction+GC that it has dropped its SLRU directory key.

## Summary of changes

- Avoid trying to list SLRU relations on nonzero shards
2024-12-11 09:16:11 +00:00
Matthias van de Meent
597125e124 Disable readstream's reliance on seqscan readahead (#9860)
Neon doesn't have seqscan detection of its own, so stop read_stream from
trying to utilize that readahead, and instead make it issue readahead of
its own.

## Problem

@knizhnik noticed that we didn't issue smgrprefetch[v] calls for
seqscans in PG17 due to the move to the read_stream API, which assumes
that the underlying IO facilities do seqscan detection for readahead.
That is a wrong assumption when Neon is involved, so let's remove the
code that applies that assumption.

## Summary of changes
Remove the cases where seqscans are detected and prefetch is disabled as
a consequence, and instead don't do that detection.

PG PR: https://github.com/neondatabase/postgres/pull/532
2024-12-11 00:51:05 +00:00
Matthias van de Meent
e71d20d392 Emit nbtree vacuum cycle id in nbtree xlog through forced FPIs (#9932)
This fixes neondatabase/neon#9929.

## Postgres repo PRS:
- PG17: https://github.com/neondatabase/postgres/pull/538
- PG16: https://github.com/neondatabase/postgres/pull/539
- PG15: https://github.com/neondatabase/postgres/pull/540
- PG14: https://github.com/neondatabase/postgres/pull/541

## Problem
see #9929 

## Summary of changes

We update the split code to force the code to emit an FPI whenever the
cycle ID might be interesting for concurrent btree vacuum.
2024-12-10 19:42:52 +00:00
Alex Chi Z.
aa0554fd1e feat(test_runner): allowed_errors in storage scrubber (#10062)
## Problem

resolve
https://github.com/neondatabase/neon/issues/9988#issuecomment-2528239437

## Summary of changes

* New verbose mode for storage scrubber scan metadata (pageserver) that
contains the error messages.
* Filter allowed_error list from the JSON output to determine the
healthy flag status.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-10 17:00:47 +00:00
Heikki Linnakangas
b853f78136 Print a log message if GetPage response takes too long (#10046)
We have metrics for GetPage request latencies, but this is an extra
measure to capture requests that take way too long in the logs. The log
message is printed every 10 s, until the response is received:

```
PG:2024-12-09 16:02:07.715 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 10.000 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:17.723 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 20.008 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:19.719 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] received response from pageserver after 22.006 s
```
2024-12-10 16:26:56 +00:00
Alex Chi Z.
6ad99826c1 fix(pageserver): refresh_gc_info should always increase cutoff (#9862)
## Problem

close https://github.com/neondatabase/cloud/issues/19671

```
Timeline -----------------------------
         ^ last GC happened LSN
              ^ original retention period setting = 24hr
> refresh-gc-info updates the gc_info
              ^ planned cutoff (gc_info)
         ^ customer set retention to 48hr, and it's still within the last GC LSN
         ^1   ^2 we have two choices: (1) update the planned cutoff to
                 move backwards, or (2) keep the current one
```

In this patch, we decided to keep the current cutoff instead of moving
back the gc_info to avoid races. In the future, we could allow the
planned gc cutoff to go back once cplane sends a retention_history
tenant config update, but this requires a careful revisit of the code.

## Summary of changes

Ensure that GC cutoffs never go back if retention settings get changed.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-10 15:23:26 +00:00
Konstantin Knizhnik
311ee793b9 Fix handling in-flight requersts in prefetch buffer resize (#9968)
## Problem

See https://github.com/neondatabase/neon/issues/9961
Current implementation of prefetch buffer resize doesn't correctly
handle in-flight requests

## Summary of changes

1. Fix index of entry we should wait for if new prefetch buffer size is
smaller than number of in-flight requests.
2. Correctly set flush position

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-10 15:01:40 +00:00
Erik Grinaker
ad472bd4a1 test_runner: add visibility map test (#9940)
Verifies that visibility map pages are correctly maintained across
shards.

Touches #9914.
2024-12-10 12:07:00 +00:00
Arpad Müller
c51db1db61 Replace MAX_KEYS_PER_DELETE constant with function (#10061)
Azure has a different per-request limit of 256 items for bulk deletion
compared to the number of 1000 on AWS. Therefore, we need to support
multiple values. Due to `GenericRemoteStorage`, we can't add an
associated constant, but it has to be a function.

The PR replaces the `MAX_KEYS_PER_DELETE` constant with a function of
the same name, implemented on both the `RemoteStorage` trait as well as
on `GenericRemoteStorage`.

The value serves as hint of how many objects to pass to the
`delete_objects` function.

Reading:

* https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch
* https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html

Part of #7931
2024-12-10 11:29:38 +00:00
Ivan Efremov
34c1295594 [proxy] impr: Additional logging for cancellation queries (#10039)
## Problem
Since cancellation tasks spawned in the background sometimes logs
missing context.

https://neondb.slack.com/archives/C060N3SEF9D/p1733427801527419?thread_ts=1733419882.560159&cid=C060N3SEF9D

## Summary of changes
Add `session_id` and change loglevel for cancellation queries
2024-12-10 10:14:28 +00:00
Evan Fleming
b593e51eae safekeeper: use arc for global timelines and config (#10051)
Hello! I was interested in potentially making some contributions to Neon
and looking through the issue backlog I found
[8200](https://github.com/neondatabase/neon/issues/8200) which seemed
like a good first issue to attempt to tackle. I see it was assigned a
while ago so apologies if I'm stepping on any toes with this PR. I also
apologize for the size of this PR. I'm not sure if there is a simple way
to reduce it given the footprint of the components being changed.

## Problem
This PR is attempting to address part of the problem outlined in issue
[8200](https://github.com/neondatabase/neon/issues/8200). Namely to
remove global static usage of timeline state in favour of
`Arc<GlobalTimelines>` and to replace wasteful clones of
`SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle
`RemoteStorage` in this PR to minimize the amount of changes as this PR
is already quite large. I also did not opt to introduce an
`SafekeeperApp` wrapper struct to similarly minimize changes but I can
tackle either or both of these omissions in this PR if folks would like.

## Summary of changes
- Remove static usage of `GlobalTimelines` in favour of
`Arc<GlobalTimelines>`
- Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the
underlying struct

## Some additional thoughts
- We seem to currently store `SafeKeeperConf` in `GlobalTimelines` and
then expose it through a public`get_global_config` function which
requires locking. This seems needlessly wasteful and based on observed
usage we could remove this public accessor and force consumers to
acquire `SafeKeeperConf` through the new Arc reference.
2024-12-09 21:09:20 +00:00
Alex Chi Z.
4c4cb80186 fix(pageserver): fix gc-compaction racing with legacy gc (#10052)
## Problem

close https://github.com/neondatabase/neon/issues/10049, close
https://github.com/neondatabase/neon/issues/10030, close
https://github.com/neondatabase/neon/issues/8861

part of https://github.com/neondatabase/neon/issues/9114

The legacy gc process calls `get_latest_gc_cutoff`, which uses a Rcu
different than the gc_info struct. In the gc_compaction_smoke test case,
the "latest" cutoff could be lower than the gc_info struct, causing
gc-compaction to collect data that could be accessed by
`latest_gc_cutoff`. Technically speaking, there's nothing wrong with
gc-compaction using gc_info without considering latest_gc_cutoff,
because gc_info is the source of truth. But anyways, let's fix it.

## Summary of changes

* gc-compaction uses `latest_gc_cutoff` instead of gc_info to determine
the gc horizon.
* if a gc-compaction is scheduled via tenant compaction iteration, it
will take the gc_block lock to avoid racing with functionalities like
detach ancestor (if it's triggered via manual compaction API without
scheduling, then it won't take the lock)

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-12-09 20:06:06 +00:00
a-masterov
92273b6d5e Enable the pg_regress tests on staging for PG17 (#9978)
## Problem
Currently, we run the `pg_regress` tests only for PG16
However, PG17 is a part of Neon and should be tested as well 
## Summary of changes
Modified the workflow and added a patch for PG17 enabling the
`pg_regress` tests.
The problem with leftovers was solved by using branches.
2024-12-09 19:30:39 +00:00
Arpad Müller
e74e7aac93 Use updated patched azure SDK crates (#10036)
For a while already, we've been unable to update the Azure SDK crates
due to Azure adopting use of a non-tokio async runtime, see #7545.

The effort to upstream the fix got stalled, and I think it's better to
switch to a patched version of the SDK that is up to date.

Now we have a fork of the SDK under the neondatabase github org, to
which I have applied Conrad's rebased patches to:
https://github.com/neondatabase/azure-sdk-for-rust/tree/neon .

The existence of a fork will also help with shipping bulk delete support
before it's upstreamed (#7931).

Also, in related news, the Azure SDK has gotten a rift in development,
where the main branch pertains to a future, to-be-officially-blessed
release of the SDK, and the older versions, which we are currently
using, are on the `legacy` branch. Upstream doesn't really want patches
for the `legacy` branch any more, they want to focus on the `main`
efforts. However, even then, the `legacy` branch is still newer than
what we are having right now, so let's switch to `legacy` for now.

Depending on how long it takes, we can switch to the official version of
the SDK once it's released or switch to the upstream `main` branch if
there is changes we want before that.

As a nice side effect of this PR, we now use reqwest 0.12 everywhere,
dropping the dependency on version 0.11.

Fixes #7545
2024-12-09 15:50:06 +00:00
Vlad Lazar
4cca5cdb12 deps: update url to 2.5.4 for RUSTSEC-2024-0421 (#10059)
## Problem

See https://rustsec.org/advisories/RUSTSEC-2024-0421

## Summary of changes

Update url crate to 2.5.4.
2024-12-09 14:57:42 +00:00
Arpad Müller
9d425b54f7 Update AWS SDK crates (#10056)
Result of running:

cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p
aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p
aws-sdk-s3 -p aws-config

We want to keep the AWS SDK up to date as that way we benefit from new
developments and improvements.
2024-12-09 12:46:59 +00:00
John Spray
ec790870d5 storcon: automatically clear Pause/Stop scheduling policies to enable detaches (#10011)
## Problem

We saw a tenant get stuck when it had been put into Pause scheduling
mode to pin it to a pageserver, then it was left idle for a while and
the control plane tried to detach it.

Close: https://github.com/neondatabase/neon/issues/9957

## Summary of changes

- When changing policy to Detached or Secondary, set the scheduling
policy to Active.
- Add a test that exercises this
- When persisting tenant shards, set their `generation_pageserver` to
null if the placement policy is not Attached (this enables consistency
checks to work, and avoids leaving state in the DB that could be
confusing/misleading in future)
2024-12-07 13:05:09 +00:00
Christian Schwarz
4d7111f240 page_service: don't count time spent flushing towards smgr latency metrics (#10042)
## Problem

In #9962 I changed the smgr metrics to include time spent on flush.

It isn't under our (=storage team's) control how long that flush takes
because the client can stop reading requests.

## Summary of changes

Stop the timer as soon as we've buffered up the response in the
`pgb_writer`.

Track flush time in a separate metric.

---------

Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>
2024-12-07 08:57:55 +00:00
Alex Chi Z.
b1fd086c0c test(pageserver): disable gc_compaction smoke test for now (#10045)
## Problem

The test is flaky.

## Summary of changes

Disable the test.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-06 22:30:04 +00:00
Heikki Linnakangas
b6eea65597 Fix error message if PS connection is lost while receiving prefetch (#9923)
If the pageserver connection is lost while receiving the prefetch
request, the prefetch queue is cleared. The error message prints the
values from the prefetch slot, but because the slot was already cleared,
they're all zeros:

LOG: [NEON_SMGR] [shard 0] No response from reading prefetch entry 0:
0/0/0.0 block 0. This can be caused by a concurrent disconnect

To fix, make local copies of the values.

In the passing, also add a sanity check that if the receive() call
succeeds, the prefetch slot is still intact.
2024-12-06 20:56:57 +00:00
Alex Chi Z.
c42c28b339 feat(pageserver): gc-compaction split job and partial scheduler (#9897)
## Problem

part of https://github.com/neondatabase/neon/issues/9114, stacked PR
over #9809

The compaction scheduler now schedules partial compaction jobs.

## Summary of changes

* Add the compaction job splitter based on size.
* Schedule subcompactions using the compaction scheduler.
* Test subcompaction scheduler in the smoke regress test.
* Temporarily disable layer map checks

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-06 18:44:26 +00:00
Tristan Partin
e4837b0a5a Bump sql_exporter to 0.16.0 (#10041)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-06 17:43:55 +00:00
Erik Grinaker
14c4fae64a test_runner/performance: add improved bulk insert benchmark (#9812)
Adds an improved bulk insert benchmark, including S3 uploads.

Touches #9789.
2024-12-06 15:17:15 +00:00
Vlad Lazar
cc70fc802d pageserver: add metric for number of wal records received by each shard (#10035)
## Problem

With the current metrics we can't identify which shards are ingesting
data at any given time.

## Summary of changes

Add a metric for the number of wal records received for processing by
each shard. This is per (tenant, timeline, shard).
2024-12-06 12:51:41 +00:00
Alexey Kondratov
fa07097f2f chore: Reorganize and refresh CODEOWNERS (#10008)
## Problem

We didn't have a codeowner for `/compute`, so nobody was auto-assigned
for PRs like #9973

## Summary of changes

While on it:
1. Group codeowners into sections.
2. Remove control plane from the `/compute_tools` because it's primarily
the internal `compute_ctl` code.
3. Add control plane (and compute) to `/libs/compute_api` because that's
the shared public interface of the compute.
2024-12-06 11:44:50 +00:00
Erik Grinaker
7838659197 pageserver: assert that keys belong to shard (#9943)
We've seen cases where stray keys end up on the wrong shard. This
shouldn't happen. Add debug assertions to prevent this. In release
builds, we should be lenient in order to handle changing key ownership
policies.

Touches #9914.
2024-12-06 10:24:13 +00:00
Vlad Lazar
3f1c542957 pageserver: add disk consistent and remote lsn metrics (#10005)
## Problem

There's no metrics for disk consistent LSN and remote LSN. This stuff is
useful when looking at ingest performance.

## Summary of changes

Two per timeline metrics are added: `pageserver_disk_consistent_lsn` and
`pageserver_projected_remote_consistent_lsn`. I went for the projected
remote lsn instead of the visible one
because that more closely matches remote storage write tput. Ideally we
would have both, but these metrics are expensive.
2024-12-06 10:21:52 +00:00
Erik Grinaker
ec4072f845 pageserver: add wait_until_flushed parameter for timeline checkpoint (#10013)
## Problem

I'm writing an ingest benchmark in #9812. To time S3 uploads, I need to
schedule a flush of the Pageserver's in-memory layer, but don't actually
want to wait around for it to complete (which will take a minute).

## Summary of changes

Add a parameter `wait_until_flush` (default `true`) for
`timeline/checkpoint` to control whether to wait for the flush to
complete.
2024-12-06 10:12:39 +00:00
Erik Grinaker
56f867bde5 pageserver: only zero truncated FSM page on owning shard (#10032)
## Problem

FSM pages are managed like regular relation pages, and owned by a single
shard. However, when truncating the FSM relation the last FSM page was
zeroed out on all shards. This is unnecessary and potentially confusing.

The superfluous keys will be removed during compactions, as they do not
belong on these shards.

Resolves #10027.

## Summary of changes

Only zero out the truncated FSM page on the owning shard.
2024-12-06 07:22:22 +00:00
Arpad Müller
d1ab7471e2 Fix desc_str for Azure container (#10021)
Small logs fix I've noticed while working on
https://github.com/neondatabase/cloud/issues/19963 .
2024-12-05 20:51:57 +00:00
Tristan Partin
6ff4175fd7 Send Content-Type header on reconfigure request from neon_local (#10029)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 20:30:35 +00:00
Tristan Partin
6331cb2161 Bump anyhow to 1.0.94 (#10028)
We were over a year out of date.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 19:42:52 +00:00
Alex Chi Z.
71f38d1354 feat(pageserver): support schedule gc-compaction (#9809)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

gc-compaction can take a long time. This patch adds support for
scheduling a gc-compaction job. The compaction loop will first handle
L0->L1 compaction, and then gc compaction. The scheduled jobs are stored
in a non-persistent queue within the tenant structure.

This will be the building block for the partial compaction trigger -- if
the system determines that we need to do a gc compaction, it will
partition the keyspace and schedule several jobs. Each of these jobs
will run for a short amount of time (i.e, 1 min). L0 compaction will be
prioritized over gc compaction.

## Summary of changes
 
* Add compaction scheduler in tenant.
* Run scheduled compaction in integration tests.
* Change the manual compaction API to allow schedule a compaction
instead of immediately doing it.
* Add LSN upper bound as gc-compaction parameter. If we schedule partial
compactions, gc_cutoff might move across different runs. Therefore, we
need to pass a pre-determined gc_cutoff beforehand. (TODO: support LSN
lower bound so that we can compact arbitrary "rectangle" in the layer
map)
* Refactor the gc_compaction internal interface.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-05 19:37:17 +00:00
Tristan Partin
c0ba416967 Add compute_logical_snapshots_bytes metric (#9887)
This metric exposes the size of all non-temporary logical snapshot
files.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 19:04:33 +00:00
Alexey Kondratov
13e8105740 feat(compute): Allow specifying the reconfiguration concurrency (#10006)
## Problem

We need a higher concurrency during reconfiguration in case of many DBs,
but the instance is already running and used by the client. We can
easily get out of `max_connections` limit, and the current code won't
handle that.

## Summary of changes

Default to 1, but also allow control plane to override this value for
specific projects. It's also recommended to bump
`superuser_reserved_connections` += `reconfigure_concurrency` for such
projects to ensure that we always have enough spare connections for
reconfiguration process to succeed.

Quick workaround for neondatabase/cloud#17846
2024-12-05 17:57:25 +00:00
Erik Grinaker
db79304416 storage_controller: increase shard scan timeout (#10000)
## Problem

The node shard scan timeout of 1 second is a bit too aggressive, and
we've seen this cause test failures. The scans are performed in parallel
across nodes, and the entire operation has a 15 second timeout.

Resolves #9801.

## Summary of changes

Increase the timeout to 5 seconds. This is still enough to time out on a
network failure and retry successfully within 15 seconds.
2024-12-05 17:29:21 +00:00
Ivan Efremov
ffc9c33eb2 proxy: Present new auth backend cplane_proxy_v1 (#10012)
Implement a new auth backend based on the current Neon backend to switch
to the new Proxy V1 cplane API.

Implements [#21048](https://github.com/neondatabase/cloud/issues/21048)
2024-12-05 05:30:38 +00:00
Yuchen Liang
ed2d892113 pageserver: fix buffered-writer on macos build (#10019)
## Problem

In https://github.com/neondatabase/neon/pull/9693, we forgot to check
macos build. The [CI
run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468)
on main showed that macos build failed with unused variables and dead
code.

## Summary of changes

- add `allow(dead_code)` and `allow(unused_variables)` to the relevant
code that is not used on macos.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-12-05 02:16:09 +00:00
Conrad Ludgate
131585eb6b chore: update rust-postgres (#10002)
Like #9931 but without rebasing upstream just yet, to try and minimise
the differences.

Removes all proxy-specific commits from the rust-postgres fork, now that
proxy no longer depends on them. Merging upstream changes to come later.
2024-12-04 21:07:44 +00:00
Conrad Ludgate
0bab7e3086 chore: update clap (#10009)
This updates clap to use a new version of anstream
2024-12-04 17:42:17 +00:00
Yuchen Liang
e6cd5050fc pageserver: make BufferedWriter do double-buffering (#9693)
Closes #9387.

## Problem

`BufferedWriter` cannot proceed while the owned buffer is flushing to
disk. We want to implement double buffering so that the flush can happen
in the background. See #9387.

## Summary of changes

- Maintain two owned buffers in `BufferedWriter`.
- The writer is in charge of copying the data into owned, aligned
buffer, once full, submit it to the flush task.
- The flush background task is in charge of flushing the owned buffer to
disk, and returned the buffer to the writer for reuse.
- The writer and the flush background task communicate through a
bi-directional channel.

For in-memory layer, we also need to be able to read from the buffered
writer in `get_values_reconstruct_data`. To handle this case, we did the
following
- Use replace `VirtualFile::write_all` with `VirtualFile::write_all_at`,
and use `Arc` to share it between writer and background task.
- leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`,
one clone will be submitted to the channel, the other clone will be
saved within the writer to serve reads. When we want to reuse the
buffer, we can invoke `IoBuffer::into_mut`, which gives us back the
mutable aligned buffer.
- InMemoryLayer reads is now aware of the maybe_flushed part of the
buffer.

**Caveat**

- We removed the owned version of write, because this interface does not
work well with buffer alignment. The result is that without direct IO
enabled,
[`download_object`](a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243))
does one more memcpy than before this PR due to the switch to use
`_borrowed` version of the write.
- "Bypass aligned part of write" could be implemented later to avoid
large amount of memcpy.

**Testing**
- use an oneshot channel based control mechanism to make flush behavior
deterministic in test.
- test reading from `EphemeralFile` when the last submitted buffer is
not flushed, in-progress, and done flushing to disk.


## Performance


We see performance improvement for small values, and regression on big
values, likely due to being CPU bound + disk write latency.


[Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4)


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-04 16:54:56 +00:00
John Spray
60c0d19f57 tests: make storcon scale test AZ-aware (#9952)
## Problem

We have a scale test for the storage controller which also acts as a
good stress test for scheduling stability. However, it created nodes
with no AZs set.

## Summary of changes

- Bump node count to 6 and set AZs on them.

This is a precursor to other AZ-related PRs, to make sure any new code
that's landed is getting scale tested in an AZ-aware environment.
2024-12-04 15:04:04 +00:00
a-masterov
dec2e2fb29 Create a branch for compute release (#9637)
## Problem
We practice a manual release flow for the compute module. This will
allow automation of the compute release process.

## Summary of changes
The workflow was modified to make a compute release automatically on the
branch release-compute.
## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-12-04 13:10:00 +00:00
Erik Grinaker
699a213c5d Display reqwest error source (#10004)
## Problem

Reqwest errors don't include details about the inner source error. This
means that we get opaque errors like:

```
receive body: error sending request for url (http://localhost:9898/v1/location_config)
```

Instead of the more helpful:

```
receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out
```

Touches #9801.

## Summary of changes

Include the source error for `reqwest::Error` wherever it's displayed.
2024-12-04 13:05:53 +00:00
Alexey Kondratov
9a4157dadb feat(compute): Set default application_name for pgbouncer connections (#9973)
## Problem

When client specifies `application_name`, pgbouncer propagates it to the
Postgres. Yet, if client doesn't do it, we have hard time figuring out
who opens a lot of Postgres connections (including the `cloud_admin`
ones).

See this investigation as an example:
https://neondb.slack.com/archives/C0836R0RZ0D

## Summary of changes

I haven't found this documented, but it looks like pgbouncer accepts
standard Postgres connstring parameters in the connstring in the
`[databases]` section, so put the default `application_name=pgbouncer`
there. That way, we will always see who opens Postgres connections. I
did tests, and if client specifies a `application_name`, pgbouncer
overrides this default, so it only works if it's not specified or set to
blank `&application_name=` in the connection string.

This is the last place we could potentially open some Postgres
connections without `application_name`. Everything else should be either
of two:
1. Direct client connections without `application_name`, but these
should be strictly non-`cloud_admin` ones
2. Some ad-hoc internal connections, so if we see spikes of unidentified
`cloud_admin` connections, we will need to investigate it again.

Fixes neondatabase/cloud#20948
2024-12-04 13:05:31 +00:00
Conrad Ludgate
bd52822e14 feat(proxy): add option to forward startup params (#9979)
(stacked on #9990 and #9995)

Partially fixes #1287 with a custom option field to enable the fixed
behaviour. This allows us to gradually roll out the fix without silently
changing the observed behaviour for our customers.

related to https://github.com/neondatabase/cloud/issues/15284
2024-12-04 12:58:35 +00:00
Folke Behrens
dcd016bbfc Assign /libs/proxy/ to proxy team (#10003) 2024-12-04 12:58:31 +00:00
Erik Grinaker
7b18e33997 pageserver: return proper status code for heatmap_upload errors (#9991)
## Problem

During deploys, we see a lot of 500 errors due to heapmap uploads for
inactive tenants. These should be 503s instead.

Resolves #9574.

## Summary of changes

Make the secondary tenant scheduler use `ApiError` rather than
`anyhow::Error`, to propagate the tenant error and convert it to an
appropriate status code.
2024-12-04 12:53:52 +00:00
Peter Bendel
9d75218ba7 fix parsing human time output like "50m37s" (#10001)
## Problem

In ingest_benchmark.yml workflow we use pgcopydb tool to migrate
project.
pgcopydb logs human time.

Our parsing of the human time doesn't work for times like "50m37s".

[Example
workflow](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:479)

contains "57m45s"

but we
[reported](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:500)
only the seconds part: 
45.000 s


## Summary of changes

add a regex pattern for Minute/Second combination
2024-12-04 11:37:24 +00:00
Peter Bendel
1b3558df7a optimize parms for ingest bench (#9999)
## Problem

we tried different parallelism settings for ingest bench 

## Summary of changes

the following settings seem optimal after merging
- SK side Wal filtering
- batched getpages

Settings:
- effective_io_concurrency 100
- concurrency limit 200 (different from Prod!)
- jobs 4, maintenance workers 7
- 10 GB chunk size
2024-12-04 11:07:22 +00:00
Vlad Lazar
68205c48ed storcon: return an error for drain attempts while paused (#9997)
## Problem

We currently allow drain operations to proceed while the node policy is
paused.

## Summary of changes

Return a precondition failed error in such cases. The orchestrator is
updated in https://github.com/neondatabase/infra/pull/2544 to skip drain
and fills if the pageserver is paused.

Closes: https://github.com/neondatabase/neon/issues/9907
2024-12-04 09:25:29 +00:00
Christian Schwarz
8d93d02c2f page_service: enable batching in Rust & Python Tests + Python benchmarks (#9993)
This is the first step towards batching rollout.

Refs

- rollout plan: https://github.com/neondatabase/cloud/issues/20620
- task https://github.com/neondatabase/neon/issues/9377
- uber-epic: https://github.com/neondatabase/neon/issues/9376
2024-12-04 00:07:49 +00:00
Alexander Bayandin
023821a80c test_page_service_batching: fix non-numeric metrics (#9998)
## Problem

```
2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local
2024-12-03T15:42:49.5325077Z Traceback (most recent call last):
2024-12-03T15:42:49.5325603Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module>
2024-12-03T15:42:49.5326029Z     main()
2024-12-03T15:42:49.5326316Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main
2024-12-03T15:42:49.5326739Z     ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp)
2024-12-03T15:42:49.5327488Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-03T15:42:49.5327914Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result
2024-12-03T15:42:49.5328321Z     psycopg2.extras.execute_values(
2024-12-03T15:42:49.5328940Z   File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values
2024-12-03T15:42:49.5335618Z     cur.execute(b''.join(parts))
2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures"
2024-12-03T15:42:49.5336287Z LINE 57:             'concurrent-futures',
2024-12-03T15:42:49.5336462Z                      ^
```

## Summary of changes
- `test_page_service_batching`: save non-numeric params as `labels`
- Add a runtime check that `metric_value` is NUMERIC
2024-12-03 22:46:18 +00:00
Christian Schwarz
944c1adc4c tests & benchmarks: unify the way we customize the default tenant config (#9992)
Before this PR, some override callbacks used `.default()`, others
used `.setdefault()`.

As of this PR, all callbacks use `.setdefault()` which I think is least
prone to failure.

Aligning on a single way will set the right example for future tests
that need such customization.

The `test_pageserver_getpage_throttle.py` technically is a change in
behavior: before, it replaced the `tenant_config` field, now it just
configures the throttle. This is what I believe is intended anyway.
2024-12-03 22:07:03 +00:00
Arpad Müller
ca85f364ba Support tenant manifests in the scrubber (#9942)
Support tenant manifests in the storage scrubber:

* list the manifests, order them by generation
* delete all manifests except for the two most recent generations
* for the latest manifest: try parsing it.

I've tested this patch by running the against a staging bucket and it
successfully deleted stuff (and avoided deleting the latest two
generations).

In follow-up work, we might want to also check some invariants of the
manifest, as mentioned in #8088.

Part of #9386
Part of #8088

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-03 20:39:10 +00:00
Conrad Ludgate
9ef0662a42 chore(proxy): enforce single host+port (#9995)
proxy doesn't ever provide multiple hosts/ports, so this code adds a lot
of complexity of error handling for no good reason.

(stacked on #9990)
2024-12-03 20:00:14 +00:00
Alexey Immoreev
3baef0bca3 Improvement: add console redirect timeout warning (#9985)
## Problem

There is no information on session being cancelled in 2 minutes at the
moment

## Summary of changes

The timeout being logged for the user
2024-12-03 18:59:44 +00:00
Erik Grinaker
f312c6571f pageserver: respond to multiple shutdown signals (#9982)
## Problem

The Pageserver signal handler would only respond to a single signal and
initiate shutdown. Subsequent signals were ignored. This meant that a
`SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a
slow or stalled shutdown). The `test_runner` uses this to force shutdown
if graceful shutdown is slow.

Touches #9740.

## Summary of changes

Keep responding to signals after the initial shutdown signal has been
received.

Arguably, the `test_runner` should also use `SIGKILL` rather than
`SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT`
regardless.
2024-12-03 18:47:17 +00:00
Conrad Ludgate
27a42d0f96 chore(proxy): remove postgres config parser and md5 support (#9990)
Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres
allows us to remove a lot of dead weight from our actual postgres
connection logic.
2024-12-03 18:39:23 +00:00
John Spray
b04ab468ee pageserver: more detailed logs when calling re-attach (#9996)
## Problem

We saw a peculiar case where a pageserver apparently got a 0-tenant
response to `/re-attach` but we couldn't see the request landing on a
storage controller. It was hard to confirm retrospectively that the
pageserver was configured properly at the moment it sent the request.

## Summary of changes

- Log the URL to which we are sending the request
- Log the NodeId and metadata that we sent
2024-12-03 18:36:37 +00:00
John Spray
dcb629532b pageserver: only store SLRUs & aux files on shard zero (#9786)
## Problem

Since https://github.com/neondatabase/neon/pull/9423 the non-zero shards
no longer need SLRU content in order to do GC. This data is now
redundant on shards >0.

One release cycle after merging that PR, we may merge this one, which
also stops writing those pages to shards > 0, reaping the efficiency
benefit.

Closes: https://github.com/neondatabase/neon/issues/7512
Closes: https://github.com/neondatabase/neon/issues/9641

## Summary of changes

- Avoid storing SLRUs on non-zero shards
- Bonus: avoid storing aux files on non-zero shards
2024-12-03 17:22:49 +00:00
John Spray
71d004289c storcon: in shard splits, inherit parent's AZ (#9946)
## Problem

Sharded tenants should be run in a single AZ for best performance, so
that computes have AZ-local latency to all the shards.

Part of https://github.com/neondatabase/neon/issues/8264

## Summary of changes

- When we split a tenant, instead of updating each shard's preferred AZ
to wherever it is scheduled, propagate the preferred AZ from the parent.
- Drop the check in `test_shard_preferred_azs` that asserts shards end
up in their preferred AZ: this will not be true again until the
optimize_attachment logic is updated to make this so. The existing check
wasn't testing anything about scheduling, it was just asserting that we
set preferred AZ in a way that matches the way things happen to be
scheduled at time of split.
2024-12-03 16:55:00 +00:00
Christian Schwarz
4d422b937c pageserver: only throttle pagestream requests & bring back throttling deduction for smgr latency metrics (#9962)
## Problem

In the batching PR 
- https://github.com/neondatabase/neon/pull/9870

I stopped deducting the time-spent-in-throttle fro latency metrics,
i.e.,
- smgr latency metrics (`SmgrOpTimer`)
- basebackup latency (+scan latency, which I think is part of
basebackup).

The reason for stopping the deduction was that with the introduction of
batching, the trick with tracking time-spent-in-throttle inside
RequestContext and swap-replacing it from the `impl Drop for
SmgrOpTimer` no longer worked with >1 requests in a batch.

However, deducting time-spent-in-throttle is desirable because our
internal latency SLO definition does not account for throttling.

## Summary of changes

- Redefine throttling to be a page_service pagestream request throttle
instead of a throttle for repository `Key` reads through `Timeline::get`
/ `Timeline::get_vectored`.
- This means reads done by `basebackup` are no longer subject to any
throttle.
- The throttle applies after batching, before handling of the request.
- Drive-by fix: make throttle sensitive to cancellation.
- Rename metric label `kind` from `timeline_get` to `pagestream` to
reflect the new scope of throttling.

To avoid config format breakage, we leave the config field named
`timeline_get_throttle` and ignore the `task_kinds` field.
This will be cleaned up in a future PR.

## Trade-Offs

Ideally, we would apply the throttle before reading a request off the
connection, so that we queue the minimal amount of work inside the
process.
However, that's not possible because we need to do shard routing.

The redefinition of the throttle to limit pagestream request rate
instead of repository `Key` rate comes with several downsides:
- We're no longer able to use the throttle mechanism for other other
tasks, e.g. image layer creation.
  However, in practice, we never used that capability anyways.
- We no longer throttle basebackup.
2024-12-03 15:25:58 +00:00
Erik Grinaker
bbe4dfa991 test_runner: use immediate shutdown in test_sharded_ingest (#9984)
## Problem

`test_sharded_ingest` ingests a lot of data, which can cause shutdown to
be slow e.g. due to local "S3 uploads" or compactions. This can cause
test flakes during teardown.

Resolves #9740.

## Summary of changes

Perform an immediate shutdown of the cluster.
2024-12-03 14:33:31 +00:00
Erik Grinaker
dcb24ce170 safekeeper,pageserver: add heap profiling (#9778)
## Problem

We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.

We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).

This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.

Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.

## Summary of changes

Adds a HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:

* `format`: output format (`jemalloc` or `pprof`; default `pprof`).

Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
process server-side -- this is left as future work, as is other output
formats like SVG.

Heap profiles don't work on macOS due to limitations in jemalloc.
2024-12-03 11:35:59 +00:00
a-masterov
a2a942f93c Add support for the extensions test for Postgres v17 (#9748)
## Problem
The extensions for Postgres v17 are ready but we do not test the
extensions shipped with v17
## Summary of changes
Build the test image based on Postgres v17. Run the tests for v17.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2024-12-03 11:25:29 +00:00
Christian Schwarz
cb10be710d page_service: batching observability & include throttled time in smgr metrics (#9870)
This PR 

- fixes smgr metrics https://github.com/neondatabase/neon/issues/9925 
- adds an additional startup log line logging the current batching
config
- adds a histogram of batch sizes global and per-tenant
- adds a metric exposing the current batching config

The issue described #9925 is that before this PR, request latency was
only observed *after* batching.
This means that smgr latency metrics (most importantly getpage latency)
don't account for
- `wait_lsn` time 
- time spent waiting for batch to fill up / the executor stage to pick
up the batch.

The fix is to use a per-request batching timer, like we did before the
initial batching PR.
We funnel those timers through the entire request lifecycle.

I noticed that even before the initial batching changes, we weren't
accounting for the time spent writing & flushing the response to the
wire.
This PR drive-by fixes that deficiency by dropping the timers at the
very end of processing the batch, i.e., after the `pgb.flush()` call.

I was **unable to maintain the behavior that we deduct
time-spent-in-throttle from various latency metrics.
The reason is that we're using a *single* counter in `RequestContext` to
track micros spent in throttle.
But there are *N* metrics timers in the batch, one per request.
As a consequence, the practice of consuming the counter in the drop
handler of each timer no longer works because all but the first timer
will encounter error `close() called on closed state`.
A failed attempt to maintain the current behavior can be found in
https://github.com/neondatabase/neon/pull/9951.

So, this PR remvoes the deduction behavior from all metrics.
I started a discussion on Slack about it the implications this has for
our internal SLO calculation:
https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029

# Refs

- fixes https://github.com/neondatabase/neon/issues/9925
- sub-issue https://github.com/neondatabase/neon/issues/9377
- epic: https://github.com/neondatabase/neon/issues/9376
2024-12-03 11:03:23 +00:00
Christian Schwarz
15d01b257a storcon_cli tenant-describe: include tenant-wide information in output (#9899)
Before this PR, the storcon_cli didn't have a way to show the
tenant-wide information of the TenantDescribeResponse.

Sadly, the `Serialize` impl for the tenant config doesn't skip on
`None`, so, the output becomes a bit bloated.
Maybe we can use `skip_serializing_if(Option::is_none)` in the future.
=> https://github.com/neondatabase/neon/issues/9983
2024-12-03 10:55:13 +00:00
John Spray
aaee713e53 storcon: use proper schedule context during node delete (#9958)
## Problem

I was touching `test_storage_controller_node_deletion` because for AZ
scheduling work I was adding a change to the storage controller (kick
secondaries during optimisation) that made a FIXME in this test defunct.
While looking at it I also realized that we can easily fix the way node
deletion currently doesn't use a proper ScheduleContext, using the
iterator type recently added for that purpose.

## Summary of changes

- A testing-only behavior in storage controller where if a secondary
location isn't yet ready during optimisation, it will be actively
polled.
- Remove workaround in `test_storage_controller_node_deletion` that
previously was needed because optimisation would get stuck on cold
secondaries.
- Update node deletion code to use a `TenantShardContextIterator` and
thereby a proper ScheduleContext
2024-12-03 08:59:38 +00:00
Alexey Kondratov
2e9207fdf3 fix(testing): Use 1 MB shared_buffers even with LFC (#9969)
## Problem

After enabling LFC in tests and lowering `shared_buffers` we started
having more problems with `test_pg_regress`.

## Summary of changes

Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and
still have enough room for Postgres to operate. Everything smaller might
be not enough for Postgres under load, and can cause errors like 'no
unpinned buffers available'.

See Konstantin's comment [1] as well.

Fixes #9956

[1]:
https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097
2024-12-02 18:46:06 +00:00
Tristan Partin
d8ebd33fe6 Stop changing the value of neon.extension_server_port at runtime (#9972)
On reconfigure, we no longer passed a port for the extension server
which caused us to not write out the neon.extension_server_port line.
Thus, Postgres thought we were setting the port to the default value of
0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the
following log messages:

> LOG: parameter "neon.extension_server_port" cannot be changed without
restarting the server
> LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied

Fixes: https://github.com/neondatabase/neon/issues/9945

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-02 18:06:19 +00:00
Conrad Ludgate
2dc238e5b3 feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971)
Fix the HTTP AuthMethod to accomodate the JWT authorization method.
Introduces the JWT issuer as an additional field in the parquet logs
2024-12-02 17:54:32 +00:00
Folke Behrens
243bca1c49 Bump OTel, tracing, reqwest crates (#9970) 2024-12-02 17:24:48 +00:00
Arseny Sher
fa909c27fc Update consensus protocol spec (#9607)
The spec was written for the buggy protocol which we had before the one
more similar to Raft was implemented. Update the spec with what we
currently have.

ref https://github.com/neondatabase/neon/issues/8699
2024-12-02 16:10:44 +00:00
Folke Behrens
1b60571636 proxy: Create Elasticache credentials provider lazily (#9967)
## Problem

The credentials providers tries to connect to AWS STS even when we use
plain Redis connections.

## Summary of changes

* Construct the CredentialsProvider only when needed ("irsa").
2024-12-02 15:38:12 +00:00
Alexander Bayandin
c18716bb3f CI(replication-tests): fix notifications about replication-tests failures (#9950)
## Problem

`if: ${{ github.event.schedule }}` gets skipped if a previous step has
failed, but we want to run the step for both `success` and `failure`

## Summary of changes
- Add `!cancelled()` to notification step if-condition, to skip only
cancelled jobs
2024-12-02 12:46:07 +00:00
Conrad Ludgate
cd1d2d1996 fix(proxy): forward notifications from authentication (#9948)
Fixes https://github.com/neondatabase/cloud/issues/20973. 

This refactors `connect_raw` in order to return direct access to the
delayed notices.

I cannot find a way to test this with psycopg2 unfortunately, although
testing it with psql does return the expected results.
2024-12-02 12:29:57 +00:00
John Spray
bd09369198 storcon: add metric for AZ scheduling violations (#9949)
## Problem

We can't easily tell how far the state of shards is from their AZ
preferences. This can be a cause of performance issues, so it's
important for diagnosability that we can tell easily if there are
significant numbers of shards that aren't running in their preferred AZ.

Related: https://github.com/neondatabase/cloud/issues/15413

## Summary of changes

- In reconcile_all, count shards that are scheduled into the wrong AZ
(if they have a preference), and publish it as a prometheus gauge.
- Also calculate a statistic for how many shards wanted to reconcile but
couldn't.

This is clearly a lazy calculation: reconcile all only runs
periodically. But that's okay: shards in the wrong AZ is something that
only matters if it stays that way for some period of time.
2024-12-02 11:50:22 +00:00
Erik Grinaker
5330122049 test_runner: improve wait_until (#9936)
Improves `wait_until` by:

* Use `timeout` instead of `iterations`. This allows changing the
timeout/interval parameters independently.
* Make `timeout` and `interval` optional (default 20s and 0.5s). Most
callers don't care.
* Only output status every 1s by default, and add optional
`status_interval` parameter.
* Remove `show_intermediate_error`, this was always emitted anyway.

Most callers have been updated to use the defaults, except where they
had good reason otherwise.
2024-12-02 10:26:15 +00:00
Anastasia Lubennikova
45658ccccb Update pgvector to 0.8.0 (#9733) 2024-12-02 10:10:51 +00:00
John Spray
14853a3284 storcon: don't take any Service locks in /status and /ready (#9944)
## Problem

We saw unexpected container terminations when running in k8s with with
small CPU resource requests.

The /status and /ready handlers called `maybe_forward`, which always
takes the lock on Service::inner.

If there is a lot of writer lock contention, and the container is
starved of CPU, this increases the likelihood that we will get killed by
the kubelet.

It isn't certain that this was a cause of issues, but it is a potential
source that we can eliminate.

## Summary of changes

- Revise logic to return immediately if the URL is in the non-forwarded
list, rather than calling maybe_forward
2024-12-01 18:09:58 +00:00
Konstantin Knizhnik
aad809b048 Fix issues with prefetch ring buffer resize (#9847)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479


We observe the following error in the logs 
```
[XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128
```
most likely caused by changing `neon.readahead_buffer_size`

## Summary of changes

1. Copy shard state
2. Do not use prefetch_set_unused in readahead_buffer_resize
3. Change prefetch buffer overflow criteria

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-01 15:47:28 +00:00
Alexander Bayandin
fae8e7ba76 Compute image: prepare Postgres v14-v16 for Debian 12 (#9954)
## Problem

Current compute images for Postgres 14-16 don't build on Debian 12
because of issues with extensions.
This PR fixes that, but for the current setup, it is mostly a no-op
change.

## Summary of changes
- Use `/bin/bash -euo pipefail` as SHELL to fail earlier
- Fix `plv8` build: backport a trivial patch for v8
- Fix `postgis` build: depend `sfgal` version on Debian version instead
of Postgres version


Tested in: https://github.com/neondatabase/neon/pull/9849
2024-12-01 13:04:37 +00:00
Konstantin Knizhnik
97a9abd181 Add GUC controlling whether to pause recovery if some critical GUCs at replica have smaller value than on primary (#9057)
## Problem

See https://github.com/neondatabase/neon/issues/9023

## Summary of changes

Ass GUC `recovery_pause_on_misconfig` allowing not to pause in case of
replica and primary configuration mismatch

See https://github.com/neondatabase/postgres/pull/501
See https://github.com/neondatabase/postgres/pull/502
See https://github.com/neondatabase/postgres/pull/503
See https://github.com/neondatabase/postgres/pull/504


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-12-01 12:23:10 +00:00
Folke Behrens
4abc8e5282 Merge the consumption metric pushes (#9939)
#8564

## Problem

The main and backup consumption metric pushes are completely
independent,
resulting in different event time windows and different idempotency
keys.

## Summary of changes

* Merge the push tasks, but keep chunks the same size.
2024-11-30 10:11:37 +00:00
Christian Schwarz
aa4ec11af9 page_service: rewrite batching to work without a timeout (#9851)
# Problem

The timeout-based batching adds latency to unbatchable workloads.

We can choose a short batching timeout (e.g. 10us) but that requires
high-resolution timers, which tokio doesn't have.
I thoroughly explored options to use OS timers (see
[this](https://github.com/neondatabase/neon/pull/9822) abandoned PR).
In short, it's not an attractive option because any timer implementation
adds non-trivial overheads.

# Solution

The insight is that, in the steady state of a batchable workload, the
time we spend in `get_vectored` will be hundreds of microseconds anyway.

If we prepare the next batch concurrently to `get_vectored`, we will
have a sizeable batch ready once `get_vectored` of the current batch is
done and do not need an explicit timeout.

This can be reasonably described as **pipelining of the protocol
handler**.

# Implementation

We model the sub-protocol handler for pagestream requests
(`handle_pagrequests`) as two futures that form a pipeline:

2. Batching: read requests from the connection and fill the current
batch
3. Execution: `take` the current batch, execute it using `get_vectored`,
and send the response.

The Reading and Batching stage are connected through a new type of
channel called `spsc_fold`.

See the long comment in the `handle_pagerequests_pipelined` for details.

# Changes

- Refactor `handle_pagerequests`
    - separate functions for
- reading one protocol message; produces a `BatchedFeMessage` with just
one page request in it
- batching; tried to merge an incoming `BatchedFeMessage` into an
existing `BatchedFeMessage`; returns `None` on success and returns back
the incoming message in case merging isn't possible
        - execution of a batched message
- unify the timeline handle acquisition & request span construction; it
now happen in the function that reads the protocol message
- Implement serial and pipelined model
    - serial: what we had before any of the batching changes
      - read one protocol message
      - execute protocol messages
    - pipelined: the design described above
- optionality for execution of the pipeline: either via concurrent
futures vs tokio tasks
- Pageserver config
  - remove batching timeout field
  - add ability to configure pipelining mode
- add ability to limit max batch size for pipelined configurations
(required for the rollout, cf
https://github.com/neondatabase/cloud/issues/20620 )
  - ability to configure execution mode
- Tests
  - remove `batch_timeout` parametrization
  - rename `test_getpage_merge_smoke` to `test_throughput`
- add parametrization to test different max batch sizes and execution
moes
  - rename `test_timer_precision` to `test_latency`
  - rename the test case file to `test_page_service_batching.py`
  - better descriptions of what the tests actually do

## On the holding The `TimelineHandle` in the pending batch

While batching, we hold the `TimelineHandle` in the pending batch.
Therefore, the timeline will not finish shutting down while we're
batching.

This is not a problem in practice because the concurrently ongoing
`get_vectored` call will fail quickly with an error indicating that the
timeline is shutting down.
This results in the Execution stage returning a `QueryError::Shutdown`,
which causes the pipeline / entire page service connection to shut down.
This drops all references to the
`Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the
contained `TimelineHandle`s.

- => fixes https://github.com/neondatabase/neon/issues/9850

# Performance

Local run of the benchmarks, results in [this empty
commit](1cf5b1463f)
in the PR branch.

Key take-aways:
* `concurrent-futures` and `tasks` deliver identical `batching_factor`
* tail latency impact unknown, cf
https://github.com/neondatabase/neon/issues/9837
* `concurrent-futures` has higher throughput than `tasks` in all
workloads (=lower `time` metric)
* In unbatchable workloads, `concurrent-futures` has 5% higher
`CPU-per-throughput` than that of `tasks`, and 15% higher than that of
`serial`.
* In batchable-32 workload, `concurrent-futures` has 8% lower
`CPU-per-throughput` than that of `tasks` (comparison to tput of
`serial` is irrelevant)
* in unbatchable workloads, mean and tail latencies of
`concurrent-futures` is practically identical to `serial`, whereas
`tasks` adds 20-30us of overhead

Overall, `concurrent-futures` seems like a slightly more attractive
choice.

# Rollout

This change is disabled-by-default.

Rollout plan:
- https://github.com/neondatabase/cloud/issues/20620

# Refs

- epic: https://github.com/neondatabase/neon/issues/9376
- this sub-task: https://github.com/neondatabase/neon/issues/9377
- the abandoned attempt to improve batching timeout resolution:
https://github.com/neondatabase/neon/pull/9820
- closes https://github.com/neondatabase/neon/issues/9850
- fixes https://github.com/neondatabase/neon/issues/9835
2024-11-30 00:16:24 +00:00
Matthias van de Meent
973a8d2680 Fix timeout value used in XLogWaitForReplayOf (#9937)
The previous value assumed usec precision, while the timeout used is in
milliseconds, causing replica backends to wait for (potentially) many
hours for WAL replay without the expected progress reports in logs.

This fixes the issue.

Reported-By: Alexander Lakhin <exclusion@gmail.com>

## Problem


https://github.com/neondatabase/postgres/pull/279#issuecomment-2507671817

The timeout value was configured with the assumption the indicated value
would be microseconds, where it's actually milliseconds. That causes the
backend to wait for much longer (2h46m40s) before it emits the "I'm
waiting for recovery" message. While we do have wait events configured
on this, it's not great to have stuck backends without clear logs, so
this fixes the timeout value in all our PostgreSQL branches.

## PG PRs

* PG14: https://github.com/neondatabase/postgres/pull/542
* PG15: https://github.com/neondatabase/postgres/pull/543
* PG16: https://github.com/neondatabase/postgres/pull/544
* PG17: https://github.com/neondatabase/postgres/pull/545
2024-11-29 19:10:26 +00:00
Gleb Novikov
c848f25ec2 Fixed fast_import pgbin in calling get_pg_version (#9933)
Was working on https://github.com/neondatabase/cloud/pull/20795 and
discovered that fast_import is not working normally.
2024-11-29 17:58:36 +00:00
John Spray
d5624cc505 pageserver: download small objects using a smaller timeout (#9938)
## Problem

It appears that the Azure storage API tends to hang TCP connections more
than S3 does.

Currently we use a 2 minute timeout for all downloads. This is large
because sometimes the objects we download are large. However, waiting 2
minutes when doing something like downloading a manifest on tenant
attach is problematic, because when someone is doing a "create tenant,
create timeline" workflow, that 2 minutes is long enough for them
reasonably to give up creating that timeline.

Rather than propagate oversized timeouts further up the stack, we should
use a different timeout for objects that we expect to be small.

Closes: https://github.com/neondatabase/neon/issues/9836

## Summary of changes

- Add a `small_timeout` configuration attribute to remote storage,
defaulting to 30 seconds (still a very generous period to do something
like download an index)
- Add a DownloadKind parameter to DownloadOpts, so that callers can
indicate whether they expect the object to be small or large.
- In the azure client, use small timeout for HEAD requests, and for GET
requests if DownloadKind::Small is used.
- Use DownloadKind::Small for manifests, indices, and heatmap downloads.

This PR intentionally does not make the equivalent change to the S3
client, to reduce blast radius in case this has unexpected consequences
(we could accomplish the same thing by editing lots of configs, but just
skipping the code is simpler for right now)
2024-11-29 15:11:44 +00:00
Alexey Kondratov
538e2312a6 feat(compute_ctl): Always set application_name (#9934)
## Problem

It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.

## Summary of changes

Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent

Follow-up for #9919
Part of neondatabase/cloud#20948
2024-11-29 13:55:56 +00:00
Erik Grinaker
a6073b5013 safekeeper: use jemalloc (#9780)
## Problem

To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.

Touches #9534.

## Summary of changes

Use jemalloc in Safekeeper.
2024-11-29 13:38:04 +00:00
John Spray
ea3798e3b3 storage controller: use proper ScheduleContext when evacuating a node (#9908)
## Problem

When picking locations for a shard, we should use a ScheduleContext that
includes all the other shards in the tenant, so that we apply proper
anti-affinity between shards. If we don't do this, then it can lead to
unstable scheduling, where we place a shard somewhere that the optimizer
will then immediately move it away from.

We didn't always do this, because it was a bit awkward to accumulate the
context for a tenant rather than just walking tenants.

This was a TODO in `handle_node_availability_transition`:
```
                        // TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
                        // for tenants without secondary locations: if they have a secondary location, then this
                        // schedule() call is just promoting an existing secondary)
```

This is a precursor to https://github.com/neondatabase/neon/issues/8264,
where the current imperfect scheduling during node evacuation hampers
testing.

## Summary of changes

- Add an iterator type that yields each shard along with a
schedulecontext that includes all the other shards from the same tenant
- Use the iterator to replace hand-crafted logic in optimize_all_plan
(functionally identical)
- Use the iterator in `handle_node_availability_transition` to apply
proper anti-affinity during node evacuation.
2024-11-29 13:27:49 +00:00
Conrad Ludgate
1d642d6a57 chore(proxy): vendor a subset of rust-postgres (#9930)
Our rust-postgres fork is getting messy. Mostly because proxy wants more
control over the raw protocol than tokio-postgres provides. As such,
it's diverging more and more. Storage and compute also make use of
rust-postgres, but in more normal usage, thus they don't need our crazy
changes.

Idea: 
* proxy maintains their subset
* other teams use a minimal patch set against upstream rust-postgres

Reviewing this code will be difficult. To implement it, I
1. Copied tokio-postgres, postgres-protocol and postgres-types from
00940fcdb5
2. Updated their package names with the `2` suffix to make them compile
in the workspace.
3. Updated proxy to use those packages
4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches
applied https://github.com/jbg/tokio-postgres-rustls/pull/32
https://github.com/jbg/tokio-postgres-rustls/pull/33)
5. Removed as much dead code as I could find in the vendored libraries
6. Updated the tokio-postgres-rustls code to use our existing channel
binding implementation
2024-11-29 11:08:01 +00:00
Erik Grinaker
3ffe6de0b9 test_runner/performance: add logical message ingest benchmark (#9749)
Adds a benchmark for logical message WAL ingestion throughput
end-to-end. Logical messages are essentially noops, and thus ignored by
the Pageserver.

Example results from my MacBook, with fsync enabled:

```
postgres_ingest: 14.445 s
safekeeper_ingest: 29.948 s
pageserver_ingest: 30.013 s
pageserver_recover_ingest: 8.633 s
wal_written: 10,340 MB
message_count: 1310720 messages
postgres_throughput: 715 MB/s
safekeeper_throughput: 345 MB/s
pageserver_throughput: 344 MB/s
pageserver_recover_throughput: 1197 MB/s
```

See
https://github.com/neondatabase/neon/issues/9642#issuecomment-2475995205
for running analysis.

Touches #9642.
2024-11-29 09:40:08 +00:00
Alexey Kondratov
42fb3c4d30 fix(compute_ctl): Allow usage of DB names with whitespaces (#9919)
## Problem

We used `set_path()` to replace the database name in the connection
string. It automatically does url-safe encoding if the path is not
already encoded, but it does it as per the URL standard, which assumes
that tabs can be safely removed from the path without changing the
meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks
for DBs with properly %-encoded names, like with `%20`, as they are kept
intact, but actually should be escaped.

Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name.

I think this is the PR that caused this regression
https://github.com/neondatabase/neon/pull/9717, as it switched from
`postgres::config::Config` back to `set_path()`.

This was fixed a while ago already [1], btw, I just haven't added a test
to catch this regression back then :(

## Summary of changes

This commit changes the code back to use
`postgres/tokio_postgres::Config` everywhere.

While on it, also do some changes around, as I had to touch this code:
1. Bump some logging from `debug` to `info` in the spec apply path. We
do not use `debug` in prod, and it was tricky to understand what was
going on with this bug in prod.
2. Refactor configuration concurrency calculation code so it was
reusable. Yet, still keep `1` in the case of reconfiguration. The
database can be actively used at this moment, so we cannot guarantee
that there will be enough spare connection slots, and the underlying
code won't handle connection errors properly.
3. Simplify the installed extensions code. It was spawning a blocking
task inside async function, which doesn't make much sense. Instead, just
have a main sync function and call it with `spawn_blocking` in the API
code -- the only place we need it to be async.
4. Add regression python test to cover this and related problems in the
future. Also, add more extensive testing of schema dump and DBs and
roles listing API.

[1]:
4d1e48f3b9
[2]:
https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org

Resolves neondatabase/cloud#20869
2024-11-28 21:38:30 +00:00
Alexander Bayandin
e04dd3be0b test_runner: rerun all failed tests (#9917)
## Problem

Currently, we rerun only known flaky tests. This approach was chosen to
reduce the number of tests that go unnoticed (by forcing people to take
a look at failed tests and rerun the job manually), but it has some
drawbacks:
- In PRs, people tend to push new changes without checking failed tests
(that's ok)
- In the main, tests are just restarted without checking
(understandable)
- Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky
`, test[2]` is not marked as flaky automatically (which may or may not
be the case).

I suggest rerunning all failed tests to increase the stability of GitHub
jobs and using the Grafana Dashboard with flaky tests for deeper
analysis.

## Summary of changes
- Rerun all failed tests twice at max
2024-11-28 19:02:57 +00:00
Vlad Lazar
eb520a14ce pageserver: return correct LSN for interpreted proto keep alive responses (#9928)
## Problem

For the interpreted proto the pageserver is not returning the correct
LSN
in replies to keep alive requests. This is because the interpreted
protocol arm
was not updating `last_rec_lsn`.

## Summary of changes

* Return correct LSN in keep-alive responses
* Fix shard field in wal sender traces
2024-11-28 17:38:47 +00:00
Arpad Müller
eb5d832e6f Update rust to 1.83.0, also update cargo adjacent tools (#9926)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://releases.rs/docs/1.83.0/).

Also update `cargo-hakari`, `cargo-deny`, `cargo-hack` and
`cargo-nextest` to their latest versions.

Prior update was in #9445.
2024-11-28 15:49:30 +00:00
Erik Grinaker
70780e310c Makefile: build pg_visibility (#9922)
Build the `pg_visibility` extension for use with `neon_local`. This is
useful to inspect the visibility map for debugging.

Touches #9914.
2024-11-28 15:48:18 +00:00
Vlad Lazar
e82f7f0dfc remote_storage/abs: count 404 and 304 for get as ok for metrics (#9912)
## Problem

We currently see elevated levels of errors for GetBlob requests. This is
because 404 and 304 are counted as errors for metric reporting.

## Summary of Changes

Bring the implementation in line with the S3 client and treat 404 and
304 responses as ok for metric purposes.

Related: https://github.com/neondatabase/cloud/issues/20666
2024-11-28 10:11:08 +00:00
Ivan Efremov
8173dc600a proxy: spawn cancellation checks in the background (#9918)
## Problem
For cancellation, a connection is open during all the cancel checks.
## Summary of changes
Spawn cancellation checks in the background, and close connection
immediately.
Use task_tracker for cancellation checks.
2024-11-28 06:32:22 +00:00
Erik Grinaker
da1daa2426 pageserver: only apply ClearVmBits on relevant shards (#9895)
# Problem

VM (visibility map) pages are stored and managed as any regular relation
page, in the VM fork of the main relation. They are also sharded like
other pages. Regular WAL writes to the VM pages (typically performed by
vacuum) are routed to the correct shard as usual. However, VM pages are
also updated via `ClearVmBits` metadata records emitted when main
relation pages are updated. These metadata records were sent to all
shards, like other metadata records. This had the following effects:

* On shards responsible for VM pages, the `ClearVmBits` applies as
expected.

* On shard 0, which knows about the VM relation and its size but doesn't
necessarily have any VM pages, the `ClearVmBits` writes may have been
applied without also having applied the explicit WAL writes to VM pages.

* If VM pages are spread across multiple shards (unlikely with 256MB
stripe size), all shards may have applied `ClearVmBits` if the pages
fall within their local view of the relation size, even for pages they
do not own.

* On other shards, this caused a relation size cache miss and a DbDir
and RelDir lookup before dropping the `ClearVmBits`. With many
relations, this could cause significant CPU overhead.

This is not believed to be a correctness problem, but this will be
verified in #9914.

Resolves #9855.

# Changes

Route `ClearVmBits` metadata records only to the shards responsible for
the VM pages.

Verification of the current VM handling and cleanup of incomplete VM
pages on shard 0 (and potentially elsewhere) is left as follow-up work.
2024-11-27 19:44:24 +00:00
Alex Chi Z.
9e3cb75bc7 fix(pageserver): flush deletion queue in reload shutdown mode (#9884)
## Problem

close https://github.com/neondatabase/neon/issues/9859

## Summary of changes

Ensure that the deletion queue gets fully flushed (i.e., the deletion
lists get applied) during a graceful shutdown.

It is still possible that an incomplete shutdown would leave deletion
list behind and cause race upon the next startup, but we assume this
will unlikely happen, and even if it happened, the pageserver should
already be at a tainted state and the tenant should be moved to a new
tenant with a new generation number.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-27 18:30:54 +00:00
Folke Behrens
5c41707bee proxy: promote two logs to error, fix multiline log (#9913)
* Promote two logs from mpsc send errors to error level. The channels
are unbounded and there shouldn't be errors.
* Fix one multiline log from anyhow::Error. Use Debug instead of
Display.
2024-11-27 18:05:46 +00:00
Erik Grinaker
cc37fa0f33 pageserver: add metrics for unknown ClearVmBits pages (#9911)
## Problem

When ingesting implicit `ClearVmBits` operations, we silently drop the
writes if the relation or page is unknown. There are implicit
assumptions around VM pages wrt. explicit/implicit updates, sharding,
and relation sizes, which can possibly drop writes incorrectly. Adding a
few metrics will allow us to investigate further and tighten up the
logic.

Touches #9855.

## Summary of changes

Add a `pageserver_wal_ingest_clear_vm_bits_unknown` metric to record
dropped `ClearVmBits` writes.

Also add comments clarifying the behavior of relation sizes on non-zero
shards.
2024-11-27 17:16:41 +00:00
Alex Chi Z.
23f5a27146 fix(storage-scrubber): valid layermap error degrades to warning (#9902)
Valid layer assumption is a necessary condition for a layer map to be
valid. It's a stronger check imposed by gc-compaction than the actual
valid layermap definition. Actually, the system can work as long as
there are no overlapping layer maps. Therefore, we degrade that into a
warning.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-27 16:07:39 +00:00
Erik Grinaker
e4f437a354 pageserver: add relsize cache metrics (#9890)
## Problem

We don't have any observability for the relation size cache. We have
seen cache misses cause significant performance impact with high
relation counts.

Touches #9855.

## Summary of changes

Adds the following metrics:

* `pageserver_relsize_cache_entries`
* `pageserver_relsize_cache_hits`
* `pageserver_relsize_cache_misses`
* `pageserver_relsize_cache_misses_old`
2024-11-27 13:54:14 +00:00
Vlad Lazar
8fdf786217 pageserver: add tenant config override for wal receiver proto (#9888)
## Problem

Can't change protocol at tenant granularity.

## Summary of changes

Add tenant config level override for wal receiver protocol.

## Links

Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
2024-11-27 13:46:23 +00:00
Vlad Lazar
9e0148de11 safekeeper: use protobuf for sending compressed records to pageserver (#9821)
## Problem

https://github.com/neondatabase/neon/pull/9746 lifted decoding and
interpretation of WAL to the safekeeper.
This reduced the ingested amount on the pageservers by around 10x for a
tenant with 8 shards, but doubled
the ingested amount for single sharded tenants.

Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which
doesn't support schema evolution.
Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes

This patch set addresses both problems by adding protobuf support for
the interpreted wal records and adding compression support. Compressed
protobuf reduced the ingested amount by 100x on the 32 shards
`test_sharded_ingest` case (compared to non-interpreted proto). For the
1 shard case the reduction is 5x.

Sister change to `rust-postgres` is
[here](https://github.com/neondatabase/rust-postgres/pull/33).

## Links

Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
2024-11-27 12:12:21 +00:00
Alexander Bayandin
7b41ee872e CI(pre-merge-checks): build only one build-tools-image (#9718)
## Problem

The `pre-merge-checks` workflow relies on the build-tools image. 
If changes to the `build-tools` image have been merged into the main
branch since the last CI run for a PR (with other changes to the
`build-tools`), the image will be rebuilt during the merge queue run.
Otherwise, cached images are used.
Rebuilding the image adds approximately 10 minutes on x86-64 and 20
minutes on arm64 to the process.

## Summary of changes
- parametrise `build-build-tools-image` job with arch and Debian version
- Run `pre-merge-checks` only on Debian 12 x86-64 image
2024-11-27 10:42:26 +00:00
Peter Bendel
277c33ba3f ingest benchmark: after effective_io_concurrency = 100 we can increase compute side parallelism (#9904)
## Problem

ingest benchmark tests project migration to Neon involving steps
- COPY relation data
- create indexes
- create constraints

Previously we used only 4 copy jobs, 4 create index jobs and 7
maintenance workers. After increasing effective_io_concurrency on
compute we see that we can sustain more parallelism in the ingest bench

## Summary of changes

Increase copy jobs to 8, create index jobs to 8 and maintenance workers
to 16
2024-11-27 10:09:01 +00:00
Tristan Partin
2b788cb53f Bump neon.logical_replication_max_snap_files default to 10000 (#9896)
This bump comes from a recommendation from Chi.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-26 17:49:37 +00:00
Peter Bendel
13feda0669 track how much time the flush loop is stalled waiting for uploads (#9885)
## Problem

We don't know how much time PS is losing during ingest when waiting for
remote storage uploads in the flush frozen layer loop.
Also we don't know how many remote storage requests get an permit
without waiting (not throttled by remote_storage concurrency_limit).

## Summary of changes

- Add a metric that accumulates the time waited per shard/PS
- in [remote storage semaphore wait
seconds](https://neonprod.grafana.net/d/febd9732-9bcf-4992-a821-49b1f6b02724/remote-storage?orgId=1&var-datasource=HUNg6jvVk&var-instance=pageserver-26.us-east-2.aws.neon.build&var-instance=pageserver-27.us-east-2.aws.neon.build&var-instance=pageserver-28.us-east-2.aws.neon.build&var-instance=pageserver-29.us-east-2.aws.neon.build&var-instance=pageserver-30.us-east-2.aws.neon.build&var-instance=pageserver-31.us-east-2.aws.neon.build&var-instance=pageserver-36.us-east-2.aws.neon.build&var-instance=pageserver-37.us-east-2.aws.neon.build&var-instance=pageserver-38.us-east-2.aws.neon.build&var-instance=pageserver-39.us-east-2.aws.neon.build&var-instance=pageserver-40.us-east-2.aws.neon.build&var-instance=pageserver-41.us-east-2.aws.neon.build&var-request_type=put_object&from=1731961336340&to=1731964762933&viewPanel=3)
add a first bucket with 100 microseconds to count requests that do not
need to wait on semaphore

Update: created a new version that uses a Gauge (one increasing value
per PS/shard) instead of histogram as suggested by review
2024-11-26 11:46:58 +00:00
Conrad Ludgate
96a1b71c84 chore(proxy): discard request context span during passthrough (#9882)
## Problem

The RequestContext::span shouldn't live for the entire postgres
connection, only the handshake.

## Summary of changes

* Slight refactor to the RequestContext to discard the span upon
handshake completion.
* Make sure the temporary future for the handshake is dropped (not bound
to a variable)
* Runs our nightly fmt script
2024-11-25 21:32:53 +00:00
Arpad Müller
a74ab9338d fast_import: remove hardcoding of pg_version (#9878)
Before, we hardcoded the pg_version to 140000, while the code expected
version numbers like 14. Now we use an enum, and code from
`extension_server.rs` to auto-detect the correct version. The enum helps
when we add support for a version: enums ensure that compilation fails
if one forgets to put the version to one of the `match` locations.

cc https://github.com/neondatabase/neon/pull/9218
2024-11-25 20:23:42 +00:00
Folke Behrens
7404887b81 proxy: Demote errors from cplane request routines to debug (#9886)
## Problem

Any errors from these async blocks are unconditionally logged at error
level
even though we already handle such errors based on context.

## Summary of changes

* Log raw errors from creating and executing cplane requests at debug
level.
* Inline macro calls to retain the correct callsite.
2024-11-25 19:35:32 +00:00
Folke Behrens
87e4dd23a1 proxy: Demote all cplane error replies to info log level (#9880)
## Problem

The vast majority of the error/warn logs from cplane are about time or
data transfer quotas exceeded or endpoint-not-found errors and not
operational errors in proxy or cplane.

## Summary of changes

* Demote cplane error replies to info level.
* Raise other errors from warn back to error.
2024-11-25 17:53:26 +00:00
Vlad Lazar
7a2f0ed8d4 safekeeper: lift decoding and interpretation of WAL to the safekeeper (#9746)
## Problem

For any given tenant shard, pageservers receive all of the tenant's WAL
from the safekeeper.
This soft-blocks us from using larger shard counts due to bandwidth
concerns and CPU overhead of filtering
out the records.

## Summary of changes

This PR lifts the decoding and interpretation of WAL from the pageserver
into the safekeeper.

A customised PG replication protocol is used where instead of sending
raw WAL, the safekeeper sends
filtered, interpreted records. The receiver drives the protocol
selection, so, on the pageserver side, usage
of the new protocol is gated by a new pageserver config:
`wal_receiver_protocol`.

 More granularly the changes are:
1. Optionally inject the protocol and shard identity into the arguments
used for starting replication
2. On the safekeeper side, implement a new wal sending primitive which
decodes and interprets records
 before sending them over
3. On the pageserver side, implement the ingestion of this new
replication message type. It's very similar
 to what we already have for raw wal (minus decoding and interpreting).
 
 ## Notes
 
* This PR currently uses my [branch of
rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support)
which includes the deserialization logic for the new replication message
type. PR for that is open
[here](https://github.com/neondatabase/rust-postgres/pull/32).
* This PR contains changes for both pageservers and safekeepers. It's
safe to merge because the new protocol is disabled by default on the
pageserver side. We can gradually start enabling it in subsequent
releases.
* CI tests are running on https://github.com/neondatabase/neon/pull/9747
 
 ## Links
 
 Related: https://github.com/neondatabase/neon/issues/9336
 Epic: https://github.com/neondatabase/neon/issues/9329
2024-11-25 17:29:28 +00:00
Christian Schwarz
5c2356988e page_service: add benchmark for batching (#9820)
This PR adds two benchmark to demonstrate the effect of server-side
getpage request batching added in
https://github.com/neondatabase/neon/pull/9321.

For the CPU usage, I found the the `prometheus` crate's built-in CPU
usage accounts the seconds at integer granularity. That's not enough you
reduce the target benchmark runtime for local iteration. So, add a new
`libmetrics` metric and report that.

The benchmarks are disabled because [on our benchmark nodes, timer
resolution isn't high
enough](https://neondb.slack.com/archives/C059ZC138NR/p1732264223207449).
They work (no statement about quality) on my bare-metal devbox.

They will be refined and enabled once we find a fix. Candidates at time
of writing are:
- https://github.com/neondatabase/neon/pull/9822
- https://github.com/neondatabase/neon/pull/9851


Refs:

- Epic: https://github.com/neondatabase/neon/issues/9376
- Extracted from https://github.com/neondatabase/neon/pull/9792
2024-11-25 15:52:39 +00:00
Konstantin Knizhnik
441612c1ce Prefetch on macos (#9875)
## Problem

Prefetch is disabled at MacODS because `posix_fadvise` is not available.
But Neon prefetch is not using this function and for testing at MacOS is
it very convenient that prefetch is available.

## Summary of changes

Define `USE_PREFETCH` in Makefile.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-25 15:21:52 +00:00
Arpad Müller
77630e5408 Address beta clippy lint needless_lifetimes (#9877)
The 1.82.0 version of Rust will be stable soon, let's get the clippy
lint fixes in before the compiler version upgrade.
2024-11-25 14:59:12 +00:00
Alexander Bayandin
3d380acbd1 Bump default Debian version to Bookworm everywhere (#9863)
## Problem

We have a couple of CI workflows that still run on Debian Bullseye, and
the default Debian version in images is Bullseye as well (we explicitly
set building on Bookworm)

## Summary of changes
- Run `pgbench-pgvector` on Bookworm (fix a couple of packages)
- Run `trigger_bench_on_ec2_machine_in_eu_central_1` on Bookworm
- Change default `DEBIAN_VERSION` in Dockerfiles to Bookworm
- Make `pinned` docker tag an alias to `pinned-bookworm`
2024-11-25 14:43:32 +00:00
Alex Chi Z.
4630b70962 fix(pageserver): ensure all layers are flushed before measuring RSS (#9861)
## Problem

close https://github.com/neondatabase/neon/issues/9761

The test assumed that no new L0 layers are flushed throughout the
process, which is not true.

## Summary of changes

Fix the test case `test_compaction_l0_memory` by flushing in-memory
layers before compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-25 14:25:18 +00:00
Conrad Ludgate
6f6749c4a9 chore: update rustls (#9871) 2024-11-25 12:01:30 +00:00
Folke Behrens
0d1e82f0a7 Bump futures-* crates, drop unused license, hide duplicate crate warnings (#9858)
* The futures-util crate we use was yanked. Bump it and its siblings to
new patch release.
https://github.com/rust-lang/futures-rs/releases/tag/0.3.31
* cargo-deny: Drop an unused license.
* cargo-deny: Don't warn about duplicate crate. Duplicate crates are
unavoidable and the noise just hides real warnings.
2024-11-25 10:59:49 +00:00
Alexander Bayandin
6f7aeaa1c5 test_runner: use LFC by default (#8613)
## Problem
LFC is not enabled by default in tests, but it is enabled in production.
This increases the risk of errors in the production environment, which
were not found during the routine workflow.
However, enabling LFC for all the tests may overload the disk on our
servers and increase the number of failures.
So, we try enabling  LFC in one case to evaluate the possible risk.

## Summary of changes
A new environment variable, USE_LFC is introduced. If it is set to true,
LFC is enabled by default in all the tests.
In our workflow, we enable LFC for PG17, release, x86-64, and disabled
for all other combinations.

---------

Co-authored-by: Alexey Masterov <alexeymasterov@neon.tech>
Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>
2024-11-25 09:01:05 +00:00
608 changed files with 50202 additions and 11624 deletions

View File

@@ -3,6 +3,16 @@
# by the RUSTDOCFLAGS env var in CI.
rustdocflags = ["-Arustdoc::private_intra_doc_links"]
# Enable frame pointers. This may have a minor performance overhead, but makes it easier and more
# efficient to obtain stack traces (and thus CPU/heap profiles). It may also avoid seg faults that
# we've seen with libunwind-based profiling. See also:
#
# * <https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>
# * <https://github.com/rust-lang/rust/pull/122646>
#
# NB: the RUSTFLAGS envvar will replace this. Make sure to update e.g. Dockerfile as well.
rustflags = ["-Cforce-frame-pointers=yes"]
[alias]
build_testing = ["build", "--features", "testing"]
neon = ["run", "--bin", "neon_local"]

View File

@@ -46,6 +46,9 @@ workspace-members = [
"utils",
"wal_craft",
"walproposer",
"postgres-protocol2",
"postgres-types2",
"tokio-postgres2",
]
# Write out exact versions rather than a semver range. (Defaults to false.)

View File

@@ -21,3 +21,8 @@ config-variables:
- SLACK_UPCOMING_RELEASE_CHANNEL_ID
- DEV_AWS_OIDC_ROLE_ARN
- BENCHMARK_INGEST_TARGET_PROJECTID
- PGREGRESS_PG16_PROJECT_ID
- PGREGRESS_PG17_PROJECT_ID
- SLACK_ON_CALL_QA_STAGING_STREAM
- DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN
- SLACK_ON_CALL_STORAGE_STAGING_STREAM

View File

@@ -7,10 +7,9 @@ inputs:
type: boolean
required: false
default: false
aws_oicd_role_arn:
description: 'the OIDC role arn to (re-)acquire for allure report upload - if not set call must acquire OIDC role'
required: false
default: ''
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
outputs:
base-url:
@@ -43,7 +42,8 @@ runs:
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || \
[ "${GITHUB_REF_NAME}" = "release-proxy" ] || [ "${GITHUB_REF_NAME}" = "release-compute" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
@@ -83,12 +83,11 @@ runs:
ALLURE_VERSION: 2.27.0
ALLURE_ZIP_SHA256: b071858fb2fa542c65d8f152c5c40d26267b2dfb74df1f1608a589ecca38e777
- name: (Re-)configure AWS credentials # necessary to upload reports to S3 after a long-running test
if: ${{ !cancelled() && (inputs.aws_oicd_role_arn != '') }}
uses: aws-actions/configure-aws-credentials@v4
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws_oicd_role_arn }}
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
# Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this

View File

@@ -8,10 +8,9 @@ inputs:
unique-key:
description: 'string to distinguish different results in the same run'
required: true
aws_oicd_role_arn:
description: 'the OIDC role arn to (re-)acquire for allure report upload - if not set call must acquire OIDC role'
required: false
default: ''
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
@@ -23,7 +22,8 @@ runs:
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || \
[ "${GITHUB_REF_NAME}" = "release-proxy" ] || [ "${GITHUB_REF_NAME}" = "release-compute" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
@@ -35,12 +35,11 @@ runs:
env:
REPORT_DIR: ${{ inputs.report-dir }}
- name: (Re-)configure AWS credentials # necessary to upload reports to S3 after a long-running test
if: ${{ !cancelled() && (inputs.aws_oicd_role_arn != '') }}
uses: aws-actions/configure-aws-credentials@v4
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws_oicd_role_arn }}
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
- name: Upload test results

View File

@@ -15,10 +15,19 @@ inputs:
prefix:
description: "S3 prefix. Default is '${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
steps:
- uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600
- name: Download artifact
id: download-artifact
shell: bash -euxo pipefail {0}

View File

@@ -36,8 +36,8 @@ inputs:
description: 'Region name for real s3 tests'
required: false
default: ''
rerun_flaky:
description: 'Whether to rerun flaky tests'
rerun_failed:
description: 'Whether to rerun failed tests'
required: false
default: 'false'
pg_version:
@@ -48,10 +48,9 @@ inputs:
description: 'benchmark durations JSON'
required: false
default: '{}'
aws_oicd_role_arn:
description: 'the OIDC role arn to (re-)acquire for allure report upload - if not set call must acquire OIDC role'
required: false
default: ''
aws-oicd-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
@@ -62,6 +61,7 @@ runs:
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
path: /tmp/neon
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Download Neon binaries for the previous release
if: inputs.build_type != 'remote'
@@ -70,6 +70,7 @@ runs:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
path: /tmp/neon-previous
prefix: latest
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Download compatibility snapshot
if: inputs.build_type != 'remote'
@@ -81,6 +82,7 @@ runs:
# The lack of compatibility snapshot (for example, for the new Postgres version)
# shouldn't fail the whole job. Only relevant test should fail.
skip-if-does-not-exist: true
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Checkout
if: inputs.needs_postgres_source == 'true'
@@ -108,7 +110,7 @@ runs:
COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}
ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'backward compatibility breakage')
ALLOW_FORWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'forward compatibility breakage')
RERUN_FLAKY: ${{ inputs.rerun_flaky }}
RERUN_FAILED: ${{ inputs.rerun_failed }}
PG_VERSION: ${{ inputs.pg_version }}
shell: bash -euxo pipefail {0}
run: |
@@ -154,15 +156,8 @@ runs:
EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
fi
if [ "${RERUN_FLAKY}" == "true" ]; then
mkdir -p $TEST_OUTPUT
poetry run ./scripts/flaky_tests.py "${TEST_RESULT_CONNSTR}" \
--days 7 \
--output "$TEST_OUTPUT/flaky.json" \
--pg-version "${DEFAULT_PG_VERSION}" \
--build-type "${BUILD_TYPE}"
EXTRA_PARAMS="--flaky-tests-json $TEST_OUTPUT/flaky.json $EXTRA_PARAMS"
if [ "${RERUN_FAILED}" == "true" ]; then
EXTRA_PARAMS="--reruns 2 $EXTRA_PARAMS"
fi
# We use pytest-split plugin to run benchmarks in parallel on different CI runners
@@ -225,17 +220,19 @@ runs:
# The lack of compatibility snapshot shouldn't fail the job
# (for example if we didn't run the test for non build-and-test workflow)
skip-if-does-not-exist: true
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: (Re-)configure AWS credentials # necessary to upload reports to S3 after a long-running test
if: ${{ !cancelled() && (inputs.aws_oicd_role_arn != '') }}
uses: aws-actions/configure-aws-credentials@v4
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws_oicd_role_arn }}
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
- name: Upload test results
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-store
with:
report-dir: /tmp/test_output/allure/results
unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

View File

@@ -14,9 +14,11 @@ runs:
name: coverage-data-artifact
path: /tmp/coverage
skip-if-does-not-exist: true # skip if there's no previous coverage to download
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
- name: Upload coverage data
uses: ./.github/actions/upload
with:
name: coverage-data-artifact
path: /tmp/coverage
aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

View File

@@ -14,6 +14,10 @@ inputs:
prefix:
description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
aws-oicd-role-arn:
description: "the OIDC role arn for aws auth"
required: false
default: ""
runs:
using: "composite"
@@ -53,6 +57,13 @@ runs:
echo 'SKIPPED=false' >> $GITHUB_OUTPUT
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oicd-role-arn }}
role-duration-seconds: 3600
- name: Upload artifact
if: ${{ steps.prepare-artifact.outputs.SKIPPED == 'false' }}
shell: bash -euxo pipefail {0}

12
.github/file-filters.yaml vendored Normal file
View File

@@ -0,0 +1,12 @@
rust_code: ['**/*.rs', '**/Cargo.toml', '**/Cargo.lock']
v14: ['vendor/postgres-v14/**', 'Makefile', 'pgxn/**']
v15: ['vendor/postgres-v15/**', 'Makefile', 'pgxn/**']
v16: ['vendor/postgres-v16/**', 'Makefile', 'pgxn/**']
v17: ['vendor/postgres-v17/**', 'Makefile', 'pgxn/**']
rebuild_neon_extra:
- .github/workflows/neon_extra_builds.yml
rebuild_macos:
- .github/workflows/build-macos.yml

View File

@@ -70,6 +70,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# we create a table that has one row for each database that we want to restore with the status whether the restore is done
- name: Create benchmark_restore_status table if it does not exist

View File

@@ -19,8 +19,8 @@ on:
description: 'debug or release'
required: true
type: string
pg-versions:
description: 'a json array of postgres versions to run regression tests on'
test-cfg:
description: 'a json object of postgres versions and lfc states to run regression tests on'
required: true
type: string
@@ -31,12 +31,13 @@ defaults:
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
jobs:
build-neon:
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
contents: read
container:
image: ${{ inputs.build-tools-image }}
credentials:
@@ -205,6 +206,13 @@ jobs:
done
fi
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours
- name: Run rust tests
env:
NEXTEST_RETRIES: 3
@@ -256,6 +264,7 @@ jobs:
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-artifact
path: /tmp/neon
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# XXX: keep this after the binaries.list is formed, so the coverage can properly work later
- name: Merge and upload coverage data
@@ -265,6 +274,10 @@ jobs:
regress-tests:
# Don't run regression tests on debug arm64 builds
if: inputs.build-type != 'debug' || inputs.arch != 'arm64'
permissions:
id-token: write # aws-actions/configure-aws-credentials
contents: read
statuses: write
needs: [ build-neon ]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
container:
@@ -276,14 +289,14 @@ jobs:
options: --init --shm-size=512mb --ulimit memlock=67108864:67108864
strategy:
fail-fast: false
matrix:
pg_version: ${{ fromJson(inputs.pg-versions) }}
matrix: ${{ fromJSON(format('{{"include":{0}}}', inputs.test-cfg)) }}
steps:
- uses: actions/checkout@v4
with:
submodules: true
- name: Pytest regression tests
continue-on-error: ${{ matrix.lfc_state == 'with-lfc' && inputs.build-type == 'debug' }}
uses: ./.github/actions/run-python-test-set
timeout-minutes: 60
with:
@@ -293,13 +306,15 @@ jobs:
run_with_real_s3: true
real_s3_bucket: neon-github-ci-tests
real_s3_region: eu-central-1
rerun_flaky: true
rerun_failed: true
pg_version: ${{ matrix.pg_version }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
TEST_RESULT_CONNSTR: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
CHECK_ONDISK_DATA_COMPATIBILITY: nonempty
BUILD_TAG: ${{ inputs.build-tag }}
PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring
USE_LFC: ${{ matrix.lfc_state == 'with-lfc' && 'true' || 'false' }}
# Temporary disable this step until we figure out why it's so flaky
# Ref https://github.com/neondatabase/neon/issues/4540

View File

@@ -0,0 +1,91 @@
name: Check Codestyle Rust
on:
workflow_call:
inputs:
build-tools-image:
description: "build-tools image"
required: true
type: string
archs:
description: "Json array of architectures to run on"
type: string
defaults:
run:
shell: bash -euxo pipefail {0}
jobs:
check-codestyle-rust:
strategy:
matrix:
arch: ${{ fromJson(inputs.archs) }}
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'small-arm64' || 'small')) }}
container:
image: ${{ inputs.build-tools-image }}
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
options: --init
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: true
- name: Cache cargo deps
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
# cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.
# This will catch compiler & clippy warnings in all feature combinations.
# TODO: use cargo hack for build and test as well, but, that's quite expensive.
# NB: keep clippy args in sync with ./run_clippy.sh
#
# The only difference between "clippy --debug" and "clippy --release" is that in --release mode,
# #[cfg(debug_assertions)] blocks are not built. It's not worth building everything for second
# time just for that, so skip "clippy --release".
- run: |
CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"
if [ "$CLIPPY_COMMON_ARGS" = "" ]; then
echo "No clippy args found in .neon_clippy_args"
exit 1
fi
echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV
- name: Run cargo clippy (debug)
run: cargo hack --features default --ignore-unknown-features --feature-powerset clippy $CLIPPY_COMMON_ARGS
- name: Check documentation generation
run: cargo doc --workspace --no-deps --document-private-items
env:
RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"
# Use `${{ !cancelled() }}` to run quck tests after the longer clippy run
- name: Check formatting
if: ${{ !cancelled() }}
run: cargo fmt --all -- --check
# https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci
- name: Check rust dependencies
if: ${{ !cancelled() }}
run: |
cargo hakari generate --diff # workspace-hack Cargo.toml is up-to-date
cargo hakari manage-deps --dry-run # all workspace crates depend on workspace-hack
# https://github.com/EmbarkStudios/cargo-deny
- name: Check rust licenses/bans/advisories/sources
if: ${{ !cancelled() }}
run: cargo deny check --hide-inclusion-graph

View File

@@ -21,7 +21,7 @@ defaults:
shell: bash -euo pipefail {0}
jobs:
create-storage-release-branch:
create-release-branch:
runs-on: ubuntu-22.04
permissions:

View File

@@ -33,7 +33,7 @@ jobs:
# SC2086 - Double quote to prevent globbing and word splitting. - https://www.shellcheck.net/wiki/SC2086
SHELLCHECK_OPTS: --exclude=SC2046,SC2086
with:
fail_on_error: true
fail_level: error
filter_mode: nofilter
level: error

View File

@@ -105,6 +105,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -122,7 +123,7 @@ jobs:
run_in_parallel: false
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# Set --sparse-ordering option of pytest-order plugin
# to ensure tests are running in order of appears in the file.
# It's important for test_perf_pgbench.py::test_pgbench_remote_* tests
@@ -152,7 +153,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -204,6 +205,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Run Logical Replication benchmarks
uses: ./.github/actions/run-python-test-set
@@ -214,7 +216,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 5400
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -231,7 +233,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 5400
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -243,13 +245,13 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
# Post both success and failure to the Slack channel
- name: Post to a Slack channel
if: ${{ github.event.schedule }}
if: ${{ github.event.schedule && !cancelled() }}
uses: slackapi/slack-github-action@v1
with:
channel-id: "C06T9AMNDQQ" # on-call-compute-staging-stream
@@ -306,6 +308,7 @@ jobs:
"image": [ "'"$image_default"'" ],
"include": [{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new", "db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new-many-tables","db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new", "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned-bookworm" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-new", "db_size": "10gb","runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned-bookworm" },
@@ -405,9 +408,10 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
if: contains(fromJson('["neonvm-captest-new", "neonvm-captest-freetier", "neonvm-azure-captest-freetier", "neonvm-azure-captest-new"]'), matrix.platform)
if: contains(fromJson('["neonvm-captest-new", "neonvm-captest-new-many-tables", "neonvm-captest-freetier", "neonvm-azure-captest-freetier", "neonvm-azure-captest-new"]'), matrix.platform)
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
@@ -426,7 +430,7 @@ jobs:
neonvm-captest-sharding-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_SHARDING_CONNSTR }}
;;
neonvm-captest-new | neonvm-captest-freetier | neonvm-azure-captest-new | neonvm-azure-captest-freetier)
neonvm-captest-new | neonvm-captest-new-many-tables | neonvm-captest-freetier | neonvm-azure-captest-new | neonvm-azure-captest-freetier)
CONNSTR=${{ steps.create-neon-project.outputs.dsn }}
;;
rds-aurora)
@@ -443,6 +447,26 @@ jobs:
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
# we want to compare Neon project OLTP throughput and latency at scale factor 10 GB
# without (neonvm-captest-new)
# and with (neonvm-captest-new-many-tables) many relations in the database
- name: Create many relations before the run
if: contains(fromJson('["neonvm-captest-new-many-tables"]'), matrix.platform)
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_perf_many_relations
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
TEST_NUM_RELATIONS: 10000
- name: Benchmark init
uses: ./.github/actions/run-python-test-set
with:
@@ -452,7 +476,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_init
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -467,7 +491,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_simple_update
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -482,7 +506,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_select_only
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -500,7 +524,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -541,7 +565,7 @@ jobs:
runs-on: ${{ matrix.RUNNER }}
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -558,12 +582,12 @@ jobs:
arch=$(uname -m | sed 's/x86_64/amd64/g' | sed 's/aarch64/arm64/g')
cd /home/nonroot
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-17/libpq5_17.2-1.pgdg110+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-client-16_16.6-1.pgdg110+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-16_16.6-1.pgdg110+1_${arch}.deb"
dpkg -x libpq5_17.2-1.pgdg110+1_${arch}.deb pg
dpkg -x postgresql-16_16.6-1.pgdg110+1_${arch}.deb pg
dpkg -x postgresql-client-16_16.6-1.pgdg110+1_${arch}.deb pg
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-17/libpq5_17.2-1.pgdg120+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-client-16_16.6-1.pgdg120+1_${arch}.deb"
wget -q "https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-16_16.6-1.pgdg120+1_${arch}.deb"
dpkg -x libpq5_17.2-1.pgdg120+1_${arch}.deb pg
dpkg -x postgresql-16_16.6-1.pgdg120+1_${arch}.deb pg
dpkg -x postgresql-client-16_16.6-1.pgdg120+1_${arch}.deb pg
mkdir -p /tmp/neon/pg_install/v16/bin
ln -s /home/nonroot/pg/usr/lib/postgresql/16/bin/pgbench /tmp/neon/pg_install/v16/bin/pgbench
@@ -611,7 +635,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_pgvector_indexing
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -626,7 +650,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
@@ -637,7 +661,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -708,6 +732,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Set up Connection String
id: set-up-connstr
@@ -739,7 +764,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 43200 -k test_clickbench
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -753,7 +778,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -818,6 +843,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Get Connstring Secret Name
run: |
@@ -856,7 +882,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_tpch
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -868,7 +894,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -926,6 +952,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Set up Connection String
id: set-up-connstr
@@ -957,7 +984,7 @@ jobs:
save_perf_report: ${{ env.SAVE_PERF_REPORT }}
extra_params: -m remote_cluster --timeout 21600 -k test_user_examples
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -968,7 +995,7 @@ jobs:
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}

View File

@@ -2,6 +2,17 @@ name: Build build-tools image
on:
workflow_call:
inputs:
archs:
description: "Json array of architectures to build"
# Default values are set in `check-image` job, `set-variables` step
type: string
required: false
debians:
description: "Json array of Debian versions to build"
# Default values are set in `check-image` job, `set-variables` step
type: string
required: false
outputs:
image-tag:
description: "build-tools tag"
@@ -32,25 +43,37 @@ jobs:
check-image:
runs-on: ubuntu-22.04
outputs:
tag: ${{ steps.get-build-tools-tag.outputs.image-tag }}
found: ${{ steps.check-image.outputs.found }}
archs: ${{ steps.set-variables.outputs.archs }}
debians: ${{ steps.set-variables.outputs.debians }}
tag: ${{ steps.set-variables.outputs.image-tag }}
everything: ${{ steps.set-more-variables.outputs.everything }}
found: ${{ steps.set-more-variables.outputs.found }}
steps:
- uses: actions/checkout@v4
- name: Get build-tools image tag for the current commit
id: get-build-tools-tag
- name: Set variables
id: set-variables
env:
ARCHS: ${{ inputs.archs || '["x64","arm64"]' }}
DEBIANS: ${{ inputs.debians || '["bullseye","bookworm"]' }}
IMAGE_TAG: |
${{ hashFiles('build-tools.Dockerfile',
'.github/workflows/build-build-tools-image.yml') }}
run: |
echo "image-tag=${IMAGE_TAG}" | tee -a $GITHUB_OUTPUT
echo "archs=${ARCHS}" | tee -a ${GITHUB_OUTPUT}
echo "debians=${DEBIANS}" | tee -a ${GITHUB_OUTPUT}
echo "image-tag=${IMAGE_TAG}" | tee -a ${GITHUB_OUTPUT}
- name: Check if such tag found in the registry
id: check-image
- name: Set more variables
id: set-more-variables
env:
IMAGE_TAG: ${{ steps.get-build-tools-tag.outputs.image-tag }}
IMAGE_TAG: ${{ steps.set-variables.outputs.image-tag }}
EVERYTHING: |
${{ contains(fromJson(steps.set-variables.outputs.archs), 'x64') &&
contains(fromJson(steps.set-variables.outputs.archs), 'arm64') &&
contains(fromJson(steps.set-variables.outputs.debians), 'bullseye') &&
contains(fromJson(steps.set-variables.outputs.debians), 'bookworm') }}
run: |
if docker manifest inspect neondatabase/build-tools:${IMAGE_TAG}; then
found=true
@@ -58,8 +81,8 @@ jobs:
found=false
fi
echo "found=${found}" | tee -a $GITHUB_OUTPUT
echo "everything=${EVERYTHING}" | tee -a ${GITHUB_OUTPUT}
echo "found=${found}" | tee -a ${GITHUB_OUTPUT}
build-image:
needs: [ check-image ]
@@ -67,8 +90,8 @@ jobs:
strategy:
matrix:
debian-version: [ bullseye, bookworm ]
arch: [ x64, arm64 ]
arch: ${{ fromJson(needs.check-image.outputs.archs) }}
debian: ${{ fromJson(needs.check-image.outputs.debians) }}
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}
@@ -99,11 +122,11 @@ jobs:
push: true
pull: true
build-args: |
DEBIAN_VERSION=${{ matrix.debian-version }}
cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.debian-version }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0}-{1},mode=max', matrix.debian-version, matrix.arch) || '' }}
DEBIAN_VERSION=${{ matrix.debian }}
cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0}-{1},mode=max', matrix.debian, matrix.arch) || '' }}
tags: |
neondatabase/build-tools:${{ needs.check-image.outputs.tag }}-${{ matrix.debian-version }}-${{ matrix.arch }}
neondatabase/build-tools:${{ needs.check-image.outputs.tag }}-${{ matrix.debian }}-${{ matrix.arch }}
merge-images:
needs: [ check-image, build-image ]
@@ -117,16 +140,22 @@ jobs:
- name: Create multi-arch image
env:
DEFAULT_DEBIAN_VERSION: bullseye
DEFAULT_DEBIAN_VERSION: bookworm
ARCHS: ${{ join(fromJson(needs.check-image.outputs.archs), ' ') }}
DEBIANS: ${{ join(fromJson(needs.check-image.outputs.debians), ' ') }}
EVERYTHING: ${{ needs.check-image.outputs.everything }}
IMAGE_TAG: ${{ needs.check-image.outputs.tag }}
run: |
for debian_version in bullseye bookworm; do
tags=("-t" "neondatabase/build-tools:${IMAGE_TAG}-${debian_version}")
if [ "${debian_version}" == "${DEFAULT_DEBIAN_VERSION}" ]; then
for debian in ${DEBIANS}; do
tags=("-t" "neondatabase/build-tools:${IMAGE_TAG}-${debian}")
if [ "${EVERYTHING}" == "true" ] && [ "${debian}" == "${DEFAULT_DEBIAN_VERSION}" ]; then
tags+=("-t" "neondatabase/build-tools:${IMAGE_TAG}")
fi
docker buildx imagetools create "${tags[@]}" \
neondatabase/build-tools:${IMAGE_TAG}-${debian_version}-x64 \
neondatabase/build-tools:${IMAGE_TAG}-${debian_version}-arm64
for arch in ${ARCHS}; do
tags+=("neondatabase/build-tools:${IMAGE_TAG}-${debian}-${arch}")
done
docker buildx imagetools create "${tags[@]}"
done

241
.github/workflows/build-macos.yml vendored Normal file
View File

@@ -0,0 +1,241 @@
name: Check neon with MacOS builds
on:
workflow_call:
inputs:
pg_versions:
description: "Array of the pg versions to build for, for example: ['v14', 'v17']"
type: string
default: '[]'
required: false
rebuild_rust_code:
description: "Rebuild Rust code"
type: boolean
default: false
required: false
rebuild_everything:
description: "If true, rebuild for all versions"
type: boolean
default: false
required: false
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
# TODO: move `check-*` and `files-changed` jobs to the "Caller" Workflow
# We should care about that as Github has limitations:
# - You can connect up to four levels of workflows
# - You can call a maximum of 20 unique reusable workflows from a single workflow file.
# https://docs.github.com/en/actions/sharing-automations/reusing-workflows#limitations
jobs:
build-pgxn:
if: |
(inputs.pg_versions != '[]' || inputs.rebuild_everything) && (
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
)
timeout-minutes: 30
runs-on: macos-15
strategy:
matrix:
postgres-version: ${{ inputs.rebuild_everything && fromJson('["v14", "v15", "v16", "v17"]') || fromJSON(inputs.pg_versions) }}
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
steps:
- name: Checkout main repo
uses: actions/checkout@v4
- name: Set pg ${{ matrix.postgres-version }} for caching
id: pg_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-${{ matrix.postgres-version }}) | tee -a "${GITHUB_OUTPUT}"
- name: Cache postgres ${{ matrix.postgres-version }} build
id: cache_pg
uses: actions/cache@v4
with:
path: pg_install/${{ matrix.postgres-version }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ matrix.postgres-version }}-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Checkout submodule vendor/postgres-${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
git submodule init vendor/postgres-${{ matrix.postgres-version }}
git submodule update --depth 1 --recursive
- name: Install build dependencies
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
brew install flex bison openssl protobuf icu4c
- name: Set extra env for macOS
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Build Postgres ${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
make postgres-${{ matrix.postgres-version }} -j$(sysctl -n hw.ncpu)
- name: Build Neon Pg Ext ${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
make "neon-pg-ext-${{ matrix.postgres-version }}" -j$(sysctl -n hw.ncpu)
- name: Get postgres headers ${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
make postgres-headers-${{ matrix.postgres-version }} -j$(sysctl -n hw.ncpu)
build-walproposer-lib:
if: |
(inputs.pg_versions != '[]' || inputs.rebuild_everything) && (
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
)
timeout-minutes: 30
runs-on: macos-15
needs: [build-pgxn]
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
steps:
- name: Checkout main repo
uses: actions/checkout@v4
- name: Set pg v17 for caching
id: pg_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"
- name: Cache postgres v17 build
id: cache_pg
uses: actions/cache@v4
with:
path: pg_install/v17
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v17-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache walproposer-lib
id: cache_walproposer_lib
uses: actions/cache@v4
with:
path: pg_install/build/walproposer-lib
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-walproposer_lib-v17-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Checkout submodule vendor/postgres-v17
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run: |
git submodule init vendor/postgres-v17
git submodule update --depth 1 --recursive
- name: Install build dependencies
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run: |
brew install flex bison openssl protobuf icu4c
- name: Set extra env for macOS
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Build walproposer-lib (only for v17)
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run:
make walproposer-lib -j$(sysctl -n hw.ncpu)
cargo-build:
if: |
(inputs.pg_versions != '[]' || inputs.rebuild_rust_code || inputs.rebuild_everything) && (
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
)
timeout-minutes: 30
runs-on: macos-15
needs: [build-pgxn, build-walproposer-lib]
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
steps:
- name: Checkout main repo
uses: actions/checkout@v4
with:
submodules: true
- name: Set pg v14 for caching
id: pg_rev_v14
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) | tee -a "${GITHUB_OUTPUT}"
- name: Set pg v15 for caching
id: pg_rev_v15
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) | tee -a "${GITHUB_OUTPUT}"
- name: Set pg v16 for caching
id: pg_rev_v16
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) | tee -a "${GITHUB_OUTPUT}"
- name: Set pg v17 for caching
id: pg_rev_v17
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"
- name: Cache postgres v14 build
id: cache_pg
uses: actions/cache@v4
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v14-${{ steps.pg_rev_v14.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v15 build
id: cache_pg_v15
uses: actions/cache@v4
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v15-${{ steps.pg_rev_v15.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v16 build
id: cache_pg_v16
uses: actions/cache@v4
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v16-${{ steps.pg_rev_v16.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v17 build
id: cache_pg_v17
uses: actions/cache@v4
with:
path: pg_install/v17
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v17-${{ steps.pg_rev_v17.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache cargo deps (only for v17)
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
- name: Cache walproposer-lib
id: cache_walproposer_lib
uses: actions/cache@v4
with:
path: pg_install/build/walproposer-lib
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-walproposer_lib-v17-${{ steps.pg_rev_v17.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Install build dependencies
run: |
brew install flex bison openssl protobuf icu4c
- name: Set extra env for macOS
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Run cargo build (only for v17)
run: PQ_LIB_DIR=$(pwd)/pg_install/v17/lib cargo build --all --release -j$(sysctl -n hw.ncpu)
- name: Check that no warnings are produced (only for v17)
run: ./run_clippy.sh

View File

@@ -6,6 +6,7 @@ on:
- main
- release
- release-proxy
- release-compute
pull_request:
defaults:
@@ -20,8 +21,6 @@ concurrency:
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
# A concurrency group that we use for e2e-tests runs, matches `concurrency.group` above with `github.repository` as a prefix
E2E_CONCURRENCY_GROUP: ${{ github.repository }}-e2e-tests-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
@@ -70,8 +69,10 @@ jobs:
echo "tag=release-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then
echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-compute" ]]; then
echo "tag=release-compute-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
else
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release', 'release-proxy', 'release-compute'"
echo "tag=$GITHUB_RUN_ID" >> $GITHUB_OUTPUT
fi
shell: bash
@@ -163,77 +164,11 @@ jobs:
check-codestyle-rust:
needs: [ check-permissions, build-build-tools-image ]
strategy:
matrix:
arch: [ x64, arm64 ]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'small-arm64' || 'small')) }}
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
options: --init
steps:
- name: Checkout
uses: actions/checkout@v4
with:
submodules: true
- name: Cache cargo deps
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
# cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.
# This will catch compiler & clippy warnings in all feature combinations.
# TODO: use cargo hack for build and test as well, but, that's quite expensive.
# NB: keep clippy args in sync with ./run_clippy.sh
#
# The only difference between "clippy --debug" and "clippy --release" is that in --release mode,
# #[cfg(debug_assertions)] blocks are not built. It's not worth building everything for second
# time just for that, so skip "clippy --release".
- run: |
CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"
if [ "$CLIPPY_COMMON_ARGS" = "" ]; then
echo "No clippy args found in .neon_clippy_args"
exit 1
fi
echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV
- name: Run cargo clippy (debug)
run: cargo hack --feature-powerset clippy $CLIPPY_COMMON_ARGS
- name: Check documentation generation
run: cargo doc --workspace --no-deps --document-private-items
env:
RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"
# Use `${{ !cancelled() }}` to run quck tests after the longer clippy run
- name: Check formatting
if: ${{ !cancelled() }}
run: cargo fmt --all -- --check
# https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci
- name: Check rust dependencies
if: ${{ !cancelled() }}
run: |
cargo hakari generate --diff # workspace-hack Cargo.toml is up-to-date
cargo hakari manage-deps --dry-run # all workspace crates depend on workspace-hack
# https://github.com/EmbarkStudios/cargo-deny
- name: Check rust licenses/bans/advisories/sources
if: ${{ !cancelled() }}
run: cargo deny check --hide-inclusion-graph
uses: ./.github/workflows/_check-codestyle-rust.yml
with:
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
archs: '["x64", "arm64"]'
secrets: inherit
build-and-test-locally:
needs: [ tag, build-build-tools-image ]
@@ -252,8 +187,15 @@ jobs:
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
build-tag: ${{ needs.tag.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
# Run tests on all Postgres versions in release builds and only on the latest version in debug builds
pg-versions: ${{ matrix.build-type == 'release' && '["v14", "v15", "v16", "v17"]' || '["v17"]' }}
# Run tests on all Postgres versions in release builds and only on the latest version in debug builds.
# Run without LFC on v17 release and debug builds only. For all the other cases LFC is enabled.
test-cfg: |
${{ matrix.build-type == 'release' && '[{"pg_version":"v14", "lfc_state": "with-lfc"},
{"pg_version":"v15", "lfc_state": "with-lfc"},
{"pg_version":"v16", "lfc_state": "with-lfc"},
{"pg_version":"v17", "lfc_state": "with-lfc"},
{"pg_version":"v17", "lfc_state": "without-lfc"}]'
|| '[{"pg_version":"v17", "lfc_state": "without-lfc" }]' }}
secrets: inherit
# Keep `benchmarks` job outside of `build-and-test-locally` workflow to make job failures non-blocking
@@ -295,6 +237,11 @@ jobs:
benchmarks:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-benchmarks')
needs: [ check-permissions, build-and-test-locally, build-build-tools-image, get-benchmarks-durations ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
@@ -323,6 +270,7 @@ jobs:
extra_params: --splits 5 --group ${{ matrix.pytest_split_group }}
benchmark_durations: ${{ needs.get-benchmarks-durations.outputs.json }}
pg_version: v16
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
@@ -332,24 +280,31 @@ jobs:
# XXX: no coverage data handling here, since benchmarks are run on release builds,
# while coverage is currently collected for the debug ones
report-benchmarks-failures:
report-benchmarks-results-to-slack:
needs: [ benchmarks, create-test-report ]
if: github.ref_name == 'main' && failure() && needs.benchmarks.result == 'failure'
if: github.ref_name == 'main' && !cancelled() && contains(fromJSON('["success", "failure"]'), needs.benchmarks.result)
runs-on: ubuntu-22.04
steps:
- uses: slackapi/slack-github-action@v1
- uses: slackapi/slack-github-action@v2
with:
channel-id: C060CNA47S9 # on-call-staging-storage-stream
slack-message: |
Benchmarks failed on main <${{ github.event.head_commit.url }}|${{ github.sha }}>
<${{ needs.create-test-report.outputs.report-url }}|Allure report>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
method: chat.postMessage
token: ${{ secrets.SLACK_BOT_TOKEN }}
payload: |
channel: "${{ vars.SLACK_ON_CALL_STORAGE_STAGING_STREAM }}"
text: |
Benchmarks on main: *${{ needs.benchmarks.result }}*
- <${{ needs.create-test-report.outputs.report-url }}|Allure report>
- <${{ github.event.head_commit.url }}|${{ github.sha }}>
create-test-report:
needs: [ check-permissions, build-and-test-locally, coverage-report, build-build-tools-image, benchmarks ]
if: ${{ !cancelled() && contains(fromJSON('["skipped", "success"]'), needs.check-permissions.result) }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
outputs:
report-url: ${{ steps.create-allure-report.outputs.report-url }}
@@ -370,6 +325,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
@@ -401,6 +357,10 @@ jobs:
coverage-report:
if: ${{ !startsWith(github.ref_name, 'release') }}
needs: [ check-permissions, build-build-tools-image, build-and-test-locally ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
@@ -427,12 +387,14 @@ jobs:
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ matrix.build_type }}-artifact
path: /tmp/neon
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Get coverage artifact
uses: ./.github/actions/download
with:
name: coverage-data-artifact
path: /tmp/coverage
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Merge coverage data
run: scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage merge
@@ -506,8 +468,8 @@ jobs:
})
trigger-e2e-tests:
if: ${{ !github.event.pull_request.draft || contains( github.event.pull_request.labels.*.name, 'run-e2e-tests-in-draft') || github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy' }}
needs: [ check-permissions, promote-images, tag ]
if: ${{ !github.event.pull_request.draft || contains( github.event.pull_request.labels.*.name, 'run-e2e-tests-in-draft') || github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute' }}
needs: [ check-permissions, promote-images-dev, tag ]
uses: ./.github/workflows/trigger-e2e-tests.yml
secrets: inherit
@@ -563,6 +525,10 @@ jobs:
neon-image:
needs: [ neon-image-arch, tag ]
runs-on: ubuntu-22.04
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
steps:
- uses: docker/login-action@v3
@@ -577,11 +543,15 @@ jobs:
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-x64 \
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-arm64
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Push multi-arch image to ECR
run: |
@@ -590,6 +560,10 @@ jobs:
compute-node-image-arch:
needs: [ check-permissions, build-build-tools-image, tag ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
strategy:
fail-fast: false
matrix:
@@ -630,11 +604,15 @@ jobs:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- uses: docker/login-action@v3
with:
@@ -662,7 +640,7 @@ jobs:
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-${{ matrix.arch }}
- name: Build neon extensions test image
if: matrix.version.pg == 'v16'
if: matrix.version.pg >= 'v16'
uses: docker/build-push-action@v6
with:
context: .
@@ -677,37 +655,16 @@ jobs:
pull: true
file: compute/compute-node.Dockerfile
target: neon-pg-ext-test
cache-from: type=registry,ref=cache.neon.build/neon-test-extensions-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/neon-test-extensions-{0}:cache-{1}-{2},mode=max', matrix.version.pg, matrix.version.debian, matrix.arch) || '' }}
cache-from: type=registry,ref=cache.neon.build/compute-node-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
tags: |
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{needs.tag.outputs.build-tag}}-${{ matrix.version.debian }}-${{ matrix.arch }}
- name: Build compute-tools image
# compute-tools are Postgres independent, so build it only once
# We pick 16, because that builds on debian 11 with older glibc (and is
# thus compatible with newer glibc), rather than 17 on Debian 12, as
# that isn't guaranteed to be compatible with Debian 11
if: matrix.version.pg == 'v16'
uses: docker/build-push-action@v6
with:
target: compute-tools-image
context: .
build-args: |
GIT_VERSION=${{ github.event.pull_request.head.sha || github.sha }}
BUILD_TAG=${{ needs.tag.outputs.build-tag }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}-${{ matrix.version.debian }}
DEBIAN_VERSION=${{ matrix.version.debian }}
provenance: false
push: true
pull: true
file: compute/compute-node.Dockerfile
cache-from: type=registry,ref=cache.neon.build/neon-test-extensions-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/compute-tools-{0}:cache-{1}-{2},mode=max', matrix.version.pg, matrix.version.debian, matrix.arch) || '' }}
tags: |
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-${{ matrix.arch }}
compute-node-image:
needs: [ compute-node-image-arch, tag ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
runs-on: ubuntu-22.04
strategy:
@@ -737,38 +694,28 @@ jobs:
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- name: Create multi-arch neon-test-extensions image
if: matrix.version.pg == 'v16'
if: matrix.version.pg >= 'v16'
run: |
docker buildx imagetools create -t neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }} \
-t neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }} \
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-x64 \
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- name: Create multi-arch compute-tools image
if: matrix.version.pg == 'v16'
run: |
docker buildx imagetools create -t neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }} \
-t neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }} \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-x64 \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Push multi-arch compute-node-${{ matrix.version.pg }} image to ECR
run: |
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}
- name: Push multi-arch compute-tools image to ECR
if: matrix.version.pg == 'v16'
run: |
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-tools:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}
vm-compute-node-image:
needs: [ check-permissions, tag, compute-node-image ]
runs-on: [ self-hosted, large ]
@@ -786,7 +733,7 @@ jobs:
- pg: v17
debian: bookworm
env:
VM_BUILDER_VERSION: v0.35.0
VM_BUILDER_VERSION: v0.37.1
steps:
- uses: actions/checkout@v4
@@ -826,6 +773,7 @@ jobs:
fail-fast: false
matrix:
arch: [ x64, arm64 ]
pg_version: [v16, v17]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'small-arm64' || 'small')) }}
@@ -864,7 +812,10 @@ jobs:
- name: Verify docker-compose example and test extensions
timeout-minutes: 20
run: env TAG=${{needs.tag.outputs.build-tag}} ./docker-compose/docker_compose_test.sh
env:
TAG: ${{needs.tag.outputs.build-tag}}
TEST_VERSION_ONLY: ${{ matrix.pg_version }}
run: ./docker-compose/docker_compose_test.sh
- name: Print logs and clean up
if: always()
@@ -872,12 +823,14 @@ jobs:
docker compose -f ./docker-compose/docker-compose.yml logs || 0
docker compose -f ./docker-compose/docker-compose.yml down
promote-images:
needs: [ check-permissions, tag, test-images, vm-compute-node-image ]
promote-images-dev:
needs: [ check-permissions, tag, vm-compute-node-image, neon-image ]
runs-on: ubuntu-22.04
permissions:
id-token: write # for `aws-actions/configure-aws-credentials`
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
env:
VERSIONS: v14 v15 v16 v17
@@ -888,12 +841,15 @@ jobs:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- name: Login to dev ECR
uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Copy vm-compute-node images to ECR
run: |
@@ -902,6 +858,35 @@ jobs:
neondatabase/vm-compute-node-${version}:${{ needs.tag.outputs.build-tag }}
done
promote-images-prod:
needs: [ check-permissions, tag, test-images, vm-compute-node-image ]
runs-on: ubuntu-22.04
if: github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute'
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
env:
VERSIONS: v14 v15 v16 v17
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- uses: docker/login-action@v3
with:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- name: Add latest tag to images
if: github.ref_name == 'main'
run: |
@@ -909,9 +894,6 @@ jobs:
docker buildx imagetools create -t $repo/neon:latest \
$repo/neon:${{ needs.tag.outputs.build-tag }}
docker buildx imagetools create -t $repo/compute-tools:latest \
$repo/compute-tools:${{ needs.tag.outputs.build-tag }}
for version in ${VERSIONS}; do
docker buildx imagetools create -t $repo/compute-node-${version}:latest \
$repo/compute-node-${version}:${{ needs.tag.outputs.build-tag }}
@@ -924,7 +906,7 @@ jobs:
neondatabase/neon-test-extensions-v16:${{ needs.tag.outputs.build-tag }}
- name: Configure AWS-prod credentials
if: github.ref_name == 'release'|| github.ref_name == 'release-proxy'
if: github.ref_name == 'release'|| github.ref_name == 'release-proxy' || github.ref_name == 'release-compute'
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
@@ -933,38 +915,38 @@ jobs:
- name: Login to prod ECR
uses: docker/login-action@v3
if: github.ref_name == 'release'|| github.ref_name == 'release-proxy'
if: github.ref_name == 'release'|| github.ref_name == 'release-proxy' || github.ref_name == 'release-compute'
with:
registry: 093970136003.dkr.ecr.eu-central-1.amazonaws.com
- name: Copy all images to prod ECR
if: github.ref_name == 'release'|| github.ref_name == 'release-proxy'
if: github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute'
run: |
for image in neon compute-tools {vm-,}compute-node-{v14,v15,v16,v17}; do
for image in neon {vm-,}compute-node-{v14,v15,v16,v17}; do
docker buildx imagetools create -t 093970136003.dkr.ecr.eu-central-1.amazonaws.com/${image}:${{ needs.tag.outputs.build-tag }} \
369495373322.dkr.ecr.eu-central-1.amazonaws.com/${image}:${{ needs.tag.outputs.build-tag }}
done
push-to-acr-dev:
if: github.ref_name == 'main'
needs: [ tag, promote-images ]
needs: [ tag, promote-images-dev ]
uses: ./.github/workflows/_push-to-acr.yml
with:
client_id: ${{ vars.AZURE_DEV_CLIENT_ID }}
image_tag: ${{ needs.tag.outputs.build-tag }}
images: neon compute-tools vm-compute-node-v14 vm-compute-node-v15 vm-compute-node-v16 vm-compute-node-v17 compute-node-v14 compute-node-v15 compute-node-v16 compute-node-v17
images: neon vm-compute-node-v14 vm-compute-node-v15 vm-compute-node-v16 vm-compute-node-v17 compute-node-v14 compute-node-v15 compute-node-v16 compute-node-v17
registry_name: ${{ vars.AZURE_DEV_REGISTRY_NAME }}
subscription_id: ${{ vars.AZURE_DEV_SUBSCRIPTION_ID }}
tenant_id: ${{ vars.AZURE_TENANT_ID }}
push-to-acr-prod:
if: github.ref_name == 'release'|| github.ref_name == 'release-proxy'
needs: [ tag, promote-images ]
if: github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute'
needs: [ tag, promote-images-prod ]
uses: ./.github/workflows/_push-to-acr.yml
with:
client_id: ${{ vars.AZURE_PROD_CLIENT_ID }}
image_tag: ${{ needs.tag.outputs.build-tag }}
images: neon compute-tools vm-compute-node-v14 vm-compute-node-v15 vm-compute-node-v16 vm-compute-node-v17 compute-node-v14 compute-node-v15 compute-node-v16 compute-node-v17
images: neon vm-compute-node-v14 vm-compute-node-v15 vm-compute-node-v16 vm-compute-node-v17 compute-node-v14 compute-node-v15 compute-node-v16 compute-node-v17
registry_name: ${{ vars.AZURE_PROD_REGISTRY_NAME }}
subscription_id: ${{ vars.AZURE_PROD_SUBSCRIPTION_ID }}
tenant_id: ${{ vars.AZURE_TENANT_ID }}
@@ -972,6 +954,11 @@ jobs:
trigger-custom-extensions-build-and-wait:
needs: [ check-permissions, tag ]
runs-on: ubuntu-22.04
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
steps:
- name: Set PR's status to pending and request a remote CI test
run: |
@@ -1044,15 +1031,82 @@ jobs:
exit 1
deploy:
needs: [ check-permissions, promote-images, tag, build-and-test-locally, trigger-custom-extensions-build-and-wait, push-to-acr-dev, push-to-acr-prod ]
needs: [ check-permissions, promote-images-prod, tag, build-and-test-locally, trigger-custom-extensions-build-and-wait, push-to-acr-dev, push-to-acr-prod ]
# `!failure() && !cancelled()` is required because the workflow depends on the job that can be skipped: `push-to-acr-dev` and `push-to-acr-prod`
if: (github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy') && !failure() && !cancelled()
if: (github.ref_name == 'main' || github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute') && !failure() && !cancelled()
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
runs-on: [ self-hosted, small ]
container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest
steps:
- uses: actions/checkout@v4
- name: Create git tag and GitHub release
if: github.ref_name == 'release' || github.ref_name == 'release-proxy' || github.ref_name == 'release-compute'
uses: actions/github-script@v7
with:
retries: 5
script: |
const tag = "${{ needs.tag.outputs.build-tag }}";
try {
const existingRef = await github.rest.git.getRef({
owner: context.repo.owner,
repo: context.repo.repo,
ref: `tags/${tag}`,
});
if (existingRef.data.object.sha !== context.sha) {
throw new Error(`Tag ${tag} already exists but points to a different commit (expected: ${context.sha}, actual: ${existingRef.data.object.sha}).`);
}
console.log(`Tag ${tag} already exists and points to ${context.sha} as expected.`);
} catch (error) {
if (error.status !== 404) {
throw error;
}
console.log(`Tag ${tag} does not exist. Creating it...`);
await github.rest.git.createRef({
owner: context.repo.owner,
repo: context.repo.repo,
ref: `refs/tags/${tag}`,
sha: context.sha,
});
console.log(`Tag ${tag} created successfully.`);
}
// TODO: check how GitHub releases looks for proxy/compute releases and enable them if they're ok
if (context.ref !== 'refs/heads/release') {
console.log(`GitHub release skipped for ${context.ref}.`);
return;
}
try {
const existingRelease = await github.rest.repos.getReleaseByTag({
owner: context.repo.owner,
repo: context.repo.repo,
tag: tag,
});
console.log(`Release for tag ${tag} already exists (ID: ${existingRelease.data.id}).`);
} catch (error) {
if (error.status !== 404) {
throw error;
}
console.log(`Release for tag ${tag} does not exist. Creating it...`);
await github.rest.repos.createRelease({
owner: context.repo.owner,
repo: context.repo.repo,
tag_name: tag,
generate_release_notes: true,
});
console.log(`Release for tag ${tag} created successfully.`);
}
- name: Trigger deploy workflow
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
@@ -1095,43 +1149,20 @@ jobs:
-f deployProxyAuthBroker=true \
-f branch=main \
-f dockerTag=${{needs.tag.outputs.build-tag}}
elif [[ "$GITHUB_REF_NAME" == "release-compute" ]]; then
gh workflow --repo neondatabase/infra run deploy-compute-dev.yml --ref main -f dockerTag=${{needs.tag.outputs.build-tag}}
else
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main', 'release', 'release-proxy' or 'release-compute'"
exit 1
fi
- name: Create git tag
if: github.ref_name == 'release' || github.ref_name == 'release-proxy'
uses: actions/github-script@v7
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
await github.rest.git.createRef({
owner: context.repo.owner,
repo: context.repo.repo,
ref: "refs/tags/${{ needs.tag.outputs.build-tag }}",
sha: context.sha,
})
# TODO: check how GitHub releases looks for proxy releases and enable it if it's ok
- name: Create GitHub release
if: github.ref_name == 'release'
uses: actions/github-script@v7
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
await github.rest.repos.createRelease({
owner: context.repo.owner,
repo: context.repo.repo,
tag_name: "${{ needs.tag.outputs.build-tag }}",
generate_release_notes: true,
})
# The job runs on `release` branch and copies compatibility data and Neon artifact from the last *release PR* to the latest directory
promote-compatibility-data:
needs: [ deploy ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
# `!failure() && !cancelled()` is required because the workflow transitively depends on the job that can be skipped: `push-to-acr-dev` and `push-to-acr-prod`
if: github.ref_name == 'release' && !failure() && !cancelled()
@@ -1168,6 +1199,12 @@ jobs:
echo "run-id=${run_id}" | tee -a ${GITHUB_OUTPUT}
echo "commit-sha=${last_commit_sha}" | tee -a ${GITHUB_OUTPUT}
- uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Promote compatibility snapshot and Neon artifact
env:
BUCKET: neon-github-public-dev
@@ -1215,7 +1252,7 @@ jobs:
done
pin-build-tools-image:
needs: [ build-build-tools-image, promote-images, build-and-test-locally ]
needs: [ build-build-tools-image, promote-images-prod, build-and-test-locally ]
if: github.ref_name == 'main'
uses: ./.github/workflows/pin-build-tools-image.yml
with:
@@ -1238,7 +1275,7 @@ jobs:
- build-and-test-locally
- check-codestyle-python
- check-codestyle-rust
- promote-images
- promote-images-dev
- test-images
- trigger-custom-extensions-build-and-wait
runs-on: ubuntu-22.04

View File

@@ -19,15 +19,21 @@ concurrency:
group: ${{ github.workflow }}
cancel-in-progress: true
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
jobs:
regress:
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 16
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
strategy:
fail-fast: false
matrix:
pg-version: [16, 17]
runs-on: us-east-2
container:
@@ -40,9 +46,11 @@ jobs:
submodules: true
- name: Patch the test
env:
PG_VERSION: ${{matrix.pg-version}}
run: |
cd "vendor/postgres-v${DEFAULT_PG_VERSION}"
patch -p1 < "../../compute/patches/cloud_regress_pg${DEFAULT_PG_VERSION}.patch"
cd "vendor/postgres-v${PG_VERSION}"
patch -p1 < "../../compute/patches/cloud_regress_pg${PG_VERSION}.patch"
- name: Generate a random password
id: pwgen
@@ -55,8 +63,9 @@ jobs:
- name: Change tests according to the generated password
env:
DBPASS: ${{ steps.pwgen.outputs.DBPASS }}
PG_VERSION: ${{matrix.pg-version}}
run: |
cd vendor/postgres-v"${DEFAULT_PG_VERSION}"/src/test/regress
cd vendor/postgres-v"${PG_VERSION}"/src/test/regress
for fname in sql/*.sql expected/*.out; do
sed -i.bak s/NEON_PASSWORD_PLACEHOLDER/"'${DBPASS}'"/ "${fname}"
done
@@ -72,27 +81,46 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create a new branch
id: create-branch
uses: ./.github/actions/neon-branch-create
with:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
project_id: ${{ vars[format('PGREGRESS_PG{0}_PROJECT_ID', matrix.pg-version)] }}
- name: Run the regression tests
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: cloud_regress
pg_version: ${{ env.DEFAULT_PG_VERSION }}
pg_version: ${{matrix.pg-version}}
extra_params: -m remote_cluster
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ secrets.PG_REGRESS_CONNSTR }}
BENCHMARK_CONNSTR: ${{steps.create-branch.outputs.dsn}}
- name: Delete branch
if: always()
uses: ./.github/actions/neon-branch-delete
with:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
project_id: ${{ vars[format('PGREGRESS_PG{0}_PROJECT_ID', matrix.pg-version)] }}
branch_id: ${{steps.create-branch.outputs.branch_id}}
- name: Create Allure report
id: create-allure-report
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
with:
channel-id: "C033QLM5P7D" # on-call-staging-stream
channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}
slack-message: |
Periodic pg_regress on staging: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

View File

@@ -13,7 +13,7 @@ on:
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 9 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
@@ -26,8 +26,9 @@ concurrency:
jobs:
ingest:
strategy:
fail-fast: false # allow other variants to continue even if one fails
matrix:
target_project: [new_empty_project, large_existing_project]
target_project: [new_empty_project, large_existing_project]
permissions:
contents: write
statuses: write
@@ -55,7 +56,7 @@ jobs:
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
- name: Download Neon artifact
uses: ./.github/actions/download
@@ -63,6 +64,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
if: ${{ matrix.target_project == 'new_empty_project' }}
@@ -93,7 +95,7 @@ jobs:
project_id: ${{ vars.BENCHMARK_INGEST_TARGET_PROJECTID }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Initialize Neon project
- name: Initialize Neon project
if: ${{ matrix.target_project == 'large_existing_project' }}
env:
BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-branch-ingest-target.outputs.dsn }}
@@ -121,7 +123,7 @@ jobs:
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "CREATE EXTENSION IF NOT EXISTS neon; CREATE EXTENSION IF NOT EXISTS neon_utils;"
echo "BENCHMARK_INGEST_TARGET_CONNSTR=${BENCHMARK_INGEST_TARGET_CONNSTR}" >> $GITHUB_ENV
- name: Invoke pgcopydb
- name: Invoke pgcopydb
uses: ./.github/actions/run-python-test-set
with:
build_type: remote
@@ -130,7 +132,7 @@ jobs:
extra_params: -s -m remote_cluster --timeout 86400 -k test_ingest_performance_using_pgcopydb
pg_version: v16
save_perf_report: true
aws_oicd_role_arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_INGEST_SOURCE_CONNSTR: ${{ secrets.BENCHMARK_INGEST_SOURCE_CONNSTR }}
TARGET_PROJECT_TYPE: ${{ matrix.target_project }}
@@ -142,7 +144,7 @@ jobs:
run: |
export LD_LIBRARY_PATH=${PG_16_LIB_PATH}
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "\dt+"
- name: Delete Neon Project
if: ${{ always() && matrix.target_project == 'new_empty_project' }}
uses: ./.github/actions/neon-project-delete

View File

@@ -31,19 +31,15 @@ jobs:
uses: ./.github/workflows/build-build-tools-image.yml
secrets: inherit
check-macos-build:
needs: [ check-permissions ]
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 90
runs-on: macos-15
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
files-changed:
name: Detect what files changed
runs-on: ubuntu-22.04
timeout-minutes: 3
outputs:
v17: ${{ steps.files_changed.outputs.v17 }}
postgres_changes: ${{ steps.postgres_changes.outputs.changes }}
rebuild_rust_code: ${{ steps.files_changed.outputs.rust_code }}
rebuild_everything: ${{ steps.files_changed.outputs.rebuild_neon_extra || steps.files_changed.outputs.rebuild_macos }}
steps:
- name: Checkout
@@ -51,102 +47,45 @@ jobs:
with:
submodules: true
- name: Install macOS postgres dependencies
run: brew install flex bison openssl protobuf icu4c
- name: Set pg 14 revision for caching
id: pg_v14_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) >> $GITHUB_OUTPUT
- name: Set pg 15 revision for caching
id: pg_v15_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT
- name: Set pg 16 revision for caching
id: pg_v16_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT
- name: Set pg 17 revision for caching
id: pg_v17_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) >> $GITHUB_OUTPUT
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v4
- name: Check for Postgres changes
uses: dorny/paths-filter@1441771bbfdd59dcd748680ee64ebd8faab1a242 #v3
id: files_changed
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
token: ${{ github.token }}
filters: .github/file-filters.yaml
base: ${{ github.event_name != 'pull_request' && (github.event.merge_group.base_ref || github.ref_name) || '' }}
ref: ${{ github.event_name != 'pull_request' && (github.event.merge_group.head_ref || github.ref) || '' }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v4
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v4
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v17 build
id: cache_pg_17
uses: actions/cache@v4
with:
path: pg_install/v17
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v17_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Set extra env for macOS
- name: Filter out only v-string for build matrix
id: postgres_changes
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
v_strings_only_as_json_array=$(echo ${{ steps.files_changed.outputs.chnages }} | jq '.[]|select(test("v\\d+"))' | jq --slurp -c)
echo "changes=${v_strings_only_as_json_array}" | tee -a "${GITHUB_OUTPUT}"
- name: Cache cargo deps
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: make postgres-v14 -j$(sysctl -n hw.ncpu)
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: make postgres-v15 -j$(sysctl -n hw.ncpu)
- name: Build postgres v16
if: steps.cache_pg_16.outputs.cache-hit != 'true'
run: make postgres-v16 -j$(sysctl -n hw.ncpu)
- name: Build postgres v17
if: steps.cache_pg_17.outputs.cache-hit != 'true'
run: make postgres-v17 -j$(sysctl -n hw.ncpu)
- name: Build neon extensions
run: make neon-pg-ext -j$(sysctl -n hw.ncpu)
- name: Build walproposer-lib
run: make walproposer-lib -j$(sysctl -n hw.ncpu)
- name: Run cargo build
run: PQ_LIB_DIR=$(pwd)/pg_install/v16/lib cargo build --all --release
- name: Check that no warnings are produced
run: ./run_clippy.sh
gather-rust-build-stats:
needs: [ check-permissions, build-build-tools-image ]
check-macos-build:
needs: [ check-permissions, files-changed ]
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
uses: ./.github/workflows/build-macos.yml
with:
pg_versions: ${{ needs.files-changed.outputs.postgres_changes }}
rebuild_rust_code: ${{ needs.files-changed.outputs.rebuild_rust_code }}
rebuild_everything: ${{ fromJson(needs.files-changed.outputs.rebuild_everything) }}
gather-rust-build-stats:
needs: [ check-permissions, build-build-tools-image, files-changed ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
if: |
(needs.files-changed.outputs.v17 == 'true' || needs.files-changed.outputs.rebuild_everything == 'true') && (
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
)
runs-on: [ self-hosted, large ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
@@ -177,13 +116,18 @@ jobs:
- name: Produce the build stats
run: PQ_LIB_DIR=$(pwd)/pg_install/v17/lib cargo build --all --release --timings -j$(nproc)
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Upload the build stats
id: upload-stats
env:
BUCKET: neon-github-public-dev
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
run: |
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/build-stats/${SHA}/${GITHUB_RUN_ID}/cargo-timing.html
aws s3 cp --only-show-errors ./target/cargo-timings/cargo-timing.html "s3://${BUCKET}/build-stats/${SHA}/${GITHUB_RUN_ID}/"

View File

@@ -27,9 +27,14 @@ concurrency:
jobs:
trigger_bench_on_ec2_machine_in_eu_central_1:
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: [ self-hosted, small ]
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -38,8 +43,6 @@ jobs:
env:
API_KEY: ${{ secrets.PERIODIC_PAGEBENCH_EC2_RUNNER_API_KEY }}
RUN_ID: ${{ github.run_id }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY : ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_SECRET }}
AWS_DEFAULT_REGION : "eu-central-1"
AWS_INSTANCE_ID : "i-02a59a3bf86bc7e74"
steps:
@@ -50,6 +53,13 @@ jobs:
- name: Show my own (github runner) external IP address - usefull for IP allowlisting
run: curl https://ifconfig.me
- name: Assume AWS OIDC role that allows to manage (start/stop/describe... EC machine)
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN }}
role-duration-seconds: 3600
- name: Start EC2 instance and wait for the instance to boot up
run: |
aws ec2 start-instances --instance-ids $AWS_INSTANCE_ID
@@ -124,11 +134,10 @@ jobs:
cat "test_log_${GITHUB_RUN_ID}"
- name: Create Allure report
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -148,6 +157,14 @@ jobs:
-H "Authorization: Bearer $API_KEY" \
-d ''
- name: Assume AWS OIDC role that allows to manage (start/stop/describe... EC machine)
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN }}
role-duration-seconds: 3600
- name: Stop EC2 instance and wait for the instance to be stopped
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
run: |

View File

@@ -25,11 +25,13 @@ defaults:
run:
shell: bash -euxo pipefail {0}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write # require for posting a status update
env:
DEFAULT_PG_VERSION: 16
PLATFORM: neon-captest-new
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
AWS_DEFAULT_REGION: eu-central-1
jobs:
@@ -94,6 +96,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -110,6 +113,7 @@ jobs:
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
@@ -126,6 +130,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
@@ -159,6 +164,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -175,6 +181,7 @@ jobs:
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
@@ -191,6 +198,7 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

View File

@@ -67,7 +67,7 @@ jobs:
runs-on: ubuntu-22.04
permissions:
id-token: write # for `azure/login`
id-token: write # for `azure/login` and aws auth
steps:
- uses: docker/login-action@v3
@@ -75,11 +75,15 @@ jobs:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@v3
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Login to Amazon Dev ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Azure login
uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a # @v2.1.1
@@ -94,7 +98,7 @@ jobs:
- name: Tag build-tools with `${{ env.TO_TAG }}` in Docker Hub, ECR, and ACR
env:
DEFAULT_DEBIAN_VERSION: bullseye
DEFAULT_DEBIAN_VERSION: bookworm
run: |
for debian_version in bullseye bookworm; do
tags=()

View File

@@ -1,6 +1,12 @@
name: Pre-merge checks
on:
pull_request:
paths:
- .github/workflows/_check-codestyle-python.yml
- .github/workflows/_check-codestyle-rust.yml
- .github/workflows/build-build-tools-image.yml
- .github/workflows/pre-merge-checks.yml
merge_group:
branches:
- main
@@ -17,27 +23,49 @@ jobs:
runs-on: ubuntu-22.04
outputs:
python-changed: ${{ steps.python-src.outputs.any_changed }}
rust-changed: ${{ steps.rust-src.outputs.any_changed }}
steps:
- uses: actions/checkout@v4
- uses: tj-actions/changed-files@4edd678ac3f81e2dc578756871e4d00c19191daf # v45.0.4
id: python-src
with:
files: |
.github/workflows/_check-codestyle-python.yml
.github/workflows/build-build-tools-image.yml
.github/workflows/pre-merge-checks.yml
**/**.py
poetry.lock
pyproject.toml
- uses: tj-actions/changed-files@4edd678ac3f81e2dc578756871e4d00c19191daf # v45.0.4
id: rust-src
with:
files: |
.github/workflows/_check-codestyle-rust.yml
.github/workflows/build-build-tools-image.yml
.github/workflows/pre-merge-checks.yml
**/**.rs
**/Cargo.toml
Cargo.toml
Cargo.lock
- name: PRINT ALL CHANGED FILES FOR DEBUG PURPOSES
env:
PYTHON_CHANGED_FILES: ${{ steps.python-src.outputs.all_changed_files }}
RUST_CHANGED_FILES: ${{ steps.rust-src.outputs.all_changed_files }}
run: |
echo "${PYTHON_CHANGED_FILES}"
echo "${RUST_CHANGED_FILES}"
build-build-tools-image:
if: needs.get-changed-files.outputs.python-changed == 'true'
needs: [ get-changed-files ]
uses: ./.github/workflows/build-build-tools-image.yml
with:
# Build only one combination to save time
archs: '["x64"]'
debians: '["bookworm"]'
secrets: inherit
check-codestyle-python:
@@ -45,7 +73,18 @@ jobs:
needs: [ get-changed-files, build-build-tools-image ]
uses: ./.github/workflows/_check-codestyle-python.yml
with:
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
# `-bookworm-x64` suffix should match the combination in `build-build-tools-image`
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm-x64
secrets: inherit
check-codestyle-rust:
if: needs.get-changed-files.outputs.rust-changed == 'true'
needs: [ get-changed-files, build-build-tools-image ]
uses: ./.github/workflows/_check-codestyle-rust.yml
with:
# `-bookworm-x64` suffix should match the combination in `build-build-tools-image`
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm-x64
archs: '["x64"]'
secrets: inherit
# To get items from the merge queue merged into main we need to satisfy "Status checks that are required".
@@ -56,9 +95,11 @@ jobs:
if: always()
permissions:
statuses: write # for `github.repos.createCommitStatus(...)`
contents: write
needs:
- get-changed-files
- check-codestyle-python
- check-codestyle-rust
runs-on: ubuntu-22.04
steps:
- name: Create fake `neon-cloud-e2e` check

View File

@@ -3,7 +3,7 @@ name: Create Release Branch
on:
schedule:
# It should be kept in sync with if-condition in jobs
- cron: '0 6 * * MON' # Storage release
- cron: '0 6 * * FRI' # Storage release
- cron: '0 6 * * THU' # Proxy release
workflow_dispatch:
inputs:
@@ -15,6 +15,10 @@ on:
type: boolean
description: 'Create Proxy release PR'
required: false
create-compute-release-branch:
type: boolean
description: 'Create Compute release PR'
required: false
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
@@ -25,20 +29,20 @@ defaults:
jobs:
create-storage-release-branch:
if: ${{ github.event.schedule == '0 6 * * MON' || format('{0}', inputs.create-storage-release-branch) == 'true' }}
if: ${{ github.event.schedule == '0 6 * * FRI' || inputs.create-storage-release-branch }}
permissions:
contents: write
uses: ./.github/workflows/_create-release-pr.yml
with:
component-name: 'Storage & Compute'
component-name: 'Storage'
release-branch: 'release'
secrets:
ci-access-token: ${{ secrets.CI_ACCESS_TOKEN }}
create-proxy-release-branch:
if: ${{ github.event.schedule == '0 6 * * THU' || format('{0}', inputs.create-proxy-release-branch) == 'true' }}
if: ${{ github.event.schedule == '0 6 * * THU' || inputs.create-proxy-release-branch }}
permissions:
contents: write
@@ -49,3 +53,16 @@ jobs:
release-branch: 'release-proxy'
secrets:
ci-access-token: ${{ secrets.CI_ACCESS_TOKEN }}
create-compute-release-branch:
if: inputs.create-compute-release-branch
permissions:
contents: write
uses: ./.github/workflows/_create-release-pr.yml
with:
component-name: 'Compute'
release-branch: 'release-compute'
secrets:
ci-access-token: ${{ secrets.CI_ACCESS_TOKEN }}

View File

@@ -51,6 +51,8 @@ jobs:
echo "tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then
echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-compute" ]]; then
echo "tag=release-compute-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
else
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
BUILD_AND_TEST_RUN_ID=$(gh run list -b $CURRENT_BRANCH -c $CURRENT_SHA -w 'Build and Test' -L 1 --json databaseId --jq '.[].databaseId')
@@ -66,7 +68,7 @@ jobs:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
TAG: ${{ needs.tag.outputs.build-tag }}
steps:
- name: Wait for `promote-images` job to finish
- name: Wait for `promote-images-dev` job to finish
# It's important to have a timeout here, the script in the step can run infinitely
timeout-minutes: 60
run: |
@@ -77,17 +79,17 @@ jobs:
# For PRs we use the run id as the tag
BUILD_AND_TEST_RUN_ID=${TAG}
while true; do
conclusion=$(gh run --repo ${GITHUB_REPOSITORY} view ${BUILD_AND_TEST_RUN_ID} --json jobs --jq '.jobs[] | select(.name == "promote-images") | .conclusion')
conclusion=$(gh run --repo ${GITHUB_REPOSITORY} view ${BUILD_AND_TEST_RUN_ID} --json jobs --jq '.jobs[] | select(.name == "promote-images-dev") | .conclusion')
case "$conclusion" in
success)
break
;;
failure | cancelled | skipped)
echo "The 'promote-images' job didn't succeed: '${conclusion}'. Exiting..."
echo "The 'promote-images-dev' job didn't succeed: '${conclusion}'. Exiting..."
exit 1
;;
*)
echo "The 'promote-images' hasn't succeed yet. Waiting..."
echo "The 'promote-images-dev' hasn't succeed yet. Waiting..."
sleep 60
;;
esac

View File

@@ -1,15 +1,29 @@
/.github/ @neondatabase/developer-productivity
/compute_tools/ @neondatabase/control-plane @neondatabase/compute
/libs/pageserver_api/ @neondatabase/storage
/libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage
/libs/remote_storage/ @neondatabase/storage
/libs/safekeeper_api/ @neondatabase/storage
# Autoscaling
/libs/vm_monitor/ @neondatabase/autoscaling
/pageserver/ @neondatabase/storage
# DevProd
/.github/ @neondatabase/developer-productivity
# Compute
/pgxn/ @neondatabase/compute
/pgxn/neon/ @neondatabase/compute @neondatabase/storage
/vendor/ @neondatabase/compute
/compute/ @neondatabase/compute
/compute_tools/ @neondatabase/compute
# Proxy
/libs/proxy/ @neondatabase/proxy
/proxy/ @neondatabase/proxy
# Storage
/pageserver/ @neondatabase/storage
/safekeeper/ @neondatabase/storage
/storage_controller @neondatabase/storage
/storage_scrubber @neondatabase/storage
/vendor/ @neondatabase/compute
/libs/pageserver_api/ @neondatabase/storage
/libs/remote_storage/ @neondatabase/storage
/libs/safekeeper_api/ @neondatabase/storage
# Shared
/pgxn/neon/ @neondatabase/compute @neondatabase/storage
/libs/compute_api/ @neondatabase/compute @neondatabase/control-plane
/libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage

1359
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -11,6 +11,7 @@ members = [
"pageserver/pagebench",
"proxy",
"safekeeper",
"safekeeper/client",
"storage_broker",
"storage_controller",
"storage_controller/client",
@@ -35,6 +36,9 @@ members = [
"libs/walproposer",
"libs/wal_decoder",
"libs/postgres_initdb",
"libs/proxy/postgres-protocol2",
"libs/proxy/postgres-types2",
"libs/proxy/tokio-postgres2",
]
[workspace.package]
@@ -48,10 +52,7 @@ anyhow = { version = "1.0", features = ["backtrace"] }
arc-swap = "1.6"
async-compression = { version = "0.4.0", features = ["tokio", "gzip", "zstd"] }
atomic-take = "1.1.0"
azure_core = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }
azure_identity = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage_blobs = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
backtrace = "0.3.74"
flate2 = "1.0.26"
async-stream = "0.3"
async-trait = "0.1"
@@ -64,14 +65,14 @@ aws-smithy-types = "1.2"
aws-credential-types = "1.2.0"
aws-sigv4 = { version = "1.2", features = ["sign-http"] }
aws-types = "1.3"
axum = { version = "0.7.5", features = ["ws"] }
axum = { version = "0.7.9", features = ["ws"] }
base64 = "0.13.0"
bincode = "1.3"
bindgen = "0.70"
bit_field = "0.10.2"
bstr = "1.0"
byteorder = "1.4"
bytes = "1.0"
bytes = "1.9"
camino = "1.1.6"
cfg-if = "1.0.0"
chrono = { version = "0.4", default-features = false, features = ["clock"] }
@@ -80,6 +81,7 @@ comfy-table = "7.1"
const_format = "0.2"
crc32c = "0.6"
dashmap = { version = "5.5.0", features = ["raw-api"] }
diatomic-waker = { version = "0.2.3" }
either = "1.8"
enum-map = "2.4.2"
enumset = "1.0.12"
@@ -108,9 +110,11 @@ hyper-util = "0.1"
tokio-tungstenite = "0.21.0"
indexmap = "2"
indoc = "2"
inferno = "0.12.0"
ipnet = "2.10.0"
itertools = "0.10"
itoa = "1.0.11"
jemalloc_pprof = "0.6"
jsonwebtoken = "9"
lasso = "0.7"
libc = "0.2"
@@ -123,16 +127,16 @@ notify = "6.0.0"
num_cpus = "1.15"
num-traits = "0.2.15"
once_cell = "1.13"
opentelemetry = "0.24"
opentelemetry_sdk = "0.24"
opentelemetry-otlp = { version = "0.17", default-features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.16"
opentelemetry = "0.27"
opentelemetry_sdk = "0.27"
opentelemetry-otlp = { version = "0.27", default-features = false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.27"
parking_lot = "0.12"
parquet = { version = "53", default-features = false, features = ["zstd"] }
parquet_derive = "53"
pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }
pin-project-lite = "0.2"
pprof = { version = "0.14", features = ["criterion", "flamegraph", "protobuf", "protobuf-codec"] }
pprof = { version = "0.14", features = ["criterion", "flamegraph", "frame-pointer", "protobuf", "protobuf-codec"] }
procfs = "0.16"
prometheus = {version = "0.13", default-features=false, features = ["process"]} # removes protobuf dependency
prost = "0.13"
@@ -140,9 +144,9 @@ rand = "0.8"
redis = { version = "0.25.2", features = ["tokio-rustls-comp", "keep-alive"] }
regex = "1.10.2"
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }
reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_24"] }
reqwest-middleware = "0.3.0"
reqwest-retry = "0.5"
reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_27"] }
reqwest-middleware = "0.4"
reqwest-retry = "0.7"
routerify = "3"
rpds = "0.13"
rustc-hash = "1.1.0"
@@ -171,7 +175,7 @@ sync_wrapper = "0.1.2"
tar = "0.4"
test-context = "0.3"
thiserror = "1.0"
tikv-jemallocator = { version = "0.6", features = ["stats"] }
tikv-jemallocator = { version = "0.6", features = ["profiling", "stats", "unprefixed_malloc_on_supported_platforms"] }
tikv-jemalloc-ctl = { version = "0.6", features = ["stats"] }
tokio = { version = "1.17", features = ["macros"] }
tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }
@@ -184,10 +188,12 @@ tokio-util = { version = "0.7.10", features = ["io", "rt"] }
toml = "0.8"
toml_edit = "0.22"
tonic = {version = "0.12.3", features = ["tls", "tls-roots"]}
tower-service = "0.3.2"
tower = { version = "0.5.2", default-features = false }
tower-http = { version = "0.6.2", features = ["request-id", "trace"] }
tower-service = "0.3.3"
tracing = "0.1"
tracing-error = "0.2"
tracing-opentelemetry = "0.25"
tracing-opentelemetry = "0.28"
tracing-subscriber = { version = "0.3", default-features = false, features = ["smallvec", "fmt", "tracing-log", "std", "env-filter", "json"] }
try-lock = "0.2.5"
twox-hash = { version = "1.6.3", default-features = false }
@@ -211,6 +217,12 @@ postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git",
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
## Azure SDK crates
azure_core = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }
azure_identity = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage_blobs = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }
## Local libraries
compute_api = { version = "0.1", path = "./libs/compute_api/" }
consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }
@@ -226,6 +238,7 @@ postgres_initdb = { path = "./libs/postgres_initdb" }
pq_proto = { version = "0.1", path = "./libs/pq_proto/" }
remote_storage = { version = "0.1", path = "./libs/remote_storage/" }
safekeeper_api = { version = "0.1", path = "./libs/safekeeper_api" }
safekeeper_client = { path = "./safekeeper/client" }
desim = { version = "0.1", path = "./libs/desim" }
storage_broker = { version = "0.1", path = "./storage_broker/" } # Note: main broker code is inside the binary crate, so linking with the library shouldn't be heavy.
storage_controller_client = { path = "./storage_controller/client" }
@@ -256,6 +269,8 @@ tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", br
[profile.release]
# This is useful for profiling and, to some extent, debug.
# Besides, debug info should not affect the performance.
#
# NB: we also enable frame pointers for improved profiling, see .cargo/config.toml.
debug = true
# disable debug symbols for all packages except this one to decrease binaries size

View File

@@ -7,7 +7,7 @@ ARG IMAGE=build-tools
ARG TAG=pinned
ARG DEFAULT_PG_VERSION=17
ARG STABLE_PG_VERSION=16
ARG DEBIAN_VERSION=bullseye
ARG DEBIAN_VERSION=bookworm
ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim
# Build Postgres
@@ -45,7 +45,7 @@ COPY --chown=nonroot . .
ARG ADDITIONAL_RUSTFLAGS
RUN set -e \
&& PQ_LIB_DIR=$(pwd)/pg_install/v${STABLE_PG_VERSION}/lib RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment ${ADDITIONAL_RUSTFLAGS}" cargo build \
&& PQ_LIB_DIR=$(pwd)/pg_install/v${STABLE_PG_VERSION}/lib RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo build \
--bin pg_sni_router \
--bin pageserver \
--bin pagectl \
@@ -69,6 +69,9 @@ RUN set -e \
libreadline-dev \
libseccomp-dev \
ca-certificates \
# System postgres for use with client libraries (e.g. in storage controller)
postgresql-15 \
openssl \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
&& useradd -d /data neon \
&& chown -R neon:neon /data
@@ -101,11 +104,6 @@ RUN mkdir -p /data/.neon/ && \
> /data/.neon/pageserver.toml && \
chown -R neon:neon /data/.neon
# When running a binary that links with libpq, default to using our most recent postgres version. Binaries
# that want a particular postgres version will select it explicitly: this is just a default.
ENV LD_LIBRARY_PATH=/usr/local/v${DEFAULT_PG_VERSION}/lib
VOLUME ["/data"]
USER neon
EXPOSE 6400

View File

@@ -3,7 +3,6 @@ ROOT_PROJECT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
# Where to install Postgres, default is ./pg_install, maybe useful for package managers
POSTGRES_INSTALL_DIR ?= $(ROOT_PROJECT_DIR)/pg_install/
OPENSSL_PREFIX_DIR := /usr/local/openssl
ICU_PREFIX_DIR := /usr/local/icu
#
@@ -26,11 +25,9 @@ endif
ifeq ($(shell test -e /home/nonroot/.docker_build && echo -n yes),yes)
# Exclude static build openssl, icu for local build (MacOS, Linux)
# Only keep for build type release and debug
PG_CFLAGS += -I$(OPENSSL_PREFIX_DIR)/include
PG_CONFIGURE_OPTS += --with-icu
PG_CONFIGURE_OPTS += ICU_CFLAGS='-I/$(ICU_PREFIX_DIR)/include -DU_STATIC_IMPLEMENTATION'
PG_CONFIGURE_OPTS += ICU_LIBS='-L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -licui18n -licuuc -licudata -lstdc++ -Wl,-Bdynamic -lm'
PG_CONFIGURE_OPTS += LDFLAGS='-L$(OPENSSL_PREFIX_DIR)/lib -L$(OPENSSL_PREFIX_DIR)/lib64 -L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -Wl,-Bstatic -lssl -lcrypto -Wl,-Bdynamic -lrt -lm -ldl -lpthread'
endif
UNAME_S := $(shell uname -s)
@@ -38,6 +35,7 @@ ifeq ($(UNAME_S),Linux)
# Seccomp BPF is only available for Linux
PG_CONFIGURE_OPTS += --with-libseccomp
else ifeq ($(UNAME_S),Darwin)
PG_CFLAGS += -DUSE_PREFETCH
ifndef DISABLE_HOMEBREW
# macOS with brew-installed openssl requires explicit paths
# It can be configured with OPENSSL_PREFIX variable
@@ -146,6 +144,8 @@ postgres-%: postgres-configure-% \
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_prewarm install
+@echo "Compiling pg_buffercache $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_buffercache install
+@echo "Compiling pg_visibility $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_visibility install
+@echo "Compiling pageinspect $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect install
+@echo "Compiling amcheck $*"

View File

@@ -1,4 +1,4 @@
ARG DEBIAN_VERSION=bullseye
ARG DEBIAN_VERSION=bookworm
FROM debian:bookworm-slim AS pgcopydb_builder
ARG DEBIAN_VERSION
@@ -57,9 +57,9 @@ RUN mkdir -p /pgcopydb/bin && \
mkdir -p /pgcopydb/lib && \
chmod -R 755 /pgcopydb && \
chown -R nonroot:nonroot /pgcopydb
COPY --from=pgcopydb_builder /usr/lib/postgresql/16/bin/pgcopydb /pgcopydb/bin/pgcopydb
COPY --from=pgcopydb_builder /pgcopydb/lib/libpq.so.5 /pgcopydb/lib/libpq.so.5
COPY --from=pgcopydb_builder /usr/lib/postgresql/16/bin/pgcopydb /pgcopydb/bin/pgcopydb
COPY --from=pgcopydb_builder /pgcopydb/lib/libpq.so.5 /pgcopydb/lib/libpq.so.5
# System deps
#
@@ -115,7 +115,7 @@ RUN set -e \
# Keep the version the same as in compute/compute-node.Dockerfile and
# test_runner/regress/test_compute_metrics.py.
ENV SQL_EXPORTER_VERSION=0.13.1
ENV SQL_EXPORTER_VERSION=0.17.0
RUN curl -fsSL \
"https://github.com/burningalchemist/sql_exporter/releases/download/${SQL_EXPORTER_VERSION}/sql_exporter-${SQL_EXPORTER_VERSION}.linux-$(case "$(uname -m)" in x86_64) echo amd64;; aarch64) echo arm64;; esac).tar.gz" \
--output sql_exporter.tar.gz \
@@ -190,21 +190,6 @@ RUN for package in Capture::Tiny DateTime Devel::Cover Digest::MD5 File::Spec JS
&& make install \
&& rm -rf ../lcov.tar.gz
# Compile and install the static OpenSSL library
ENV OPENSSL_VERSION=1.1.1w
ENV OPENSSL_PREFIX=/usr/local/openssl
RUN wget -O /tmp/openssl-${OPENSSL_VERSION}.tar.gz https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz && \
echo "cf3098950cb4d853ad95c0841f1f9c6d3dc102dccfcacd521d93925208b76ac8 /tmp/openssl-${OPENSSL_VERSION}.tar.gz" | sha256sum --check && \
cd /tmp && \
tar xzvf /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \
rm /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \
cd /tmp/openssl-${OPENSSL_VERSION} && \
./config --prefix=${OPENSSL_PREFIX} -static --static no-shared -fPIC && \
make -j "$(nproc)" && \
make install && \
cd /tmp && \
rm -rf /tmp/openssl-${OPENSSL_VERSION}
# Use the same version of libicu as the compute nodes so that
# clusters created using inidb on pageserver can be used by computes.
#
@@ -258,14 +243,14 @@ WORKDIR /home/nonroot
# Rust
# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)
ENV RUSTC_VERSION=1.82.0
ENV RUSTC_VERSION=1.84.0
ENV RUSTUP_HOME="/home/nonroot/.rustup"
ENV PATH="/home/nonroot/.cargo/bin:${PATH}"
ARG RUSTFILT_VERSION=0.2.1
ARG CARGO_HAKARI_VERSION=0.9.30
ARG CARGO_DENY_VERSION=0.16.1
ARG CARGO_HACK_VERSION=0.6.31
ARG CARGO_NEXTEST_VERSION=0.9.72
ARG CARGO_HAKARI_VERSION=0.9.33
ARG CARGO_DENY_VERSION=0.16.2
ARG CARGO_HACK_VERSION=0.6.33
ARG CARGO_NEXTEST_VERSION=0.9.85
RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && whoami && \
chmod +x rustup-init && \
./rustup-init -y --default-toolchain ${RUSTC_VERSION} && \
@@ -289,7 +274,7 @@ RUN whoami \
&& cargo --version --verbose \
&& rustup --version --verbose \
&& rustc --version --verbose \
&& clang --version
&& clang --version
RUN if [ "${DEBIAN_VERSION}" = "bookworm" ]; then \
LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version; \

File diff suppressed because it is too large Load Diff

View File

@@ -3,9 +3,10 @@
metrics: [
import 'sql_exporter/checkpoints_req.libsonnet',
import 'sql_exporter/checkpoints_timed.libsonnet',
import 'sql_exporter/compute_backpressure_throttling_seconds.libsonnet',
import 'sql_exporter/compute_backpressure_throttling_seconds_total.libsonnet',
import 'sql_exporter/compute_current_lsn.libsonnet',
import 'sql_exporter/compute_logical_snapshot_files.libsonnet',
import 'sql_exporter/compute_logical_snapshots_bytes.libsonnet',
import 'sql_exporter/compute_max_connections.libsonnet',
import 'sql_exporter/compute_receive_lsn.libsonnet',
import 'sql_exporter/compute_subscriptions_count.libsonnet',

View File

@@ -1,5 +1,9 @@
[databases]
*=host=localhost port=5432 auth_user=cloud_admin
;; pgbouncer propagates application_name (if it's specified) to the server, but some
;; clients don't set it. We set default application_name=pgbouncer to make it
;; easier to identify pgbouncer connections in Postgres. If client sets
;; application_name, it will be used instead.
*=host=localhost port=5432 auth_user=cloud_admin application_name=pgbouncer
[pgbouncer]
listen_port=6432
listen_addr=0.0.0.0
@@ -15,3 +19,10 @@ max_prepared_statements=0
admin_users=postgres
unix_socket_dir=/tmp/
unix_socket_mode=0777
;; Disable connection logging. It produces a lot of logs that no one looks at,
;; and we can get similar log entries from the proxy too. We had incidents in
;; the past where the logging significantly stressed the log device or pgbouncer
;; itself.
log_connections=0
log_disconnections=0

View File

@@ -1,10 +1,10 @@
{
metric_name: 'compute_backpressure_throttling_seconds',
type: 'gauge',
metric_name: 'compute_backpressure_throttling_seconds_total',
type: 'counter',
help: 'Time compute has spent throttled',
key_labels: null,
values: [
'throttled',
],
query: importstr 'sql_exporter/compute_backpressure_throttling_seconds.sql',
query: importstr 'sql_exporter/compute_backpressure_throttling_seconds_total.sql',
}

View File

@@ -0,0 +1,7 @@
SELECT
(SELECT current_setting('neon.timeline_id')) AS timeline_id,
-- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp.
-- These temporary snapshot files are renamed to the actual snapshot files
-- after they are completely built. We only WAL-log the completely built
-- snapshot files
(SELECT COALESCE(sum(size), 0) FROM pg_ls_logicalsnapdir() WHERE name LIKE '%.snap') AS logical_snapshots_bytes;

View File

@@ -0,0 +1,17 @@
local neon = import 'neon.libsonnet';
local pg_ls_logicalsnapdir = importstr 'sql_exporter/compute_logical_snapshots_bytes.15.sql';
local pg_ls_dir = importstr 'sql_exporter/compute_logical_snapshots_bytes.sql';
{
metric_name: 'compute_logical_snapshots_bytes',
type: 'gauge',
help: 'Size of the pg_logical/snapshots directory, not including temporary files',
key_labels: [
'timeline_id',
],
values: [
'logical_snapshots_bytes',
],
query: if neon.PG_MAJORVERSION_NUM < 15 then pg_ls_dir else pg_ls_logicalsnapdir,
}

View File

@@ -0,0 +1,9 @@
SELECT
(SELECT setting FROM pg_settings WHERE name = 'neon.timeline_id') AS timeline_id,
-- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp.
-- These temporary snapshot files are renamed to the actual snapshot files
-- after they are completely built. We only WAL-log the completely built
-- snapshot files
(SELECT COALESCE(sum((pg_stat_file('pg_logical/snapshots/' || name, missing_ok => true)).size), 0)
FROM (SELECT * FROM pg_ls_dir('pg_logical/snapshots') WHERE pg_ls_dir LIKE '%.snap') AS name
) AS logical_snapshots_bytes;

View File

@@ -981,7 +981,7 @@ index fc42d418bf..e38f517574 100644
CREATE SCHEMA addr_nsp;
SET search_path TO 'addr_nsp';
diff --git a/src/test/regress/expected/password.out b/src/test/regress/expected/password.out
index 8475231735..1afae5395f 100644
index 8475231735..0653946337 100644
--- a/src/test/regress/expected/password.out
+++ b/src/test/regress/expected/password.out
@@ -12,11 +12,11 @@ SET password_encryption = 'md5'; -- ok
@@ -1006,65 +1006,63 @@ index 8475231735..1afae5395f 100644
-----------------+---------------------------------------------------
- regress_passwd1 | md5783277baca28003b33453252be4dbb34
- regress_passwd2 | md54044304ba511dd062133eb5b4b84a2a3
+ regress_passwd1 | NEON_MD5_PLACEHOLDER_regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER_regress_passwd2
+ regress_passwd1 | NEON_MD5_PLACEHOLDER:regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER:regress_passwd2
regress_passwd3 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd4 |
+ regress_passwd4 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
(4 rows)
-- Rename a role
@@ -54,24 +54,30 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
@@ -54,24 +54,16 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
-- passwords.
SET password_encryption = 'md5';
-- encrypt with MD5
-ALTER ROLE regress_passwd2 PASSWORD 'foo';
--- already encrypted, use as they are
-ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
-ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted, use as they are
ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
SET password_encryption = 'scram-sha-256';
-- create SCRAM secret
-ALTER ROLE regress_passwd4 PASSWORD 'foo';
--- already encrypted with MD5, use as it is
-CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
--- This looks like a valid SCRAM-SHA-256 secret, but it is not
--- so it should be hashed with SCRAM-SHA-256.
-CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
--- These may look like valid MD5 secrets, but they are not, so they
--- should be hashed with SCRAM-SHA-256.
--- trailing garbage at the end
-CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
--- invalid length
-CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+ALTER ROLE regress_passwd4 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted with MD5, use as it is
CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- This looks like a valid SCRAM-SHA-256 secret, but it is not
-- so it should be hashed with SCRAM-SHA-256.
CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- These may look like valid MD5 secrets, but they are not, so they
-- should be hashed with SCRAM-SHA-256.
-- trailing garbage at the end
CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
-- invalid length
CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd5 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd6 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd7 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd8 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Changing the SCRAM iteration count
SET scram_iterations = 1024;
CREATE ROLE regress_passwd9 PASSWORD 'alterediterationcount';
@@ -81,63 +87,67 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
@@ -81,11 +73,11 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
ORDER BY rolname, rolpassword;
rolname | rolpassword_masked
-----------------+---------------------------------------------------
- regress_passwd1 | md5cd3578025fe2c3d7ed1b9a9b26238b70
- regress_passwd2 | md5dfa155cadd5f4ad57860162f3fab9cdb
+ regress_passwd1 | NEON_MD5_PLACEHOLDER_regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER_regress_passwd2
+ regress_passwd1 | NEON_MD5_PLACEHOLDER:regress_passwd1
+ regress_passwd2 | NEON_MD5_PLACEHOLDER:regress_passwd2
regress_passwd3 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd4 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd5 | md5e73a4b11df52a6068f8b39f90be36023
- regress_passwd6 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd7 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
- regress_passwd8 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd9 | SCRAM-SHA-256$1024:<salt>$<storedkey>:<serverkey>
-(9 rows)
+(5 rows)
+ regress_passwd5 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd6 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd7 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
regress_passwd8 | SCRAM-SHA-256$4096:<salt>$<storedkey>:<serverkey>
@@ -95,23 +87,20 @@ SELECT rolname, regexp_replace(rolpassword, '(SCRAM-SHA-256)\$(\d+):([a-zA-Z0-9+
-- An empty password is not allowed, in any form
CREATE ROLE regress_passwd_empty PASSWORD '';
NOTICE: empty string is not a valid password, clearing password
@@ -1082,56 +1080,37 @@ index 8475231735..1afae5395f 100644
-(1 row)
+(0 rows)
-- Test with invalid stored and server keys.
--
-- The first is valid, to act as a control. The others have too long
-- stored/server keys. They will be re-hashed.
CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+ERROR: Received HTTP code 400 from control plane: {"error":"Neon only supports being given plaintext passwords"}
--- Test with invalid stored and server keys.
---
--- The first is valid, to act as a control. The others have too long
--- stored/server keys. They will be re-hashed.
-CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd_sha_len0 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Check that the invalid secrets were re-hashed. A re-hashed secret
-- should not contain the original salt.
SELECT rolname, rolpassword not like '%A6xHKoH/494E941doaPOYg==%' as is_rolpassword_rehashed
FROM pg_authid
WHERE rolname LIKE 'regress_passwd_sha_len%'
@@ -120,7 +109,7 @@ SELECT rolname, rolpassword not like '%A6xHKoH/494E941doaPOYg==%' as is_rolpassw
ORDER BY rolname;
- rolname | is_rolpassword_rehashed
--------------------------+-------------------------
rolname | is_rolpassword_rehashed
-------------------------+-------------------------
- regress_passwd_sha_len0 | f
- regress_passwd_sha_len1 | t
- regress_passwd_sha_len2 | t
-(3 rows)
+ rolname | is_rolpassword_rehashed
+---------+-------------------------
+(0 rows)
DROP ROLE regress_passwd1;
DROP ROLE regress_passwd2;
DROP ROLE regress_passwd3;
DROP ROLE regress_passwd4;
DROP ROLE regress_passwd5;
+ERROR: role "regress_passwd5" does not exist
DROP ROLE regress_passwd6;
+ERROR: role "regress_passwd6" does not exist
DROP ROLE regress_passwd7;
+ERROR: role "regress_passwd7" does not exist
+ regress_passwd_sha_len0 | t
regress_passwd_sha_len1 | t
regress_passwd_sha_len2 | t
(3 rows)
@@ -135,6 +124,7 @@ DROP ROLE regress_passwd7;
DROP ROLE regress_passwd8;
+ERROR: role "regress_passwd8" does not exist
DROP ROLE regress_passwd9;
DROP ROLE regress_passwd_empty;
+ERROR: role "regress_passwd_empty" does not exist
DROP ROLE regress_passwd_sha_len0;
+ERROR: role "regress_passwd_sha_len0" does not exist
DROP ROLE regress_passwd_sha_len1;
+ERROR: role "regress_passwd_sha_len1" does not exist
DROP ROLE regress_passwd_sha_len2;
+ERROR: role "regress_passwd_sha_len2" does not exist
-- all entries should have been removed
SELECT rolname, rolpassword
FROM pg_authid
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index 5b9dba7b32..cc408dad42 100644
--- a/src/test/regress/expected/privileges.out
@@ -3194,7 +3173,7 @@ index 1a6c61f49d..1c31ac6a53 100644
-- Test generic object addressing/identification functions
CREATE SCHEMA addr_nsp;
diff --git a/src/test/regress/sql/password.sql b/src/test/regress/sql/password.sql
index 53e86b0b6c..f07cf1ec54 100644
index 53e86b0b6c..0303fdfe96 100644
--- a/src/test/regress/sql/password.sql
+++ b/src/test/regress/sql/password.sql
@@ -10,11 +10,11 @@ SET password_encryption = 'scram-sha-256'; -- ok
@@ -3213,23 +3192,59 @@ index 53e86b0b6c..f07cf1ec54 100644
-- check list of created entries
--
@@ -42,14 +42,14 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
@@ -42,26 +42,18 @@ ALTER ROLE regress_passwd2_new RENAME TO regress_passwd2;
SET password_encryption = 'md5';
-- encrypt with MD5
-ALTER ROLE regress_passwd2 PASSWORD 'foo';
--- already encrypted, use as they are
-ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
-ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
+ALTER ROLE regress_passwd2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted, use as they are
ALTER ROLE regress_passwd1 PASSWORD 'md5cd3578025fe2c3d7ed1b9a9b26238b70';
ALTER ROLE regress_passwd3 PASSWORD 'SCRAM-SHA-256$4096:VLK4RMaQLCvNtQ==$6YtlR4t69SguDiwFvbVgVZtuz6gpJQQqUMZ7IQJK5yI=:ps75jrHeYU4lXCcXI4O8oIdJ3eO8o2jirjruw9phBTo=';
SET password_encryption = 'scram-sha-256';
-- create SCRAM secret
-ALTER ROLE regress_passwd4 PASSWORD 'foo';
--- already encrypted with MD5, use as it is
-CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+ALTER ROLE regress_passwd4 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- already encrypted with MD5, use as it is
CREATE ROLE regress_passwd5 PASSWORD 'md5e73a4b11df52a6068f8b39f90be36023';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd5 PASSWORD NEON_PASSWORD_PLACEHOLDER;
--- This looks like a valid SCRAM-SHA-256 secret, but it is not
--- so it should be hashed with SCRAM-SHA-256.
-CREATE ROLE regress_passwd6 PASSWORD 'SCRAM-SHA-256$1234';
--- These may look like valid MD5 secrets, but they are not, so they
--- should be hashed with SCRAM-SHA-256.
--- trailing garbage at the end
-CREATE ROLE regress_passwd7 PASSWORD 'md5012345678901234567890123456789zz';
--- invalid length
-CREATE ROLE regress_passwd8 PASSWORD 'md501234567890123456789012345678901zz';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd6 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd7 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd8 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Changing the SCRAM iteration count
SET scram_iterations = 1024;
@@ -78,13 +70,10 @@ ALTER ROLE regress_passwd_empty PASSWORD 'md585939a5ce845f1a1b620742e3c659e0a';
ALTER ROLE regress_passwd_empty PASSWORD 'SCRAM-SHA-256$4096:hpFyHTUsSWcR7O9P$LgZFIt6Oqdo27ZFKbZ2nV+vtnYM995pDh9ca6WSi120=:qVV5NeluNfUPkwm7Vqat25RjSPLkGeoZBQs6wVv+um4=';
SELECT rolpassword FROM pg_authid WHERE rolname='regress_passwd_empty';
--- Test with invalid stored and server keys.
---
--- The first is valid, to act as a control. The others have too long
--- stored/server keys. They will be re-hashed.
-CREATE ROLE regress_passwd_sha_len0 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len1 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96RqwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZI=';
-CREATE ROLE regress_passwd_sha_len2 PASSWORD 'SCRAM-SHA-256$4096:A6xHKoH/494E941doaPOYg==$Ky+A30sewHIH3VHQLRN9vYsuzlgNyGNKCh37dy96Rqw=:COPdlNiIkrsacU5QoxydEuOH6e/KfiipeETb/bPw8ZIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=';
+-- Neon does not support encrypted passwords, use unencrypted instead
+CREATE ROLE regress_passwd_sha_len0 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len1 PASSWORD NEON_PASSWORD_PLACEHOLDER;
+CREATE ROLE regress_passwd_sha_len2 PASSWORD NEON_PASSWORD_PLACEHOLDER;
-- Check that the invalid secrets were re-hashed. A re-hashed secret
-- should not contain the original salt.
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 249df17a58..b258e7f26a 100644
--- a/src/test/regress/sql/privileges.sql

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,174 @@
diff --git a/expected/ut-A.out b/expected/ut-A.out
index e7d68a1..65a056c 100644
--- a/expected/ut-A.out
+++ b/expected/ut-A.out
@@ -9,13 +9,16 @@ SET search_path TO public;
----
-- No.A-1-1-3
CREATE EXTENSION pg_hint_plan;
+LOG: Sending request to compute_ctl: http://localhost:3080/extension_server/pg_hint_plan
-- No.A-1-2-3
DROP EXTENSION pg_hint_plan;
-- No.A-1-1-4
CREATE SCHEMA other_schema;
CREATE EXTENSION pg_hint_plan SCHEMA other_schema;
+LOG: Sending request to compute_ctl: http://localhost:3080/extension_server/pg_hint_plan
ERROR: extension "pg_hint_plan" must be installed in schema "hint_plan"
CREATE EXTENSION pg_hint_plan;
+LOG: Sending request to compute_ctl: http://localhost:3080/extension_server/pg_hint_plan
DROP SCHEMA other_schema;
----
---- No. A-5-1 comment pattern
diff --git a/expected/ut-J.out b/expected/ut-J.out
index 2fa3c70..314e929 100644
--- a/expected/ut-J.out
+++ b/expected/ut-J.out
@@ -789,38 +789,6 @@ NestLoop(st1 st2)
MergeJoin(t1 t2)
not used hint:
duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-NestLoop(st1 st2)
-MergeJoin(t1 t2)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-NestLoop(st1 st2)
-MergeJoin(t1 t2)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-NestLoop(st1 st2)
-MergeJoin(t1 t2)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-NestLoop(st1 st2)
-MergeJoin(t1 t2)
-duplication hint:
error hint:
explain_filter
diff --git a/expected/ut-S.out b/expected/ut-S.out
index 0bfcfb8..e75f581 100644
--- a/expected/ut-S.out
+++ b/expected/ut-S.out
@@ -4415,34 +4415,6 @@ used hint:
IndexScan(ti1 ti1_pred)
not used hint:
duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(ti1 ti1_pred)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(ti1 ti1_pred)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(ti1 ti1_pred)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(ti1 ti1_pred)
-duplication hint:
error hint:
explain_filter
diff --git a/expected/ut-W.out b/expected/ut-W.out
index a09bd34..0ad227c 100644
--- a/expected/ut-W.out
+++ b/expected/ut-W.out
@@ -1341,54 +1341,6 @@ IndexScan(ft1)
IndexScan(t)
Parallel(s1 3 hard)
duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(*VALUES*)
-SeqScan(cte1)
-IndexScan(ft1)
-IndexScan(t)
-Parallel(p1 5 hard)
-Parallel(s1 3 hard)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(*VALUES*)
-SeqScan(cte1)
-IndexScan(ft1)
-IndexScan(t)
-Parallel(p1 5 hard)
-Parallel(s1 3 hard)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(*VALUES*)
-SeqScan(cte1)
-IndexScan(ft1)
-IndexScan(t)
-Parallel(p1 5 hard)
-Parallel(s1 3 hard)
-duplication hint:
-error hint:
-
-LOG: pg_hint_plan:
-used hint:
-not used hint:
-IndexScan(*VALUES*)
-SeqScan(cte1)
-IndexScan(ft1)
-IndexScan(t)
-Parallel(p1 5 hard)
-Parallel(s1 3 hard)
-duplication hint:
error hint:
explain_filter
diff --git a/expected/ut-fdw.out b/expected/ut-fdw.out
index 017fa4b..98d989b 100644
--- a/expected/ut-fdw.out
+++ b/expected/ut-fdw.out
@@ -7,6 +7,7 @@ SET pg_hint_plan.debug_print TO on;
SET client_min_messages TO LOG;
SET pg_hint_plan.enable_hint TO on;
CREATE EXTENSION file_fdw;
+LOG: Sending request to compute_ctl: http://localhost:3080/extension_server/file_fdw
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
CREATE USER MAPPING FOR PUBLIC SERVER file_server;
CREATE FOREIGN TABLE ft1 (id int, val int) SERVER file_server OPTIONS (format 'csv', filename :'filename');

View File

@@ -0,0 +1,42 @@
commit 46b38d3e46f9cd6c70d9b189dd6ff4abaa17cf5e
Author: Alexander Bayandin <alexander@neon.tech>
Date: Sat Nov 30 18:29:32 2024 +0000
Fix v8 9.7.37 compilation on Debian 12
diff --git a/patches/code/84cf3230a9680aac3b73c410c2b758760b6d3066.patch b/patches/code/84cf3230a9680aac3b73c410c2b758760b6d3066.patch
new file mode 100644
index 0000000..f0a5dc7
--- /dev/null
+++ b/patches/code/84cf3230a9680aac3b73c410c2b758760b6d3066.patch
@@ -0,0 +1,30 @@
+From 84cf3230a9680aac3b73c410c2b758760b6d3066 Mon Sep 17 00:00:00 2001
+From: Michael Lippautz <mlippautz@chromium.org>
+Date: Thu, 27 Jan 2022 14:14:11 +0100
+Subject: [PATCH] cppgc: Fix include
+
+Add <utility> to cover for std::exchange.
+
+Bug: v8:12585
+Change-Id: Ida65144e93e466be8914527d0e646f348c136bcb
+Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3420309
+Auto-Submit: Michael Lippautz <mlippautz@chromium.org>
+Reviewed-by: Omer Katz <omerkatz@chromium.org>
+Commit-Queue: Michael Lippautz <mlippautz@chromium.org>
+Cr-Commit-Position: refs/heads/main@{#78820}
+---
+ src/heap/cppgc/prefinalizer-handler.h | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/src/heap/cppgc/prefinalizer-handler.h b/src/heap/cppgc/prefinalizer-handler.h
+index bc17c99b1838..c82c91ff5a45 100644
+--- a/src/heap/cppgc/prefinalizer-handler.h
++++ b/src/heap/cppgc/prefinalizer-handler.h
+@@ -5,6 +5,7 @@
+ #ifndef V8_HEAP_CPPGC_PREFINALIZER_HANDLER_H_
+ #define V8_HEAP_CPPGC_PREFINALIZER_HANDLER_H_
+
++#include <utility>
+ #include <vector>
+
+ #include "include/cppgc/prefinalizer.h"

View File

@@ -7,7 +7,7 @@ license.workspace = true
[features]
default = []
# Enables test specific features.
testing = []
testing = ["fail/failpoints"]
[dependencies]
base64.workspace = true
@@ -15,13 +15,15 @@ aws-config.workspace = true
aws-sdk-s3.workspace = true
aws-sdk-kms.workspace = true
anyhow.workspace = true
axum = { workspace = true, features = [] }
camino.workspace = true
chrono.workspace = true
cfg-if.workspace = true
clap.workspace = true
fail.workspace = true
flate2.workspace = true
futures.workspace = true
hyper0 = { workspace = true, features = ["full"] }
http.workspace = true
metrics.workspace = true
nix.workspace = true
notify.workspace = true
@@ -36,6 +38,8 @@ serde_with.workspace = true
serde_json.workspace = true
signal-hook.workspace = true
tar.workspace = true
tower.workspace = true
tower-http.workspace = true
reqwest = { workspace = true, features = ["json"] }
tokio = { workspace = true, features = ["rt", "rt-multi-thread"] }
tokio-postgres.workspace = true
@@ -47,6 +51,7 @@ tracing-subscriber.workspace = true
tracing-utils.workspace = true
thiserror.workspace = true
url.workspace = true
uuid.workspace = true
prometheus.workspace = true
postgres_initdb.workspace = true

View File

@@ -37,6 +37,7 @@ use std::collections::HashMap;
use std::fs::File;
use std::path::Path;
use std::process::exit;
use std::str::FromStr;
use std::sync::atomic::Ordering;
use std::sync::{mpsc, Arc, Condvar, Mutex, RwLock};
use std::{thread, time::Duration};
@@ -58,20 +59,23 @@ use compute_tools::compute::{
forward_termination_signal, ComputeNode, ComputeState, ParsedSpec, PG_PID,
};
use compute_tools::configurator::launch_configurator;
use compute_tools::extension_server::get_pg_version;
use compute_tools::http::api::launch_http_server;
use compute_tools::extension_server::get_pg_version_string;
use compute_tools::http::launch_http_server;
use compute_tools::logger::*;
use compute_tools::monitor::launch_monitor;
use compute_tools::params::*;
use compute_tools::spec::*;
use compute_tools::swap::resize_swap;
use rlimit::{setrlimit, Resource};
use utils::failpoint_support;
// this is an arbitrary build tag. Fine as a default / for testing purposes
// in-case of not-set environment var
const BUILD_TAG_DEFAULT: &str = "latest";
fn main() -> Result<()> {
let scenario = failpoint_support::init();
let (build_tag, clap_args) = init()?;
// enable core dumping for all child processes
@@ -99,17 +103,14 @@ fn main() -> Result<()> {
maybe_delay_exit(delay_exit);
scenario.teardown();
deinit_and_exit(wait_pg_result);
}
fn init() -> Result<(String, clap::ArgMatches)> {
init_tracing_and_logging(DEFAULT_LOG_LEVEL)?;
opentelemetry::global::set_error_handler(|err| {
tracing::info!("OpenTelemetry error: {err}");
})
.expect("global error handler lock poisoned");
let mut signals = Signals::new([SIGINT, SIGTERM, SIGQUIT])?;
thread::spawn(move || {
for sig in signals.forever() {
@@ -245,47 +246,48 @@ fn try_spec_from_cli(
let compute_id = matches.get_one::<String>("compute-id");
let control_plane_uri = matches.get_one::<String>("control-plane-uri");
let spec;
let mut live_config_allowed = false;
match spec_json {
// First, try to get cluster spec from the cli argument
Some(json) => {
info!("got spec from cli argument {}", json);
spec = Some(serde_json::from_str(json)?);
}
None => {
// Second, try to read it from the file if path is provided
if let Some(sp) = spec_path {
let path = Path::new(sp);
let file = File::open(path)?;
spec = Some(serde_json::from_reader(file)?);
live_config_allowed = true;
} else if let Some(id) = compute_id {
if let Some(cp_base) = control_plane_uri {
live_config_allowed = true;
spec = match get_spec_from_control_plane(cp_base, id) {
Ok(s) => s,
Err(e) => {
error!("cannot get response from control plane: {}", e);
panic!("neither spec nor confirmation that compute is in the Empty state was received");
}
};
} else {
panic!("must specify both --control-plane-uri and --compute-id or none");
}
} else {
panic!(
"compute spec should be provided by one of the following ways: \
--spec OR --spec-path OR --control-plane-uri and --compute-id"
);
}
}
// First, try to get cluster spec from the cli argument
if let Some(spec_json) = spec_json {
info!("got spec from cli argument {}", spec_json);
return Ok(CliSpecParams {
spec: Some(serde_json::from_str(spec_json)?),
live_config_allowed: false,
});
}
// Second, try to read it from the file if path is provided
if let Some(spec_path) = spec_path {
let file = File::open(Path::new(spec_path))?;
return Ok(CliSpecParams {
spec: Some(serde_json::from_reader(file)?),
live_config_allowed: true,
});
}
let Some(compute_id) = compute_id else {
panic!(
"compute spec should be provided by one of the following ways: \
--spec OR --spec-path OR --control-plane-uri and --compute-id"
);
};
let Some(control_plane_uri) = control_plane_uri else {
panic!("must specify both --control-plane-uri and --compute-id or none");
};
Ok(CliSpecParams {
spec,
live_config_allowed,
})
match get_spec_from_control_plane(control_plane_uri, compute_id) {
Ok(spec) => Ok(CliSpecParams {
spec,
live_config_allowed: true,
}),
Err(e) => {
error!(
"cannot get response from control plane: {}\n\
neither spec nor confirmation that compute is in the Empty state was received",
e
);
Err(e)
}
}
}
struct CliSpecParams {
@@ -322,11 +324,19 @@ fn wait_spec(
} else {
spec_set = false;
}
let connstr = Url::parse(connstr).context("cannot parse connstr as a URL")?;
let conn_conf = postgres::config::Config::from_str(connstr.as_str())
.context("cannot build postgres config from connstr")?;
let tokio_conn_conf = tokio_postgres::config::Config::from_str(connstr.as_str())
.context("cannot build tokio postgres config from connstr")?;
let compute_node = ComputeNode {
connstr: Url::parse(connstr).context("cannot parse connstr as a URL")?,
connstr,
conn_conf,
tokio_conn_conf,
pgdata: pgdata.to_string(),
pgbin: pgbin.to_string(),
pgversion: get_pg_version(pgbin),
pgversion: get_pg_version_string(pgbin),
http_port,
live_config_allowed,
state: Mutex::new(new_state),
state_changed: Condvar::new(),
@@ -381,7 +391,6 @@ fn wait_spec(
Ok(WaitSpecResult {
compute,
http_port,
resize_swap_on_bind,
set_disk_quota_for_fs: set_disk_quota_for_fs.cloned(),
})
@@ -389,8 +398,6 @@ fn wait_spec(
struct WaitSpecResult {
compute: Arc<ComputeNode>,
// passed through from ProcessCliResult
http_port: u16,
resize_swap_on_bind: bool,
set_disk_quota_for_fs: Option<String>,
}
@@ -400,7 +407,6 @@ fn start_postgres(
#[allow(unused_variables)] matches: &clap::ArgMatches,
WaitSpecResult {
compute,
http_port,
resize_swap_on_bind,
set_disk_quota_for_fs,
}: WaitSpecResult,
@@ -413,9 +419,14 @@ fn start_postgres(
"running compute with features: {:?}",
state.pspec.as_ref().unwrap().spec.features
);
// before we release the mutex, fetch the swap size (if any) for later.
let swap_size_bytes = state.pspec.as_ref().unwrap().spec.swap_size_bytes;
let disk_quota_bytes = state.pspec.as_ref().unwrap().spec.disk_quota_bytes;
// before we release the mutex, fetch some parameters for later.
let &ComputeSpec {
swap_size_bytes,
disk_quota_bytes,
#[cfg(target_os = "linux")]
disable_lfc_resizing,
..
} = &state.pspec.as_ref().unwrap().spec;
drop(state);
// Launch remaining service threads
@@ -473,13 +484,14 @@ fn start_postgres(
}
}
let extension_server_port: u16 = http_port;
// Start Postgres
let mut pg = None;
if !prestartup_failed {
pg = match compute.start_compute(extension_server_port) {
Ok(pg) => Some(pg),
pg = match compute.start_compute() {
Ok(pg) => {
info!(postmaster_pid = %pg.0.id(), "Postgres was started");
Some(pg)
}
Err(err) => {
error!("could not start the compute node: {:#}", err);
compute.set_failed_status(err);
@@ -522,11 +534,18 @@ fn start_postgres(
// This token is used internally by the monitor to clean up all threads
let token = CancellationToken::new();
// don't pass postgres connection string to vm-monitor if we don't want it to resize LFC
let pgconnstr = if disable_lfc_resizing.unwrap_or(false) {
None
} else {
file_cache_connstr.cloned()
};
let vm_monitor = rt.as_ref().map(|rt| {
rt.spawn(vm_monitor::start(
Box::leak(Box::new(vm_monitor::Args {
cgroup: cgroup.cloned(),
pgconnstr: file_cache_connstr.cloned(),
pgconnstr,
addr: vm_monitor_addr.clone(),
})),
token.clone(),
@@ -570,6 +589,8 @@ fn wait_postgres(pg: Option<PostgresHandle>) -> Result<WaitPostgresResult> {
// propagate to Postgres and it will be shut down as well.
let mut exit_code = None;
if let Some((mut pg, logs_handle)) = pg {
info!(postmaster_pid = %pg.id(), "Waiting for Postgres to exit");
let ecode = pg
.wait()
.expect("failed to start waiting on Postgres process");

View File

@@ -17,11 +17,11 @@
//!
//! # Local Testing
//!
//! - Comment out most of the pgxns in The Dockerfile.compute-tools to speed up the build.
//! - Comment out most of the pgxns in compute-node.Dockerfile to speed up the build.
//! - Build the image with the following command:
//!
//! ```bash
//! docker buildx build --build-arg DEBIAN_FLAVOR=bullseye-slim --build-arg GIT_VERSION=local --build-arg PG_VERSION=v14 --build-arg BUILD_TAG="$(date --iso-8601=s -u)" -t localhost:3030/localregistry/compute-node-v14:latest -f compute/Dockerfile.com
//! docker buildx build --platform linux/amd64 --build-arg DEBIAN_VERSION=bullseye --build-arg GIT_VERSION=local --build-arg PG_VERSION=v14 --build-arg BUILD_TAG="$(date --iso-8601=s -u)" -t localhost:3030/localregistry/compute-node-v14:latest -f compute/compute-node.Dockerfile .
//! docker push localhost:3030/localregistry/compute-node-v14:latest
//! ```
@@ -29,23 +29,31 @@ use anyhow::Context;
use aws_config::BehaviorVersion;
use camino::{Utf8Path, Utf8PathBuf};
use clap::Parser;
use compute_tools::extension_server::{get_pg_version, PostgresMajorVersion};
use nix::unistd::Pid;
use tracing::{info, info_span, warn, Instrument};
use tracing::{error, info, info_span, warn, Instrument};
use utils::fs_ext::is_directory_empty;
#[path = "fast_import/aws_s3_sync.rs"]
mod aws_s3_sync;
#[path = "fast_import/child_stdio_to_log.rs"]
mod child_stdio_to_log;
#[path = "fast_import/s3_uri.rs"]
mod s3_uri;
#[path = "fast_import/s5cmd.rs"]
mod s5cmd;
const PG_WAIT_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(600);
const PG_WAIT_RETRY_INTERVAL: std::time::Duration = std::time::Duration::from_millis(300);
#[derive(clap::Parser)]
struct Args {
#[clap(long)]
working_directory: Utf8PathBuf,
#[clap(long, env = "NEON_IMPORTER_S3_PREFIX")]
s3_prefix: s3_uri::S3Uri,
s3_prefix: Option<s3_uri::S3Uri>,
#[clap(long)]
source_connection_string: Option<String>,
#[clap(short, long)]
interactive: bool,
#[clap(long)]
pg_bin_dir: Utf8PathBuf,
#[clap(long)]
@@ -76,30 +84,70 @@ pub(crate) async fn main() -> anyhow::Result<()> {
info!("starting");
let Args {
working_directory,
s3_prefix,
pg_bin_dir,
pg_lib_dir,
} = Args::parse();
let args = Args::parse();
let aws_config = aws_config::load_defaults(BehaviorVersion::v2024_03_28()).await;
// Validate arguments
if args.s3_prefix.is_none() && args.source_connection_string.is_none() {
anyhow::bail!("either s3_prefix or source_connection_string must be specified");
}
if args.s3_prefix.is_some() && args.source_connection_string.is_some() {
anyhow::bail!("only one of s3_prefix or source_connection_string can be specified");
}
let spec: Spec = {
let spec_key = s3_prefix.append("/spec.json");
let s3_client = aws_sdk_s3::Client::new(&aws_config);
let object = s3_client
.get_object()
.bucket(&spec_key.bucket)
.key(spec_key.key)
.send()
.await
.context("get spec from s3")?
.body
.collect()
.await
.context("download spec body")?;
serde_json::from_slice(&object.into_bytes()).context("parse spec as json")?
let working_directory = args.working_directory;
let pg_bin_dir = args.pg_bin_dir;
let pg_lib_dir = args.pg_lib_dir;
// Initialize AWS clients only if s3_prefix is specified
let (aws_config, kms_client) = if args.s3_prefix.is_some() {
let config = aws_config::load_defaults(BehaviorVersion::v2024_03_28()).await;
let kms = aws_sdk_kms::Client::new(&config);
(Some(config), Some(kms))
} else {
(None, None)
};
// Get source connection string either from S3 spec or direct argument
let source_connection_string = if let Some(s3_prefix) = &args.s3_prefix {
let spec: Spec = {
let spec_key = s3_prefix.append("/spec.json");
let s3_client = aws_sdk_s3::Client::new(aws_config.as_ref().unwrap());
let object = s3_client
.get_object()
.bucket(&spec_key.bucket)
.key(spec_key.key)
.send()
.await
.context("get spec from s3")?
.body
.collect()
.await
.context("download spec body")?;
serde_json::from_slice(&object.into_bytes()).context("parse spec as json")?
};
match spec.encryption_secret {
EncryptionSecret::KMS { key_id } => {
let mut output = kms_client
.unwrap()
.decrypt()
.key_id(key_id)
.ciphertext_blob(aws_sdk_s3::primitives::Blob::new(
spec.source_connstring_ciphertext_base64,
))
.send()
.await
.context("decrypt source connection string")?;
let plaintext = output
.plaintext
.take()
.context("get plaintext source connection string")?;
String::from_utf8(plaintext.into_inner())
.context("parse source connection string as utf8")?
}
}
} else {
args.source_connection_string.unwrap()
};
match tokio::fs::create_dir(&working_directory).await {
@@ -122,20 +170,18 @@ pub(crate) async fn main() -> anyhow::Result<()> {
.await
.context("create pgdata directory")?;
//
// Setup clients
//
let aws_config = aws_config::load_defaults(BehaviorVersion::v2024_03_28()).await;
let kms_client = aws_sdk_kms::Client::new(&aws_config);
//
// Initialize pgdata
//
let pgbin = pg_bin_dir.join("postgres");
let pg_version = match get_pg_version(pgbin.as_ref()) {
PostgresMajorVersion::V14 => 14,
PostgresMajorVersion::V15 => 15,
PostgresMajorVersion::V16 => 16,
PostgresMajorVersion::V17 => 17,
};
let superuser = "cloud_admin"; // XXX: this shouldn't be hard-coded
postgres_initdb::do_run_initdb(postgres_initdb::RunInitdbArgs {
superuser,
locale: "en_US.UTF-8", // XXX: this shouldn't be hard-coded,
pg_version: 140000, // XXX: this shouldn't be hard-coded but derived from which compute image we're running in
pg_version,
initdb_bin: pg_bin_dir.join("initdb").as_ref(),
library_search_path: &pg_lib_dir, // TODO: is this right? Prob works in compute image, not sure about neon_local.
pgdata: &pgdata_dir,
@@ -148,7 +194,7 @@ pub(crate) async fn main() -> anyhow::Result<()> {
//
// Launch postgres process
//
let mut postgres_proc = tokio::process::Command::new(pg_bin_dir.join("postgres"))
let mut postgres_proc = tokio::process::Command::new(pgbin)
.arg("-D")
.arg(&pgdata_dir)
.args(["-c", "wal_level=minimal"])
@@ -162,7 +208,13 @@ pub(crate) async fn main() -> anyhow::Result<()> {
.args(["-c", &format!("max_parallel_workers={nproc}")])
.args(["-c", &format!("max_parallel_workers_per_gather={nproc}")])
.args(["-c", &format!("max_worker_processes={nproc}")])
.args(["-c", "effective_io_concurrency=100"])
.args([
"-c",
&format!(
"effective_io_concurrency={}",
if cfg!(target_os = "macos") { 0 } else { 100 }
),
])
.env_clear()
.stdout(std::process::Stdio::piped())
.stderr(std::process::Stdio::piped())
@@ -177,44 +229,58 @@ pub(crate) async fn main() -> anyhow::Result<()> {
)
.instrument(info_span!("postgres")),
);
// Create neondb database in the running postgres
let restore_pg_connstring =
format!("host=localhost port=5432 user={superuser} dbname=postgres");
let start_time = std::time::Instant::now();
loop {
let res = tokio_postgres::connect(&restore_pg_connstring, tokio_postgres::NoTls).await;
if res.is_ok() {
info!("postgres is ready, could connect to it");
break;
if start_time.elapsed() > PG_WAIT_TIMEOUT {
error!(
"timeout exceeded: failed to poll postgres and create database within 10 minutes"
);
std::process::exit(1);
}
match tokio_postgres::connect(&restore_pg_connstring, tokio_postgres::NoTls).await {
Ok((client, connection)) => {
// Spawn the connection handling task to maintain the connection
tokio::spawn(async move {
if let Err(e) = connection.await {
warn!("connection error: {}", e);
}
});
match client.simple_query("CREATE DATABASE neondb;").await {
Ok(_) => {
info!("created neondb database");
break;
}
Err(e) => {
warn!(
"failed to create database: {}, retying in {}s",
e,
PG_WAIT_RETRY_INTERVAL.as_secs_f32()
);
tokio::time::sleep(PG_WAIT_RETRY_INTERVAL).await;
continue;
}
}
}
Err(_) => {
info!(
"postgres not ready yet, retrying in {}s",
PG_WAIT_RETRY_INTERVAL.as_secs_f32()
);
tokio::time::sleep(PG_WAIT_RETRY_INTERVAL).await;
continue;
}
}
}
//
// Decrypt connection string
//
let source_connection_string = {
match spec.encryption_secret {
EncryptionSecret::KMS { key_id } => {
let mut output = kms_client
.decrypt()
.key_id(key_id)
.ciphertext_blob(aws_sdk_s3::primitives::Blob::new(
spec.source_connstring_ciphertext_base64,
))
.send()
.await
.context("decrypt source connection string")?;
let plaintext = output
.plaintext
.take()
.context("get plaintext source connection string")?;
String::from_utf8(plaintext.into_inner())
.context("parse source connection string as utf8")?
}
}
};
//
// Start the work
//
let restore_pg_connstring = restore_pg_connstring.replace("dbname=postgres", "dbname=neondb");
let dumpdir = working_directory.join("dumpdir");
@@ -302,6 +368,12 @@ pub(crate) async fn main() -> anyhow::Result<()> {
}
}
// If interactive mode, wait for Ctrl+C
if args.interactive {
info!("Running in interactive mode. Press Ctrl+C to shut down.");
tokio::signal::ctrl_c().await.context("wait for ctrl-c")?;
}
info!("shutdown postgres");
{
nix::sys::signal::kill(
@@ -317,21 +389,24 @@ pub(crate) async fn main() -> anyhow::Result<()> {
.context("wait for postgres to shut down")?;
}
info!("upload pgdata");
s5cmd::sync(Utf8Path::new(&pgdata_dir), &s3_prefix.append("/"))
.await
.context("sync dump directory to destination")?;
info!("write status");
{
let status_dir = working_directory.join("status");
std::fs::create_dir(&status_dir).context("create status directory")?;
let status_file = status_dir.join("status");
std::fs::write(&status_file, serde_json::json!({"done": true}).to_string())
.context("write status file")?;
s5cmd::sync(&status_file, &s3_prefix.append("/status/pgdata"))
// Only sync if s3_prefix was specified
if let Some(s3_prefix) = args.s3_prefix {
info!("upload pgdata");
aws_s3_sync::sync(Utf8Path::new(&pgdata_dir), &s3_prefix.append("/pgdata/"))
.await
.context("sync status directory to destination")?;
.context("sync dump directory to destination")?;
info!("write status");
{
let status_dir = working_directory.join("status");
std::fs::create_dir(&status_dir).context("create status directory")?;
let status_file = status_dir.join("pgdata");
std::fs::write(&status_file, serde_json::json!({"done": true}).to_string())
.context("write status file")?;
aws_s3_sync::sync(&status_dir, &s3_prefix.append("/status/"))
.await
.context("sync status directory to destination")?;
}
}
Ok(())

View File

@@ -4,24 +4,21 @@ use camino::Utf8Path;
use super::s3_uri::S3Uri;
pub(crate) async fn sync(local: &Utf8Path, remote: &S3Uri) -> anyhow::Result<()> {
let mut builder = tokio::process::Command::new("s5cmd");
// s5cmd uses aws-sdk-go v1, hence doesn't support AWS_ENDPOINT_URL
if let Some(val) = std::env::var_os("AWS_ENDPOINT_URL") {
builder.arg("--endpoint-url").arg(val);
}
let mut builder = tokio::process::Command::new("aws");
builder
.arg("s3")
.arg("sync")
.arg(local.as_str())
.arg(remote.to_string());
let st = builder
.spawn()
.context("spawn s5cmd")?
.context("spawn aws s3 sync")?
.wait()
.await
.context("wait for s5cmd")?;
.context("wait for aws s3 sync")?;
if st.success() {
Ok(())
} else {
Err(anyhow::anyhow!("s5cmd failed"))
Err(anyhow::anyhow!("aws s3 sync failed"))
}
}

View File

@@ -1,4 +1,3 @@
use compute_api::responses::CatalogObjects;
use futures::Stream;
use postgres::NoTls;
use std::{path::Path, process::Stdio, result::Result, sync::Arc};
@@ -7,19 +6,17 @@ use tokio::{
process::Command,
spawn,
};
use tokio_postgres::connect;
use tokio_stream::{self as stream, StreamExt};
use tokio_util::codec::{BytesCodec, FramedRead};
use tracing::warn;
use crate::compute::ComputeNode;
use crate::pg_helpers::{get_existing_dbs_async, get_existing_roles_async};
use crate::pg_helpers::{get_existing_dbs_async, get_existing_roles_async, postgres_conf_for_db};
use compute_api::responses::CatalogObjects;
pub async fn get_dbs_and_roles(compute: &Arc<ComputeNode>) -> anyhow::Result<CatalogObjects> {
let connstr = compute.connstr.clone();
let (client, connection): (tokio_postgres::Client, _) =
connect(connstr.as_str(), NoTls).await?;
let conf = compute.get_tokio_conn_conf(Some("compute_ctl:get_dbs_and_roles"));
let (client, connection): (tokio_postgres::Client, _) = conf.connect(NoTls).await?;
spawn(async move {
if let Err(e) = connection.await {
@@ -39,10 +36,12 @@ pub async fn get_dbs_and_roles(compute: &Arc<ComputeNode>) -> anyhow::Result<Cat
#[derive(Debug, thiserror::Error)]
pub enum SchemaDumpError {
#[error("Database does not exist.")]
#[error("database does not exist")]
DatabaseDoesNotExist,
#[error("Failed to execute pg_dump.")]
#[error("failed to execute pg_dump")]
IO(#[from] std::io::Error),
#[error("unexpected I/O error")]
Unexpected,
}
// It uses the pg_dump utility to dump the schema of the specified database.
@@ -60,11 +59,38 @@ pub async fn get_database_schema(
let pgbin = &compute.pgbin;
let basepath = Path::new(pgbin).parent().unwrap();
let pgdump = basepath.join("pg_dump");
let mut connstr = compute.connstr.clone();
connstr.set_path(dbname);
// Replace the DB in the connection string and disable it to parts.
// This is the only option to handle DBs with special characters.
let conf =
postgres_conf_for_db(&compute.connstr, dbname).map_err(|_| SchemaDumpError::Unexpected)?;
let host = conf
.get_hosts()
.first()
.ok_or(SchemaDumpError::Unexpected)?;
let host = match host {
tokio_postgres::config::Host::Tcp(ip) => ip.to_string(),
#[cfg(unix)]
tokio_postgres::config::Host::Unix(path) => path.to_string_lossy().to_string(),
};
let port = conf
.get_ports()
.first()
.ok_or(SchemaDumpError::Unexpected)?;
let user = conf.get_user().ok_or(SchemaDumpError::Unexpected)?;
let dbname = conf.get_dbname().ok_or(SchemaDumpError::Unexpected)?;
let mut cmd = Command::new(pgdump)
// XXX: this seems to be the only option to deal with DBs with `=` in the name
// See <https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org>
.env("PGDATABASE", dbname)
.arg("--host")
.arg(host)
.arg("--port")
.arg(port.to_string())
.arg("--username")
.arg(user)
.arg("--schema-only")
.arg(connstr.as_str())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.kill_on_drop(true)

View File

@@ -9,7 +9,8 @@ use crate::compute::ComputeNode;
#[instrument(skip_all)]
pub async fn check_writability(compute: &ComputeNode) -> Result<()> {
// Connect to the database.
let (client, connection) = tokio_postgres::connect(compute.connstr.as_str(), NoTls).await?;
let conf = compute.get_tokio_conn_conf(Some("compute_ctl:availability_checker"));
let (client, connection) = conf.connect(NoTls).await?;
if client.is_closed() {
return Err(anyhow!("connection to postgres closed"));
}

View File

@@ -15,13 +15,14 @@ use std::time::Instant;
use anyhow::{Context, Result};
use chrono::{DateTime, Utc};
use compute_api::spec::{PgIdent, Role};
use compute_api::spec::{Database, PgIdent, Role};
use futures::future::join_all;
use futures::stream::FuturesUnordered;
use futures::StreamExt;
use nix::unistd::Pid;
use postgres;
use postgres::error::SqlState;
use postgres::{Client, NoTls};
use postgres::NoTls;
use tracing::{debug, error, info, instrument, warn};
use utils::id::{TenantId, TimelineId};
use utils::lsn::Lsn;
@@ -34,9 +35,8 @@ use utils::measured_stream::MeasuredReader;
use nix::sys::signal::{kill, Signal};
use remote_storage::{DownloadError, RemotePath};
use tokio::spawn;
use url::Url;
use crate::installed_extensions::get_installed_extensions_sync;
use crate::installed_extensions::get_installed_extensions;
use crate::local_proxy;
use crate::pg_helpers::*;
use crate::spec::*;
@@ -45,8 +45,10 @@ use crate::spec_apply::ApplySpecPhase::{
DropInvalidDatabases, DropRoles, HandleNeonExtension, HandleOtherExtensions,
RenameAndDeleteDatabases, RenameRoles, RunInEachDatabase,
};
use crate::spec_apply::PerDatabasePhase;
use crate::spec_apply::PerDatabasePhase::{
ChangeSchemaPerms, DeleteDBRoleReferences, HandleAnonExtension,
ChangeSchemaPerms, DeleteDBRoleReferences, DropSubscriptionsForDeletedDatabases,
HandleAnonExtension,
};
use crate::spec_apply::{apply_operations, MutableApplyContext, DB};
use crate::sync_sk::{check_if_synced, ping_safekeeper};
@@ -59,6 +61,10 @@ pub static PG_PID: AtomicU32 = AtomicU32::new(0);
pub struct ComputeNode {
// Url type maintains proper escaping
pub connstr: url::Url,
// We connect to Postgres from many different places, so build configs once
// and reuse them where needed.
pub conn_conf: postgres::config::Config,
pub tokio_conn_conf: tokio_postgres::config::Config,
pub pgdata: String,
pub pgbin: String,
pub pgversion: String,
@@ -75,6 +81,8 @@ pub struct ComputeNode {
/// - we push spec and it does configuration
/// - but then it is restarted without any spec again
pub live_config_allowed: bool,
/// The port that the compute's HTTP server listens on
pub http_port: u16,
/// Volatile part of the `ComputeNode`, which should be used under `Mutex`.
/// To allow HTTP API server to serving status requests, while configuration
/// is in progress, lock should be held only for short periods of time to do
@@ -607,11 +615,7 @@ impl ComputeNode {
/// Do all the preparations like PGDATA directory creation, configuration,
/// safekeepers sync, basebackup, etc.
#[instrument(skip_all)]
pub fn prepare_pgdata(
&self,
compute_state: &ComputeState,
extension_server_port: u16,
) -> Result<()> {
pub fn prepare_pgdata(&self, compute_state: &ComputeState) -> Result<()> {
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
let spec = &pspec.spec;
let pgdata_path = Path::new(&self.pgdata);
@@ -621,7 +625,7 @@ impl ComputeNode {
config::write_postgres_conf(
&pgdata_path.join("postgresql.conf"),
&pspec.spec,
Some(extension_server_port),
self.http_port,
)?;
// Syncing safekeepers is only safe with primary nodes: if a primary
@@ -801,10 +805,10 @@ impl ComputeNode {
/// version. In the future, it may upgrade all 3rd-party extensions.
#[instrument(skip_all)]
pub fn post_apply_config(&self) -> Result<()> {
let connstr = self.connstr.clone();
let conf = self.get_conn_conf(Some("compute_ctl:post_apply_config"));
thread::spawn(move || {
let func = || {
let mut client = Client::connect(connstr.as_str(), NoTls)?;
let mut client = conf.connect(NoTls)?;
handle_neon_extension_upgrade(&mut client)
.context("handle_neon_extension_upgrade")?;
Ok::<_, anyhow::Error>(())
@@ -816,30 +820,48 @@ impl ComputeNode {
Ok(())
}
async fn get_maintenance_client(url: &Url) -> Result<tokio_postgres::Client> {
let mut connstr = url.clone();
pub fn get_conn_conf(&self, application_name: Option<&str>) -> postgres::Config {
let mut conf = self.conn_conf.clone();
if let Some(application_name) = application_name {
conf.application_name(application_name);
}
conf
}
connstr
.query_pairs_mut()
.append_pair("application_name", "apply_config");
pub fn get_tokio_conn_conf(&self, application_name: Option<&str>) -> tokio_postgres::Config {
let mut conf = self.tokio_conn_conf.clone();
if let Some(application_name) = application_name {
conf.application_name(application_name);
}
conf
}
let (client, conn) = match tokio_postgres::connect(connstr.as_str(), NoTls).await {
pub async fn get_maintenance_client(
conf: &tokio_postgres::Config,
) -> Result<tokio_postgres::Client> {
let mut conf = conf.clone();
conf.application_name("compute_ctl:apply_config");
let (client, conn) = match conf.connect(NoTls).await {
// If connection fails, it may be the old node with `zenith_admin` superuser.
//
// In this case we need to connect with old `zenith_admin` name
// and create new user. We cannot simply rename connected user,
// but we can create a new one and grant it all privileges.
Err(e) => match e.code() {
Some(&SqlState::INVALID_PASSWORD)
| Some(&SqlState::INVALID_AUTHORIZATION_SPECIFICATION) => {
// connect with zenith_admin if cloud_admin could not authenticate
// Connect with zenith_admin if cloud_admin could not authenticate
info!(
"cannot connect to postgres: {}, retrying with `zenith_admin` username",
e
);
let mut zenith_admin_connstr = connstr.clone();
zenith_admin_connstr
.set_username("zenith_admin")
.map_err(|_| anyhow::anyhow!("invalid connstr"))?;
let mut zenith_admin_conf = postgres::config::Config::from(conf.clone());
zenith_admin_conf.application_name("compute_ctl:apply_config");
zenith_admin_conf.user("zenith_admin");
let mut client =
Client::connect(zenith_admin_connstr.as_str(), NoTls)
zenith_admin_conf.connect(NoTls)
.context("broken cloud_admin credential: tried connecting with cloud_admin but could not authenticate, and zenith_admin does not work either")?;
// Disable forwarding so that users don't get a cloud_admin role
@@ -853,8 +875,8 @@ impl ComputeNode {
drop(client);
// reconnect with connstring with expected name
tokio_postgres::connect(connstr.as_str(), NoTls).await?
// Reconnect with connstring with expected name
conf.connect(NoTls).await?
}
_ => return Err(e.into()),
},
@@ -885,7 +907,7 @@ impl ComputeNode {
pub fn apply_spec_sql(
&self,
spec: Arc<ComputeSpec>,
url: Arc<Url>,
conf: Arc<tokio_postgres::Config>,
concurrency: usize,
) -> Result<()> {
let rt = tokio::runtime::Builder::new_multi_thread()
@@ -897,7 +919,7 @@ impl ComputeNode {
rt.block_on(async {
// Proceed with post-startup configuration. Note, that order of operations is important.
let client = Self::get_maintenance_client(&url).await?;
let client = Self::get_maintenance_client(&conf).await?;
let spec = spec.clone();
let databases = get_existing_dbs_async(&client).await?;
@@ -923,6 +945,78 @@ impl ComputeNode {
dbs: databases,
}));
// Apply special pre drop database phase.
// NOTE: we use the code of RunInEachDatabase phase for parallelism
// and connection management, but we don't really run it in *each* database,
// only in databases, we're about to drop.
info!("Applying PerDatabase (pre-dropdb) phase");
let concurrency_token = Arc::new(tokio::sync::Semaphore::new(concurrency));
// Run the phase for each database that we're about to drop.
let db_processes = spec
.delta_operations
.iter()
.flatten()
.filter_map(move |op| {
if op.action.as_str() == "delete_db" {
Some(op.name.clone())
} else {
None
}
})
.map(|dbname| {
let spec = spec.clone();
let ctx = ctx.clone();
let jwks_roles = jwks_roles.clone();
let mut conf = conf.as_ref().clone();
let concurrency_token = concurrency_token.clone();
// We only need dbname field for this phase, so set other fields to dummy values
let db = DB::UserDB(Database {
name: dbname.clone(),
owner: "cloud_admin".to_string(),
options: None,
restrict_conn: false,
invalid: false,
});
debug!("Applying per-database phases for Database {:?}", &db);
match &db {
DB::SystemDB => {}
DB::UserDB(db) => {
conf.dbname(db.name.as_str());
}
}
let conf = Arc::new(conf);
let fut = Self::apply_spec_sql_db(
spec.clone(),
conf,
ctx.clone(),
jwks_roles.clone(),
concurrency_token.clone(),
db,
[DropSubscriptionsForDeletedDatabases].to_vec(),
);
Ok(spawn(fut))
})
.collect::<Vec<Result<_, anyhow::Error>>>();
for process in db_processes.into_iter() {
let handle = process?;
if let Err(e) = handle.await? {
// Handle the error case where the database does not exist
// We do not check whether the DB exists or not in the deletion phase,
// so we shouldn't be strict about it in pre-deletion cleanup as well.
if e.to_string().contains("does not exist") {
warn!("Error dropping subscription: {}", e);
} else {
return Err(e);
}
};
}
for phase in [
CreateSuperUser,
DropInvalidDatabases,
@@ -931,7 +1025,7 @@ impl ComputeNode {
RenameAndDeleteDatabases,
CreateAndAlterDatabases,
] {
debug!("Applying phase {:?}", &phase);
info!("Applying phase {:?}", &phase);
apply_operations(
spec.clone(),
ctx.clone(),
@@ -942,6 +1036,7 @@ impl ComputeNode {
.await?;
}
info!("Applying RunInEachDatabase2 phase");
let concurrency_token = Arc::new(tokio::sync::Semaphore::new(concurrency));
let db_processes = spec
@@ -955,7 +1050,7 @@ impl ComputeNode {
let spec = spec.clone();
let ctx = ctx.clone();
let jwks_roles = jwks_roles.clone();
let mut url = url.as_ref().clone();
let mut conf = conf.as_ref().clone();
let concurrency_token = concurrency_token.clone();
let db = db.clone();
@@ -964,18 +1059,24 @@ impl ComputeNode {
match &db {
DB::SystemDB => {}
DB::UserDB(db) => {
url.set_path(db.name.as_str());
conf.dbname(db.name.as_str());
}
}
let url = Arc::new(url);
let conf = Arc::new(conf);
let fut = Self::apply_spec_sql_db(
spec.clone(),
url,
conf,
ctx.clone(),
jwks_roles.clone(),
concurrency_token.clone(),
db,
[
DeleteDBRoleReferences,
ChangeSchemaPerms,
HandleAnonExtension,
]
.to_vec(),
);
Ok(spawn(fut))
@@ -1017,21 +1118,18 @@ impl ComputeNode {
/// semaphore. The caller has to make sure the semaphore isn't exhausted.
async fn apply_spec_sql_db(
spec: Arc<ComputeSpec>,
url: Arc<Url>,
conf: Arc<tokio_postgres::Config>,
ctx: Arc<tokio::sync::RwLock<MutableApplyContext>>,
jwks_roles: Arc<HashSet<String>>,
concurrency_token: Arc<tokio::sync::Semaphore>,
db: DB,
subphases: Vec<PerDatabasePhase>,
) -> Result<()> {
let _permit = concurrency_token.acquire().await?;
let mut client_conn = None;
for subphase in [
DeleteDBRoleReferences,
ChangeSchemaPerms,
HandleAnonExtension,
] {
for subphase in subphases {
apply_operations(
spec.clone(),
ctx.clone(),
@@ -1046,7 +1144,7 @@ impl ComputeNode {
// that database.
|| async {
if client_conn.is_none() {
let db_client = Self::get_maintenance_client(&url).await?;
let db_client = Self::get_maintenance_client(&conf).await?;
client_conn.replace(db_client);
}
let client = client_conn.as_ref().unwrap();
@@ -1061,34 +1159,16 @@ impl ComputeNode {
Ok::<(), anyhow::Error>(())
}
/// Do initial configuration of the already started Postgres.
#[instrument(skip_all)]
pub fn apply_config(&self, compute_state: &ComputeState) -> Result<()> {
// If connection fails,
// it may be the old node with `zenith_admin` superuser.
//
// In this case we need to connect with old `zenith_admin` name
// and create new user. We cannot simply rename connected user,
// but we can create a new one and grant it all privileges.
let mut url = self.connstr.clone();
url.query_pairs_mut()
.append_pair("application_name", "apply_config");
let url = Arc::new(url);
let spec = Arc::new(
compute_state
.pspec
.as_ref()
.expect("spec must be set")
.spec
.clone(),
);
// Choose how many concurrent connections to use for applying the spec changes.
// If the cluster is not currently Running we don't have to deal with user connections,
/// Choose how many concurrent connections to use for applying the spec changes.
pub fn max_service_connections(
&self,
compute_state: &ComputeState,
spec: &ComputeSpec,
) -> usize {
// If the cluster is in Init state we don't have to deal with user connections,
// and can thus use all `max_connections` connection slots. However, that's generally not
// very efficient, so we generally still limit it to a smaller number.
let max_concurrent_connections = if compute_state.status != ComputeStatus::Running {
if compute_state.status == ComputeStatus::Init {
// If the settings contain 'max_connections', use that as template
if let Some(config) = spec.cluster.settings.find("max_connections") {
config.parse::<usize>().ok()
@@ -1144,10 +1224,28 @@ impl ComputeNode {
.map(|val| if val > 1 { val - 1 } else { 1 })
.last()
.unwrap_or(3)
};
}
}
/// Do initial configuration of the already started Postgres.
#[instrument(skip_all)]
pub fn apply_config(&self, compute_state: &ComputeState) -> Result<()> {
let conf = self.get_tokio_conn_conf(Some("compute_ctl:apply_config"));
let conf = Arc::new(conf);
let spec = Arc::new(
compute_state
.pspec
.as_ref()
.expect("spec must be set")
.spec
.clone(),
);
let max_concurrent_connections = self.max_service_connections(compute_state, &spec);
// Merge-apply spec & changes to PostgreSQL state.
self.apply_spec_sql(spec.clone(), url.clone(), max_concurrent_connections)?;
self.apply_spec_sql(spec.clone(), conf.clone(), max_concurrent_connections)?;
if let Some(ref local_proxy) = &spec.clone().local_proxy_config {
info!("configuring local_proxy");
@@ -1156,13 +1254,23 @@ impl ComputeNode {
// Run migrations separately to not hold up cold starts
thread::spawn(move || {
let mut connstr = url.as_ref().clone();
connstr
.query_pairs_mut()
.append_pair("application_name", "migrations");
let conf = conf.as_ref().clone();
let mut conf = postgres::config::Config::from(conf);
conf.application_name("compute_ctl:migrations");
let mut client = Client::connect(connstr.as_str(), NoTls)?;
handle_migrations(&mut client).context("apply_config handle_migrations")
match conf.connect(NoTls) {
Ok(mut client) => {
if let Err(e) = handle_migrations(&mut client) {
error!("Failed to run migrations: {}", e);
}
}
Err(e) => {
error!(
"Failed to connect to the compute for running migrations: {}",
e
);
}
};
});
Ok::<(), anyhow::Error>(())
@@ -1221,22 +1329,24 @@ impl ComputeNode {
// Write new config
let pgdata_path = Path::new(&self.pgdata);
let postgresql_conf_path = pgdata_path.join("postgresql.conf");
config::write_postgres_conf(&postgresql_conf_path, &spec, None)?;
// temporarily reset max_cluster_size in config
config::write_postgres_conf(&postgresql_conf_path, &spec, self.http_port)?;
let max_concurrent_connections = spec.reconfigure_concurrency;
// Temporarily reset max_cluster_size in config
// to avoid the possibility of hitting the limit, while we are reconfiguring:
// creating new extensions, roles, etc...
// creating new extensions, roles, etc.
config::with_compute_ctl_tmp_override(pgdata_path, "neon.max_cluster_size=-1", || {
self.pg_reload_conf()?;
if spec.mode == ComputeMode::Primary {
let mut url = self.connstr.clone();
url.query_pairs_mut()
.append_pair("application_name", "apply_config");
let url = Arc::new(url);
let mut conf = tokio_postgres::Config::from_str(self.connstr.as_str()).unwrap();
conf.application_name("apply_config");
let conf = Arc::new(conf);
let spec = Arc::new(spec.clone());
self.apply_spec_sql(spec, url, 1)?;
self.apply_spec_sql(spec, conf, max_concurrent_connections)?;
}
Ok(())
@@ -1255,10 +1365,7 @@ impl ComputeNode {
}
#[instrument(skip_all)]
pub fn start_compute(
&self,
extension_server_port: u16,
) -> Result<(std::process::Child, std::thread::JoinHandle<()>)> {
pub fn start_compute(&self) -> Result<(std::process::Child, std::thread::JoinHandle<()>)> {
let compute_state = self.state.lock().unwrap().clone();
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
info!(
@@ -1333,7 +1440,7 @@ impl ComputeNode {
info!("{:?}", remote_ext_metrics);
}
self.prepare_pgdata(&compute_state, extension_server_port)?;
self.prepare_pgdata(&compute_state)?;
let start_time = Utc::now();
let pg_process = self.start_postgres(pspec.storage_auth_token.clone())?;
@@ -1360,9 +1467,19 @@ impl ComputeNode {
}
self.post_apply_config()?;
let connstr = self.connstr.clone();
let conf = self.get_conn_conf(None);
thread::spawn(move || {
get_installed_extensions_sync(connstr).context("get_installed_extensions")
let res = get_installed_extensions(conf);
match res {
Ok(extensions) => {
info!(
"[NEON_EXT_STAT] {}",
serde_json::to_string(&extensions)
.expect("failed to serialize extensions list")
);
}
Err(err) => error!("could not get installed extensions: {err:?}"),
}
});
}
@@ -1491,7 +1608,8 @@ impl ComputeNode {
/// Select `pg_stat_statements` data and return it as a stringified JSON
pub async fn collect_insights(&self) -> String {
let mut result_rows: Vec<String> = Vec::new();
let connect_result = tokio_postgres::connect(self.connstr.as_str(), NoTls).await;
let conf = self.get_tokio_conn_conf(Some("compute_ctl:collect_insights"));
let connect_result = conf.connect(NoTls).await;
let (client, connection) = connect_result.unwrap();
tokio::spawn(async move {
if let Err(e) = connection.await {
@@ -1617,10 +1735,9 @@ LIMIT 100",
privileges: &[Privilege],
role_name: &PgIdent,
) -> Result<()> {
use tokio_postgres::config::Config;
use tokio_postgres::NoTls;
let mut conf = Config::from_str(self.connstr.as_str()).unwrap();
let mut conf = self.get_tokio_conn_conf(Some("compute_ctl:set_role_grants"));
conf.dbname(db_name);
let (db_client, conn) = conf
@@ -1657,10 +1774,9 @@ LIMIT 100",
db_name: &PgIdent,
ext_version: ExtVersion,
) -> Result<ExtVersion> {
use tokio_postgres::config::Config;
use tokio_postgres::NoTls;
let mut conf = Config::from_str(self.connstr.as_str()).unwrap();
let mut conf = self.get_tokio_conn_conf(Some("compute_ctl:install_extension"));
conf.dbname(db_name);
let (db_client, conn) = conf

View File

@@ -37,7 +37,7 @@ pub fn line_in_file(path: &Path, line: &str) -> Result<bool> {
pub fn write_postgres_conf(
path: &Path,
spec: &ComputeSpec,
extension_server_port: Option<u16>,
extension_server_port: u16,
) -> Result<()> {
// File::create() destroys the file content if it exists.
let mut file = File::create(path)?;
@@ -127,9 +127,7 @@ pub fn write_postgres_conf(
writeln!(file, "# Managed by compute_ctl: end")?;
}
if let Some(port) = extension_server_port {
writeln!(file, "neon.extension_server_port={}", port)?;
}
writeln!(file, "neon.extension_server_port={}", extension_server_port)?;
// This is essential to keep this line at the end of the file,
// because it is intended to override any settings above.

View File

@@ -103,14 +103,33 @@ fn get_pg_config(argument: &str, pgbin: &str) -> String {
.to_string()
}
pub fn get_pg_version(pgbin: &str) -> String {
pub fn get_pg_version(pgbin: &str) -> PostgresMajorVersion {
// pg_config --version returns a (platform specific) human readable string
// such as "PostgreSQL 15.4". We parse this to v14/v15/v16 etc.
let human_version = get_pg_config("--version", pgbin);
parse_pg_version(&human_version).to_string()
parse_pg_version(&human_version)
}
fn parse_pg_version(human_version: &str) -> &str {
pub fn get_pg_version_string(pgbin: &str) -> String {
match get_pg_version(pgbin) {
PostgresMajorVersion::V14 => "v14",
PostgresMajorVersion::V15 => "v15",
PostgresMajorVersion::V16 => "v16",
PostgresMajorVersion::V17 => "v17",
}
.to_owned()
}
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum PostgresMajorVersion {
V14,
V15,
V16,
V17,
}
fn parse_pg_version(human_version: &str) -> PostgresMajorVersion {
use PostgresMajorVersion::*;
// Normal releases have version strings like "PostgreSQL 15.4". But there
// are also pre-release versions like "PostgreSQL 17devel" or "PostgreSQL
// 16beta2" or "PostgreSQL 17rc1". And with the --with-extra-version
@@ -121,10 +140,10 @@ fn parse_pg_version(human_version: &str) -> &str {
.captures(human_version)
{
Some(captures) if captures.len() == 2 => match &captures["major"] {
"14" => return "v14",
"15" => return "v15",
"16" => return "v16",
"17" => return "v17",
"14" => return V14,
"15" => return V15,
"16" => return V16,
"17" => return V17,
_ => {}
},
_ => {}
@@ -263,24 +282,25 @@ mod tests {
#[test]
fn test_parse_pg_version() {
assert_eq!(parse_pg_version("PostgreSQL 15.4"), "v15");
assert_eq!(parse_pg_version("PostgreSQL 15.14"), "v15");
use super::PostgresMajorVersion::*;
assert_eq!(parse_pg_version("PostgreSQL 15.4"), V15);
assert_eq!(parse_pg_version("PostgreSQL 15.14"), V15);
assert_eq!(
parse_pg_version("PostgreSQL 15.4 (Ubuntu 15.4-0ubuntu0.23.04.1)"),
"v15"
V15
);
assert_eq!(parse_pg_version("PostgreSQL 14.15"), "v14");
assert_eq!(parse_pg_version("PostgreSQL 14.0"), "v14");
assert_eq!(parse_pg_version("PostgreSQL 14.15"), V14);
assert_eq!(parse_pg_version("PostgreSQL 14.0"), V14);
assert_eq!(
parse_pg_version("PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1"),
"v14"
V14
);
assert_eq!(parse_pg_version("PostgreSQL 16devel"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16beta1"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16rc2"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16extra"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16devel"), V16);
assert_eq!(parse_pg_version("PostgreSQL 16beta1"), V16);
assert_eq!(parse_pg_version("PostgreSQL 16rc2"), V16);
assert_eq!(parse_pg_version("PostgreSQL 16extra"), V16);
}
#[test]

View File

@@ -1,587 +0,0 @@
use std::convert::Infallible;
use std::net::IpAddr;
use std::net::Ipv6Addr;
use std::net::SocketAddr;
use std::sync::Arc;
use std::thread;
use crate::catalog::SchemaDumpError;
use crate::catalog::{get_database_schema, get_dbs_and_roles};
use crate::compute::forward_termination_signal;
use crate::compute::{ComputeNode, ComputeState, ParsedSpec};
use crate::installed_extensions;
use compute_api::requests::{ConfigurationRequest, ExtensionInstallRequest, SetRoleGrantsRequest};
use compute_api::responses::{
ComputeStatus, ComputeStatusResponse, ExtensionInstallResult, GenericAPIError,
SetRoleGrantsResponse,
};
use anyhow::Result;
use hyper::header::CONTENT_TYPE;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use metrics::proto::MetricFamily;
use metrics::Encoder;
use metrics::TextEncoder;
use tokio::task;
use tracing::{debug, error, info, warn};
use tracing_utils::http::OtelName;
use utils::http::request::must_get_query_param;
fn status_response_from_state(state: &ComputeState) -> ComputeStatusResponse {
ComputeStatusResponse {
start_time: state.start_time,
tenant: state
.pspec
.as_ref()
.map(|pspec| pspec.tenant_id.to_string()),
timeline: state
.pspec
.as_ref()
.map(|pspec| pspec.timeline_id.to_string()),
status: state.status,
last_active: state.last_active,
error: state.error.clone(),
}
}
// Service function to handle all available routes.
async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body> {
//
// NOTE: The URI path is currently included in traces. That's OK because
// it doesn't contain any variable parts or sensitive information. But
// please keep that in mind if you change the routing here.
//
match (req.method(), req.uri().path()) {
// Serialized compute state.
(&Method::GET, "/status") => {
debug!("serving /status GET request");
let state = compute.state.lock().unwrap();
let status_response = status_response_from_state(&state);
Response::new(Body::from(serde_json::to_string(&status_response).unwrap()))
}
// Startup metrics in JSON format. Keep /metrics reserved for a possible
// future use for Prometheus metrics format.
(&Method::GET, "/metrics.json") => {
info!("serving /metrics.json GET request");
let metrics = compute.state.lock().unwrap().metrics.clone();
Response::new(Body::from(serde_json::to_string(&metrics).unwrap()))
}
// Prometheus metrics
(&Method::GET, "/metrics") => {
debug!("serving /metrics GET request");
// When we call TextEncoder::encode() below, it will immediately
// return an error if a metric family has no metrics, so we need to
// preemptively filter out metric families with no metrics.
let metrics = installed_extensions::collect()
.into_iter()
.filter(|m| !m.get_metric().is_empty())
.collect::<Vec<MetricFamily>>();
let encoder = TextEncoder::new();
let mut buffer = vec![];
if let Err(err) = encoder.encode(&metrics, &mut buffer) {
let msg = format!("error handling /metrics request: {err}");
error!(msg);
return render_json_error(&msg, StatusCode::INTERNAL_SERVER_ERROR);
}
match Response::builder()
.status(StatusCode::OK)
.header(CONTENT_TYPE, encoder.format_type())
.body(Body::from(buffer))
{
Ok(response) => response,
Err(err) => {
let msg = format!("error handling /metrics request: {err}");
error!(msg);
render_json_error(&msg, StatusCode::INTERNAL_SERVER_ERROR)
}
}
}
// Collect Postgres current usage insights
(&Method::GET, "/insights") => {
info!("serving /insights GET request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!("compute is not running, current status: {:?}", status);
error!(msg);
return Response::new(Body::from(msg));
}
let insights = compute.collect_insights().await;
Response::new(Body::from(insights))
}
(&Method::POST, "/check_writability") => {
info!("serving /check_writability POST request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for check_writability request: {:?}",
status
);
error!(msg);
return Response::new(Body::from(msg));
}
let res = crate::checker::check_writability(compute).await;
match res {
Ok(_) => Response::new(Body::from("true")),
Err(e) => {
error!("check_writability failed: {}", e);
Response::new(Body::from(e.to_string()))
}
}
}
(&Method::POST, "/extensions") => {
info!("serving /extensions POST request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for extensions request: {:?}",
status
);
error!(msg);
return render_json_error(&msg, StatusCode::PRECONDITION_FAILED);
}
let request = hyper::body::to_bytes(req.into_body()).await.unwrap();
let request = serde_json::from_slice::<ExtensionInstallRequest>(&request).unwrap();
let res = compute
.install_extension(&request.extension, &request.database, request.version)
.await;
match res {
Ok(version) => render_json(Body::from(
serde_json::to_string(&ExtensionInstallResult {
extension: request.extension,
version,
})
.unwrap(),
)),
Err(e) => {
error!("install_extension failed: {}", e);
render_json_error(&e.to_string(), StatusCode::INTERNAL_SERVER_ERROR)
}
}
}
(&Method::GET, "/info") => {
let num_cpus = num_cpus::get_physical();
info!("serving /info GET request. num_cpus: {}", num_cpus);
Response::new(Body::from(
serde_json::json!({
"num_cpus": num_cpus,
})
.to_string(),
))
}
// Accept spec in JSON format and request compute configuration. If
// anything goes wrong after we set the compute status to `ConfigurationPending`
// and update compute state with new spec, we basically leave compute
// in the potentially wrong state. That said, it's control-plane's
// responsibility to watch compute state after reconfiguration request
// and to clean restart in case of errors.
(&Method::POST, "/configure") => {
info!("serving /configure POST request");
match handle_configure_request(req, compute).await {
Ok(msg) => Response::new(Body::from(msg)),
Err((msg, code)) => {
error!("error handling /configure request: {msg}");
render_json_error(&msg, code)
}
}
}
(&Method::POST, "/terminate") => {
info!("serving /terminate POST request");
match handle_terminate_request(compute).await {
Ok(()) => Response::new(Body::empty()),
Err((msg, code)) => {
error!("error handling /terminate request: {msg}");
render_json_error(&msg, code)
}
}
}
(&Method::GET, "/dbs_and_roles") => {
info!("serving /dbs_and_roles GET request",);
match get_dbs_and_roles(compute).await {
Ok(res) => render_json(Body::from(serde_json::to_string(&res).unwrap())),
Err(_) => {
render_json_error("can't get dbs and roles", StatusCode::INTERNAL_SERVER_ERROR)
}
}
}
(&Method::GET, "/database_schema") => {
let database = match must_get_query_param(&req, "database") {
Err(e) => return e.into_response(),
Ok(database) => database,
};
info!("serving /database_schema GET request with database: {database}",);
match get_database_schema(compute, &database).await {
Ok(res) => render_plain(Body::wrap_stream(res)),
Err(SchemaDumpError::DatabaseDoesNotExist) => {
render_json_error("database does not exist", StatusCode::NOT_FOUND)
}
Err(e) => {
error!("can't get schema dump: {}", e);
render_json_error("can't get schema dump", StatusCode::INTERNAL_SERVER_ERROR)
}
}
}
(&Method::POST, "/grants") => {
info!("serving /grants POST request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for set_role_grants request: {:?}",
status
);
error!(msg);
return render_json_error(&msg, StatusCode::PRECONDITION_FAILED);
}
let request = hyper::body::to_bytes(req.into_body()).await.unwrap();
let request = serde_json::from_slice::<SetRoleGrantsRequest>(&request).unwrap();
let res = compute
.set_role_grants(
&request.database,
&request.schema,
&request.privileges,
&request.role,
)
.await;
match res {
Ok(()) => render_json(Body::from(
serde_json::to_string(&SetRoleGrantsResponse {
database: request.database,
schema: request.schema,
role: request.role,
privileges: request.privileges,
})
.unwrap(),
)),
Err(e) => render_json_error(
&format!("could not grant role privileges to the schema: {e}"),
// TODO: can we filter on role/schema not found errors
// and return appropriate error code?
StatusCode::INTERNAL_SERVER_ERROR,
),
}
}
// get the list of installed extensions
// currently only used in python tests
// TODO: call it from cplane
(&Method::GET, "/installed_extensions") => {
info!("serving /installed_extensions GET request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for extensions request: {:?}",
status
);
error!(msg);
return Response::new(Body::from(msg));
}
let connstr = compute.connstr.clone();
let res = crate::installed_extensions::get_installed_extensions(connstr).await;
match res {
Ok(res) => render_json(Body::from(serde_json::to_string(&res).unwrap())),
Err(e) => render_json_error(
&format!("could not get list of installed extensions: {}", e),
StatusCode::INTERNAL_SERVER_ERROR,
),
}
}
// download extension files from remote extension storage on demand
(&Method::POST, route) if route.starts_with("/extension_server/") => {
info!("serving {:?} POST request", route);
info!("req.uri {:?}", req.uri());
// don't even try to download extensions
// if no remote storage is configured
if compute.ext_remote_storage.is_none() {
info!("no extensions remote storage configured");
let mut resp = Response::new(Body::from("no remote storage configured"));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
return resp;
}
let mut is_library = false;
if let Some(params) = req.uri().query() {
info!("serving {:?} POST request with params: {}", route, params);
if params == "is_library=true" {
is_library = true;
} else {
let mut resp = Response::new(Body::from("Wrong request parameters"));
*resp.status_mut() = StatusCode::BAD_REQUEST;
return resp;
}
}
let filename = route.split('/').last().unwrap().to_string();
info!("serving /extension_server POST request, filename: {filename:?} is_library: {is_library}");
// get ext_name and path from spec
// don't lock compute_state for too long
let ext = {
let compute_state = compute.state.lock().unwrap();
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
let spec = &pspec.spec;
// debug only
info!("spec: {:?}", spec);
let remote_extensions = match spec.remote_extensions.as_ref() {
Some(r) => r,
None => {
info!("no remote extensions spec was provided");
let mut resp = Response::new(Body::from("no remote storage configured"));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
return resp;
}
};
remote_extensions.get_ext(
&filename,
is_library,
&compute.build_tag,
&compute.pgversion,
)
};
match ext {
Ok((ext_name, ext_path)) => {
match compute.download_extension(ext_name, ext_path).await {
Ok(_) => Response::new(Body::from("OK")),
Err(e) => {
error!("extension download failed: {}", e);
let mut resp = Response::new(Body::from(e.to_string()));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
resp
}
}
}
Err(e) => {
warn!("extension download failed to find extension: {}", e);
let mut resp = Response::new(Body::from("failed to find file"));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
resp
}
}
}
// Return the `404 Not Found` for any other routes.
_ => {
let mut not_found = Response::new(Body::from("404 Not Found"));
*not_found.status_mut() = StatusCode::NOT_FOUND;
not_found
}
}
}
async fn handle_configure_request(
req: Request<Body>,
compute: &Arc<ComputeNode>,
) -> Result<String, (String, StatusCode)> {
if !compute.live_config_allowed {
return Err((
"live configuration is not allowed for this compute node".to_string(),
StatusCode::PRECONDITION_FAILED,
));
}
let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap();
let spec_raw = String::from_utf8(body_bytes.to_vec()).unwrap();
if let Ok(request) = serde_json::from_str::<ConfigurationRequest>(&spec_raw) {
let spec = request.spec;
let parsed_spec = match ParsedSpec::try_from(spec) {
Ok(ps) => ps,
Err(msg) => return Err((msg, StatusCode::BAD_REQUEST)),
};
// XXX: wrap state update under lock in code blocks. Otherwise,
// we will try to `Send` `mut state` into the spawned thread
// bellow, which will cause error:
// ```
// error: future cannot be sent between threads safely
// ```
{
let mut state = compute.state.lock().unwrap();
if state.status != ComputeStatus::Empty && state.status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for configuration request: {:?}",
state.status.clone()
);
return Err((msg, StatusCode::PRECONDITION_FAILED));
}
state.pspec = Some(parsed_spec);
state.set_status(ComputeStatus::ConfigurationPending, &compute.state_changed);
drop(state);
info!("set new spec and notified waiters");
}
// Spawn a blocking thread to wait for compute to become Running.
// This is needed to do not block the main pool of workers and
// be able to serve other requests while some particular request
// is waiting for compute to finish configuration.
let c = compute.clone();
task::spawn_blocking(move || {
let mut state = c.state.lock().unwrap();
while state.status != ComputeStatus::Running {
state = c.state_changed.wait(state).unwrap();
info!(
"waiting for compute to become Running, current status: {:?}",
state.status
);
if state.status == ComputeStatus::Failed {
let err = state.error.as_ref().map_or("unknown error", |x| x);
let msg = format!("compute configuration failed: {:?}", err);
return Err((msg, StatusCode::INTERNAL_SERVER_ERROR));
}
}
Ok(())
})
.await
.unwrap()?;
// Return current compute state if everything went well.
let state = compute.state.lock().unwrap().clone();
let status_response = status_response_from_state(&state);
Ok(serde_json::to_string(&status_response).unwrap())
} else {
Err(("invalid spec".to_string(), StatusCode::BAD_REQUEST))
}
}
fn render_json_error(e: &str, status: StatusCode) -> Response<Body> {
let error = GenericAPIError {
error: e.to_string(),
};
Response::builder()
.status(status)
.header(CONTENT_TYPE, "application/json")
.body(Body::from(serde_json::to_string(&error).unwrap()))
.unwrap()
}
fn render_json(body: Body) -> Response<Body> {
Response::builder()
.header(CONTENT_TYPE, "application/json")
.body(body)
.unwrap()
}
fn render_plain(body: Body) -> Response<Body> {
Response::builder()
.header(CONTENT_TYPE, "text/plain")
.body(body)
.unwrap()
}
async fn handle_terminate_request(compute: &Arc<ComputeNode>) -> Result<(), (String, StatusCode)> {
{
let mut state = compute.state.lock().unwrap();
if state.status == ComputeStatus::Terminated {
return Ok(());
}
if state.status != ComputeStatus::Empty && state.status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for termination request: {}",
state.status
);
return Err((msg, StatusCode::PRECONDITION_FAILED));
}
state.set_status(ComputeStatus::TerminationPending, &compute.state_changed);
drop(state);
}
forward_termination_signal();
info!("sent signal and notified waiters");
// Spawn a blocking thread to wait for compute to become Terminated.
// This is needed to do not block the main pool of workers and
// be able to serve other requests while some particular request
// is waiting for compute to finish configuration.
let c = compute.clone();
task::spawn_blocking(move || {
let mut state = c.state.lock().unwrap();
while state.status != ComputeStatus::Terminated {
state = c.state_changed.wait(state).unwrap();
info!(
"waiting for compute to become {}, current status: {:?}",
ComputeStatus::Terminated,
state.status
);
}
Ok(())
})
.await
.unwrap()?;
info!("terminated Postgres");
Ok(())
}
// Main Hyper HTTP server function that runs it and blocks waiting on it forever.
#[tokio::main]
async fn serve(port: u16, state: Arc<ComputeNode>) {
// this usually binds to both IPv4 and IPv6 on linux
// see e.g. https://github.com/rust-lang/rust/pull/34440
let addr = SocketAddr::new(IpAddr::from(Ipv6Addr::UNSPECIFIED), port);
let make_service = make_service_fn(move |_conn| {
let state = state.clone();
async move {
Ok::<_, Infallible>(service_fn(move |req: Request<Body>| {
let state = state.clone();
async move {
Ok::<_, Infallible>(
// NOTE: We include the URI path in the string. It
// doesn't contain any variable parts or sensitive
// information in this API.
tracing_utils::http::tracing_handler(
req,
|req| routes(req, &state),
OtelName::UriPath,
)
.await,
)
}
}))
}
});
info!("starting HTTP server on {}", addr);
let server = Server::bind(&addr).serve(make_service);
// Run this server forever
if let Err(e) = server.await {
error!("server error: {}", e);
}
}
/// Launch a separate Hyper HTTP API server thread and return its `JoinHandle`.
pub fn launch_http_server(port: u16, state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
let state = Arc::clone(state);
Ok(thread::Builder::new()
.name("http-endpoint".into())
.spawn(move || serve(port, state))?)
}

View File

@@ -0,0 +1,48 @@
use std::ops::{Deref, DerefMut};
use axum::{
async_trait,
extract::{rejection::JsonRejection, FromRequest, Request},
};
use compute_api::responses::GenericAPIError;
use http::StatusCode;
/// Custom `Json` extractor, so that we can format errors into
/// `JsonResponse<GenericAPIError>`.
#[derive(Debug, Clone, Copy, Default)]
pub(crate) struct Json<T>(pub T);
#[async_trait]
impl<S, T> FromRequest<S> for Json<T>
where
axum::Json<T>: FromRequest<S, Rejection = JsonRejection>,
S: Send + Sync,
{
type Rejection = (StatusCode, axum::Json<GenericAPIError>);
async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
match axum::Json::<T>::from_request(req, state).await {
Ok(value) => Ok(Self(value.0)),
Err(rejection) => Err((
rejection.status(),
axum::Json(GenericAPIError {
error: rejection.body_text().to_lowercase(),
}),
)),
}
}
}
impl<T> Deref for Json<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<T> DerefMut for Json<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}

View File

@@ -0,0 +1,7 @@
pub(crate) mod json;
pub(crate) mod path;
pub(crate) mod query;
pub(crate) use json::Json;
pub(crate) use path::Path;
pub(crate) use query::Query;

View File

@@ -0,0 +1,48 @@
use std::ops::{Deref, DerefMut};
use axum::{
async_trait,
extract::{rejection::PathRejection, FromRequestParts},
};
use compute_api::responses::GenericAPIError;
use http::{request::Parts, StatusCode};
/// Custom `Path` extractor, so that we can format errors into
/// `JsonResponse<GenericAPIError>`.
#[derive(Debug, Clone, Copy, Default)]
pub(crate) struct Path<T>(pub T);
#[async_trait]
impl<S, T> FromRequestParts<S> for Path<T>
where
axum::extract::Path<T>: FromRequestParts<S, Rejection = PathRejection>,
S: Send + Sync,
{
type Rejection = (StatusCode, axum::Json<GenericAPIError>);
async fn from_request_parts(parts: &mut Parts, state: &S) -> Result<Self, Self::Rejection> {
match axum::extract::Path::<T>::from_request_parts(parts, state).await {
Ok(value) => Ok(Self(value.0)),
Err(rejection) => Err((
rejection.status(),
axum::Json(GenericAPIError {
error: rejection.body_text().to_ascii_lowercase(),
}),
)),
}
}
}
impl<T> Deref for Path<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<T> DerefMut for Path<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}

View File

@@ -0,0 +1,48 @@
use std::ops::{Deref, DerefMut};
use axum::{
async_trait,
extract::{rejection::QueryRejection, FromRequestParts},
};
use compute_api::responses::GenericAPIError;
use http::{request::Parts, StatusCode};
/// Custom `Query` extractor, so that we can format errors into
/// `JsonResponse<GenericAPIError>`.
#[derive(Debug, Clone, Copy, Default)]
pub(crate) struct Query<T>(pub T);
#[async_trait]
impl<S, T> FromRequestParts<S> for Query<T>
where
axum::extract::Query<T>: FromRequestParts<S, Rejection = QueryRejection>,
S: Send + Sync,
{
type Rejection = (StatusCode, axum::Json<GenericAPIError>);
async fn from_request_parts(parts: &mut Parts, state: &S) -> Result<Self, Self::Rejection> {
match axum::extract::Query::<T>::from_request_parts(parts, state).await {
Ok(value) => Ok(Self(value.0)),
Err(rejection) => Err((
rejection.status(),
axum::Json(GenericAPIError {
error: rejection.body_text().to_ascii_lowercase(),
}),
)),
}
}
}
impl<T> Deref for Query<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<T> DerefMut for Query<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.0
}
}

View File

@@ -1 +1,56 @@
pub mod api;
use axum::{body::Body, response::Response};
use compute_api::responses::{ComputeStatus, GenericAPIError};
use http::{header::CONTENT_TYPE, StatusCode};
use serde::Serialize;
use tracing::error;
pub use server::launch_http_server;
mod extract;
mod routes;
mod server;
/// Convenience response builder for JSON responses
struct JsonResponse;
impl JsonResponse {
/// Helper for actually creating a response
fn create_response(code: StatusCode, body: impl Serialize) -> Response {
Response::builder()
.status(code)
.header(CONTENT_TYPE.as_str(), "application/json")
.body(Body::from(serde_json::to_string(&body).unwrap()))
.unwrap()
}
/// Create a successful error response
pub(self) fn success(code: StatusCode, body: impl Serialize) -> Response {
assert!({
let code = code.as_u16();
(200..300).contains(&code)
});
Self::create_response(code, body)
}
/// Create an error response
pub(self) fn error(code: StatusCode, error: impl ToString) -> Response {
assert!(code.as_u16() >= 400);
let message = error.to_string();
error!(message);
Self::create_response(code, &GenericAPIError { error: message })
}
/// Create an error response related to the compute being in an invalid state
pub(self) fn invalid_status(status: ComputeStatus) -> Response {
Self::create_response(
StatusCode::PRECONDITION_FAILED,
&GenericAPIError {
error: format!("invalid compute status: {status}"),
},
)
}
}

View File

@@ -37,7 +37,7 @@ paths:
schema:
$ref: "#/components/schemas/ComputeMetrics"
/metrics
/metrics:
get:
tags:
- Info
@@ -537,12 +537,14 @@ components:
properties:
extname:
type: string
versions:
type: array
version:
type: string
items:
type: string
n_databases:
type: integer
owned_by_superuser:
type: integer
SetRoleGrantsRequest:
type: object

View File

@@ -0,0 +1,20 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use compute_api::responses::ComputeStatus;
use http::StatusCode;
use crate::{checker::check_writability, compute::ComputeNode, http::JsonResponse};
/// Check that the compute is currently running.
pub(in crate::http) async fn is_writable(State(compute): State<Arc<ComputeNode>>) -> Response {
let status = compute.get_status();
if status != ComputeStatus::Running {
return JsonResponse::invalid_status(status);
}
match check_writability(&compute).await {
Ok(_) => JsonResponse::success(StatusCode::OK, true),
Err(e) => JsonResponse::error(StatusCode::INTERNAL_SERVER_ERROR, e),
}
}

View File

@@ -0,0 +1,91 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use compute_api::{
requests::ConfigurationRequest,
responses::{ComputeStatus, ComputeStatusResponse},
};
use http::StatusCode;
use tokio::task;
use tracing::info;
use crate::{
compute::{ComputeNode, ParsedSpec},
http::{extract::Json, JsonResponse},
};
// Accept spec in JSON format and request compute configuration. If anything
// goes wrong after we set the compute status to `ConfigurationPending` and
// update compute state with new spec, we basically leave compute in the
// potentially wrong state. That said, it's control-plane's responsibility to
// watch compute state after reconfiguration request and to clean restart in
// case of errors.
pub(in crate::http) async fn configure(
State(compute): State<Arc<ComputeNode>>,
request: Json<ConfigurationRequest>,
) -> Response {
if !compute.live_config_allowed {
return JsonResponse::error(
StatusCode::PRECONDITION_FAILED,
"live configuration is not allowed for this compute node".to_string(),
);
}
let pspec = match ParsedSpec::try_from(request.spec.clone()) {
Ok(p) => p,
Err(e) => return JsonResponse::error(StatusCode::BAD_REQUEST, e),
};
// XXX: wrap state update under lock in a code block. Otherwise, we will try
// to `Send` `mut state` into the spawned thread bellow, which will cause
// the following rustc error:
//
// error: future cannot be sent between threads safely
{
let mut state = compute.state.lock().unwrap();
if !matches!(state.status, ComputeStatus::Empty | ComputeStatus::Running) {
return JsonResponse::invalid_status(state.status);
}
state.pspec = Some(pspec);
state.set_status(ComputeStatus::ConfigurationPending, &compute.state_changed);
drop(state);
}
// Spawn a blocking thread to wait for compute to become Running. This is
// needed to do not block the main pool of workers and be able to serve
// other requests while some particular request is waiting for compute to
// finish configuration.
let c = compute.clone();
let completed = task::spawn_blocking(move || {
let mut state = c.state.lock().unwrap();
while state.status != ComputeStatus::Running {
state = c.state_changed.wait(state).unwrap();
info!(
"waiting for compute to become {}, current status: {}",
ComputeStatus::Running,
state.status
);
if state.status == ComputeStatus::Failed {
let err = state.error.as_ref().map_or("unknown error", |x| x);
let msg = format!("compute configuration failed: {:?}", err);
return Err(msg);
}
}
Ok(())
})
.await
.unwrap();
if let Err(e) = completed {
return JsonResponse::error(StatusCode::INTERNAL_SERVER_ERROR, e);
}
// Return current compute state if everything went well.
let state = compute.state.lock().unwrap().clone();
let body = ComputeStatusResponse::from(&state);
JsonResponse::success(StatusCode::OK, body)
}

View File

@@ -0,0 +1,34 @@
use std::sync::Arc;
use axum::{body::Body, extract::State, response::Response};
use http::{header::CONTENT_TYPE, StatusCode};
use serde::Deserialize;
use crate::{
catalog::{get_database_schema, SchemaDumpError},
compute::ComputeNode,
http::{extract::Query, JsonResponse},
};
#[derive(Debug, Clone, Deserialize)]
pub(in crate::http) struct DatabaseSchemaParams {
database: String,
}
/// Get a schema dump of the requested database.
pub(in crate::http) async fn get_schema_dump(
params: Query<DatabaseSchemaParams>,
State(compute): State<Arc<ComputeNode>>,
) -> Response {
match get_database_schema(&compute, &params.database).await {
Ok(schema) => Response::builder()
.status(StatusCode::OK)
.header(CONTENT_TYPE.as_str(), "application/json")
.body(Body::from_stream(schema))
.unwrap(),
Err(SchemaDumpError::DatabaseDoesNotExist) => {
JsonResponse::error(StatusCode::NOT_FOUND, SchemaDumpError::DatabaseDoesNotExist)
}
Err(e) => JsonResponse::error(StatusCode::INTERNAL_SERVER_ERROR, e),
}
}

View File

@@ -0,0 +1,16 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use http::StatusCode;
use crate::{catalog::get_dbs_and_roles, compute::ComputeNode, http::JsonResponse};
/// Get the databases and roles from the compute.
pub(in crate::http) async fn get_catalog_objects(
State(compute): State<Arc<ComputeNode>>,
) -> Response {
match get_dbs_and_roles(&compute).await {
Ok(catalog_objects) => JsonResponse::success(StatusCode::OK, catalog_objects),
Err(e) => JsonResponse::error(StatusCode::INTERNAL_SERVER_ERROR, e),
}
}

View File

@@ -0,0 +1,68 @@
use std::sync::Arc;
use axum::{
extract::State,
response::{IntoResponse, Response},
};
use http::StatusCode;
use serde::Deserialize;
use crate::{
compute::ComputeNode,
http::{
extract::{Path, Query},
JsonResponse,
},
};
#[derive(Debug, Clone, Deserialize)]
pub(in crate::http) struct ExtensionServerParams {
#[serde(default)]
is_library: bool,
}
/// Download a remote extension.
pub(in crate::http) async fn download_extension(
Path(filename): Path<String>,
params: Query<ExtensionServerParams>,
State(compute): State<Arc<ComputeNode>>,
) -> Response {
// Don't even try to download extensions if no remote storage is configured
if compute.ext_remote_storage.is_none() {
return JsonResponse::error(
StatusCode::PRECONDITION_FAILED,
"remote storage is not configured",
);
}
let ext = {
let state = compute.state.lock().unwrap();
let pspec = state.pspec.as_ref().unwrap();
let spec = &pspec.spec;
let remote_extensions = match spec.remote_extensions.as_ref() {
Some(r) => r,
None => {
return JsonResponse::error(
StatusCode::CONFLICT,
"information about remote extensions is unavailable",
);
}
};
remote_extensions.get_ext(
&filename,
params.is_library,
&compute.build_tag,
&compute.pgversion,
)
};
match ext {
Ok((ext_name, ext_path)) => match compute.download_extension(ext_name, ext_path).await {
Ok(_) => StatusCode::OK.into_response(),
Err(e) => JsonResponse::error(StatusCode::INTERNAL_SERVER_ERROR, e),
},
Err(e) => JsonResponse::error(StatusCode::NOT_FOUND, e),
}
}

View File

@@ -0,0 +1,45 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use compute_api::{
requests::ExtensionInstallRequest,
responses::{ComputeStatus, ExtensionInstallResponse},
};
use http::StatusCode;
use crate::{
compute::ComputeNode,
http::{extract::Json, JsonResponse},
};
/// Install a extension.
pub(in crate::http) async fn install_extension(
State(compute): State<Arc<ComputeNode>>,
request: Json<ExtensionInstallRequest>,
) -> Response {
let status = compute.get_status();
if status != ComputeStatus::Running {
return JsonResponse::invalid_status(status);
}
match compute
.install_extension(
&request.extension,
&request.database,
request.version.to_string(),
)
.await
{
Ok(version) => JsonResponse::success(
StatusCode::CREATED,
Some(ExtensionInstallResponse {
extension: request.extension.clone(),
version,
}),
),
Err(e) => JsonResponse::error(
StatusCode::INTERNAL_SERVER_ERROR,
format!("failed to install extension: {e}"),
),
}
}

View File

@@ -0,0 +1,35 @@
use axum::response::{IntoResponse, Response};
use http::StatusCode;
use tracing::info;
use utils::failpoint_support::{apply_failpoint, ConfigureFailpointsRequest};
use crate::http::{extract::Json, JsonResponse};
/// Configure failpoints for testing purposes.
pub(in crate::http) async fn configure_failpoints(
failpoints: Json<ConfigureFailpointsRequest>,
) -> Response {
if !fail::has_failpoints() {
return JsonResponse::error(
StatusCode::PRECONDITION_FAILED,
"Cannot manage failpoints because neon was compiled without failpoints support",
);
}
for fp in &*failpoints {
info!("cfg failpoint: {} {}", fp.name, fp.actions);
// We recognize one extra "action" that's not natively recognized
// by the failpoints crate: exit, to immediately kill the process
let cfg_result = apply_failpoint(&fp.name, &fp.actions);
if let Err(e) = cfg_result {
return JsonResponse::error(
StatusCode::BAD_REQUEST,
format!("failed to configure failpoints: {e}"),
);
}
}
StatusCode::OK.into_response()
}

View File

@@ -0,0 +1,48 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use compute_api::{
requests::SetRoleGrantsRequest,
responses::{ComputeStatus, SetRoleGrantsResponse},
};
use http::StatusCode;
use crate::{
compute::ComputeNode,
http::{extract::Json, JsonResponse},
};
/// Add grants for a role.
pub(in crate::http) async fn add_grant(
State(compute): State<Arc<ComputeNode>>,
request: Json<SetRoleGrantsRequest>,
) -> Response {
let status = compute.get_status();
if status != ComputeStatus::Running {
return JsonResponse::invalid_status(status);
}
match compute
.set_role_grants(
&request.database,
&request.schema,
&request.privileges,
&request.role,
)
.await
{
Ok(()) => JsonResponse::success(
StatusCode::CREATED,
Some(SetRoleGrantsResponse {
database: request.database.clone(),
schema: request.schema.clone(),
role: request.role.clone(),
privileges: request.privileges.clone(),
}),
),
Err(e) => JsonResponse::error(
StatusCode::INTERNAL_SERVER_ERROR,
format!("failed to grant role privileges to the schema: {e}"),
),
}
}

View File

@@ -0,0 +1,11 @@
use axum::response::Response;
use compute_api::responses::InfoResponse;
use http::StatusCode;
use crate::http::JsonResponse;
/// Get information about the physical characteristics about the compute.
pub(in crate::http) async fn get_info() -> Response {
let num_cpus = num_cpus::get_physical();
JsonResponse::success(StatusCode::OK, &InfoResponse { num_cpus })
}

View File

@@ -0,0 +1,18 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use compute_api::responses::ComputeStatus;
use http::StatusCode;
use crate::{compute::ComputeNode, http::JsonResponse};
/// Collect current Postgres usage insights.
pub(in crate::http) async fn get_insights(State(compute): State<Arc<ComputeNode>>) -> Response {
let status = compute.get_status();
if status != ComputeStatus::Running {
return JsonResponse::invalid_status(status);
}
let insights = compute.collect_insights().await;
JsonResponse::success(StatusCode::OK, insights)
}

View File

@@ -0,0 +1,33 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use compute_api::responses::ComputeStatus;
use http::StatusCode;
use tokio::task;
use crate::{compute::ComputeNode, http::JsonResponse, installed_extensions};
/// Get a list of installed extensions.
pub(in crate::http) async fn get_installed_extensions(
State(compute): State<Arc<ComputeNode>>,
) -> Response {
let status = compute.get_status();
if status != ComputeStatus::Running {
return JsonResponse::invalid_status(status);
}
let conf = compute.get_conn_conf(None);
let res = task::spawn_blocking(move || installed_extensions::get_installed_extensions(conf))
.await
.unwrap();
match res {
Ok(installed_extensions) => {
JsonResponse::success(StatusCode::OK, Some(installed_extensions))
}
Err(e) => JsonResponse::error(
StatusCode::INTERNAL_SERVER_ERROR,
format!("failed to get list of installed extensions: {e}"),
),
}
}

View File

@@ -0,0 +1,32 @@
use axum::{body::Body, response::Response};
use http::header::CONTENT_TYPE;
use http::StatusCode;
use metrics::proto::MetricFamily;
use metrics::Encoder;
use metrics::TextEncoder;
use crate::{http::JsonResponse, installed_extensions};
/// Expose Prometheus metrics.
pub(in crate::http) async fn get_metrics() -> Response {
// When we call TextEncoder::encode() below, it will immediately return an
// error if a metric family has no metrics, so we need to preemptively
// filter out metric families with no metrics.
let metrics = installed_extensions::collect()
.into_iter()
.filter(|m| !m.get_metric().is_empty())
.collect::<Vec<MetricFamily>>();
let encoder = TextEncoder::new();
let mut buffer = vec![];
if let Err(e) = encoder.encode(&metrics, &mut buffer) {
return JsonResponse::error(StatusCode::INTERNAL_SERVER_ERROR, e);
}
Response::builder()
.status(StatusCode::OK)
.header(CONTENT_TYPE, encoder.format_type())
.body(Body::from(buffer))
.unwrap()
}

View File

@@ -0,0 +1,12 @@
use std::sync::Arc;
use axum::{extract::State, response::Response};
use http::StatusCode;
use crate::{compute::ComputeNode, http::JsonResponse};
/// Get startup metrics.
pub(in crate::http) async fn get_metrics(State(compute): State<Arc<ComputeNode>>) -> Response {
let metrics = compute.state.lock().unwrap().metrics.clone();
JsonResponse::success(StatusCode::OK, metrics)
}

View File

@@ -0,0 +1,38 @@
use compute_api::responses::ComputeStatusResponse;
use crate::compute::ComputeState;
pub(in crate::http) mod check_writability;
pub(in crate::http) mod configure;
pub(in crate::http) mod database_schema;
pub(in crate::http) mod dbs_and_roles;
pub(in crate::http) mod extension_server;
pub(in crate::http) mod extensions;
pub(in crate::http) mod failpoints;
pub(in crate::http) mod grants;
pub(in crate::http) mod info;
pub(in crate::http) mod insights;
pub(in crate::http) mod installed_extensions;
pub(in crate::http) mod metrics;
pub(in crate::http) mod metrics_json;
pub(in crate::http) mod status;
pub(in crate::http) mod terminate;
impl From<&ComputeState> for ComputeStatusResponse {
fn from(state: &ComputeState) -> Self {
ComputeStatusResponse {
start_time: state.start_time,
tenant: state
.pspec
.as_ref()
.map(|pspec| pspec.tenant_id.to_string()),
timeline: state
.pspec
.as_ref()
.map(|pspec| pspec.timeline_id.to_string()),
status: state.status,
last_active: state.last_active,
error: state.error.clone(),
}
}
}

View File

@@ -0,0 +1,14 @@
use std::{ops::Deref, sync::Arc};
use axum::{extract::State, http::StatusCode, response::Response};
use compute_api::responses::ComputeStatusResponse;
use crate::{compute::ComputeNode, http::JsonResponse};
/// Retrieve the state of the comute.
pub(in crate::http) async fn get_status(State(compute): State<Arc<ComputeNode>>) -> Response {
let state = compute.state.lock().unwrap();
let body = ComputeStatusResponse::from(state.deref());
JsonResponse::success(StatusCode::OK, body)
}

View File

@@ -0,0 +1,58 @@
use std::sync::Arc;
use axum::{
extract::State,
response::{IntoResponse, Response},
};
use compute_api::responses::ComputeStatus;
use http::StatusCode;
use tokio::task;
use tracing::info;
use crate::{
compute::{forward_termination_signal, ComputeNode},
http::JsonResponse,
};
/// Terminate the compute.
pub(in crate::http) async fn terminate(State(compute): State<Arc<ComputeNode>>) -> Response {
{
let mut state = compute.state.lock().unwrap();
if state.status == ComputeStatus::Terminated {
return StatusCode::CREATED.into_response();
}
if !matches!(state.status, ComputeStatus::Empty | ComputeStatus::Running) {
return JsonResponse::invalid_status(state.status);
}
state.set_status(ComputeStatus::TerminationPending, &compute.state_changed);
drop(state);
}
forward_termination_signal();
info!("sent signal and notified waiters");
// Spawn a blocking thread to wait for compute to become Terminated.
// This is needed to do not block the main pool of workers and
// be able to serve other requests while some particular request
// is waiting for compute to finish configuration.
let c = compute.clone();
task::spawn_blocking(move || {
let mut state = c.state.lock().unwrap();
while state.status != ComputeStatus::Terminated {
state = c.state_changed.wait(state).unwrap();
info!(
"waiting for compute to become {}, current status: {:?}",
ComputeStatus::Terminated,
state.status
);
}
})
.await
.unwrap();
info!("terminated Postgres");
StatusCode::OK.into_response()
}

View File

@@ -0,0 +1,155 @@
use std::{
net::{IpAddr, Ipv6Addr, SocketAddr},
sync::Arc,
thread,
time::Duration,
};
use anyhow::Result;
use axum::{
extract::Request,
middleware::{self, Next},
response::{IntoResponse, Response},
routing::{get, post},
Router,
};
use http::StatusCode;
use tokio::net::TcpListener;
use tower::ServiceBuilder;
use tower_http::{request_id::PropagateRequestIdLayer, trace::TraceLayer};
use tracing::{debug, error, info, Span};
use uuid::Uuid;
use super::routes::{
check_writability, configure, database_schema, dbs_and_roles, extension_server, extensions,
grants, info as info_route, insights, installed_extensions, metrics, metrics_json, status,
terminate,
};
use crate::compute::ComputeNode;
async fn handle_404() -> Response {
StatusCode::NOT_FOUND.into_response()
}
const X_REQUEST_ID: &str = "x-request-id";
/// This middleware function allows compute_ctl to generate its own request ID
/// if one isn't supplied. The control plane will always send one as a UUID. The
/// neon Postgres extension on the other hand does not send one.
async fn maybe_add_request_id_header(mut request: Request, next: Next) -> Response {
let headers = request.headers_mut();
if headers.get(X_REQUEST_ID).is_none() {
headers.append(X_REQUEST_ID, Uuid::new_v4().to_string().parse().unwrap());
}
next.run(request).await
}
/// Run the HTTP server and wait on it forever.
#[tokio::main]
async fn serve(port: u16, compute: Arc<ComputeNode>) {
let mut app = Router::new()
.route("/check_writability", post(check_writability::is_writable))
.route("/configure", post(configure::configure))
.route("/database_schema", get(database_schema::get_schema_dump))
.route("/dbs_and_roles", get(dbs_and_roles::get_catalog_objects))
.route(
"/extension_server/*filename",
post(extension_server::download_extension),
)
.route("/extensions", post(extensions::install_extension))
.route("/grants", post(grants::add_grant))
.route("/info", get(info_route::get_info))
.route("/insights", get(insights::get_insights))
.route(
"/installed_extensions",
get(installed_extensions::get_installed_extensions),
)
.route("/metrics", get(metrics::get_metrics))
.route("/metrics.json", get(metrics_json::get_metrics))
.route("/status", get(status::get_status))
.route("/terminate", post(terminate::terminate))
.fallback(handle_404)
.layer(
ServiceBuilder::new()
// Add this middleware since we assume the request ID exists
.layer(middleware::from_fn(maybe_add_request_id_header))
.layer(
TraceLayer::new_for_http()
.on_request(|request: &http::Request<_>, _span: &Span| {
let request_id = request
.headers()
.get(X_REQUEST_ID)
.unwrap()
.to_str()
.unwrap();
match request.uri().path() {
"/metrics" => {
debug!(%request_id, "{} {}", request.method(), request.uri())
}
_ => info!(%request_id, "{} {}", request.method(), request.uri()),
};
})
.on_response(
|response: &http::Response<_>, latency: Duration, _span: &Span| {
let request_id = response
.headers()
.get(X_REQUEST_ID)
.unwrap()
.to_str()
.unwrap();
info!(
%request_id,
code = response.status().as_u16(),
latency = latency.as_millis()
)
},
),
)
.layer(PropagateRequestIdLayer::x_request_id()),
)
.with_state(compute);
// Add in any testing support
if cfg!(feature = "testing") {
use super::routes::failpoints;
app = app.route("/failpoints", post(failpoints::configure_failpoints))
}
// This usually binds to both IPv4 and IPv6 on Linux, see
// https://github.com/rust-lang/rust/pull/34440 for more information
let addr = SocketAddr::new(IpAddr::from(Ipv6Addr::UNSPECIFIED), port);
let listener = match TcpListener::bind(&addr).await {
Ok(listener) => listener,
Err(e) => {
error!(
"failed to bind the compute_ctl HTTP server to port {}: {}",
port, e
);
return;
}
};
if let Ok(local_addr) = listener.local_addr() {
info!("compute_ctl HTTP server listening on {}", local_addr);
} else {
info!("compute_ctl HTTP server listening on port {}", port);
}
if let Err(e) = axum::serve(listener, app).await {
error!("compute_ctl HTTP server error: {}", e);
}
}
/// Launch a separate HTTP server thread and return its `JoinHandle`.
pub fn launch_http_server(port: u16, state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
let state = Arc::clone(state);
Ok(thread::Builder::new()
.name("http-server".into())
.spawn(move || serve(port, state))?)
}

View File

@@ -1,13 +1,9 @@
use compute_api::responses::{InstalledExtension, InstalledExtensions};
use metrics::proto::MetricFamily;
use std::collections::HashMap;
use std::collections::HashSet;
use tracing::info;
use url::Url;
use anyhow::Result;
use postgres::{Client, NoTls};
use tokio::task;
use metrics::core::Collector;
use metrics::{register_uint_gauge_vec, UIntGaugeVec};
@@ -41,83 +37,77 @@ fn list_dbs(client: &mut Client) -> Result<Vec<String>> {
/// Connect to every database (see list_dbs above) and get the list of installed extensions.
///
/// Same extension can be installed in multiple databases with different versions,
/// we only keep the highest and lowest version across all databases.
pub async fn get_installed_extensions(connstr: Url) -> Result<InstalledExtensions> {
let mut connstr = connstr.clone();
/// so we report a separate metric (number of databases where it is installed)
/// for each extension version.
pub fn get_installed_extensions(mut conf: postgres::config::Config) -> Result<InstalledExtensions> {
conf.application_name("compute_ctl:get_installed_extensions");
let mut client = conf.connect(NoTls)?;
let databases: Vec<String> = list_dbs(&mut client)?;
task::spawn_blocking(move || {
let mut client = Client::connect(connstr.as_str(), NoTls)?;
let databases: Vec<String> = list_dbs(&mut client)?;
let mut extensions_map: HashMap<(String, String, String), InstalledExtension> = HashMap::new();
for db in databases.iter() {
conf.dbname(db);
let mut db_client = conf.connect(NoTls)?;
let extensions: Vec<(String, String, i32)> = db_client
.query(
"SELECT extname, extversion, extowner::integer FROM pg_catalog.pg_extension",
&[],
)?
.iter()
.map(|row| {
(
row.get("extname"),
row.get("extversion"),
row.get("extowner"),
)
})
.collect();
let mut extensions_map: HashMap<String, InstalledExtension> = HashMap::new();
for db in databases.iter() {
connstr.set_path(db);
let mut db_client = Client::connect(connstr.as_str(), NoTls)?;
let extensions: Vec<(String, String)> = db_client
.query(
"SELECT extname, extversion FROM pg_catalog.pg_extension;",
&[],
)?
.iter()
.map(|row| (row.get("extname"), row.get("extversion")))
.collect();
for (extname, v, extowner) in extensions.iter() {
let version = v.to_string();
for (extname, v) in extensions.iter() {
let version = v.to_string();
// check if the extension is owned by superuser
// 10 is the oid of superuser
let owned_by_superuser = if *extowner == 10 { "1" } else { "0" };
// increment the number of databases where the version of extension is installed
INSTALLED_EXTENSIONS
.with_label_values(&[extname, &version])
.inc();
extensions_map
.entry(extname.to_string())
.and_modify(|e| {
e.versions.insert(version.clone());
// count the number of databases where the extension is installed
e.n_databases += 1;
})
.or_insert(InstalledExtension {
extname: extname.to_string(),
versions: HashSet::from([version.clone()]),
n_databases: 1,
});
}
extensions_map
.entry((
extname.to_string(),
version.clone(),
owned_by_superuser.to_string(),
))
.and_modify(|e| {
// count the number of databases where the extension is installed
e.n_databases += 1;
})
.or_insert(InstalledExtension {
extname: extname.to_string(),
version: version.clone(),
n_databases: 1,
owned_by_superuser: owned_by_superuser.to_string(),
});
}
}
let res = InstalledExtensions {
extensions: extensions_map.values().cloned().collect(),
};
for (key, ext) in extensions_map.iter() {
let (extname, version, owned_by_superuser) = key;
let n_databases = ext.n_databases as u64;
Ok(res)
INSTALLED_EXTENSIONS
.with_label_values(&[extname, version, owned_by_superuser])
.set(n_databases);
}
Ok(InstalledExtensions {
extensions: extensions_map.into_values().collect(),
})
.await?
}
// Gather info about installed extensions
pub fn get_installed_extensions_sync(connstr: Url) -> Result<()> {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.expect("failed to create runtime");
let result = rt
.block_on(crate::installed_extensions::get_installed_extensions(
connstr,
))
.expect("failed to get installed extensions");
info!(
"[NEON_EXT_STAT] {}",
serde_json::to_string(&result).expect("failed to serialize extensions list")
);
Ok(())
}
static INSTALLED_EXTENSIONS: Lazy<UIntGaugeVec> = Lazy::new(|| {
register_uint_gauge_vec!(
"compute_installed_extensions",
"Number of databases where the version of extension is installed",
&["extension_name", "version"]
&["extension_name", "version", "owned_by_superuser"]
)
.expect("failed to define a metric")
});

View File

@@ -3,8 +3,6 @@
#![deny(unsafe_code)]
#![deny(clippy::undocumented_unsafe_blocks)]
extern crate hyper0 as hyper;
pub mod checker;
pub mod config;
pub mod configurator;

View File

@@ -1,13 +1,16 @@
use anyhow::{Context, Result};
use postgres::Client;
use fail::fail_point;
use postgres::{Client, Transaction};
use tracing::info;
/// Runs a series of migrations on a target database
pub(crate) struct MigrationRunner<'m> {
client: &'m mut Client,
migrations: &'m [&'m str],
}
impl<'m> MigrationRunner<'m> {
/// Create a new migration runner
pub fn new(client: &'m mut Client, migrations: &'m [&'m str]) -> Self {
// The neon_migration.migration_id::id column is a bigint, which is equivalent to an i64
assert!(migrations.len() + 1 < i64::MAX as usize);
@@ -15,87 +18,110 @@ impl<'m> MigrationRunner<'m> {
Self { client, migrations }
}
/// Get the current value neon_migration.migration_id
fn get_migration_id(&mut self) -> Result<i64> {
let query = "SELECT id FROM neon_migration.migration_id";
let row = self
.client
.query_one(query, &[])
.context("run_migrations get migration_id")?;
.query_one("SELECT id FROM neon_migration.migration_id", &[])?;
Ok(row.get::<&str, i64>("id"))
}
fn update_migration_id(&mut self, migration_id: i64) -> Result<()> {
let setval = format!("UPDATE neon_migration.migration_id SET id={}", migration_id);
/// Update the neon_migration.migration_id value
///
/// This function has a fail point called compute-migration, which can be
/// used if you would like to fail the application of a series of migrations
/// at some point.
fn update_migration_id(txn: &mut Transaction, migration_id: i64) -> Result<()> {
// We use this fail point in order to check that failing in the
// middle of applying a series of migrations fails in an expected
// manner
if cfg!(feature = "testing") {
let fail = (|| {
fail_point!("compute-migration", |fail_migration_id| {
migration_id == fail_migration_id.unwrap().parse::<i64>().unwrap()
});
false
})();
if fail {
return Err(anyhow::anyhow!(format!(
"migration {} was configured to fail because of a failpoint",
migration_id
)));
}
}
txn.query(
"UPDATE neon_migration.migration_id SET id = $1",
&[&migration_id],
)
.with_context(|| format!("update neon_migration.migration_id to {migration_id}"))?;
Ok(())
}
/// Prepare the migrations the target database for handling migrations
fn prepare_database(&mut self) -> Result<()> {
self.client
.simple_query(&setval)
.context("run_migrations update id")?;
.simple_query("CREATE SCHEMA IF NOT EXISTS neon_migration")?;
self.client.simple_query("CREATE TABLE IF NOT EXISTS neon_migration.migration_id (key INT NOT NULL PRIMARY KEY, id bigint NOT NULL DEFAULT 0)")?;
self.client.simple_query(
"INSERT INTO neon_migration.migration_id VALUES (0, 0) ON CONFLICT DO NOTHING",
)?;
self.client
.simple_query("ALTER SCHEMA neon_migration OWNER TO cloud_admin")?;
self.client
.simple_query("REVOKE ALL ON SCHEMA neon_migration FROM PUBLIC")?;
Ok(())
}
fn prepare_migrations(&mut self) -> Result<()> {
let query = "CREATE SCHEMA IF NOT EXISTS neon_migration";
self.client.simple_query(query)?;
/// Run an individual migration
fn run_migration(txn: &mut Transaction, migration_id: i64, migration: &str) -> Result<()> {
if migration.starts_with("-- SKIP") {
info!("Skipping migration id={}", migration_id);
let query = "CREATE TABLE IF NOT EXISTS neon_migration.migration_id (key INT NOT NULL PRIMARY KEY, id bigint NOT NULL DEFAULT 0)";
self.client.simple_query(query)?;
// Even though we are skipping the migration, updating the
// migration ID should help keep logic easy to understand when
// trying to understand the state of a cluster.
Self::update_migration_id(txn, migration_id)?;
} else {
info!("Running migration id={}:\n{}\n", migration_id, migration);
let query = "INSERT INTO neon_migration.migration_id VALUES (0, 0) ON CONFLICT DO NOTHING";
self.client.simple_query(query)?;
txn.simple_query(migration)
.with_context(|| format!("apply migration {migration_id}"))?;
let query = "ALTER SCHEMA neon_migration OWNER TO cloud_admin";
self.client.simple_query(query)?;
let query = "REVOKE ALL ON SCHEMA neon_migration FROM PUBLIC";
self.client.simple_query(query)?;
Self::update_migration_id(txn, migration_id)?;
}
Ok(())
}
/// Run the configured set of migrations
pub fn run_migrations(mut self) -> Result<()> {
self.prepare_migrations()?;
self.prepare_database()
.context("prepare database to handle migrations")?;
let mut current_migration = self.get_migration_id()? as usize;
while current_migration < self.migrations.len() {
macro_rules! migration_id {
($cm:expr) => {
($cm + 1) as i64
};
}
// The index lags the migration ID by 1, so the current migration
// ID is also the next index
let migration_id = (current_migration + 1) as i64;
let migration = self.migrations[current_migration];
let mut txn = self
.client
.transaction()
.with_context(|| format!("begin transaction for migration {migration_id}"))?;
if migration.starts_with("-- SKIP") {
info!("Skipping migration id={}", migration_id!(current_migration));
} else {
info!(
"Running migration id={}:\n{}\n",
migration_id!(current_migration),
migration
);
Self::run_migration(&mut txn, migration_id, self.migrations[current_migration])
.with_context(|| format!("running migration {migration_id}"))?;
self.client
.simple_query("BEGIN")
.context("begin migration")?;
txn.commit()
.with_context(|| format!("commit transaction for migration {migration_id}"))?;
self.client.simple_query(migration).with_context(|| {
format!(
"run_migrations migration id={}",
migration_id!(current_migration)
)
})?;
// Migration IDs start at 1
self.update_migration_id(migration_id!(current_migration))?;
self.client
.simple_query("COMMIT")
.context("commit migration")?;
info!("Finished migration id={}", migration_id!(current_migration));
}
info!("Finished migration id={}", migration_id);
current_migration += 1;
}

View File

@@ -0,0 +1,9 @@
DO $$
DECLARE
bypassrls boolean;
BEGIN
SELECT rolbypassrls INTO bypassrls FROM pg_roles WHERE rolname = 'neon_superuser';
IF NOT bypassrls THEN
RAISE EXCEPTION 'neon_superuser cannot bypass RLS';
END IF;
END $$;

View File

@@ -0,0 +1,25 @@
DO $$
DECLARE
role record;
BEGIN
FOR role IN
SELECT rolname AS name, rolinherit AS inherit
FROM pg_roles
WHERE pg_has_role(rolname, 'neon_superuser', 'member')
LOOP
IF NOT role.inherit THEN
RAISE EXCEPTION '% cannot inherit', quote_ident(role.name);
END IF;
END LOOP;
FOR role IN
SELECT rolname AS name, rolbypassrls AS bypassrls
FROM pg_roles
WHERE NOT pg_has_role(rolname, 'neon_superuser', 'member')
AND NOT starts_with(rolname, 'pg_')
LOOP
IF role.bypassrls THEN
RAISE EXCEPTION '% can bypass RLS', quote_ident(role.name);
END IF;
END LOOP;
END $$;

View File

@@ -0,0 +1,10 @@
DO $$
BEGIN
IF (SELECT current_setting('server_version_num')::numeric < 160000) THEN
RETURN;
END IF;
IF NOT (SELECT pg_has_role('neon_superuser', 'pg_create_subscription', 'member')) THEN
RAISE EXCEPTION 'neon_superuser cannot execute pg_create_subscription';
END IF;
END $$;

View File

@@ -0,0 +1,19 @@
DO $$
DECLARE
monitor record;
BEGIN
SELECT pg_has_role('neon_superuser', 'pg_monitor', 'member') AS member,
admin_option AS admin
INTO monitor
FROM pg_auth_members
WHERE roleid = 'pg_monitor'::regrole
AND member = 'pg_monitor'::regrole;
IF NOT monitor.member THEN
RAISE EXCEPTION 'neon_superuser is not a member of pg_monitor';
END IF;
IF NOT monitor.admin THEN
RAISE EXCEPTION 'neon_superuser cannot grant pg_monitor';
END IF;
END $$;

View File

@@ -0,0 +1,2 @@
-- This test was never written becuase at the time migration tests were added
-- the accompanying migration was already skipped.

View File

@@ -0,0 +1,2 @@
-- This test was never written becuase at the time migration tests were added
-- the accompanying migration was already skipped.

View File

@@ -0,0 +1,2 @@
-- This test was never written becuase at the time migration tests were added
-- the accompanying migration was already skipped.

View File

@@ -0,0 +1,2 @@
-- This test was never written becuase at the time migration tests were added
-- the accompanying migration was already skipped.

View File

@@ -0,0 +1,2 @@
-- This test was never written becuase at the time migration tests were added
-- the accompanying migration was already skipped.

View File

@@ -0,0 +1,13 @@
DO $$
DECLARE
can_execute boolean;
BEGIN
SELECT bool_and(has_function_privilege('neon_superuser', oid, 'execute'))
INTO can_execute
FROM pg_proc
WHERE proname IN ('pg_export_snapshot', 'pg_log_standby_snapshot')
AND pronamespace = 'pg_catalog'::regnamespace;
IF NOT can_execute THEN
RAISE EXCEPTION 'neon_superuser cannot execute both pg_export_snapshot and pg_log_standby_snapshot';
END IF;
END $$;

View File

@@ -0,0 +1,13 @@
DO $$
DECLARE
can_execute boolean;
BEGIN
SELECT has_function_privilege('neon_superuser', oid, 'execute')
INTO can_execute
FROM pg_proc
WHERE proname = 'pg_show_replication_origin_status'
AND pronamespace = 'pg_catalog'::regnamespace;
IF NOT can_execute THEN
RAISE EXCEPTION 'neon_superuser cannot execute pg_show_replication_origin_status';
END IF;
END $$;

View File

@@ -17,11 +17,8 @@ const MONITOR_CHECK_INTERVAL: Duration = Duration::from_millis(500);
// should be handled gracefully.
fn watch_compute_activity(compute: &ComputeNode) {
// Suppose that `connstr` doesn't change
let mut connstr = compute.connstr.clone();
connstr
.query_pairs_mut()
.append_pair("application_name", "compute_activity_monitor");
let connstr = connstr.as_str();
let connstr = compute.connstr.clone();
let conf = compute.get_conn_conf(Some("compute_ctl:activity_monitor"));
// During startup and configuration we connect to every Postgres database,
// but we don't want to count this as some user activity. So wait until
@@ -29,7 +26,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
wait_for_postgres_start(compute);
// Define `client` outside of the loop to reuse existing connection if it's active.
let mut client = Client::connect(connstr, NoTls);
let mut client = conf.connect(NoTls);
let mut sleep = false;
let mut prev_active_time: Option<f64> = None;
@@ -57,7 +54,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
info!("connection to Postgres is closed, trying to reconnect");
// Connection is closed, reconnect and try again.
client = Client::connect(connstr, NoTls);
client = conf.connect(NoTls);
continue;
}
@@ -196,7 +193,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
debug!("could not connect to Postgres: {}, retrying", e);
// Establish a new connection and try again.
client = Client::connect(connstr, NoTls);
client = conf.connect(NoTls);
}
}
}

View File

@@ -6,6 +6,7 @@ use std::io::{BufRead, BufReader};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::process::Child;
use std::str::FromStr;
use std::thread::JoinHandle;
use std::time::{Duration, Instant};
@@ -13,8 +14,10 @@ use anyhow::{bail, Result};
use futures::StreamExt;
use ini::Ini;
use notify::{RecursiveMode, Watcher};
use postgres::config::Config;
use tokio::io::AsyncBufReadExt;
use tokio::time::timeout;
use tokio_postgres;
use tokio_postgres::NoTls;
use tracing::{debug, error, info, instrument};
@@ -542,3 +545,11 @@ async fn handle_postgres_logs_async(stderr: tokio::process::ChildStderr) -> Resu
Ok(())
}
/// `Postgres::config::Config` handles database names with whitespaces
/// and special characters properly.
pub fn postgres_conf_for_db(connstr: &url::Url, dbname: &str) -> Result<Config> {
let mut conf = Config::from_str(connstr.as_str())?;
conf.dbname(dbname);
Ok(conf)
}

View File

@@ -47,6 +47,7 @@ pub enum PerDatabasePhase {
DeleteDBRoleReferences,
ChangeSchemaPerms,
HandleAnonExtension,
DropSubscriptionsForDeletedDatabases,
}
#[derive(Clone, Debug)]
@@ -74,7 +75,7 @@ pub struct MutableApplyContext {
pub dbs: HashMap<String, Database>,
}
/// Appply the operations that belong to the given spec apply phase.
/// Apply the operations that belong to the given spec apply phase.
///
/// Commands within a single phase are executed in order of Iterator yield.
/// Commands of ApplySpecPhase::RunInEachDatabase will execute in the database
@@ -326,13 +327,12 @@ async fn get_operations<'a>(
// Use FORCE to drop database even if there are active connections.
// We run this from `cloud_admin`, so it should have enough privileges.
//
// NB: there could be other db states, which prevent us from dropping
// the database. For example, if db is used by any active subscription
// or replication slot.
// TODO: deal with it once we allow logical replication. Proper fix should
// involve returning an error code to the control plane, so it could
// figure out that this is a non-retryable error, return it to the user
// and fail operation permanently.
// Such cases are handled in the DropSubscriptionsForDeletedDatabases
// phase. We do all the cleanup before actually dropping the database.
let drop_db_query: String = format!(
"DROP DATABASE IF EXISTS {} WITH (FORCE)",
&op.name.pg_quote()
@@ -444,6 +444,30 @@ async fn get_operations<'a>(
}
ApplySpecPhase::RunInEachDatabase { db, subphase } => {
match subphase {
PerDatabasePhase::DropSubscriptionsForDeletedDatabases => {
match &db {
DB::UserDB(db) => {
let drop_subscription_query: String = format!(
include_str!("sql/drop_subscription_for_drop_dbs.sql"),
datname_str = escape_literal(&db.name),
);
let operations = vec![Operation {
query: drop_subscription_query,
comment: Some(format!(
"optionally dropping subscriptions for DB {}",
db.name,
)),
}]
.into_iter();
Ok(Box::new(operations))
}
// skip this cleanup for the system databases
// because users can't drop them
DB::SystemDB => Ok(Box::new(empty())),
}
}
PerDatabasePhase::DeleteDBRoleReferences => {
let ctx = ctx.read().await;
@@ -474,7 +498,19 @@ async fn get_operations<'a>(
),
comment: None,
},
// Revoke some potentially blocking privileges (Neon-specific currently)
Operation {
query: format!(
include_str!("sql/pre_drop_role_revoke_privileges.sql"),
role_name = quoted,
),
comment: None,
},
// This now will only drop privileges of the role
// TODO: this is obviously not 100% true because of the above case,
// there could be still some privileges that are not revoked. Maybe this
// only drops privileges that were granted *by this* role, not *to this* role,
// but this has to be checked.
Operation {
query: format!("DROP OWNED BY {}", quoted),
comment: None,

View File

@@ -0,0 +1,11 @@
DO $$
DECLARE
subname TEXT;
BEGIN
FOR subname IN SELECT pg_subscription.subname FROM pg_subscription WHERE subdbid = (SELECT oid FROM pg_database WHERE datname = {datname_str}) LOOP
EXECUTE format('ALTER SUBSCRIPTION %I DISABLE;', subname);
EXECUTE format('ALTER SUBSCRIPTION %I SET (slot_name = NONE);', subname);
EXECUTE format('DROP SUBSCRIPTION %I;', subname);
END LOOP;
END;
$$;

View File

@@ -0,0 +1,28 @@
SET SESSION ROLE neon_superuser;
DO $$
DECLARE
schema TEXT;
revoke_query TEXT;
BEGIN
FOR schema IN
SELECT schema_name
FROM information_schema.schemata
-- So far, we only had issues with 'public' schema. Probably, because we do some additional grants,
-- e.g., make DB owner the owner of 'public' schema automatically (when created via API).
-- See https://github.com/neondatabase/cloud/issues/13582 for the context.
-- Still, keep the loop because i) it efficiently handles the case when there is no 'public' schema,
-- ii) it's easy to add more schemas to the list if needed.
WHERE schema_name IN ('public')
LOOP
revoke_query := format(
'REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA %I FROM {role_name} GRANTED BY neon_superuser;',
schema
);
EXECUTE revoke_query;
END LOOP;
END;
$$;
RESET ROLE;

View File

@@ -274,6 +274,7 @@ fn fill_remote_storage_secrets_vars(mut cmd: &mut Command) -> &mut Command {
for env_key in [
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"AWS_PROFILE",
// HOME is needed in combination with `AWS_PROFILE` to pick up the SSO sessions.
"HOME",

Some files were not shown because too many files have changed in this diff Show More