Compare commits

...

1504 Commits

Author SHA1 Message Date
Konstantin Knizhnik
8de5b63ac4 Remove assert for non-zero number of shards from AssignPageserverConnstring 2023-12-20 09:16:54 +02:00
Konstantin Knizhnik
338bf7a446 Bump postgres version 2023-12-19 22:17:07 +02:00
Konstantin Knizhnik
767ce2b187 Bump Postgres version 2023-12-19 16:21:15 +02:00
Konstantin Knizhnik
80bf8d4761 Allow empty connection string 2023-12-19 14:45:25 +02:00
Konstantin Knizhnik
c2b396905f Merge with main 2023-12-19 10:20:56 +02:00
Konstantin Knizhnik
b43ae6e26c Address review comments 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
151870b666 Update pgxn/neon/libpagestore.c
Co-authored-by: Sasha Krassovsky <sasha@neon.tech>
2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
4cdfff75f2 Update pgxn/neon/libpagestore.c
Co-authored-by: Sasha Krassovsky <sasha@neon.tech>
2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
02abe3c82e Fix problem with stats collector at pg14 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
5c8b3d997f Add [NEON_SMGR] to all messages produced by Neon exrtension 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
b465185738 Add [NEON_SMGR] to all messages produced by Neon exrtension 2023-12-19 09:22:42 +02:00
John Spray
4e47d3a984 pgxn: amend key hashing 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
3853646c75 [see #6052] make connection logging shard-aware 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
86c76d4b59 Fix shard map reload synchronization 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
864d5f84e9 Fix shard map reload mechanism 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
38f64ea8ef Load shard map only at postmaster 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
2855074f55 Fix comments 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
16b0348c1f Do not deop PS connections of config reload if connection strings are not changed 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
93c75c97e1 Fix shard hash caclulation 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
d3221243f8 Minor refectoring 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
3cbd2df0b3 Add neon.stripe_size 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
8ddfa0515e Undo occsional changed in control_place_connector.c 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
3019f8fe8f Merge with main 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
e720b5f9d4 Load shardmap from postgresql.conf 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
039fa60446 Take in account stripe size when calculating shard hash number 2023-12-19 09:22:42 +02:00
Konstantin Knizhnik
7ca20dd12b Add support for PS shardoing in compute 2023-12-19 09:22:42 +02:00
Anna Khanova
6e6e40dd7f Invalidate credentials on auth failure (#6171)
## Problem

If the user reset password, cache could receive this information only
after `ttl` minutes.

## Summary of changes

Invalidate password on auth failure.
2023-12-18 23:24:22 +01:00
Heikki Linnakangas
6939fc3db6 Remove declarations of non-existent global variables and functions
FileCacheMonitorMain was removed in commit b497d0094e.
2023-12-18 21:05:31 +02:00
Heikki Linnakangas
c4c48cfd63 Clean up #includes
- No need to include c.h, port.h or pg_config.h, they are included in
  postgres.h
- No need to include postgres.h in header files. Instead, the
  assumption in PostgreSQL is that all .c files include postgres.h.
- Reorder includes to alphabetical order, and system headers before
  pgsql headers
- Remove bunch of other unnecessary includes that got copy-pasted from
  one source file to another
2023-12-18 21:05:29 +02:00
Heikki Linnakangas
82215d20b0 Mark some variables 'static'
Move initialization of neon_redo_read_buffer_filter. This allows
marking it 'static', too.
2023-12-18 21:05:24 +02:00
Sasha Krassovsky
62737f3776 Grant BYPASSRLS and REPLICATION explicitly to neon_superuser roles 2023-12-18 10:54:14 -08:00
Christian Schwarz
1f9a7d1cd0 add a Rust client for Pageserver page_service (#6128)
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771

Stacked atop https://github.com/neondatabase/neon/pull/6145
2023-12-18 18:17:19 +00:00
John Spray
4ea4812ab2 tests: update python dependencies (#6164)
## Problem

Existing dependencies didn't work on Fedora 39 (python 3.12)

## Summary of changes

- Update pyyaml 6.0 -> 6.0.1
- Update yarl 1.8.2->1.9.4
- Update the `dnf install` line in README to include dependencies of
python packages (unrelated to upgrades, just noticed absences while
doing fresh pysync run)
2023-12-18 15:47:09 +00:00
Anna Khanova
00d90ce76a Added cache for get role secret (#6165)
## Problem

Currently if we are getting many consecutive connections to the same
user/ep we will send a lot of traffic to the console.

## Summary of changes

Cache with ttl=4min proxy_get_role_secret response.

Note: this is the temporary hack, notifier listener is WIP.
2023-12-18 16:04:47 +01:00
John Khvatov
33cb9a68f7 pageserver: Reduce tracing overhead in timeline::get (#6115)
## Problem

Compaction process (specifically the image layer reconstructions part)
is lagging behind wal ingest (at speed ~10-15MB/s) for medium-sized
tenants (30-50GB). CPU profile shows that significant amount of time
(see flamegraph) is being spent in `tracing::span::Span::new`.

mainline (commit: 0ba4cae491):

![reconstruct-mainline-0ba4cae491c2](https://github.com/neondatabase/neon/assets/289788/ebfd262e-5c97-4858-80c7-664a1dbcc59d)

## Summary of changes

By lowering the tracing level in get_value_reconstruct_data and
get_or_maybe_download from info to debug, we can reduce the overhead of
span creation in prod environments. On my system, this sped up the image
reconstruction process by 60% (from 14500 to 23160 page reconstruction
per sec)

pr:

![reconstruct-opt-2](https://github.com/neondatabase/neon/assets/289788/563a159b-8f2f-4300-b0a1-6cd66e7df769)


`create_image_layers()` (it's 1 CPU bound here) mainline vs pr:

![image](https://github.com/neondatabase/neon/assets/289788/a981e3cb-6df9-4882-8a94-95e99c35aa83)
2023-12-18 13:33:23 +00:00
Conrad Ludgate
17bde7eda5 proxy refactor large files (#6153)
## Problem

The `src/proxy.rs` file is far too large

## Summary of changes

Creates 3 new files:
```
src/metrics.rs
src/proxy/retry.rs
src/proxy/connect_compute.rs
```
2023-12-18 10:59:49 +00:00
John Spray
dbdb1d21f2 pageserver: on-demand activation cleanups (#6157)
## Problem

#6112 added some logs and metrics: clean these up a bit:
- Avoid counting startup completions for tenants launched after startup
- exclude no-op cases from timing histograms 
- remove a rogue log messages
2023-12-18 10:29:19 +00:00
Arseny Sher
e1935f42a1 Don't generate core dump when walproposer intentionally panics.
Walproposer sometimes intentionally PANICs when its term is defeated as the
basebackup is likely spoiled by that time. We don't want core dumped in this
case.
2023-12-18 11:03:34 +04:00
Alexander Bayandin
9bdc25f0af Revert "CI: build build-tools image" (#6156)
It turns out the issue with skipped jobs is not so trivial (because
Github checks jobs transitively), a possible workaround with `if:
always() && contains(fromJSON('["success", "skipped"]'),
needs.build-buildtools-image.result)` will tangle the workflow really
bad. We'll need to come up with a better solution.

To unblock the main I'm going to revert
https://github.com/neondatabase/neon/pull/6082.
2023-12-16 12:32:00 +00:00
Christian Schwarz
47873470db pageserver: add method to dump keyspace in mgmt api client (#6145)
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771
2023-12-16 10:52:48 +00:00
Abhijeet Patil
8619e6295a CI: build build-tools image (#6082)
## Currently our build docker file is located in the build repo it makes
sense to have it as a part of our neon repo

## Summary of changes
We had the docker file that we use to build our binary and other tools
resided in the build repo
It made sense to bring the docker file to its repo where it has been
used
So that the contributors can also view it and amend if required
It will reduce the maintenance. Docker file changes and code changes can
be accommodated in same PR
Also, building the image and pushing it to ECR is abstracted in a
reusable workflow. Ideal is to use that for any other jobs too

## Checklist before requesting a review

- [x] Moved the docker file used to build the binary from the build repo
to the neon repo
- [x] adding gh workflow to build and push the image
- [x] adding gh workflow to tag the pushed image
- [x] update readMe file

---------

Co-authored-by: Abhijeet Patil <abhijeet@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-12-16 10:33:52 +00:00
Conrad Ludgate
83811491da update zerocopy (#6148)
## Problem

https://github.com/neondatabase/neon/security/dependabot/48

```
$ cargo tree -i zerocopy
zerocopy v0.7.3
└── ahash v0.8.5
    └── hashbrown v0.13.2
```

ahash doesn't use the affected APIs we we are not vulnerable but best to
update to silence the alert anyway

## Summary of changes

```
$ cargo update -p zerocopy --precise 0.7.31
    Updating crates.io index
    Updating syn v2.0.28 -> v2.0.32
    Updating zerocopy v0.7.3 -> v0.7.31
    Updating zerocopy-derive v0.7.3 -> v0.7.31
```
2023-12-16 09:06:00 +00:00
John Spray
d066dad84b pageserver: prioritize activation of tenants with client requests (#6112)
## Problem

During startup, a client request might have to wait a long time while
the system is busy initializing all the attached tenants, even though
most of the attached tenants probably don't have any client requests to
service, and could wait a bit.

## Summary of changes

- Add a semaphore to limit how many Tenant::spawn()s may concurrently do
I/O to attach their tenant (i.e. read indices from remote storage, scan
local layer files, etc).
- Add Tenant::activate_now, a hook for kicking a tenant in its spawn()
method to skip waiting for the warmup semaphore
- For tenants that attached via warmup semaphore units, wait for logical
size calculation to complete before dropping the warmup units
- Set Tenant::activate_now in `get_active_tenant_with_timeout` (the page
service's path for getting a reference to a tenant).
- Wait for tenant activation in HTTP handlers for timeline creation and
deletion: like page service requests, these require an active tenant and
should prioritize activation if called.
2023-12-15 20:37:47 +00:00
John Spray
56f7d55ba7 pageserver: basic cancel/timeout for remote storage operations (#6097)
## Problem

Various places in remote storage were not subject to a timeout (thereby
stuck TCP connections could hold things up), and did not respect a
cancellation token (so things like timeline deletion or tenant detach
would have to wait arbitrarily long).



## Summary of changes

- Add download_cancellable and upload_cancellable helpers, and use them
in all the places we wait for remote storage operations (with the
exception of initdb downloads, where it would not have been safe).
- Add a cancellation token arg to `download_retry`.
- Use cancellation token args in various places that were missing one
per #5066

Closes: #5066 

Why is this only "basic" handling?
- Doesn't express difference between shutdown and errors in return
types, to avoid refactoring all the places that use an anyhow::Error
(these should all eventually return a more structured error type)
- Implements timeouts on top of remote storage, rather than within it:
this means that operations hitting their timeout will lose their
semaphore permit and thereby go to the back of the queue for their
retry.
- Doing a nicer job is tracked in
https://github.com/neondatabase/neon/issues/6096
2023-12-15 17:43:02 +00:00
Christian Schwarz
1a9854bfb7 add a Rust client for Pageserver management API (#6127)
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771

This PR moves the control plane's spread-all-over-the-place client for
the pageserver management API into a separate module within the
pageserver crate.

I need that client to be async in my benchmarking work, so, this PR
switches to the async version of `reqwest`.
That is also the right direction generally IMO.

The switch to async in turn mandated converting most of the
`control_plane/` code to async.

Note that some of the client methods should be taking `TenantShardId`
instead of `TenantId`, but, none of the callers seem to be
sharding-aware.
Leaving that for another time:
https://github.com/neondatabase/neon/issues/6154
2023-12-15 18:33:45 +01:00
John Spray
de1a9c6e3b s3_scrubber: basic support for sharding (#6119)
This doesn't make the scrubber smart enough to understand that many
shards are part of the same tenants, but it makes it understand paths
well enough to scrub the individual shards without thinking they're
malformed.

This is a prerequisite to being able to run tests with sharding enabled.

Related: #5929
2023-12-15 15:48:55 +00:00
Arseny Sher
e62569a878 A few comments on rust walproposer build. 2023-12-15 19:31:51 +04:00
John Spray
bd1cb1b217 tests: update allow list for negative_env (#6144)
Tests attaching the tenant immediately after the fixture detaches it
could result in LSN updates failing validation

e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6142/7211196140/index.html#suites/7745dadbd815ab87f5798aa881796f47/32b12ccc0b01b122
2023-12-15 15:08:28 +00:00
Conrad Ludgate
98629841e0 improve proxy code cov (#6141)
## Summary of changes

saw some low-hanging codecov improvements. even if code coverage is
somewhat of a pointless game, might as well add tests where we can and
delete code if it's unused
2023-12-15 12:11:50 +00:00
Arpad Müller
215cdd18c4 Make initdb upload retries cancellable and seek to beginning (#6147)
* initdb uploads had no cancellation token, which means that when we
were stuck in upload retries, we wouldn't be able to delete the
timeline. in general, the combination of retrying forever and not having
cancellation tokens is quite dangerous.
* initdb uploads wouldn't rewind the file. this wasn't discovered in the
purposefully unreliable test-s3 in pytest because those fail on the
first byte always, not somewhere during the connection. we'd be getting
errors from the AWS sdk that the file was at an unexpected end.

slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1702632247784079
2023-12-15 12:11:25 +00:00
Joonas Koivunen
0fd80484a9 fix: Timeline deletion during busy startup (#6133)
Compaction was holding back timeline deletion because the compaction
lock had been acquired, but the semaphore was waited on. Timeline
deletion was waiting on the same lock for 1500s.

This replaces the
`pageserver::tenant::tasks::concurrent_background_tasks_rate_limit`
(which looks correct) with a simpler `..._permit` which is just an
infallible acquire, which is easier to spot "aah this needs to be raced
with cancellation tokens".

Ref: https://neondb.slack.com/archives/C03F5SM1N02/p1702496912904719
Ref: https://neondb.slack.com/archives/C03F5SM1N02/p1702578093497779
2023-12-15 11:59:24 +00:00
Joonas Koivunen
07508fb110 fix: better Json parsing errors (#6135)
Before any json parsing from the http api only returned errors were per
field errors. Now they are done using `serde_path_to_error`, which at
least helped greatly with the `disk_usage_eviction_run` used for
testing. I don't think this can conflict with anything added in #5310.
2023-12-15 12:18:22 +02:00
Arseny Sher
5bb9ba37cc Fix python list_segments of sk.
Fixes rare test_peer_recovery flakiness as we started to compare tmp control
file.

https://neondb.slack.com/archives/C04KGFVUWUQ/p1702310929657179
2023-12-15 13:43:11 +04:00
John Spray
f1cd1a2122 pageserver: improved handling of concurrent timeline creations on the same ID (#6139)
## Problem

Historically, the pageserver used an "uninit mark" file on disk for two
purposes:
- Track which timeline dirs are incomplete for handling on restart
- Avoid trying to create the same timeline twice at the same time.

The original purpose of handling restarts is now defunct, as we use
remote storage as the source of truth and clean up any trash timeline
dirs on startup. Using the file to mutually exclude creation operations
is error prone compared with just doing it in memory, and the existing
checks happened some way into the creation operation, and could expose
errors as 500s (anyhow::Errors) rather than something clean.

## Summary of changes

- Creations are now mutually excluded in memory (using
`Tenant::timelines_creating`), rather than relying on a file on disk for
coordination.
- Acquiring unique access to the timeline ID now happens earlier in the
request.
- Creating the same timeline which already exists is now a 201: this
simplifies retry handling for clients.
- 409 is still returned if a timeline with the same ID is still being
created: if this happens it is probably because the client timed out an
earlier request and has retried.
- Colliding timeline creation requests should no longer return 500
errors

This paves the way to entirely removing uninit markers in a subsequent
change.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-12-15 08:51:23 +00:00
Joonas Koivunen
f010479107 feat(layer): pageserver_layer_redownloaded_after histogram (#6132)
this is aimed at replacing the current mtime only based trashing
alerting later.

Cc: #5331
2023-12-14 21:32:54 +02:00
Conrad Ludgate
cc633585dc gauge guards (#6138)
## Problem

The websockets gauge for active db connections seems to be growing more
than the gauge for client connections over websockets, which does not
make sense.

## Summary of changes

refactor how our counter-pair gauges are represented. not sure if this
will improve the problem, but it should be harder to mess-up the
counters. The API is much nicer though now and doesn't require
scopeguard::defer hacks
2023-12-14 17:21:39 +00:00
Christian Schwarz
aa5581d14f utils::logging: TracingEventCountLayer: don't use with_label_values() on hot path (#6129)
fixes #6126
2023-12-14 16:31:41 +01:00
John Spray
c4e0ef507f pageserver: heatmap uploads (#6050)
Dependency (commits inline):
https://github.com/neondatabase/neon/pull/5842

## Problem

Secondary mode tenants need a manifest of what to download. Ultimately
this will be some kind of heat-scored set of layers, but as a robust
first step we will simply use the set of resident layers: secondary
tenant locations will aim to match the on-disk content of the attached
location.

## Summary of changes

- Add heatmap types representing the remote structure
- Add hooks to Tenant/Timeline for generating these heatmaps
- Create a new `HeatmapUploader` type that is external to `Tenant`, and
responsible for walking the list of attached tenants and scheduling
heatmap uploads.

Notes to reviewers:
- Putting the logic for uploads (and later, secondary mode downloads)
outside of `Tenant` is an opinionated choice, motivated by:
- Enable future smarter scheduling of operations, e.g. uploading the
stalest tenant first, rather than having all tenants compete for a fair
semaphore on a first-come-first-served basis. Similarly for downloads,
we may wish to schedule the tenants with the hottest un-downloaded
layers first.
- Enable accessing upload-related state without synchronization (it
belongs to HeatmapUploader, rather than being some Mutex<>'d part of
Tenant)
- Avoid further expanding the scope of Tenant/Timeline types, which are
already among the largest in the codebase
- You might reasonably wonder how much of the uploader code could be a
generic job manager thing. Probably some of it: but let's defer pulling
that out until we have at least two users (perhaps secondary downloads
will be the second one) to highlight which bits are really generic.

Compromises:
- Later, instead of using digests of heatmaps to decide whether anything
changed, I would prefer to avoid walking the layers in tenants that
don't have changes: tracking that will be a bit invasive, as it needs
input from both remote_timeline_client and Layer.
2023-12-14 13:09:24 +00:00
Conrad Ludgate
6987b5c44e proxy: add more rates to endpoint limiter (#6130)
## Problem

Single rate bucket is limited in usefulness

## Summary of changes

Introduce a secondary bucket allowing an average of 200 requests per
second over 1 minute, and a tertiary bucket allowing an average of 100
requests per second over 10 minutes.

Configured by using a format like

```sh
proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 100@10s --endpoint-rps-limit 50@1m
```

If the bucket limits are inconsistent, an error is returned on startup

```
$ proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 10@10s
Error: invalid endpoint RPS limits. 10@10s allows fewer requests per bucket than 300@1s (100 vs 300)
```
2023-12-13 21:43:49 +00:00
Alexander Bayandin
0cd49cac84 test_compatibility: make it use initdb.tar.zst 2023-12-13 15:04:25 -06:00
Alexander Bayandin
904dff58b5 test_wal_restore_http: cleanup test 2023-12-13 15:04:25 -06:00
Arthur Petukhovsky
f401a21cf6 Fix test_simple_sync_safekeepers
There is a postgres 16 version encoded in a binary message.
2023-12-13 15:04:25 -06:00
Tristan Partin
158adf602e Update Postgres 16 series to 16.1 2023-12-13 15:04:25 -06:00
Tristan Partin
c94db6adbb Update Postgres 15 series to 15.5 2023-12-13 15:04:25 -06:00
Tristan Partin
85720616b1 Update Postgres 14 series to 14.10 2023-12-13 15:04:25 -06:00
George MacKerron
d6fcc18eb2 Add Neon-Batch- headers to OPTIONS response for SQL-over-HTTP requests (#6116)
This is needed to allow use of batch queries from browsers.

## Problem

SQL-over-HTTP batch queries fail from web browsers because the relevant
headers, `Neon-Batch-isolation-Level` and `Neon-Batch-Read-Only`, are
not included in the server's OPTIONS response. I think we simply forgot
to add them when implementing the batch query feature.

## Summary of changes

Added `Neon-Batch-Isolation-Level` and `Neon-Batch-Read-Only` to the
OPTIONS response.
2023-12-13 17:18:20 +00:00
Vadim Kharitonov
c2528ae671 Increase pgbouncer pool size to 64 for VMs (#6124)
The pool size was changed for pods
(https://github.com/neondatabase/cloud/pull/8057). The idea to increase
it for VMs too
2023-12-13 16:23:24 +00:00
Joonas Koivunen
a919b863d1 refactor: remove eviction batching (#6060)
We no longer have `layer_removal_cs` since #5108, we no longer need
batching.
2023-12-13 18:05:33 +02:00
Joonas Koivunen
2d22661061 refactor: calculate_synthetic_size_worker, remove PRE::NeedsDownload (#6111)
Changes I wanted to make on #6106 but decided to leave out to keep that
commit clean for including in the #6090. Finally remove
`PageReconstructionError::NeedsDownload`.
2023-12-13 14:23:19 +00:00
John Spray
e3778381a8 tests: make test_bulk_insert recreate tenant in same generation (#6113)
## Problem

Test deletes tenant and recreates with the same ID. The recreation bumps
generation number. This could lead to stale generation warnings in the
logs.

## Summary of changes

Handle this more gracefully by re-creating in the same generation that
the tenant was previously attached in.

We could also update the tenant delete path to have the attachment
service to drop tenant state on delete, but I like having it there: it
makes debug easier, and the only time it's a problem is when a test is
re-using a tenant ID after deletion.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-12-13 14:14:38 +00:00
Conrad Ludgate
c8316b7a3f simplify endpoint limiter (#6122)
## Problem

1. Using chrono for durations only is wasteful
2. The arc/mutex was not being utilised
3. Locking every shard in the dashmap every GC could cause latency
spikes
4. More buckets

## Summary of changes

1. Use `Instant` instead of `NaiveTime`.
2. Remove the `Arc<Mutex<_>>` wrapper, utilising that dashmap entry
returns mut access
3. Clear only a random shard, update gc interval accordingly
4. Multiple buckets can be checked before allowing access

When I benchmarked the check function, it took on average 811ns when
multithreaded over the course of 10 million checks.
2023-12-13 13:53:23 +00:00
Stas Kelvich
8460654f61 Add per-endpoint rate limiter to proxy 2023-12-13 07:03:21 +02:00
Arpad Müller
7c2c87a5ab Update azure SDK to 0.18 and use open range support (#6103)
* Update `azure-*` crates to 0.18
* Use new open ranges support added by upstream in
https://github.com/Azure/azure-sdk-for-rust/pull/1482

Part of #5567. Prior update PR: #6081
2023-12-12 18:20:12 +01:00
Arpad Müller
5820faaa87 Use extend instead of groups of append calls in tests (#6109)
Repeated calls to `.append` don't line up as nicely as they might get
formatted in different ways. Also, it is more characters and the lines
might be longer.

Saw this while working on #5912.
2023-12-12 18:00:37 +01:00
John Spray
dfb0a6fdaf scrubber: handle initdb files, fix an issue with prefixes (#6079)
- The code for calculating the prefix in the bucket was expecting a
trailing slash (as it is in the tests), but that's an awkward
expectation to impose for use in the field: make the code more flexible
by only trimming a trailing character if it is indeed a slash.
- initdb archives were detected by the scrubber as malformed layer
files. Teach it to recognize and ignore them.
2023-12-12 16:53:08 +00:00
Alexander Bayandin
6acbee2368 test_runner: add from_repo_dir method (#6087)
## Problem

We need a reliable way to restore a project state (in this context, I
mean data on pageservers, safekeepers, and remote storage) from a
snapshot. The existing method (that we use in `test_compatibility`)
heavily relies on config files, which makes it harder to add/change
fields in the config.
The proposed solution uses config file only to get `default_tenant_id`
and `branch_name_mappings`.

## Summary of changes
- Add `NeonEnvBuilder#from_repo_dir` method, which allows using the
`neon_env_builder` fixture with data from a snapshot.
- Use `NeonEnvBuilder#from_repo_dir` in compatibility tests

Requires for https://github.com/neondatabase/neon/issues/6033
2023-12-12 16:24:13 +00:00
Konstantin Knizhnik
aec1acdbac Do not inherite replication slots in branch (#5898)
## Problem
 
See 
https://github.com/neondatabase/company_projects/issues/111
https://neondb.slack.com/archives/C03H1K0PGKH/p1700166126954079


## Summary of changes

Do not search for AUX_FILES_KEY in parent timelines

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2023-12-12 14:24:21 +02:00
Konstantin Knizhnik
8bb4a13192 Do not materialize null images in PS (#5979)
## Problem

PG16 is writing null images during relation extension.
And page server implements optimisation which replace WAL record with
FPI with page image.
So instead of WAL record ~30 bytes we store 8kb null page image.
Ans this image is almost useless, because most likely it will be shortly
rewritten with actual page content.

## Summary of changes

Do not materialize wal records with null page FPI.
 
## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-12-12 14:23:45 +02:00
Anna Khanova
9e071e4458 Propagate information about the protocol to console (#6102)
## Problem

In snowflake logs currently there is no information about the protocol,
that the client uses.

## Summary of changes

Propagate the information about the protocol together with the app_name.
In format: `{app_name}/{sql_over_http/tcp/ws}`.

This will give to @stepashka more observability on what our clients are
using.
2023-12-12 11:42:51 +00:00
John Spray
fead836f26 swagger: remove 'format: hex' from tenant IDs (#6099)
## Problem

TenantId is changing to TenantShardId in many APIs. The swagger had
`format: hex` attributes on some of these IDs. That isn't formally
defined anywhere, but a reasonable person might think it means "hex
digits only", which will no longer be the case once we start using
shard-aware IDs (they're like `<tenant_id>-0001`).



## Summary of changes

- Remove these `format` attributes from all `tenant_id` fields in the
swagger definition
2023-12-12 10:39:34 +00:00
John Spray
20e9cf7d31 pageserver: tweaks to slow/hung task logging (#6098)
## Problem

- `shutdown_tasks` would log when a particular task was taking a long
time to shut down, but not when it eventually completed. That left one
uncertain as to whether the slow task was the source of a hang, or just
a precursor.

## Summary of changes

- Add a log line after a slow task shutdown
- Add an equivalent in Gate's `warn_if_stuck`, in case we ever need it.
This isn't related to the original issue but was noticed when checking
through these logging paths.
2023-12-12 07:19:59 +00:00
Joonas Koivunen
3b04f3a749 fix: accidential return Ok (#6106)
Error indicating request cancellation OR timeline shutdown was deemed as
a reason to exit the background worker that calculated synthetic size.
Fix it to only be considered for avoiding logging such of such errors.
2023-12-11 21:27:53 +00:00
Arpad Müller
c49fd69bd6 Add initdb_lsn to TimelineInfo (#6104)
This way, we can query it.

Background: I want to do statistics for how reproducible `initdb_lsn`
really is, see https://github.com/neondatabase/cloud/issues/8284 and
https://neondb.slack.com/archives/C036U0GRMRB/p1701895218280269
2023-12-11 21:08:14 +00:00
Tristan Partin
5ab9592a2d Add submodule paths as safe directories as a precaution
The check-codestyle-rust-arm job requires this for some reason, so let's
just add them everywhere we do this workaround.
2023-12-11 13:08:37 -06:00
Tristan Partin
036558c956 Fix git ownership issue in check-codestyle-rust-arm
We have this workaround for other jobs. Looks like this one was
forgotten about.
2023-12-11 13:08:37 -06:00
John Spray
6a922b1a75 tests: start adding tests for secondary mode, live migration (#5842)
These tests have been loitering on a branch of mine for a while: they
already provide value even without all the secondary mode bits landed
yet, and the Workload helper is handy for other tests too.

- `Workload` is a re-usable test workload that replaces some of the
arbitrary "write a few rows" SQL that I've found my self repeating, and
adds a systematic way to append data and check that reads properly
reflect the changes. This append+validate stuff is important when doing
migrations, as we want to detect situations where we might be reading
from a pageserver that has not properly seen latest changes.
- test_multi_attach is a validation of how the pageserver handles
attaching the same tenant to multiple pageservers, from a safety point
of view. This is intentionally separate from the larger testing of
migration, to provide an isolated environment for multi-attachment.
- test_location_conf_churn is a pseudo-random walk through the various
states that TenantSlot can be put into, with validation that attached
tenants remain externally readable when they should, and as a side
effect validating that the compute endpoint's online configuration
changes work as expected.
- test_live_migration is the reference implementation of how to drive a
pair of pageservers through a zero-downtime migration of a tenant.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-12-11 16:55:43 +00:00
John Spray
f1fc1fd639 pageserver: further refactoring from TenantId to TenantShardId (#6059)
## Problem

In https://github.com/neondatabase/neon/pull/5957, the most essential
types were updated to use TenantShardId rather than TenantId. That
unblocked other work, but didn't fully enable running multiple shards
from the same tenant on the same pageserver.

## Summary of changes

- Use TenantShardId in page cache key for materialized pages
- Update mgr.rs get_tenant() and list_tenants() functions to use a shard
id, and update all callers.
- Eliminate the exactly_one_or_none helper in mgr.rs and all code that
used it
- Convert timeline HTTP routes to use tenant_shard_id

Note on page cache:
```
struct MaterializedPageHashKey {
    /// Why is this TenantShardId rather than TenantId?
    ///
    /// Usually, the materialized value of a page@lsn is identical on any shard in the same tenant.  However, this
    /// this not the case for certain internally-generated pages (e.g. relation sizes).  In future, we may make this
    /// key smaller by omitting the shard, if we ensure that reads to such pages always skip the cache, or are
    /// special-cased in some other way.
    tenant_shard_id: TenantShardId,
    timeline_id: TimelineId,
    key: Key,
}
```
2023-12-11 15:52:33 +00:00
Alexander Bayandin
66a7a226f8 test_runner: use toml instead of formatted strings (#6088)
## Problem

A bunch of refactorings extracted from
https://github.com/neondatabase/neon/pull/6087 (not required for it); 
the most significant one is using toml instead of formatted strings.

## Summary of changes 
- Use toml instead of formatted strings for config
- Skip pageserver log check if `pageserver.log` doesn't exist
- `chmod -x test_runner/regress/test_config.py`
2023-12-11 15:13:27 +00:00
Joonas Koivunen
f0d15cee6f build: update azure-* to 0.17 (#6081)
this is a drive-by upgrade while we refresh the access tokens at the
same time.
2023-12-11 12:21:02 +01:00
Sasha Krassovsky
0ba4cae491 Fix RLS/REPLICATION granting (#6083)
## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-12-08 12:55:44 -08:00
Andrew Rudenko
df1f8e13c4 proxy: pass neon options in deep object format (#6068)
---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2023-12-08 19:58:36 +01:00
John Spray
e640bc7dba tests: allow-lists for occasional failures (#6074)
test_creating_tenant_conf_after...
- Test detaches a tenant and then re-attaches immediatel: this causes a
race between pending remote LSN update and the generation bump in the
attachment.

test_gc_cutoff:
- Test rapidly restarts a pageserver before one generation has had the
chance to process deletions from the previous generation
2023-12-08 17:32:16 +00:00
Christian Schwarz
cf024de202 virtual_file metrics: expose max size of the fd cache (#6078)
And also leave a comment on how to determine current size.

Kind of follow-up to #6066

refs https://github.com/neondatabase/cloud/issues/8351
refs https://github.com/neondatabase/neon/issues/5479
2023-12-08 17:23:50 +00:00
Conrad Ludgate
e1a564ace2 proxy simplify cancellation (#5916)
## Problem

The cancellation code was confusing and error prone (as seen before in
our memory leaks).

## Summary of changes

* Use the new `TaskTracker` primitve instead of JoinSet to gracefully
wait for tasks to shutdown.
* Updated libs/utils/completion to use `TaskTracker`
* Remove `tokio::select` in favour of `futures::future::select` in a
specialised `run_until_cancelled()` helper function
2023-12-08 16:21:17 +00:00
Christian Schwarz
f5b9af6ac7 page cache: improve eviction-related metrics (#6077)
These changes help with identifying thrashing.

The existing `pageserver_page_cache_find_victim_iters_total` is already
useful, but, it doesn't tell us how many individual find_victim() calls
are happening, only how many clock-LRU steps happened in the entire
system,
without info about whether we needed to actually evict other data vs
just scan for a long time, e.g., because the cache is large.

The changes in this PR allows us to
1. count each possible outcome separately, esp evictions
2. compute mean iterations/outcome

I don't think anyone except me was paying close attention to
`pageserver_page_cache_find_victim_iters_total` before, so,
I think the slight behavior change of also counting iterations
for the 'iters exceeded' case is fine.

refs https://github.com/neondatabase/cloud/issues/8351
refs https://github.com/neondatabase/neon/issues/5479
2023-12-08 15:27:21 +00:00
John Spray
5e98855d80 tests: update tests that used local_fs&mock_s3 to use one or the other (#6015)
## Problem

This was wasting resources: if we run a test with mock s3 we don't then
need to run it again with local fs. When we're running in CI, we don't
need to run with the mock/local storage as well as real S3. There is
some value in having CI notice/spot issues that might otherwise only
happen when running locally, but that doesn't justify the cost of
running the tests so many more times on every PR.

## Summary of changes

- For tests that used available_remote_storages or
available_s3_storages, update them to either specify no remote storage
(therefore inherit the default, which is currently local fs), or to
specify s3_storage() for the tests that actually want an S3 API.
2023-12-08 14:52:37 +00:00
Conrad Ludgate
699049b8f3 proxy: make auth more type safe (#5689)
## Problem

a5292f7e67/proxy/src/auth/backend.rs (L146-L148)

a5292f7e67/proxy/src/console/provider/neon.rs (L90)

a5292f7e67/proxy/src/console/provider/neon.rs (L154)

## Summary of changes

1. Test backend is only enabled on `cfg(test)`.
2. Postgres mock backend + MD5 auth keys are only enabled on
`cfg(feature = testing)`
3. Password hack and cleartext flow will have their passwords validated
before proceeding.
4. Distinguish between ClientCredentials with endpoint and without,
removing many panics in the process
2023-12-08 11:48:37 +00:00
John Spray
2c544343e0 pageserver: filtered WAL ingest for sharding (#6024)
## Problem

Currently, if one creates many shards they will all ingest all the data:
not much use! We want them to ingest a proportional share of the data
each.

Closes: #6025

## Summary of changes

- WalIngest object gets a copy of the ShardIdentity for the Tenant it
was created by.
- While iterating the `blocks` part of a decoded record, blocks that do
not match the current shard are ignored, apart from on shard zero where
they are used to update relation sizes in `observe_decoded_block` (but
not stored).
- Before committing a `DataDirModificiation` from a WAL record, we check
if it's empty, and drop the record if so. This check is necessary
(rather than just looking at the `blocks` part) because certain record
types may modify blocks in non-obvious ways (e.g.
`ingest_heapam_record`).
- Add WAL ingest metrics to record the total received, total committed,
and total filtered out
- Behaviour for unsharded tenants is unchanged: they will continue to
ingest all blocks, and will take the fast path through `is_key_local`
that doesn't bother calculating any hashes.

After this change, shards store a subset of the tenant's total data, and
accurate relation sizes are only maintained on shard zero.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-12-08 10:12:37 +00:00
Arseny Sher
193e60e2b8 Fix/edit pgindent confusing places in neon. 2023-12-08 14:03:13 +04:00
Arseny Sher
1bbd6cae24 pgindent pgxn/neon 2023-12-08 14:03:13 +04:00
Arseny Sher
65f48c7002 Make targets to run pgindent on core and neon extension. 2023-12-08 14:03:13 +04:00
Alexander Bayandin
d9d8e9afc7 test_tenant_reattach: fix reattach mode names (#6070)
## Problem

Ref
https://neondb.slack.com/archives/C033QLM5P7D/p1701987609146109?thread_ts=1701976393.757279&cid=C033QLM5P7D

## Summary of changes
- Make reattach mode names unique for `test_tenant_reattach`
2023-12-08 08:39:45 +00:00
Arpad Müller
7914eaf1e6 Buffer initdb.tar.zst to a temporary file before upload (#5944)
In https://github.com/neondatabase/neon/pull/5912#pullrequestreview-1749982732 , Christian liked the idea of using files instead of buffering the
archive to RAM for the *download* path. This is for the upload path,
which is a very similar situation.
2023-12-08 03:33:44 +01:00
Joonas Koivunen
37fdbc3aaa fix: use larger buffers for remote storage (#6069)
Currently using 8kB buffers, raise that to 32kB to hopefully 1/4 of
`spawn_blocking` usage. Also a drive-by fixing of last `tokio::io::copy`
to `tokio::io::copy_buf`.
2023-12-07 19:36:44 +00:00
Tristan Partin
7aa1e58301 Add support for Python 3.12 2023-12-07 12:30:42 -06:00
Christian Schwarz
f2892d3798 virtual_file metrics: distinguish first and subsequent open() syscalls (#6066)
This helps with identifying thrashing.

I don't love the name, but, there is already "close-by-replace".

While reading the code, I also found a case where we waste
work in a cache pressure situation:
https://github.com/neondatabase/neon/issues/6065

refs https://github.com/neondatabase/cloud/issues/8351
2023-12-07 16:17:33 +00:00
Joonas Koivunen
b492cedf51 fix(remote_storage): buffering, by using streams for upload and download (#5446)
There is double buffering in remote_storage and in pageserver for 8KiB
in using `tokio::io::copy` to read `BufReader<ReaderStream<_>>`.

Switches downloads and uploads to use `Stream<Item =
std::io::Result<Bytes>>`. Caller and only caller now handles setting up
buffering. For reading, `Stream<Item = ...>` is also a `AsyncBufRead`,
so when writing to a file, we now have `tokio::io::copy_buf` reading
full buffers and writing them to `tokio::io::BufWriter` which handles
the buffering before dispatching over to `tokio::fs::File`.

Additionally implements streaming uploads for azure. With azure
downloads are a bit nicer than before, but not much; instead of one huge
vec they just hold on to N allocations we got over the wire.

This PR will also make it trivial to switch reading and writing to
io-uring based methods.

Cc: #5563.
2023-12-07 15:52:22 +00:00
John Spray
880663f6bc tests: use tenant_create() helper in test_bulk_insert (#6064)
## Problem

Since #5449 we enable generations in tests by default. Running
benchmarks was missed while merging that PR, and there was one that
needed updating.

## Summary of changes

Make test_bulk_insert use the proper generation-aware helper for tenant
creation.
2023-12-07 14:52:16 +00:00
John Spray
e89e41f8ba tests: update for tenant generations (#5449)
## Problem

Some existing tests are written in a way that's incompatible with tenant
generations.

## Summary of changes

Update all the tests that need updating: this is things like calling
through the NeonPageserver.tenant_attach helper to get a generation
number, instead of calling directly into the pageserver API. There are
various more subtle cases.
2023-12-07 12:27:16 +00:00
Conrad Ludgate
f9401fdd31 proxy: fix channel binding error messages (#6054)
## Problem

For channel binding failed messages we were still saying "channel
binding not supported" in the errors.

## Summary of changes

Fix error messages
2023-12-07 11:47:16 +00:00
Joonas Koivunen
b7ffe24426 build: update tokio to 1.34.0, tokio-utils 0.7.10 (#6061)
We should still remember to bump minimum crates for libraries beginning
to use task tracker.
2023-12-07 11:31:38 +00:00
Joonas Koivunen
52718bb8ff fix(layer): metric splitting, span rename (#5902)
Per [feedback], split the Layer metrics, also finally account for lost
and [re-submitted feedback] on `layer_gc` by renaming it to
`layer_delete`, `Layer::garbage_collect_on_drop` renamed to
`Layer::delete_on_drop`. References to "gc" dropped from metric names
and elsewhere.

Also fixes how the cancellations were tracked: there was one rare
counter. Now there is a top level metric for cancelled inits, and the
rare "download failed but failed to communicate" counter is kept.

Fixes: #6027

[feedback]: https://github.com/neondatabase/neon/pull/5809#pullrequestreview-1720043251
[re-submitted feedback]: https://github.com/neondatabase/neon/pull/5108#discussion_r1401867311
2023-12-07 11:39:40 +02:00
Joonas Koivunen
10c77cb410 temp: increase the wait tenant activation timeout (#6058)
5s is causing way too much noise; this is of course a temporary fix, we
should prioritize tenants for which there are pagestream openings the
highest, second highest the basebackups.

Deployment thread for context:
https://neondb.slack.com/archives/C03H1K0PGKH/p1701935048144479?thread_ts=1701765158.926659&cid=C03H1K0PGKH
2023-12-07 09:01:08 +00:00
Heikki Linnakangas
31be301ef3 Make simple_rcu::RcuWaitList::wait() async (#6046)
The gc_timeline() function is async, but it calls the synchronous wait()
function. In the worst case, that could lead to a deadlock by using up
all tokio executor threads.

In the passing, fix a few typos in comments.

Fixes issue #6045.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-12-07 10:20:40 +02:00
Joonas Koivunen
a3c7d400b4 fix: avoid allocations with logging a slug (#6047)
to_string forces allocating a less than pointer sized string (costing on
stack 4 usize), using a Display formattable slug saves that. the
difference seems small, but at the same time, we log these a lot.
2023-12-07 07:25:22 +00:00
Vadim Kharitonov
7501ca6efb Revert timescaledb for pg14 and pg15 (#6056)
```
could not start the compute node: compute is in state "failed": db error: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory Caused by: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory
```
2023-12-06 15:12:36 +00:00
Christian Schwarz
987c9aaea0 virtual_file: fix the metric for close() calls done by VirtualFile::drop (#6051)
Before this PR we would inc() the counter for `Close` even though the
slot's FD had already been closed.

Especially visible when subtracting `open` from `close+close-by-replace`
on a system that does a lot of attach and detach.

refs https://github.com/neondatabase/cloud/issues/8440
refs https://github.com/neondatabase/cloud/issues/8351
2023-12-06 12:05:28 +00:00
Konstantin Knizhnik
7fab731f65 Track size of FSM fork while applying records at replica (#5901)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1700560921471619

## Summary of changes

Update relation size cache for FSM fork in WAL records filter

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-12-05 18:49:24 +02:00
John Spray
483caa22c6 pageserver: logging tweaks (#6039)
- The `Attaching tenant` log message omitted some useful information
like the generation and mode
- info-level messages about writing configuration files were
unnecessarily verbose
- During process shutdown, we don't emit logs about the various phases:
this is very cheap to log since we do it once per process lifetime, and
is helpful when figuring out where something got stuck during a hang.
2023-12-05 16:11:15 +00:00
John Spray
da5e03b0d8 pageserver: add a /reset API for tenants (#6014)
## Problem

Traditionally we would detach/attach directly with curl if we wanted to
"reboot" a single tenant. That's kind of inconvenient these days,
because one needs to know a generation number to issue an attach
request.

Closes: https://github.com/neondatabase/neon/issues/6011

## Summary of changes

- Introduce a new `/reset` API, which remembers the LocationConf from
the current attachment so that callers do not have to work out the
correct configuration/generation to use.
- As an additional support tool, allow an optional `drop_cache` query
parameter, for situations where we are concerned that some on-disk state
might be bad and want to clear that as well as the in-memory state.

One might wonder why I didn't call this "reattach" -- it's because
there's already a PS->CP API of that name and it could get confusing.
2023-12-05 15:38:27 +00:00
John Spray
be885370f6 pageserver: remove redundant unsafe_create_dir_all (#6040)
This non-fsyncing analog to our safe directory creation function was
just duplicating what tokio's fs::create_dir_all does.
2023-12-05 15:03:07 +00:00
Alexey Kondratov
bc1020f965 compute_ctl: Notify waiters when Postgres failed to start (#6034)
In case of configuring the empty compute, API handler is waiting on
condvar for compute state change. Yet, previously if Postgres failed to
start we were just setting compute status to `Failed` without notifying.
It causes a timeout on control plane side, although we can return a
proper error from compute earlier.

With this commit API handler should be properly notified.
2023-12-05 13:38:45 +01:00
John Spray
61fe9d360d pageserver: add Key->Shard mapping logic & use it in page service (#5980)
## Problem

When a pageserver receives a page service request identified by
TenantId, it must decide which `Tenant` object to route it to.

As in earlier PRs, this stuff is all a no-op for tenants with a single
shard: calls to `is_key_local` always return true without doing any
hashing on a single-shard ShardIdentity.

Closes: https://github.com/neondatabase/neon/issues/6026

## Summary of changes

- Carry immutable `ShardIdentity` objects in Tenant and Timeline. These
provide the information that Tenants/Timelines need to figure out which
shard is responsible for which Key.
- Augment `get_active_tenant_with_timeout` to take a `ShardSelector`
specifying how the shard should be resolved for this tenant. This mode
depends on the kind of request (e.g. basebackups always go to shard
zero).
- In `handle_get_page_at_lsn_request`, handle the case where the
Timeline we looked up at connection time is not the correct shard for
the page being requested. This can happen whenever one node holds
multiple shards for the same tenant. This is currently written as a
"slow path" with the optimistic expectation that usually we'll run with
one shard per pageserver, and the Timeline resolved at connection time
will be the one serving page requests. There is scope for optimization
here later, to avoid doing the full shard lookup for each page.
- Omit consumption metrics from nonzero shards: only the 0th shard is
responsible for tracing accurate relation sizes.

Note to reviewers:
- Testing of these changes is happening separately on the
`jcsp/sharding-pt1` branch, where we have hacked neon_local etc needed
to run a test_pg_regress.
- The main caveat to this implementation is that page service
connections still look up one Timeline when the connection is opened,
before they know which pages are going to be read. If there is one shard
per pageserver then this will always also be the Timeline that serves
page requests. However, if multiple shards are on one pageserver then
get page requests will incur the cost of looking up the correct Timeline
on each getpage request. We may look to improve this in future with a
"sticky" timeline per connection handler so that subsequent requests for
the same Timeline don't have to look up again, and/or by having postgres
pass a shard hint when connecting. This is tracked in the "Loose ends"
section of https://github.com/neondatabase/neon/issues/5507
2023-12-05 12:01:55 +00:00
Conrad Ludgate
f60e49fe8e proxy: fix panic in startup packet (#6032)
## Problem

Panic when less than 8 bytes is presented in a startup packet.

## Summary of changes

We need there to be a 4 byte message code, so the expected min length is
8.
2023-12-05 11:24:16 +01:00
Anna Khanova
c48918d329 Rename metric (#6030)
## Problem

It looks like because of reallocation of the buckets in previous PR, the
metric is broken in graphana.

## Summary of changes

Renamed the metric.
2023-12-05 10:03:07 +00:00
Sasha Krassovsky
bad686bb71 Remove trusted from wal2json (#6035)
## Problem

## Summary of changes
2023-12-04 21:10:23 +00:00
Alexey Kondratov
85d08581ed [compute_ctl] Introduce feature flags in the compute spec (#6016)
## Problem

In the past we've rolled out all new `compute_ctl` functionality right
to all users, which could be risky. I want to have a more fine-grained
control over what we enable, in which env and to which users.

## Summary of changes

Add an option to pass a list of feature flags to `compute_ctl`. If not
passed, it defaults to an empty list. Any unknown flags are ignored.

This allows us to release new experimental features safer, as we can
then flip the flag for one specific user, only Neon employees, free /
pro / etc. users and so on. Or control it per environment.

In the current implementation feature flags are passed via compute spec,
so they do not allow controlling behavior of `empty` computes. For them,
we can either stick with the previous approach, i.e. add separate cli
args or introduce a more generic `--features` cli argument.
2023-12-04 19:54:18 +01:00
Christian Schwarz
c7f1143e57 concurrency-limit low-priority initial logical size calculation [v2] (#6000)
Problem
-------

Before this PR, there was no concurrency limit on initial logical size
computations.

While logical size computations are lazy in theory, in practice
(production), they happen in a short timeframe after restart.

This means that on a PS with 20k tenants, we'd have up to 20k concurrent
initial logical size calculation requests.

This is self-inflicted needless overload.

This hasn't been a problem so far because the `.await` points on the
logical size calculation path never return `Pending`, hence we have a
natural concurrency limit of the number of executor threads.
But, as soon as we return `Pending` somewhere in the logical size
calculation path, other concurrent tasks get scheduled by tokio.
If these other tasks are also logical size calculations, they eventually
pound on the same bottleneck.

For example, in #5479, we want to switch the VirtualFile descriptor
cache to a `tokio::sync::RwLock`, which makes us return `Pending`, and
without measures like this patch, after PS restart, VirtualFile
descriptor cache thrashes heavily for 2 hours until all the logical size
calculations have been computed and the degree of concurrency /
concurrent VirtualFile operations is down to regular levels.
See the *Experiment* section below for details.

<!-- Experiments (see below) show that plain #5479 causes heavy
thrashing of the VirtualFile descriptor cache.
The high degree of concurrency is too much for 
In the case of #5479 the VirtualFile descriptor cache size starts
thrashing heavily.


-->

Background
----------

Before this PR, initial logical size calculation was spawned lazily on
first call to `Timeline::get_current_logical_size()`.

In practice (prod), the lazy calculation is triggered by
`WalReceiverConnectionHandler` if the timeline is active according to
storage broker, or by the first iteration of consumption metrics worker
after restart (`MetricsCollection`).

The spawns by walreceiver are high-priority because logical size is
needed by Safekeepers (via walreceiver `PageserverFeedback`) to enforce
the project logical size limit.
The spawns by metrics collection are not on the user-critical path and
hence low-priority. [^consumption_metrics_slo]

[^consumption_metrics_slo]: We can't delay metrics collection
indefintely because there are TBD internal SLOs tied to metrics
collection happening in a timeline manner
(https://github.com/neondatabase/cloud/issues/7408). But let's ignore
that in this issue.

The ratio of walreceiver-initiated spawns vs
consumption-metrics-initiated spawns can be reconstructed from logs
(`spawning logical size computation from context of task kind {:?}"`).
PR #5995 and #6018 adds metrics for this.

First investigation of the ratio lead to the discovery that walreceiver
spawns 75% of init logical size computations.
That's because of two bugs:
- In Safekeepers: https://github.com/neondatabase/neon/issues/5993
- In interaction between Pageservers and Safekeepers:
https://github.com/neondatabase/neon/issues/5962

The safekeeper bug is likely primarily responsible but we don't have the
data yet. The metrics will hopefully provide some insights.

When assessing production-readiness of this PR, please assume that
neither of these bugs are fixed yet.


Changes In This PR
------------------

With this PR, initial logical size calculation is reworked as follows:

First, all initial logical size calculation task_mgr tasks are started
early, as part of timeline activation, and run a retry loop with long
back-off until success. This removes the lazy computation; it was
needless complexity because in practice, we compute all logical sizes
anyways, because consumption metrics collects it.

Second, within the initial logical size calculation task, each attempt
queues behind the background loop concurrency limiter semaphore. This
fixes the performance issue that we pointed out in the "Problem" section
earlier.

Third, there is a twist to queuing behind the background loop
concurrency limiter semaphore. Logical size is needed by Safekeepers
(via walreceiver `PageserverFeedback`) to enforce the project logical
size limit. However, we currently do open walreceiver connections even
before we have an exact logical size. That's bad, and I'll build on top
of this PR to fix that
(https://github.com/neondatabase/neon/issues/5963). But, for the
purposes of this PR, we don't want to introduce a regression, i.e., we
don't want to provide an exact value later than before this PR. The
solution is to introduce a priority-boosting mechanism
(`GetLogicalSizePriority`), allowing callers of
`Timeline::get_current_logical_size` to specify how urgently they need
an exact value. The effect of specifying high urgency is that the
initial logical size calculation task for the timeline will skip the
concurrency limiting semaphore. This should yield effectively the same
behavior as we had before this PR with lazy spawning.

Last, the priority-boosting mechanism obsoletes the `init_order`'s grace
period for initial logical size calculations. It's a separate commit to
reduce the churn during review. We can drop that commit if people think
it's too much churn, and commit it later once we know this PR here
worked as intended.

Experiment With #5479 
---------------------

I validated this PR combined with #5479 to assess whether we're making
forward progress towards asyncification.

The setup is an `i3en.3xlarge` instance with 20k tenants, each with one
timeline that has 9 layers.
All tenants are inactive, i.e., not known to SKs nor storage broker.
This means all initial logical size calculations are spawned by
consumption metrics `MetricsCollection` task kind.
The consumption metrics worker starts requesting logical sizes at low
priority immediately after restart. This is achieved by deleting the
consumption metrics cache file on disk before starting
PS.[^consumption_metrics_cache_file]

[^consumption_metrics_cache_file] Consumption metrics worker persists
its interval across restarts to achieve persistent reporting intervals
across PS restarts; delete the state file on disk to get predictable
(and I believe worst-case in terms of concurrency during PS restart)
behavior.

Before this patch, all of these timelines would all do their initial
logical size calculation in parallel, leading to extreme thrashing in
page cache and virtual file cache.

With this patch, the virtual file cache thrashing is reduced
significantly (from 80k `open`-system-calls/second to ~500
`open`-system-calls/second during loading).


### Critique

The obvious critique with above experiment is that there's no skipping
of the semaphore, i.e., the priority-boosting aspect of this PR is not
exercised.

If even just 1% of our 20k tenants in the setup were active in
SK/storage_broker, then 200 logical size calculations would skip the
limiting semaphore immediately after restart and run concurrently.

Further critique: given the two bugs wrt timeline inactive vs active
state that were mentioned in the Background section, we could have 75%
of our 20k tenants being (falsely) active on restart.

So... (next section)

This Doesn't Make Us Ready For Async VirtualFile
------------------------------------------------

This PR is a step towards asynchronous `VirtualFile`, aka, #5479 or even
#4744.

But it doesn't yet enable us to ship #5479.

The reason is that this PR doesn't limit the amount of high-priority
logical size computations.
If there are many high-priority logical size calculations requested,
we'll fall over like we did if #5479 is applied without this PR.
And currently, at very least due to the bugs mentioned in the Background
section, we run thousands of high-priority logical size calculations on
PS startup in prod.

So, at a minimum, we need to fix these bugs.

Then we can ship #5479 and #4744, and things will likely be fine under
normal operation.

But in high-traffic situations, overload problems will still be more
likely to happen, e.g., VirtualFile cache descriptor thrashing.
The solution candidates for that are orthogonal to this PR though:
* global concurrency limiting
* per-tenant rate limiting => #5899
* load shedding
* scaling bottleneck resources (fd cache size (neondatabase/cloud#8351),
page cache size(neondatabase/cloud#8351), spread load across more PSes,
etc)

Conclusion
----------

Even with the remarks from in the previous section, we should merge this
PR because:
1. it's an improvement over the status quo (esp. if the aforementioned
bugs wrt timeline active / inactive are fixed)
2. it prepares the way for
https://github.com/neondatabase/neon/pull/6010
3. it gets us close to shipping #5479 and #4744
2023-12-04 17:22:26 +00:00
Christian Schwarz
7403d55013 walredo: stderr cleanup & make explicitly cancel safe (#6031)
# Problem

I need walredo to be cancellation-safe for
https://github.com/neondatabase/neon/pull/6000#discussion_r1412049728

# Solution

We are only `async fn` because of
`wait_for(stderr_logger_task_done).await`, added in #5560 .

The `stderr_logger_cancel` and `stderr_logger_task_done` were there out
of precaution that the stderr logger task might for some reason not stop
when the walredo process terminates.
That hasn't been a problem in practice.
So, simplify things:
- remove `stderr_logger_cancel` and the
`wait_for(...stderr_logger_task_done...)`
- use `tokio::process::ChildStderr` in the stderr logger task
- add metrics to track number of running stderr logger tasks so in case
I'm wrong here, we can use these metrics to identify the issue (not
planning to put them into a dashboard or anything)
2023-12-04 16:06:41 +00:00
Anna Khanova
12f02523a4 Enable dynamic rate limiter (#6029)
## Problem

Limit the number of open connections between the control plane and
proxy.

## Summary of changes

Enable dynamic rate limiter in prod.

Unfortunately the latency metrics are a bit broken, but from logs I see
that on staging for the past 7 days only 2 times latency for acquiring
was greater than 1ms (for most of the cases it's insignificant).
2023-12-04 15:00:24 +00:00
Arseny Sher
207c527270 Safekeepers: persist state before timeline deactivation.
Without it, sometimes on restart we lose latest remote_consistent_lsn which
leads to excessive ps -> sk reconnections.

https://github.com/neondatabase/neon/issues/5993
2023-12-04 18:22:36 +04:00
John Khvatov
eae49ff598 Perform L0 compaction before creating new image layers (#5950)
If there are too many L0 layers before compaction, the compaction
process becomes slow because of slow `Timeline::get`. As a result of the
slowdown, the pageserver will generate even more L0 layers for the next
iteration, further exacerbating the slow performance.

Change to perform L0 -> L1 compaction before creating new images. The
simple change speeds up compaction time and `Timeline::get` to 5x.
`Timeline::get` is faster on top of L1 layers.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-12-04 12:35:09 +00:00
Alexander Bayandin
e6b2f89fec test_pg_clients: fix test that reads from stdout (#6021)
## Problem

`test_pg_clients` reads the actual result from a *.stdout file,
https://github.com/neondatabase/neon/pull/5977 has added a header to
such files, so `test_pg_clients` started to fail.

## Summary of changes
- Use `capture_stdout` and compare the expected result with the output
instead of *.stdout file content
2023-12-04 11:18:41 +00:00
John Spray
1d81e70d60 pageserver: tweak logs for index_part loading (#6005)
## Problem

On pageservers upgraded to enable generations, these INFO level logs
were rather frequent. If a tenant timeline hasn't written new layers
since the upgrade, it will emit the "No index_part.json*" log every time
it starts.

## Summary of changes

- Downgrade two log lines from info to debug
- Add a tiny unit test that I wrote for sanity-checking that there
wasn't something wrong with our Generation-comparing logic when loading
index parts.
2023-12-04 09:57:47 +00:00
Anastasia Lubennikova
e3512340c1 Override neon.max_cluster_size for the time of compute_ctl (#5998)
Temporarily reset neon.max_cluster_size to avoid
the possibility of hitting the limit, while we are applying config:
creating new extensions, roles, etc...
2023-12-03 15:21:44 +00:00
Christian Schwarz
e43cde7aba initial logical size: remove CALLS metric from hot path (#6018)
Only introduced a few hours ago (#5995), I took a look at the numbers
from staging and realized that `get_current_logical_size()` is on the
walingest hot path: we call it for every `ReplicationMessage::XLogData`
that we receive.

Since the metric is global, it would be quite a busy cache line.

This PR replaces it with a new metric purpose-built for what's most
interesting right now.
2023-12-01 22:45:04 +01:00
Alexey Kondratov
c1295bfb3a [compute_ctl] User correct HTTP code in the /configure errors (#6017)
It was using `PRECONDITION_FAILED` for errors during `ComputeSpec` to
`ParsedSpec` conversion, but this disobeys the OpenAPI spec [1] and
correct code should be `BAD_REQUEST` for any spec processing errors.

While on it, I also noticed that `compute_ctl` OpenAPI spec has an
invalid format and fixed it.

[1] fd81945a60/compute_tools/src/http/openapi_spec.yaml (L119-L120)
2023-12-01 18:19:55 +01:00
Joonas Koivunen
711425cc47 fix: use create_new instead of create for mutex file (#6012)
Using create_new makes the uninit marker work as a mutual exclusion
primitive. Temporary hopefully.
2023-12-01 18:30:51 +02:00
bojanserafimov
fd81945a60 Use TEST_OUTPUT envvar in pageserver (#5984) 2023-12-01 09:16:24 -05:00
bojanserafimov
e49c21a3cd Speed up rel extend (#5983) 2023-12-01 09:11:41 -05:00
Anastasia Lubennikova
92e7cd40e8 add sql_exporter to vm-image (#5949)
expose LFC metrics
2023-12-01 13:40:49 +00:00
Alexander Bayandin
7eabfc40ee test_runner: use separate directory for each rerun (#6004)
## Problem

While investigating https://github.com/neondatabase/neon/issues/5854, we
hypothesised that logs/repo-dir from the initial failure might leak into
reruns. Use different directories for each run to avoid such a
possibility.

## Summary of changes
- make each test rerun use different directories
- update `pytest-rerunfailure` plugin from 11.1.2 to 13.0
2023-12-01 13:26:19 +00:00
Christian Schwarz
ce1652990d logical size: better represent level of accuracy in the type system (#5999)
I would love to not expose the in-accurate value int he mgmt API at all,
and in fact control plane doesn't use it [^1].
But our tests do, and I have no desire to change them at this time.

[^1]: https://github.com/neondatabase/cloud/pull/8317
2023-12-01 14:16:29 +01:00
Christian Schwarz
8cd28e1718 logical size calculation: make .current_size() infallible (#5999)
... by panicking on overflow;

It was made fallible initially due to in-confidence in logical size
calculation. However, the error has never happened since I am at Neon.

Let's stop worrying about this by converting the overflow check into a panic.
2023-12-01 14:16:29 +01:00
Christian Schwarz
1c88824ed0 initial logical size calculation: add a bunch of metrics (#5995)
These will help us answer questions such as:
- when & at what do calculations get started after PS restart?
- how often is the api to get current incrementally-computed logical
  size called, and does it return Exact vs Approximate?

I'd also be interested in a histogram of how much wall clock
time size calculations take, but, I don't know good bucket sizes,
and, logging it would introduce yet another per-timeline log
message during startup; don't think that's worth it just yet.

Context

- https://neondb.slack.com/archives/C033RQ5SPDH/p1701197668789769
- https://github.com/neondatabase/neon/issues/5962
- https://github.com/neondatabase/neon/issues/5963
- https://github.com/neondatabase/neon/pull/5955
- https://github.com/neondatabase/cloud/issues/7408
2023-12-01 12:52:59 +01:00
Arpad Müller
1ce1c82d78 Clean up local state if index_part.json request gives 404 (#6009)
If `index_part.json` is (verifiably) not present on remote storage, we
should regard the timeline as inexistent. This lets `clean_up_timelines`
purge the partial local disk state, which is important in the case of
incomplete creations leaving behind state that hinders retries. For
incomplete deletions, we also want the timeline's local disk content be
gone completely.

The PR removes the allowed warnings added by #5390 and #5912, as we now
are only supposed to issue info level messages. It also adds a
reproducer for #6007, by parametrizing the
`test_timeline_init_break_before_checkpoint_recreate` test added by
#5390. If one reverts the .rs changes, the "cannot create its uninit
mark file" log line occurs once one comments out the failing checks for
the local disk state being actually empty.

Closes #6007

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-12-01 10:58:06 +00:00
Vadim Kharitonov
f784e59b12 Update timescaledb to 2.13.0 (#5975)
TimescaleDB has released 2.13.0. This version is compatible with
Postgres16
2023-11-30 17:12:52 -06:00
Arpad Müller
b71b8ecfc2 Add existing_initdb_timeline_id param to timeline creation (#5912)
This PR adds an `existing_initdb_timeline_id` option to timeline
creation APIs, taking an optional timeline ID.

Follow-up of  #5390.

If the `existing_initdb_timeline_id` option is specified via the HTTP
API, the pageserver downloads the existing initdb archive from the given
timeline ID and extracts it, instead of running initdb itself.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-11-30 22:32:04 +01:00
Arpad Müller
3842773546 Correct RFC number for Pageserver WAL DR RFC (#5997)
When I opened #5248, 27 was an unused RFC number. Since then, two RFCs
have been merged, so now 27 is taken. 29 is free though, so move it
there.
2023-11-30 21:01:25 +00:00
Conrad Ludgate
f39fca0049 proxy: chore: replace strings with SmolStr (#5786)
## Problem

no problem

## Summary of changes

replaces boxstr with arcstr as it's cheaper to clone. mild perf
improvement.

probably should look into other smallstring optimsations tbh, they will
likely be even better. The longest endpoint name I was able to construct
is something like `ep-weathered-wildflower-12345678` which is 32 bytes.
Most string optimisations top out at 23 bytes
2023-11-30 20:52:30 +00:00
Joonas Koivunen
b451e75dc6 test: include cmdline in captured output (#5977)
aiming for faster to understand a bunch of `.stdout` and `.stderr`
files, see example echo_1.stdout differences:

```
+# echo foobar abbacd
+
 foobar abbacd
```

it can be disabled and is disabled in this PR for some tests; use
`pg_bin.run_capture(..., with_command_header=False)` for that.

as a bonus this cleans up the echoed newlines from s3_scrubber
output which are also saved to file but echoed to test log.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-11-30 17:31:03 +00:00
Anna Khanova
3657a3c76e Proxy fix metrics record (#5996)
## Problem

Some latency metrics are recorded in inconsistent way.

## Summary of changes

Make sure that everything is recorded in seconds.
2023-11-30 16:33:54 +00:00
Joonas Koivunen
eba3bfc57e test: python needs thread safety as well (#5992)
we have test cases which launch processes from threads, and they capture
output assuming this counter is thread-safe. at least according to my
understanding this operation in python requires a lock to be
thread-safe.
2023-11-30 15:48:40 +00:00
John Spray
57ae9cd07f pageserver: add flush_ms and document /location_config API (#5860)
- During migration of tenants, it is useful for callers to
`/location_conf` to flush a tenant's layers while transitioning to
AttachedStale: this optimization reduces the redundant WAL replay work
that the tenant's new attached pageserver will have to do. Test coverage
for this will come as part of the larger tests for live migration in
#5745 #5842
- Flushing is controlled with `flush_ms` query parameter: it is the
caller's job to decide how long they want to wait for a flush to
complete. If flush is not complete within the time limit, the pageserver
proceeds to succeed anyway: flushing is only an optimization.
- Add swagger definitions for all this: the location_config API is the
primary interface for driving tenant migration as described in
docs/rfcs/028-pageserver-migration.md, and will eventually replace the
various /attach /detach /load /ignore APIs.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-30 14:22:07 +00:00
Christian Schwarz
3bb1030f5d walingest: refactor if-cascade on decoded.xl_rmid into match statement (#5974)
refs https://github.com/neondatabase/neon/issues/5962

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-30 14:07:41 +00:00
John Spray
5d3c3636fc tests: add a log allow list entry in test_timeline_deletion_with_files_stuck_in_upload_queue (#5981)
Test failure seen here:

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5860/7032903218/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/c0f1c79a70a3b9ab

```
E   AssertionError: assert not [(302, '2023-11-29T13:23:51.046801Z ERROR request{method=PUT path=/v1/tenant/f6b845de60cb0e92f4426e0d6af1d2ea/timeline/69a8c98004abe71a281cff8642a45274/checkpoint request_id=eca33d8a-7af2-46e7-92ab-c28629feb42c}: Error processing HTTP request: InternalServerError(queue is in state Stopped\n')]
```

This appears to be a legitimate log: the test is issuing checkpoint
requests in the background, and deleting (therefore shutting down) a
timeline.
2023-11-30 13:44:14 +00:00
Conrad Ludgate
0c87d1866b proxy: fix wake_compute error prop (#5989)
## Problem

fixes #5654 - WakeComputeErrors occuring during a connect_to_compute got
propagated as IO errors, which get forwarded to the user as "Couldn't
connect to compute node" with no helpful message.

## Summary of changes

Handle WakeComputeError during ConnectionError properly
2023-11-30 13:43:21 +00:00
Arpad Müller
8ec6033ed8 Pageserver disaster recovery RFC (#5248)
Enable the pageserver to recover from data corruption events by
implementing a feature to re-apply historic WAL records in parallel to the already
occurring WAL replay.

The feature is outside of the user-visible backup and history story, and
only
serves as a second-level backup for the case that there is a bug in the
pageservers that corrupted the served pages.

The RFC proposes the addition of two new features:
* recover a broken branch from WAL (downtime is allowed)
* a test recovery system to recover random branches to make sure
recovery works
2023-11-30 14:30:17 +01:00
Anna Khanova
e12e2681e9 IP allowlist on the proxy side (#5906)
## Problem

Per-project IP allowlist:
https://github.com/neondatabase/cloud/issues/8116

## Summary of changes

Implemented IP filtering on the proxy side. 

To retrieve ip allowlist for all scenarios, added `get_auth_info` call
to the control plane for:
* sql-over-http
* password_hack
* cleartext_hack

Added cache with ttl for sql-over-http path

This might slow down a bit, consider using redis in the future.

---------

Co-authored-by: Conrad Ludgate <conrad@neon.tech>
2023-11-30 13:14:33 +00:00
Joonas Koivunen
1e57ddaabc fix: flush loop should also keep the gate open (#5987)
I was expecting this to already be in place, because this should not
conflict how we shutdown (0. cancel, 1. shutdown_tasks, 2. close gate).
2023-11-30 14:26:11 +02:00
John Khvatov
3e094e90d7 update aws sdk to 1.0.x (#5976)
This change will be useful for experimenting with S3 performance.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-30 14:17:58 +02:00
Christian Schwarz
292281c9df pagectl: add subcommand to rewrite layer file summary (#5933)
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771
2023-11-30 11:34:30 +00:00
Rahul Modpur
50d959fddc refactor: use serde for TenantConf deserialization Fixes: #5300 (#5310)
Remove handcrafted TenantConf deserialization code. Use
`serde_path_to_error` to include the field which failed parsing. Leaves
the duplicated TenantConf in pageserver and models, does not touch
PageserverConf handcrafted deserialization.

Error change:
- before change: "configure option `checkpoint_distance` cannot be
negative"
- after change: "`checkpoint_distance`: invalid value: integer `-1`,
expected u64"

Fixes: #5300
Cc: #3682

---------

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-30 12:47:13 +02:00
Conrad Ludgate
fc77c42c57 proxy: add flag to enable http pool for all users (#5959)
## Problem

#5123

## Summary of changes

Add `--sql-over-http-pool-opt-in true` default cli arg. Allows us to set
`--sql-over-http-pool-opt-in false` region-by-region
2023-11-30 10:19:30 +00:00
Conrad Ludgate
f05d1b598a proxy: add more db error info (#5951)
## Problem

https://github.com/neondatabase/serverless/issues/51

## Summary of changes

include more error fields in the json response
2023-11-30 10:18:59 +00:00
Christian Schwarz
ca597206b8 walredo: latency histogram for spawn duration (#5925)
fixes https://github.com/neondatabase/neon/issues/5891
2023-11-29 18:44:37 +00:00
Rahul Modpur
46f20faa0d neon_local: fix endpoint api to prevent two primary endpoints (#5520)
`neon_local endpoint` subcommand currently allows creating two primary
endpoints for the same branch which leads to shutdown of both endpoints

`neon_local endpoint start` new behavior:
1. Fail if endpoint doesn't exist
2. Fail if two primary conflict detected

Fixes #4959
Closes #5426

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-29 19:38:03 +02:00
John Spray
9e55ad4796 pageserver: refactor TenantId to TenantShardId in Tenant & Timeline (#5957)
(includes two preparatory commits from
https://github.com/neondatabase/neon/pull/5960)

## Problem

To accommodate multiple shards in the same tenant on the same
pageserver, we must include the full TenantShardId in local paths. That
means that all code touching local storage needs to see the
TenantShardId.

## Summary of changes

- Replace `tenant_id: TenantId` with `tenant_shard_id: TenantShardId` on
Tenant, Timeline and RemoteTimelineClient.
- Use TenantShardId in helpers for building local paths.
- Update all the relevant call sites.

This doesn't update absolutely everything: things like PageCache,
TaskMgr, WalRedo are still shard-naive. The purpose of this PR is to
update the core types so that others code can be added/updated
incrementally without churning the most central shared types.
2023-11-29 14:52:35 +00:00
John Spray
70b5646fba pageserver: remove redundant serialization helpers on DeletionList (#5960)
Precursor for https://github.com/neondatabase/neon/pull/5957

## Problem

When DeletionList was written, TenantId/TimelineId didn't have
human-friendly modes in their serde. #5335 added those, such that the
helpers used in serialization of HashMaps are no longer necessary.

## Summary of changes

- Add a unit test to ensure that this change isn't changing anything
about the serialized form
- Remove the serialization helpers for maps of Id
2023-11-29 10:39:12 +00:00
Konstantin Knizhnik
64890594a5 Optimize storing of null page in WAL (#5910)
## Problem

PG16 (https://github.com/neondatabase/postgres/pull/327) adds new
function to SMGR: zeroextend
It's implementation in Neon actually wal-log zero pages of extended
relation.
This zero page is wal-logged using XLOG_FPI.
As far as page is zero, the hole optimization (excluding from the image
everything between pg_upper and pd_lower) doesn't work.

## Summary of changes

In case of zero page (`PageIsNull()` returns true) assume
`hole_size=BLCKSZ`

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-11-29 12:08:20 +02:00
Arseny Sher
78e73b20e1 Notify safekeeper readiness with systemd.
To avoid downtime during deploy, as in busy regions initial load can currently
take ~30s.
2023-11-29 14:07:06 +04:00
John Spray
c48cc020bd pageserver: fix race between deletion completion and incoming requests (#5941)
## Problem

This is a narrow race that can leave a stuck Stopping tenant behind,
while emitting a log error "Missing InProgress marker during tenant
upsert, this is a bug"

- Deletion request 1 puts tenant into Stopping state, and fires off
background part of DeleteTenantFlow
- Deletion request 2 acquires a SlotGuard for the same tenant ID, leaves
a TenantSlot::InProgress in place while it checks if the tenant's state
is accept able.
- DeleteTenantFlow finishes, calls TenantsMap::remove, which removes the
InProgress marker.
- Deletion request 2 calls SlotGuard::revert, which upserts the old
value (the Tenant in Stopping state), and emits the telltale log
message.

Closes: #5936 

## Summary of changes

- Add a regression test which uses pausable failpoints to reproduce this
scenario.
- TenantsMap::remove is only called by DeleteTenantFlow. Its behavior is
tweaked to express the different possible states, especially
`InProgress` which carriers a barrier.
- In DeleteTenantFlow, if we see such a barrier result from remove(),
wait for the barrier and then try removing again.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-29 09:32:26 +00:00
dependabot[bot]
a15969714c build(deps): bump openssl from 0.10.57 to 0.10.60 in /test_runner/pg_clients/rust/tokio-postgres (#5966)
Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.57 to 0.10.60.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-29 02:17:15 +01:00
dependabot[bot]
8c195d8214 build(deps): bump cryptography from 41.0.4 to 41.0.6 (#5970)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-29 02:16:35 +01:00
dependabot[bot]
0d16874960 build(deps): bump openssl from 0.10.55 to 0.10.60 (#5965)
Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.55 to 0.10.60.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-29 01:24:02 +01:00
Alexander Bayandin
fd440e7d79 neonvm: add pgbouncer patch to support DEALLOCATD/DISCARD ALL (#5958)
pgbouncer 1.21.0 doesn't play nicely with DEALLOCATD/DISCARD ALL if
prepared statement support is enabled (max_prepared_statements > 0).
There's a patch[0] that improves this (it will be included in the next 
release of pgbouncer).

This PR applies this patch on top of 1.21.0 release tarball. 
For some reason, the tarball doesn't include `test/test_prepared.py` 
(which is modified by the patch as well), so the patch can't be applied 
clearly. I use `filterdiff` (from `patchutils` package) to apply 
the required changes.

[0] a7b3c0a5f4
2023-11-28 23:43:24 +00:00
bojanserafimov
65160650da Add walingest test (#5892) 2023-11-28 12:50:53 -05:00
dependabot[bot]
12dd6b61df build(deps): bump aiohttp from 3.8.6 to 3.9.0 (#5946) 2023-11-28 17:47:15 +00:00
bojanserafimov
5345c1c21b perf readme fix (#5956) 2023-11-28 17:31:42 +00:00
Joonas Koivunen
105edc265c fix: remove layer_removal_cs (#5108)
Quest: https://github.com/neondatabase/neon/issues/4745. Follow-up to
#4938.

- add in locks for compaction and gc, so we don't have multiple
executions at the same time in tests
- remove layer_removal_cs
- remove waiting for uploads in eviction/gc/compaction
    - #4938 will keep the file resident until upload completes

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-11-28 19:15:21 +02:00
Shany Pozin
8625466144 Move run_initdb to be async and guarded by max of 8 running tasks. Fixes #5895. Use tenant.cancel for cancellation (#5921)
## Problem
https://github.com/neondatabase/neon/issues/5895
2023-11-28 14:49:31 +00:00
John Spray
1ab0cfc8cb pageserver: add sharding metadata to LocationConf (#5932)
## Problem

The TenantShardId in API URLs is sufficient to uniquely identify a
tenant shard, but not for it to function: it also needs to know its full
sharding configuration (stripe size, layout version) in order to map
keys to shards.

## Summary of changes

- Introduce ShardIdentity: this is the superset of ShardIndex (#5924 )
that is required for translating keys to shard numbers.
- Include ShardIdentity as an optional attribute of LocationConf
- Extend the public `LocationConfig` API structure with a flat
representation of shard attributes.

The net result is that at the point we construct a `Tenant`, we have a
`ShardIdentity` (inside LocationConf). This enables the next steps to
actually use the ShardIdentity to split WAL and validate that page
service requires are reaching the correct shard.
2023-11-28 13:14:51 +00:00
John Spray
ca469be1cf pageserver: add shard indices to layer metadata (#5928)
## Problem

For sharded tenants, the layer keys must include the shard number and
shard count, to disambiguate keys written by different shards in the
same tenant (shard number), and disambiguate layers written before and
after splits (shard count).

Closes: https://github.com/neondatabase/neon/issues/5924

## Summary of changes

There are no functional changes in this PR: everything behaves the same
for the default ShardIndex::unsharded() value. Actual construct of
sharded tenants will come next.

- Add a ShardIndex type: this is just a wrapper for a ShardCount and
ShardNumber. This is a subset of ShardIdentity: whereas ShardIdentity
contains enough information to filter page keys, ShardIndex contains
just enough information to construct a remote key. ShardIndex has a
compact encoding, the same as the shard part of TenantShardId.
- Store the ShardIndex as part of IndexLayerMetadata, if it is set to a
different value than ShardIndex::unsharded.
- Update RemoteTimelineClient and DeletionQueue to construct paths using
the layer metadata. Deletion code paths that previously just passed a
`Generation` now pass a full `LayerFileMetadata` to capture the shard as
well.

Notes to reviewers:
- In deletion code paths, I could have used a (Generation, ShardIndex)
instead of the full LayerFileMetadata. I opted for the full object
partly for brevity, and partly because in future when we add checksums
the deletion code really will care about the full metadata in order to
validate that it is deleting what was intended.
- While ShardIdentity and TenantShardId could both use a ShardIndex, I
find that they read more cleanly as "flat" structs that spell out the
shard count and number field separately. Serialization code would need
writing out by hand anyway, because TenantShardId's serialized form is
not a serde struct-style serialization.
- ShardIndex doesn't _have_ to exist (we could use ShardIdentity
everywhere), but it is a worthwhile optimization, as we will have many
copies of this as part of layer metadata. In future the size difference
betweedn ShardIndex and ShardIdentity may become larger if we implement
more sophisticated key distribution mechanisms (i.e. new values of
ShardIdentity::layout).

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-11-28 11:47:25 +00:00
Christian Schwarz
286f34dfce test suite: add method for generation-aware detachment of a tenant (#5939)
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771
2023-11-28 09:51:37 +00:00
Sasha Krassovsky
f290b27378 Fix check for if shmem is valid to take into account detached shmem (#5937)
## Problem
We can segfault if we update connstr inside of a process that has
detached from shmem (e.g. inside stats collector)
## Summary of changes
Add a check to make sure we're not detached
2023-11-28 03:14:42 +00:00
Sasha Krassovsky
4cd18fcebd Compile wal2json (#5893)
Add wal2json extension
2023-11-27 18:17:26 -08:00
Anastasia Lubennikova
4c29e0594e Update neon extension relocatable for existing installations (#5943) 2023-11-27 23:29:24 +00:00
Anastasia Lubennikova
3c56a4dd18 Make neon extension relocatable to allow SET SCHEMA (#5942) 2023-11-27 21:45:41 +00:00
Conrad Ludgate
316309c85b channel binding (#5683)
## Problem

channel binding protects scram from sophisticated MITM attacks where the
attacker is able to produce 'valid' TLS certificates.

## Summary of changes

get the tls-server-end-point channel binding, and verify it is correct
for the SCRAM-SHA-256-PLUS authentication flow
2023-11-27 21:45:15 +00:00
Arpad Müller
e09bb9974c bootstrap_timeline: rename initdb_path to pgdata_path (#5931)
This is a rename without functional changes, in preparation for #5912.

Split off from #5912 as per review request.
2023-11-27 20:14:39 +00:00
Anastasia Lubennikova
5289f341ce Use test specific directory in test_remote_extensions (#5938) 2023-11-27 18:57:58 +00:00
Joonas Koivunen
683ec2417c deflake: test_live_reconfig_get_evictions_low_residence_... (#5926)
- disable extra tenant
- disable compaction which could try to repartition while we assert

Split from #5108.
2023-11-27 15:20:54 +02:00
Christian Schwarz
a76a503b8b remove confusing no-op .take() of init_tenant_load_remote (#5923)
The `Tenant::spawn()` method already `.take()`s it.

I think this was an oversight in
https://github.com/neondatabase/neon/pull/5580 .
2023-11-27 12:50:19 +00:00
Anastasia Lubennikova
92bc2bb132 Refactor remote extensions feature to request extensions from proxy (#5836)
instead of direct S3 request.

Pros:
- simplify code a lot (no need to provide AWS credentials and paths);
- reduce latency of downloading extension data as proxy resides near
computes; -reduce AWS costs as proxy has cache and 1000 computes asking
the same extension will not generate 1000 downloads from S3.
- we can use only one S3 bucket to store extensions (and rid of regional
buckets which were introduced to reduce latency);

Changes:
- deprecate remote-ext-config compute_ctl parameter, use
http://pg-ext-s3-gateway if any old format remote-ext-cofig is provided;
- refactor tests to use mock http server;
2023-11-27 12:10:23 +00:00
John Spray
b80b9e1c4c pageserver: remove defunct local timeline delete markers (#5699)
## Problem

Historically, we treated the presence of a timeline on local disk as
evidence that it logically exists. Since #5580 that is no longer the
case, so we can always rely on remote storage. If we restart and the
timeline is gone in remote storage, we will also purge it from local
disk: no need for a marker.

Reference on why this PR is for timeline markers and not tenant markers:
https://github.com/neondatabase/neon/issues/5080#issuecomment-1783187807

## Summary of changes

Remove code paths that read + write deletion marker for timelines.

Leave code path that deletes these markers, just in case we deploy while
there are some in existence. This can be cleaned up later.
(https://github.com/neondatabase/neon/issues/5718)
2023-11-27 09:31:20 +00:00
Anastasia Lubennikova
87b8ac3ec3 Only create neon extension in postgres database; (#5918)
Create neon extension in neon schema.
2023-11-26 08:37:01 +00:00
Joonas Koivunen
6b1c4cc983 fix: long timeline create cancelled by tenant delete (#5917)
Fix the fallible vs. infallible check order with
`UninitTimeline::finish_creation` so that the incomplete timeline can be
removed. Currently the order of drop guard unwrapping causes uninit
files to be left on pageserver, blocking the tenant deletion.

Cc: #5914
Cc: #investigation-2023-11-23-stuck-tenant-deletion
2023-11-24 16:17:56 +00:00
Joonas Koivunen
831fad46d5 tests: fix allowed_error for compaction detecting a shutdown (#5919)
This has been causing flaky tests, [example evidence].

Follow-up to #5883 where I forgot to fix this.

[example evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5917/6981540065/index.html#suites/9d2450a537238135fd4007859e09aca7/6fd3556a879fa3d1
2023-11-24 16:14:32 +00:00
Joonas Koivunen
53851ea8ec fix: log cancelled request handler errors (#5915)
noticed during [investigation] with @problame a major point of lost
error logging which would had sped up the investigation.

Cc: #5815

[investigation]:
https://neondb.slack.com/archives/C066ZFAJU85/p1700751858049319
2023-11-24 15:54:06 +02:00
Joonas Koivunen
044375732a test: support validating allowed_errors against a logfile (#5905)
this will make it easier to test if an added allowed_error does in fact
match for example against a log file from an allure report.

```
$ python3 test_runner/fixtures/pageserver/allowed_errors.py --help
usage: allowed_errors.py [-h] [-i INPUT]

check input against pageserver global allowed_errors

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Pageserver logs file. Reads from stdin if no file is provided.
```

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-11-24 12:43:25 +00:00
Konstantin Knizhnik
ea63b43009 Check if LFC was intialized in local_cache_pages function (#5911)
## Problem

There is not check that LFC is initialised (`lfc_max_size != 0`) in
`local_cache_pages` function

## Summary of changes

Add proper check.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-11-24 08:23:00 +02:00
Conrad Ludgate
a56fd45f56 proxy: fix memory leak again (#5909)
## Problem

The connections.join_next helped but it wasn't enough... The way I
implemented the improvement before was still faulty but it mostly worked
so it looked like it was working correctly.

From [`tokio::select`
docs](https://docs.rs/tokio/latest/tokio/macro.select.html):
> 4. Once an <async expression> returns a value, attempt to apply the
value to the provided <pattern>, if the pattern matches, evaluate
<handler> and return. If the pattern does not match, disable the current
branch and for the remainder of the current call to select!. Continue
from step 3.

The `connections.join_next()` future would complete and `Some(Err(e))`
branch would be evaluated but not match (as the future would complete
without panicking, we would hope). Since the branch doesn't match, it's
disabled. The select continues but never attempts to call `join_next`
again. Getting unlucky, more TCP connections are created than we attempt
to join_next.

## Summary of changes

Replace the `Some(Err(e))` pattern with `Some(e)`. Because of the
auto-disabling feature, we don't need the `if !connections.is_empty()`
step as the `None` pattern will disable it for us.
2023-11-23 19:11:24 +00:00
Anastasia Lubennikova
582a42762b update extension version in test_neon_extension 2023-11-23 18:53:03 +00:00
Anastasia Lubennikova
f5dfa6f140 Create extension neon in existing databases too 2023-11-23 18:53:03 +00:00
Anastasia Lubennikova
f8d9bd8d14 Add extension neon to all databases.
- Run CREATE EXTENSION neon for template1, so that it was created in all databases.
- Run ALTER EXTENSION neon in all databases, to always have the newest version of the extension in computes.
- Add test_neon_extension test
2023-11-23 18:53:03 +00:00
Anastasia Lubennikova
04e6c09f14 Add pgxn/neon/README.md 2023-11-23 18:53:03 +00:00
Arpad Müller
54327bbeec Upload initdb results to S3 (#5390)
## Problem

See #2592

## Summary of changes

Compresses the results of initdb into a .tar.zst file and uploads them
to S3, to enable usage in recovery from lsn.

Generations should not be involved I think because we do this only once
at the very beginning of a timeline.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-23 18:11:52 +00:00
Shany Pozin
35f243e787 Move weekly release PR trigger to Monday morning (#5908) 2023-11-23 19:09:34 +02:00
Shany Pozin
b7a988ba46 Support cancellation for find_lsn_for_timestamp API (#5904)
## Problem
#5900
## Summary of changes
Added cancellation token as param in all relevant code paths and actually used it in the find_lsn_for_timestamp main loop
2023-11-23 17:08:32 +02:00
Christian Schwarz
a0e61145c8 fix: cleanup of layers from the future can race with their re-creation (#5890)
fixes https://github.com/neondatabase/neon/issues/5878
obsoletes https://github.com/neondatabase/neon/issues/5879

Before this PR, it could happen that `load_layer_map` schedules removal
of the future
image layer. Then a later compaction run could re-create the same image
layer, scheduling a PUT.
Due to lack of an upload queue barrier, the PUT and DELETE could be
re-ordered.
The result was IndexPart referencing a non-existent object.

## Summary of changes

* Add support to `pagectl` / Python tests to decode `IndexPart`
  * Rust
    * new `pagectl` Subcommand
* `IndexPart::{from,to}_s3_bytes()` methods to internalize knowledge
about encoding of `IndexPart`
  * Python
    * new `NeonCli` subclass
* Add regression test
  * Rust
* Ability to force repartitioning; required to ensure image layer
creation at last_record_lsn
  * Python
    * The regression test.
* Fix the issue
  * Insert an `UploadOp::Barrier` after scheduling the deletions.
2023-11-23 13:33:41 +00:00
Konstantin Knizhnik
6afbadc90e LFC fixes + statistics (#5727)
## Problem

## Summary of changes

See #5500

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-11-23 08:59:19 +02:00
Anastasia Lubennikova
2a12e9c46b Add documentation for our sample pre-commit hook (#5868) 2023-11-22 12:04:36 +00:00
Christian Schwarz
9e3c07611c logging: support output to stderr (#5896)
(part of the getpage benchmarking epic #5771)

The plan is to make the benchmarking tool log on stderr and emit results
as JSON on stdout. That way, the test suite can simply take captures
stdout and json.loads() it, while interactive users of the benchmarking
tool have a reasonable experience as well.

Existing logging users continue to print to stdout, so, this change
should be a no-op functionally and performance-wise.
2023-11-22 11:08:35 +00:00
Christian Schwarz
d353fa1998 refer to our rust-postgres.git fork by branch name (#5894)
This way, `cargo update -p tokio-postgres` just works. The `Cargo.toml`
communicates more clearly that we're referring to the `main` branch. And
the git revision is still pinned in `Cargo.lock`.
2023-11-22 10:58:27 +00:00
Joonas Koivunen
0d10992e46 Cleanup compact_level0_phase1 fsyncing (#5852)
While reviewing code noticed a scary `layer_paths.pop().unwrap()` then
realized this should be further asyncified, something I forgot to do
when I switched the `compact_level0_phase1` back to async in #4938.

This keeps the double-fsync for new deltas as #4749 is still unsolved.
2023-11-21 15:30:40 +02:00
Arpad Müller
3e131bb3d7 Update Rust to 1.74.0 (#5873)
[Release notes](https://github.com/rust-lang/rust/releases/tag/1.74.0).
2023-11-21 11:41:41 +01:00
Sasha Krassovsky
81b2cefe10 Disallow CREATE DATABASE WITH OWNER neon_superuser (#5887)
## Problem
Currently, control plane doesn't know about neon_superuser, so if a user
creates a database with owner neon_superuser it causes an exception when
it tries to forward it. It is also currently possible to ALTER ROLE
neon_superuser.

## Summary of changes
Disallow creating database with owner neon_superuser. This is probably
fine, since I don't think you can create a database with owner normal
superuser. Also forbids altering neon_superuser
2023-11-20 22:39:47 +00:00
Christian Schwarz
d2ca410919 build: back to opt-level=0 in debug builds, for faster compile times (#5751)
This change brings down incremental compilation for me
from > 1min to 10s (and this is a pretty old Ryzen 1700X).

More details: "incremental compilation" here means to change one
character
in the `failed to read value from offset` string in `image_layer.rs`.
The command for incremental compilation is `cargo build_testing`.
The system on which I got these numbers uses `mold` via
`~/.cargo/config.toml`.

As a bonus, `rust-gdb` is now at least a little fun again.

Some tests are timing out in debug builds due to these changes.
This PR makes them skip for debug builds.
We run both with debug and release build, so, the loss of coverage is
marginal.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-11-20 15:41:37 +01:00
Joonas Koivunen
d98ac04136 chore(background_tasks): missed allowed_error change, logging change (#5883)
- I am always confused by the log for the error wait time, now it will
be `2s` or `2.0s` not `2.0`
- fix missed string change introduced in #5881 [evidence]

[evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/6921062837/index.html#suites/f9eba3cfdb71aa6e2b54f6466222829b/87897fe1ddee3825
2023-11-20 07:33:17 +00:00
Joonas Koivunen
ac08072d2e fix(layer): VirtualFile opening and read errors can be caused by contention (#5880)
A very low number of layer loads have been marked wrongly as permanent,
as I did not remember that `VirtualFile::open` or reading could fail
transiently for contention. Return separate errors for transient and
persistent errors from `{Delta,Image}LayerInner::load`.

Includes drive-by comment changes.

The implementation looks quite ugly because having the same type be both
the inner (operation error) and outer (critical error), but with the
alternatives I tried I did not find a better way.
2023-11-19 14:57:39 +00:00
John Spray
d22dce2e31 pageserver: shut down idle walredo processes (#5877)
The longer a pageserver runs, the more walredo processes it accumulates
from tenants that are touched intermittently (e.g. by availability
checks). This can lead to getting OOM killed.

Changes:
- Add an Instant recording the last use of the walredo process for a
tenant
- After compaction iteration in the background task, check for idleness
and stop the walredo process if idle for more than 10x compaction
period.

Cc: #3620

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
2023-11-19 14:21:16 +00:00
Joonas Koivunen
3b3f040be3 fix(background_tasks): first backoff, compaction error stacktraces (#5881)
First compaction/gc error backoff starts from 0 which is less than 2s
what it was before #5672. This is now fixed to be the intended 2**n.

Additionally noticed the `compaction_iteration` creating an
`anyhow::Error` via `into()` always captures a stacktrace even if we had
a stacktraceful anyhow error within the CompactionError because there is
no stable api for querying that.
2023-11-19 14:16:31 +00:00
Em Sharnoff
cad0dca4b8 compute_ctl: Remove deprecated flag --file-cache-on-disk (#5622)
See neondatabase/cloud#7516 for more.
2023-11-18 12:43:54 +01:00
Em Sharnoff
5d13a2e426 Improve error message when neon.max_cluster_size reached (#4173)
Changes the error message encountered when the `neon.max_cluster_size`
limit is reached. Reasoning is that this is user-visible, and so should
*probably* use language that's closer to what users are familiar with.
2023-11-16 21:51:26 +00:00
khanova
0c243faf96 Proxy log pid hack (#5869)
## Problem

Improve observability for the compute node.

## Summary of changes

Log pid from the compute node. Doesn't work with pgbouncer.
2023-11-16 20:46:23 +00:00
Em Sharnoff
d0a842a509 Update vm-builder to v0.19.0 and move its customization here (#5783)
ref neondatabase/autoscaling#600 for more
2023-11-16 18:17:42 +01:00
khanova
6b82f22ada Collect number of connections by sni type (#5867)
## Problem

We don't know the number of users with the different kind of
authentication: ["sni", "endpoint in options" (A and B from
[here](https://neon.tech/docs/connect/connection-errors)),
"password_hack"]

## Summary of changes

Collect metrics by sni kind.
2023-11-16 12:19:13 +00:00
John Spray
ab631e6792 pageserver: make TenantsMap shard-aware (#5819)
## Problem

When using TenantId as the key, we are unable to handle multiple tenant
shards attached to the same pageserver for the same tenant ID. This is
an expected scenario if we have e.g. 8 shards and 5 pageservers.

## Summary of changes

- TenantsMap is now a BTreeMap instead of a HashMap: this enables
looking up by range. In future, we will need this for page_service, as
incoming requests will just specify the Key, and we'll have to figure
out which shard to route it to.
- A new key type TenantShardId is introduced, to act as the key in
TenantsMap, and as the id type in external APIs. Its human readable
serialization is backward compatible with TenantId, and also
forward-compatible as long as sharding is not actually used (when we
construct a TenantShardId with ShardCount(0), it serializes to an
old-fashioned TenantId).
- Essential tenant APIs are updated to accept TenantShardIds:
tenant/timeline create, tenant delete, and /location_conf. These are the
APIs that will enable driving sharded tenants. Other apis like /attach
/detach /load /ignore will not work with sharding: those will soon be
deprecated and replaced with /location_conf as part of the live
migration work.

Closes: #5787
2023-11-15 23:20:21 +02:00
Alexander Bayandin
f84ac2b98d Fix baseline commit and branch for code coverage (#5769)
## Problem

`HEAD` commit for a PR is a phantom merge commit which skews the baseline
commit for coverage reports.

See
https://github.com/neondatabase/neon/pull/5751#issuecomment-1790717867

## Summary of changes
- Use commit hash instead of `HEAD` for finding baseline commits for
code coverage
- Use the base branch for PRs or the current branch for pushes
2023-11-15 12:40:21 +01:00
dependabot[bot]
5cd5b93066 build(deps): bump aiohttp from 3.8.5 to 3.8.6 (#5864) 2023-11-15 11:08:49 +00:00
khanova
2f0d245c2a Proxy control plane rate limiter (#5785)
## Problem

Proxy might overload the control plane.

## Summary of changes

Implement rate limiter for proxy<->control plane connection.
Resolves https://github.com/neondatabase/neon/issues/5707

Used implementation ideas from https://github.com/conradludgate/squeeze/
2023-11-15 09:15:59 +00:00
Joonas Koivunen
462f04d377 Smaller test addition and change (#5858)
- trivial serialization roundtrip test for
`pageserver::repository::Value`
- add missing `start_paused = true` to 15s test making it <0s test
- completely unrelated future clippy lint avoidance (helps beta channel
users)
2023-11-14 18:04:34 +01:00
Arpad Müller
31a54d663c Migrate links from wiki to notion (#5862)
See the slack discussion:
https://neondb.slack.com/archives/C033A2WE6BZ/p1696429688621489?thread_ts=1695647103.117499
2023-11-14 15:36:47 +00:00
John Spray
7709c91fe5 neon_local: use remote storage by default, add cargo neon tenant migrate (#5760)
## Problem

Currently the only way to exercise tenant migration is via python test
code. We need a convenient way for developers to do it directly in a
neon local environment.

## Summary of changes

- Add a `--num-pageservers` argument to `cargo neon init` so that it's
easy to run with multiple pageservers
- Modify default pageserver overrides in neon_local to set up `LocalFs`
remote storage, as any migration/attach/detach stuff doesn't work in the
legacy local storage mode. This also unblocks removing the pageserver's
support for the legacy local mode.
- Add a new `cargo neon tenant migrate` command that orchestrates tenant
migration, including endpoints.
2023-11-14 09:51:51 +00:00
Arpad Müller
f7249b9018 Fix comment in find_lsn_for_timestamp (#5855)
We still subtract 1 from low to compute `commit_lsn`. the comment
moved/added by #5844 should point this out.
2023-11-11 00:32:00 +00:00
Joonas Koivunen
74d150ba45 build: upgrade ahash (#5851)
`cargo deny` was complaining the version 0.8.3 was yanked (for possible
DoS attack [wiki]), but the latest version (0.8.5) also includes aarch64
fixes which may or may not be relevant. Our usage of ahash limits to
proxy, but I don't think we are at any risk.

[wiki]: https://github.com/tkaitchuck/aHash/wiki/Yanked-versions
2023-11-10 19:10:54 +00:00
Joonas Koivunen
b7f45204a2 build: deny async-std and friends (#5849)
rationale: some crates pull these in as default; hopefully these hints
will require less cleanup-after and Cargo.lock file watching.

follow-up to #5848.
2023-11-10 18:02:22 +01:00
Joonas Koivunen
a05f104cce build: remove async-std dependency (#5848)
Introduced by accident (missing `default-features = false`) in
e09d5ada6a. We directly need only `http_types::StatusCode`.
2023-11-10 16:05:21 +02:00
John Spray
d672e44eee pageserver: error type for collect_keyspace (#5846)
## Problem

This is a log hygiene fix, for an occasional test failure.

warn-level logging in imitate_timeline_cached_layer_accesses can't
distinguish actual errors from shutdown cases.

## Summary of changes

Replaced anyhow::Error with an explicit CollectKeySpaceError type, that
includes conversion from PageReconstructError::Cancelled.
2023-11-10 13:58:18 +00:00
Rahul Modpur
a6f892e200 metric: add started and killed walredo processes counter (#5809)
In OOM situations, knowing exactly how many walredo processes there were
at a time would help afterwards to understand why was pageserver OOM
killed. Add `pageserver_wal_redo_process_total` metric to keep track of
total wal redo process started, shutdown and killed since pageserver
start.

Closes #5722

---------

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <me@cschwarz.com>
2023-11-10 15:05:22 +02:00
Alexander Bayandin
71b380f90a Set BUILD_TAG for build-neon job (#5847)
## Problem

I've added `BUILD_TAG` to docker images.
(https://github.com/neondatabase/neon/pull/5812), but forgot to add it
to services that we build for tests

## Summary of changes
- Set `BUILD_TAG` in `build-neon` job
2023-11-10 12:49:52 +00:00
Alexander Bayandin
6e145a44fa workflows/neon_extra_builds: run check-codestyle-rust & build-neon on arm64 (#5832)
## Problem

Some developers use workstations with arm CPUs, and sometimes x86-64
code is not fully compatible with it (for example,
https://github.com/neondatabase/neon/pull/5827).
Although we don't have arm CPUs in the prod (yet?), it is worth having
some basic checks for this architecture to have a better developer
experience.

Closes https://github.com/neondatabase/neon/issues/5829

## Summary of changes
- Run `check-codestyle-rust`-like & `build-neon`-like jobs on Arm runner
- Add `run-extra-build-*` label to run all available extra builds
2023-11-10 12:45:41 +00:00
Arpad Müller
8e5e3971ba find_lsn_for_timestamp fixes (#5844)
Includes the changes of #3689 that address point 1 of #3689, plus some
further improvements. In particular, this PR does:

* set `min_lsn` to a safe value to create branches from (and verify it
in tests)
* return `min_lsn` instead of `max_lsn` for `NoData` and `Past` (verify
it in test for `Past`, `NoData` is harder and not as important)
* return `commit_lsn` instead of `max_lsn` for Future (and verify it in
the tests)
* add some comments

Split out of #5686 to get something more minimal out to users.
2023-11-10 13:38:44 +01:00
Joonas Koivunen
8dd29f1e27 fix(pageserver): spawn all kinds of tenant shutdowns (#5841)
Minor bugfix, something noticed while manual code-review. Use the same
joinset for inprogress tenants so we can get the benefit of the
buffering logging just as we get for attached tenants, and no single
inprogress task can hold up shutdown of other tenants.
2023-11-09 21:36:57 +00:00
Joonas Koivunen
f5344fb85a temp: log all layer loading errors while we lose them (#5816)
Temporary workaround while some errors are not being logged.

Cc: #5815.
2023-11-09 21:31:53 +00:00
Arpad Müller
f95f001b8b Lsn for get_timestamp_of_lsn should be string, not integer (#5840)
The `get_timestamp_of_lsn` pageserver endpoint has been added in #5497,
but the yml it added was wrong: the lsn is expected in hex format, not
in integer (decimal) format.
2023-11-09 16:12:18 +00:00
John Spray
e0821e1eab pageserver: refined Timeline shutdown (#5833)
## Problem

We have observed the shutdown of a timeline taking a long time when a
deletion arrives at a busy time for the system. This suggests that we
are not respecting cancellation tokens promptly enough.

## Summary of changes

- Refactor timeline shutdown so that rather than having a shutdown()
function that takes a flag for optionally flushing, there are two
distinct functions, one for graceful flushing shutdown, and another that
does the "normal" shutdown where we're just setting a cancellation token
and then tearing down as fast as we can. This makes things a bit easier
to reason about, and enables us to remove the hand-written variant of
shutdown that was maintained in `delete.rs`
- Layer flush task checks cancellation token more carefully
- Logical size calculation's handling of cancellation tokens is
simplified: rather than passing one in, it respects the Timeline's
cancellation token.

This PR doesn't touch RemoteTimelineClient, which will be a key thing to
fix as well, so that a slow remote storage op doesn't hold up shutdown.
2023-11-09 16:02:59 +00:00
bojanserafimov
4469b1a62c Fix blob_io test (#5818) 2023-11-09 10:47:03 -05:00
Joonas Koivunen
842223b47f fix(metric): remove pageserver_wal_redo_wait_seconds (#5791)
the meaning of the values recorded in this histogram changed with #5560
and we never had it visualized as a histogram, just the
`increase(_sum)`. The histogram is not too interesting to look at, so
remove it per discussion in [slack
thread](https://neondb.slack.com/archives/C063LJFF26S/p1699008316109999?thread_ts=1698852436.637559&cid=C063LJFF26S).
2023-11-09 16:40:52 +02:00
Anna Stepanyan
893616051d Update epic-template.md (#5709)
replace the checkbox list with a a proper task list in the epic template

NB: this PR does not change the code, it only touches the github issue
templates
2023-11-09 15:24:43 +01:00
Conrad Ludgate
7cdde285a5 proxy: limit concurrent wake_compute requests per endpoint (#5799)
## Problem

A user can perform many database connections at the same instant of time
- these will all cache miss and materialise as requests to the control
plane. #5705

## Summary of changes

I am using a `DashMap` (a sharded `RwLock<HashMap>`) of endpoints ->
semaphores to apply a limiter. If the limiter is enabled (permits > 0),
the semaphore will be retrieved per endpoint and a permit will be
awaited before continuing to call the wake_compute endpoint.

### Important details

This dashmap would grow uncontrollably without maintenance. It's not a
cache so I don't think an LRU-based reclamation makes sense. Instead,
I've made use of the sharding functionality of DashMap to lock a single
shard and clear out unused semaphores periodically.

I ran a test in release, using 128 tokio tasks among 12 threads each
pushing 1000 entries into the map per second, clearing a shard every 2
seconds (64 second epoch with 32 shards). The endpoint names were
sampled from a gamma distribution to make sure some overlap would occur,
and each permit was held for 1ms. The histogram for time to clear each
shard settled between 256-512us without any variance in my testing.

Holding a lock for under a millisecond for 1 of the shards does not
concern me as blocking
2023-11-09 14:14:30 +00:00
John Spray
9c30883c4b remote_storage: use S3 SDK's adaptive retry policy (#5813)
## Problem

Currently, we aren't doing any explicit slowdown in response to 429
responses. Recently, as we hit remote storage a bit harder (pageserver
does more ListObjectsv2 requests than it used to since #5580 ), we're
seeing storms of 429 responses that may be the result of not just doing
too may requests, but continuing to do those extra requests without
backing off any more than our usual backoff::exponential.

## Summary of changes

Switch from AWS's "Standard" retry policy to "Adaptive" -- docs describe
this as experimental but it has been around for a long time. The main
difference between Standard and Adaptive is that Adaptive rate-limits
the client in response to feedback from the server, which is meant to
avoid scenarios where the client would otherwise repeatedly hit
throttling responses.
2023-11-09 13:50:13 +00:00
Arthur Petukhovsky
0495798591 Fix walproposer build on aarch64 (#5827)
There was a compilation error due to `std::ffi::c_char` being different type on different platforms. Clippy also complained due to a similar reason.
2023-11-09 13:05:17 +00:00
Sasha Krassovsky
87389bc933 Add test simulating bad connection between pageserver and compute (#5728)
## Problem
We have a funny 3-day timeout for connections between the compute and
pageserver. We want to get rid of it, so to do that we need to make sure
the compute is resilient to connection failures.

Closes: https://github.com/neondatabase/neon/issues/5518

## Summary of changes
This test makes the pageserver randomly drop the connection if the
failpoint is enabled, and ensures we can keep querying the pageserver.

This PR also reduces the default timeout to 10 minutes from 3 days.
2023-11-08 19:48:57 +00:00
Arpad Müller
ea118a238a JWT logging improvements (#5823)
* lower level on auth success from info to debug (fixes #5820)
* don't log stacktraces on auth errors (as requested on slack). we do this by introducing an `AuthError` type instead of using `anyhow` and `bail`.
* return errors that have been censored for improved security.
2023-11-08 16:56:53 +00:00
Christian Schwarz
e9b227a11e cleanup unused RemoteStorage fields (#5830)
Found this while working on #5771
2023-11-08 16:54:33 +00:00
John Spray
40441f8ada pageserver: use Gate for stronger safety check in SlotGuard (#5793)
## Problem

#5711 and #5367 raced -- the `SlotGuard` type needs `Gate` to properly
enforce its invariant that we may not drop an `Arc<Tenant>` from a slot.

## Summary of changes

Replace the TODO with the intended check of Gate.
2023-11-08 13:00:11 +00:00
John Spray
a8a39cd464 test: de-flake test_deletion_queue_recovery (#5822)
## Problem

This test could fail if timing is unlucky, and the deletions in the test
land in two deletion lists instead of one.

## Summary of changes

We await _some_ validations instead of _all_ validations, because our
execution failpoint
will prevent validation proceeding for any but the first DeletionList.
Usually the workload
just generates one, but if it generates two due to timing, then we must
not expect that the
second one will be validated.
2023-11-08 12:41:48 +00:00
John Spray
b989ad1922 extend test_change_pageserver for failure case, rework changing pageserver (#5693)
Reproducer for https://github.com/neondatabase/neon/issues/5692

The test change in this PR intentionally fails, to demonstrate the
issue.

---------

Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>
2023-11-08 11:26:56 +00:00
Em Sharnoff
acef742a6e vm-monitor: Remove dependency on workspace_hack (#5752)
neondatabase/autoscaling builds libs/vm-monitor during CI because it's a
necessary component of autoscaling.

workspace_hack includes a lot of crates that are not necessary for
vm-monitor, which artificially inflates the build time on the
autoscaling side, so hopefully removing the dependency should speed
things up.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-07 09:41:20 -08:00
duguorong009
11d9d801b5 pageserver: improve the shutdown log error (#5792)
## Problem
- Close #5784 

## Summary of changes
- Update the `GetActiveTenantError` -> `QueryError` conversion process
in `pageserver/src/page_service.rs`
- Update the pytest logging exceptions in
`./test_runner/regress/test_tenant_detach.py`
2023-11-07 16:57:26 +00:00
Andrew Rudenko
fc47af156f Passing neon options to the console (#5781)
The idea is to pass neon_* prefixed options to control plane. It can be
used by cplane to dynamically create timelines and computes. Such
options also should be excluded from passing to compute. Another issue
is how connection caching is working now, because compute's instance now
depends not only on hostname but probably on such options too I included
them to cache key.
2023-11-07 16:49:26 +01:00
Arpad Müller
e310533ed3 Support JWT key reload in pageserver (#5594)
## Problem

For quickly rotating JWT secrets, we want to be able to reload the JWT
public key file in the pageserver, and also support multiple JWT keys.

See #4897.

## Summary of changes

* Allow directories for the `auth_validation_public_key_path` config
param instead of just files. for the safekeepers, all of their config options
also support multiple JWT keys.
* For the pageservers, make the JWT public keys easily globally swappable
by using the `arc-swap` crate.
* Add an endpoint to the pageserver, triggered by a POST to
`/v1/reload_auth_validation_keys`, that reloads the JWT public keys from
the pre-configured path (for security reasons, you cannot upload any
keys yourself).

Fixes #4897

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-07 15:43:29 +01:00
John Spray
1d68f52b57 pageserver: move deletion failpoint inside backoff (#5814)
## Problem

When enabled, this failpoint would busy-spin in a loop that emits log
messages.

## Summary of changes

Move the failpoint inside a backoff::exponential block: it will still
spam the log, but at much lower rate.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-07 14:25:51 +00:00
Alexander Bayandin
4cd47b7d4b Dockerfile: Set BUILD_TAG for storage services (#5812)
## Problem

https://github.com/neondatabase/neon/pull/5576 added `build-tag`
reporting to `libmetrics_build_info`, but it's not reported because we
didn't set the corresponding env variable in the build process.

## Summary of changes
- Add `BUILD_TAG` env var while building services
2023-11-07 13:45:59 +00:00
Fernando Luz
0141c95788 build: Add warning when missing postgres submodule during the build (#5614)
I forked the project and in my local repo, I wasn't able to compile the
project and in my search, I found the solution in neon forum. After a PR
discussion, I made a change in the makefile to alert the missing `git
submodules update` step.

---------

Signed-off-by: Fernando Luz <prof.fernando.luz@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-07 12:13:05 +00:00
Shany Pozin
0ac4cf67a6 Use self.tenants instead of TENANTS (#5811) 2023-11-07 11:38:02 +00:00
Joonas Koivunen
4be6bc7251 refactor: remove unnecessary unsafe (#5802)
unsafe impls for `Send` and `Sync` should not be added by default. in
the case of `SlotGuard` removing them does not cause any issues, as the
compiler automatically derives those.

This PR adds requirement to document the unsafety (see
[clippy::undocumented_unsafe_blocks]) and opportunistically adds
`#![deny(unsafe_code)]` to most places where we don't have unsafe code
right now.

TRPL on Send and Sync:
https://doc.rust-lang.org/book/ch16-04-extensible-concurrency-sync-and-send.html

[clippy::undocumented_unsafe_blocks]:
https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks
2023-11-07 10:26:25 +00:00
John Spray
a394f49e0d pageserver: avoid converting an error to anyhow::Error (#5803)
This was preventing it getting cleanly converted to a
CalculateLogicalSizeError::Cancelled, resulting in "Logical size
calculation failed" errors in logs.
2023-11-07 09:35:45 +00:00
John Spray
c00651ff9b pageserver: start refactoring into TenantManager (#5797)
## Problem

See: https://github.com/neondatabase/neon/issues/5796

## Summary of changes

Completing the refactor is quite verbose and can be done in stages: each
interface that is currently called directly from a top-level mgr.rs
function can be moved into TenantManager once the relevant subsystems
have access to it.

Landing the initial change to create of TenantManager is useful because
it enables new code to use it without having to be altered later, and
sets us up to incrementally fix the existing code to use an explicit
Arc<TenantManager> instead of relying on the static TENANTS.
2023-11-07 09:06:53 +00:00
Richy Wang
bea8efac24 Fix comments in 'receive_wal.rs'. (#5807)
## Problem
Some comments in 'receive_wal.rs' is not suitable. It may copy from
'send_wal.rs' and leave it unchanged.
## Summary of changes
This commit fixes two comments in the code:
Changed "/// Unregister walsender." to "/// Unregister walreceiver."
Changed "///Scope guard to access slot in WalSenders registry" to
"///Scope guard to access slot in WalReceivers registry."
2023-11-07 09:13:01 +01:00
Conrad Ludgate
ad5b02e175 proxy: remove unsafe (#5805)
## Problem

`unsafe {}`

## Summary of changes

`CStr` has a method to parse the bytes up to a null byte, so we don't
have to do it ourselves.
2023-11-06 17:44:44 +00:00
Arpad Müller
b09a851705 Make azure blob storage not do extra metadata requests (#5777)
Load the metadata from the returned `GetBlobResponse` and avoid
downloading it via a separate request.
As it turns out, the SDK does return the metadata:
https://github.com/Azure/azure-sdk-for-rust/issues/1439 .

This PR will reduce the number of requests to Azure caused by downloads.

Fixes #5571
2023-11-06 15:16:55 +00:00
John Spray
85cd97af61 pageserver: add InProgress tenant map state, use a sync lock for the map (#5367)
## Problem

Follows on from #5299 
- We didn't have a generic way to protect a tenant undergoing changes:
`Tenant` had states, but for our arbitrary transitions between
secondary/attached, we need a general way to say "reserve this tenant
ID, and don't allow any other ops on it, but don't try and report it as
being in any particular state".
- The TenantsMap structure was behind an async RwLock, but it was never
correct to hold it across await points: that would block any other
changes for all tenants.


## Summary of changes

- Add the `TenantSlot::InProgress` value.  This means:
  - Incoming administrative operations on the tenant should retry later
- Anything trying to read the live state of the tenant (e.g. a page
service reader) should retry later or block.
- Store TenantsMap in `std::sync::RwLock`
- Provide an extended `get_active_tenant_with_timeout` for page_service
to use, which will wait on InProgress slots as well as non-active
tenants.

Closes: https://github.com/neondatabase/neon/issues/5378

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-11-06 14:03:22 +00:00
Arpad Müller
e6470ee92e Add API description for safekeeper copy endpoint (#5770)
Adds a yaml API description for a new endpoint that allows creation of a
new timeline as the copy of an existing one.
 
Part of #5282
2023-11-06 15:00:07 +01:00
bojanserafimov
dc72567288 Layer flush minor speedup (#5765)
Convert keys to `i128` before sorting
2023-11-06 08:58:20 -05:00
John Spray
6defa2b5d5 pageserver: add Gate as a partner to CancellationToken for safe shutdown of Tenant & Timeline (#5711)
## Problem

When shutting down a Tenant, it isn't just important to cause any
background tasks to stop. It's also important to wait until they have
stopped before declaring shutdown complete, in cases where we may re-use
the tenant's local storage for something else, such as running in
secondary mode, or creating a new tenant with the same ID.

## Summary of changes

A `Gate` class is added, inspired by
[seastar::gate](https://docs.seastar.io/master/classseastar_1_1gate.html).
For types that have an important lifetime that corresponds to some
physical resource, use of a Gate as well as a CancellationToken provides
a robust pattern for async requests & shutdown:
- Requests must always acquire the gate as long as they are using the
object
- Shutdown must set the cancellation token, and then `close()` the gate
to wait for requests in progress before returning.

This is not for memory safety: it's for expressing the difference
between "Arc<Tenant> exists", and "This tenant's files on disk are
eligible to be read/written".

- Both Tenant and Timeline get a Gate & CancellationToken.
- The Timeline gate is held during eviction of layers, and during
page_service requests.
- Existing cancellation support in page_service is refined to use the
timeline-scope cancellation token instead of a process-scope
cancellation token. This replaces the use of `task_mgr::associate_with`:
tasks no longer change their tenant/timelineidentity after being
spawned.

The Tenant's Gate is not yet used, but will be important for
Tenant-scoped operations in secondary mode, where we must ensure that
our secondary-mode downloads for a tenant are gated wrt the activity of
an attached Tenant.

This is part of a broader move away from using the global-state driven
`task_mgr` shutdown tokens:
- less global state where we rely on implicit knowledge of what task a
given function is running in, and more explicit references to the
cancellation token that a particular function/type will respect, making
shutdown easier to reason about.
- eventually avoid the big global TASKS mutex.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-06 12:39:20 +00:00
duguorong009
b3d3a2587d feat: improve the serde impl for several types(Lsn, TenantId, TimelineId ...) (#5335)
Improve the serde impl for several types (`Lsn`, `TenantId`,
`TimelineId`) by making them sensitive to
`Serializer::is_human_readadable` (true for json, false for bincode).

Fixes #3511 by:
- Implement the custom serde for `Lsn`
- Implement the custom serde for `Id`
- Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs`
- Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in
all possible structs

Additionally some safekeeper types gained serde tests.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-11-06 11:40:03 +02:00
Heikki Linnakangas
b85fc39bdb Update control plane API path for getting compute spec. (#5357)
We changed the path in the control plane. The old path is still accepted
for compatibility with existing computes, but we'd like to phase it out.
2023-11-06 09:26:09 +02:00
duguorong009
09b5954526 refactor: use streaming in safekeeper /v1/debug_dump http response (#5731)
- Update the handler for `/v1/debug_dump` http response in safekeeper
- Update the `debug_dump::build()` to use the streaming in JSON build
process
2023-11-05 10:16:54 +00:00
John Spray
306c4f9967 s3_scrubber: prepare for scrubbing buckets with generation-aware content (#5700)
## Problem

The scrubber didn't know how to find the latest index_part when
generations were in use.

## Summary of changes

- Teach the scrubber to do the same dance that pageserver does when
finding the latest index_part.json
- Teach the scrubber how to understand layer files with generation
suffixes.
- General improvement to testability: scan_metadata has a machine
readable output that the testing `S3Scrubber` wrapper can read.
- Existing test coverage of scrubber was false-passing because it just
didn't see any data due to prefixing of data in the bucket. Fix that.

This is incremental improvement: the more confidence we can have in the
scrubber, the more we can use it in integration tests to validate the
state of remote storage.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-11-03 17:36:02 +00:00
Konstantin Knizhnik
5ceccdc7de Logical replication startup fixes (#5750)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1698226491736459

## Summary of changes

Update WAL affected buffers when restoring WAL from safekeeper

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2023-11-03 18:40:27 +02:00
Conrad Ludgate
cdcaa329bf proxy: no more statements (#5747)
## Problem

my prepared statements change in tokio-postgres landed in the latest
release. it didn't work as we intended

## Summary of changes

https://github.com/neondatabase/rust-postgres/pull/24
2023-11-03 08:30:58 +00:00
Joonas Koivunen
27bdbf5e36 chore(layer): restore logging, doc changes (#5766)
Some of the log messages were lost with the #4938. This PR adds some of
them back, most notably:

- starting to on-demand download
- successful completion of on-demand download
- ability to see when there were many waiters for the layer download
- "unexpectedly on-demand downloading ..." is now `info!`

Additionally some rare events are logged as error, which should never
happen.
2023-11-02 19:05:33 +00:00
khanova
4c7fa12a2a Proxy introduce allowed ips (#5729)
## Problem

Proxy doesn't accept wake_compute responses with the allowed IPs.

## Summary of changes

Extend wake_compute api to be able to return allowed_ips.
2023-11-02 16:26:15 +00:00
Em Sharnoff
367971a0e9 vm-monitor: Remove support for file cache in tmpfs (#5617)
ref neondatabase/cloud#7516.

We switched everything over to file cache on disk, now time to remove
support for having it in tmpfs.
2023-11-02 16:06:16 +00:00
bojanserafimov
51570114ea Remove outdated and flaky perf test (#5762) 2023-11-02 10:43:59 -04:00
Joonas Koivunen
098d3111a5 fix(layer): get_and_upgrade and metrics (#5767)
when introducing `get_and_upgrade` I forgot that an `evict_and_wait`
would had already incremented the counter for started evictions, but an
upgrade would just "silently" cancel the eviction as no drop would ever
run. these metrics are likely sources for alerts with the next release,
so it's important to keep them correct.
2023-11-02 13:06:14 +00:00
Joonas Koivunen
3737fe3a4b fix(layer): error out early if layer path is non-file (#5756)
In an earlier PR
https://github.com/neondatabase/neon/pull/5743#discussion_r1378625244 I
added a FIXME and there's a simple solution suggested by @jcsp, so
implement it. Wondering why I did not implement this originally, there
is no concept of a permanent failure, so this failure will happen quite
often. I don't think the frequency is a problem however.

Sadly for std::fs::FileType there is only decimal and hex formatting, no
octal.
2023-11-02 11:03:38 +00:00
John Spray
5650138532 pageserver: helpers for explicitly dying on fatal I/O errors (#5651)
Following from discussion on
https://github.com/neondatabase/neon/pull/5436 where hacking an implicit
die-on-fatal-io behavior into an Error type was a source of disagreement
-- in this PR, dying on fatal I/O errors is explicit, with `fatal_err`
and `maybe_fatal_err` helpers in the `MaybeFatalIo` trait, which is
implemented for std::io::Result.

To enable this approach with `crashsafe_overwrite`, the return type of
that function is changed to std::io::Result -- the previous error enum
for this function was not used for any logic, and the utility of saying
exactly which step in the function failed is outweighed by the hygiene
of having an I/O funciton return an io::Result.

The initial use case for these helpers is the deletion queue.
2023-11-02 09:14:26 +00:00
Joonas Koivunen
2dca4c03fc feat(layer): cancellable get_or_maybe_download (#5744)
With the layer implementation as was done in #4938, it is possible via
cancellation to cause two concurrent downloads on the same path, due to
how `RemoteTimelineClient::download_remote_layer` does tempfiles. Thread
the init semaphore through the spawned task of downloading to make this
impossible to happen.
2023-11-02 08:06:32 +00:00
bojanserafimov
0b790b6d00 Record wal size in import benchmark (#5755) 2023-11-01 17:02:58 -04:00
Joonas Koivunen
e82d1ad6b8 fix(layer): reinit on access before eviction happens (#5743)
Right before merging, I added a loop to `fn
LayerInner::get_or_maybe_download`, which was always supposed to be
there. However I had forgotten to restart initialization instead of
waiting for the eviction to happen to support original design goal of
"eviction should always lose to redownload (or init)". This was wrong.
After this fix, if `spawn_blocking` queue is blocked on something,
nothing bad will happen.

Part of #5737.
2023-11-01 17:38:32 +02:00
Muhammet Yazici
4f0a8e92ad fix: Add bearer prefix to Authorization header (#5740)
## Problem

Some requests with `Authorization` header did not properly set the
`Bearer ` prefix. Problem explained here
https://github.com/neondatabase/cloud/issues/6390.

## Summary of changes

Added `Bearer ` prefix to missing requests.
2023-11-01 09:41:48 +03:00
Konstantin Knizhnik
5952f350cb Always handle POLLHUP in walredo error poll loop (#5716)
## Problem

test_stderr hangs on MacOS.

See https://neondb.slack.com/archives/C036U0GRMRB/p1698438997903919

## Summary of changes

Always handle POLLHUP to prevent infinite loop.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-10-31 20:57:03 +02:00
Tristan Partin
726c8e6730 Add docs for updating Postgres for new minor versions 2023-10-31 12:31:14 -05:00
Em Sharnoff
f7067a38b7 compute_ctl: Assume --vm-monitor-addr arg is always present (#5611)
It has a default value, so this should be sound.
Treating its presence as semantically significant was leading to
spurious warnings.
2023-10-31 10:00:23 -07:00
Joonas Koivunen
896347f307 refactor(layer): remove version checking with atomics (#5742)
The `LayerInner::version` never needed to be read in more than one
place. Clarified while fixing #5737 of which this is the first step.
This decrements possible wrong atomics usage in Layer, but does not
really fix anything.
2023-10-31 18:40:08 +02:00
John Spray
e5c81fef86 tests: minor improvements (#5674)
Minor changes from while I have been working on HA tests:
- Manual pytest executions came with some warnings from `log.warn()`
usage
- When something fails in a generations-enabled test, it it useful to
have a log from the attachment service of what attached when, and with
which generation.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-10-31 11:44:35 +00:00
Christian Schwarz
7ebe9ca1ac pageserver: /attach: clarify semantics of 409 (#5698)
context: https://app.incident.io/neondb/incidents/75
specifically:
https://neondb.slack.com/archives/C0634NXQ6E7/p1698422852902959?thread_ts=1698419362.155059&cid=C0634NXQ6E7
2023-10-31 09:32:58 +01:00
Shany Pozin
1588601503 Move release PR creation to Friday (#5721)
Prepare for a new release workflow
* Release PR is created on Fridays
* The discussion/approval happens during Friday
* Sunday morning the deployment will be done in central-il and perf
tests will be run
* On Monday early IST morning gradually start rolling (starting from US
regions as they are still in weekend time)

See slack for discussion:
https://neondb.slack.com/archives/C04P81J55LK/p1698565305607839?thread_ts=1698428241.031979&cid=C04P81J55LK
2023-10-30 22:10:24 +01:00
John Spray
9c35e1e6e5 pageserver: downgrade slow task warnings from warn to info (#5724)
## Problem

In #5658 we suppressed the first-iteration output from these logs, but
the volume of warnings is still problematic.

## Summary of changes

- Downgrade all slow task warnings to INFO. The information is still
there if we actively want to know about which tasks are running slowly,
without polluting the overall stream of warnings with situations that
are unsurprising to us.
- Revert the previous change so that we output on the first iteration as
we used to do. There is no reason to suppress these, now that the
severity is just info.
2023-10-30 18:32:30 +00:00
Conrad Ludgate
d8c21ec70d fix nightly 1.75 (#5719)
## Problem

Neon doesn't compile on nightly and had numerous clippy complaints.

## Summary of changes

1. Fixed troublesome dependency
2. Fixed or ignored the lints where appropriate
2023-10-30 16:43:06 +00:00
Konstantin Knizhnik
ad99fa5f03 Grant BYPASSRLS and REPLICATION to exited roles (#5657)
## Problem

Role need to have REPLICATION privilege to be able to used for logical
replication.
New roles are created with this option.
This PR tries to update existed roles.

## Summary of changes

Update roles in `handle_roles` method

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-10-30 15:29:25 +00:00
John Spray
e675f4cec8 tests: fix missing comma in test_timeline_deletion_with_files_stuck_… (#5713)
…in_upload_queue

This was a syntax mistake in
https://github.com/neondatabase/neon/pull/5149

We didn't notice because the situation the log allow list covers is a
relative rare race.
2023-10-30 15:18:32 +00:00
Joonas Koivunen
4db8efb2cf Layer: logging fixes (#5676)
- include Layer generation in the default display, with
Generation::Broken as `-broken`
- omit layer from `layer_gc` span because the api it works with needs to
support N layers, so the api needs to log each layer
2023-10-30 16:21:30 +02:00
John Spray
07c2b29895 pageserver: fix error logging on stray timeline files (#5712)
## Problem

If there were stray files in the timelines/ dir after tenant deletion,
pageserver could panic on out of range.

## Summary of changes

Use iterator `take()`, which doesn't care if the number of elements
available is less than requested.
2023-10-30 13:24:52 +00:00
Konstantin Knizhnik
9cdffd164a Prevent SIGSEGV in apply_error_callback when record was not decoded (#5703)
## Problem

See 

https://neondb.slack.com/archives/C036U0GRMRB/p1698652221399419?thread_ts=1698438997.903919&cid=C036U0GRMRB

## Summary of changes

Check if record pointer is not NULL before trying to print record
descriptor

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-10-30 12:06:08 +02:00
John Spray
87db4b441c pageserver: cleaner shutdown in timeline delete (#5701)
The flush task logs a backtrace if it tries to upload and remote
timeline client is already in stopped state.

Therefore we cannot shut them down concurrently: flush task must be shut
down first.

This wasn't more obvious because:
- Timeline deletions IRL usually happen when not much is being written
- In tests, there is a global allow-list for this log

It's not obvious whether removing the global log allow list is safe,
this PR was prompted by how the log spam got in my way when testing
deletion changes.
2023-10-30 09:18:40 +00:00
Conrad Ludgate
964c5c56b7 proxy: dont retry server errors (#5694)
## Problem

accidental spam

## Summary of changes

don't spam control plane if control plane is down :)

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-10-30 08:38:56 +00:00
Arpad Müller
bd59349af3 Fix Rust 1.74 warnings (#5702)
Fixes new warnings and clippy changes introduced by version 1.74 of the
rust compiler toolchain.
2023-10-28 03:47:26 +02:00
Joonas Koivunen
2bd79906d9 fix: possible page_service hang on cancel (#5696)
Fixes #5341, one more suspected case, see:
https://github.com/neondatabase/neon/issues/5341#issuecomment-1783052379

- races `MaybeWriteOnly::shutdown` with cancellation
- switches to using `AsyncWriteExt::write_buf`
- notes cancellation safety for shutdown
2023-10-27 19:09:34 +01:00
Conrad Ludgate
493b47e1da proxy: exclude client latencies in metrics (#5688)
## Problem

In #5539, I moved the connect_to_compute latency to start counting
before authentication - this is because authentication will perform some
calls to the control plane in order to get credentials and to eagerly
wake a compute server. It felt important to include these times in the
latency metric as these are times we should definitely care about
reducing.

What is not interesting to record in this metric is the roundtrip time
during authentication when we wait for the client to respond.

## Summary of changes

Implement a mechanism to pause the latency timer, resuming on drop of
the pause struct. We pause the timer right before we send the
authentication message to the client, and we resume the timer right
after we complete the authentication flow.
2023-10-27 17:17:39 +00:00
John Spray
c13e932c3b pageserver: add generation fields in openapi spec (#5690)
These optional fields have existed for as while, but weren't mentioned
in `openapi_spec.yaml` yet.
2023-10-27 14:20:04 +01:00
Gleb Novikov
a5292f7e67 Some minor renames in attachment service API (#5687)
## Problem

## Summary of changes

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] ~~If it is a core feature, I have added thorough tests.~~
- [ ] ~~Do we need to implement analytics? if so did you add the
relevant metrics to the dashboard?~~
- [ ] ~~If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.~~

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-10-27 12:36:34 +01:00
Arthur Petukhovsky
262348e41b Fix safekeeper log spans (#5643)
We were missing spans with ttid in "WAL backup" and several other
places, this commit should fix it.

Here are the examples of logs before and after:
https://gist.github.com/petuhovskiy/711a4a4e7ddde3cab3fa6419b2f70fb9
2023-10-27 12:09:02 +01:00
Joonas Koivunen
68f15cf967 fix: schedule_compaction_update must only unlink (#5675)
#5649 added the concept of dangling layers which #4938 uses but only
partially. I forgot to change `schedule_compaction_update` to not
schedule deletions to uphold the "have a layer, you can read it".

With the now remembered fix, I don't think these checks should ever fail
except for a mistake I already did. These changes might be useful for
protecting future changes, even though the Layer carrying the generation
AND the `schedule_(gc|compaction)_update` require strong arcs.

Rationale for keeping the `#[cfg(feature = "testing")]` is worsening any
leak situation which might come up.
2023-10-27 11:16:01 +01:00
duguorong009
39f8fd6945 feat: add build_tag env support for set_build_info_metric (#5576)
- Add a new util `project_build_tag` macro, similar to
`project_git_version`
- Update the `set_build_info_metric` to accept and make use of
`build_tag` info
- Update all codes which use the `set_build_info_metric`
2023-10-27 10:47:11 +01:00
John Spray
83567f9e4e tests: revise perf test that interfered with local disk state (#5682)
This benchmark started failing after #5580 merged.

It was manually deleting some local content on a pageserver, and
expecting the behavior that the pageserver would "forget" about the
timeline on startup as a result. That is no longer our behavior:
pageservers use the remote storage as the source of truth.

Rather than having the test go manually delete things at all, we can
just delete the whole tenant via the pageserver API, and thereby start
from a clean situation.
2023-10-27 09:23:49 +01:00
Conrad Ludgate
71611f4ab3 proxy: prepare to remove high cardinality metrics (#5461)
## Problem

High cardinality metrics are bad

## Summary of changes

Preparing to remove high cardinality metrics. Will actually remove in
#5466
2023-10-26 22:54:37 +01:00
John Spray
7c16b5215e scrubber: add separate find/purge garbage commands (#5409)
## Problem

The previous garbage cleanup functionality relied on doing a dry run,
inspecting logs, and then doing a deletion. This isn't ideal, because
what one actually deletes might not be the same as what one saw in the
dry run. It's also risky UX to rely on presence/absence of one CLI flag
to control deletion: ideally the deletion command should be totally
separate from the one that scans the bucket.

Related: https://github.com/neondatabase/neon/issues/5037

## Summary of changes

This is a major re-work of the code, which results in a net decrease in
line count of about 600. The old code for removing garbage was build
around the idea of doing discovery and purging together: a
"delete_batch_producer" sent batches into a deleter. The new code writes
out both procedures separately, in functions that use the async streams
introduced in https://github.com/neondatabase/neon/pull/5176 to achieve
fast concurrent access to S3 while retaining the readability of a single
function.

- Add `find-garbage`, which writes out a JSON file of tenants/timelines
to purge
- Add `purge-garbage` which consumes the garbage JSON file, applies some
extra validations, and does deletions.
- The purge command will refuse to execute if the garbage file indicates
that only garbage was found: this guards against classes of bugs where
the scrubber might incorrectly deem everything garbage.
- The purge command defaults to only deleting tenants that were found in
"deleted" state in the control plane. This guards against the risk that
using the wrong console API endpoint could cause all tenants to appear
to be missing.

Outstanding work for a future PR:
- Make whatever changes are needed to adapt to the Console/Control Plane
separation.
- Make purge even safer by checking S3 `Modified` times for
index_part.json files (not doing this here, because it will depend on
the generation-aware changes for finding index_part.json files)

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
Co-authored-by: Shany Pozin <shany@neon.tech>
2023-10-26 20:36:28 +01:00
Em Sharnoff
39b148b74e Bump vm-builder v0.18.2 -> v0.18.4 (#5666)
Only applicable change was neondatabase/autoscaling#584, setting
pgbouncer auth_dbname=postgres in order to fix superuser connections
from preventing dropping databases.
2023-10-26 20:04:57 +01:00
Sasha Krassovsky
116c342cad Support changing pageserver dynamically (#5542)
## Problem
We currently require full restart of compute if we change the pageserver
url
## Summary of changes
Makes it so that we don't have to do a full restart, but can just send
SIGHUP
2023-10-26 10:56:07 -07:00
John Spray
ba4fe9e10f pageserver: fix the second "AUX files" warning (#5673)
In https://github.com/neondatabase/neon/pull/5669 I didn't notice that
the same warning is logged in two places: fix the other one.
2023-10-26 13:54:52 +00:00
John Spray
de90bf4663 pageserver: always load remote metadata (no more spawn_load) (#5580)
## Problem

The pageserver had two ways of loading a tenant:
- `spawn_load` would trust on-disk content to reflect all existing
timelines
- `spawn_attach` would list timelines in remote storage.

It was incorrect for `spawn_load` to trust local disk content, because
it doesn't know if the tenant might have been attached and written
somewhere else. To make this correct would requires some generation
number checks, but the payoff is to avoid one S3 op per tenant at
startup, so it's not worth the complexity -- it is much simpler to have
one way to load a tenant.

## Summary of changes

- `Tenant` objects are always created with `Tenant::spawn`: there is no
more distinction between "load" and "attach".
- The ability to run without remote storage (for `neon_local`) is
preserved by adding a branch inside `attach` that uses a fallback
`load_local` if no remote_storage is present.
- Fix attaching a tenant when it has a timeline with no IndexPart: this
can occur if a newly created timeline manages to upload a layer before
it has uploaded an index.
- The attach marker file that used to indicate whether a tenant should
be "loaded" or "attached" is no longer needed, and is removed.
- The GenericRemoteStorage interface gets a `list()` method that maps
more directly to what ListObjects does, returning both keys and common
prefixes. The existing `list_files` and `list_prefixes` methods are just
calls into `list()` now -- these can be removed later if we would like
to shrink the interface a bit.
- The remote deletion marker is moved into `timelines/` and detected as
part of listing timelines rather than as a separate GET request. If any
existing tenants have a marker in the old location (unlikely, only
happens if something crashes mid-delete), then they will rely on the
control plane retrying to complete their deletion.
- Revise S3 calls for timeline listing and tenant load to take a
cancellation token, and retry forever: it never makes sense to make a
Tenant broken because of a transient S3 issue.

## Breaking changes

- The remote deletion marker is moved from `deleted` to
`timelines/deleted` within the tenant prefix. Markers in the old
location will be ignored: it is the control plane's responsibility to
retry deletions until they succeed. Markers in the new location will be
tolerated by the previous release of pageserver via
https://github.com/neondatabase/neon/pull/5632
- The local `attaching` marker file is no longer written. Therefore, if
the pageserver is downgraded after running this code, the old pageserver
will not be able to distinguish between partially attached tenants and
fully attached tenants. This would only impact tenants that were partway
through attaching at the moment of downgrade. In the unlikely even t
that we do experience an incident that prompts us to roll back, then we
may check for attach operations in flight, and manually insert
`attaching` marker files as needed.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-10-26 14:48:44 +01:00
John Spray
8360307ea0 pageserver: exponential backoff on compaction/GC failures (#5672)
Previously, if walredo process crashed we would try to spawn a fresh one
every 2 seconds, which is expensive in itself, but also results in a
high I/O load from the part of the compaction prior to the failure,
which we re-run every 2 seconds.

Closes: https://github.com/neondatabase/neon/issues/5671
2023-10-26 14:00:26 +01:00
MMeent
6129077d31 WALRedo: Limit logging to log_level = ERROR and above (#5587)
This fixes issues in pageserver's walredo process where WALRedo
logs of loglevel=LOG are interpreted as errors.

## Problem

See #5560

## Summary of changes

Set the log level to something that doesn't include LOG.
2023-10-26 12:21:41 +01:00
John Spray
e0ebdfc7ce pageserver: suppress compaction/gc errors while stopping (#5670)
## Problem

Tenant deletions would sometimes be accompanied by compaction stack
traces, because `shutdown()` puts the tenant into stopping state before
it joins background tasks.

## Summary of changes

Treat GC+Compaction as no-ops on a Stopping tenant.
2023-10-26 10:59:24 +01:00
Joonas Koivunen
c508d3b5fa reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer (#4938)
Implement a new `struct Layer` abstraction which manages downloadness
internally, requiring no LayerMap locking or rewriting to download or
evict providing a property "you have a layer, you can read it". The new
`struct Layer` provides ability to keep the file resident via a RAII
structure for new layers which still need to be uploaded. Previous
solution solved this `RemoteTimelineClient::wait_completion` which lead
to bugs like #5639. Evicting or the final local deletion after garbage
collection is done using Arc'd value `Drop`.

With a single `struct Layer` the closed open ended `trait Layer`, `trait
PersistentLayer` and `struct RemoteLayer` are removed following noting
that compaction could be simplified by simply not using any of the
traits in between: #4839.

The new `struct Layer` is a preliminary to remove
`Timeline::layer_removal_cs` documented in #4745.

Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058,
#5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off:
#5057, #5134.
2023-10-26 12:36:38 +03:00
John Spray
acda65d7d4 pageserver: quieten "Failed to get info about AUX files" (#5669)
## Problem

This line caused lots of errors to be emitted for healthy tenants.

## Summary of changes

Downgrade to debug, since it is an expected code path we'll take for
tenants at startup.
2023-10-26 09:53:18 +01:00
dependabot[bot]
378daa358b build(deps): bump werkzeug from 2.2.3 to 3.0.1 (#5665) 2023-10-25 22:50:35 +00:00
Alexander Bayandin
85f4514e7d Get env var for real Azure tests from GitHub (#5662)
## Problem

We'll need to switch `REMOTE_STORAGE_AZURE_REGION` from the current
`eastus2` region to something `eu-central-1`-like. This may require
changing `AZURE_STORAGE_ACCESS_KEY`.
To make it possible to switch from one place (not to break a lot of
builds on CI), move `REMOTE_STORAGE_AZURE_CONTAINER` and
`REMOTE_STORAGE_AZURE_REGION` to GitHub Variables.

See https://github.com/neondatabase/neon/settings/variables/actions

## Summary of changes
- Get values for `REMOTE_STORAGE_AZURE_CONTAINER` &
`REMOTE_STORAGE_AZURE_REGION` from GitHub Variables
2023-10-25 22:54:23 +01:00
Joonas Koivunen
f70019797c refactor(rtc): schedule compaction update (#5649)
a single operation instead of N uploads and 1 deletion scheduling with
write(layer_map) lock releasing in the between. Compaction update will
make for a much better place to change how the operation will change in
future compared to more general file based operations.

builds upon #5645. solves the problem of difficult to see hopeful
correctness w.r.t. other `index_part.json` changing operations.

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-10-25 22:25:43 +01:00
Joonas Koivunen
325258413a fix: trampling on global physical size metric (#5663)
All loading (attached, or from disk) timelines overwrite the global
gauge for physical size. The `_set` method cannot be used safely, so
remove it and just "add" the physical size.
2023-10-25 19:29:12 +01:00
Konstantin Knizhnik
4ddbc0e46d Ignore missed AUX_FILES_KEY when generating image layer (#5660)
## Problem

Logical replication requires new AUX_FILES_KEY which is definitely
absent in existed database.
We do not have function to check if key exists in our KV storage.
So I have to handle the error in `list_aux_files` method.
But this key is also included in key space range and accessed y
`create_image_layer` method.

## Summary of changes

Check if AUX_FILES_KEY  exists before including it in keyspace.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-10-25 18:35:23 +01:00
Arpad Müller
a673e4e7a9 Optionally return json from get_lsn_by_timestamp (#5608)
This does two things: first a minor refactor to not use HTTP/1.x
style header names and also to not panic if some certain requests had no
"Accept" header. As a second thing, it addresses the third bullet point
from #3689:

> Change `get_lsn_by_timestamp` API method to return LSN even if we only
found commit before the specified timestamp.

This is done by adding a version parameter to the `get_lsn_by_timestamp`
API call and making its behaviour depend on the version number.

Part of #3414 (but doesn't address it in its entirety).

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-10-25 18:46:34 +02:00
bojanserafimov
c155cc0c3f Fix test instructions readme (#5644) 2023-10-25 11:53:04 -04:00
Conrad Ludgate
32126d705b proxy refactor serverless (#4685)
## Problem

Our serverless backend was a bit jumbled. As a comment indicated, we
were handling SQL-over-HTTP in our `websocket.rs` file.

I've extracted out the `sql_over_http` and `websocket` files from the
`http` module and put them into a new module called `serverless`.

## Summary of changes

```sh
mkdir proxy/src/serverless
mv proxy/src/http/{conn_pool,sql_over_http,websocket}.rs proxy/src/serverless/
mv proxy/src/http/server.rs proxy/src/http/health_server.rs
mv proxy/src/metrics proxy/src/usage_metrics.rs
```

I have also extracted the hyper server and handler from websocket.rs
into `serverless.rs`
2023-10-25 15:43:03 +01:00
John Spray
5683ae9eab pageserver: suppress some of the most common spurious warnings (#5658)
Two of the most common spurious log messages:
- broker connections terminate & we log at error severity. Unfortunately
tonic gives us an "Unknown" error so to suppress these we're doing
string matching. It's hacky but worthwhile for operations.
- the first iteration of tenant background tasks tends to over-run its
schedule and emit a warning. Ultimately we should fix these to run on
time, but for now we are not benefiting from polluting our logs with the
warnings.
2023-10-25 14:55:37 +01:00
Alexander Bayandin
4778b6a12e Switch to querying new tests results DB (#5616)
## Problem

We started to store test results in a new format in
https://github.com/neondatabase/neon/pull/4549.
This PR switches scripts to query this db.

(we can completely remove old DB/ingestions scripts in a couple of
weeks after the PR merged)

## Summary of changes
- `scripts/benchmark_durations.py` query new database
- `scripts/flaky_tests.py` query new database
2023-10-25 14:25:13 +01:00
John Spray
8b8be7bed4 tests: don't fail tests on torn log lines (#5655)
## Problem

Tests that force-kill and restart a service can generate torn log lines
that might match WARN|ERROR, but not match the allow expression that a
test has loaded, e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5651/6638398772/index.html#suites/7538959189f4501983ddd9e167836c8b/d272ba8a73e6945c

## Summary of changes

Ignore log lines which match a regex for torn log lines on restart: they
have two timestamps and the second line is an "INFO version"... message.
2023-10-25 13:29:30 +01:00
Conrad Ludgate
a461c459d8 fix http pool test (#5653)
## Problem

We defer the returning of connections the the connection pool. It's
possible for our test to be faster than the returning of connections -
which then gets a differing process ID because it opens a new
connection.

## Summary of changes

1. Delay the tests just a little (20ms) to give more chance for
connections to return.
2. Correlate connection IDs with the connection logs a bit more
2023-10-25 13:20:45 +01:00
Joonas Koivunen
4ae2d1390d refactor(remote_timeline_client): Split deletion into unlinking + deletion (#5645)
Quest: #4745. Prerequisite for #4938. Original
https://github.com/neondatabase/neon/pull/4938#issuecomment-1777150665.

The new Layer implementation has so far been using
`RemoteTimelineClient::schedule_layer_file_deletion` from `Layer::drop`
but it was noticed that this could mean that the L0s compaction wanted
to remove could linger in the index part for longer time or be left
there for longer time. Solution is to split the
`RemoteTimelineClient::schedule_layer_file_deletion` into two parts:
- unlinking from index_part.json, to be called from end of compaction
and gc
- scheduling of actual deletions, to be called from `Layer::drop`

The added methods are added unused.
2023-10-25 15:01:19 +03:00
Joonas Koivunen
c5949e1fd6 misc smaller improvements (#5527)
- finally add an `#[instrument]` to Timeline::create_image_layers,
making it easier to see that something is happening because we create
image layers
- format some macro context code
- add a warning not to create new validation functions a la parse do not
validate

Split off from #5198.
2023-10-25 14:59:43 +03:00
John Spray
127837abb0 tests: de-flake test_eviction_across_generations (#5650)
## Problem

There was an edge case where initial logical size calculation can be
downloading a layer that wasn't hit by the test's `SELECT`, and it's
on-disk but still marked as remote in the pageserver's internal state,
so evicting it fails.


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5648/6630099807/index.html#categories/dee044ec96f666edb90a77c01099a941/e38e97a2735ffa8c/

## Summary of changes

Use pageserver API to learn about layers, instead of inspecting local
disk, so that we will always agree with the pageserver about which layer
are local.
2023-10-25 10:55:45 +01:00
Conrad Ludgate
b2c96047d0 move wake compute after the auth quirks logic (#5642)
## Problem

https://github.com/neondatabase/neon/issues/5568#issuecomment-1777015606

## Summary of changes

Make the auth_quirks_creds return the authentication information, and
push the wake_compute loop to after, inside `auth_quirks`
2023-10-25 08:30:47 +01:00
Em Sharnoff
44202eeb3b Bump vm-builder v0.18.1 -> v0.18.2 (#5646)
Only applicable change was neondatabase/autoscaling#571, removing the
postgres_exporter flags `--auto-discover-databases` and
`--exclude-databases=...`
2023-10-24 16:04:28 -07:00
Arpad Müller
4bef977c56 Use tuples instead of manual comparison chain (#5637)
Makes code a little bit simpler
2023-10-24 17:16:23 +00:00
John Spray
a0b862a8bd pageserver: schedule frozen layer uploads inside the layers lock (#5639)
## Problem

Compaction's source of truth for what layers exist is the LayerManager.
`flush_frozen_layer` updates LayerManager before it has scheduled upload
of the frozen layer.

Compaction can then "see" the new layer, decide to delete it, schedule
uploads of replacement layers, all before `flush_frozen_layer` wakes up
again and schedules the upload. When the upload is scheduled, the local
layer file may be gone, in which case we end up with no such layer in
remote storage, but an entry still added to IndexPart pointing to the
missing layer.

## Summary of changes

Schedule layer uploads inside the `self.layers` lock, so that whenever a
frozen layer is present in LayerManager, it is also present in
RemoteTimelineClient's metadata.

Closes: #5635
2023-10-24 13:57:01 +01:00
Conrad Ludgate
767ef29390 proxy: filter out more quota exceeded errors (#5640)
## Problem

Looking at logs, I saw more retries being performed for other quota
exceeded errors

## Summary of changes

Filter out all quota exceeded family of errors
2023-10-24 13:13:23 +01:00
Alexander Bayandin
a8a800af51 Run real Azure tests on CI (#5627)
## Problem
We do not run real Azure-related tests on CI 

## Summary of changes
- Set required env variables to run real Azure blob storage tests on CI
2023-10-24 12:12:11 +01:00
Arpad Müller
1e250cd90a Cleanup in azure_upload_download_works test (#5636)
The `azure_upload_download_works` test is not cleaning up after itself,
leaving behind the files it is uploading. I found these files when
looking at the contents of the bucket in #5627.

We now clean up the file we uploaded before, like the other tests do it
as well.

Follow-up of #5546
2023-10-23 19:08:56 +01:00
John Spray
eaaa18f6ed attachment_service: graceful SIGQUIT (#5626)
`attachment_service` doesn't explicitly handle signals, which causes a
backtrace when `neon_local` kills it with SIGQUIT.

Closes: https://github.com/neondatabase/neon/issues/5613
2023-10-23 17:30:25 +01:00
John Spray
188f67e1df pageserver: forward compat: be tolerant of deletion marker in timelines/ (#5632)
## Problem

https://github.com/neondatabase/neon/pull/5580 will move the remote
deletion marker into the `timelines/` path.

This would cause old pageserver code to fail loading the tenant due to
an apparently invalid timeline ID. That would be a problem if we had to
roll back after deploying #5580

## Summary of changes

If a `deleted` file is in `timelines/` just ignore it.
2023-10-23 17:51:38 +02:00
John Spray
7e805200bb pageserver: parallel load of configs (#5607)
## Problem

When the number of tenants is large, sequentially issuing the open/read
calls for their config files is a ~1000ms delay during startup. It's not
a lot, but it's simple to fix.

## Summary of changes

Put all the config loads into spawn_blocking() tasks and run them in a
JoinSet. We can simplify this a bit later when we have full async disk
I/O.

---------

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-10-23 15:32:34 +01:00
Christian Schwarz
c6ca1d76d2 consumption_metrics: fix periodicness behavior & reporting (#5625)
Before this PR, the ticker was running at default miss behavior `Delay`.
For example, here is the startup output with 25k tenants:

```
2023-10-19T09:57:21.682466Z  INFO synthetic_size_worker: starting calculate_synthetic_size_worker
2023-10-19T10:50:44.678202Z  WARN synthetic_size_worker: task iteration took longer than the configured period elapsed=3202.995707156s period=10m task=ConsumptionMetricsSyntheticSizeWorker
2023-10-19T10:52:17.408056Z  WARN synthetic_size_worker: task iteration took longer than the configured period elapsed=2695.72556035s period=10m task=ConsumptionMetricsSyntheticSizeWorker
```

The first message's `elapsed` value is correct. It matches the
delta between the log line timestamps.

The second one is logged ca 1.5min after, though, but reports a much
larger
`elapsed` than 1.5min.

This PR fixes the behavior by copying what `eviction_task.rs` does.
2023-10-23 16:31:38 +02:00
Conrad Ludgate
94b4e76e13 proxy: latency connect outcome (#5588)
## Problem

I recently updated the latency timers to include cache miss and pool
miss, as well as connection protocol. By moving the latency timer to
start before authentication, we count a lot more failures and it's
messed up the latency dashboard.

## Summary of changes

Add another label to LatencyTimer metrics for outcome. Explicitly report
on success
2023-10-23 15:17:28 +01:00
khanova
b514da90cb Set up timeout for scram protocol execution (#5551)
## Problem
Context:
https://github.com/neondatabase/neon/issues/5511#issuecomment-1759649679

Some of out scram protocol execution timed out only after 17 minutes. 
## Summary of changes
Make timeout for scram execution meaningful and configurable.
2023-10-23 15:11:05 +01:00
Conrad Ludgate
7d17f1719f reduce cancel map contention (#5555)
## Problem

Every database request locks this cancel map rwlock. At high requests
per second this would have high contention

## Summary of changes

Switch to dashmap which has a sharded rwlock to reduce contention
2023-10-23 14:12:41 +01:00
John Spray
41ee75bc71 pageserver: do config writes in a spawn_blocking (#5603)
## Problem

We now persist tenant configuration every time we spawn a tenant. The
persist_tenant_config function is doing a series of non-async filesystem
I/O, because `crashsafe::` isn't async yet. This isn't a demonstrated
problem, but is a source of uncertainty when reasoning about what's
happening with our startup times.

## Summary of changes

- Wrap `crashsafe_overwrite` in `spawn_blocking`.
- Although I think this change makes sense, it does not have a
measurable impact on load time when testing with 10k tenants.
- This can be reverted when we have full async I/O
2023-10-23 09:19:01 +01:00
Christian Schwarz
11e523f503 walredo: fix EGAGAIN/"os error 11" false page reconstruction failures (#5560)
Stacked atop https://github.com/neondatabase/neon/pull/5559

Before this PR, there was the following race condition:

```
T1: polls for writeable stdin
T1: writes to stdin
T1: enters poll for stdout/stderr
T2: enters poll for stdin write
WALREDO: writes to stderr
KERNEL: wakes up T1 and T2
Tx: reads stderr and prints it
Ty: reads stderr and gets EAGAIN
(valid values for (x, y) are (1, 2) or (2, 1))
```

The concrete symptom that we observed repeatedly was with PG16,
which started logging `registered custom resource manager`
to stderr always, during startup, thereby giving us repeated
opportunity to hit above race condition. PG14 and PG15 didn't log
anything to stderr, hence we could have only hit this race condition
if there was an actual error happening.

This PR fixes the race by moving the reading of stderr into a tokio
task. It exits when the stderr is closed by the child process, which
in turn happens when the child exits, either by itself or because
we killed it.

The downside is that the async scheduling can reorder the log messages,
which can be seen in the new `test_stderr`, which runs in a
single-threaded runtime. I included the output below.

Overall I think we should move the entire walredo to async, as Joonas
proposed many months ago. This PR's asyncification is just the first
step to resolve these
false page reconstruction errors.

After this is fixed, we should stop printing that annoying stderr
message
on walredo startup; it causes noise in the pageserver logs.
That work is tracked in #5399 .

```
2023-10-13T19:05:21.878858Z ERROR apply_wal_records{tenant_id=d546fb76ba529195392fb4d19e243991 pid=753986}: failed to write out the walredo errored input: No such file or directory (os error 2) target=walredo-1697223921878-1132-0.walredo length=1132
2023-10-13T19:05:21.878932Z DEBUG postgres applied 2 WAL records (1062 bytes) in 114666 us to reconstruct page image at LSN 0/0
2023-10-13T19:05:21.878942Z ERROR error applying 2 WAL records 0/16A9388..0/16D4080 (1062 bytes) to base image with LSN 0/0 to reconstruct page image at LSN 0/0 n_attempts=0: apply_wal_records

Caused by:
    WAL redo process closed its stdout unexpectedly
2023-10-13T19:05:21.879027Z  INFO kill_and_wait_impl{pid=753986}: wait successful exit_status=signal: 11 (SIGSEGV) (core dumped)
2023-10-13T19:05:21.879079Z DEBUG wal-redo-postgres-stderr{pid=753986 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task started
2023-10-13T19:05:21.879104Z ERROR wal-redo-postgres-stderr{pid=753986 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: received output output="2023-10-13 19:05:21.769 GMT [753986] LOG:  registered custom resource manager \"neon\" with ID 134\n"
2023-10-13T19:05:21.879116Z DEBUG wal-redo-postgres-stderr{pid=753986 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task finished
2023-10-13T19:05:22.004439Z ERROR apply_wal_records{tenant_id=d546fb76ba529195392fb4d19e243991 pid=754000}: failed to write out the walredo errored input: No such file or directory (os error 2) target=walredo-1697223922004-1132-0.walredo length=1132
2023-10-13T19:05:22.004493Z DEBUG postgres applied 2 WAL records (1062 bytes) in 125344 us to reconstruct page image at LSN 0/0
2023-10-13T19:05:22.004501Z ERROR error applying 2 WAL records 0/16A9388..0/16D4080 (1062 bytes) to base image with LSN 0/0 to reconstruct page image at LSN 0/0 n_attempts=1: apply_wal_records

Caused by:
    WAL redo process closed its stdout unexpectedly
2023-10-13T19:05:22.004588Z  INFO kill_and_wait_impl{pid=754000}: wait successful exit_status=signal: 11 (SIGSEGV) (core dumped)
2023-10-13T19:05:22.004624Z DEBUG wal-redo-postgres-stderr{pid=754000 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task started
2023-10-13T19:05:22.004653Z ERROR wal-redo-postgres-stderr{pid=754000 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: received output output="2023-10-13 19:05:21.884 GMT [754000] LOG:  registered custom resource manager \"neon\" with ID 134\n"
2023-10-13T19:05:22.004666Z DEBUG wal-redo-postgres-stderr{pid=754000 tenant_id=d546fb76ba529195392fb4d19e243991 pg_version=16}: wal-redo-postgres stderr_logger_task finished
```
2023-10-23 09:00:13 +01:00
Konstantin Knizhnik
b1a1126152 Grant replication permission to newly created users (#5615)
## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-10-20 21:29:17 +03:00
John Spray
a8899e1e0f pageserver: apply timeout when waiting for tenant loads (#5601)
## Problem

Loading tenants shouldn't hang. However, if it does, we shouldn't let
one hung tenant prevent the entire process from starting background
jobs.

## Summary of changes

Generalize the timeout mechanism that we already applied to loading
initial logical sizes: each phase in startup where we wait for a barrier
is subject to a timeout, and startup will proceed if it doesn't complete
within timeout.

Startup metrics will still reflect the time when a phase actually
completed, rather than when we skipped it.

The code isn't the most beautiful, but that kind of reflects the
awkwardness of await'ing on a future and then stashing it to await again
later if we time out. I could imagine making this cleaner in future by
waiting on a structure that doesn't self-destruct on wait() the way
Barrier does, then make InitializationOrder into a structure that
manages the series of waits etc.
2023-10-20 09:15:34 +01:00
Arseny Sher
2fbd5ab075 Add safekeeper test_late_init. 2023-10-20 10:57:59 +03:00
Arseny Sher
702382e99a Add check that WAL segments are identical after recovery. 2023-10-20 10:57:59 +03:00
Arseny Sher
1b53b3e200 Make test_pageserver_http_get_wal_receiver_success not wait for keepalive. 2023-10-20 10:57:59 +03:00
Arseny Sher
b332268cec Introduce safekeeper peer recovery.
Implements fetching of WAL by safekeeper from another safekeeper by imitating
behaviour of last elected leader. This allows to avoid WAL accumulation on
compute and facilitates faster compute startup as it doesn't need to download
any WAL. Actually removing WAL download in walproposer is a matter of another
patch though.

There is a per timeline task which always runs, checking regularly if it should
start recovery frome someone, meaning there is something to fetch and there is
no streaming compute. It then proceeds with fetching, finishing when there is
nothing more to receive.

Implements https://github.com/neondatabase/neon/pull/4875
2023-10-20 10:57:59 +03:00
Arseny Sher
76c702219c Don't use AppenRequestHeader.epoch_start_lsn.
It is simpler to get it once from ProposerEelected.
2023-10-20 10:57:59 +03:00
Arthur Petukhovsky
ba856140e7 Fix neon_extra_build.yml (#5605)
Build walproposer-lib in gather-rust-build-stats, fix nproc usage, fix
walproposer-lib on macos.
2023-10-19 22:20:39 +01:00
Em Sharnoff
2cf6a47cca vm-monitor: Deny not fail downscale if no memory stats yet (#5606)
Fixes an issue we observed on staging that happens when the
autoscaler-agent attempts to immediately downscale the VM after binding,
which is typical for pooled computes.

The issue was occurring because the autoscaler-agent was requesting
downscaling before the vm-monitor had gathered sufficient cgroup memory
stats to be confident in approving it. When the vm-monitor returned an
internal error instead of denying downscaling, the autoscaler-agent
retried the connection and immediately hit the same issue (in part
because cgroup stats are collected per-connection, rather than
globally).
2023-10-19 19:09:37 +01:00
Konstantin Knizhnik
5a8bcdccb0 Fix elog format error in wallog_mapping_file (#5602)
## Problem

Fix elog format error in wallog_mapping_file 

## Summary of changes

Use proper case to avoid compilation warning=error in C at MacOS.


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-10-19 20:24:35 +03:00
Em Sharnoff
2c8741a5ed vm-monitor: Log full error on message handling failure (#5604)
There's currently an issue with the vm-monitor on staging that's not
really feasible to debug because the current display impl gives no
context to the errors (just says "failed to downscale").

Logging the full error should help.

For communications with the autoscaler-agent, it's ok to only provide
the outermost cause, because we can cross-reference with the VM logs.
At some point in the future, we may want to change that.
2023-10-19 18:10:33 +02:00
Shany Pozin
893b7bac9a Fix neon_extra_builds.yml : nproc is not supported in mac os (#5598)
## Problem
nproc is not supported in mac os, use sysctl -n hw.ncpu instead
2023-10-19 15:24:23 +01:00
Arthur Petukhovsky
66f8f5f1c8 Call walproposer from Rust (#5403)
Create Rust bindings for C functions from walproposer. This allows to
write better tests with real walproposer code without spawning multiple
processes and starting up the whole environment.

`make walproposer-lib` stage was added to build static libraries
`libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can
be statically linked to any executable to call walproposer functions.

`libs/walproposer/src/walproposer.rs` contains
`test_simple_sync_safekeepers` to test that walproposer can be called
from Rust to emulate sync_safekeepers logic. It can also be used as a
usage example.
2023-10-19 14:17:15 +01:00
Alexander Bayandin
3a19da1066 build(deps): bump rustix from 0.37.19 to 0.37.25 (#5596)
## Problem

@dependabot has bumped `rustix` 0.36 version to the latest in
https://github.com/neondatabase/neon/pull/5591, but didn't bump 0.37.

Also, update all Rust dependencies for
`test_runner/pg_clients/rust/tokio-postgres`.

Fixes
- https://github.com/neondatabase/neon/security/dependabot/39
- https://github.com/neondatabase/neon/security/dependabot/40

## Summary of changes
- `cargo update -p rustix@0.37.19`
- Update all dependencies for
`test_runner/pg_clients/rust/tokio-postgres`
2023-10-19 13:49:06 +01:00
Conrad Ludgate
572eda44ee update tokio-postgres (#5597)
https://github.com/neondatabase/rust-postgres/pull/23
2023-10-19 14:32:19 +02:00
Arpad Müller
b1d6af5ebe Azure blobs: Simplify error conversion by addition of to_download_error (#5575)
There is a bunch of duplication and manual Result handling that can be
simplified by moving the error conversion into a shared function, using
`map_err`, and the question mark operator.
2023-10-19 14:31:09 +02:00
Arpad Müller
f842b22b90 Add endpoint for querying time info for lsn (#5497)
## Problem

See #5468.

## Summary of changes

Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp
associated with the given lsn.

Fixes #5468.

---------

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-10-19 04:50:49 +02:00
dependabot[bot]
d444d4dcea build(deps): bump rustix from 0.36.14 to 0.36.16 (#5591)
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.14
to 0.36.16.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-19 03:43:49 +01:00
Tristan Partin
c8637f3736 Remove specific file references in NOTICE
Seems like a burden to update this file with each major release.
2023-10-18 14:58:48 -05:00
John Spray
ecf759be6d tests: allow-list S3 500 on DeleteObjects key (#5586)
## Problem

S3 can give us a 500 whenever it likes: when this happens at request
level we eat it in `backoff::retry`, but when it happens for a key
inside a DeleteObjects request, we log it at warn level.

## Summary of changes

Allow-list this class of log message in all tests.
2023-10-18 15:16:58 +00:00
Arthur Petukhovsky
9a9d9eba42 Add test_idle_reconnections 2023-10-18 17:09:26 +03:00
Arseny Sher
1f4805baf8 Remove remnants of num_computes field.
Fixes https://github.com/neondatabase/neon/issues/5581
2023-10-18 17:09:26 +03:00
Konstantin Knizhnik
5c88213eaf Logical replication (#5271)
## Problem

See https://github.com/neondatabase/company_projects/issues/111

## Summary of changes

Save logical replication files in WAL at compute and include them in
basebackup at pate server.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2023-10-18 16:42:22 +03:00
John Spray
607d19f0e0 pageserver: clean up page service Result handling for shutdown/disconnect (#5504)
## Problem

- QueryError always logged at error severity, even though disconnections
are not true errors.
- QueryError type is not expressive enough to distinguish actual errors
from shutdowns.
- In some functions we're returning Ok(()) on shutdown, in others we're
returning an error

## Summary of changes

- Add QueryError::Shutdown and use it in places we check for
cancellation
- Adopt consistent Result behavior: disconnects and shutdowns are always
QueryError, not ok
- Transform shutdown+disconnect errors to Ok(()) at the very top of the
task that runs query handler
- Use the postgres protocol error code for "admin shutdown" in responses
to clients when we are shutting down.

Closes: #5517
2023-10-18 13:28:38 +01:00
dependabot[bot]
1fa0478980 build(deps): bump urllib3 from 1.26.17 to 1.26.18 (#5582) 2023-10-18 12:21:54 +01:00
Christian Schwarz
9da67c4f19 walredo: make request_redo() an async fn (#5559)
Stacked atop https://github.com/neondatabase/neon/pull/5557
Prep work for https://github.com/neondatabase/neon/pull/5560

These changes have a 2% impact on `bench_walredo`.
That's likely because of the `block_on() in the innermost piece of
benchmark-only code.
So, it doesn't affect production code.
The use of closures in the benchmarking code prevents a straightforward
conversion of the whole benchmarking code to async.

before:

```
    $ cargo bench --features testing --bench bench_walredo
       Compiling pageserver v0.1.0 (/home/cs/src/neon/pageserver)
        Finished bench [optimized + debuginfo] target(s) in 2m 11s
         Running benches/bench_walredo.rs (target/release/deps/bench_walredo-d99a324337dead70)
    Gnuplot not found, using plotters backend
    short/short/1           time:   [26.363 µs 27.451 µs 28.573 µs]
    Found 1 outliers among 100 measurements (1.00%)
      1 (1.00%) high mild
    short/short/2           time:   [64.340 µs 64.927 µs 65.485 µs]
    Found 2 outliers among 100 measurements (2.00%)
      2 (2.00%) low mild
    short/short/4           time:   [101.98 µs 104.06 µs 106.13 µs]
    short/short/8           time:   [151.42 µs 152.74 µs 154.03 µs]
    short/short/16          time:   [296.30 µs 297.53 µs 298.88 µs]
    Found 14 outliers among 100 measurements (14.00%)
      10 (10.00%) high mild
      4 (4.00%) high severe

    medium/medium/1         time:   [225.12 µs 225.90 µs 226.66 µs]
    Found 1 outliers among 100 measurements (1.00%)
      1 (1.00%) low mild
    medium/medium/2         time:   [490.80 µs 491.64 µs 492.49 µs]
    Found 1 outliers among 100 measurements (1.00%)
      1 (1.00%) low mild
    medium/medium/4         time:   [934.47 µs 936.49 µs 938.52 µs]
    Found 5 outliers among 100 measurements (5.00%)
      3 (3.00%) low mild
      1 (1.00%) high mild
      1 (1.00%) high severe
    medium/medium/8         time:   [1.8364 ms 1.8412 ms 1.8463 ms]
    Found 4 outliers among 100 measurements (4.00%)
      4 (4.00%) high mild
    medium/medium/16        time:   [3.6694 ms 3.6896 ms 3.7104 ms]

```

after:

```

    $ cargo bench --features testing --bench bench_walredo
       Compiling pageserver v0.1.0 (/home/cs/src/neon/pageserver)
        Finished bench [optimized + debuginfo] target(s) in 2m 11s
         Running benches/bench_walredo.rs (target/release/deps/bench_walredo-d99a324337dead70)
    Gnuplot not found, using plotters backend
    short/short/1           time:   [28.345 µs 28.529 µs 28.699 µs]
                            change: [-0.2201% +3.9276% +8.2451%] (p = 0.07 > 0.05)
                            No change in performance detected.
    Found 17 outliers among 100 measurements (17.00%)
      4 (4.00%) low severe
      5 (5.00%) high mild
      8 (8.00%) high severe
    short/short/2           time:   [66.145 µs 66.719 µs 67.274 µs]
                            change: [+1.5467% +2.7605% +3.9927%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 5 outliers among 100 measurements (5.00%)
      5 (5.00%) low mild
    short/short/4           time:   [105.51 µs 107.52 µs 109.49 µs]
                            change: [+0.5023% +3.3196% +6.1986%] (p = 0.02 < 0.05)
                            Change within noise threshold.
    short/short/8           time:   [151.90 µs 153.16 µs 154.41 µs]
                            change: [-1.0001% +0.2779% +1.4221%] (p = 0.65 > 0.05)
                            No change in performance detected.
    short/short/16          time:   [297.38 µs 298.26 µs 299.20 µs]
                            change: [-0.2953% +0.2462% +0.7763%] (p = 0.37 > 0.05)
                            No change in performance detected.
    Found 2 outliers among 100 measurements (2.00%)
      2 (2.00%) high mild

    medium/medium/1         time:   [229.76 µs 230.72 µs 231.69 µs]
                            change: [+1.5804% +2.1354% +2.6635%] (p = 0.00 < 0.05)
                            Performance has regressed.
    medium/medium/2         time:   [501.14 µs 502.31 µs 503.64 µs]
                            change: [+1.8730% +2.1709% +2.5199%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 7 outliers among 100 measurements (7.00%)
      1 (1.00%) low mild
      1 (1.00%) high mild
      5 (5.00%) high severe
    medium/medium/4         time:   [954.15 µs 956.74 µs 959.33 µs]
                            change: [+1.7962% +2.1627% +2.4905%] (p = 0.00 < 0.05)
                            Performance has regressed.
    medium/medium/8         time:   [1.8726 ms 1.8785 ms 1.8848 ms]
                            change: [+1.5858% +2.0240% +2.4626%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 6 outliers among 100 measurements (6.00%)
      1 (1.00%) low mild
      3 (3.00%) high mild
      2 (2.00%) high severe
    medium/medium/16        time:   [3.7565 ms 3.7746 ms 3.7934 ms]
                            change: [+1.5503% +2.3044% +3.0818%] (p = 0.00 < 0.05)
                            Performance has regressed.
    Found 3 outliers among 100 measurements (3.00%)
      3 (3.00%) high mild
```
2023-10-18 11:23:06 +01:00
Em Sharnoff
16c87b5bda Bump vm-builder v0.17.12 -> v0.18.1 (#5583)
Only applicable change was neondatabase/autoscaling#566, updating
pgbouncer to 1.21.0 and enabling support for prepared statements.
2023-10-18 11:10:01 +02:00
Em Sharnoff
9fe5cc6a82 vm-monitor: Switch from memory.high to polling memory.stat (#5524)
tl;dr it's really hard to avoid throttling from memory.high, and it
counts tmpfs & page cache usage, so it's also hard to make sense of.

In the interest of fixing things quickly with something that should be
*good enough*, this PR switches to instead periodically fetch memory
statistics from the cgroup's memory.stat and use that data to determine
if and when we should upscale.

This PR fixes #5444, which has a lot more detail on the difficulties
we've hit with memory.high. This PR also supersedes #5488.
2023-10-17 15:30:40 -07:00
Conrad Ludgate
543b8153c6 proxy: add flag to reject requests without proxy protocol client ip (#5417)
## Problem

We need a flag to require proxy protocol (prerequisite for #5416)

## Summary of changes

Add a cli flag to require client IP addresses. Error if IP address is
missing when the flag is active.
2023-10-17 16:59:35 +01:00
Christian Schwarz
3a8959a4c4 page_cache: remove dead code (#5493) 2023-10-17 15:56:16 +01:00
Christian Schwarz
4a50483861 docs: error handling: document preferred anyhow context & logging style (#5178)
We already had strong support for this many months ago on Slack:
https://neondb.slack.com/archives/C0277TKAJCA/p1673453329770429
2023-10-17 15:41:47 +01:00
Conrad Ludgate
f775928dfc proxy: refactor how and when connections are returned to the pool (#5095)
## Problem

Transactions break connections in the pool

fixes #4698 

## Summary of changes

* Pool `Client`s are smart object that return themselves to the pool
* Pool `Client`s can be 'discard'ed
* Pool `Client`s are discarded when certain errors are encountered.
* Pool `Client`s are discarded when ReadyForQuery returns a non-idle
state.
2023-10-17 13:55:52 +00:00
John Spray
ea648cfbc6 tests: fix test_eviction_across_generations trying to evict temp files (#5579)
This test is listing files in a timeline and then evicting them: if the
test ran slowly this could encounter temp files for unfinished
downloads: fix by filtering these out in evict_all_layers.
2023-10-17 13:26:11 +01:00
Arpad Müller
093f8c5f45 Update rust to 1.73.0 (#5574)
[Release notes](https://blog.rust-lang.org/2023/10/05/Rust-1.73.0.html)
2023-10-17 13:13:12 +01:00
Arpad Müller
00c71bb93a Also try to login to Azure via SDK provided methods (#5573)
## Problem

We ideally use the Azure SDK's way of obtaining authorization, as
pointed out in
https://github.com/neondatabase/neon/pull/5546#discussion_r1360619178 .

## Summary of changes

This PR adds support for Azure SDK based authentication, using
[DefaultAzureCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.DefaultAzureCredential.html),
which tries the following credentials:

* [EnvironmentCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.EnvironmentCredential.html),
reading from various env vars
* [ImdsManagedIdentityCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.ImdsManagedIdentityCredential.html),
using managed identity
* [AzureCliCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.AzureCliCredential.html),
using Azure CLI

closes #5566.
2023-10-17 11:59:57 +01:00
Christian Schwarz
9256788273 limit imitate accesses concurrency, using same semaphore as compactions (#5578)
Before this PR, when we restarted pageserver, we'd see a rush of
`$number_of_tenants` concurrent eviction tasks starting to do imitate
accesses building up in the period of `[init_order allows activations,
$random_access_delay + EvictionPolicyLayerAccessThreshold::period]`.

We simply cannot handle that degree of concurrent IO.

We already solved the problem for compactions by adding a semaphore.
So, this PR shares that semaphore for use by evictions.

Part of https://github.com/neondatabase/neon/issues/5479

Which is again part of https://github.com/neondatabase/neon/issues/4743

Risks / Changes In System Behavior
==================================

* we don't do evictions as timely as we currently do
* we log a bunch of warnings about eviction taking too long
* imitate accesses and compactions compete for the same concurrency
limit, so, they'll slow each other down through this shares semaphore


Changes
=======

- Move the `CONCURRENT_COMPACTIONS` semaphore into `tasks.rs`
- Rename it to `CONCURRENT_BACKGROUND_TASKS`
- Use it also for the eviction imitate accesses:
    - Imitate acceses are both per-TIMELINE and per-TENANT
    - The per-TENANT is done through coalescing all the per-TIMELINE
      tasks via a tokio mutex `eviction_task_tenant_state`.
    - We acquire the CONCURRENT_BACKGROUND_TASKS permit early, at the
      beginning of the eviction iteration, much before the imitate
      acesses start (and they may not even start at all in the given
      iteration, as they happen only every $threshold).
    - Acquiring early is **sub-optimal** because when the per-timline
      tasks coalesce on the `eviction_task_tenant_state` mutex,
      they are already holding a CONCURRENT_BACKGROUND_TASKS permit.
    - It's also unfair because tenants with many timelines win
      the CONCURRENT_BACKGROUND_TASKS more often.
    - I don't think there's another way though, without refactoring
      more of the imitate accesses logic, e.g, making it all per-tenant.
- Add metrics for queue depth behind the semaphore.
I found these very useful to understand what work is queued in the
system.

    - The metrics are tagged by the new `BackgroundLoopKind`.
    - On a green slate, I would have used `TaskKind`, but we already had
      pre-existing labels whose names didn't map exactly to task kind.
      Also the task kind is kind of a lower-level detail, so, I think
it's fine to have a separate enum to identify background work kinds.


Future Work
===========

I guess I could move the eviction tasks from a ticker to "sleep for
$period".
The benefit would be that the semaphore automatically "smears" the
eviction task scheduling over time, so, we only have the rush on restart
but a smeared-out rush afterward.

The downside is that this perverts the meaning of "$period", as we'd
actually not run the eviction at a fixed period. It also means the the
"took to long" warning & metric becomes meaningless.

Then again, that is already the case for the compaction and gc tasks,
which do sleep for `$period` instead of using a ticker.
2023-10-17 11:29:48 +02:00
Joonas Koivunen
9e1449353d crash-consistent layer map through index_part.json (#5198)
Fixes #5172 as it:
- removes recoinciliation with remote index_part.json and accepts remote
index_part.json as the truth, deleting any local progress which is yet
to be reflected in remote
- moves to prefer remote metadata

Additionally:
- tests with single LOCAL_FS parametrization are cleaned up
- adds a test case for branched (non-bootstrap) local only timeline
availability after restart

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
2023-10-17 10:04:56 +01:00
John Spray
b06dffe3dc pageserver: fixes to /location_config API (#5548)
## Problem

I found some issues with the `/location_config` API when writing new
tests.

## Summary of changes

- Calling the API with the "Detached" state is now idempotent.
- `Tenant::spawn_attach` now takes a boolean to indicate whether to
expect a marker file. Marker files are used in the old attach path, but
not in the new location conf API. They aren't needed because in the New
World, the choice of whether to attach via remote state ("attach") or to
trust local state ("load") will be revised to cope with the transitions
between secondary & attached (see
https://github.com/neondatabase/neon/issues/5550). It is okay to merge
this change ahead of that ticket, because the API is not used in the
wild yet.
- Instead of using `schedule_local_tenant_processing`, the location conf
API handler does its own directory creation and calls `spawn_attach`
directly.
- A new `unsafe_create_dir_all` is added. This differs from
crashsafe::create_dir_all in two ways:
- It is intentionally not crashsafe, because in the location conf API we
are no longer using directory or config existence as the signal for any
important business logic.
   - It is async and uses `tokio::fs`.
2023-10-17 10:21:31 +02:00
Christian Schwarz
b08a0ee186 walredo: fix race condition where shutdown kills the wrong process (#5557)
Before this PR, the following race condition existed:

```
T1: does the apply_wal_records() call and gets back an error
T2: does the apply_wal_records() call and gets back an error
T2: does the kill_and_shutdown
T2: new loop iteration
T2: launches new walredo process
T1: does the kill_and_shutdown of the new process
```

That last step is wrong, T2 already did the kill_and_shutdown.

The symptom of this race condition was that T2 would observe an error
when it tried to do something with the process after T1 killed it.
For example, but not limited to:
`POLLHUP` /  `"WAL redo process closed its stderr unexpectedly"`.

The fix in this PR is the following:

* Use Arc to represent walredo processes.
  The Arc lives at least as long as the walredo process.
* Use Arc::ptr_eq to determine whether to kill the process or not.

The price is an additional RwLock to protect the new `redo_process`
field
that holds the Arc. I guess that could perhaps be an atomic pointer
swap some day. But, let's get one race fixed without risking introducing
a new one.

The use of Arc/drop is also not super great here because it now allows
for an unlimited number of to-be-killed processes to exist concurrently.
See the various `NB` comments above `drop(proc)` for why it's "ok" right
now due to the blocking `wait` inside `drop`.

Note: an earlier fix attempt was
https://github.com/neondatabase/neon/pull/5545
where we apply_batch_postgres would compare stdout_fd for equality.
That's incorrect because the kernel can reuse the file descriptor when
T2 launches the new process.
Details:
https://github.com/neondatabase/neon/pull/5545#pullrequestreview-1676589373
2023-10-17 09:55:39 +02:00
Arpad Müller
3666df6342 azure_blob.rs: use division instead of left shift (#5572)
Should have been a right shift but I did a left shift. It's constant
folded anyways so we just use a shift.
2023-10-16 19:52:07 +01:00
Alexey Kondratov
0ca342260c [compute_ctl+pgxn] Handle invalid databases after failed drop (#5561)
## Problem

In 89275f6c1e we fixed an issue, when we were dropping db in Postgres
even though cplane request failed. Yet, it introduced a new problem that
we now de-register db in cplane even if we didn't actually drop it in
Postgres.

## Summary of changes

Here we revert extension change, so we now again may leave db in invalid
state after failed drop. Instead, `compute_ctl` is now responsible for
cleaning up invalid databases during full configuration. Thus, there are
two ways of recovering from failed DROP DATABASE:
1. User can just repeat DROP DATABASE, same as in Vanilla Postgres.
2. If they didn't, then on next full configuration (dbs / roles changes
   in the API; password reset; or data availability check) invalid db
   will be cleaned up in the Postgres and re-created by `compute_ctl`. So
   again it follows pretty much the same semantics as Vanilla Postgres --
   you need to drop it again after failed drop.

That way, we have a recovery trajectory for both problems.

See this commit for info about `invalid` db state:
  a4b4cc1d60

According to it:
> An invalid database cannot be connected to anymore, but can still be
dropped.

While on it, this commit also fixes another issue, when `compute_ctl`
was trying to connect to databases with `ALLOW CONNECTIONS false`. Now
it will just skip them.

Fixes #5435
2023-10-16 20:46:45 +02:00
John Spray
ded7f48565 pageserver: measure startup duration spent fetching remote indices (#5564)
## Problem

Currently it's unclear how much of the `initial_tenant_load` period is
in S3 objects, and therefore how impactful it is to make changes to
remote operations during startup.

## Summary of changes

- `Tenant::load` is refactored to load remote indices in parallel and to
wait for all these remote downloads to finish before it proceeds to
construct any `Timeline` objects.
- `pageserver_startup_duration_seconds` gets a new `phase` value of
`initial_tenant_load_remote` which counts the time from startup to when
the last tenant finishes loading remote content.
- `test_pageserver_restart` is extended to validate this phase. The
previous version of the test was relying on order of dict entries, which
stopped working when adding a phase, so this is refactored a bit.
- `test_pageserver_restart` used to explicitly create a branch, now it
uses the default initial_timeline. This avoids startup getting held up
waiting for logical sizes, when one of the branches is not in use.
2023-10-16 18:21:37 +01:00
Arpad Müller
e09d5ada6a Azure blob storage support (#5546)
Adds prototype-level support for [Azure blob storage](https://azure.microsoft.com/en-us/products/storage/blobs). Some corners were cut, see the TODOs and the followup issue #5567 for details.

Steps to try it out:

* Create a storage account with block blobs (this is a per-storage
account setting).
* Create a container inside that storage account.
* Set the appropriate env vars: `AZURE_STORAGE_ACCOUNT,
AZURE_STORAGE_ACCESS_KEY, REMOTE_STORAGE_AZURE_CONTAINER,
REMOTE_STORAGE_AZURE_REGION`
* Set the env var `ENABLE_REAL_AZURE_REMOTE_STORAGE=y` and run `cargo
test -p remote_storage azure`

Fixes  #5562
2023-10-16 17:37:09 +02:00
Conrad Ludgate
8c522ea034 proxy: count cache-miss for compute latency (#5539)
## Problem

Would be good to view latency for hot-path vs cold-path

## Summary of changes

add some labels to latency metrics
2023-10-16 16:31:04 +01:00
John Spray
44b1c4c456 pageserver: fix eviction across generations (#5538)
## Problem

Bug was introduced by me in 83ae2bd82c

When eviction constructs a RemoteLayer to replace the layer it just
evicted, it is building a LayerFileMetadata using its _current_
generation, rather than the generation of the layer.

## Summary of changes

- Retrieve Generation from RemoteTimelineClient when evicting. This will
no longer be necessary when #4938 lands.
- Add a test for the scenario in question (this fails without the fix).
2023-10-15 20:23:18 +01:00
Christian Schwarz
99c15907c1 walredo: trim public interfaces (#5556)
Stacked atop https://github.com/neondatabase/neon/pull/5554.
2023-10-13 19:35:53 +01:00
Christian Schwarz
c3626e3432 walredo: remove legacy wal-redo-datadir cleanup code (#5554)
It says it in the comment.
2023-10-13 19:16:15 +01:00
Christian Schwarz
dd6990567f walredo: apply_batch_postgres: get a backtrace whenever it encounters an error (#5541)
For 2 weeks we've seen rare, spurious, not-reproducible page
reconstruction
failures with PG16 in prod.

One of the commits we deployed this week was

Commit

    commit fc467941f9
    Author: Joonas Koivunen <joonas@neon.tech>
    Date:   Wed Oct 4 16:19:19 2023 +0300

        walredo: log retryed error (#546)

With the logs from that commit, we learned that some read() or write()
system call that walredo does fails with `EAGAIN`, aka
`Resource temporarily unavailable (os error 11)`.

But we have no idea where exactly in the code we get back that error.

So, use anyhow instead of fake std::io::Error's as an easy way to get
a backtrace when the error happens, and change the logging to print
that backtrace (i.e., use `{:?}` instead of
`utils::error::report_compact_sources(e)`).

The `WalRedoError` type had to go because we add additional `.context()`
further up the call chain before we `{:?}`-print it. That additional
`.context()` further up doesn't see that there's already an
anyhow::Error
inside the `WalRedoError::ApplyWalRecords` variant, and hence captures
another backtrace and prints that one on `{:?}`-print instead of the
original one inside `WalRedoError::ApplyWalRecords`.

If we ever switch back to `report_compact_sources`, we should make sure
we have some other way to uniquely identify the places where we return
an error in the error message.
2023-10-13 14:08:23 +00:00
khanova
21deb81acb Fix case for array of jsons (#5523)
## Problem

Currently proxy doesn't handle array of json parameters correctly.

## Summary of changes

Added one more level of quotes escaping for the array of jsons case.
Resolves: https://github.com/neondatabase/neon/issues/5515
2023-10-12 14:32:49 +02:00
khanova
dbb21d6592 Make http timeout configurable (#5532)
## Problem

Currently http timeout is hardcoded to 15 seconds.

## Summary of changes

Added an option to configure it via cli args.

Context: https://neondb.slack.com/archives/C04DGM6SMTM/p1696941726151899
2023-10-12 11:41:07 +02:00
Joonas Koivunen
ddceb9e6cd fix(branching): read last record lsn only after Tenant::gc_cs (#5535)
Fixes #5531, at least the latest error of not being able to create a
branch from the head under write and gc pressure.
2023-10-11 16:24:36 +01:00
John Spray
0fc3708de2 pageserver: use a backoff::retry in Deleter (#5534)
## Problem

The `Deleter` currently doesn't use a backoff::retry because it doesn't
need to: it is already inside a loop when doing the deletion, so can
just let the loop go around.

However, this is a problem for logging, because we log on errors, which
includes things like 503/429 cases that would usually be swallowed by a
backoff::retry in most places we use the RemoteStorage interface.

The underlying problem is that RemoteStorage doesn't have a proper error
type, and an anyhow::Error can't easily be interrogated for its original
S3 SdkError because downcast_ref requires a concrete type, but SdkError
is parametrized on response type.

## Summary of changes

Wrap remote deletions in Deleter in a backoff::retry to avoid logging
warnings on transient 429/503 conditions, and for symmetry with how
RemoteStorage is used in other places.
2023-10-11 15:25:08 +01:00
John Spray
e0c8ad48d4 remote_storage: log detail errors in delete_objects (#5530)
## Problem

When we got an error in the payload of a DeleteObjects response, we only
logged how many errors, not what they were.

## Summary of changes

Log up to 10 specific errors. We do not log all of them because that
would be up to 1000 log lines per request.
2023-10-11 13:22:00 +01:00
John Spray
39e144696f pageserver: clean up mgr.rs types that needn't be public (#5529)
## Problem

These types/functions are public and it prevents clippy from catching
unused things.

## Summary of changes

Move to `pub(crate)` and remove the error enum that becomes clearly
unused as a result.
2023-10-11 11:50:16 +00:00
Alexander Bayandin
653044f754 test_runners: increase some timeouts to make tests less flaky (#5521)
## Problem
- `test_heavy_write_workload` is flaky, and fails because of to
statement timeout
- `test_wal_lagging` is flaky and fails because of the default pytest
timeout (see https://github.com/neondatabase/neon/issues/5305)

## Summary of changes
- `test_heavy_write_workload`: increase statement timeout to 5 minutes
(from default 2 minutes)
- `test_wal_lagging`: increase pytest timeout to 600s (from default
300s)
2023-10-11 10:49:15 +01:00
Vadim Kharitonov
80dcdfa8bf Update pgvector to 0.5.1 (#5525) 2023-10-11 09:47:19 +01:00
Arseny Sher
685add2009 Enable /metrics without auth.
To enable auth faster.
2023-10-10 20:06:25 +03:00
Conrad Ludgate
d4dc86f8e3 proxy: more connection metrics (#5464)
## Problem

Hard to tell 
1. How many clients are connected to proxy
2. How many requests clients are making
3. How many connections are made to a database

1 and 2 are different because of the properties of HTTP.

We have 2 already tracked through `proxy_accepted_connections_total` and
`proxy_closed_connections_total`, but nothing for 1 and 3

## Summary of changes

Adds 2 new counter gauges.

*
`proxy_opened_client_connections_total`,`proxy_closed_client_connections_total`
- how many client connections are open to proxy
*
`proxy_opened_db_connections_total`,`proxy_closed_db_connections_total`
- how many active connections are made through to a database.

For TCP and Websockets, we expect all 3 of these quantities to be
roughly the same, barring users connecting but with invalid details.

For HTTP:
* client_connections/connections can differ because the client
connections can be reused.
* connections/db_connections can differ because of connection pooling.
2023-10-10 16:33:20 +01:00
Alex Chi Z
5158de70f3 proxy: breakdown wake up failure metrics (#4933)
## Problem

close https://github.com/neondatabase/neon/issues/4702

## Summary of changes

This PR adds a new metrics for wake up errors and breaks it down by most
common reasons (mostly follows the `could_retry` implementation).
2023-10-10 13:17:37 +01:00
khanova
aec9188d36 Added timeout for http requests (#5514)
# Problem
Proxy timeout for HTTP-requests

## Summary of changes
If the HTTP-request exceeds 15s, it would be killed.

Resolves: https://github.com/neondatabase/neon/issues/4847
2023-10-10 13:39:38 +02:00
John Spray
acefee9a32 pageserver: flush deletion queue on detach (#5452)
## Problem

If a caller detaches a tenant and then attaches it again, pending
deletions from the old attachment might not have happened yet. This is
not a correctness problem, but it causes:
- Risk of leaking some objects in S3
- Some warnings from the deletion queue when pending LSN updates and
pending deletions don't pass validation.

## Summary of changes

- Deletion queue now uses UnboundedChannel so that the push interfaces
don't have to be async.
- This was pulled out of https://github.com/neondatabase/neon/pull/5397,
where it is also useful to be able to drive the queue from non-async
contexts.
- Why is it okay for this to be unbounded? The only way the
unbounded-ness of the channel can become a problem is if writing out
deletion lists can't keep up, but if the system were that overloaded
then the code generating deletions (GC, compaction) would also be
impacted.
- DeletionQueueClient gets a new `flush_advisory` function, which is
like flush_execute, but doesn't wait for completion: this is appropriate
for use in contexts where we would like to encourage the deletion queue
to flush, but don't need to block on it.
- This function is also expected to be useful in next steps for seamless
migration, where the option to flush to S3 while transitioning into
AttachedStale will also include flushing deletion queue, but we wouldn't
want to block on that flush.
- The tenant_detach code in mgr.rs invokes flush_advisory after stopping
the `Tenant` object.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-10-10 10:46:24 +01:00
Conrad Ludgate
bf065aabdf proxy: update locked error retry filter (#5376)
## Problem

We don't want to retry customer quota exhaustion errors.

## Summary of changes

Make sure both types of quota exhaustion errors are not retried
2023-10-10 08:59:16 +01:00
Konstantin Knizhnik
fe74fac276 Fix handling flush error in prefetch (#5473)
## Problem

See https://neondb.slack.com/archives/C05U648A9NJ

In case of failure of flush in prefetch, 
prefetch state is reseted. We need to retry register buffer attempt,
otherwise we will get assertion failure.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-10-10 07:43:37 +03:00
Alexander Bayandin
b91ac670e1 Update plpgsql_check extension to 2.5.3 (#5437) 2023-10-09 17:07:43 +01:00
John Spray
b3195afd20 tests: fix a race in test_deletion_queue_recovery on loaded nodes (#5495)
## Problem

Seen in CI for https://github.com/neondatabase/neon/pull/5453 -- the
time gap between validation completing and the header getting written is
long enough to fail the test, where it was doing a cheeky 1 second
sleep.

## Summary of changes

- Replace 1 second sleep with a wait_until to see the header file get
written
- Use enums as test params to make the results more readable (instead of
True-False parameters)
- Fix the temp suffix used for deletion queue headers: this worked fine,
but resulted in `..tmp` extension.
2023-10-09 16:28:28 +01:00
John Spray
7eaa7a496b pageserver: cancellation handling in writes to postgres client socket (#5503)
## Problem

Writes to the postgres client socket from the page server were not
wrapped in cancellation handling, so a stuck client connection could
prevent tenant shutdowwn.

## Summary of changes

All the places we call flush() to write to the socket, we should be
respecting the cancellation token for the task.

In this PR, I explicitly pass around a CancellationToken rather than
doing inline `task_mgr::shutdown_token` calls, to avoid coupling it to
the global task_mgr state and make it easier to refactor later.

I have some follow-on commits that add a Shutdown variant to QueryError
and use it more extensively, but that's pure refactor so will keep
separate from this bug fix PR.

Closes: https://github.com/neondatabase/neon/issues/5341
2023-10-09 15:54:17 +01:00
Joonas Koivunen
4772cd6c93 fix: deny branching, starting compute from not yet uploaded timelines (#5484)
Part of #5172. First commits show that we used to allow starting up a
compute or creating a branch off a not yet uploaded timeline. This PR
moves activation of a timeline to happen **after** initial layer file(s)
(if any) and `index_part.json` have been uploaded. Simply moving
activation to be *after* downloads have finished works because we now
spawn a task per http request handler.

Current behaviour of uploading on the timelines on next startup is kept,
to be removed later as part of #5172.

Adds:
- `NeonCli.map_branch` and corresponding `neon_local` implementation:
allow creating computes for timelines managed via pageserver http
client/api
- possibly duplicate tests (I did not want to search for, will cleanup
in a follow-up if these duplicated)

Changes:
- make `wait_until_tenant_state` return immediatedly on `Broken` and not
wait more
2023-10-09 17:03:38 +03:00
Shany Pozin
010b4d0d5c Move ApiError 404 to info level (#5501)
## Problem
Moving ApiError 404 to info level logging (see
https://github.com/neondatabase/neon/pull/5489#issuecomment-1750211212)
2023-10-09 13:54:46 +03:00
Rahul Modpur
477cb3717b Fix neon_local pageserver status command (#5475)
## Problem
Fix neon_local pageserver status command
#5430

## Summary of changes
Fix clap config for pageserver status subcommand

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.


Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
2023-10-09 09:13:57 +01:00
John Spray
ea5a97e7b4 pageserver: implement emergency mode for operating without control plane (#5469)
## Problem

Pageservers with `control_plane_api` configured require a control plane
to start up: in an incident this might be a problem.

## Summary of changes

Note to reviewers: most of the code churn in mgr.rs is the refactor
commit that enables the later emergency mode commit: you may want to
review commits separately.

- Add `control_plane_emergency_mode` configuration property
- Refactor init_tenant_mgr to separate loading configurations from the
main loop where we construct Tenant, so that the generations fetch can
peek at the configs in emergency mode.
- During startup, in emergency mode, attach any tenants that were
attached on their last run, using the same generation number.

Closes: #5381 
Closes: https://github.com/neondatabase/neon/issues/5492
2023-10-06 17:25:21 +01:00
John Spray
547914fe19 pageserver: adjust timeline deletion for generations (#5453)
## Problem

Spun off from https://github.com/neondatabase/neon/pull/5449

Timeline deletion does the following:
1. Delete layers referenced in the index
2. Delete everything else in the timeline prefix, except the index
3. Delete the index.

When generations were added, the filter in step 2 got outdated, such
that the index objects were deleted along with everything else at step
2. That didn't really break anything, but it makes an automated test
unhappy and is a violation of the original intent of the code, which
presumably intends to upload an invariant that as long as any objects
for a timeline exist, the index exists.

(Eventually, this index-object-last complexity can go away: when we do
https://github.com/neondatabase/neon/issues/5080, there is no need to
keep the index_part around, as deletions can always be retried any time
any where.)

## Summary of changes

After object listing, split the listed objects into layers and index
objects. Delete the layers first, then the index objects.
2023-10-06 16:15:18 +00:00
Arpad Müller
607b185a49 Fix 1.73.0 clippy lints (#5494)
Doesn't do an upgrade of rustc to 1.73.0 as we want to wait for the
cargo response of the curl CVE before updating. In preparation for an
update, we address the clippy lints that are newly firing in 1.73.0.
2023-10-06 14:17:19 +01:00
Christian Schwarz
bfba5e3aca page_cache: ensure forward progress on miss (#5482)
Problem
=======

Prior to this PR, when we had a cache miss, we'd get back a write guard,
fill it, the drop it and retry the read from cache.

If there's severe contention for the cache, it could happen that the
just-filled data gets evicted before our retry, resulting in lost work
and no forward progress.

Solution
========

This PR leverages the now-available `tokio::sync::RwLockWriteGuard`'s
`downgrade()` functionality to turn the filled slot write guard into a
read guard.
We don't drop the guard at any point, so, forward progress is ensured.


Refs
====

Stacked atop https://github.com/neondatabase/neon/pull/5480 

part of https://github.com/neondatabase/neon/issues/4743
specifically part of https://github.com/neondatabase/neon/issues/5479
2023-10-06 13:41:13 +01:00
Christian Schwarz
ecc7a9567b page_cache: inline {,try_}lock_for_write into memorize_materialized_page (#5480)
Motivation
==========

It's the only user, and the name of `_for_write` is wrong as of

    commit 7a63685cde
    Author: Christian Schwarz <christian@neon.tech>
    Date:   Fri Aug 18 19:31:03 2023 +0200

        simplify page-caching of EphemeralFile (#4994)

Notes
=====

This also allows us to get rid of the WriteBufResult type.

Also rename `search_mapping_for_write` to `search_mapping_exact`. It
makes more sense that way because there is `_for_write`-locking anymore.

Refs
====

part of https://github.com/neondatabase/neon/issues/4743
specifically https://github.com/neondatabase/neon/issues/5479

this is prep work for https://github.com/neondatabase/neon/pull/5482
2023-10-06 13:38:02 +02:00
Joonas Koivunen
45f98dd018 debug_tool: get page at lsn and keyspace via http api (#5057)
If there are any layermap or layer file related problems, having a
reproducable `get_page@lsn` easily usable for fast debugging iteration
is helpful.

Split off from #4938.

Later evolved to add http apis for:
- `get_page@lsn` at
`/v1/tenant/:tenant_id/timeline/:timeline_id/get?key=<hex>&lsn=<lsn
string>`
- collecting the keyspace at
`/v1/tenant/:tenant_id/timeline/:timeline_id/keyspace?[at_lsn=<lsn
string>]`
    - defaults to `last_record_lsn`

collecting the keyspace seems to yield some ranges for which there is no
key.
2023-10-06 12:17:38 +01:00
John Spray
bdfe27f3ac swagger: add a 503 definition to each endpoint (#5476)
## Problem

The control plane doesn't have generic handling for this.

## Summary of changes

Add a 503 response to every endpoint.
2023-10-06 11:31:49 +01:00
Joonas Koivunen
a15f9b3baa pageserver: Tune 503 Resource unavailable (#5489)
503 Resource Unavailable appears as error in logs, but is not really an
error which should ever fail a test on, or even log an error in prod,
[evidence].

Changes:
- log 503 as `info!` level
- use `Cow<'static, str>` instead of `String`
- add an additional `wait_until_tenant_active` in
`test_actually_duplicate_l1`
 
We ought to have in tests "wait for tenants to complete loading" but
this is easier to implement for now.

[evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5485/6423110295/index.html#/testresult/182de66203864fc0
2023-10-06 09:59:14 +01:00
Alexander Bayandin
ce92638185 test_runner: allow race in test_tenant_delete_is_resumed_on_attach (#5478)
## Problem

`test_tenant_delete_is_resumed_on_attach` is flaky

## Summary of changes
- Allow race in `test_tenant_delete_is_resumed_on_attach`
- Cleanup `allowed_errors` in the file a bit
2023-10-06 09:49:31 +01:00
Joonas Koivunen
a3c82f19b8 tests: prettier subprocess output in test log (#5485)
Clean subprocess output so that:
- one line of output is just one line without a linebreak
    - like shells handle `echo subshell says: $(echo foo)`
- multiple lines are indented like other pytest output
- error output is dedented and then indented to be like other pytest
output

Minor readability changes remove friction.
2023-10-05 20:15:55 +00:00
Arthur Petukhovsky
8b15252f98 Move walproposer state into struct (#5364)
This patch extracts all postgres-dependent functions in a separate
`walproposer_api` functions struct. It helps to compile walproposer as
static library without compiling all other postgres server code. This is
useful to allow calling walproposer C code from Rust, or linking this
library with anything else.

All global variables containing walproposer state were extracted to a
separate `WalProposer` struct. This makes it possible to run several
walproposers in the same process, in separate threads.

There were no logic changes and PR mostly consists of shuffling
functions between several files. We have a good test coverage for
walproposer code and I've seen no issues with tests while I was
refactoring it, so I don't expect any issues after merge.

ref https://github.com/neondatabase/neon/issues/547

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2023-10-05 18:48:01 +01:00
Alexander Bayandin
522aaca718 Temporary deploy staging preprod region from main (#5477)
## Problem

Stating preprod region can't use `release-XXX` right now, the config is
unified across all regions, it supports only `XXX`.

Ref
https://neondb.slack.com/archives/C03H1K0PGKH/p1696506459720909?thread_ts=1696437812.365249&cid=C03H1K0PGKH

## Summary of changes
- Deploy staging-preprod from main
2023-10-05 14:02:20 +00:00
John Spray
7cbb39063a tests: stabilize + extend deletion queue recovery test (#5457)
## Problem

This test was unstable when run in parallel with lots of others: if the
pageserver stayed up long enough for some of the deletions to get
validated, they won't be discarded on restart the way the test expects
when keep_attachment=True. This was a test bug, not a pageserver bug.

## Summary of changes

- Add failpoints to control plane api client
- Use failpoint to pause validation in the test to cover the case where
it had been flaky
- Add a metric for the number of deleted keys validated
- Add a permutation to the test to additionally exercise the case where
we _do_ validate lists before restart: this is a coverage enhancement
that seemed sensible when realizing that the test was relying on nothing
being validated before restart.
- the test will now always enter the restart with nothing or everything
validated.
2023-10-05 11:22:05 +01:00
John Spray
baa5fa1e77 pageserver: location configuration API, attachment modes, secondary locations (#5299)
## Problem

These changes are part of building seamless tenant migration, as
described in the RFC:
- https://github.com/neondatabase/neon/pull/5029

## Summary of changes

- A new configuration type `LocationConf` supersedes `TenantConfOpt` for
storing a tenant's configuration in the pageserver repo dir. It contains
`TenantConfOpt`, as well as a new `mode` attribute that describes what
kind of location this is (secondary, attached, attachment mode etc). It
is written to a file called `config-v1` instead of `config` -- this
prepares us for neatly making any other profound changes to the format
of the file in future. Forward compat for existing pageserver code is
achieved by writing out both old and new style files. Backward compat is
achieved by checking for the old-style file if the new one isn't found.
- The `TenantMap` type changes, to hold `TenantSlot` instead of just
`Tenant`. The `Tenant` type continues to be used for attached tenants
only. Tenants in other states (such as secondaries) are represented by a
different variant of `TenantSlot`.
- Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>,
they now hold a reference to a AttachedTenantConf, which includes the
extra information from LocationConf. This enables them to know the
current attachment mode.
- The attachment mode is used as an advisory input to decide whether to
do compaction and GC (AttachedStale is meant to avoid doing uploads,
AttachedMulti is meant to avoid doing deletions).
- A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config`
to drive new location configuration. This provides a superset of the
functionality of attach/detach/load/ignore:
  - Attaching a tenant is just configuring it in an attached state
  - Detaching a tenant is configuring it to a detached state
  - Loading a tenant is just the same as attaching it
- Ignoring a tenant is the same as configuring it into Secondary with
warm=false (i.e. retain the files on disk but do nothing else).

Caveats:
- AttachedMulti tenants don't do compaction in this PR, but they do in
the follow on #5397
- Concurrent updates to the `location_config` API are not handled
elegantly in this PR, a better mechanism is added in the follow on
https://github.com/neondatabase/neon/pull/5367
- Secondary mode is just a placeholder in this PR: the code to upload
heatmaps and do downloads on secondary locations will be added in a
later PR (but that shouldn't change any external interfaces)

Closes: https://github.com/neondatabase/neon/issues/5379

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-10-05 09:55:10 +01:00
Conrad Ludgate
c216b16b0f proxy: fix memory leak (#5472)
## Problem

these JoinSets live for the duration of the process. they might have
many millions of connections spawned on them and they never get cleared.

Fixes #4672 

## Summary of changes

Drain the connections as we go
2023-10-05 07:30:28 +01:00
John Spray
c5ea91f831 pageserver: fix loading control plane JWT token (#5470)
## Problem

In #5383 this configuration was added, but it missed the parts of the
Builder class that let it actually be used.

## Summary of changes

Add `control_plane_api_token` hooks to PageserverConfigBuilder
2023-10-05 01:31:17 +01:00
Em Sharnoff
6489a4ea40 vm-monitor: Remove mem::forget of tokio::sync::mpsc::Sender (#5441)
If the cgroup integration was not enabled, this would cause compute_ctl
to leak memory.

Thankfully, we never use vm-monitor *without* the cgroup handling
enabled, so this wasn't actually impacting us, but... it still looked
suspicious, so figured it was worth changing.
2023-10-04 15:08:10 -07:00
Arthur Petukhovsky
f8a7498965 Wait for sk tli init in test_timeline_status (#5467)
Fix #5447
2023-10-04 22:53:34 +01:00
Joonas Koivunen
7dce62a9ee test: duplicate L1 layer (#5412)
We overwrite L1 layers if compaction gets interrupted. We did not have a
test showing that we do in fact do this.

The test might be a bit flaky due to timestamp usage, but separating for
smaller diff in as part of #5172.

Also removes an unrelated 200s pgbench from the test suite.
2023-10-04 16:52:32 +01:00
Alexander Bayandin
7a2cafb34d Use zstd to compress large allure artifacts (#5458)
## Problem

- Because we compress artifacts file by file, we don't need to put them
into `tar` containers (ie instead of `tar.gz` we can use just `gz`).
- Pythons gz single-threaded and pretty slow.

A benchmark has shown ~20 times speedup (19.876176291 vs
0.8748335830000009) on my laptop (for a pageserver.log size is 1.3M)

## Summary of changes
- Replace tarfile with zstandart
- Update allure to 2.24.0
2023-10-04 16:20:16 +01:00
duguorong009
25a37215f3 fix: replace all std::PathBufs with camino::Utf8PathBuf (#5352)
Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with
`camino::Utf8Path`, `camino::Utf8PathBuf` in
- pageserver
- safekeeper
- control_plane
- libs/remote_storage

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-10-04 17:52:23 +03:00
Conrad Ludgate
f002b1a219 proxy: http limits (#5460)
## Problem

1MB request body is apparently too small for some clients

## Summary of changes

Update to 10 MB request body. Also revert the removal of response limits
while we don't have streaming support.
2023-10-04 15:01:05 +01:00
Joonas Koivunen
fc467941f9 walredo: log retryed error (#5462)
We currently lose the actual reason the first walredo attempt failed.
Together with implicit retry making it difficult to eyeball what is
happening.

PR version keeps the logging the same error message twice, which is what
we've been doing all along. However correlating the retrying case and
the finally returned error is difficult, because the actual error
message was left out before this PR.

Lastly, log the final error we present to postgres *in the same span*,
not outside it. Additionally, suppress the stacktrace as the comment
suggested.
2023-10-04 14:19:19 +01:00
Christian Schwarz
25bf791568 metrics: distinguish page reconstruction success & failure (#5463)
Here's the existing dashboards that use the metric:


https://github.com/search?q=repo%3Aneondatabase%2Fgrafana-dashboard-export%20pageserver_getpage_reconstruct_seconds&type=code

Looks like only `_count` and `_sum` values are used currently.
We can fix them up easily post merge.

I think the histogram is worth keeping, though.

follow-up to
https://github.com/neondatabase/neon/pull/5459#pullrequestreview-1657072882

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-10-04 13:40:00 +01:00
Joonas Koivunen
dee2bcca44 fix: time the reconstruction, not future creation (#5459)
`pageserver_getpage_reconstruct_seconds` histogram had been only
recording the time it takes to create a future, not await on it. Since:
eb0a698adc.
2023-10-04 11:01:07 +01:00
Joonas Koivunen
db8ff9d64b testing: record walredo failures to test reports (#5451)
We have rare walredo failures with pg16.

Let's introduce recording of failing walredo input in `#[cfg(feature =
"testing")]`. There is additional logging (the value reconstruction path
logging usually shown with not found keys), keeping it for
`#[cfg(features = "testing")]`.

Cc: #5404.
2023-10-04 11:24:30 +03:00
Rahul Modpur
af6a20dfc2 Improve CrashsafeOverwriteError source printing (#5410)
## Problem

Duplication of error in log

Fixes #5366 

## Summary of changes

Removed `{0}` from error description above each enum due to presence of
`#[source]` to avoid duplication

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
2023-10-04 02:38:42 +02:00
Alexander Bayandin
fec94ad5b3 Update checksums for pg_jsonschema & pg_graphql (#5455)
## Problem

Folks have re-taged releases for `pg_jsonschema` and `pg_graphql` (to
increase timeouts on their CI), for us, these are a noop changes, 
but unfortunately, this will cause our builds to fail due to checksums 
mismatch (this might not strike right away because of the build cache).
- 8ba7c7be9d
- aa7509370a

## Summary of changes
- `pg_jsonschema` update checksum
- `pg_graphql` update checksum
2023-10-03 18:42:39 +01:00
John Spray
ace0c775fc pageserver: prefer 503 to 500 for transient unavailability (#5439)
## Problem

The 500 status code should only be used for bugs or unrecoverable
failures: situations we did not expect. Currently, the pageserver is
misusing this response code for some situations that are totally normal,
like requests targeting tenants that are in the process of activating.

The 503 response is a convenient catch-all for "I can't right now, but I
will be able to".

## Summary of changes

- Change some transient availability error conditions to return 503
instead of 500
- Update the HTTP client configuration in integration tests to retry on
503

After these changes, things like creating a tenant and then trying to
create a timeline within it will no longer require carefully checking
its status first, or retrying on 500s. Instead, a client which is
properly configured to retry on 503 can quietly handle such situations.
2023-10-03 17:00:55 +01:00
dependabot[bot]
78dde31827 build(deps): bump urllib3 from 1.26.11 to 1.26.17 (#5442) 2023-10-03 11:50:27 +01:00
Christian Schwarz
de0e96d2be remote_storage: separate semaphores for read and write ops (#5440)
Before this PR, a compaction that queues a lot of uploads could grab all
the semaphore permits.

Any readers that need on-demand downloads would queue up, causing
getpage@lsn outliers.

Internal context:
https://neondb.slack.com/archives/C05NXJFNRPA/p1696264359425419?thread_ts=1696250393.840899&cid=C05NXJFNRPA
2023-10-03 11:22:11 +03:00
Alexander Bayandin
00369c8c2a Update pg_jsonschema & pg_grapgql extensions (#5438)
- Update `pg_jsonschema` to 0.2.0 with Postgres 16 support
- Update `pg_grapgql` to 1.4.0 with Postgres 16 support
- Remove `pgx` (old name of `pgrx`) layer from Dockerfile
2023-10-02 23:50:27 +01:00
Vadim Kharitonov
c1dcf61ca2 Update pgx-ulid extension (#5382)
- Update `pgx-ulid` from 0.1.0 to 0.1.3, and add it to Postgres 16
- Add `pg_tiktoken` to Postgres 16 image

Closes #5374

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-10-02 15:52:45 +01:00
Sasha Krassovsky
89275f6c1e Fix invalid database resulting from failed DROP DB (#5423)
## Problem
If the control plane happened to respond to a DROP DATABASE request with
a non-200 response, we'd abort the DROP DATABASE transaction in the
usual spot. However, Postgres for some reason actually performs the drop
inside of `standard_ProcessUtility`. As such, the database was left in a
weird state after aborting the transaction. We had test coverage of a
failed CREATE DATABASE but not a failed DROP DATABASE.
 
## Summary of changes
Since DROP DATABASE can't be inside of a transaction block, we can just
forward the DDL changes to the control plane inside of
`ProcessUtility_hook`, and if we respond with 500 bail out of
`ProcessUtility` before we perform the drop. This change also adds a
test, which reproduced the invalid database issue before the fix was
applied.
2023-09-29 19:39:28 +01:00
Christian Schwarz
c07eef8ea5 page_cache: find_victim: don't spin while there's no chance for a slot (#5319)
It is wasteful to cycle through the page cache slots trying to find a
victim slot if all the slots are currently un-evictable because a read /
write guard is alive.

We suspect this wasteful cycling to be the root cause for an
"indigestion" we observed in staging (#5291).
The hypothesis is that we `.await` after we get ahold of a read / write
guard, and that tokio actually deschedules us in favor of another
future.
If that other future then needs a page slot, it can't get ours because
we're holding the guard.
Repeat this, and eventually, the other future(s) will find themselves
doing `find_victim` until they hit `exceeded evict iter limit`.

The `find_victim` is wasteful and CPU-starves the futures that are
already holding the read/write guard. A `yield` inside `find_victim`
could mitigate the starvation, but wouldn't fix the wasting of CPU
cycles.

So instead, this PR queues waiters behind a tokio semaphore that counts
evictable slots.
The downside is that this stops the clock page replacement if we have 0
evictable slots.

Also, as explained by the big block comment in `find_victims`, the
semaphore doesn't fully prevent starvation because because we can't make
tokio prioritize those tasks executing `find_victim` that have been
trying the longest.

Implementation
===============
We need to acquire the semaphore permit before locking the slot.
Otherwise, we could deadlock / discover that all permits are gone and
would have to relinquish the slot, having moved forward the Clock LRU
without making progress.

The downside is that, we never get full throughput for read-heavy
workloads, because, until the reader coalesces onto an existing permit,
it'll hold its own permit.


Addendum To Root-Cause Analysis In #5291
========================================

Since merging that PR, @arpad-m pointed out that we couldn't have
reached the `slot.write().await` with his patches because the
VirtualFile slots can't have all been write-locked, because we only hold
them locked while the IO is ongoing, and the IO is still done with
synchronous system calls in that patch set, so, we can have had at most
$number_of_executor_threads locked at any given time.
I count 3 tokio runtimes that do `Timeline::get`, each with 8 executor
threads in our deployment => $number_of_executor_threads = 3*8 = 24 .
But the virtual file cache has 100 slots.

We both agree that nothing changed about the core hypothesis, i.e.,
additional await points inside VirtualFile caused higher concurrency
resulting in exhaustion of page cache slots.
But we'll need to reproduce the issue and investigate further to truly
understand the root cause, or find out that & why we were indeed using
100 VirtualFile slots.

TODO: could it be compaction that needs to hold guards of many
VirtualFile's in its iterators?
2023-09-29 20:03:56 +02:00
Alexander Bayandin
86dd28d4fb Bump hermit-abi & num_cpus packages (#5427)
## Problem

I've noticed that `hermit-abi`
0.3.1 [1] has been yanked from crates.io (looks like nothing too
bad [2]).
Also, we have 2 versions of `hermit-api` in dependencies (0.3.* and
0.2.*), update `num-cpus` to use the latest `hermit-api` 0.3.3.

- [1] https://crates.io/crates/hermit-abi/0.3.1
- [2] https://github.com/hermit-os/hermit-rs/issues/436

## Summary of changes
- `cargo update -p num-cpus`
- `cargo update -p hermit-abi`
- Unignore `RUSTSEC-2023-0052` in `deny.toml` (it has been fixed in
https://github.com/neondatabase/neon/pull/5069)
2023-09-29 12:57:45 +01:00
Conrad Ludgate
fd20bbc6cb proxy: log params when no endpoint (#5418)
## Problem

Our SNI error dashboard features IP addresses but it's not immediately
clear who that is still (#5369)

## Summary of changes

Log some startup params with this error
2023-09-29 09:40:27 +01:00
John Spray
6a1903987a tests: use approximate equality in test_get_tenant_size_with_multiple_branches (#5411)
## Problem

This test has been flaky for a long time.

As far as I can tell, the test was simply wrong to expect postgres
activity to result in deterministic sizes: making the match fuzzy is not
a hack, it's just matching the reality that postgres doesn't promise to
write exactly the same number of pages every time it runs a given query.

## Summary of changes

Equalities now tolerate up to 4 pages different. This is big enough to
tolerate the deltas we've seen in practice.

Closes: https://github.com/neondatabase/neon/issues/2962
2023-09-29 09:15:43 +01:00
John Spray
1881373ec4 Update CODEOWNERS (#5421)
It is usually not intended to notify a random member of the compute team
for pageserver PRs.

Leaving the notification of the storage team in place, because this
serves a purpose when some external contributor opens a PR and isn't
sure who to ask.
2023-09-28 17:34:51 +01:00
John Spray
ca3ca2bb9c pageserver: don't try and recover deletion queue if no remote storage (#5419)
## Problem

Because `neon_local` by default runs with no remote storage, it was not
running the deletion queue workers, and the attempt to call into
`recover()` was failing.

This is a bogus configuration that will go away when we make remote
storage mandatory.

## Summary of changes

Don't try and do deletion queue recovery when remote storage is
disabled.

The reason we don't just unset `control_plane_api` to avoid this is that
generations will soon become mandatory, irrespective of when we make
remote storage mandatory.
2023-09-28 17:20:34 +01:00
Em Sharnoff
b497d0094e file cache: Remove free space monitor (#5406)
This effectively reverts #3832.

There's a couple issues we just discovered with the free space monitor,
and to my knowledge, the fact we're putting the file cache on a separate
filesystem (even when on disk) that's guaranteed to have more room than
the maximum size means that this free space monitor should have no
effect.

More details:

1. The control plane sets the maximum file cache size based on max CU
2. The control plane sets the size of the filesystem underlying the file
cache based on the maximum user selectable CU (or, if the endpoint is
larger, then that size), so that there's always enough room
3. If postmaster gets SIGKILL'd, then the free space monitor process
does not exit
4. If the free space monitor is acting on the cache file but not subject
to locking or up-to-date metadata from a newer postgres instance, then
this could lead to data corruption.

So, in practice I belive the risk of data corruption is *low* but not
nothing, and given the issues we hit because of (3), and given that this
the free space monitor shouldn't be necessary because of (1) and (2),
it's best to just remove it outright.

See also: neondatabase/autoscaling#534, #5405
2023-09-28 06:47:44 -07:00
Conrad Ludgate
528fb1bd81 proxy: metrics2 (#5179)
## Problem

We need to count metrics always when a connection is open. Not only when
the transfer is 0.

We also need to count bytes usage for HTTP.

## Summary of changes

New structure for usage metrics. A `DashMap<Ids, Arc<Counters>>`.

If the arc has 1 owner (the map) then I can conclude that no connections
are open.
If the counters has "open_connections" non zero, then I can conclude a
new connection was opened in the last interval and should be reported
on.

Also, keep count of how many bytes processed for HTTP and report it
here.
2023-09-28 11:38:26 +01:00
Joonas Koivunen
af28362a47 tests: Default to LOCAL_FS for pageserver remote storage (#5402)
Part of #5172. Builds upon #5243, #5298. Includes the test changes:
- no more RemoteStorageKind.NOOP
- no more testing of pageserver without remote storage
- benchmarks now use LOCAL_FS as well

Support for running without RemoteStorage is still kept but in practice,
there are no tests and should not be any tests.

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-09-28 12:25:20 +03:00
John Spray
6b4bb91d0a docs/rfcs: add RFC for fast tenant migration/failover (#5029)
## Problem

Currently we don't have a way to migrate tenants from one pageserver to
another without a risk of gap in availability.

## Summary of changes

This follows on from https://github.com/neondatabase/neon/pull/4919

Migrating tenants between pageservers is essential to operating a
service
at scale, in several contexts:

1. Responding to a pageserver node failure by migrating tenants to other
pageservers
2. Balancing load and capacity across pageservers, for example when a
user expands their
   database and they need to migrate to a pageserver with more capacity.
3. Restarting pageservers for upgrades and maintenance

Currently, a tenant may migrated by attaching to a new node,
re-configuring endpoints to use the new node, and then later detaching
from the old node. This is safe once [generation
numbers](025-generation-numbers.md) are implemented, but does meet
our seamless/fast/efficient goals:

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-09-28 10:07:11 +01:00
Em Sharnoff
5fdc80db03 Bump vm-builder v0.17.11 -> v0.17.12 (#5407)
Only relevant change is neondatabase/autoscaling#534 - refer there for
more details.
2023-09-28 09:52:39 +02:00
Em Sharnoff
48e85460fc vm-monitor: Unset memory.high on start + refactor cgroup handling (#5348)
## Problem

Over the past couple days, we've had a couple VMs hit issues with
postgres getting hit by memory.high throttling, even after #5303 was
supposed to fix that. The tl;dr of those issues is that because
vm-monitor startup sets the file cache size first, before interacting
with the cgroup, cgroup throttling can mean we timeout connecting to the
file cache and never reset the cgroup, even if memory has been upscaled
since then.

See e.g.:

- https://neondb.slack.com/archives/C03F5SM1N02/p1695218132208249
- https://neondb.slack.com/archives/C03F5SM1N02/p1695314613696659

## Summary of changes

This PR adds an additional step into vm-monitor startup, where we first
set the cgroup's memory.high value to 'max', removing the capacity for
throttling. This preferable to just setting memory.high before the file
cache, because it's theoretically possible that the new value of
memory.high could still be less than the current memory usage, in which
case postgres could continue to be throttled without sufficient memory
events to relieve that.

Implementing this properly involved adding a method to our internal
cgroup interface, and it seemed like there was duplicated functionality
there, so this PR unifies that as well, making things a bit more
consistent.
2023-09-27 21:27:23 -07:00
Christian Schwarz
090a644392 metrics for resident & remote physical size without tenant/timeline dimension (#5389)
So that we can compute worst-case /storage size dashboard panel more
cheaply.
2023-09-27 13:18:05 +01:00
John Spray
2cced770da pageserver: add control_plane_api_token config (#5383)
## Problem

Control plane API calls in prod will need authentication.

## Summary of changes

`control_plane_api_token` config is loaded and set as HTTP
`Authorization` header.

Closes: https://github.com/neondatabase/neon/issues/5139
2023-09-27 13:12:13 +01:00
MMeent
7038ce40ce Fix neon_zeroextend's WAL logging (#5387)
When you log more than a few blocks, you need to reserve the space in
advance. We didn't do that, so we got errors. Now we do that, and
shouldn't get errors.
2023-09-27 13:48:30 +02:00
Joonas Koivunen
ce45fd4cc7 test_pageserver_metric_collection: allowed synthetic size to be cancelled at shutdown (#5398)
[evidence] of these messages during shutdown. They can happen if we are
unlucky enough.

[evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/6323709725/index.html#suites/e557ea0d920cfebd45c1921296031273/4120269a64eed172
2023-09-27 12:00:49 +01:00
Joonas Koivunen
6cc8c31fd8 disk_usage_based_eviction: switch warmup to use full table scans (#5384)
Fixes #3978. `test_partial_evict_tenant` can fail multiple times so even
though we retry it as flaky, it will still haunt us.

Originally was going to just relax the comparison, then ended up
replacing warming up to use full table scans instead of `pgbench
--select-only`. This seems to help by producing the expected layer
accesses. There might be something off with how many layers pg16
produces compared to pg14 and pg15. Created #5392.
2023-09-27 10:00:21 +01:00
John Spray
ba92668e37 pageserver: deletion queue & generation validation for deletions (#5207)
## Problem

Pageservers must not delete objects or advertise updates to
remote_consistent_lsn without checking that they hold the latest
generation for the tenant in question (see [the RFC](
https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md))

In this PR:
- A new "deletion queue" subsystem is introduced, through which
deletions flow
- `RemoteTimelineClient` is modified to send deletions through the
deletion queue:
- For GC & compaction, deletions flow through the full generation
verifying process
- For timeline deletions, deletions take a fast path that bypasses
generation verification
- The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced
with a mechanism that maintains a "projected" lsn (equivalent to the
previous property), and a "visible" LSN (which is the one that we may
share with safekeepers).
- Until `control_plane_api` is set, all deletions skip generation
validation
- Tests are introduced for the new functionality in
`test_pageserver_generations.py`

Once this lands, if a pageserver is configured with the
`control_plane_api` configuration added in
https://github.com/neondatabase/neon/pull/5163, it becomes safe to
attach a tenant to multiple pageservers concurrently.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-09-26 16:11:55 +01:00
Joonas Koivunen
16f0622222 fix: real_s3 flakyness with rust tests (#5386)
Fixes #5072. See proof from
https://github.com/neondatabase/neon/issues/5072#issuecomment-1735580798.
Turns out multiple threads can get the same nanoseconds since epoch, so
switch to using millis (for finding the prefix later on) and randomness
via `thread_rng` (protect against adversial ci runners).

Also changes the "per test looking alike" prefix to more "general"
prefix.
2023-09-26 15:59:25 +01:00
Christian Schwarz
3322b6c5b0 page cache: metrics: add page content kind dimension (#5373)
The TaskKind dimension added in #5339 is insufficient to understand what
kind of data causes the cache hits.

Regarding performance considerations: I'm not too worried because we're
moving from 3 to 4 one-byte sized fields; likely the space now used by
the new field was padding before. Didn't check this, though, and it
doesn't matter, we need the data.

What I don't like about this PR is that we have an `Unknown` content
type, and I also don't like that there's no compile-time way to assert
that it's set to something != `Unknown` when calling the page cache.
But, this is what I could come up with before tomorrow’s release, and I
think it covers the hot paths.
2023-09-26 10:01:09 +03:00
Konstantin Knizhnik
c338bb7423 Update last written LSN after walloging all createdb stuff (#5340)
## Problem

See https://neondb.slack.com/archives/C033RQ5SPDH/p1694595347598249

## Summary of changes

Update last written LSN after walloging all createdb stuff

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-09-26 09:20:56 +03:00
Em Sharnoff
a24cd69589 Bump vm-builder v0.17.10 -> v0.17.11 (#5371)
This only includes the changes from neondatabase/autoscaling#525, which
improves graceful VM shutdown.
2023-09-25 19:49:07 +01:00
Christian Schwarz
1d98d3e4c1 VirtualFile::atomic_overwrite: add basic unit tests (#5191)
Should have added them in the initial PR #5186.

Would have been nice to test the failure cases as well, but, without
mocking the FS, that's too hard / platform-dependent.
2023-09-25 17:16:36 +00:00
Christian Schwarz
a0c82969a2 page cache: per-task-kind access stats (#5339)
This PR adds a `task_kind` label to page cache access metrics.

These are to validate our hypothesis that the high hit page cache rate
we observe in prod is due to internal tasks, not getpage requests from
compute.
We believe the latter should near-always be a pageserver-page-cache
_miss_ because compute has it's own page cache, and hence there is no
locality of reference for its accesses to pageserver page cache.

Before this PR, we didn't have `RequestContext` propagation to any code
below the on-demand downloader.
The vast majority of changes in this PR is concerned with adding that
propagation.
2023-09-25 18:30:10 +02:00
George MacKerron
d8977d5199 Altered retry timing parameters for connect to compute, to get more and quicker retries (#5358)
## Problem

Compute start time has improved, but the timing of connection retries
from the proxy is rather slow, meaning we could be making clients wait
hundreds of milliseconds longer than necessary.

## Summary of changes

Previously, retry time in ms was `100 * 1.5**n`, and `n` starts at 1,
giving: 150, 225, 337, 506, 759, 1139, 1709, ...

This PR changes that to `25 * sqrt(2)**(n - 1)` instead, giving: 25, 35,
50, 71, 100, 141, 200, ...
2023-09-25 12:27:41 +01:00
Alexander Bayandin
211f882428 Update hyper-tungstenite to 0.11 (#5361) 2023-09-23 18:06:25 +01:00
Alexander Bayandin
3a2e6a03bc Forbid installation of hnsw extension (#5346)
## Problem

Do not allow new installation of deprecated `hnsw` extension. 
The same approach as in https://github.com/neondatabase/neon/pull/5345

## Summary of changes
- Remove `trusted = true` from `hnsw.control`
- Remove `hnsw` related targets from Makefile
2023-09-23 16:47:57 +01:00
Vadim Kharitonov
6d33d8b092 Update rust to 1.72.1 (#5359) 2023-09-22 16:55:55 +01:00
Alexander Bayandin
3048a5f0e2 Deploy releases to staging-preprod first (#5308)
## Problem

Before releasing new version to production, we'd like to run a set of
required checks on the incoming release.
The simplest approach, which doesn't require many changes — dedicate one
staging region to `preprod` installation.

The proposed changes to the release flow are the following:
- When a release PR is merged into the release branch — trigger
deployment from the release branch to a dedicated staging-preprod region
(for now, it's going to be `eu-west-1` — Ireland)

Corresponding infrastructure PR:
https://github.com/neondatabase/aws/pull/585

## Summary of changes
- Trigger `deploy.dev` workflow with `-f deployPreprodRegion=true` for
release branch
2023-09-22 14:17:43 +01:00
dependabot[bot]
ae79978ae4 build(deps): bump cryptography from 41.0.3 to 41.0.4 (#5349) 2023-09-22 13:15:33 +01:00
Heikki Linnakangas
810a355b9d Add script to download a basebackup from pageserver. (#5344)
I used this while investigating a production issue, and seems like it
could come handy in the future, too.
2023-09-22 11:11:28 +00:00
Vadim Kharitonov
e1e1c08563 Forbid installation of pg_embedding extension (#5345) 2023-09-21 22:28:56 +02:00
John Spray
97a571091e README: update for libicu dependency (#5343)
## Problem

In 83e7e5dbbd dependencies were only
updated for Mac users. Without libicu, postgres 16 build fails.

## Summary of changes

Update dependencies on Ubuntu and fedora to include libicu.
2023-09-21 10:27:58 +02:00
Christian Schwarz
93b41cbb58 page cache metrics: remove unused read_accesses_ephemeral & read_hits_ephemeral (#5338)
We removed the user of this in #4994 .

But the metrics field was `pub`, so, didn't cause an unused-warning in
#4994.

This is preliminary for: #5339
2023-09-20 15:55:58 +00:00
Konstantin Knizhnik
6723a79bec Do not handle lfc_change_limit in processes not haing PGPROC structure (#5332)
## Problem

See https://neondb.slack.com/archives/C05L7D1JAUS/p1693775881474019

## Summary of changes

Do not perform local file cache resizing in processes having no PGPROC

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-09-19 21:55:36 +03:00
Joonas Koivunen
5d8597c2f0 refactor(consumption_metrics): post-split cleanup (#5327)
Split off from #5297. Builds upon #5326. Handles original review
comments which I did not move to earlier split PRs. Completes test
support for verifying events by notifying of the last batch of events.
Adds cleaning up of tempfiles left because of an unlucky shutdown or
SIGKILL.

Finally closes #5175.

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-09-18 23:30:01 +03:00
Em Sharnoff
722e5260bf vm-monitor: Don't set cgroup memory.max (#5333)
All it does is make postgres OOM more often (which, tbf, means we're
less likely to have e.g. compute_ctl get OOM-killed, but that tradeoff
isn't worth it).

Internally, this means removing all references to `memory.max` and the
places where we calculate or store the intended value.

As discussed in the sync earlier.

ref:

- https://neondb.slack.com/archives/C03H1K0PGKH/p1694698949252439?thread_ts=1694505575.693449&cid=C03H1K0PGKH
- https://neondb.slack.com/archives/C03H1K0PGKH/p1695049198622759
2023-09-18 17:47:48 +00:00
Em Sharnoff
18f3a706da Bump vm-builder v0.17.5 -> v0.17.10 (#5334)
Only notable change is including neondatabase/autoscaling#523, which we
hope will help with making sure that TCP connections are properly
terminated before shutdown (which hopefully fixes a leak in the
pageserver).
2023-09-18 17:30:34 +00:00
Alexander Bayandin
70b17981a7 Enable compatibility tests on Postgres 16 (#5314)
## Problem

We didn't have a Postgres 16 snapshot of data to run compatibility tests
on, but now we have it (since the release).

## Summary of changes
- remove `@skip_on_postgres(PgVersion.V16, ...)` from compatibility
tests
2023-09-18 12:58:34 +01:00
Alexander Bayandin
0904d8cf4a Downgrade plv8 for Postgres 14/15 (#5320)
Backport https://github.com/neondatabase/neon/pull/5318 from release 
into main
2023-09-18 12:55:49 +01:00
Joonas Koivunen
55371af711 test: workaround known bad mock_s3 ListObjectsV2 response (#5330)
this should allow test
test_delete_tenant_exercise_crash_safety_failpoints with
debug-pg16-Check.RETRY_WITH_RESTART-mock_s3-tenant-delete-before-remove-timelines-dir-True
to pass more reliably.
2023-09-18 09:24:53 +02:00
Joonas Koivunen
e62ab176b8 refactor(consumption_metrics): split (#5326)
Split off from #5297. Builds upon #5325, should contain only the
splitting. Next up: #5327.
2023-09-16 18:45:08 +03:00
Joonas Koivunen
a221ecb0da test: test_download_remote_layers_api again (#5322)
The test is still flaky, perhaps more after #5233, see #3831.

Do one more `timeline_checkpoint` *after* shutting down safekeepers
*before* shutting down pageserver. Put more effort into not compacting
or creating image layers.
2023-09-16 18:27:19 +03:00
Joonas Koivunen
9cf4ae86ff refactor(consumption_metrics): pre-split cleanup (#5325)
Cleanups in preparation to splitting the consumption_metrics.rs in
#5326.

Split off from #5297.
2023-09-16 18:08:33 +03:00
Joonas Koivunen
74d99b5883 refactor(test_consumption_metrics): split for pageserver and proxy (#5324)
With the addition of the "stateful event verification" the
test_consumption_metrics.py is now too crowded. Split it up for
pageserver and proxy.

Split from #5297.
2023-09-16 18:05:35 +03:00
Joonas Koivunen
f902777202 fix: consumption metrics on restart (#5323)
Write collected metrics to disk to recover previously sent metrics on
restart.

Recover the previously collected metrics during startup, send them over
at right time
  - send cached synthetic size before actual is calculated
  - when `last_record_lsn` rolls back on startup
      - stay at last sent `written_size` metric
      - send `written_size_delta_bytes` metric as 0

Add test support: stateful verification of events in python tests.

Fixes: #5206
Cc: #5175 (loggings, will be enhanced in follow-up)
2023-09-16 11:24:42 +03:00
Joonas Koivunen
a7f4ee02a3 fix(consumption_metrics): exp backoff retry (#5317)
Split off from #5297. Depends on #5315.
Cc: #5175 for retry
2023-09-16 01:11:01 +03:00
Joonas Koivunen
00c4c8e2e8 feat(consumption_metrics): remove event deduplication support (#5316)
We no longer use pageserver deduplication anywhere. Give out a warning
instead.

Split off from #5297.

Cc: #5175 for dedup.
2023-09-16 00:06:19 +03:00
Joonas Koivunen
c5d226d9c7 refactor(consumption_metrics): prereq refactorings, tests (#5315)
Split off from #5297.

There should be no functional changes here:
- refactor tenant metric "production" like previously timeline, allows
unit testing, though not interesting enough yet to test
- introduce type aliases for tuples
- extra refactoring for `collect`, was initially thinking it was useful
but will do a inline later
- shorter binding names
- support for future allocation reuse quests with IdempotencyKey
- move code out of tokio::select to make it rustfmt-able
- generification, allow later replacement of `&'static str` with enum
- add tests that assert sent event contents exactly
2023-09-15 19:44:14 +03:00
Konstantin Knizhnik
66fa176cc8 Handle update of VM in XLOG_HEAP_LOCK/XLOG_HEAP2_LOCK_UPDATED WAL records (#4896)
## Problem

VM should be updated if XLH_LOCK_ALL_FROZEN_CLEARED flags is set in
XLOG_HEAP_LOCK,XLOG_HEAP_2_LOCK_UPDATED WAL records

## Summary of changes

Add handling of this records in walingest.rs

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-09-15 17:47:29 +03:00
Heikki Linnakangas
9e6b5b686c Add a test case for "CREATE DATABASE STRATEGY=file_copy". (#5301)
It was utterly broken on v15 before commit 83e7e5dbbd, which fixed the
incorrect definition of XLOG_DBASE_CREATE_WAL_LOG. We never noticed
because we had no tests for it.
2023-09-15 16:50:57 +03:00
Rahul Modpur
e6985bd098 Move tenant & timeline dir method to NeonPageserver and use them everywhere (#5262)
## Problem
In many places in test code, paths are built manually from what
NeonEnv.tenant_dir and NeonEnv.timeline_dir could do.

## Summary of changes
1. NeonEnv.tenant_dir and NeonEnv.timeline_dir moved under class
NeonPageserver as the path they use is per-pageserver instance.
2. Used these everywhere to replace manual path building

Closes #5258

---------

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
2023-09-15 11:17:18 +01:00
Konstantin Knizhnik
e400a38fb9 References to old and new blocks were mixed in xlog_heap_update handler (#5312)
## Problem

See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029

https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178

## Summary of changes

Swap old/new block references

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-09-15 10:32:25 +03:00
Alexander Bayandin
bd36d1c44a approved-for-ci-run.yml: fix variable name and permissions (#5307)
## Problem
- `gh pr list` fails with `unknown argument "main"; please quote all
values that have spaces due to using a variable with the wrong name
- `permissions: write-all` are too wide for the job

## Summary of changes
- For variable name `HEAD` -> `BRANCH`
- Grant only required permissions for each job

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-14 20:18:49 +03:00
Alexander Bayandin
0501b74f55 Update checksum for pg_hint_plan (#5309)
## Problem

The checksum for `pg_hint_plan` doesn't match:
```
sha256sum: WARNING: 1 computed checksum did NOT match
```

Ref
https://github.com/neondatabase/neon/actions/runs/6185715461/job/16793609251?pr=5307

It seems that the release was retagged yesterday:
https://github.com/ossc-db/pg_hint_plan/releases/tag/REL16_1_6_0

I don't see any malicious changes from 15_1.5.1:
https://github.com/ossc-db/pg_hint_plan/compare/REL15_1_5_1...REL16_1_6_0,
so it should be ok to update.

## Summary of changes
- Update checksum for `pg_hint_plan` 16_1.6.0
2023-09-14 18:17:50 +03:00
Em Sharnoff
3895829bda vm-monitor: Fix cgroup throttling (#5303)
I believe this (not actual IO problems) is the cause of the "disk speed
issue" that we've had for VMs recently. See e.g.:

1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH
2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659

The vm-informant (and now, the vm-monitor, its replacement) is supposed
to gradually increase the `neon-postgres` cgroup's memory.high value,
because otherwise the kernel will throttle all the processes in the
cgroup.

This PR fixes a bug with the vm-monitor's implementation of this
behavior.

---

Other references, for the vm-informant's implementation:

- Original issue: neondatabase/autoscaling#44
- Original PR: neondatabase/autoscaling#223
2023-09-14 13:21:50 +03:00
Joonas Koivunen
ffd146c3e5 refactor: globals in tests (#5298)
Refactor tests to have less globals.

This will allow to hopefully write more complex tests for our new metric
collection requirements in #5297. Includes reverted work from #4761
related to test globals.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: MMeent <matthias@neon.tech>
2023-09-13 22:05:30 +03:00
Konstantin Knizhnik
1697e7b319 Fix lfc_ensure_function which now disables LFC (#5294)
## Problem

There was a bug in lfc_ensure_opened which actually disables LFC

## Summary of changes

Return true ifLFC file is normally opened

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-09-13 08:56:03 +03:00
bojanserafimov
8556d94740 proxy http: reproduce issue with transactions in pool (#5293)
xfail test reproducing issue https://github.com/neondatabase/neon/issues/4698
2023-09-12 17:13:25 -04:00
MMeent
3b6b847d76 Fixes for Pg16: (#5292)
- pagestore_smgr.c had unnecessary WALSync() (see #5287 )
- Compute node dockerfile didn't build the neon_rmgr extension
- Add PostgreSQL 16 image to docker-compose tests
- Fix issue with high CPU usage in Safekeeper due to a bug in WALSender

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-09-12 22:02:03 +03:00
Alexander Bayandin
2641ff3d1a Use CI_ACCESS_TOKEN to create release PR (#5286)
## Problem

If @github-actions creates release PR, the CI pipeline is not triggered
(but we have `release-notify.yml` workflow that we expect to run on this
event).
I suspect this happened because @github-actions is not a repository
member.

Ref
https://github.com/neondatabase/neon/pull/5283#issuecomment-1715209291

## Summary of changes
- Use `CI_ACCESS_TOKEN` to create a PR
- Use `gh` instead of `thomaseizinger/create-pull-request`
- Restrict permissions for GITHUB_TOKEN to `contents: write` only
(required for `git push`)
2023-09-12 20:01:21 +01:00
Alexander Bayandin
e1661c3c3c approved-for-ci-run.yml: fix ci-run/pr-* branch deletion (#5278)
## Problem

`ci-run/pr-*` branches (and attached PRs) should be deleted
automatically when their parent PRs get closed.
But there are not

## Summary of changes
- Fix if-condition
2023-09-12 19:29:26 +03:00
Alexander Bayandin
9c3f38e10f Document how to run CI for external contributors (#5279)
## Problem
We don't have this instruction written anywhere but in internal Slack

## Summary of changes
- Add `How to run a CI pipeline on Pull Requests from external
contributors` section to `CONTRIBUTING.md`

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-09-12 16:53:13 +01:00
Christian Schwarz
ab1f37e908 revert recent VirtualFile asyncification changes (#5291)
Motivation
==========

We observed two "indigestion" events on staging, each shortly after
restarting `pageserver-0.eu-west-1.aws.neon.build`. It has ~8k tenants.

The indigestion manifests as `Timeline::get` calls failing with
`exceeded evict iter limit` .
The error is from `page_cache.rs`; it was unable to find a free page and
hence failed with the error.

The indigestion events started occuring after we started deploying
builds that contained the following commits:

```
[~/src/neon]: git log --oneline c0ed362790caa368aa65ba57d352a2f1562fd6bf..15eaf78083ecff62b7669
091da1a1c8b4f60ebf8
15eaf7808 Disallow block_in_place and Handle::block_on (#5101)
a18d6d9ae Make File opening in VirtualFile async-compatible (#5280)
76cc87398 Use tokio locks in VirtualFile and turn with_file into macro (#5247)
```

The second and third commit are interesting.
They add .await points to the VirtualFile code.

Background
==========

On the read path, which is the dominant user of page cache & VirtualFile
during pageserver restart, `Timeline::get` `page_cache` and VirtualFile
interact as follows:

1. Timeline::get tries to read from a layer
2. This read goes through the page cache.
3. If we have a page miss (which is known to be common after restart),
page_cache uses `find_victim` to find an empty slot, and once it has
found a slot, it gives exclusive ownership of it to the caller through a
`PageWriteGuard`.
4. The caller is supposed to fill the write guard with data from the
underlying backing store, i.e., the layer `VirtualFile`.
5. So, we call into `VirtualFile::read_at`` to fill the write guard.

The `find_victim` method finds an empty slot using a basic
implementation of clock page replacement algorithm.
Slots that are currently in use (`PageReadGuard` / `PageWriteGuard`)
cannot become victims.
If there have been too many iterations, `find_victim` gives up with
error `exceeded evict iter limit`.

Root Cause For Indigestion
==========================

The second and third commit quoted in the "Motivation" section
introduced `.await` points in the VirtualFile code.
These enable tokio to preempt us and schedule another future __while__
we hold the `PageWriteGuard` and are calling `VirtualFile::read_at`.
This was not possible before these commits, because there simply were no
await points that weren't Poll::Ready immediately.
With the offending commits, there is now actual usage of
`tokio::sync::RwLock` to protect the VirtualFile file descriptor cache.
And we __know__ from other experiments that, during the post-restart
"rush", the VirtualFile fd cache __is__ too small, i.e., all slots are
taken by _ongoing_ VirtualFile operations and cannot be victims.
So, assume that VirtualFile's `find_victim_slot`'s
`RwLock::write().await` calls _will_ yield control to the executor.

The above can lead to the pathological situation if we have N runnable
tokio tasks, each wanting to do `Timeline::get`, but only M slots, N >>
M.
Suppose M of the N tasks win a PageWriteGuard and get preempted at some
.await point inside `VirtualFile::read_at`.
Now suppose tokio schedules the remaining N-M tasks for fairness, then
schedules the first M tasks again.
Each of the N-M tasks will run `find_victim()` until it hits the
`exceeded evict iter limit`.
Why? Because the first M tasks took all the slots and are still holding
them tight through their `PageWriteGuard`.

The result is massive wastage of CPU time in `find_victim()`.
The effort to find a page is futile, but each of the N-M tasks still
attempts it.

This delays the time when tokio gets around to schedule the first M
tasks again.
Eventually, tokio will schedule them, they will make progress, fill the
`PageWriteGuard`, release it.
But in the meantime, the N-M tasks have already bailed with error
`exceeded evict iter limit`.

Eventually, higher level mechanisms will retry for the N-M tasks, and
this time, there won't be as many concurrent tasks wanting to do
`Timeline::get`.
So, it will shake out.

But, it's a massive indigestion until then.

This PR
=======

This PR reverts the offending commits until we find a proper fix.

```
    Revert "Use tokio locks in VirtualFile and turn with_file into macro (#5247)"
    
    This reverts commit 76cc87398c.


    Revert "Make File opening in VirtualFile async-compatible (#5280)"
    
    This reverts commit a18d6d9ae3.
```
2023-09-12 17:38:31 +02:00
MMeent
83e7e5dbbd Feat/postgres 16 (#4761)
This adds PostgreSQL 16 as a vendored postgresql version, and adapts the
code to support this version. 
The important changes to PostgreSQL 16 compared to the PostgreSQL 15
changeset include the addition of a neon_rmgr instead of altering Postgres's
original WAL format.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-09-12 15:11:32 +02:00
Christian Schwarz
5be8d38a63 fix deadlock around TENANTS (#5285)
The sequence that can lead to a deadlock:

1. DELETE request gets all the way to `tenant.shutdown(progress,
false).await.is_err() ` , while holding TENANTS.read()
2. POST request for tenant creation comes in, calls `tenant_map_insert`,
it does `let mut guard = TENANTS.write().await;`
3. Something that `tenant.shutdown()` needs to wait for needs a
`TENANTS.read().await`.
The only case identified in exhaustive manual scanning of the code base
is this one:
Imitate size access does `get_tenant().await`, which does
`TENANTS.read().await` under the hood.

In the above case (1) waits for (3), (3)'s read-lock request is queued
behind (2)'s write-lock, and (2) waits for (1).
Deadlock.

I made a reproducer/proof-that-above-hypothesis-holds in
https://github.com/neondatabase/neon/pull/5281 , but, it's not ready for
merge yet and we want the fix _now_.

fixes https://github.com/neondatabase/neon/issues/5284
2023-09-12 11:23:46 +02:00
John Spray
36c261851f s3_scrubber: remove atty dependency (#5171)
## Problem

- https://github.com/neondatabase/neon/security/dependabot/28

## Summary of changes

Remove atty, and remove the `with_ansi` arg to scrubber's stdout logger.
2023-09-12 10:11:41 +01:00
Arpad Müller
15eaf78083 Disallow block_in_place and Handle::block_on (#5101)
## Problem

`block_in_place` is a quite expensive operation, and if it is used, we
should explicitly have to opt into it by allowing the
`clippy::disallowed_methods` lint.

For more, see
https://github.com/neondatabase/neon/pull/5023#discussion_r1304194495.

Similar arguments exist for `Handle::block_on`, but we don't do this yet
as there is still usages.

## Summary of changes

Adds a clippy.toml file, configuring the [`disallowed_methods`
lint](https://rust-lang.github.io/rust-clippy/master/#/disallowed_method).
2023-09-12 00:11:16 +00:00
Arpad Müller
a18d6d9ae3 Make File opening in VirtualFile async-compatible (#5280)
## Problem

Previously, we were using `observe_closure_duration` in `VirtualFile`
file opening code, but this doesn't support async open operations, which
we want to use as part of #4743.

## Summary of changes

* Move the duration measurement from the `with_file` macro into a
`observe_duration` macro.
* Some smaller drive-by fixes to replace the old strings with the new
variant names introduced by #5273

Part of #4743, follow-up of #5247.
2023-09-11 18:41:08 +02:00
Arpad Müller
76cc87398c Use tokio locks in VirtualFile and turn with_file into macro (#5247)
## Problem

For #4743, we want to convert everything up to the actual I/O operations
of `VirtualFile` to `async fn`.

## Summary of changes

This PR is the last change in a series of changes to `VirtualFile`:
#5189, #5190, #5195, #5203, and #5224.

It does the last preparations before the I/O operations are actually
made async. We are doing the following things:

* First, we change the locks for the file descriptor cache to tokio's
locks that support Send. This is important when one wants to hold locks
across await points (which we want to do), otherwise the Future won't be
Send. Also, one shouldn't generally block in async code as executors
don't like that.
* Due to the lock change, we now take an approach for the `VirtualFile`
destructors similar to the one proposed by #5122 for the page cache, to
use `try_write`. Similarly to the situation in the linked PR, one can
make an argument that if we are in the destructor and the slot has not
been reused yet, we are the only user accessing the slot due to owning
the lock mutably. It is still possible that we are not obtaining the
lock, but the only cause for that is the clock algorithm touching the
slot, which should be quite an unlikely occurence. For the instance of
`try_write` failing, we spawn an async task to destroy the lock. As just
argued however, most of the time the code path where we spawn the task
should not be visited.
* Lastly, we split `with_file` into a macro part, and a function part
that contains most of the logic. The function part returns a lock
object, that the macro uses. The macro exists to perform the operation
in a more compact fashion, saving code from putting the lock into a
variable and then doing the operation while measuring the time to run
it. We take the locks approach because Rust has no support for async
closures. One can make normal closures return a future, but that
approach gets into lifetime issues the moment you want to pass data to
these closures via parameters that has a lifetime (captures work). For
details, see
[this](https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/)
and
[this](https://users.rust-lang.org/t/function-that-takes-an-async-closure/61663)
link. In #5224, we ran into a similar problem with the `test_files`
function, and we ended up passing the path and the `OpenOptions`
by-value instead of by-ref, at the expense of a few extra copies. This
can be done as the data is cheaply copyable, and we are in test code.
But here, we are not, and while `File::try_clone` exists, it [issues
system calls
internally](1e746d7741/library/std/src/os/fd/owned.rs (L94-L111)).
Also, it would allocate an entirely new file descriptor, something that
the fd cache was built to prevent.
* We change the `STORAGE_IO_TIME` metrics to support async.

Part of #4743.
2023-09-11 17:35:05 +02:00
bojanserafimov
c0ed362790 Measure pageserver wal recovery time and fix flush() method (#5240) 2023-09-11 09:46:06 -04:00
duguorong009
d7fa2dba2d fix(pageserver): update the STORAGE_IO_TIME metrics to avoid expensive operations (#5273)
Introduce the `StorageIoOperation` enum, `StorageIoTime` struct, and
`STORAGE_IO_TIME_METRIC` static which provides lockless access to
histograms consumed by `VirtualFile`.

Closes #5131

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-11 14:58:15 +03:00
Joonas Koivunen
a55a78a453 Misc test flakyness fixes (#5233)
Assorted flakyness fixes from #5198, might not be flaky on `main`.

Migrate some tests using neon_simple_env to just neon_env_builder and
using initial_tenant to make flakyness understanding easier. (Did not
understand the flakyness of
`test_timeline_create_break_after_uninit_mark`.)

`test_download_remote_layers_api` is flaky because we have no atomic
"wait for WAL, checkpoint, wait for upload and do not receive any more
WAL".

`test_tenant_size` fixes are just boilerplate which should had always
existed; we should wait for the tenant to be active. similarly for
`test_timeline_delete`.

`test_timeline_size_post_checkpoint` fails often for me with reading
zero from metrics. Give it a few attempts.
2023-09-11 11:42:49 +03:00
Rahul Modpur
999fe668e7 Ack tenant detach before local files are deleted (#5211)
## Problem

Detaching a tenant can involve many thousands of local filesystem
metadata writes, but the control plane would benefit from us not
blocking detach/delete responses on these.

## Summary of changes

After rename of local tenant directory ack tenant detach and delete
tenant directory in background

#5183 

---------

Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
2023-09-10 22:59:51 +03:00
Alexander Bayandin
d33e1b1b24 approved-for-ci-run.yml: use token to checkout the repo (#5266)
## Problem

Another thing I overlooked regarding'approved-for-ci-run`:
- When we create a PR, the action is associated with @vipvap and this
triggers the pipeline — this is good.
- When we update the PR by force-pushing to the branch, the action is
associated with @github-actions, which doesn't trigger a pipeline — this
is bad.

Initially spotted in #5239 / #5211
([link](https://github.com/neondatabase/neon/actions/runs/6122249456/job/16633919558?pr=5239))
— `check-permissions` should not fail.


## Summary of changes
- Use `CI_ACCESS_TOKEN` to check out the repo (I expect this token will
be reused in the following `git push`)
2023-09-10 20:12:38 +01:00
Alexander Bayandin
15fd188fd6 Fix GitHub Autocomment for ci-run/prs (#5268)
## Problem

When PR `ci-run/pr-*` is created the GitHub Autocomment with test
results are supposed to be posted to the original PR, currently, this
doesn't work.

I created this PR from a personal fork to debug and fix the issue. 

## Summary of changes
- `scripts/comment-test-report.js`: use `pull_request.head` instead of
`pull_request.base` 🤦
2023-09-10 20:06:10 +01:00
Alexander Bayandin
34e39645c4 GitHub Workflows: add actionlint (#5265)
## Problem

Add a CI pipeline that checks GitHub Workflows with
https://github.com/rhysd/actionlint (it uses `shellcheck` for shell
scripts in steps)

To run it locally: `SHELLCHECK_OPTS=--exclude=SC2046,SC2086 actionlint`

## Summary of changes
- Add `.github/workflows/actionlint.yml`
- Fix actionlint warnings
2023-09-10 20:05:07 +01:00
Em Sharnoff
1cac923af8 vm-monitor: Rate-limit upscale requests (#5263)
Some VMs, when already scaled up as much as possible, end up spamming
the autoscaler-agent with upscale requests that will never be fulfilled.
If postgres is using memory greater than the cgroup's memory.high, it
can emit new memory.high events 1000 times per second, which... just
means unnecessary load on the rest of the system.

This changes the vm-monitor so that we skip sending upscale requests if
we already sent one within the last second, to avoid spamming the
autoscaler-agent. This matches previous behavior that the vm-informant
hand.
2023-09-10 20:33:53 +03:00
Em Sharnoff
853552dcb4 vm-monitor: Don't include Args in top-level span (#5264)
It makes the logs too verbose.

ref https://neondb.slack.com/archives/C03F5SM1N02/p1694281232874719?thread_ts=1694272777.207109&cid=C03F5SM1N02
2023-09-10 20:15:53 +03:00
Alexander Bayandin
1ea93af56c Create GitHub release from release tag (#5246)
## Problem

This PR creates a GitHub release from a release tag with an
autogenerated changelog: https://github.com/neondatabase/neon/releases

## Summary of changes
- Call GitHub API to create a release
2023-09-09 22:02:28 +01:00
Konstantin Knizhnik
f64b338ce3 Ingore DISK_FULL error when performing availability check for client (#5010)
See #5001

No space is what's expected if we're at size limit.
Of course if SK incorrectly returned "no space", the availability check
wouldn't fire.
But  users would notice such a bug quite soon anyways.
So  ignoring "no space" is the right trade-off.


## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-09 21:51:04 +03:00
Konstantin Knizhnik
ba06ea26bb Fix issues with reanabling LFC (#5209)
refer #5208

## Problem

See
https://neondb.slack.com/archives/C03H1K0PGKH/p1693938336062439?thread_ts=1693928260.704799&cid=C03H1K0PGKH

#5208 disable LFC forever in case of error. It is not good because the
problem causing this error (for example ENOSPC) can be resolved anti
will be nice to reenable it after fixing.

Also #5208 disables LFC locally in one backend. But other backends may
still see corrupted data.
It should not cause problems right now with "permission denied" error
because there should be no backend which is able to normally open LFC.
But in case of out-of-disk-space error, other backend can read corrupted
data.

## Summary of changes

1. Cleanup hash table after error to prevent access to stale or
corrupted data
2. Perform disk write under exclusive lock (hoping it will not affect
performance because usually write just copy data from user to system
space)
3. Use generations to prevent access to stale data in lfc_read

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-09-09 17:51:16 +03:00
Joonas Koivunen
6f28da1737 fix: LocalFs root in test_compatibility is PosixPath('...') (#5261)
I forgot a `str(...)` conversion in #5243. This lead to log lines such
as:

```
Using fs root 'PosixPath('/tmp/test_output/test_backward_compatibility[debug-pg14]/compatibility_snapshot/repo/local_fs_remote_storage/pageserver')' as a remote storage
```

This surprisingly works, creating hierarchy of under current working
directory (`repo_dir` for tests):
- `PosixPath('`
  - `tmp` .. up until .. `local_fs_remote_storage`
    - `pageserver')`

It should not work but right now test_compatibility.py tests finds local
metadata and layers, which end up used. After #5172 when remote storage
is the source of truth it will no longer work.
2023-09-08 20:27:00 +03:00
Heikki Linnakangas
60050212e1 Update rdkit to version 2023_03_03. (#5260)
It includes PostgreSQL 16 support.
2023-09-08 19:40:29 +03:00
Joonas Koivunen
66633ef2a9 rust-toolchain: use 1.72.0, same as CI (#5256)
Switches everyone without an `rustup override` to 1.72.0.

Code changes required already done in #5255.
Depends on https://github.com/neondatabase/build/pull/65.
2023-09-08 19:36:02 +03:00
Alexander Bayandin
028fbae161 Miscellaneous fixes for tests-related things (#5259)
## Problem

A bunch of fixes for different test-related things 

## Summary of changes
- Fix test_runner/pg_clients (`subprocess_capture` return value has
changed)
- Do not run create-test-report if check-permissions failed for not
cancelled jobs
- Fix Code Coverage comment layout after flaky tests. Add another
healing "\n"
- test_compatibility: add an instruction for local run


Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-08 16:28:09 +01:00
John Spray
7b6337db58 tests: enable multiple pageservers in neon_local and neon_fixture (#5231)
## Problem

Currently our testing environment only supports running a single
pageserver at a time. This is insufficient for testing failover and
migrations.
- Dependency of writing tests for #5207 

## Summary of changes

- `neon_local` and `neon_fixture` now handle multiple pageservers
- This is a breaking change to the `.neon/config` format: any local
environments will need recreating
- Existing tests continue to work unchanged:
  - The default number of pageservers is 1
- `NeonEnv.pageserver` is now a helper property that retrieves the first
pageserver if there is only one, else throws.
- Pageserver data directories are now at `.neon/pageserver_{n}` where n
is 1,2,3...
- Compatibility tests get some special casing to migrate neon_local
configs: these are not meant to be backward/forward compatible, but they
were treated that way by the test.
2023-09-08 16:19:57 +01:00
Konstantin Knizhnik
499d0707d2 Perform throttling for concurrent build index which is done outside transaction (#5048)
See 
https://neondb.slack.com/archives/C03H1K0PGKH/p1692550646191429

## Problem

Build index concurrently is writing WAL outside transaction.
`backpressure_throttling_impl` doesn't perform throttling for read-only
transactions (not assigned XID).
It cause huge write lag which can cause large delay of accessing the
table.

## Summary of changes

Looks at `PROC_IN_SAFE_IC` in process state set during concurrent index
build.
 
## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-09-08 18:05:08 +03:00
Joonas Koivunen
720d59737a rust-1.72.0 changes (#5255)
Prepare to upgrade rust version to latest stable.

- `rustfmt` has learned to format `let irrefutable = $expr else { ...
};` blocks
- There's a new warning about virtual (workspace) crate resolver, picked
the latest resolver as I suspect everyone would expect it to be the
latest; should not matter anyways
- Some new clippies, which seem alright
2023-09-08 16:28:41 +03:00
Joonas Koivunen
ff87fc569d test: Remote storage refactorings (#5243)
Remote storage cleanup split from #5198:
- pageserver, extensions, and safekeepers now have their separate remote
storage
- RemoteStorageKind has the configuration code
- S3Storage has the cleanup code
- with MOCK_S3, pageserver, extensions, safekeepers use different
buckets
- with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used
as path, where $user is `pageserver`, `safekeeper`
- no more `NeonEnvBuilder.enable_xxx_remote_storage` but one
`enable_{pageserver,extensions,safekeeper}_remote_storage`

Should not have any real changes. These will allow us to default to
`LOCAL_FS` for pageserver on the next PR, remove
`RemoteStorageKind.NOOP`, work towards #5172.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-09-08 13:54:23 +03:00
Heikki Linnakangas
cdc65c1857 Update pg_cron to version 1.6.0 (#5252)
This includes PostgreSQL 16 support. There are no catalog changes, so
this is a drop-in replacement, no need to run "ALTER EXTENSION UPDATE".
2023-09-08 12:42:46 +03:00
Heikki Linnakangas
dac995e7e9 Update plpgsql_check extension to version v2.4.0 (#5249)
This brings v16 support.
2023-09-08 10:46:02 +03:00
Alexander Bayandin
b80740bf9f test_startup: increase timeout (#5238)
## Problem

`test_runner/performance/test_startup.py::test_startup` started to fail
more frequently because of the timeout.
Let's increase the timeout to see the failures on the perf dashboard.

## Summary of changes
- Increase timeout for`test_startup` from 600 to 900 seconds
2023-09-08 01:57:38 +01:00
Heikki Linnakangas
57c1ea49b3 Update hypopg extension to version 1.4.0 (#5245)
The v1.4.0 includes changes to make it compile with PostgreSQL 16. The
commit log doesn't call it out explicitly, but I tested it manually.

v1.4.0 includes some new functions, but I tested manually that the the
v1.3.1 functionality works with the v1.4.0 version of the library. That
means that this doesn't break existing installations. Users can do
"ALTER EXTENSION hypopg UPDATE" if they want to use the new v1.4.0
functionality, but they don't have to.
2023-09-08 03:30:11 +03:00
Heikki Linnakangas
6c31a2d342 Upgrade prefix extension to version 1.2.10 (#5244)
This version includes trivial changes to make it compile with PostgreSQL
16. No functional changes.
2023-09-08 02:10:01 +03:00
Heikki Linnakangas
252b953f18 Upgrade postgresql-hll to version 2.18. (#5241)
This includes PostgreSQL 16 support. No other changes, really.

The extension version in the upstream was changed from 2.17 to 2.18,
however, there is no difference between the catalog objects. So if you
had installed 2.17 previously, it will continue to work. You can run
"ALTER EXTENSION hll UPDATE", but all it will do is update the version
number in the pg_extension table.
2023-09-08 02:07:17 +03:00
Heikki Linnakangas
b414360afb Upgrade ip4r to version 2.4.2 (#5242)
Includes PostgreSQL v16 support. No functional changes.
2023-09-08 02:06:53 +03:00
Arpad Müller
d206655a63 Make VirtualFile::{open, open_with_options, create,sync_all,with_file} async fn (#5224)
## Problem

Once we use async file system APIs for `VirtualFile`, these functions
will also need to be async fn.

## Summary of changes

Makes the functions `open, open_with_options, create,sync_all,with_file`
of `VirtualFile` async fn, including all functions that call it. Like in
the prior PRs, the actual I/O operations are not using async APIs yet,
as per request in the #4743 epic.

We switch towards not using `VirtualFile` in the par_fsync module,
hopefully this is only temporary until we can actually do fully async
I/O in `VirtualFile`. This might cause us to exhaust fd limits in the
tests, but it should only be an issue for the local developer as we have
high ulimits in prod.

This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.
2023-09-08 00:50:50 +02:00
Heikki Linnakangas
e5adc4efb9 Upgrade h3-pg to version 4.1.3. (#5237)
This includes v16 support.
2023-09-07 21:39:12 +03:00
Heikki Linnakangas
c202f0ba10 Update PostGIS to version 3.3.3 (#5236)
It's a good idea to keep up-to-date in general. One noteworthy change is
that PostGIS 3.3.3 adds support for PostgreSQL v16. We'll need that.

PostGIS 3.4.0 has already been released, and we should consider
upgrading to that. However, it's a major upgrade and requires running
"SELECT postgis_extensions_upgrade();" in each database, to upgrade the
catalogs. I don't want to deal with that right now.
2023-09-07 21:38:55 +03:00
Alexander Bayandin
d15563f93b Misc workflows: fix quotes in bash (#5235) 2023-09-07 19:39:42 +03:00
Rahul Modpur
485a2cfdd3 Fix pg_config version parsing (#5200)
## Problem
Fix pg_config version parsing

## Summary of changes
Use regex to capture major version of postgres
#5146
2023-09-07 15:34:22 +02:00
Alexander Bayandin
1fee69371b Update plv8 to 3.1.8 (#5230)
## Problem

We likely need this to support Postgres 16
It's also been asked by a user
https://github.com/neondatabase/neon/discussions/5042

The latest version is 3.2.0, but it requires some changes in the build
script (which I haven't checked, but it didn't work right away)

## Summary of changes
```
3.1.8       2023-08-01
            - force v8 to compile in release mode

3.1.7       2023-06-26
            - fix byteoffset issue with arraybuffers
            - support postgres 16 beta

3.1.6       2023-04-08
            - fix crash issue on fetch apply
            - fix interrupt issue
```
From https://github.com/plv8/plv8/blob/v3.1.8/Changes
2023-09-07 14:21:38 +01:00
Alexander Bayandin
f8a91e792c Even better handling of approved-for-ci-run label (#5227)
## Problem

We've got `approved-for-ci-run` to work 🎉 
But it's still a bit rough, this PR should improve the UX for external
contributors.

## Summary of changes
- `build_and_test.yml`: add `check-permissions` job, which fails if PR is
created from a fork. Make all jobs in the workflow to be dependant on
`check-permission` to fail fast
- `approved-for-ci-run.yml`: add `cleanup` job to close `ci-run/pr-*`
PRs and delete linked branches when the parent PR is closed
- `approved-for-ci-run.yml`: fix the layout for the `ci-run/pr-*` PR
description
- GitHub Autocomment: add a comment with tests result to the original PR
(instead of a PR from `ci-run/pr-*` )
2023-09-07 14:21:01 +01:00
duguorong009
706977fb77 fix(pageserver): add the walreceiver state to tenant timeline GET api endpoint (#5196)
Add a `walreceiver_state` field to `TimelineInfo` (response of `GET /v1/tenant/:tenant_id/timeline/:timeline_id`) and while doing that, refactor out a common `Timeline::walreceiver_state(..)`. No OpenAPI changes, because this is an internal debugging addition.

Fixes #3115.

Co-authored-by: Joonas Koivunen <joonas.koivunen@gmail.com>
2023-09-07 14:17:18 +03:00
Arpad Müller
7ba0f5c08d Improve comment in page cache (#5220)
It was easy to interpret comment in the page cache initialization code
to be about justifying why we leak here at all, not just why this
specific type of leaking is done (which the comment was actually meant
to describe).

See
https://github.com/neondatabase/neon/pull/5125#discussion_r1308445993

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-06 21:44:54 +02:00
Arpad Müller
6243b44dea Remove Virtual from FileBlockReaderVirtual variant name (#5225)
With #5181, the generics for `FileBlockReader` have been removed, so
having a `Virtual` postfix makes less sense now.
2023-09-06 20:54:57 +02:00
Joonas Koivunen
3a966852aa doc: tests expect lsof (#5226)
On a clean system `lsof` needs to be installed. Add it to the list just
to keep things nice and copy-pasteable (except for poetry).
2023-09-06 21:40:00 +03:00
duguorong009
31e1568dee refactor(pageserver): refactor pageserver router state creation (#5165)
Fixes #3894 by:
- Refactor the pageserver router creation flow
- Create the router state in `pageserver/src/bin/pageserver.rs`
2023-09-06 21:31:49 +03:00
Chengpeng Yan
9a9187b81a Complete the missing metrics for files_created/bytes_written (#5120) 2023-09-06 14:00:15 -04:00
Chengpeng Yan
dfe2e5159a remove the duplicate entries in postgresql.conf (#5090) 2023-09-06 13:57:03 -04:00
Alexander Bayandin
e4b1d6b30a Misc post-merge fixes (#5219)
## Problem
- `SCALE: unbound variable` from
https://github.com/neondatabase/neon/pull/5079
- The layout of the GitHub auto-comment is broken if the code coverage
section follows flaky test section from
https://github.com/neondatabase/neon/pull/4999

## Summary of changes
- `benchmarking.yml`: Rename `SCALE` to `TEST_OLAP_SCALE` 
- `comment-test-report.js`: Add an extra new-line before Code coverage
section
2023-09-06 20:11:44 +03:00
Alexander Bayandin
76a96b0745 Notify Slack channel about upcoming releases (#5197)
## Problem

When the next release is coming, we want to let everyone know about it by
posting a message to the Slack channel with a list of commits.

## Summary of changes
- `.github/workflows/release-notify.yml` is added
- the workflow sends a message to
`vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID` (or
[#test-release-notifications](https://neondb.slack.com/archives/C05QQ9J1BRC)
if not configured)
- On each PR update, the workflow updates the list of commits in the
message (it doesn't send additional messages)
2023-09-06 17:52:21 +01:00
Arpad Müller
5e00c44169 Add WriteBlobWriter buffering and make VirtualFile::{write,write_all} async (#5203)
## Problem

We want to convert the `VirtualFile` APIs to async fn so that we can
adopt one of the async I/O solutions.

## Summary of changes

This PR is a follow-up of #5189, #5190, and #5195, and does the
following:

* Move the used `Write` trait functions of `VirtualFile` into inherent
functions
* Add optional buffering to `WriteBlobWriter`. The buffer is discarded
on drop, similarly to how tokio's `BufWriter` does it: drop is neither
async nor does it support errors.
* Remove the generics by `Write` impl of `WriteBlobWriter`, alwaays
using `VirtualFile`
* Rename `WriteBlobWriter` to `BlobWriter`
* Make various functions in the write path async, like
`VirtualFile::{write,write_all}`.

Part of #4743.
2023-09-06 18:17:12 +02:00
Alexander Bayandin
d5f1858f78 approved-for-ci-run.yml: use different tokens (#5218)
## Problem

`CI_ACCESS_TOKEN` has quite limited access (which is good), but this
doesn't allow it to remove labels from PRs (which is bad)

## Summary of changes
- Use `GITHUB_TOKEN` to remove labels
- Use `CI_ACCESS_TOKEN` to create PRs
2023-09-06 18:50:59 +03:00
John Spray
61d661a6c3 pageserver: generation number fetch on startup and use in /attach (#5163)
## Problem

- #5050 

Closes: https://github.com/neondatabase/neon/issues/5136

## Summary of changes

- A new configuration property `control_plane_api` controls other
functionality in this PR: if it is unset (default) then everything still
works as it does today.
- If `control_plane_api` is set, then on startup we call out to control
plane `/re-attach` endpoint to discover our attachments and their
generations. If an attachment is missing from the response we implicitly
detach the tenant.
- Calls to pageserver `/attach` API may include a `generation`
parameter. If `control_plane_api` is set, then this parameter is
mandatory.
- RemoteTimelineClient's loading of index_part.json is generation-aware,
and will try to load the index_part with the most recent generation <=
its own generation.
- The `neon_local` testing environment now includes a new binary
`attachment_service` which implements the endpoints that the pageserver
requires to operate. This is on by default if running `cargo neon` by
hand. In `test_runner/` tests, it is off by default: existing tests
continue to run with in the legacy generation-less mode.

Caveats:
- The re-attachment during startup assumes that we are only re-attaching
tenants that have previously been attached, and not totally new tenants
-- this relies on the control plane's attachment logic to keep retrying
so that we should eventually see the attach API call. That's important
because the `/re-attach` API doesn't tell us which timelines we should
attach -- we still use local disk state for that. Ref:
https://github.com/neondatabase/neon/issues/5173
- Testing: generations are only enabled for one integration test right
now (test_pageserver_restart), as a smoke test that all the machinery
basically works. Writing fuller tests that stress tenant migration will
come later, and involve extending our test fixtures to deal with
multiple pageservers.
- I'm not in love with "attachment_service" as a name for the neon_local
component, but it's not very important because we can easily rename
these test bits whenever we want.
- Limited observability when in re-attach on startup: when I add
generation validation for deletions in a later PR, I want to wrap up the
control plane API calls in some small client class that will expose
metrics for things like errors calling the control plane API, which will
act as a strong red signal that something is not right.

Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-06 14:44:48 +01:00
Alexander Bayandin
da60f69909 approved-for-ci-run.yml: use our bot (#5216)
## Problem

Pull Requests created by GitHub Actions bot doesn't have access to
secrets, so we need to use our bot for it to auto-trigger a tests run

See previous PRs  #4663, #5210, #5212

## Summary of changes
- Use our bot to create PRs
2023-09-06 14:55:11 +03:00
John Spray
743933176e scrubber: add scan-metadata and hook into integration tests (#5176)
## Problem

- Scrubber's `tidy` command requires presence of a control plane
- Scrubber has no tests at all 

## Summary of changes

- Add re-usable async streams for reading metadata from a bucket
- Add a `scan-metadata` command that reads from those streams and calls
existing `checks.rs` code to validate metadata, then returns a summary
struct for the bucket. Command returns nonzero status if errors are
found.
- Add an `enable_scrub_on_exit()` function to NeonEnvBuilder so that
tests using remote storage can request to have the scrubber run after
they finish
- Enable remote storarge and scrub_on_exit in test_pageserver_restart
and test_pageserver_chaos

This is a "toe in the water" of the overall space of validating the
scrubber. Later, we should:
- Enable scrubbing at end of tests using remote storage by default
- Make the success condition stricter than "no errors": tests should
declare what tenants+timelines they expect to see in the bucket (or
sniff these from the functions tests use to create them) and we should
require that the scrubber reports on these particular tenants/timelines.

The `tidy` command is untouched in this PR, but it should be refactored
later to use similar async streaming interface instead of the current
batch-reading approach (the streams are faster with large buckets), and
to also be covered by some tests.


---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
2023-09-06 11:55:24 +01:00
Alexander Bayandin
8e25d3e79e test_runner: add scale parameter to tpc-h tests (#5079)
## Problem

It's hard to find out which DB size we use for OLAP benchmarks (TPC-H in
particular).
This PR adds handling of `TEST_OLAP_SCALE` env var, which is get added
to a test name as a parameter.

This is required for performing larger periodic benchmarks. 

## Summary of changes
- Handle `TEST_OLAP_SCALE` in
`test_runner/performance/test_perf_olap.py`
- Set `TEST_OLAP_SCALE` in `.github/workflows/benchmarking.yml` to a
TPC-H scale
2023-09-06 13:22:57 +03:00
duguorong009
4fec48f2b5 chore(pageserver): remove unnecessary logging in tenant task loops (#5188)
Fixes #3830 by adding the `#[cfg(not(feature = "testing"))]` attribute
to unnecessary loggings in `pageserver/src/tenant/tasks.rs`.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-09-06 13:19:19 +03:00
Vadim Kharitonov
88b1ac48bd Create Release PR at 7:00 UTC every Tuesday (#5213) 2023-09-06 13:17:52 +03:00
Alexander Bayandin
15ff4e5fd1 approved-for-ci-run.yml: trigger on pull_request_target (#5212)
## Problem

Continuation of #4663, #5210

We're still getting an error:
```
GraphQL: Resource not accessible by integration (removeLabelsFromLabelable)
```

## Summary of changes
- trigger `approved-for-ci-run.yml` workflow on `pull_request_target`
instead of `pull_request`
2023-09-06 13:14:07 +03:00
Alexander Bayandin
dbfb4ea7b8 Make CI more friendly for external contributors. Second try (#5210)
## Problem

`approved-for-ci-run` label logic doesn't work as expected:
- https://github.com/neondatabase/neon/pull/4722#issuecomment-1636742145
- https://github.com/neondatabase/neon/pull/4722#issuecomment-1636755394

Continuation of https://github.com/neondatabase/neon/pull/4663
Closes #2222 (hopefully)

## Summary of changes
- Create a twin PR automatically
- Allow `GITHUB_TOKEN` to manipulate with labels
2023-09-06 10:06:55 +01:00
Alexander Bayandin
c222320a2a Generate lcov coverage report (#4999)
## Problem

We want to display coverage information for each PR.

- an example of a full coverage report:
https://neon-github-public-dev.s3.amazonaws.com/code-coverage/abea64800fb390c32a3efe6795d53d8621115c83/lcov/index.html
- an example of GitHub auto-comment with coverage information:
https://github.com/neondatabase/neon/pull/4999#issuecomment-1679344658

## Summary of changes
- Use
patched[*](426e7e7a22)
lcov to generate coverage report
- Upload HTML coverage report to S3
- `scripts/comment-test-report.js`: add coverage information
2023-09-06 00:48:15 +01:00
MMeent
89c64e179e Fix corruption issue in Local File Cache (#5208)
Fix issue where updating the size of the Local File Cache could lead to
invalid reads:

## Problem

LFC cache can get re-enabled when lfc_max_size is set, e.g. through an
autoscaler configuration, or PostgreSQL not liking us setting the
variable.

1. initialize: LFC enabled (lfc_size_limit > 0; lfc_desc = 0)
2. Open LFC file fails, lfc_desc = -1. lfc_size_limit is set to 0;
3. lfc_size_limit is updated by autoscaling to >0
4. read() now thinks LFC is enabled (size_limit > 0) and lfc_desc is
valid, but doesn't try to read from the invalid file handle and thus
doesn't update the buffer content with the page's data, but does think
the data was read...
Any buffer we try to read from local file cache is essentially
uninitialized memory. Those are likely 0-bytes, but might potentially be
any old buffer that was previously read from or flushed to disk.

Fix this by adding a more definitive disable flag, plus better invalid state handling.
2023-09-05 20:00:47 +02:00
Alexander Bayandin
7ceddadb37 Merge custom extension CI jobs (#5194)
## Problem

When a remote custom extension build fails, it looks a bit confusing on
neon CI:
- `trigger-custom-extensions-build` is green
- `wait-for-extensions-build` is red
- `build-and-upload-extensions` is red

But to restart the build (to get everything green), you need to restart
the only passed `trigger-custom-extensions-build`.

## Summary of changes
- Merge `trigger-custom-extensions-build` and
`wait-for-extensions-build` jobs into
`trigger-custom-extensions-build-and-wait`
2023-09-05 14:02:37 +01:00
Arpad Müller
4904613aaa Convert VirtualFile::{seek,metadata} to async (#5195)
## Problem

We want to convert the `VirtualFile` APIs to async fn so that we can
adopt one of the async I/O solutions.

## Summary of changes

Convert the following APIs of `VirtualFile` to async fn (as well as all
of the APIs calling it):

* `VirtualFile::seek`
* `VirtualFile::metadata`
* Also, prepare for deletion of the write impl by writing the summary to
a buffer before writing it to disk, as suggested in
https://github.com/neondatabase/neon/issues/4743#issuecomment-1700663864
. This change adds an additional warning for the case when the summary
exceeds a block. Previously, we'd have silently corrupted data in this
(unlikely) case.
* `WriteBlobWriter::write_blob`, in preparation for making
`VirtualFile::write_all` async.
2023-09-05 12:55:45 +02:00
Nikita Kalyanov
77658a155b support deploying in IPv6-only environments (#4135)
A set of changes to enable neon to work in IPv6 environments. The
changes are backward-compatible but allow to deploy neon even to
IPv6-only environments:
- bind to both IPv4 and IPv6 interfaces
- allow connections to Postgres from IPv6 interface
- parse the address from control plane that could also be IPv6
2023-09-05 12:45:46 +03:00
Arpad Müller
128a85ba5e Convert many VirtualFile APIs to async (#5190)
## Problem

`VirtualFile` does both reading and writing, and it would be nice if
both could be converted to async, so that it doesn't have to support an
async read path and a blocking write path (especially for the locks this
is annoying as none of the lock implementations in std, tokio or
parking_lot have support for both async and blocking access).

## Summary of changes

This PR is some initial work on making the `VirtualFile` APIs async. It
can be reviewed commit-by-commit.

* Introduce the `MaybeVirtualFile` enum to be generic in a test that
compares real files with virtual files.
* Make various APIs of `VirtualFile` async, including `write_all_at`,
`read_at`, `read_exact_at`.

Part of #4743 , successor of #5180.

Co-authored-by: Christian Schwarz <me@cschwarz.com>
2023-09-04 17:05:20 +02:00
Arpad Müller
6cd497bb44 Make VirtualFile::crashsafe_overwrite async fn (#5189)
## Problem

The `VirtualFile::crashsafe_overwrite` function was introduced by #5186
but it was not turned `async fn` yet. We want to make these functions
async fn as part of #4743.

## Summary of changes

Make `VirtualFile::crashsafe_overwrite` async fn, as well as all the
functions calling it. Don't make anything inside `crashsafe_overwrite`
use async functionalities, as per #4743 instructions.

Also, add rustdoc to `crashsafe_overwrite`.

Part of #4743.
2023-09-04 12:52:35 +02:00
John Spray
80f10d5ced pageserver: safe deletion for tenant directories (#5182)
## Problem

If a pageserver crashes partway through deleting a tenant's directory,
it might leave a partial state that confuses a subsequent
startup/attach.

## Summary of changes

Rename tenant directory to a temporary path before deleting.

Timeline deletions already have deletion markers to provide safety.

In future, it would be nice to exploit this to send responses to detach
requests earlier: https://github.com/neondatabase/neon/issues/5183
2023-09-04 08:31:55 +01:00
Christian Schwarz
7e817789d5 VirtualFile: add crash-safe overwrite abstraction & use it (#5186)
(part of #4743)
(preliminary to #5180)
 
This PR adds a special-purpose API to `VirtualFile` for write-once
files.
It adopts it for `save_metadata` and `persist_tenant_conf`.

This is helpful for the asyncification efforts (#4743) and specifically
asyncification of `VirtualFile` because above two functions were the
only ones that needed the VirtualFile to be an `std::io::Write`.
(There was also `manifest.rs` that needed the `std::io::Write`, but, it
isn't used right now, and likely won't be used because we're taking a
different route for crash consistency, see #5172. So, let's remove it.
It'll be in Git history if we need to re-introduce it when picking up
the compaction work again; that's why it was introduced in the first
place).

We can't remove the `impl std::io::Write for VirtualFile` just yet
because of the `BufWriter` in

```rust
struct DeltaLayerWriterInner {
...
    blob_writer: WriteBlobWriter<BufWriter<VirtualFile>>,
}
```

But, @arpad-m and I have a plan to get rid of that by extracting the
append-only-ness-on-top-of-VirtualFile that #4994 added to
`EphemeralFile` into an abstraction that can be re-used in the
`DeltaLayerWriterInner` and `ImageLayerWriterInner`.
That'll be another PR.


### Performance Impact

This PR adds more fsyncs compared to before because we fsync the parent
directory every time.

1. For `save_metadata`, the additional fsyncs are unnecessary because we
know that `metadata` fits into a kernel page, and hence the write won't
be torn on the way into the kernel. However, the `metadata` file in
general is going to lose signficance very soon (=> see #5172), and the
NVMes in prod can definitely handle the additional fsync. So, let's not
worry about it.
2. For `persist_tenant_conf`, which we don't check to fit into a single
kernel page, this PR makes it actually crash-consistent. Before, we
could crash while writing out the tenant conf, leaving a prefix of the
tenant conf on disk.
2023-09-02 10:06:14 +02:00
John Spray
41aa627ec0 tests: get test name automatically for remote storage (#5184)
## Problem

Tests using remote storage have manually entered `test_name` parameters,
which:
- Are easy to accidentally duplicate when copying code to make a new
test
- Omit parameters, so don't actually create unique S3 buckets when
running many tests concurrently.

## Summary of changes

- Use the `request` fixture in neon_env_builder fixture to get the test
name, then munge that into an S3 compatible bucket name.
- Remove the explicit `test_name` parameters to enable_remote_storage
2023-09-01 17:29:38 +01:00
Conrad Ludgate
44da9c38e0 proxy: error typo (#5187)
## Problem

https://github.com/neondatabase/neon/pull/5162#discussion_r1311853491
2023-09-01 19:21:33 +03:00
Christian Schwarz
cfc0fb573d pageserver: run all Rust tests with remote storage enabled (#5164)
For
[#5086](https://github.com/neondatabase/neon/pull/5086#issuecomment-1701331777)
we will require remote storage to be configured in pageserver.

This PR enables `localfs`-based storage for all Rust unit tests.

Changes:

- In `TenantHarness`, set up localfs remote storage for the tenant.
- `create_test_timeline` should mimic what real timeline creation does,
and real timeline creation waits for the timeline to reach remote
storage. With this PR, `create_test_timeline` now does that as well.
- All the places that create the harness tenant twice need to shut down
the tenant before the re-create through a second call to `try_load` or
`load`.
- Without shutting down, upload tasks initiated by/through the first
incarnation of the harness tenant might still be ongoing when the second
incarnation of the harness tenant is `try_load`/`load`ed. That doesn't
make sense in the tests that do that, they generally try to set up a
scenario similar to pageserver stop & start.
- There was one test that recreates a timeline, not the tenant. For that
case, I needed to create a `Timeline::shutdown` method. It's a
refactoring of the existing `Tenant::shutdown` method.
- The remote_timeline_client tests previously set up their own
`GenericRemoteStorage` and `RemoteTimelineClient`. Now they re-use the
one that's pre-created by the TenantHarness. Some adjustments to the
assertions were needed because the assertions now need to account for
the initial image layer that's created by `create_test_timeline` to be
present.
2023-09-01 18:10:40 +02:00
Christian Schwarz
aa22000e67 FileBlockReader<File> is never used (#5181)
part of #4743

preliminary to #5180
2023-09-01 17:30:22 +02:00
Christian Schwarz
5edae96a83 rfc: Crash-Consistent Layer Map Updates By Leveraging index_part.json (#5086)
This RFC describes a simple scheme to make layer map updates crash
consistent by leveraging the index_part.json in remote storage. Without
such a mechanism, crashes can induce certain edge cases in which broadly
held assumptions about system invariants don't hold.
2023-09-01 15:24:58 +02:00
Christian Schwarz
40ce520c07 remote_timeline_client: tests: run upload ops on the tokio::test runtime (#5177)
The `remote_timeline_client` tests use `#[tokio::test]` and rely on the
fact that the test runtime that is set up by this macro is
single-threaded.

In PR https://github.com/neondatabase/neon/pull/5164, we observed
interesting flakiness with the `upload_scheduling` test case:
it would observe the upload of the third layer (`layer_file_name_3`)
before we did `wait_completion`.

Under the single-threaded-runtime assumption, that wouldn't be possible,
because the test code doesn't await inbetween scheduling the upload
and calling `wait_completion`.

However, RemoteTimelineClient was actually using `BACKGROUND_RUNTIME`.
That means there was parallelism where the tests didn't expect it,
leading to flakiness such as execution of an UploadOp task before
the test calls `wait_completion`.

The most confusing scenario is code like this:

```
schedule upload(A);
wait_completion.await; // B
schedule_upload(C);
wait_completion.await; // D
```

On a single-threaded executor, it is guaranteed that the upload up C
doesn't run before D, because we (the test) don't relinquish control
to the executor before D's `await` point.

However, RemoteTimelineClient actually scheduled onto the
BACKGROUND_RUNTIME, so, `A` could start running before `B` and
`C` could start running before `D`.

This would cause flaky tests when making assertions about the state
manipulated by the operations. The concrete issue that led to discover
of this bug was an assertion about `remote_fs_dir` state in #5164.
2023-09-01 16:24:04 +03:00
Alexander Bayandin
e9f2c64322 Wait for custom extensions build before deploy (#5170)
## Problem

Currently, the `deploy` job doesn't wait for the custom extension job
(in another repo) and can be started even with failed extensions build.
This PR adds another job that polls the status of the extension build job
and fails if the extension build fails.

## Summary of changes
- Add `wait-for-extensions-build` job, which waits for a custom
extension build in another repo.
2023-09-01 12:59:19 +01:00
John Spray
715077ab5b tests: broaden a log allow regex in test_ignored_tenant_stays_broken_without_metadata (#5168)
## Problem

- https://github.com/neondatabase/neon/issues/5167

## Summary of changes

Accept "will not become active" log line with _either_ Broken or
Stopping state, because we may hit it while in the process of doing the
`/ignore` (earlier in the test than the test expects to see the same
line with Broken)
2023-09-01 08:36:38 +01:00
John Spray
616e7046c7 s3_scrubber: import into the main neon repository (#5141)
## Problem

The S3 scrubber currently lives at
https://github.com/neondatabase/s3-scrubber

We don't have tests that use it, and it has copies of some data
structures that can get stale.

## Summary of changes

- Import the s3-scrubber as `s3_scrubber/
- Replace copied_definitions/ in the scrubber with direct access to the
`utils` and `pageserver` crates
- Modify visibility of a few definitions in `pageserver` to allow the
scrubber to use them
- Update scrubber code for recent changes to `IndexPart`
- Update `KNOWN_VERSIONS` for IndexPart and move the definition into
index.rs so that it is easier to keep up to date

As a future refinement, it would be good to pull the remote persistence
types (like IndexPart) out of `pageserver` into a separate library so
that the scrubber doesn't have to link against the whole pageserver, and
so that it's clearer which types need to be public.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-08-31 19:01:39 +01:00
Conrad Ludgate
1b916a105a proxy: locked is not retriable (#5162)
## Problem

Management service returns Locked when quotas are exhausted. We cannot
retry on those

## Summary of changes

Makes Locked status unretriable
2023-08-31 15:50:15 +03:00
Conrad Ludgate
d11621d904 Proxy: proxy protocol v2 (#5028)
## Problem

We need to log the client IP, not the IP of the NLB.

## Summary of changes

Parse the proxy [protocol version
2](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) if
possible
2023-08-31 14:30:25 +03:00
John Spray
43bb8bfdbb pageserver: fix flake in test_timeline_deletion_with_files_stuck_in_upload_queue (#5149)
## Problem

Test failing on a different ERROR log than it anticipated.

Closes: https://github.com/neondatabase/neon/issues/5148

## Summary of changes

Add the "could not flush frozen layer" error log to the permitted
errors.
2023-08-31 10:42:32 +01:00
John Spray
300a5aa05e pageserver: fix test v4_indexpart_is_parsed (#5157)
## Problem

Two recent PRs raced:
- https://github.com/neondatabase/neon/pull/5153
- https://github.com/neondatabase/neon/pull/5140

## Summary of changes

Add missing `generation` argument to IndexLayerMetadata construction
2023-08-31 10:40:46 +01:00
Nikita Kalyanov
b9c111962f pass JWT to management API (#5151)
support authentication with JWT from env for proxy calls to mgmt API
2023-08-31 12:23:51 +03:00
John Spray
83ae2bd82c pageserver: generation number support in keys and indices (#5140)
## Problem

To implement split brain protection, we need tenants and timelines to be
aware of their current generation, and use it when composing S3 keys.


## Summary of changes

- A `Generation` type is introduced in the `utils` crate -- it is in
this broadly-visible location because it will later be used from
`control_plane/` as well as `pageserver/`. Generations can be a number,
None, or Broken, to support legacy content (None), and Tenants in the
broken state (Broken).
- Tenant, Timeline, and RemoteTimelineClient all get a generation
attribute
- IndexPart's IndexLayerMetadata has a new `generation` attribute.
Legacy layers' metadata will deserialize to Generation::none().
- Remote paths are composed with a trailing generation suffix. If a
generation is equal to Generation::none() (as it currently always is),
then this suffix is an empty string.
- Functions for composing remote storage paths added in
remote_timeline_client: these avoid the way that we currently always
compose a local path and then strip the prefix, and avoid requiring a
PageserverConf reference on functions that want to create remote paths
(the conf is only needed for local paths). These are less DRY than the
old functions, but remote storage paths are a very rarely changing
thing, so it's better to write out our paths clearly in the functions
than to compose timeline paths from tenant paths, etc.
- Code paths that construct a Tenant take a `generation` argument in
anticipation that we will soon load generations on startup before
constructing Tenant.

Until the whole feature is done, we don't want any generation-ful keys
though: so initially we will carry this everywhere with the special
Generation::none() value.

Closes: https://github.com/neondatabase/neon/issues/5135

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-08-31 09:19:34 +01:00
Alexey Kondratov
f2c21447ce [compute_ctl] Create check availability data during full configuration (#5084)
I've moved it to the API handler in the 589cf1ed2 to do not delay
compute start. Yet, we now skip full configuration and catalog updates
in the most hot path -- waking up suspended compute, and only do it at:

- first start
- start with applying new configuration
- start for availability check

so it doesn't really matter anymore.

The problem with creating the table and test record in the API handler
is that someone can fill up timeline till the logical limit. Then it's
suspended and availability check is scheduled, so it fails.

If table + test row are created at the very beginning, we reserve a 8 KB
page for future checks, which theoretically will last almost forever.
For example, my ~1y old branch still has 8 KB sized test table:
```sql
cloud_admin@postgres=# select pg_relation_size('health_check');
 pg_relation_size
------------------
             8192
(1 row)
```

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-08-30 17:44:28 +02:00
Conrad Ludgate
93dcdb293a proxy: password hack hack (#5126)
## Problem

fixes #4881 

## Summary of changes
2023-08-30 16:20:27 +01:00
John Spray
a93274b389 pageserver: remove vestigial timeline_layers attribute (#5153)
## Problem

`timeline_layers` was write-only since
b95addddd5

We deployed the version that no longer requires it for deserializing, so
now we can stop including it when serializing.

## Summary of changes

Fully remove `timeline_layers`.
2023-08-30 16:14:04 +01:00
Anastasia Lubennikova
a7c0e4dcd0 Check if custiom extension is enabled.
This check was lost in the latest refactoring.

If extension is not present in 'public_extensions' or 'custom_extensions' don't download it
2023-08-30 17:47:06 +03:00
Conrad Ludgate
3b81e0c86d chore: remove webpki (#5069)
## Problem

webpki is unmaintained

Closes https://github.com/neondatabase/neon/security/dependabot/33

## Summary of changes

Update all dependents of webpki.
2023-08-30 15:14:03 +01:00
Anastasia Lubennikova
e5a397cf96 Form archive_path for remote extensions on the fly 2023-08-30 13:56:51 +03:00
Joonas Koivunen
05773708d3 fix: add context for ancestor lsn wait (#5143)
In logs it is confusing to see seqwait timeouts which seemingly arise
from the branched lsn but actually are about the ancestor, leading to
questions like "has the last_record_lsn went back".

Noticed by @problame.
2023-08-30 12:21:41 +03:00
John Spray
382473d9a5 docs: add RFC for remote storage generation numbers (#4919)
## Summary

A scheme of logical "generation numbers" for pageservers and their
attachments is proposed, along with
changes to the remote storage format to include these generation numbers
in S3 keys.

Using the control plane as the issuer of these generation numbers
enables strong anti-split-brain
properties in the pageserver cluster without implementing a consensus
mechanism directly
in the pageservers.

## Motivation

Currently, the pageserver's remote storage format does not provide a
mechanism for addressing
split brain conditions that may happen when replacing a node during
failover or when migrating
a tenant from one pageserver to another. From a remote storage
perspective, a split brain condition
occurs whenever two nodes both think they have the same tenant attached,
and both can write to S3. This
can happen in the case of a network partition, pathologically long
delays (e.g. suspended VM), or software
bugs.

This blocks robust implementation of failover from unresponsive
pageservers, due to the risk that
the unresponsive pageserver is still writing to S3.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-08-30 09:49:55 +01:00
Arpad Müller
eb0a698adc Make page cache and read_blk async (#5023)
## Problem

`read_blk` does I/O and thus we would like to make it async. We can't
make the function async as long as the `PageReadGuard` returned by
`read_blk` isn't `Send`. The page cache is called by `read_blk`, and
thus it can't be async without `read_blk` being async. Thus, we have a
circular dependency.

## Summary of changes

Due to the circular dependency, we convert both the page cache and
`read_blk` to async at the same time:

We make the page cache use `tokio::sync` synchronization primitives as
those are `Send`. This makes all the places that acquire a lock require
async though, which we then also do. This includes also asyncification
of the `read_blk` function.

Builds upon #4994, #5015, #5056, and #5129.

Part of #4743.
2023-08-30 09:04:31 +02:00
Arseny Sher
81b6578c44 Allow walsender in recovery mode give WAL till dynamic flush_lsn.
Instead of fixed during the start of replication. To this end, create
term_flush_lsn watch channel similar to commit_lsn one. This allows to continue
recovery streaming if new data appears.
2023-08-29 23:19:40 +03:00
Arseny Sher
bc49c73fee Move wal_stream_connection_config to utils.
It will be used by safekeeper as well.
2023-08-29 23:19:40 +03:00
Arseny Sher
e98580b092 Add term and http endpoint to broker messaged SkTimelineInfo.
We need them for safekeeper peer recovery
https://github.com/neondatabase/neon/pull/4875
2023-08-29 23:19:40 +03:00
Arseny Sher
804ef23043 Rename TermSwitchEntry to TermLsn.
Add derive Ord for easy comparison of <term, lsn> pairs.

part of https://github.com/neondatabase/neon/pull/4875
2023-08-29 23:19:40 +03:00
Arseny Sher
87f7d6bce3 Start and stop per timeline recovery task.
Slightly refactors init: now load_tenant_timelines is also async to properly
init the timeline, but to keep global map lock sync we just acquire it anew for
each timeline.

Recovery task itself is just a stub here.

part of
https://github.com/neondatabase/neon/pull/4875
2023-08-29 23:19:40 +03:00
Arseny Sher
39e3fbbeb0 Add safekeeper peers to TimelineInfo.
Now available under GET /tenant/xxx/timeline/yyy for inspection.
2023-08-29 23:19:40 +03:00
Em Sharnoff
8d2a4aa5f8 vm-monitor: Add flag for when file cache on disk (#5130)
Part 1 of 2, for moving the file cache onto disk.

Because VMs are created by the control plane (and that's where the
filesystem for the file cache is defined), we can't rely on any kind of
synchronization between releases, so the change needs to be
feature-gated (kind of), with the default remaining the same for now.

See also: neondatabase/cloud#6593
2023-08-29 12:44:48 -07:00
Joonas Koivunen
d1fcdf75b3 test: enhanced logging for curious mock_s3 (#5134)
Possible flakyness with mock_s3. Add logging in hopes this will happen
again.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-08-29 14:48:50 +03:00
Alexander Bayandin
7e39a96441 scripts/flaky_tests.py: Improve flaky tests detection (#5094)
## Problem

We still need to rerun some builds manually because flaky tests weren't
detected automatically.
I found two reasons for it:
- If a test is flaky on a particular build type, on a particular
Postgres version, there's a high chance that this test is flaky on all
configurations, but we don't automatically detect such cases.
- We detect flaky tests only on the main branch, which requires manual
retrigger runs for freshly made flaky tests.
Both of them are fixed in the PR.

## Summary of changes
- Spread flakiness of a single test to all configurations
- Detect flaky tests in all branches (not only in the main)
- Look back only at  7 days of test history (instead of 10)
2023-08-29 11:53:24 +01:00
Vadim Kharitonov
babefdd3f9 Upgrade pgvector to 0.5.0 (#5132) 2023-08-29 12:53:50 +03:00
Arpad Müller
805fee1483 page cache: small code cleanups (#5125)
## Problem

I saw these things while working on #5111.

## Summary of changes

* Add a comment explaining why we use `Vec::leak` instead of
`Vec::into_boxed_slice` plus `Box::leak`.
* Add another comment explaining what `valid` is doing, it wasn't very
clear before.
* Add a function `set_usage_count` to not set it directly.
2023-08-29 11:49:04 +03:00
Felix Prasanna
85d6d9dc85 monitor/compute_ctl: remove references to the informant (#5115)
Also added some docs to the monitor :)

Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
2023-08-29 02:59:27 +03:00
Em Sharnoff
e40ee7c3d1 remove unused file 'vm-cgconfig.conf' (#5127)
Honestly no clue why it's still here, should have been removed ages ago.
This is handled by vm-builder now.
2023-08-28 13:04:57 -07:00
Christian Schwarz
0fe3b3646a page cache: don't proactively evict EphemeralFile pages (#5129)
Before this patch, when dropping an EphemeralFile, we'd scan the entire
`slots` to proactively evict its pages (`drop_buffers_for_immutable`).

This was _necessary_ before #4994 because the page cache was a
write-back cache: we'd be deleting the EphemeralFile from disk after,
so, if we hadn't evicted its pages before that, write-back in
`find_victim` wouldhave failed.

But, since #4994, the page cache is a read-only cache, so, it's safe
to keep read-only data cached. It's never going to get accessed again
and eventually, `find_victim` will evict it.

The only remaining advantage of `drop_buffers_for_immutable` over
relying on `find_victim` is that `find_victim` has to do the clock
page replacement iterations until the count reaches 0,
whereas `drop_buffers_for_immutable` can kick the page out right away.

However, weigh that against the cost of `drop_buffers_for_immutable`,
which currently scans the entire `slots` array to find the
EphemeralFile's pages.

Alternatives have been proposed in #5122 and #5128, but, they come
with their own overheads & trade-offs.

Also, the real reason why we're looking into this piece of code is
that we want to make the slots rwlock async in #5023.
Since `drop_buffers_for_immutable` is called from drop, and there
is no async drop, it would be nice to not have to deal with this.

So, let's just stop doing `drop_buffers_for_immutable` and observe
the performance impact in benchmarks.
2023-08-28 20:42:18 +02:00
Em Sharnoff
529f8b5016 compute_ctl: Fix switched vm-monitor args (#5117)
Small switcheroo from #4946.
2023-08-28 14:55:41 +02:00
Joonas Koivunen
fbcd174489 load_layer_map: schedule deletions for any future layers (#5103)
Unrelated fixes noticed while integrating #4938.

- Stop leaking future layers in remote storage
- We schedule extra index_part uploads if layer name to be removed was
not actually present
2023-08-28 10:51:49 +03:00
Felix Prasanna
7b5489a0bb compute_ctl: start pg in cgroup for vms (#4920)
Starts `postgres` in cgroup directly from `compute_ctl` instead of from
`vm-builder`. This is required because the `vm-monitor` cannot be in the
cgroup it is managing. Otherwise, it itself would be frozen when
freezing the cgroup.

Requires https://github.com/neondatabase/cloud/pull/6331, which adds the
`AUTOSCALING` environment variable letting `compute_ctl` know to start
`postgres` in the cgroup.

Requires https://github.com/neondatabase/autoscaling/pull/468, which
prevents `vm-builder` from starting the monitor and putting postgres in
a cgroup. This will require a `VM_BUILDER_VERSION` bump.
2023-08-25 15:59:12 -04:00
Felix Prasanna
40268dcd8d monitor: fix filecache calculations (#5112)
## Problem
An underflow bug in the filecache calculations.

## Summary of changes
Fixed the bug, cleaned up calculations in general.
2023-08-25 13:29:10 -04:00
Vadim Kharitonov
4436c84751 Change codeowners (#5109) 2023-08-25 19:48:16 +03:00
John Spray
b758bf47ca pageserver: refactor TimelineMetadata serialization in IndexPart (#5091)
## Problem

The `metadata_bytes` field of IndexPart required explicit
deserialization & error checking everywhere it was used -- there isn't
anything special about this structure that should prevent it from being
serialized & deserialized along with the rest of the structure.

## Summary of changes

- Implement Serialize and Deserialize for TimelineMetadata
- Replace IndexPart::metadata_bytes with a simpler `metadata`, that can
be used directly.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-08-25 16:16:20 +01:00
Felix Prasanna
024e306f73 monitor: improve logging (#5099) 2023-08-25 10:09:53 -04:00
Alek Westover
f71c82e5de remove obsolete need dependency (#5087) 2023-08-25 09:10:26 -04:00
Conrad Ludgate
faf070f288 proxy: dont return connection pending (#5107)
## Problem

We were returning Pending when a connection had a notice/notification
(introduced recently in #5020). When returning pending, the runtime
assumes you will call `cx.waker().wake()` in order to continue
processing.

We weren't doing that, so the connection task would get stuck

## Summary of changes

Don't return pending. Loop instead
2023-08-25 15:08:45 +03:00
Arpad Müller
8c13296add Remove BlockReader::read_blk in favour of BlockCursor (#5015)
## Problem

We want to make `read_blk` an async function, but outside of
`async_trait`, which allocates, and nightly features, we can't use async
fn's in traits.

## Summary of changes

* Remove all uses of `BlockReader::read_blk` in favour of using block
  cursors, at least where the type of the `BlockReader` is behind a
  generic
* Introduce a `BlockReaderRef` enum that lists all implementors of
  `BlockReader::read_blk`.
* Remove `BlockReader::read_blk` and move its implementations into
  inherent functions on the types instead.

We don't turn `read_blk` into an async fn yet, for that we also need to
modify the page cache. So this is a preparatory PR, albeit an important
one.

Part of #4743.
2023-08-25 12:28:01 +02:00
Felix Prasanna
18537be298 monitor: listen on correct port to accept agent connections (#5100)
## Problem
The previous arguments have the monitor listen on `localhost`, which the
informant can connect to since it's also in the VM, but which the agent
cannot. Also, the port is wrong.

## Summary of changes
Listen on `0.0.0.0:10301`
2023-08-24 17:32:46 -04:00
Felix Prasanna
3128eeff01 compute_ctl: add vm-monitor (#4946)
Co-authored-by: Em Sharnoff <sharnoff@neon.tech>
2023-08-24 15:54:37 -04:00
Arpad Müller
227c87e333 Make EphemeralFile::write_blob function async (#5056)
## Problem

The `EphemeralFile::write_blob` function accesses the page cache
internally. We want to require `async` for these accesses in #5023.

## Summary of changes

This removes the implementaiton of the `BlobWriter` trait for
`EphemeralFile` and turns the `write_blob` function into an inherent
function. We can then make it async as well as the `push_bytes`
function. We move the `SER_BUFFER` thread-local into the
`InMemoryLayerInner` so that the same buffer can be accessed by
different threads as the async is (potentially) moved between threads.

Part of #4743, preparation for #5023.
2023-08-24 19:18:30 +02:00
Alek Westover
e8f9aaf78c Don't use non-existent docker tags (#5096) 2023-08-24 19:45:23 +03:00
Chengpeng Yan
fa74d5649e rename EphmeralFile::size to EphemeralFile::len (#5076)
## Problem
close https://github.com/neondatabase/neon/issues/5034

## Summary of changes
Based on the
[comment](https://github.com/neondatabase/neon/pull/4994#discussion_r1297277922).
Just rename the `EphmeralFile::size` to `EphemeralFile::len`.
2023-08-24 16:41:57 +02:00
Joonas Koivunen
f70871dfd0 internal-devx: pageserver future layers (#5092)
I've personally forgotten why/how can we have future layers during
reconciliation. Adds `#[cfg(feature = "testing")]` logging when we
upload such index_part.json, with a cross reference to where the cleanup
happens.

Latest private slack thread:
https://neondb.slack.com/archives/C033RQ5SPDH/p1692879032573809?thread_ts=1692792276.173979&cid=C033RQ5SPDH

Builds upon #5074. Should had been considered on #4837.
2023-08-24 17:22:36 +03:00
Alek Westover
99a1be6c4e remove upload step from neon, it is in private repo now (#5085) 2023-08-24 17:14:40 +03:00
Joonas Koivunen
76aa01c90f refactor: single phase Timeline::load_layer_map (#5074)
Current implementation first calls `load_layer_map`, which loads all
local layers, cleans up files, leave cleaning up stuff to "second
function". Then the "second function" is finally called, it does not do
the cleanup and some of the first functions setup can torn down. "Second
function" is actually both `reconcile_with_remote` and
`create_remote_layers`.

This change makes it a bit more verbose but in one phase with the
following sub-steps:
1. scan the timeline directory
2. delete extra files
    - now including on-demand download files
    - fixes #3660
3. recoincile the two sources of layers (directory, index_part)
4. rename_to_backup future layers, short layers
5. create the remaining as layers

Needed by #4938.

It was also noticed that this is blocking code in an `async fn` so just
do it in a `spawn_blocking`, which should be healthy for our startup
times. Other effects includes hopefully halving of `stat` calls; extra
calls which were not done previously are now done for the future layers.

Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
2023-08-24 16:07:40 +03:00
John Spray
3e2f0ffb11 libs: make backoff::retry() take a cancellation token (#5065)
## Problem

Currently, anything that uses backoff::retry will delay the join of its
task by however long its backoff sleep is, multiplied by its max
retries.

Whenever we call a function that sleeps, we should be passing in a
CancellationToken.

## Summary of changes

- Add a `Cancel` type to backoff::retry that wraps a CancellationToken
and an error `Fn` to generate an error if the cancellation token fires.
- In call sites that already run in a `task_mgr` task, use
`shutdown_token()` to provide the token. In other locations, use a dead
`CancellationToken` to satisfy the interface, and leave a TODO to fix it
up when we broaden the use of explicit cancellation tokens.
2023-08-24 14:54:46 +03:00
Arseny Sher
d597e6d42b Track list of walreceivers and their voting/streaming state in shmem.
Also add both walsenders and walreceivers to TimelineStatus (available under
v1/tenant/xxx/timeline/yyy).

Prepares for
https://github.com/neondatabase/neon/pull/4875
2023-08-23 16:04:08 +03:00
Christian Schwarz
71ccb07a43 ci: fix upload-postgres-extensions-to-s3 job (#5063)
This is cherry-picked-then-improved version of release branch commit
4204960942 PR #4861)

The commit

	commit 5f8fd640bf
	Author: Alek Westover <alek.westover@gmail.com>
	Date:   Wed Jul 26 08:24:03 2023 -0400

	    Upload Test Remote Extensions (#4792)

switched to using the release tag instead of `latest`, but,
the `promote-images` job only uploads `latest` to the prod ECR.

The switch to using release tag was good in principle, but,
it broke the release pipeline. So, switch release pipeline
back to using `latest`.

Note that a proper fix should abandon use of `:latest` tag
at all: currently, if a `main` pipeline runs concurrently
with a `release` pipeline, the `release` pipeline may end
up using the `main` pipeline's images.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-08-22 22:45:25 +03:00
Joonas Koivunen
ad8d777c1c refactor: remove is_incremental=true for ImageLayers footgun (#5061)
Accidentially giving is_incremental=true for ImageLayers costs a lot of
debugging time. Removes all API which would allow to do that. They can
easily be restored later *when needed*.

Split off from #4938.
2023-08-22 22:12:05 +03:00
Joonas Koivunen
2f97b43315 build: update tar, get rid of duplicate xattr (#5071)
`tar` recently pushed to 0.4.40. No big changes, but less Cargo.lock and
one less nagging from `cargo-deny`.

The diff:
https://github.com/alexcrichton/tar-rs/compare/0.4.38...0.4.40.
2023-08-22 21:21:44 +03:00
Joonas Koivunen
533a92636c refactor: pre-cleanup Layer, PersistentLayer and impls (#5059)
Remove pub but dead code, move trait methods as inherent methods, remove
unnecessary. Split off from #4938.
2023-08-22 21:14:28 +03:00
Alek Westover
bf303a6575 Trigger workflow in remote (private) repo to build and upload private extensions (#4944) 2023-08-22 13:32:29 -04:00
Christian Schwarz
8cd20485f8 metrics: smgr query time: add a pre-aggregated histogram (#5064)
When doing global queries in VictoriaMetrics, the per-timeline
histograms make us run into cardinality limits.

We don't want to give them up just yet because we don't
have an alternative for drilling down on timeline-specific
performance issues.

So, add a pre-aggregated histogram and add observations to it
whenever we add observations to the per-timeline histogram.

While we're at it, switch to using a strummed enum for the operation
type names.
2023-08-22 20:08:31 +03:00
Joonas Koivunen
933a869f00 refactor: compaction becomes async again (#5058)
#4938 will make on-demand download of layers in compaction possible, so
it's not suitable for our "policy" of no `spawn_blocking(|| ...
Handle::block_on(async { spawn_blocking(...).await })` because this
poses a clear deadlock risk. Nested spawn_blockings are because of the
download using `tokio::fs::File`.

- Remove `spawn_blocking` from caller of `compact_level0_phase1`
- Remove `Handle::block_on` from `compact_level0_phase1` (indentation
change)
- Revert to `AsLayerDesc::layer_desc` usage temporarily (until it
becomes field access in #4938)
2023-08-22 20:03:14 +03:00
Conrad Ludgate
8c6541fea9 chore: add supported targets to deny (#5070)
## Problem

many duplicate windows crates pollute the cargo deny output

## Summary of changes

we don't build those crates, so remove those targets from being checked
2023-08-22 19:44:31 +03:00
Alek Westover
5cf75d92d8 Fix cargo deny errors (#5068)
## Problem
cargo deny lint broken

Links to the CVEs:

[rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052)

[rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053)
One is fixed, the other one isn't so we allow it (for now), to unbreak
CI. Then later we'll try to get rid of webpki in favour of the rustls
fork.

## Summary of changes
```
+ignore = ["RUSTSEC-2023-0052"]
```
2023-08-22 18:41:32 +03:00
Conrad Ludgate
0b001a0001 proxy: remove connections on shutdown (#5051)
## Problem

On shutdown, proxy connections are staying open.

## Summary of changes

Remove the connections on shutdown
2023-08-21 19:20:58 +01:00
Felix Prasanna
4a8bd866f6 bump vm-builder version to v0.16.3 (#5055)
This change to autoscaling allows agents to connect directly to the
monitor, completely removing the informant.
2023-08-21 13:29:16 -04:00
John Spray
615a490239 pageserver: refactor Tenant/Timeline args into structs (#5053)
## Problem

There are some common types that we pass into tenants and timelines as
we construct them, such as remote storage and the broker client.
Currently the list is small, but this is likely to grow -- the deletion
queue PR (#4960) pushed some methods to the point of clippy complaining
they had too many args, because of the extra deletion queue client being
passed around.

There are some shared objects that currently aren't passed around
explicitly because they use a static `once_cell` (e.g.
CONCURRENT_COMPACTIONS), but as we add more resource management and
concurreny control over time, it will be more readable & testable to
pass a type around in the respective Resources object, rather than to
coordinate via static objects. The `Resources` structures in this PR
will make it easier to add references to central coordination functions,
without having to rely on statics.

## Summary of changes

- For `Tenant`, the `broker_client` and `remote_storage` are bundled
into `TenantSharedResources`
- For `Timeline`, the `remote_client` is wrapped into
`TimelineResources`.

Both of these structures will get an additional deletion queue member in
#4960.
2023-08-21 17:30:28 +01:00
John Spray
b95addddd5 pageserver: do not read redundant timeline_layers from IndexPart, so that we can remove it later (#4972)
## Problem

IndexPart contains two redundant lists of layer names: a set of the
names, and then a map of name to metadata.

We already required that all the layers in `timeline_layers` are also in
`layers_metadata`, in `initialize_with_current_remote_index_part`, so if
there were any index_part.json files in the field that relied on these
sets being different, they would already be broken.

## Summary of changes

`timeline_layers` is made private and no longer read at runtime. It is
still serialized, but not deserialized.

`disk_consistent_lsn` is also made private, as this field only exists
for convenience of humans reading the serialized JSON.

This prepares us to entirely remove `timeline_layers` in a future
release, once this change is fully deployed, and therefore no
pageservers are trying to read the field.
2023-08-21 14:29:36 +03:00
Joonas Koivunen
130ccb4b67 Remove initial timeline id troubles (#5044)
I made a mistake when I adding `env.initial_timeline:
Optional[TimelineId]` in the #3839, should had just generated it and
used it to create a specific timeline. This PR fixes those mistakes, and
some extra calling into psql which must be slower than python field
access.
2023-08-20 12:33:19 +03:00
Dmitry Rodionov
9140a950f4 Resume tenant deletion on attach (#5039)
I'm still a bit nervous about attach -> crash case. But it should work.
(unlike case with timeline). Ideally would be cool to cover this with
test.

This continues tradition of adding bool flags for Tenant::set_stopping.
Probably lifecycle project will help with fixing it.
2023-08-20 12:28:50 +03:00
Arpad Müller
a23b0773f1 Fix DeltaLayer dumping (#5045)
## Problem

Before, DeltaLayer dumping (via `cargo run --release -p pagectl --
print-layer-file` ) would crash as one can't call `Handle::block_on` in
an async executor thread.

## Summary of changes

Avoid the problem by using `DeltaLayerInner::load_keys` to load the keys
into RAM (which we already do during compaction), and then load the
values one by one during dumping.
2023-08-19 00:56:03 +02:00
Joonas Koivunen
368ee6c8ca refactor: failpoint support (#5033)
- move them to pageserver which is the only dependant on the crate fail
- "move" the exported macro to the new module
- support at init time the same failpoints as runtime

Found while debugging test failures and making tests more repeatable by
allowing "exit" from pageserver start via environment variables. Made
those changes to `test_gc_cutoff.py`.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-08-19 01:01:44 +03:00
Felix Prasanna
5c6a692cf1 bump VM_BUILDER_VERSION to v0.16.2 (#5031)
A very slight change that allows us to configure the UID of the
neon-postgres cgroup owner. We start postgres in this cgroup so we can
scale it with the cgroups v2 api. Currently, the control plane
overwrites the entrypoint set by `vm-builder`, so `compute_ctl` (and
thus postgres), is not started in the neon-postgres cgroup. Having
`compute_ctl` start postgres in the cgroup should fix this. However, at
the moment appears like it does not have the correct permissions.
Configuring the neon-postgres UID to `postgres` (which is the UID
`compute_ctl` runs under) should hopefully fix this.


See #4920 - the PR to modify `compute_ctl` to start postgres in the
cgorup.
See: neondatabase/autoscaling#480, neondatabase/autoscaling#477. Both
these PR's are part of an effort to increase `vm-builder`'s
configurability and allow us to adjust it as we integrate in the
monitor.
2023-08-18 14:29:20 -04:00
Dmitry Rodionov
30888a24d9 Avoid flakiness in test_timeline_delete_fail_before_local_delete (#5032)
The problem was that timeline detail can return timelines in not only
active state. And by the time request comes timeline deletion can still
be in progress if we're unlucky (test execution happened to be slower
for some reason)

Reference for failed test run

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5022/5891420105/index.html#suites/f588e0a787c49e67b29490359c589fae/dab036e9bd673274

The error was `Exception: detail succeeded (it should return 404)`

reported by @koivunej
2023-08-18 20:49:11 +03:00
Dmitry Rodionov
f6c671c140 resume timeline deletions on attach (#5030)
closes [#5036](https://github.com/neondatabase/neon/issues/5036)
2023-08-18 20:48:33 +03:00
Christian Schwarz
ed5bce7cba rfcs: archive my MVCC S3 Notion Proposal (#5040)
This is a copy from the [original Notion
page](https://www.notion.so/neondatabase/Proposal-Pageserver-MVCC-S3-Storage-8a424c0c7ec5459e89d3e3f00e87657c?pvs=4),
taken on 2023-08-16.

This is for archival mostly.
The RFC that we're likely to go with is
https://github.com/neondatabase/neon/pull/4919.
2023-08-18 19:34:29 +02:00
Christian Schwarz
7a63685cde simplify page-caching of EphemeralFile (#4994)
(This PR is the successor of https://github.com/neondatabase/neon/pull/4984 )

## Summary

The current way in which `EphemeralFile` uses `PageCache` complicates
the Pageserver code base to a degree that isn't worth it.
This PR refactors how we cache `EphemeralFile` contents, by exploiting
the append-only nature of `EphemeralFile`.

The result is that `PageCache` only holds `ImmutableFilePage` and
`MaterializedPage`.
These types of pages are read-only and evictable without write-back.
This allows us to remove the writeback code from `PageCache`, also
eliminating an entire failure mode.

Futher, many great open-source libraries exist to solve the problem of a
read-only cache,
much better than our `page_cache.rs` (e.g., better replacement policy,
less global locking).
With this PR, we can now explore using them.

## Problem & Analysis

Before this PR, `PageCache` had three types of pages:

* `ImmutableFilePage`: caches Delta / Image layer file contents
* `MaterializedPage`: caches results of Timeline::get (page
materialization)
* `EphemeralPage`: caches `EphemeralFile` contents

`EphemeralPage` is quite different from `ImmutableFilePage` and
`MaterializedPage`:

* Immutable and materialized pages are for the acceleration of (future)
reads of the same data using `PAGE_CACHE_SIZE * PAGE_SIZE` bytes of
DRAM.
* Ephemeral pages are a write-back cache of `EphemeralFile` contents,
i.e., if there is pressure in the page cache, we spill `EphemeralFile`
contents to disk.

`EphemeralFile` is only used by `InMemoryLayer`, for the following
purposes:
* **write**: when filling up the `InMemoryLayer`, via `impl BlobWriter
for EphemeralFile`
* **read**: when doing **page reconstruction** for a page@lsn that isn't
written to disk
* **read**: when writing L0 layer files, we re-read the `InMemoryLayer`
and put the contents into the L0 delta writer
(**`create_delta_layer`**). This happens every 10min or when
InMemoryLayer reaches 256MB in size.

The access patterns of the `InMemoryLayer` use case are as follows:

* **write**: via `BlobWriter`, strictly append-only
* **read for page reconstruction**: via `BlobReader`, random
* **read for `create_delta_layer`**: via `BlobReader`, dependent on
data, but generally random. Why?
* in classical LSM terms, this function is what writes the
memory-resident `C0` tree into the disk-resident `C1` tree
* in our system, though, the values of InMemoryLayer are stored in an
EphemeralFile, and hence they are not guaranteed to be memory-resident
* the function reads `Value`s in `Key, LSN` order, which is `!=` insert
order

What do these `EphemeralFile`-level access patterns mean for the page
cache?

* **write**:
* the common case is that `Value` is a WAL record, and if it isn't a
full-page-image WAL record, then it's smaller than `PAGE_SIZE`
* So, the `EphemeralPage` pages act as a buffer for these `< PAGE_CACHE`
sized writes.
* If there's no page cache eviction between subsequent
`InMemoryLayer::put_value` calls, the `EphemeralPage` is still resident,
so the page cache avoids doing a `write` system call.
* In practice, a busy page server will have page cache evictions because
we only configure 64MB of page cache size.
* **reads for page reconstruction**: read acceleration, just as for the
other page types.
* **reads for `create_delta_layer`**:
* The `Value` reads happen through a `BlockCursor`, which optimizes the
case of repeated reads from the same page.
* So, the best case is that subsequent values are located on the same
page; hence `BlockCursor`s buffer is maximally effective.
* The worst case is that each `Value` is on a different page; hence the
`BlockCursor`'s 1-page-sized buffer is ineffective.
* The best case translates into `256MB/PAGE_SIZE` page cache accesses,
one per page.
    * the worst case translates into `#Values` page cache accesses
* again, the page cache accesses must be assumed to be random because
the `Value`s aren't accessed in insertion order but `Key, LSN` order.

## Summary of changes

Preliminaries for this PR were:

- #5003
- #5004 
- #5005 
  - uncommitted microbenchmark in #5011 

Based on the observations outlined above, this PR makes the following
changes:

* Rip out `EphemeralPage` from `page_cache.rs`
* Move the `block_io::FileId` to `page_cache::FileId`
* Add a `PAGE_SIZE`d buffer to the `EphemeralPage` struct.
  It's called `mutable_tail`.
* Change `write_blob` to use `mutable_tail` for the write buffering
instead of a page cache page.
* if `mutable_tail` is full, it writes it out to disk, zeroes it out,
and re-uses it.
* There is explicitly no double-buffering, so that memory allocation per
`EphemeralFile` instance is fixed.
* Change `read_blob` to return different `BlockLease` variants depending
on `blknum`
* for the `blknum` that corresponds to the `mutable_tail`, return a ref
to it
* Rust borrowing rules prevent `write_blob` calls while refs are
outstanding.
  * for all non-tail blocks, return a page-cached `ImmutablePage`
* It is safe to page-cache these as ImmutablePage because EphemeralFile
is append-only.

## Performance 

How doe the changes above affect performance?
M claim is: not significantly.

* **write path**:
* before this PR, the `EphemeralFile::write_blob` didn't issue its own
`write` system calls.
* If there were enough free pages, it didn't issue *any* `write` system
calls.
* If it had to evict other `EphemeralPage`s to get pages a page for its
writes (`get_buf_for_write`), the page cache code would implicitly issue
the writeback of victim pages as needed.
* With this PR, `EphemeralFile::write_blob` *always* issues *all* of its
*own* `write` system calls.
* Also, the writes are explicit instead of implicit through page cache
write back, which will help #4743
* The perf impact of always doing the writes is the CPU overhead and
syscall latency.
* Before this PR, we might have never issued them if there were enough
free pages.
* We don't issue `fsync` and can expect the writes to only hit the
kernel page cache.
* There is also an advantage in issuing the writes directly: the perf
impact is paid by the tenant that caused the writes, instead of whatever
tenant evicts the `EphemeralPage`.
* **reads for page reconstruction**: no impact.
* The `write_blob` function pre-warms the page cache when it writes the
`mutable_tail` to disk.
* So, the behavior is the same as with the EphemeralPages before this
PR.
* **reads for `create_delta_layer`**: no impact.
  * Same argument as for page reconstruction.
  * Note for the future:
* going through the page cache likely causes read amplification here.
Why?
* Due to the `Key,Lsn`-ordered access pattern, we don't read all the
values in the page before moving to the next page. In the worst case, we
might read the same page multiple times to read different `Values` from
it.
    * So, it might be better to bypass the page cache here.
    * Idea drafts:
      * bypass PS page cache + prefetch pipeline + iovec-based IO
* bypass PS page cache + use `copy_file_range` to copy from ephemeral
file into the L0 delta file, without going through user space
2023-08-18 20:31:03 +03:00
Joonas Koivunen
0a082aee77 test: allow race with flush and stopped queue (#5027)
A lucky race can happen with the shutdown order I guess right now. Seen
in [test_tenant_delete_smoke].

The message is not the greatest to match against.

[test_tenant_delete_smoke]:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/5892262320/index.html#suites/3556ed71f2d69272a7014df6dcb02317/189a0d1245fb5a8c
2023-08-18 19:36:25 +03:00
Arthur Petukhovsky
0b90411380 Fix safekeeper recovery with auth (#5035)
Fix missing a password in walrcv_connect for a safekeeper recovery. Add
a test which restarts endpoint and triggers a recovery.
2023-08-18 16:48:55 +01:00
Arpad Müller
f4da010aee Make the compaction warning more tolerant (#5024)
## Problem

The performance benchmark in `test_runner/performance/test_layer_map.py`
is currently failing due to the warning added in #4888.

## Summary of changes

The test mentioned has a `compaction_target_size` of 8192, which is just
one page size. This is an unattainable goal, as we generate at least
three pages: one for the header, one for the b-tree (minimally sized
ones have just the root node in a single page), one for the data.

Therefore, we add two pages to the warning limit. The warning text
becomes a bit less accurate but I think this is okay.
2023-08-18 16:36:31 +02:00
Conrad Ludgate
ec10838aa4 proxy: pool connection logs (#5020)
## Problem

Errors and notices that happen during a pooled connection lifecycle have
no session identifiers

## Summary of changes

Using a watch channel, we set the session ID whenever it changes. This
way we can see the status of a connection for that session

Also, adding a connection id to be able to search the entire connection
lifecycle
2023-08-18 11:44:08 +01:00
Joonas Koivunen
67af24191e test: cleanup remote_timeline_client tests (#5013)
I will have to change these as I change remote_timeline_client api in
#4938. So a bit of cleanup, handle my comments which were just resolved
during initial review.

Cleanup:
- use unwrap in tests instead of mixed `?` and `unwrap`
- use `Handle` instead of `&'static Reactor` to make the
RemoteTimelineClient more natural
- use arrays in tests
- use plain `#[tokio::test]`
2023-08-17 19:27:30 +03:00
Joonas Koivunen
6af5f9bfe0 fix: format context (#5022)
We return an error with unformatted `{timeline_id}`.
2023-08-17 14:30:25 +00:00
Dmitry Rodionov
64fc7eafcd Increase timeout once again. (#5021)
When failpoint is early in deletion process it takes longer to complete
after failpoint is removed.

Example was: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5889544346/index.html#suites/3556ed71f2d69272a7014df6dcb02317/49826c68ce8492b1
2023-08-17 15:37:28 +03:00
Conrad Ludgate
3e4710c59e proxy: add more sasl logs (#5012)
## Problem

A customer is having trouble connecting to neon from their production
environment. The logs show a mix of "Internal error" and "authentication
protocol violation" but not the full error

## Summary of changes

Make sure we don't miss any logs during SASL/SCRAM
2023-08-17 12:05:54 +01:00
Dmitry Rodionov
d8b0a298b7 Do not attach deleted tenants (#5008)
Rather temporary solution before proper:
https://github.com/neondatabase/neon/issues/5006

It requires more plumbing so lets not attach deleted tenants first and
then implement resume.

Additionally fix `assert_prefix_empty`. It had a buggy prefix calculation,
and since we always asserted for absence of stuff it worked. Here I
started to assert for presence of stuff too and it failed. Added more
"presence" asserts to other places to be confident that it works.

Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)
2023-08-17 13:46:49 +03:00
Alexander Bayandin
c8094ee51e test_compatibility: run amcheck unconditionally (#4985)
## Problem

The previous version of neon (that we use in the forward compatibility test)
has installed `amcheck` extension now. We can run `pg_amcheck`
unconditionally.

## Summary of changes
- Run `pg_amcheck` in compatibility tests unconditionally
2023-08-17 11:46:00 +01:00
Christian Schwarz
957af049c2 ephemeral file: refactor write_blob impl to concentrate mutable state (#5004)
Before this patch, we had the `off` and `blknum` as function-wide
mutable state. Now it's contained in the `Writer` struct.

The use of `push_bytes` instead of index-based filling of the buffer
also makes it easier to reason about what's going on.

This is prep for https://github.com/neondatabase/neon/pull/4994
2023-08-17 13:07:25 +03:00
Anastasia Lubennikova
786c7b3708 Refactor remote extensions index download.
Don't download ext_index.json from s3, but instead receive it as a part of spec from control plane.
This eliminates s3 access for most compute starts,
and also allows us to update extensions spec on the fly
2023-08-17 12:48:33 +03:00
Joonas Koivunen
d3612ce266 delta_layer: Restore generic from last week (#5014)
Restores #4937 work relating to the ability to use `ResidentDeltaLayer`
(which is an Arc wrapper) in #4938 for the ValueRef's by removing the
borrow from `ValueRef` and providing it from an upper layer.

This should not have any functional changes, most importantly, the
`main` will continue to use the borrowed `DeltaLayerInner`. It might be
that I can change #4938 to be like this. If that is so, I'll gladly rip
out the `Ref` and move the borrow back. But I'll first want to look at
the current test failures.
2023-08-17 11:47:31 +03:00
Christian Schwarz
994411f5c2 page cache: newtype the blob_io and ephemeral_file file ids (#5005)
This makes it more explicit that these are different u64-sized
namespaces.
Re-using one in place of the other would be catastrophic.

Prep for https://github.com/neondatabase/neon/pull/4994
which will eliminate the ephemeral_file::FileId and move the
blob_io::FileId into page_cache.
It makes sense to have this preliminary commit though,
to minimize amount of new concept in #4994 and other
preliminaries that depend on that work.
2023-08-16 18:33:47 +02:00
Conrad Ludgate
25934ec1ba proxy: reduce global conn pool contention (#4747)
## Problem

As documented, the global connection pool will be high contention.

## Summary of changes

Use DashMap rather than Mutex<HashMap>.

Of note, DashMap currently uses a RwLock internally, but it's partially
sharded to reduce contention by a factor of N. We could potentially use
flurry which is a port of Java's concurrent hashmap, but I have no good
understanding of it's performance characteristics. Dashmap is at least
equivalent to hashmap but less contention.

See the read heavy benchmark to analyse our expected performance
<https://github.com/xacrimon/conc-map-bench#ready-heavy>

I also spoke with the developer of dashmap recently, and they are
working on porting the implementation to use concurrent HAMT FWIW
2023-08-16 17:20:28 +01:00
Arpad Müller
0bdbc39cb1 Compaction: unify key and value reference vecs (#4888)
## Problem

PR #4839 has already reduced the number of b-tree traversals and vec
creations from 3 to 2, but as pointed out in
https://github.com/neondatabase/neon/pull/4839#discussion_r1279167815 ,
we would ideally just traverse the b-tree once during compaction.

Afer #4836, the two vecs created are one for the list of keys, lsns and
sizes, and one for the list of `(key, lsn, value reference)`. However,
they are not equal, as pointed out in
https://github.com/neondatabase/neon/pull/4839#issuecomment-1660418012
and the following comment: the key vec creation combines multiple
entries for which the lsn is changing but the key stays the same into
one, with the size being the sum of the sub-sizes. In SQL, this would
correspond to something like `SELECT key, lsn, SUM(size) FROM b_tree
GROUP BY key;` and `SELECT key, lsn, val_ref FROM b_tree;`. Therefore,
the join operation is non-trivial.

## Summary of changes

This PR merges the two lists of keys and value references into one. It's
not a trivial change and affects the size pattern of the resulting
files, which is why this is in a separate PR from #4839 .

The key vec is used in compaction for determining when to start a new
layer file. The loop uses various thresholds to come to this conclusion,
but the grouping via the key has led to the behaviour that regardless of
the threshold, it only starts a new file when either a new key is
encountered, or a new delta file.

The new code now does the combination after the merging and sorting of
the various keys from the delta files. This *mostly* does the same as
the old code, except for a detail: with the grouping done on a
per-delta-layer basis, the sorted and merged vec would still have
multiple entries for multiple delta files, but now, we don't have an
easy way to tell when a new input delta layer file is encountered, so we
cannot create multiple entries on that basis easily.

To prevent possibly infinite growth, our new grouping code compares the
combined size with the threshold, and if it is exceeded, it cuts a new
entry so that the downstream code can cut a new output file. Here, we
perform a tradeoff however, as if the threshold is too small, we risk
putting entries for the same key into multiple layer files, but if the
threshold is too big, we can in some instances exceed the target size.

Currently, we set the threshold to the target size, so in theory we
would stay below or roughly at double the `target_file_size`.

We also fix the way the size was calculated for the last key. The calculation
was wrong and accounted for the old layer's btree, even though we
already account for the overhead of the in-construction btree.

Builds on top of #4839 .
2023-08-16 18:27:18 +03:00
Dmitry Rodionov
96b84ace89 Correctly remove orphaned objects in RemoteTimelineClient::delete_all (#5000)
Previously list_prefixes was incorrectly used for that purpose. Change
to use list_files. Add a test.

Some drive by refactorings on python side to move helpers out of
specific test file to be widely accessible

resolves https://github.com/neondatabase/neon/issues/4499
2023-08-16 17:31:16 +03:00
Christian Schwarz
368b783ada ephemeral_file: remove FileExt impl (was only used by tests) (#5003)
Extracted from https://github.com/neondatabase/neon/pull/4994
2023-08-16 15:41:25 +02:00
Dmitry Rodionov
0f47bc03eb Fix delete_objects in UnreliableWrapper (#5002)
For `delete_objects` it was injecting failures for whole delete_objects operation
and then for every delete it contains. Make it fail once for the whole operation.
2023-08-16 14:08:53 +03:00
Arseny Sher
fdbe8dc8e0 Fix test_s3_wal_replay flakiness.
ref https://github.com/neondatabase/neon/issues/4466
2023-08-16 12:57:43 +03:00
Arthur Petukhovsky
1b97a3074c Disable neon-pool-opt-in (#4995) 2023-08-15 20:57:56 +03:00
John Spray
5c836ee5b4 tests: extend timeout in timeline deletion test (#4992)
## Problem

This was set to 5 seconds, which was very close to how long a compaction
took on my workstation, and when deletion is blocked on compaction the
test would fail.

We will fix this to make compactions drop out on deletion, but for the
moment let's stabilize the test.

## Summary of changes

Change timeout on timeline deletion in
`test_timeline_deletion_with_files_stuck_in_upload_queue` from 5 seconds
to 30 seconds.
2023-08-15 20:14:03 +03:00
Arseny Sher
4687b2e597 Test that auth on pg/http services can be enabled separately in sks.
To this end add
1) -e option to 'neon_local safekeeper start' command appending extra options
   to safekeeper invocation;
2) Allow multiple occurrences of the same option in safekeepers, the last
   value is taken.
3) Allow to specify empty string for *-auth-public-key-path opts, it
   disables auth for the service.
2023-08-15 19:31:20 +03:00
Arseny Sher
13adc83fc3 Allow to enable http/pg/pg tenant only auth separately in safekeeper.
The same option enables auth and specifies public key, so this allows to use
different public keys as well. The motivation is to
1) Allow to e.g. change pageserver key/token without replacing all compute
  tokens.
2) Enable auth gradually.
2023-08-15 19:31:20 +03:00
Dmitry Rodionov
52c2c69351 fsync directory before mark file removal (#4986)
## Problem

Deletions can be possibly reordered. Use fsync to avoid the case when
mark file doesnt exist but other tenant/timeline files do.

See added comments.

resolves #4987
2023-08-15 19:24:23 +03:00
Alexander Bayandin
207919f5eb Upload test results to DB right after generation (#4967)
## Problem

While adding new test results format, I've also changed the way we
upload Allure reports to S3
(722c7956bb)
to avoid duplicated results from previous runs. But it broke links at
earlier results (results are still available but on different URLs).

This PR fixes this (by reverting logic in
722c7956bb
changes), and moves the logic for storing test results into db to allure
generate step. It allows us to avoid test results duplicates in the db
and saves some time on extra s3 downloads that happened in a different
job before the PR.

Ref https://neondb.slack.com/archives/C059ZC138NR/p1691669522160229

## Summary of changes
- Move test results storing logic from a workflow to
`actions/allure-report-generate`
2023-08-15 15:32:30 +01:00
George MacKerron
218be9eb32 Added deferrable transaction option to http batch queries (#4993)
## Problem

HTTP batch queries currently allow us to set the isolation level and
read only, but not deferrable.

## Summary of changes

Add support for deferrable.

Echo deferrable status in response headers only if true.

Likewise, now echo read-only status in response headers only if true.
2023-08-15 14:52:00 +01:00
Joonas Koivunen
8198b865c3 Remote storage metrics follow-up (#4957)
#4942 left old metrics in place for migration purposes. It was noticed
that from new metrics the total number of deleted objects was forgotten,
add it.

While reviewing, it was noticed that the delete_object could just be
delete_objects of one.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-08-15 12:30:27 +03:00
Arpad Müller
baf395983f Turn BlockLease associated type into an enum (#4982)
## Problem

The `BlockReader` trait is not ready to be asyncified, as associated
types are not supported by asyncification strategies like via the
`async_trait` macro, or via adopting enums.

## Summary of changes

Remove the `BlockLease` associated type from the `BlockReader` trait and
turn it into an enum instead, bearing the same name. The enum has two
variants, one of which is gated by `#[cfg(test)]`. Therefore, outside of
test settings, the enum has zero overhead over just having the
`PageReadGuard`. Using the enum allows us to impl `BlockReader` without
needing the page cache.

Part of https://github.com/neondatabase/neon/issues/4743
2023-08-14 18:48:09 +02:00
Arpad Müller
ce7efbe48a Turn BlockCursor::{read_blob,read_blob_into_buf} async fn (#4905)
## Problem

The `BlockCursor::read_blob` and `BlockCursor::read_blob_into_buf`
functions are calling `read_blk` internally, so if we want to make that
function async fn, they need to be async themselves.

## Summary of changes

* We first turn `ValueRef::load` into an async fn.
* Then, we switch the `RwLock` implementation in `InMemoryLayer` to use
the one from `tokio`.
* Last, we convert the `read_blob` and `read_blob_into_buf` functions
into async fn.

In three instances we use `Handle::block_on`:

* one use is in compaction code, which currently isn't async. We put the
entire loop into an `async` block to prevent the potentially hot loop
from doing cross-thread operations.
* one use is in dumping code for `DeltaLayer`. The "proper" way to
address this would be to enable the visit function to take async
closures, but then we'd need to be generic over async fs non async,
which [isn't supported by rust right
now](https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-generics.html).
The other alternative would be to do a first pass where we cache the
data into memory, and only then to dump it.
* the third use is in writing code, inside a loop that copies from one
file to another. It is is synchronous and we'd like to keep it that way
(for now?).

Part of #4743
2023-08-14 17:20:37 +02:00
Tristan Partin
ef4a76c01e Update Postgres to v15.4 and v14.9 (#4965) 2023-08-14 16:19:45 +01:00
George MacKerron
1ca08cc523 Changed batch query body to from [...] to { queries: [...] } (#4975)
## Problem

It's nice if `single query : single response :: batch query : batch
response`.

But at present, in the single case we send `{ query: '', params: [] }`
and get back a single `{ rows: [], ... }` object, while in the batch
case we send an array of `{ query: '', params: [] }` objects and get
back not an array of `{ rows: [], ... }` objects but a `{ results: [ {
rows: [] , ... }, { rows: [] , ... }, ... ] }` object instead.

## Summary of changes

With this change, the batch query body becomes `{ queries: [{ query: '',
params: [] }, ... ] }`, which restores a consistent relationship between
the request and response bodies.
2023-08-14 16:07:33 +01:00
Dmitry Rodionov
4626d89eda Harden retries on tenant/timeline deletion path. (#4973)
Originated from test failure where we got SlowDown error from s3.
The patch generalizes `download_retry` to not be download specific.
Resulting `retry` function is moved to utils crate. `download_retries`
is now a thin wrapper around this `retry` function.

To ensure that all needed retries are in place test code now uses
`test_remote_failures=1` setting.

Ref https://neondb.slack.com/archives/C059ZC138NR/p1691743624353009
2023-08-14 17:16:49 +03:00
Arseny Sher
49c57c0b13 Add neon_local to docker image.
People sometimes ask about this.

https://community.neon.tech/t/is-the-neon-local-binary-in-any-of-the-official-docker-images/360/2
2023-08-14 14:08:51 +03:00
John Spray
d3a97fdf88 pageserver: avoid incrementing access time when reading layers for compaction (#4971)
## Problem

Currently, image generation reads delta layers before writing out
subsequent image layers, which updates the access time of the delta
layers and effectively puts them at the back of the queue for eviction.
This is the opposite of what we want, because after a delta layer is
covered by a later image layer, it's likely that subsequent reads of
latest data will hit the image rather than the delta layer, so the delta
layer should be quite a good candidate for eviction.

## Summary of changes

`RequestContext` gets a new `ATimeBehavior` field, and a
`RequestContextBuilder` helper so that we can optionally add the new
field without growing `RequestContext::new` every time we add something
like this.

Request context is passed into the `record_access` function, and the
access time is not updated if `ATimeBehavior::Skip` is set.

The compaction background task constructs its request context with this
skip policy.

Closes: https://github.com/neondatabase/neon/issues/4969
2023-08-14 10:18:22 +01:00
Arthur Petukhovsky
763f5c0641 Remove dead code from walproposer_utils.c (#4525)
This code was mostly copied from walsender.c and the idea was to keep it
similar to walsender.c, so that we can easily copy-paste future upstream
changes to walsender.c to waproposer_utils.c, too. But right now I see that
deleting it doesn't break anything, so it's better to remove unused parts.
2023-08-14 09:49:51 +01:00
Arseny Sher
8173813584 Add term=n option to safekeeper START_REPLICATION command.
It allows term leader to ensure he pulls data from the correct term. Absense of
it wasn't very problematic due to CRC checks, but let's be strict.

walproposer still doesn't use it as we're going to remove recovery completely
from it.
2023-08-12 12:20:13 +03:00
Felix Prasanna
cc2d00fea4 bump vm-builder version to v0.15.4 (#4980)
Patches a bug in vm-builder where it did not include enough parameters
in the query string. These parameters are `host=localhost port=5432`.
These parameters were not necessary for the monitor because the `pq` go
postgres driver included them by default.
2023-08-11 14:26:53 -04:00
Arpad Müller
9ffccb55f1 InMemoryLayer: move end_lsn out of the lock (#4963)
## Problem

In some places, the lock on `InMemoryLayerInner` is only created to
obtain `end_lsn`. This is not needed however, if we move `end_lsn` to
`InMemoryLayer` instead.

## Summary of changes

Make `end_lsn` a member of `InMemoryLayer`, and do less locking of
`InMemoryLayerInner`. `end_lsn` is changed from `Option<Lsn>` into an
`OnceLock<Lsn>`. Thanks to this change, we don't need to lock any more
in three functions.

Part of #4743 . Suggested in
https://github.com/neondatabase/neon/pull/4905#issuecomment-1666458428 .
2023-08-11 18:01:02 +02:00
Arthur Petukhovsky
3a6b99f03c proxy: improve http logs (#4976)
Fix multiline logs on websocket errors and always print sql-over-http
errors sent to the user.
2023-08-11 18:18:07 +03:00
Dmitry Rodionov
d39fd66773 tests: remove redundant wait_while (#4952)
Remove redundant `wait_while` in tests. It had only one usage. Use
`wait_tenant_status404`.

Related:
https://github.com/neondatabase/neon/pull/4855#discussion_r1289610641
2023-08-11 10:18:13 +03:00
Arthur Petukhovsky
73d7a9bc6e proxy: propagate ws span (#4966)
Found this log on staging:
```
2023-08-10T17:42:58.573790Z  INFO handling interactive connection from client protocol="ws"
```

We seem to be losing websocket span in spawn, this patch fixes it.
2023-08-10 23:38:22 +03:00
Sasha Krassovsky
3a71cf38c1 Grant BypassRLS to new neon_superuser roles (#4935) 2023-08-10 21:04:45 +02:00
Conrad Ludgate
25c66dc635 proxy: http logging to 11 (#4950)
## Problem

Mysterious network issues

## Summary of changes

Log a lot more about HTTP/DNS in hopes of detecting more of the network
errors
2023-08-10 17:49:24 +01:00
George MacKerron
538373019a Increase max sql-over-http response size from 1MB to 10MB (#4961)
## Problem

1MB response limit is very small.

## Summary of changes

This data is not yet tracked, so we shoudn't raise the limit too high yet. 
But as discussed with @kelvich and @conradludgate, this PR lifts it to 
10MB, and adds also details of the limit to the error response.
2023-08-10 17:21:52 +01:00
Dmitry Rodionov
c58b22bacb Delete tenant's data from s3 (#4855)
## Summary of changes

For context see
https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md

Create Flow to delete tenant's data from pageserver. The approach
heavily mimics previously implemented timeline deletion implemented
mostly in https://github.com/neondatabase/neon/pull/4384 and followed up
in https://github.com/neondatabase/neon/pull/4552

For remaining deletion related issues consult with deletion project
here: https://github.com/orgs/neondatabase/projects/33

resolves #4250
resolves https://github.com/neondatabase/neon/issues/3889

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-08-10 18:53:16 +03:00
Alek Westover
17aea78aa7 delete already present files from library index (#4955) 2023-08-10 16:51:16 +03:00
Joonas Koivunen
71f9d9e5a3 test: allow slow shutdown warning (#4953)
Introduced in #4886, did not consider that tests with real_s3 could
sometimes go over the limit. Do not fail tests because of that.
2023-08-10 15:55:41 +03:00
Alek Westover
119b86480f test: make pg_regress less flaky, hopefully (#4903)
`pg_regress` is flaky: https://github.com/neondatabase/neon/issues/559

Consolidated `CHECKPOINT` to `check_restored_datadir_content`, add a
wait for `wait_for_last_flush_lsn`.

Some recently introduced flakyness was fixed with #4948.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-08-10 15:24:43 +03:00
Arpad Müller
fa1f87b268 Make the implementation of DiskBtreeReader::visit non-recursive (#4884)
## Problem

The `DiskBtreeReader::visit` function calls `read_blk` internally, and
while #4863 converted the API of `visit` to async, the internal function
is still recursive. So, analogously to #4838, we turn the recursive
function into an iterative one.

## Summary of changes

First, we prepare the change by moving the for loop outside of the case
switch, so that we only have one loop that calls recursion. Then, we
switch from using recursion to an approach where we store the search
path inside the tree on a stack on the heap.

The caller of the `visit` function can control when the search over the
B-Tree ends, by returning `false` from the closure. This is often used
to either only find one specific entry (by always returning `false`),
but it is also used to iterate over all entries of the B-tree (by always
returning `true`), or to look for ranges (mostly in tests, but
`get_value_reconstruct_data` also has such a use).

Each stack entry contains two things: the block number (aka the block's
offset), and a children iterator. The children iterator is constructed
depending on the search direction, and with the results of a binary
search over node's children list. It is the only thing that survives a
spilling/push to the stack, everything else is reconstructed. In other
words, each stack spill, will, if the search is still ongoing, cause an
entire re-parsing of the node. Theoretically, this would be a linear
overhead in the number of leaves the search visits. However, one needs
to note:

* the workloads to look for a specific entry are just visiting one leaf,
ever, so this is mostly about workloads that visit larger ranges,
including ones that visit the entire B-tree.
* the requests first hit the page cache, so often the cost is just in
terms of node deserialization
* for nodes that only have leaf nodes as children, no spilling to the
stack-on-heap happens (outside of the initial request where the iterator
is `None`). In other words, for balanced trees, the spilling overhead is
$\Theta\left(\frac{n}{b^2}\right)$, where `b` is the branching factor
and `n` is the number of nodes in the tree. The B-Trees in the current
implementation have a branching factor of roughly `PAGE_SZ/L` where
`PAGE_SZ` is 8192, and `L` is `DELTA_KEY_SIZE = 26` or `KEY_SIZE = 18`
in production code, so this gives us an estimate that we'd be re-loading
an inner node for every 99000 leaves in the B-tree in the worst case.

Due to these points above, I'd say that not fully caching the inner
nodes with inner children is reasonable, especially as we also want to
be fast for the "find one specific entry" workloads, where the stack
content is never accessed: any action to make the spilling
computationally more complex would contribute to wasted cycles here,
even if these workloads "only" spill one node for each depth level of
the b-tree (which is practically always a low single-digit number,
Kleppmann points out on page 81 that for branching factor 500, a four
level B-tree with 4 KB pages can store 250 TB of data).

But disclaimer, this is all stuff I thought about in my head, I have not
confirmed it with any benchmarks or data.

Builds on top of #4863, part of #4743
2023-08-10 13:43:13 +02:00
Joonas Koivunen
db48f7e40d test: mark test_download_extensions.py skipped for now (#4948)
The test mutates a shared directory which does not work with multiple
concurrent tests. It is being fixed, so this should be a very temporary
band-aid.

Cc: #4949.
2023-08-10 11:05:27 +00:00
Alek Westover
e157b16c24 if control file already exists ignore the remote version of the extension (#4945) 2023-08-09 18:56:09 +00:00
bojanserafimov
94ad9204bb Measure compute-pageserver latency (#4901)
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-08-09 13:20:30 -04:00
Joonas Koivunen
c8aed107c5 refactor: make {Delta,Image}LayerInners usable without {Delta,Image}Layer (#4937)
On the quest of #4745, these are more related to the task at hand, but
still small. In addition to $subject, allow
`ValueRef<ResidentDeltaLayer>`.
2023-08-09 19:18:44 +03:00
Anastasia Lubennikova
da128a509a fix pkglibdir path for remote extensions 2023-08-09 19:13:11 +03:00
Alexander Bayandin
5993b2bedc test_runner: remove excessive timeouts (#4659)
## Problem

For some tests, we override the default timeout (300s / 5m) with a larger
values like 600s / 10m or even 1800s / 30m, even if it's not required.
I've collected some statistics (for the last 60 days) for tests
duration:

| test | max (s) | p99 (s) | p50 (s) | count |

|-----------------------------------|---------|---------|---------|-------|
| test_hot_standby | 9 | 2 | 2 | 5319 |
| test_import_from_vanilla | 16 | 9 | 6 | 5692 |
| test_import_from_pageserver_small | 37 | 7 | 5 | 5719 |
| test_pg_regress | 101 | 73 | 44 | 5642 |
| test_isolation | 65 | 56 | 39 | 5692 |

A couple of tests that I left with custom 600s / 10m timeout.

| test | max (s) | p99 (s) | p50 (s) | count |

|-----------------------------------|---------|---------|---------|-------|
| test_gc_cutoff | 456 | 224 | 109 | 5694 |
| test_pageserver_chaos | 528 | 267 | 121 | 5712 |

## Summary of changes
- Remove `@pytest.mark.timeout` annotation from several tests
2023-08-09 16:27:53 +01:00
Anastasia Lubennikova
4ce7aa9ffe Fix extensions download error handling (#4941)
Don't panic if library or extension is not found in remote extension storage 
or download has failed. Instead, log the error and proceed - if file is not 
present locally as well, postgres will fail with postgres error.  If it is a 
shared_preload_library, it won't start, because of bad config. Otherwise, it 
will just fail to run the SQL function/ command that needs the library. 

Also, don't try to download extensions if remote storage is not configured.
2023-08-09 15:37:51 +03:00
Joonas Koivunen
cbd04f5140 remove_remote_layer: uninteresting refactorings (#4936)
In the quest to solve #4745 by moving the download/evictedness to be
internally mutable factor of a Layer and get rid of `trait
PersistentLayer` at least for prod usage, `layer_removal_cs`, we present
some misc cleanups.

---------

Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-08-09 14:35:56 +03:00
Arpad Müller
1037a8ddd9 Explain why VirtualFile stores tenant_id and timeline_id as strings (#4930)
## Problem

One might wonder why the code here doesn't use `TimelineId` or
`TenantId`. I originally had a refactor to use them, but then discarded
it, because converting to strings on each time there is a read or write
is wasteful.

## Summary of changes

We add some docs explaining why here no `TimelineId` or `TenantId` is
being used.
2023-08-08 23:41:09 +02:00
Felix Prasanna
6661f4fd44 bump vm-builder version to v0.15.0-alpha1 (#4934) 2023-08-08 15:22:10 -05:00
Alexander Bayandin
b9f84b9609 Improve test results format (#4549)
## Problem

The current test history format is a bit inconvenient:
- It stores all test results in one row, so all queries should include
subqueries which expand the test 
- It includes duplicated test results if the rerun is triggered manually
for one of the test jobs (for example, if we rerun `debug-pg14`, then
the report will include duplicates for other build types/postgres
versions)
- It doesn't have a reference to run_id, which we use to create a link
to allure report

Here's the proposed new format:
```
    id           BIGSERIAL PRIMARY KEY,
    parent_suite TEXT NOT NULL,
    suite        TEXT NOT NULL,
    name         TEXT NOT NULL,
    status       TEXT NOT NULL,
    started_at   TIMESTAMPTZ NOT NULL,
    stopped_at   TIMESTAMPTZ NOT NULL,
    duration     INT NOT NULL,
    flaky        BOOLEAN NOT NULL,
    build_type   TEXT NOT NULL,
    pg_version   INT NOT NULL,
    run_id       BIGINT NOT NULL,
    run_attempt  INT NOT NULL,
    reference    TEXT NOT NULL,
    revision     CHAR(40) NOT NULL,
    raw          JSONB COMPRESSION lz4 NOT NULL,
```

## Summary of changes
- Misc allure changes:
  - Update allure to 2.23.1
- Delete files from previous runs in HTML report (by using `sync
--delete` instead of `mv`)
- Use `test-cases/*.json` instead of `suites.json`, using this directory
allows us to catch all reruns.
- Until we migrated `scripts/flaky_tests.py` and
`scripts/benchmark_durations.py` store test results in 2 formats (in 2
different databases).
2023-08-08 20:09:38 +01:00
Felix Prasanna
459253879e Revert "bump vm-builder to v0.15.0-alpha1 (#4895)" (#4931)
This reverts commit 682dfb3a31.
2023-08-08 20:21:39 +03:00
Conrad Ludgate
0fa85aa08e proxy: delay auth on retry (#4929)
## Problem

When an endpoint is shutting down, it can take a few seconds. Currently
when starting a new compute, this causes an "endpoint is in transition"
error. We need to add delays before retrying to ensure that we allow
time for the endpoint to shutdown properly.

## Summary of changes

Adds a delay before retrying in auth. connect_to_compute already has
this delay
2023-08-08 17:19:24 +03:00
Cuong Nguyen
039017cb4b Add new flag for advertising pg address (#4898)
## Problem

The safekeeper advertises the same address specified in `--listen-pg`,
which is problematic when the listening address is different from the
address that the pageserver can use to connect to the safekeeper.

## Summary of changes

Add a new optional flag called `--advertise-pg` for the address to be
advertised. If this flag is not specified, the behavior is the same as
before.
2023-08-08 14:26:38 +03:00
John Spray
4dc644612b pageserver: expose prometheus metrics for startup time (#4893)
## Problem

Currently to know how long pageserver startup took requires inspecting
logs.

## Summary of changes

`pageserver_startup_duration_ms` metric is added, with label `phase` for
different phases of startup.

These are broken down by phase, where the phases correspond to the
existing wait points in the code:
- Start of doing I/O
- When tenant load is done
- When initial size calculation is done
- When background jobs start
- Then "complete" when everything is done.

`pageserver_startup_is_loading` is a 0/1 gauge that indicates whether we are in the initial load of tenants.

`pageserver_tenant_activation_seconds` is a histogram of time in seconds taken to activate a tenant.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-08-08 12:41:37 +03:00
Anastasia Lubennikova
6d17d6c775 Use WebIdentityTokenCredentialsProvider to access remote extensions (#4921)
Fixes access to s3 buckets that use IAM roles for service accounts
access control method

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-08-08 12:37:22 +03:00
John Spray
4892a5c5b7 pageserver: avoid logging the "ERROR" part of DbErrors that are successes (#4902)
## Problem

The pageserver<->safekeeper protocol uses error messages to indicate end
of stream. pageserver already logs these at INFO level, but the inner
error message includes the word "ERROR", which interferes with log
searching.
   
Example:
```
  walreceiver connection handling ended: db error: ERROR: ending streaming to Some("pageserver") at 0/4031CA8
```
    
The inner DbError has a severity of ERROR so DbError's Display
implementation includes that ERROR, even though we are actually
logging the error at INFO level.

## Summary of changes

Introduce an explicit WalReceiverError type, and in its From<>
for postgres errors, apply the logic from ExpectedError, for
expected errors, and a new condition for successes.
    
The new output looks like:
```
    walreceiver connection handling ended: Successful completion: ending streaming to Some("pageserver") at 0/154E9C0, receiver is caughtup and there is no computes
 ```
2023-08-08 12:35:24 +03:00
John Spray
33cb1e9c0c tests: enable higher concurrency and adjust tests with outlier runtime (#4904)
## Problem

I spent a few minutes seeing how fast I could get our regression test
suite to run on my workstation, for when I want to run a "did I break
anything?" smoke test before pushing to CI.

- Test runtime was dominated by a couple of tests that run for longer
than all the others take together
- Test concurrency was limited to <16 by the ports-per-worker setting

There's no "right answer" for how long a test should
be, but as a rule of thumb, no one test should run
for much longer than the time it takes to run all the
other tests together.

## Summary of changes

- Make the ports per worker setting dynamic depending on worker count
- Modify the longest running tests to run for a shorter time
(`test_duplicate_layers` which uses a pgbench runtime) or fewer
iterations (`test_restarts_frequent_checkpoints`).
2023-08-08 09:16:21 +01:00
Arpad Müller
9559ef6f3b Sort by (key, lsn), not just key (#4918)
## Problem

PR #4839 didn't output the keys/values in lsn order, but for a given
key, the lsns were kept in incoming file order.

I think the ordering by lsn is expected.

## Summary of changes

We now also sort by `(key, lsn)`, like we did before #4839.
2023-08-07 18:14:15 +03:00
John Spray
64a4fb35c9 pagectl: skip metadata file in pagectl draw-timeline (#4872)
## Problem

Running `pagectl draw-timeline` on a pageserver directory wasn't working
out of the box because it trips up on the `metadata` file.

## Summary of changes

Just ignore the `metadata` file in the list of input files passed to
`draw-timeline`.
2023-08-07 08:24:50 +01:00
MMeent
95ec42f2b8 Change log levels on various operations (#4914)
Cache changes are now DEBUG2
Logs that indicate disabled caches now explicitly call out that the file cache is disabled on WARNING level instead of LOG/INFO
2023-08-06 20:37:09 +02:00
Joonas Koivunen
ba9df27e78 fix: silence not found error when removing ephmeral (#4900)
We currently cannot drop tenant before removing it's directory, or use
Tenant::drop for this. This creates unnecessary or inactionable warnings
during detach at least. Silence the most typical, file not found. Log
remaining at `error!`.

Cc: #2442
2023-08-04 21:03:17 +03:00
Joonas Koivunen
ea3e1b51ec Remote storage metrics (#4892)
We don't know how our s3 remote_storage is performing, or if it's
blocking the shutdown. Well, for sampling reasons, we will not really
know even after this PR.

Add metrics:
- align remote_storage metrics towards #4813 goals
- histogram
`remote_storage_s3_request_seconds{request_type=(get_object|put_object|delete_object|list_objects),
result=(ok|err|cancelled)}`
- histogram `remote_storage_s3_wait_seconds{request_type=(same kinds)}`
- counter `remote_storage_s3_cancelled_waits_total{request_type=(same
kinds)}`

Follow-up work:
- After release, remove the old metrics, migrate dashboards

Histogram buckets are rough guesses, need to be tuned. In pageserver we
have a download timeout of 120s, so I think the 100s bucket is quite
nice.
2023-08-04 21:01:29 +03:00
John Spray
e3e739ee71 pageserver: remove no-op attempt to report fail/failpoint feature (#4879)
## Problem

The current output from a prod binary at startup is:
```
git-env:765455bca22700e49c053d47f44f58a6df7c321f failpoints: true, features: [] launch_timestamp: 2023-08-02 10:30:35.545217477 UTC
```

It's confusing to read that line, then read the code and think "if
failpoints is true, but not in the features list, what does that mean?".
As far as I can tell, the check of `fail/failpoints` is just always
false because cargo doesn't expose features across crates like this: the
`fail/failpoints` syntax works in the cargo CLI but not from a macro in
some crate other than `fail`.

## Summary of changes

Remove the lines that try to check `fail/failpoints` from the pageserver
entrypoint module. This has no functional impact but makes the code
slightly easier to understand when trying to make sense of the line
printed on startup.
2023-08-04 17:56:31 +01:00
Conrad Ludgate
606caa0c5d proxy: update logs and span data to be consistent and have more info (#4878)
## Problem

Pre-requisites for #4852 and #4853

## Summary of changes

1. Includes the client's IP address (which we already log) with the span
info so we can have it on all associated logs. This makes making
dashboards based on IP addresses easier.
2. Switch to a consistent error/warning log for errors during
connection. This includes error, num_retries, retriable=true/false and a
consistent log message that we can grep for.
2023-08-04 12:37:18 +03:00
Arpad Müller
6a906c68c9 Make {DeltaLayer,ImageLayer}::{load,load_inner} async (#4883)
## Problem

The functions `DeltaLayer::load_inner` and `ImageLayer::load_inner` are
calling `read_blk` internally, which we would like to turn into an async
fn.

## Summary of changes

We switch from `once_cell`'s `OnceCell` implementation to the one in
`tokio` in order to be able to call an async `get_or_try_init` function.

Builds on top of #4839, part of #4743
2023-08-04 12:35:45 +03:00
Felix Prasanna
682dfb3a31 bump vm-builder to v0.15.0-alpha1 (#4895) 2023-08-03 14:26:14 -04:00
Joonas Koivunen
5263b39e2c fix: shutdown logging again (#4886)
During deploys of 2023-08-03 we logged too much on shutdown. Fix the
logging by timing each top level shutdown step, and possibly warn on it
taking more than a rough threshold, based on how long I think it
possibly should be taking. Also remove all shutdown logging from
background tasks since there is already "shutdown is taking a long time"
logging.

Co-authored-by: John Spray <john@neon.tech>
2023-08-03 20:34:05 +03:00
Arpad Müller
a241c8b2a4 Make DiskBtreeReader::{visit, get} async (#4863)
## Problem

`DiskBtreeReader::get` and `DiskBtreeReader::visit` both call `read_blk`
internally, which we would like to make async in the future. This PR
focuses on making the interface of these two functions `async`. There is
further work to be done in forms of making `visit` to not be recursive
any more, similar to #4838. For that, see
https://github.com/neondatabase/neon/pull/4884.

Builds on top of https://github.com/neondatabase/neon/pull/4839, part of
https://github.com/neondatabase/neon/issues/4743

## Summary of changes

Make `DiskBtreeReader::get` and `DiskBtreeReader::visit` async functions
and `await` in the places that call these functions.
2023-08-03 17:36:46 +02:00
John Spray
e71d8095b9 README: make it a bit clearer how to get regression tests running (#4885)
## Problem

When setting up for the first time I hit a couple of nits running tests:
- It wasn't obvious that `openssl` and `poetry` were needed (poetry is
mentioned kind of obliquely via "dependency installation notes" rather
than being in the list of rpm/deb packages to install.
- It wasn't obvious how to get the tests to run for just particular
parameters (e.g. just release mode)

## Summary of changes

Add openssl and poetry to the package lists.

Add an example of how to run pytest for just a particular build type and
postgres version.
2023-08-03 15:23:23 +01:00
Dmitry Rodionov
1497a42296 tests: split neon_fixtures.py (#4871)
## Problem

neon_fixtures.py has grown to unmanageable size. It attracts conflicts.

When adding specific utils under for example `fixtures/pageserver`
things sometimes need to import stuff from `neon_fixtures.py` which
creates circular import. This is usually only needed for type
annotations, so `typing.TYPE_CHECKING` flag can mask the issue.
Nevertheless I believe that splitting neon_fixtures.py into smaller
parts is a better approach.

Currently the PR contains small things, but I plan to continue and move
NeonEnv to its own `fixtures.env` module. To keep the diff small I think
this PR can already be merged to cause less conflicts.

UPD: it looks like currently its not really possible to fully avoid
usage of `typing.TYPE_CHECKING`, because some components directly depend
on each other. I e Env -> Cli -> Env cycle. But its still worth it to
avoid it in as many places as possible. And decreasing neon_fixture's
size still makes sense.
2023-08-03 17:20:24 +03:00
Alexander Bayandin
cd33089a66 test_runner: set AWS credentials for endpoints (#4887)
## Problem

If AWS credentials are not set locally (via
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars)
`test_remote_library[release-pg15-mock_s3]` test fails with the
following error:

```
ERROR could not start the compute node: Failed to download a remote file: Failed to download S3 object: failed to construct request
```

## Summary of changes
- set AWS credentials for endpoints programmatically
2023-08-03 16:44:48 +03:00
Arpad Müller
416c14b353 Compaction: sort on slices directly instead of kmerge (#4839)
## Problem

The k-merge in pageserver compaction currently relies on iterators over
the keys and also over the values. This approach does not support async
code because we are using iterators and those don't support async in
general. Also, the k-merge implementation we use doesn't support async
either. Instead, as we already load all the keys into memory, just do
sorting in-memory.

## Summary of changes

The PR can be read commit-by-commit, but most importantly, it:

* Stops using kmerge in compaction, using slice sorting instead.
* Makes `load_keys` and `load_val_refs` async, using `Handle::block_on`
in the compaction code as we don't want to turn the compaction function,
called inside `spawn_blocking`, into an async fn.

Builds on top of #4836, part of
https://github.com/neondatabase/neon/issues/4743
2023-08-03 15:30:41 +02:00
John Spray
df49a9b7aa pagekeeper: suppress error logs in shutdown/detach (#4876)
## Problem

Error messages like this coming up during normal operations:
```
        Compaction failed, retrying in 2s: timeline is Stopping

       Compaction failed, retrying in 2s: Cannot run compaction iteration on inactive tenant
```

## Summary of changes

Add explicit handling for the shutdown case in these locations, to
suppress error logs.
2023-08-02 19:31:09 +01:00
bojanserafimov
4ad0c8f960 compute_ctl: Prewarm before starting http server (#4867) 2023-08-02 14:19:06 -04:00
Joonas Koivunen
e0b05ecafb build: ca-certificates need to be present (#4880)
as needed since #4715 or this will happen:

```
ERROR panic{thread=main location=.../hyper-rustls-0.23.2/src/config.rs:48:9}: no CA certificates found
```
2023-08-02 20:34:21 +03:00
Vadim Kharitonov
ca4d71a954 Upgrade pg_embedding to 0.3.5 (#4873) 2023-08-02 18:18:33 +03:00
Alexander Bayandin
381f41e685 Bump cryptography from 41.0.2 to 41.0.3 (#4870) 2023-08-02 14:10:36 +03:00
Alek Westover
d005c77ea3 Tar Remote Extensions (#4715)
Add infrastructure to dynamically load postgres extensions and shared libraries from remote extension storage.

Before postgres start downloads list of available remote extensions and libraries, and also downloads 'shared_preload_libraries'. After postgres is running, 'compute_ctl' listens for HTTP requests to load files.

Postgres has new GUC 'extension_server_port' to specify port on which 'compute_ctl' listens for requests.

When PostgreSQL requests a file, 'compute_ctl' downloads it.

See more details about feature design and remote extension storage layout in docs/rfcs/024-extension-loading.md

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Alek Westover <alek.westover@gmail.com>
2023-08-02 12:38:12 +03:00
Joonas Koivunen
04776ade6c fix(consumption): rename _size_ => _data_ (#4866)
I failed at renaming the metric middle part while managing to do a great
job with the suffix. Fix the middle part as well.
2023-08-01 19:18:25 +03:00
Dmitry Rodionov
c3fe335eaf wait for tenant to be active before polling for timeline absence (#4856)
## Problem

https://neon-github-public-dev.s3.amazonaws.com/reports/main/5692829577/index.html#suites/f588e0a787c49e67b29490359c589fae/4c50937643d68a66

## Summary of changes

wait for tenant to be active after restart before polling for timeline
absence
2023-08-01 18:28:18 +03:00
Joonas Koivunen
3a00a5deb2 refactor: tidy consumption metrics (#4860)
Tidying up I've been wanting to do for some time.

Follow-up to #4857.
2023-08-01 18:14:16 +03:00
Joonas Koivunen
78fa2b13e5 test: written_size_bytes_delta (#4857)
Two stabs at this, by mocking a http receiver and the globals out (now
reverted) and then by separating the timeline dependency and just
testing what kind of events certain timelines produce. I think this
pattern could work for some of our problems.

Follow-up to #4822.
2023-08-01 15:30:36 +03:00
John Spray
7c076edeea pageserver: tweak period of imitate_layer_accesses (#4859)
## Problem

When the eviction threshold is an integer multiple of the eviction
period, it is unreliable to skip imitating accesses based on whether the
last imitation was more recent than the threshold.

This is because as finite time passes
between the time used for the periodic execution, and the 'now' time
used for updating last_layer_access_imitation. When this is just a few
milliseconds, and everything else is on-time, then a 5 second threshold
with a 1 second period will end up entering its 5th iteration slightly
_less than_ 5 second since last_layer_access_imitation, and thereby
skipping instead of running the imitation. If a few milliseconds then
pass before we check the access time of a file that _should_ have been
bumped by the imitation pass, then we end up evicting something we
shouldn't have evicted.

## Summary of changes

We can make this race far less likely by using the threshold minus one
interval as the period for re-executing the imitate_layer_accesses: that
way we're not vulnerable to racing by just a few millis, and there would
have to be a delay of the order `period` to cause us to wrongly evict a
layer.

This is not a complete solution: it would be good to revisit this and
use a non-walltime mechanism for pinning these layers into local
storage, rather than relying on bumping access times.
2023-08-01 13:17:49 +01:00
Arpad Müller
69528b7c30 Prepare k-merge in compaction for async I/O (#4836)
## Problem

The k-merge in pageserver compaction currently relies on iterators over
the keys and also over the values. This approach does not support async
code because we are using iterators and those don't support async in
general. Also, the k-merge implementation we use doesn't support async
either. Instead, as we already load all the keys into memory, the plan
is to just do the sorting in-memory for now, switch to async, and then
once we want to support workloads that don't have all keys stored in
memory, we can look into switching to a k-merge implementation that
supports async instead.

## Summary of changes

The core of this PR is the move from functions on the `PersistentLayer`
trait to return custom iterator types to inherent functions on `DeltaLayer`
that return buffers with all keys or value references.
Value references are a type we created in this PR, containing a
`BlobRef` as well as an `Arc` pointer to the `DeltaLayerInner`, so that
we can lazily load the values during compaction. This preserves the
property of the current code.

This PR does not switch us to doing the k-merge via sort on slices, but
with this PR, doing such a switch is relatively easy and only requires
changes of the compaction code itself.

Part of https://github.com/neondatabase/neon/issues/4743
2023-08-01 13:38:35 +02:00
Konstantin Knizhnik
a98a80abc2 Deffine NEON_SMGR to make it possible for extensions to use Neon SMG API (#4840)
## Problem

See https://neondb.slack.com/archives/C036U0GRMRB/p1689148023067319

## Summary of changes

Define NEON_SMGR in smgr.h

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-08-01 10:04:45 +03:00
Alex Chi Z
7b6c849456 support isolation level + read only for http batch sql (#4830)
We will retrieve `neon-batch-isolation-level` and `neon-batch-read-only`
from the http header, which sets the txn properties.
https://github.com/neondatabase/serverless/pull/38#issuecomment-1653130981

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-08-01 02:59:11 +03:00
Joonas Koivunen
326189d950 consumption_metrics: send timeline_written_size_delta (#4822)
We want to have timeline_written_size_delta which is defined as
difference to the previously sent `timeline_written_size` from the
current `timeline_written_size`.

Solution is to send it. On the first round `disk_consistent_lsn` is used
which is captured during `load` time. After that an incremental "event"
is sent on every collection. Incremental "events" are not part of
deduplication.

I've added some infrastructure to allow somewhat typesafe
`EventType::Absolute` and `EventType::Incremental` factories per
metrics, now that we have our first `EventType::Incremental` usage.
2023-07-31 22:10:19 +03:00
bojanserafimov
ddbe170454 Prewarm compute nodes (#4828) 2023-07-31 14:13:32 -04:00
Alexander Bayandin
39e458f049 test_compatibility: fix pg_tenant_only_port port collision (#4850)
## Problem

Compatibility tests fail from time to time due to `pg_tenant_only_port`
port collision (added in https://github.com/neondatabase/neon/pull/4731)

## Summary of changes
- replace `pg_tenant_only_port` value in config with new port
- remove old logic, than we don't need anymore
- unify config overrides
2023-07-31 20:49:46 +03:00
Vadim Kharitonov
e1424647a0 Update pg_embedding to 0.3.1 version (#4811) 2023-07-31 20:23:18 +03:00
Yinnan Yao
705ae2dce9 Fix error message for listen_pg_addr_tenant_only binding (#4787)
## Problem

Wrong use of `conf.listen_pg_addr` in `error!()`.

## Summary of changes

Use `listen_pg_addr_tenant_only` instead of `conf.listen_pg_addr`.

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2023-07-31 14:40:52 +01:00
Conrad Ludgate
eb78603121 proxy: div by zero (#4845)
## Problem

1. In the CacheInvalid state loop, we weren't checking the
`num_retries`. If this managed to get up to `32`, the retry_after
procedure would compute 2^32 which would overflow to 0 and trigger a div
by zero
2. When fixing the above, I started working on a flow diagram for the
state machine logic and realised it was more complex than it had to be:
  a. We start in a `Cached` state
b. `Cached`: call `connect_once`. After the first connect_once error, we
always move to the `CacheInvalid` state, otherwise, we return the
connection.
c. `CacheInvalid`: we attempt to `wake_compute` and we either switch to
Cached or we retry this step (or we error).
d. `Cached`: call `connect_once`. We either retry this step or we have a
connection (or we error) - After num_retries > 1 we never switch back to
`CacheInvalid`.

## Summary of changes

1. Insert a `num_retries` check in the `handle_try_wake` procedure. Also
using floats in the retry_after procedure to prevent the overflow
entirely
2. Refactor connect_to_compute to be more linear in design.
2023-07-31 09:30:24 -04:00
John Spray
f0ad603693 pageserver: add unit test for deleted_at in IndexPart (#4844)
## Problem

Existing IndexPart unit tests only exercised the version 1 format (i.e.
without deleted_at set).

## Summary of changes

Add a test that sets version to 2, and sets a value for deleted_at.

Closes https://github.com/neondatabase/neon/issues/4162
2023-07-31 12:51:18 +01:00
Arpad Müller
e5183f85dc Make DiskBtreeReader::dump async (#4838)
## Problem

`DiskBtreeReader::dump` calls `read_blk` internally, which we want to
make async in the future. As it is currently relying on recursion, and
async doesn't like recursion, we want to find an alternative to that and
instead traverse the tree using a loop and a manual stack.

## Summary of changes

* Make `DiskBtreeReader::dump` and all the places calling it async
* Make `DiskBtreeReader::dump` non-recursive internally and use a stack
instead. It now deparses the node in each iteration, which isn't
optimal, but on the other hand it's hard to store the node as it is
referencing the buffer. Self referential data are hard in Rust. For a
dumping function, speed isn't a priority so we deparse the node multiple
times now (up to branching factor many times).

Part of https://github.com/neondatabase/neon/issues/4743

I have verified that output is unchanged by comparing the output of this
command both before and after this patch:
```
cargo test -p pageserver -- particular_data --nocapture 
```
2023-07-31 12:52:29 +02:00
Joonas Koivunen
89ee8f2028 fix: demote warnings, fix flakyness (#4837)
`WARN ... found future (image|delta) layer` are not actionable log
lines. They don't need to be warnings. `info!` is enough.

This also fixes some known but not tracked flakyness in
[`test_remote_timeline_client_calls_started_metric`][evidence].

[evidence]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4829/5683495367/index.html#/testresult/34fe79e24729618b

Closes #3369.
Closes #4473.
2023-07-31 07:43:12 +00:00
Alex Chi Z
a8f3540f3d proxy: add unit test for wake_compute (#4819)
## Problem

ref https://github.com/neondatabase/neon/pull/4721, ref
https://github.com/neondatabase/neon/issues/4709

## Summary of changes

This PR adds unit tests for wake_compute.

The patch adds a new variant `Test` to auth backends. When
`wake_compute` is called, we will verify if it is the exact operation
sequence we are expecting. The operation sequence now contains 3 more
operations: `Wake`, `WakeRetry`, and `WakeFail`.

The unit tests for proxy connects are now complete and I'll continue
work on WebSocket e2e test in future PRs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-28 19:10:55 -04:00
Konstantin Knizhnik
4338eed8c4 Make it possible to grant self perfmissions to self created roles (#4821)
## Problem

See: https://neondb.slack.com/archives/C04USJQNLD6/p1689973957908869

## Summary of changes

Bump Postgres version 

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-07-28 22:06:03 +03:00
Joonas Koivunen
2fbdf26094 test: raise timeout to avoid flakyness (#4832)
2s timeout was too tight for our CI,
[evidence](https://neon-github-public-dev.s3.amazonaws.com/reports/main/5669956577/index.html#/testresult/6388e31182cc2d6e).
15s might be better.

Also cleanup code no longer needed after #4204.
2023-07-28 14:32:01 -04:00
Alexander Bayandin
7374634845 test_runner: clean up test_compatibility (#4770)
## Problem

We have some amount of outdated logic in test_compatibility, that we
don't need anymore.

## Summary of changes
- Remove `PR4425_ALLOWED_DIFF` and tune `dump_differs` method to accept
allowed diffs in the future (a cleanup after
https://github.com/neondatabase/neon/pull/4425)
- Remote etcd related code (a cleanup after
https://github.com/neondatabase/neon/pull/2733)
- Don't set `preserve_database_files`
2023-07-28 16:15:31 +01:00
Alexander Bayandin
9fdd3a4a1e test_runner: add amcheck to test_compatibility (#4772)
Run `pg_amcheck` in forward and backward compatibility tests to catch
some data corruption.

## Summary of changes
- Add amcheck compiling to Makefile
- Add `pg_amcheck` to test_compatibility
2023-07-28 16:00:55 +01:00
Alek Westover
3681fc39fd modify relative_path_to_s3_object logic for prefix=None (#4795)
see added unit tests for more description
2023-07-28 10:03:18 -04:00
Joonas Koivunen
67d2fa6dec test: fix test_neon_cli_basics flakyness without making it better for future (#4827)
The test was starting two endpoints on the same branch as discovered by
@petuhovskiy.

The fix is to allow passing branch-name from the python side over to
neon_local, which already accepted it.

Split from #4824, which will handle making this more misuse resistant.
2023-07-27 19:13:58 +03:00
Dmitry Rodionov
cafbe8237e Move tenant/delete.rs to tenant/timeline/delete.rs (#4825)
move tenant/delete.rs to tenant/timeline/delete.rs to prepare for
appearance of tenant deletion routines in tenant/delete.rs
2023-07-27 15:52:36 +03:00
Joonas Koivunen
3e425c40c0 fix(compute_ctl): remove stray variable in error message (#4823)
error is not needed because anyhow will have the cause chain reported
anyways.

related to test_neon_cli_basics being flaky, but doesn't actually fix
any flakyness, just the obvious stray `{e}`.
2023-07-27 15:40:53 +03:00
Joonas Koivunen
395bd9174e test: allow future image layer warning (#4818)
https://neon-github-public-dev.s3.amazonaws.com/reports/main/5670795960/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/5a73fa4a69399123/retries

Allow it because we are doing immediate stop.
2023-07-27 10:22:44 +03:00
Alek Westover
b9a7a661d0 add list of public extensions and lookup table for libraries (#4807) 2023-07-26 15:55:55 -04:00
Joonas Koivunen
48ce95533c test: allow normal warnings in test_threshold_based_eviction (#4801)
See:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/5654328815/index.html#suites/3fc871d9ee8127d8501d607e03205abb/3482458eba88c021
2023-07-26 20:20:12 +03:00
Dmitry Rodionov
874c31976e dedup cleanup fs traces (#4778)
This is a follow up for discussion:
https://github.com/neondatabase/neon/pull/4552#discussion_r1253417777
see context there
2023-07-26 18:39:32 +03:00
Conrad Ludgate
231d7a7616 proxy: retry compute wake in auth (#4817)
## Problem

wake_compute can fail sometimes but is eligible for retries. We retry
during the main connect, but not during auth.

## Summary of changes

retry wake_compute during auth flow if there was an error talking to
control plane, or if there was a temporary error in waking the compute
node
2023-07-26 16:34:46 +01:00
arpad-m
5705413d90 Use OnceLock instead of manually implementing it (#4805)
## Problem

In https://github.com/neondatabase/neon/issues/4743 , I'm trying to make
more of the pageserver async, but in order for that to happen, I need to
be able to persist the result of `ImageLayer::load` across await points.
For that to happen, the return value needs to be `Send`.

## Summary of changes

Use `OnceLock` in the image layer instead of manually implementing it
with booleans, locks and `Option`.

Part of #4743
2023-07-26 17:20:09 +02:00
Conrad Ludgate
35370f967f proxy: add some connection init logs (#4812)
## Problem

The first session event we emit is after we receive the first startup
packet from the client. This means we can't detect any issues between
TCP open and handling of the first PG packet

## Summary of changes

Add some new logs for websocket upgrade and connection handling
2023-07-26 15:03:51 +00:00
Alexander Bayandin
b98419ee56 Fix allure report overwriting for different Postgres versions (#4806)
## Problem

We've got an example of Allure reports from 2 different runners for the
same build that started to upload at the exact second, making one
overwrite another

## Summary of changes
- Use the Postgres version to distinguish artifacts (along with the
build type)
2023-07-26 15:19:18 +01:00
Alexander Bayandin
86a61b318b Bump certifi from 2022.12.7 to 2023.7.22 (#4815) 2023-07-26 16:32:56 +03:00
Alek Westover
5f8fd640bf Upload Test Remote Extensions (#4792)
We need some real extensions in S3 to accurately test the code for
handling remote extensions.
In this PR we just upload three extensions (anon, kq_imcx and postgis), which is
enough for testing purposes for now. In addition to creating and
uploading the extension archives, we must generate a file
`ext_index.json` which specifies important metadata about the
extensions.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-26 15:24:03 +03:00
bojanserafimov
916a5871a6 compute_ctl: Parse sk connstring (#4809) 2023-07-26 08:10:49 -04:00
Dmitry Rodionov
700d929529 Init Timeline in Stopping state in create_timeline_struct when Cause::Delete (#4780)
See
https://github.com/neondatabase/neon/pull/4552#discussion_r1258368127
for context.

TLDR: use CreateTimelineCause to infer desired state instead of using
.set_stopping after initialization
2023-07-26 14:05:18 +03:00
bojanserafimov
520046f5bd cold starts: Add sync-safekeepers fast path (#4804) 2023-07-25 19:44:18 -04:00
Conrad Ludgate
2ebd2ce2b6 proxy: record connection type (#4802)
## Problem

We want to measure how many users are using TCP/WS connections.
We also want to measure how long it takes to establish a connection with
the compute node.

I plan to also add a separate counter for HTTP requests, but because of
pooling this needs to be disambiguated against new HTTP compute
connections

## Summary of changes

* record connection type (ws/tcp) in the connection counters.
* record connection latency including retry latency
2023-07-25 18:57:42 +03:00
Alex Chi Z
bcc2aee704 proxy: add tests for batch http sql (#4793)
This PR adds an integration test case for batch HTTP SQL endpoint.
https://github.com/neondatabase/neon/pull/4654/ should be merged first
before we land this PR.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-25 15:08:24 +00:00
Dmitry Rodionov
6d023484ed Use mark file to allow for deletion operations to continue through restarts (#4552)
## Problem

Currently we delete local files first, so if pageserver restarts after
local files deletion then remote deletion is not continued. This can be
solved with inversion of these steps.

But even if these steps are inverted when index_part.json is deleted
there is no way to distinguish between "this timeline is good, we just
didnt upload it to remote" and "this timeline is deleted we should
continue with removal of local state". So to solve it we use another
mark file. After index part is deleted presence of this mark file
indentifies that it was a deletion intention.

Alternative approach that was discussed was to delete all except
metadata first, and then delete metadata and index part. In this case we
still do not support local only configs making them rather unsafe
(deletion in them is already unsafe, but this direction solidifies this
direction instead of fixing it). Another downside is that if we crash
after local metadata gets removed we may leave dangling index part on
the remote which in theory shouldnt be a big deal because the file is
small.

It is not a big change to choose another approach at this point.

## Summary of changes

Timeline deletion sequence:
1. Set deleted_at in remote index part.
2. Create local mark file.
3. Delete local files except metadata (it is simpler this way, to be
able to reuse timeline initialization code that expects metadata)
4. Delete remote layers
5. Delete index part
6. Delete meta, timeline directory.
7. Delete mark file.

This works for local only configuration without remote storage.
Sequence is resumable from any point.

resolves #4453
resolves https://github.com/neondatabase/neon/pull/4552 (the issue was
created with async cancellation in mind, but we can still have issues
with retries if metadata is deleted among the first by remove_dir_all
(which doesnt have any ordering guarantees))

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-07-25 16:25:27 +03:00
Nick Randall
062159ac17 support non-interactive transactions in sql-over-http (#4654)
This PR adds support for non-interactive transaction query endpoint.
It accepts an array of queries and parameters and returns an array of
query results. The queries will be run in a single transaction one
after another on the proxy side.
2023-07-25 13:03:55 +01:00
cui fliter
f2e2b8a7f4 fix some typos (#4662)
Typos fix

Signed-off-by: cui fliter <imcusg@gmail.com>
2023-07-25 14:39:29 +03:00
Joonas Koivunen
f9214771b4 fix: count broken tenant more correct (#4800)
count only once; on startup create the counter right away because we
will not observe any changes later.

small, probably never reachable from outside fix for #4796.
2023-07-25 12:31:24 +03:00
Joonas Koivunen
77a68326c5 Thin out TenantState metric, keep set of broken tenants (#4796)
We currently have a timeseries for each of the tenants in different
states. We only want this for Broken. Other states could be counters.

Fix this by making the `pageserver_tenant_states_count` a counter
without a `tenant_id` and
add a `pageserver_broken_tenants_count` which has a `tenant_id` label,
each broken tenant being 1.
2023-07-25 11:15:54 +03:00
Joonas Koivunen
a25504deae Limit concurrent compactions (#4777)
Compactions can create a lot of concurrent work right now with #4265.

Limit compactions to use at most 6/8 background runtime threads.
2023-07-25 10:19:04 +03:00
Joonas Koivunen
294b8a8fde Convert per timeline metrics to global (#4769)
Cut down the per-(tenant, timeline) histograms by making them global:

- `pageserver_getpage_get_reconstruct_data_seconds`
- `pageserver_read_num_fs_layers`
- `pageserver_remote_operation_seconds`
- `pageserver_remote_timeline_client_calls_started`
- `pageserver_wait_lsn_seconds`
- `pageserver_io_operations_seconds`

---------

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-07-25 00:43:27 +03:00
Alex Chi Z
407a20ceae add proxy unit tests for retry connections (#4721)
Given now we've refactored `connect_to_compute` as a generic, we can
test it with mock backends. In this PR, we mock the error API and
connect_once API to test the retry behavior of `connect_to_compute`. In
the next PR, I'll add mock for credentials so that we can also test
behavior with `wake_compute`.

ref https://github.com/neondatabase/neon/issues/4709

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-24 20:41:42 +03:00
arpad-m
e5b7ddfeee Preparatory pageserver async conversions (#4773)
In #4743, we'd like to convert the read path to use `async` rust. In
preparation of that, this PR switches some functions that are calling
lower level functions like `BlockReader::read_blk`,
`BlockCursor::read_blob`, etc into `async`. The PR does not switch all
functions however, and only focuses on the ones which are easy to
switch.

This leaves around some async functions that are (currently)
unnecessarily `async`, but on the other hand it makes future changes
smaller in diff.

Part of #4743 (but does not completely address it).
2023-07-24 14:01:54 +02:00
Alek Westover
7feb0d1a80 unwrap instead of passing anyhow::Error on failure to spawn a thread (#4779) 2023-07-21 15:17:16 -04:00
Konstantin Knizhnik
457e3a3ebc Mx offset bug (#4775)
Fix mx_offset_to_flags_offset() function

Fixes issue #4774

Postgres `MXOffsetToFlagsOffset` was not correctly converted to Rust
because cast to u16 is done before division by modulo. It is possible
only if divider is power of two.

Add a small rust unit test to check that the function produces same results
as the PostgreSQL macro, and extend the existing python test to cover
this bug.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-07-21 21:20:53 +03:00
Joonas Koivunen
25d2f4b669 metrics: chunked responses (#4768)
Metrics can get really large in the order of hundreds of megabytes,
which we used to buffer completly (after a few rounds of growing the
buffer).
2023-07-21 15:10:55 +00:00
Alex Chi Z
1685593f38 stable merge and sort in compaction (#4573)
Per discussion at
https://github.com/neondatabase/neon/pull/4537#discussion_r1242086217,
it looks like a better idea to use `<` instead of `<=` for all these
comparisons.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-21 10:15:44 -04:00
dependabot[bot]
8d0f4a7857 Bump aiohttp from 3.7.4 to 3.8.5 (#4762) 2023-07-20 22:33:50 +03:00
Alex Chi Z
3fc3666df7 make flush frozen layer an atomic operation (#4720)
## Problem

close https://github.com/neondatabase/neon/issues/4712

## Summary of changes

Previously, when flushing frozen layers, it was split into two
operations: add delta layer to disk + remove frozen layer from memory.
This would cause a short period of time where we will have the same data
both in frozen and delta layer. In this PR, we merge them into one
atomic operation in layer map manager, therefore simplifying the code.
Note that if we decide to create image layers for L0 flush, it will
still be split into two operations on layer map.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-07-20 13:39:19 -04:00
Joonas Koivunen
89746a48c6 chore: fix copypaste caused flakyness (#4763)
I introduced a copypaste error leading to flaky [test failure][report]
in #4737. Solution is to use correct/unique test name.

I also looked into providing a proper fn name via macro but ... Yeah,
it's probably not a great idea.

[report]:
https://github.com/neondatabase/neon/actions/runs/5612473297/job/15206293430#step:15:197
2023-07-20 19:55:40 +03:00
Joonas Koivunen
8d27a9c54e Less verbose eviction failures (#4737)
As seen in staging logs with some massive compactions
(create_image_layer), in addition to racing with compaction or gc or
even between two invocations to `evict_layer_batch`.

Cc: #4745
Fixes: #3851 (organic tech debt reduction)

Solution is not to log the Not Found in such cases; it is perfectly
natural to happen. Route to this is quite long, but implemented two
cases of "race between two eviction processes" which are like our disk
usage based eviction and eviction_task, both have the separate "lets
figure out what to evict" and "lets evict" phases.
2023-07-20 17:45:10 +03:00
arpad-m
d98cb39978 pageserver: use tokio::time::timeout where possible (#4756)
Removes a bunch of cases which used `tokio::select` to emulate the
`tokio::time::timeout` function. I've done an additional review on the
cancellation safety of these futures, all of them seem to be
cancellation safe (not that `select!` allows non-cancellation-safe
futures, but as we touch them, such a review makes sense).

Furthermore, I correct a few mentions of a non-existent
`tokio::timeout!` macro in the docs to the `tokio::time::timeout`
function.
2023-07-20 16:19:38 +02:00
Alexander Bayandin
27c73c8740 Bump pg_embedding extension (#4758)
```
% git log --pretty=oneline 2465f831ea1f8d49c1d74f8959adb7fc277d70cd..eeb3ba7c3a60c95b2604dd543c64b2f1bb4a3703 
eeb3ba7c3a60c95b2604dd543c64b2f1bb4a3703 (HEAD -> main, origin/main) Fixc in-mmeory index rebuild after TRUNCATE
1d7cfcfe3d58e2cf4566900437c609725448d14b Correctly handle truncate forin-0memory HNSW index
8fd2a4a191f67858498d876ec378b58e76b5874a :Fix empty index search issue
30e9ef4064cff40c60ff2f78afeac6c296722757 Fix extensiomn name in makefile
23bb5d504aa21b1663719739f6eedfdcb139d948 Fix getting memory size at Mac OS/X
39193a38d6ad8badd2a8d1dce2dd999e1b86885d Update a comment for the extension
bf3b0d62a7df56a5e4db9d9e62dc535794c425bc Merge branch 'main' of https://github.com/neondatabase/pg_embedding
c2142d514280e14322d1026f0c811876ccf7a91f Update README.md
53b641880f786d2b69a75941c49e569018e8e97e Create LICENSE
093aaa36d5af183831bf370c97b563c12d15f23a Update README.md
91f0bb84d14cb26fd8b452bf9e1ecea026ac5cbc Update README.md
7f7efa38015f24ee9a09beca3009b8d0497a40b4 Update README.md
71defdd4143ecf35489d93289f6cdfa2545fbd36 Merge pull request #4 from neondatabase/danieltprice-patch-1
e06c228b99c6b7c47ebce3bb7c97dbd494088b0a Update README.md
d7e52b576b47d9023743b124bdd0360a9fc98f59 Update README.md
70ab399c861330b50a9aff9ab9edc7044942a65b Merge pull request #5 from neondatabase/oom_error_reporting
0aee1d937997198fa2d2b2ed7a0886d1075fa790 Fix OOM error reporting and support vectprization for ARM
18d80079ce60b2aa81d58cefdf42fc09d2621fc1 Update README.md
```
2023-07-20 12:32:57 +01:00
Joonas Koivunen
9e871318a0 Wait detaches or ignores on pageserver shutdown (#4678)
Adds in a barrier for the duration of the `Tenant::shutdown`.
`pageserver_shutdown` will join this await, `detach`es and `ignore`s
will not.

Fixes #4429.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-07-20 13:14:13 +03:00
bojanserafimov
e1061879aa Improve startup python test (#4757) 2023-07-19 23:46:16 -04:00
Daniel
f09e82270e Update comment for hnsw extension (#4755)
Updated the description that appears for hnsw when you query extensions:
```
neondb=> SELECT * FROM pg_available_extensions WHERE name = 'hnsw';
         name         | default_version | installed_version |                     comment                     
----------------------+-----------------+-------------------+--------------------------------------------------
 hnsw                 | 0.1.0           |                   | ** Deprecated ** Please use pg_embedding instead
(1 row)
```

---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-19 19:08:25 +01:00
Alexander Bayandin
d4a5fd5258 Disable extension uploading to S3 (#4751)
## Problem
We're going to reset S3 buckets for extensions
(https://github.com/neondatabase/aws/pull/413), and as soon as we're
going to change the format we store extensions on S3. Let's stop
uploading extensions in the old format.

## Summary of changes
- Disable `aws s3 cp` step for extensions
2023-07-19 15:44:14 +01:00
Arseny Sher
921bb86909 Use safekeeper tenant only port in all tests and actually test it.
Compute now uses special safekeeper WAL service port allowing auth tokens with
only tenant scope. Adds understanding of this port to neon_local and fixtures,
as well as test of both ports behaviour with different tokens.

ref https://github.com/neondatabase/neon/issues/4730
2023-07-19 06:03:51 +04:00
Arseny Sher
1e7db5458f Add one more WAL service port allowing only tenant scoped auth tokens.
It will make it easier to limit access at network level, with e.g. k8s network
policies.

ref https://github.com/neondatabase/neon/issues/4730
2023-07-19 06:03:51 +04:00
Alexander Bayandin
b4d36f572d Use sharded-slab from crates (#4729)
## Problem

We use a patched version of `sharded-slab` with increased MAX_THREADS
[1]. It is not required anymore because safekeepers are async now.

A valid comment from the original PR tho [1]:
> Note that patch can affect other rust services, not only the
safekeeper binary.

- [1] https://github.com/neondatabase/neon/pull/4122

## Summary of changes
- Remove patch for `sharded-slab`
2023-07-18 13:50:44 +01:00
Joonas Koivunen
762a8a7bb5 python: more linting (#4734)
Ruff has "B" class of lints, including B018 which will nag on useless
expressions, related to #4719. Enable such lints and fix the existing
issues.

Most notably:
- https://beta.ruff.rs/docs/rules/mutable-argument-default/
- https://beta.ruff.rs/docs/rules/assert-false/

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-18 12:56:40 +03:00
Conrad Ludgate
2e8a3afab1 proxy: merge handle_client (#4740)
## Problem

Second half of #4699. we were maintaining 2 implementations of
handle_client.

## Summary of changes

Merge the handle_client code, but abstract some of the details.

## Checklist before requesting a review

- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-07-17 22:20:23 +01:00
Alexander Bayandin
4580f5085a test_runner: run benchmarks in parallel (#4683)
## Problem

Benchmarks run takes about an hour on main branch (in a single job),
which delays pipeline results. And it takes another hour if we want to
restart the job due to some failures.
 
## Summary of changes
- Use `pytest-split` plugin to run benchmarks on separate CI runners in
4 parallel jobs
- Add `scripts/benchmark_durations.py` for getting benchmark durations
from the database to help `pytest-split` schedule tests more evenly. It
uses p99 for the last 10 days' results (durations).

The current distribution could be better; each worker's durations vary
from 9m to 35m, but this could be improved in consequent PRs.
2023-07-17 20:09:45 +01:00
Conrad Ludgate
e074ccf170 reduce proxy timeouts (#4708)
## Problem

10 retries * 10 second timeouts makes for a very long retry window.

## Summary of changes

Adds a 2s timeout to sql_over_http connections, and also reduces the 10s
timeout in TCP.
2023-07-17 20:05:26 +01:00
George MacKerron
196943c78f CORS preflight OPTIONS support for /sql (http fetch) endpoint (#4706)
## Problem

HTTP fetch can't be used from browsers because proxy doesn't support
[CORS 'preflight' `OPTIONS`
requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests).

## Summary of changes

Added a simple `OPTIONS` endpoint for `/sql`.
2023-07-17 20:01:25 +01:00
bojanserafimov
149dd36b6b Update pg: add startup logs (#4736) 2023-07-17 14:47:08 -04:00
Kirill Bulatov
be271e3edf Use upstream version of tokio-tar (#4722)
tokio-tar 0.3.1 got released, including all changes from the fork
currently used, switch over to that one.
2023-07-17 17:18:33 +01:00
Conrad Ludgate
7c85c7ea91 proxy: merge connect compute (#4713)
## Problem

Half of #4699.

TCP/WS have one implementation of `connect_to_compute`, HTTP has another
implementation of `connect_to_compute`.

Having both is annoying to deal with.

## Summary of changes

Creates a set of traits `ConnectMechanism` and `ShouldError` that allows
the `connect_to_compute` to be generic over raw TCP stream or
tokio_postgres based connections.

I'm not super happy with this. I think it would be nice to
remove tokio_postgres entirely but that will need a lot more thought to
be put into it.

I have also slightly refactored the caching to use fewer references.
Instead using ownership to ensure the state of retrying is encoded in
the type system.
2023-07-17 15:53:01 +01:00
Alex Chi Z
1066bca5e3 compaction: allow duplicated layers and skip in replacement (#4696)
## Problem

Compactions might generate files of exactly the same name as before
compaction due to our naming of layer files. This could have already
caused some mess in the system, and is known to cause some issues like
https://github.com/neondatabase/neon/issues/4088. Therefore, we now
consider duplicated layers in the post-compaction process to avoid
violating the layer map duplicate checks.

related previous works: close
https://github.com/neondatabase/neon/pull/4094
error reported in: https://github.com/neondatabase/neon/issues/4690,
https://github.com/neondatabase/neon/issues/4088

## Summary of changes

If a file already exists in the layer map before the compaction, do not
modify the layer map and do not delete the file. The file on disk at
that time should be the new one overwritten by the compaction process.

This PR also adds a test case with a fail point that produces exactly
the same set of files.

This bypassing behavior is safe because the produced layer files have
the same content / are the same representation of the original file.

An alternative might be directly removing the duplicate check in the
layer map, but I feel it would be good if we can prevent that in the
first place.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-07-17 17:26:29 +03:00
bojanserafimov
1aad8918e1 Document recommended ccls setup (#4723) 2023-07-17 09:21:42 -04:00
Christian Schwarz
966213f429 basebackup query metric: use same buckets as control plane (#4732)
The `CRITICAL_OPS_BUCKETS` is not useful for getting an accurate
picture of basebackup latency because all the observations
that negatively affect our SLI fall into one bucket, i.e., 100ms-1s.

Use the same buckets as control plane instead.
2023-07-17 13:46:13 +02:00
arpad-m
35e73759f5 Reword comment and add comment on race condition (#4725)
The race condition that caused #4526 is still not fixed, so point it out
in a comment. Also, reword a comment in upload.rs.

Follow-up of #4694
2023-07-17 12:49:58 +02:00
Vadim Kharitonov
48936d44f8 Update postgres version (#4727) 2023-07-16 13:40:59 +03:00
Em Sharnoff
2eae0a1fe5 Update vm-builder v0.12.1 -> v0.13.1 (#4728)
This should only affect the version of the vm-informant used. The only
change to the vm-informant from v0.12.1 to v0.13.1 was:

* https://github.com/neondatabase/autoscaling/pull/407

Just a typo fix; worth getting in anyways.
2023-07-15 15:38:15 -07:00
dependabot[bot]
53470ad12a Bump cryptography from 41.0.0 to 41.0.2 (#4724) 2023-07-15 14:36:13 +03:00
Alexander Bayandin
edccef4514 Make CI more friendly for external contributors (#4663)
## Problem

CI doesn't work for external contributors (for PRs from forks), see
#2222 for more information.

I'm proposing the following:
- External PR is created
- PR is reviewed so that it doesn't contain any malicious code
- Label `approved-for-ci-run` is added to that PR (by the reviewer)
- A new workflow picks up this label and creates an internal branch from
that PR (the branch name is `ci-run/pr-*`)
- CI is run on the branch, but the results are also propagated to the
PRs check
- We can merge a PR itself if it's green; if not — repeat.

## Summary of changes
- Create `approved-for-ci-run.yml` workflow which handles
`approved-for-ci-run` label
- Trigger `build_and_test.yml` and `neon_extra_builds.yml` workflows on
`ci-run/pr-*` branches
2023-07-15 11:58:15 +01:00
arpad-m
982fce1e72 Fix rustdoc warnings and test cargo doc in CI (#4711)
## Problem

`cargo +nightly doc` is giving a lot of warnings: broken links, naked
URLs, etc.

## Summary of changes

* update the `proc-macro2` dependency so that it can compile on latest
Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and
https://github.com/dtolnay/proc-macro2/issues/398
* allow the `private_intra_doc_links` lint, as linking to something
that's private is always more useful than just mentioning it without a
link: if the link breaks in the future, at least there is a warning due
to that. Also, one might enable
[`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options)
in the future and make these links work in general.
* fix all the remaining warnings given by `cargo +nightly doc`
* make it possible to run `cargo doc` on stable Rust by updating
`opentelemetry` and associated crates to version 0.19, pulling in a fix
that previously broke `cargo doc` on stable:
https://github.com/open-telemetry/opentelemetry-rust/pull/904
* Add `cargo doc` to CI to ensure that it won't get broken in the
future.

Fixes #2557

## Future work
* Potentially, it might make sense, for development purposes, to publish
the generated rustdocs somewhere, like for example [how the rust
compiler does
it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html).
I will file an issue for discussion.
2023-07-15 05:11:25 +03:00
Vadim Kharitonov
e767ced8d0 Update rust to 1.71.0 (#4718)
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-07-14 18:34:01 +02:00
Alex Chi Z
1309571f5d proxy: switch to structopt for clap parsing (#4714)
Using `#[clap]` for parsing cli opts, which is easier to maintain.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-14 19:11:01 +03:00
Joonas Koivunen
9a69b6cb94 Demote deletion warning, list files (#4688)
Handle test failures like:

```
AssertionError: assert not ['$ts  WARN delete_timeline{tenant_id=X timeline_id=Y}: About to remove 1 files\n']
```

Instead of logging:

```
WARN delete_timeline{tenant_id=X timeline_id=Y}: Found 1 files not bound to index_file.json, proceeding with their deletion
WARN delete_timeline{tenant_id=X timeline_id=Y}: About to remove 1 files
```

For each one operation of timeline deletion, list all unref files with
`info!`, and then continue to delete them with the added spice of
logging the rare/never happening non-utf8 name with `warn!`.

Rationale for `info!` instead of `warn!`: this is a normal operation;
like we had mentioned in `test_import.py` -- basically whenever we
delete a timeline which is not idle.

Rationale for N * (`ìnfo!`|`warn!`): symmetry for the layer deletions;
if we could ever need those, we could also need these for layer files
which are not yet mentioned in `index_part.json`.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-07-14 18:59:16 +03:00
Joonas Koivunen
cc82cd1b07 spanchecks: Support testing without tracing (#4682)
Tests cannot be ran without configuring tracing. Split from #4678.

Does not nag about the span checks when there is no subscriber
configured, because then the spans will have no links and nothing can be
checked. Sadly the `SpanTrace::status()` cannot be used for this.
`tracing` is always configured in regress testing (running with
`pageserver` binary), which should be enough.

Additionally cleans up the test code in span checks to be in the test
code. Fixes a `#[should_panic]` test which was flaky before these
changes, but the `#[should_panic]` hid the flakyness.

Rationale for need: Unit tests might not be testing only the public or
`feature="testing"` APIs which are only testable within `regress` tests
so not all spans might be configured.
2023-07-14 17:45:25 +03:00
Alex Chi Z
c76b74c50d semantic layer map operations (#4618)
## Problem

ref https://github.com/neondatabase/neon/issues/4373

## Summary of changes

A step towards immutable layer map. I decided to finish the refactor
with this new approach and apply
https://github.com/neondatabase/neon/pull/4455 on this patch later.

In this PR, we moved all modifications of the layer map to one place
with semantic operations like `initialize_local_layers`,
`finish_compact_l0`, `finish_gc_timeline`, etc, which is now part
of `LayerManager`. This makes it easier to build new features upon
this PR:

* For immutable storage state refactor, we can simply replace the layer
map with `ArcSwap<LayerMap>` and remove the `layers` lock. Moving
towards it requires us to put all layer map changes in a single place as
in https://github.com/neondatabase/neon/pull/4455.
* For manifest, we can write to manifest in each of the semantic
functions.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-07-13 17:35:27 +03:00
Alexey Kondratov
ed938885ff [compute_ctl] Fix deletion of template databases (#4661)
If database was created with `is_template true` Postgres doesn't allow
dropping it right away and throws error
```
ERROR:  cannot drop a template database
```
so we have to unset `is_template` first.

Fixing it, I noticed that our `escape_literal` isn't exactly correct
and following the same logic as in `quote_literal_internal`, we need to
prepend string with `E`. Otherwise, it's not possible to filter
`pg_database` using `escape_literal()` result if name contains `\`, for
example.

Also use `FORCE` to drop database even if there are active connections.
We run this from `cloud_admin`, so it should have enough privileges.

NB: there could be other db states, which prevent us from dropping
the database. For example, if db is used by any active subscription
or logical replication slot.
TODO: deal with it once we allow logical replication. Proper fix should
involve returning an error code to the control plane, so it could
figure out that this is a non-retryable error, return it to the user and
mark operation as permanently failed.

Related to neondatabase/cloud#4258
2023-07-13 13:18:35 +02:00
Conrad Ludgate
db4d094afa proxy: add more error cases to retry connect (#4707)
## Problem

In the logs, I noticed we still weren't retrying in some cases. Seemed
to be timeouts but we explicitly wanted to handle those

## Summary of changes

Retry on io::ErrorKind::TimedOut errors.
Handle IO errors in tokio_postgres::Error.
2023-07-13 11:47:27 +01:00
Conrad Ludgate
0626e0bfd3 proxy: refactor some error handling and shutdowns (#4684)
## Problem

It took me a while to understand the purpose of all the tasks spawned in
the main functions.

## Summary of changes

Utilising the type system and less macros, plus much more comments,
document the shutdown procedure of each task in detail
2023-07-13 11:03:37 +01:00
Stas Kelvich
444d6e337f add rfcs/022-user-mgmt.md (#3838)
Co-authored-by: Vadim Kharitonov <vadim@neon.tech>
2023-07-12 19:58:55 +02:00
Arthur Petukhovsky
3a1be9b246 Broadcast before exiting sync safekeepers (#4700)
Recently we started doing sync-safekeepers before exiting compute_ctl,
expecting that it will make next sync faster by skipping recovery. But
recovery is still running in some cases
(https://github.com/neondatabase/neon/pull/4574#issuecomment-1629256166)
because of the lagging truncateLsn. This PR should help with updating
truncateLsn.
2023-07-12 17:48:20 +01:00
arpad-m
664d32eb7f Don't propagate but log file not found error in layer uploading (#4694)
This addresses the issue in #4526 by adding a test that reproduces the
race condition that gave rise to the bug (or at least *a* race condition
that gave rise to the same error message), and then implementing a fix
that just prints a message to the log if a file could not been found for
uploading. Even though the underlying race condition is not fixed yet,
this will un-block the upload queue in that situation, greatly reducing
the impact of such a (rare) race.

Fixes #4526.
2023-07-12 18:10:49 +02:00
Alexander Bayandin
ed845b644b Prevent unintentional Postgres submodule update (#4692)
## Problem

Postgres submodule can be changed unintentionally, and these changes are
easy to miss during the review.

Adding a check that should prevent this from happening, the check fails
`build-neon` job with the following message:
```
Expected postgres-v14 rev to be at '1414141414141414141414141414141414141414', but it is at '1144aee1661c79eec65e784a8dad8bd450d9df79'
Expected postgres-v15 rev to be at '1515151515151515151515151515151515151515', but it is at '1984832c740a7fa0e468bb720f40c525b652835d'
Please update vendors/revisions.json if these changes are intentional.
```
This is an alternative approach to
https://github.com/neondatabase/neon/pull/4603

## Summary of changes
- Add `vendor/revisions.json` file with expected revisions
- Add built-time check (to `build-neon` job) that Postgres submodules
match revisions from `vendor/revisions.json`
- A couple of small improvements for logs from
https://github.com/neondatabase/neon/pull/4603
- Fixed GitHub autocomment for no tests was run case

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-07-12 15:12:37 +01:00
Joonas Koivunen
87dd37a2f2 pageserver: Align tenant, timeline id names in spans (#4687)
Uses `(tenant|timeline)_id`. Not a statement about endorsing this naming
style but it is better to be aligned.
2023-07-12 16:58:40 +03:00
arpad-m
1355bd0ac5 layer deletion: Improve a comment and fix TOCTOU (#4673)
The comment referenced an issue that was already closed. Remove that
reference and replace it with an explanation why we already don't print
an error.

See discussion in
https://github.com/neondatabase/neon/issues/2934#issuecomment-1626505916

For the TOCTOU fixes, the two calls after the `.exists()` both didn't
handle the situation well where the file was deleted after the initial
`.exists()`: one would assume that the path wasn't a file, giving a bad
error, the second would give an accurate error but that's not wanted
either.

We remove both racy `exists` and `is_file` checks, and instead just look
for errors about files not being found.
2023-07-12 15:52:14 +02:00
Conrad Ludgate
a1d6b1a4af proxy wake_compute loop (#4675)
## Problem

If we fail to wake up the compute node, a subsequent connect attempt
will definitely fail. However, kubernetes won't fail the connection
immediately, instead it hangs until we timeout (10s).

## Summary of changes

Refactor the loop to allow fast retries of compute_wake and to skip a
connect attempt.
2023-07-12 11:38:36 +01:00
bojanserafimov
92aee7e07f cold starts: basebackup compression (#4482)
Co-authored-by: Alex Chi Z <iskyzh@gmail.com>
2023-07-11 13:11:23 -04:00
Em Sharnoff
5e2f29491f Update vm-builder v0.11.1 -> v0.12.1 (#4680)
This should only affect the version of the vm-informant used. The only
PR changing the informant since v0.11.1 was:

* https://github.com/neondatabase/autoscaling/pull/389

The bug that autoscaling#389 fixed impacts all pooled VMs, so the
updated images from this PR must be released before
https://github.com/neondatabase/cloud/pull/5721.
2023-07-11 12:45:25 +02:00
bojanserafimov
618d36ee6d compute_ctl: log a structured event on successful start (#4679) 2023-07-10 15:34:26 -04:00
Alexander Bayandin
33c2d94ba6 Fix git-env version for PRs (#4641)
## Problem

Binaries created from PRs (both in docker images and for tests) have 
wrong git-env versions, they point to phantom merge commits.

## Summary of changes
- Prefer GIT_VERSION env variable even if git information was accessible
- Use `${{ github.event.pull_request.head.sha || github.sha }}` instead
of `${{ github.sha }}` for `GIT_VERSION` in workflows

So the builds will still happen from this phantom commit, but we will
report the PR commit.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-07-10 20:01:01 +01:00
Alex Chi Z
08bfe1c826 remove LayerDescriptor and use LayerObject for tests (#4637)
## Problem

part of https://github.com/neondatabase/neon/pull/4340

## Summary of changes

Remove LayerDescriptor and remove `todo!`. At the same time, this PR
adds `AsLayerDesc` trait for all persistent layers and changed
`LayerFileManager` to have a generic type. For tests, we are now using
`LayerObject`, which is a wrapper around `PersistentLayerDesc`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-10 19:40:37 +03:00
Christian Schwarz
65ff256bb8 page_service: add peer_addr span field, and set tenant_id / timeline_id fields earlier (#4638)
Before this PR, during shutdown, we'd find naked logs like this one for every active page service connection:

```
2023-07-05T14:13:50.791992Z  INFO shutdown request received in run_message_loop
```

This PR
1. adds a peer_addr span field to distinguish the connections in logs
2. sets the tenant_id / timeline_id fields earlier

It would be nice to have `tenant_id` and `timeline_id` directly on
 the `page_service_conn_main` span (empty, initially), then set
them at the top of `process_query`.
The problem is that the debug asserts for `tenant_id` and
`timeline_id` presence in the tracing span doesn't support
detecting empty values [1].
So, I'm a bit hesitant about over-using `Span::record`.

[1] https://github.com/neondatabase/neon/issues/4676
2023-07-10 15:23:40 +02:00
Alex Chi Z
5177c1e4b1 pagectl: separate xy margin for draw timeline (#4669)
We were computing margin by lsn range, but this will cause problems for
layer maps with large overlapping LSN range. Now we compute x, y margin
separately to avoid this issue.

## Summary of changes

before:

<img width="1651" alt="image"
src="https://github.com/neondatabase/neon/assets/4198311/3bfb50cb-960b-4d8f-9bbe-a55c89d82a28">

we have a lot of rectangles of negative width, and they disappear in the
layer map.

after:

<img width="1320" alt="image"
src="https://github.com/neondatabase/neon/assets/4198311/550f0f96-849f-4bdc-a852-b977499f04f4">


Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-10 09:22:06 -04:00
Christian Schwarz
49efcc3773 walredo: add tenant_id to span of NoLeakChild::drop (#4640)
We see the following log lines occasionally in prod:

```
kill_and_wait_impl{pid=1983042}: wait successful exit_status=signal: 9 (SIGKILL)
```

This PR makes it easier to find the tenant for the pid, by including the
tenant id as a field in the span.
2023-07-10 12:49:22 +03:00
Dmitry Rodionov
76b1cdc17e Order tenant_id argument before timeline_id, use references (#4671)
It started from few config methods that have various orderings and
sometimes use references sometimes not. So I unified path manipulation
methods to always order tenant_id before timeline_id and use referenced
because we dont need owned values.

Similar changes happened to call-sites of config methods.

I'd say its a good idea to always order tenant_id before timeline_id so
it is consistent across the whole codebase.
2023-07-10 10:23:37 +02:00
Alexander Bayandin
1f151d03d8 Dockerfile.compute-node: support arm64 (#4660)
## Problem

`docker build ... -f Dockerfile.compute-node ...` fails on ARM (I'm
checking on macOS).

## Summary of changes
- Download the arm version of cmake on arm
2023-07-07 18:21:15 +01:00
Conrad Ludgate
ac758e4f51 allow repeated IO errors from compute node (#4624)
## Problem

#4598 compute nodes are not accessible some time after wake up due to
kubernetes DNS not being fully propagated.

## Summary of changes

Update connect retry mechanism to support handling IO errors and
sleeping for 100ms

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-07-07 19:50:50 +03:00
arpad-m
4f280c2953 Small pageserver cleanups (#4657)
## Problem

I was reading the code of the page server today and found these minor
things that I thought could be cleaned up.

## Summary of changes

* remove a redundant indentation layer and continue in the flushing loop
* use the builtin `PartialEq` check instead of hand-rolling a `range_eq`
function
* Add a missing `>` to a prominent doc comment
2023-07-07 16:53:14 +02:00
Dmitry Rodionov
20137d9588 Polish tracing helpers (#4651)
Context: comments here: https://github.com/neondatabase/neon/pull/4645
2023-07-06 19:49:14 +03:00
Arseny Sher
634be4f4e0 Fix async write in safekeepers.
General Rust Write trait semantics (as well as its async brother) is that write
definitely happens only after Write::flush(). This wasn't needed in sync where
rust write calls the syscall directly, but is required in async.

Also fix setting initial end_pos in walsender, sometimes it was from the future.

fixes https://github.com/neondatabase/neon/issues/4518
2023-07-06 19:56:28 +04:00
Alex Chi Z
d340cf3721 dump more info in layer map (#4567)
A simple commit extracted from
https://github.com/neondatabase/neon/pull/4539

This PR adds more info for layer dumps (is_delta, is_incremental, size).

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-06 18:21:45 +03:00
Konstantin Knizhnik
1741edf933 Pageserver reconnect v2 (#4519)
## Problem

Compute is not always able to reconnect to pages server.
First of all it caused by long time of restart of pageserver.
So number of attempts is increased from 5 (hardcoded) to 60 (GUC).
Also we do not perform flush after each command to increase performance
(it is especially critical for prefetch).
Unfortunately such pending flush makes it not possible to transparently
reconnect to restarted pageserver.
What we can do is to try to minimzie such probabilty.
Most likely broken connection will be detected in first sens command
after some idle period.
This is why max_flush_delay parameter is added which force flush to be
performed after first request after some idle period.

See #4497

## Summary of changes

Add neon.max_reconnect_attempts and neon.max_glush_delay GUCs which
contol when flush has to be done
and when it is possible to try to reconnect to page server


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-07-06 12:47:43 +03:00
Joonas Koivunen
269e20aeab fix: filter out zero synthetic sizes (#4639)
Apparently sending the synthetic_size == 0 is causing problems, so
filter out sending zeros.

Slack discussion:
https://neondb.slack.com/archives/C03F5SM1N02/p1688574285718989?thread_ts=1687874910.681049&cid=C03F5SM1N02
2023-07-06 12:32:34 +03:00
Tomoka Hayashi
91435006bd Fix docker-compose file and document (#4621)
## Problem

- Running the command according to docker.md gives warning and error.
- Warning `permissions should be u=rw (0600) or less` is output when
executing `psql -h localhost -p 55433 -U cloud_admin`.
- `FATAL: password authentication failed for user "root”` is output in
compute logs.

## Summary of changes

- Add `$ chmod 600 ~/.pgpass` in docker.md to avoid warning.
- Add username (cloud_admin) to pg_isready command in docker-compose.yml
to avoid error.

---------

Co-authored-by: Tomoka Hayashi <tomoka.hayashi@ntt.com>
2023-07-06 10:11:24 +01:00
Dmitry Rodionov
b263510866 move some logical size bits to separate logical_size.rs 2023-07-06 11:58:41 +03:00
Dmitry Rodionov
e418fc6dc3 move some tracing related assertions to separate module for tenant and timeline 2023-07-06 11:58:41 +03:00
Dmitry Rodionov
434eaadbe3 Move uninitialized timeline from tenant.rs to timeline/uninit.rs 2023-07-06 11:58:41 +03:00
Alexander Bayandin
6fb7edf494 Compile pg_embedding extension (#4634)
```
CREATE EXTENSION embedding;
CREATE TABLE t (val real[]);
INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL);
CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3);
INSERT INTO t (val) VALUES (array[1,2,4]);

SELECT * FROM t ORDER BY val <-> array[3,3,3];
   val   
---------
 {1,2,3}
 {1,2,4}
 {1,1,1}
 {0,0,0}
 
(5 rows)
```
2023-07-05 18:40:25 +01:00
Christian Schwarz
505aa242ac page cache: add size metrics (#4629)
Make them a member of `struct PageCache` to prepare for a future
where there's no global state.
2023-07-05 15:36:42 +03:00
arpad-m
1c516906e7 Impl Display for LayerFileName and require it for Layer (#4630)
Does three things:

* add a `Display` impl for `LayerFileName` equal to the `short_id`
* based on that, replace the `Layer::short_id` function by a requirement
for a `Display` impl
* use that `Display` impl in the places where the `short_id` and `file_name()` functions were used instead

Fixes #4145
2023-07-05 14:27:50 +02:00
Christian Schwarz
7d7cd8375c callers of task_mgr::spawn: some top-level async blocks were missing tenant/timeline id (#4283)
Looking at logs from staging and prod, I found there are a bunch of log
lines without tenant / timeline context.

Manully walk through all task_mgr::spawn lines and fix that using the
least amount of work required.

While doing it, remove some redundant `shutting down` messages. 

refs https://github.com/neondatabase/neon/issues/4222
2023-07-05 14:04:05 +02:00
Vadim Kharitonov
c92b7543b5 Update pgvector to 0.4.4 (#4632)
After announcing `hnsw`, there is a hypothesis that the community will
start comparing it with `pgvector` by themselves. Therefore, let's have
an actual version of `pgvector` in Neon.
2023-07-05 13:39:51 +03:00
Stas Kelvich
dbf88cf2d7 Minimalistic pool for http endpoint compute connections (under opt-in flag)
Cache up to 20 connections per endpoint. Once all pooled connections
are used current implementation can open an extra connection, so the
maximum number of simultaneous connections is not enforced.

There are more things to do here, especially with background clean-up
of closed connections, and checks for transaction state. But current
implementation allows to check for smaller coonection latencies that
this cache should bring.
2023-07-05 12:00:03 +03:00
Konstantin Knizhnik
f1db87ac36 Check if there is enough memory for HNSW index (#4602)
## Problem

HNSW index is created in memory.
Try to prevent OOM by checking of available RAM.


## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-07-05 11:40:38 +03:00
Christian Schwarz
3f9defbfb4 page cache: add access & hit rate metrics (#4628)
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-07-05 10:38:32 +02:00
bojanserafimov
c7143dbde6 compute_ctl: Fix misleading metric (#4608) 2023-07-04 19:07:36 -04:00
Stas Kelvich
cbf9a40889 Set a shorter timeout for the initial connection attempti in proxy.
In case we try to connect to an outdated address that is no longer valid, the
default behavior of Kubernetes is to drop the packets, causing us to wait for
the entire timeout period. We want to fail fast in such cases.

A specific case to consider is when we have cached compute node information
with a 5-minute TTL (Time To Live), but the user has executed a `/suspend` API
call, resulting in the nonexistence of the compute node.
2023-07-04 20:34:22 +03:00
Joonas Koivunen
10aba174c9 metrics: Remove comments regarding upgradeable rwlocks (#4622)
Closes #4001 by removing the comments alluding towards
upgradeable/downgradeable RwLocks.
2023-07-04 17:40:51 +03:00
Conrad Ludgate
ab2ea8cfa5 use pbkdf2 crate (#4626)
## Problem

While pbkdf2 is a simple algorithm, we should probably use a well tested
implementation

## Summary of changes

* Use pbkdf2 crate
* Use arrays like the hmac comment says

## Checklist before requesting a review

- [X] I have performed a self-review of my code.
- [X] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-07-04 14:54:59 +01:00
arpad-m
9c8c55e819 Add _cached and _bytes to pageserver_tenant_synthetic_size metric name (#4616)
This renames the `pageserver_tenant_synthetic_size` metric to
`pageserver_tenant_synthetic_cached_size_bytes`, as was requested on
slack (link in the linked issue).

* `_cached` to hint that it is not incrementally calculated
* `_bytes` to indicate the unit the size is measured in

Fixes #3748
2023-07-03 19:34:07 +02:00
Conrad Ludgate
10110bee69 fix setup instructions (#4615)
## Problem

1. The local endpoints provision 2 ports (postgres and HTTP) which means
the migration_check endpoint has a different port than what the setup
implies
2. psycopg2-binary 2.9.3 has a deprecated poetry config and doesn't
install.

## Summary of changes

Update psycopg2-binary and update the endpoint ports in the readme

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-03 18:10:44 +03:00
Joonas Koivunen
cff7ae0b0d fix: no more ansi colored logs (#4613)
Allure does not support ansi colored logs, yet `compute_ctl` has them.

Upgrade criterion to get rid of atty dependency, disable ansi colors,
remove atty dependency and disable ansi feature of tracing-subscriber.

This is a heavy-handed approach. I am not aware of a workflow where
you'd want to connect a terminal directly to for example `compute_ctl`,
usually you find the logs in a file. If someone had been using colors,
they will now need to:
- turn the `tracing-subscriber.default-features` to `true`
- edit their wanted project to have colors

I decided to explicitly disable ansi colors in case we would have in
future a dependency accidentally enabling the feature on
`tracing-subscriber`, which would be quite surprising but not
unimagineable.

By getting rid of `atty` from dependencies we get rid of
<https://github.com/advisories/GHSA-g98v-hv3f-hcfr>.
2023-07-03 16:37:02 +03:00
Alexander Bayandin
78a7f68902 Make pg_version and build_type regular parameters (#4311)
## Problem

All tests have already been parametrised by Postgres version and build
type (to have them distinguishable in the Allure report), but despite
it, it's anyway required to have DEFAULT_PG_VERSION and BUILD_TYPE env
vars set to corresponding values, for example to
run`test_timeline_deletion_with_files_stuck_in_upload_queue[release-pg14-local_fs]`
test it's required to set `DEFAULT_PG_VERSION=14` and
`BUILD_TYPE=release`.
This PR makes the test framework pick up parameters from the test name
itself.

## Summary of changes
- Postgres version and build type related fixtures now are
function-scoped (instead of being sessions scoped before)
- Deprecate `--pg-version` argument in favour of DEFAULT_PG_VERSION env
variable (it's easier to parse)
- GitHub autocomment now includes only one command with all the failed
tests + runs them in parallel
2023-07-03 13:51:40 +01:00
Christian Schwarz
24eaa3b7ca timeline creation: reflect failures due to ancestor LSN issues in status code (#4600)
Before, it was a `500` and control plane would retry, wereas it actually
should have stopped retrying.

(Stacked on top of https://github.com/neondatabase/neon/pull/4597 )

fixes https://github.com/neondatabase/neon/issues/4595
part of https://github.com/neondatabase/cloud/issues/5626

---------

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-07-03 15:21:10 +03:00
Shany Pozin
26828560a8 Add timeouts and retries to consumption metrics reporting client (#4563)
## Problem
#4528
## Summary of changes
Add a 60 seconds default timeout to the reqwest client 
Add retries for up to 3 times to call into the metric consumption
endpoint

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-07-03 15:20:49 +03:00
Alek Westover
86604b3b7d Delete Unnecessary files in Extension Bucket (#4606)
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-03 07:37:30 -04:00
Joonas Koivunen
4957bb2d48 fix(proxy): stray span enter globbers up logs (#4612)
Prod logs have deep accidential span nesting. Introduced in #3759 and
has been untouched since, maybe no one watches proxy logs? :) I found it
by accident when looking to see if proxy logs have ansi colors with
`{neon_service="proxy"}`.

The solution is to mostly stop using `Span::enter` or `Span::entered` in
async code. Kept on `Span::entered` in cancel on shutdown related path.
2023-07-03 11:53:57 +01:00
Sasha Krassovsky
ff1a1aea86 Make control plane connector send encrypted password (#4607)
Control plane needs the encrypted hash that postgres itself generates
2023-06-30 14:17:44 -07:00
Em Sharnoff
c9f05d418d Bump vm-builder v0.11.0 -> v0.11.1 (#4605)
This applies the fix from https://github.com/neondatabase/autoscaling/pull/367,
which should resolve the "leaking cloud_admin connections" issue that
has been  observed for some customers.
2023-06-30 23:49:06 +03:00
bojanserafimov
9de1a6fb14 cold starts: Run sync_safekeepers on compute_ctl shutdown (#4588) 2023-06-30 16:29:47 -04:00
Konstantin Knizhnik
fbd37740c5 Make file_cache logging less verbose (#4601)
## Problem


Message "set local file cache limit to..." polutes compute logs.

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-06-30 21:55:56 +03:00
Anastasia Lubennikova
3e55d9dec6 Bump vendor/postgres
Point to REL_14_STABLE_neon and REL_15_STABLE_neon.
This change updates PostgreSQL versions to 14.8 and 15.3
2023-06-30 16:29:10 +03:00
Christian Schwarz
f558f88a08 refactor: distinguished error type for timeline creation failure (#4597)
refs https://github.com/neondatabase/neon/issues/4595
2023-06-30 14:53:21 +02:00
Dmitry Rodionov
b990200496 tests: use shortcut everywhere to get timeline path (#4586) 2023-06-30 15:01:06 +03:00
Alex Chi Z
7e20b49da4 refactor: use LayerDesc in LayerMap (part 2) (#4437)
## Problem

part of https://github.com/neondatabase/neon/issues/4392, continuation
of https://github.com/neondatabase/neon/pull/4408

## Summary of changes

This PR removes all layer objects from LayerMap and moves it to the
timeline struct. In timeline struct, LayerFileManager maps a layer
descriptor to a layer object, and it is stored in the same RwLock as
LayerMap to avoid behavior difference.

Key changes:

* LayerMap now does not have generic, and only stores descriptors.
* In Timeline, we add a new struct called layer mapping.
* Currently, layer mapping is stored in the same lock with layer map.
Every time we retrieve data from the layer map, we will need to map the
descriptor to the actual object.
* Replace_historic is moved to layer mapping's replace, and the return
value behavior is different from before. I'm a little bit unsure about
this part and it would be good to have some comments on that.
* Some test cases are rewritten to adapt to the new interface, and we
can decide whether to remove it in the future because it does not make
much sense now.
* LayerDescriptor is moved to `tests` module and should only be intended
for unit testing / benchmarks.
* Because we now have a usage pattern like "take the guard of lock, then
get the reference of two fields", we want to avoid dropping the
incorrect object when we intend to unlock the lock guard. Therefore, a
new set of helper function `drop_r/wlock` is added. This can be removed
in the future when we finish the refactor.

TODOs after this PR: fully remove RemoteLayer, and move LayerMapping to
a separate LayerCache.

all refactor PRs:

```
#4437 --- #4479 ------------ #4510 (refactor done at this point)
      \-- #4455 -- #4502 --/
```

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-29 15:06:07 -04:00
Alek Westover
032b603011 Fix: Wrong Enum Variant (#4589) 2023-06-29 10:55:02 -04:00
Alex Chi Z
ca0e0781c8 use const instead of magic number for repartition threshold (#4286)
There is a magic number about how often we repartition and therefore
affecting how often we compact. This PR makes this number `10` a global
constant and add docs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-29 16:56:17 +03:00
Alexander Bayandin
b2a5e91a88 Upload custom extensions to S3 (#4585)
## Problem

We want to have a number of custom extensions that will not be available
by default, as an example of such is [Postgres
Anonymizer](https://postgresql-anonymizer.readthedocs.io/en/stable/)
(please note that the extension should be added to
`shared_preload_libraries`). To distinguish them, custom extensions
should be added to a different S3 path:

```
s3://<bucket>/<release version>/<postgres_version>/<ext_name>/share/extensions/
s3://<bucket>/<release version>/<postgres_version>/<ext_name>/lib

where <ext_name> is an extension name
```

Resolves https://github.com/neondatabase/neon/issues/4582

## Summary of changes
- Add Postgres Anonymizer extension to Dockerfile (it's included only to
postgres-extensions target)
- Build extensions image from postgres-extensions target in a workflow
- Upload custom extensions to S3 (different directory)
2023-06-29 16:33:26 +03:00
Joonas Koivunen
44e7d5132f fix: hide token from logs (#4584)
fixes #4583 and also changes all needlessly arg listing places to use
`skip_all`.
2023-06-29 15:53:16 +03:00
Alex Chi Z
c19681bc12 neon_local: support force init (#4363)
When we use local SSD for bench and create `.neon` directory before we
do `cargo neon init`, the initialization process will error due to
directory already exists. This PR adds a flag `--force` that removes
everything inside the directory if `.neon` already exists.

---------

Signed-off-by: Alex Chi Z. <chi@neon.tech>
2023-06-28 11:39:07 -04:00
Shany Pozin
ec9b585837 Add Activating as a possible state for attaching a tenant in test_tenant_relocation validation (#4581)
Fix flakyness of test_tenant_relocation
2023-06-28 12:35:52 +03:00
Joonas Koivunen
02ef246db6 refactor: to pattern of await after timeout (#4432)
Refactor the `!completed` to be about `Option<_>` instead, side-stepping
any boolean true/false or false/true. As discussed on
https://github.com/neondatabase/neon/pull/4399#discussion_r1219321848
2023-06-28 06:18:45 +00:00
Konstantin Knizhnik
195d4932c6 Set LwLSN after WAL records when redo is performed or skipped (#4579)
## Problem

See #4516

Inspecting log it is possible to notice that if lwlsn is set to the
beginning of applied WAL record, then
incorrect version of the page is loaded:

```
2023-06-27 18:36:51.930 GMT [3273945] CONTEXT:  WAL redo at 0/14AF6F0 for Heap/INSERT: off 2 flags 0x01; blkref #0: rel 1663/5/1259, blk 0 FPW
2023-06-27 18:36:51.930 GMT [3273945] LOG:  Do REDO block 0 of rel 1663/5/1259 fork 0 at LSN 0/**014AF6F0**..0/014AFA60
2023-06-27 18:37:02.173 GMT [3273963] LOG:  Read blk 0 in rel 1663/5/1259 fork 0 (request LSN 0/**014AF6F0**): lsn=0/**0143C7F8** at character 22
2023-06-27 18:37:47.780 GMT [3273945] LOG:  apply WAL record at 0/1BB8F38 xl_tot_len=188, xl_prev=0/1BB8EF8 
2023-06-27 18:37:47.780 GMT [3273945] CONTEXT:  WAL redo at 0/1BB8F38 for Heap/INPLACE: off 2; blkref #0: rel 1663/5/1259, blk 0
2023-06-27 18:37:47.780 GMT [3273945] LOG:  Do REDO block 0 of rel 1663/5/1259 fork 0 at LSN 0/01BB8F38..0/01BB8FF8
2023-06-27 18:37:47.780 GMT [3273945] CONTEXT:  WAL redo at 0/1BB8F38 for Heap/INPLACE: off 2; blkref #0: rel 1663/5/1259, blk 0
2023-06-27 18:37:47.780 GMT [3273945] PANIC:  invalid lp
```

## Summary of changes

1. Use end record LSN  for both cases
2. Update lwlsn for relation metadata

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-28 09:04:13 +03:00
Alexander Bayandin
7fe0a4bf1a Fix promote-images job (#4577)
## Problem

```
+ crane tag neondatabase/extensions:3337 latest
Error: fetching "neondatabase/extensions:3337": GET https://index.docker.io/v2/neondatabase/extensions/manifests/3337: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3337
```

We don't build `neondatabase/extensions` image yet (broken in
https://github.com/neondatabase/neon/pull/4505)

## Summary of changes
- Do not try to interact with `neondatabase/extensions`
2023-06-27 20:05:10 +03:00
bojanserafimov
ef2b9ffbcb Basebackup forward compatibility (#4572) 2023-06-27 12:05:27 -04:00
Alexander Bayandin
250a27fb85 Upload Postgres Extensions to S3 (#4505)
## Problem

We want to store Postgres Extensions in S3 (resolves
https://github.com/neondatabase/neon/issues/4493).

Proposed solution:
- Create a separate docker image (from scratch) that contains only
extensions
- Do not include extensions into compute-node (except for neon
extensions)*
- For release and main builds upload extract extension from the image
and upload to S3 (`s3://<bucket>/<release
version>/<postgres_version>/...`)**

*) We're not doing it until the feature is not fully implemented
**) This differs from the initial proposal in
https://github.com/neondatabase/neon/issues/4493 of putting extensions
straight into `s3://<bucket>/<postgres_version>/...`, because we can't
upload directory atomicly. A drawback of this is that we end up with
unnecessary copies of files ~2.1 GB per release (i.e. +2.1 GB for each
commit in main also). We don't really need to update extensions for each
release if there're no relevant changes, but this requires extra work.

## Summary of changes
- Created a separate stage in Dockerfile.compute-node
`postgres-extensions` that contains only extensions
- Added a separate step in a workflow that builds `postgres-extensions`
image (because of a bug in kaniko this step is commented out because it
takes way too long to get built)
- Extract extensions from the image and upload files to S3 for release
and main builds
- Upload extenstions only for staging (for now)
2023-06-27 16:23:22 +01:00
Shany Pozin
d748615c1f RemoteTimelineClient::delete_all() to use s3::delete_objects (#4461)
## Problem
[#4325](https://github.com/neondatabase/neon/issues/4325)
## Summary of changes
Use delete_objects() method
2023-06-27 15:01:32 +03:00
Dmitry Rodionov
681c6910c2 Straighten the spec for timeline delete (#4538)
## Problem

Lets keep 500 for unusual stuff that is not considered normal. Came up
during one of the discussions around console logs now seeing this 500's.

## Summary of changes

- Return 409 Conflict instead of 500
- Remove 200 ok status because it is not used anymore
2023-06-27 13:56:32 +03:00
Vadim Kharitonov
148f0f9b21 Compile pg_roaringbitmap extension (#4564)
## Problem

```
postgres=# create extension roaringbitmap;
CREATE EXTENSION
postgres=# select roaringbitmap('{1,100,10}');
                 roaringbitmap
------------------------------------------------
 \x3a30000001000000000002001000000001000a006400
(1 row)
```
2023-06-27 10:55:03 +01:00
Shany Pozin
a7f3f5f356 Revert "run Layer::get_value_reconstruct_data in spawn_blocking#4498" (#4569)
This reverts commit 1faf69a698.
2023-06-27 10:57:28 +03:00
Felix Prasanna
00d1cfa503 bump VM_BUILDER_VERSION to 0.11.0 (#4566)
Routine bump of autoscaling version `0.8.0` -> `0.11.0`
2023-06-26 14:10:27 -04:00
Christian Schwarz
1faf69a698 run Layer::get_value_reconstruct_data in spawn_blocking (#4498)
This PR concludes the "async `Layer::get_value_reconstruct_data`"
project.

The problem we're solving is that, before this patch, we'd execute
`Layer::get_value_reconstruct_data` on the tokio executor threads.
This function is IO- and/or CPU-intensive.
The IO is using VirtualFile / std::fs; hence it's blocking.
This results in unfairness towards other tokio tasks, especially under
(disk) load.

Some context can be found at
https://github.com/neondatabase/neon/issues/4154
where I suspect (but can't prove) load spikes of logical size
calculation to
cause heavy eviction skew.

Sadly we don't have tokio runtime/scheduler metrics to quantify the
unfairness.
But generally, we know blocking the executor threads on std::fs IO is
bad.
So, let's have this change and watch out for severe perf regressions in
staging & during rollout.

## Changes

* rename `Layer::get_value_reconstruct_data` to
`Layer::get_value_reconstruct_data_blocking`
* add a new blanket impl'd `Layer::get_value_reconstruct_data`
`async_trait` method that runs `get_value_reconstruct_data_blocking`
inside `spawn_blocking`.
* The `spawn_blocking` requires `'static` lifetime of the captured
variables; hence I had to change the data flow to _move_ the
`ValueReconstructState` into and back out of get_value_reconstruct_data
instead of passing a reference. It's a small struct, so I don't expect a
big performance penalty.

## Performance

Fundamentally, the code changes cause the following performance-relevant
changes:

* Latency & allocations: each `get_value_reconstruct_data` call now
makes a short-lived allocation because `async_trait` is just sugar for
boxed futures under the hood
* Latency: `spawn_blocking` adds some latency because it needs to move
the work to a thread pool
* using `spawn_blocking` plus the existing synchronous code inside is
probably more efficient better than switching all the synchronous code
to tokio::fs because _each_ tokio::fs call does `spawn_blocking` under
the hood.
* Throughput: the `spawn_blocking` thread pool is much larger than the
async executor thread pool. Hence, as long as the disks can keep up,
which they should according to AWS specs, we will be able to deliver
higher `get_value_reconstruct_data` throughput.
* Disk IOPS utilization: we will see higher disk utilization if we get
more throughput. Not a problem because the disks in prod are currently
under-utilized, according to node_exporter metrics & the AWS specs.
* CPU utilization: at higher throughput, CPU utilization will be higher.

Slightly higher latency under regular load is acceptable given the
throughput gains and expected better fairness during disk load peaks,
such as logical size calculation peaks uncovered in #4154.


## Full Stack Of Preliminary PRs

This PR builds on top of the following preliminary PRs

1. Clean-ups
  * https://github.com/neondatabase/neon/pull/4316
  * https://github.com/neondatabase/neon/pull/4317
  * https://github.com/neondatabase/neon/pull/4318
  * https://github.com/neondatabase/neon/pull/4319
  * https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
thought we'd need in my original plan where we would need to convert
`Tenant::timelines` into an async locking primitive (#4333). In reviews,
we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485
7. https://github.com/neondatabase/neon/pull/4441
2023-06-26 11:43:11 +02:00
Christian Schwarz
44a441080d bring back spawn_blocking for compact_level0_phase1 (#4537)
The stats for `compact_level0_phase` that I added in #4527 show the
following breakdown (24h data from prod, only looking at compactions
with > 1 L1 produced):

* 10%ish of wall-clock time spent between the two read locks
* I learned that the `DeltaLayer::iter()` and `DeltaLayer::key_iter()`
calls actually do IO, even before we call `.next()`. I suspect that is
why they take so much time between the locks.
* 80+% of wall-clock time spent writing layer files
* Lock acquisition time is irrelevant (low double-digit microseconds at
most)
* The generation of the holes holds the read lock for a relatively long
time and it's proportional to the amount of keys / IO required to
iterate over them (max: 110ms in prod; staging (nightly benchmarks):
multiple seconds).

Find below screenshots from my ad-hoc spreadsheet + some graphs.

<img width="1182" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/81398b3f-6fa1-40dd-9887-46a4715d9194">

<img width="901" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/e4ac0393-f2c1-4187-a5e5-39a8b0c394c9">

<img width="210" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/7977ade7-6aa5-4773-a0a2-f9729aecee0d">


## Changes In This PR

This PR makes the following changes:

* rearrange the `compact_level0_phase1` code such that we build the
`all_keys_iter` and `all_values_iter` later than before
* only grab the `Timeline::layers` lock once, and hold it until we've
computed the holes
* run compact_level0_phase1 in spawn_blocking, pre-grabbing the
`Timeline::layers` lock in the async code and passing it in as an
`OwnedRwLockReadGuard`.
* the code inside spawn_blocking drops this guard after computing the
holds
* the `OwnedRwLockReadGuard` requires the `Timeline::layers` to be
wrapped in an `Arc`. I think that's Ok, the locking for the RwLock is
more heavy-weight than an additional pointer indirection.
 
## Alternatives Considered

The naive alternative is to throw the entire function into
`spawn_blocking`, and use `blocking_read` for `Timeline::layers` access.

What I've done in this PR is better because, with this alternative,
1. while we `blocking_read()`, we'd waste one slot in the spawn_blocking
pool
2. there's deadlock risk because the spawn_blocking pool is a finite
resource


![image](https://github.com/neondatabase/neon/assets/956573/46c419f1-6695-467e-b315-9d1fc0949058)

## Metadata

Fixes https://github.com/neondatabase/neon/issues/4492
2023-06-26 11:42:17 +02:00
Sasha Krassovsky
c215389f1c quote_ident identifiers when creating neon_superuser (#4562)
## Problem
2023-06-24 10:34:15 +03:00
Sasha Krassovsky
b1477b4448 Create neon_superuser role, grant it to roles created from control plane (#4425)
## Problem
Currently, if a user creates a role, it won't by default have any grants
applied to it. If the compute restarts, the grants get applied. This
gives a very strange UX of being able to drop roles/not have any access
to anything at first, and then once something triggers a config
application, suddenly grants are applied. This removes these grants.
2023-06-24 01:38:27 +03:00
Christian Schwarz
a500bb06fb use preinitialize_metrics to initialize page cache metrics (#4557)
This is follow-up to

```
commit 2252c5c282
Author: Alex Chi Z <iskyzh@gmail.com>
Date:   Wed Jun 14 17:12:34 2023 -0400

    metrics: convert some metrics to pageserver-level (#4490)
```
2023-06-23 16:40:50 -04:00
Christian Schwarz
15456625c2 don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker (#4560)
The consumption metrics synthetic size worker does logical size
calculation. Logical size calculation currently does synchronous disk
IO. This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving
other futures.

While there's work on the way to move the synchronous disk IO into
spawn_blocking, the quickfix here is to use the BACKGROUND_RUNTIME
instead of MGMT_REQUEST_RUNTIME.

Actually it's not just a quickfix. We simply shouldn't be blocking
MGMT_REQUEST_RUNTIME executor threads on CPU or sync disk IO.
That work isn't done yet, as many of the mgmt tasks still _do_ disk IO.
But it's not as intensive as the logical size calculations that we're
fixing here.

While we're at it, fix disk-usage-based eviction in a similar way. It
wasn't the culprit here, according to prod logs, but it can
theoretically be a little CPU-intensive.

More context, including graphs from Prod:
https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949
2023-06-23 15:40:36 -04:00
Vadim Kharitonov
a3f0dd2d30 Compile pg_uuidv7 (#4558)
Doc says that it should be added into `shared_preload_libraries`, but,
practically, it's not required.

```
postgres=# create extension pg_uuidv7;
CREATE EXTENSION
postgres=# SELECT uuid_generate_v7();
           uuid_generate_v7
--------------------------------------
 0188e823-3f8f-796c-a92c-833b0b2d1746
(1 row)
```
2023-06-23 15:56:49 +01:00
Christian Schwarz
76718472be add pageserver-global histogram for basebackup latency (#4559)
The histogram distinguishes by ok/err.

I took the liberty to create a small abstraction for such use cases.
It helps keep the label values inside `metrics.rs`, right next
to the place where the metric and its labels are declared.
2023-06-23 16:42:12 +02:00
Alexander Bayandin
c07b6ffbdc Fix git tag name for release (#4545)
## Problem

A git tag for a release has an extra `release-` prefix (it looks like
`release-release-3439`).
 
## Summary of changes
- Do not add `release-` prefix when create git tag
2023-06-23 12:52:17 +01:00
Alexander Bayandin
6c3605fc24 Nightly Benchmarks: Increase timeout for pgbench-compare job (#4551)
## Problem

In the test environment vacuum duration fluctuates from ~1h to ~5h, along
with another two 1h benchmarks (`select-only` and `simple-update`) it
could be up to 7h which is longer than 6h timeout.

## Summary of changes
- Increase timeout for pgbench-compare job to 8h
- Remove 6h timeouts from Nightly Benchmarks (this is a default value)
2023-06-23 12:47:37 +01:00
Vadim Kharitonov
d96d51a3b7 Update rust to 1.70.0 (#4550) 2023-06-23 13:09:04 +02:00
Alex Chi Z
a010b2108a pgserver: better template config file (#4554)
* `compaction_threshold` should be an integer, not a string.
* uncomment `[section]` so that if a user needs to modify the config,
they can simply uncomment the corresponding line. Otherwise it's easy
for us to forget uncommenting the `[section]` when uncommenting the
config item we want to configure.

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-23 10:18:06 +03:00
Anastasia Lubennikova
2f618f46be Use BUILD_TAG in compute_ctl binary. (#4541)
Pass BUILD_TAG to compute_ctl binary. 
We need it to access versioned extension storage.
2023-06-22 17:06:16 +03:00
Alexander Bayandin
d3aa8a48ea Update client libs for test_runner/pg_clients to their latest versions (#4547)
Resolves https://github.com/neondatabase/neon/security/dependabot/27
2023-06-21 16:20:35 +01:00
Christian Schwarz
e4da76f021 update_gc_info: fix typo in timeline_id tracing field (#4546)
Commit

```
commit 472cc17b7a
Author: Dmitry Rodionov <dmitry@neon.tech>
Date:   Thu Jun 15 17:30:12 2023 +0300

    propagate lock guard to background deletion task (#4495)
```

did a drive-by fix, but, the drive-by had a typo.

```
gc_loop{tenant_id=2e2f2bff091b258ac22a4c4dd39bd25d}:update_gc_info{timline_id=837c688fd37c903639b9aa0a6dd3f1f1}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000024DA0D1-000000000443FB51}:panic{thread=background op worker location=pageserver/src/tenant/timeline.rs:4843:25}: missing extractors: ["TimelineId"]

Stack backtrace:
   0: utils::logging::tracing_panic_hook
             at /libs/utils/src/logging.rs:166:21
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:579:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18
   5: rust_begin_unwind
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
   6: core::panicking::panic_fmt
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
   7: pageserver::tenant::timeline::debug_assert_current_span_has_tenant_and_timeline_id
             at /pageserver/src/tenant/timeline.rs:4843:25
   8: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}
             at /pageserver/src/tenant/timeline.rs:4368:9
   9: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}> as core::future::future::Future>::poll
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
  10: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}
             at /pageserver/src/tenant/timeline.rs:4363:5
  11: <pageserver::tenant::timeline::Timeline>::get_reconstruct_data::{closure#0}
             at /pageserver/src/tenant/timeline.rs:2618:69
  12: <pageserver::tenant::timeline::Timeline>::get::{closure#0}
             at /pageserver/src/tenant/timeline.rs:565:13
  13: <pageserver::tenant::timeline::Timeline>::list_slru_segments::{closure#0}
             at /pageserver/src/pgdatadir_mapping.rs:427:42
  14: <pageserver::tenant::timeline::Timeline>::is_latest_commit_timestamp_ge_than::{closure#0}
             at /pageserver/src/pgdatadir_mapping.rs:390:13
  15: <pageserver::tenant::timeline::Timeline>::find_lsn_for_timestamp::{closure#0}
             at /pageserver/src/pgdatadir_mapping.rs:338:17
  16: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}
             at /pageserver/src/tenant/timeline.rs:3967:71
  17: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}> as core::future::future::Future>::poll
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
  18: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}
             at /pageserver/src/tenant/timeline.rs:3948:5
  19: <pageserver::tenant::Tenant>::refresh_gc_info_internal::{closure#0}
             at /pageserver/src/tenant.rs:2687:21
  20: <pageserver::tenant::Tenant>::gc_iteration_internal::{closure#0}
             at /pageserver/src/tenant.rs:2551:13
  21: <pageserver::tenant::Tenant>::gc_iteration::{closure#0}
             at /pageserver/src/tenant.rs:1490:13
  22: pageserver::tenant::tasks::gc_loop::{closure#0}::{closure#0}
             at /pageserver/src/tenant/tasks.rs:187:21
  23: pageserver::tenant::tasks::gc_loop::{closure#0}
             at /pageserver/src/tenant/tasks.rs:208:5
```

## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-21 18:00:14 +03:00
Christian Schwarz
870740c949 cargo update -p openssl (#4542)
To unblock release
https://github.com/neondatabase/neon/pull/4536#issuecomment-1600678054

Context: https://rustsec.org/advisories/RUSTSEC-2023-0044
2023-06-21 15:50:52 +03:00
Dmitry Rodionov
75d583c04a Tenant::load: fix uninit timeline marker processing (#4458)
## Problem

During timeline creation we create special mark file which presense
indicates that initialization didnt complete successfully. In case of a
crash restart we can remove such half-initialized timeline and following
retry from control plane side should perform another attempt.

So in case of a possible crash restart during initial loading we have
following picture:

```
timelines
| - <timeline_id>___uninit
| - <timeline_id>
| - | <timeline files>
```

We call `std::fs::read_dir` to walk files in `timelines` directory one
by one. If we see uninit file
we proceed with deletion of both, timeline directory and uninit file. If
we see timeline we check if uninit file exists and do the same cleanup.
But in fact its possible to get both branches to be true at the same
time. Result of readdir doesnt reflect following directory state
modifications. So you can still get "valid" entry on the next iteration
of the loop despite the fact that it was deleted in one of the previous
iterations of the loop.

To see that you can apply the following patch (it disables uninit mark
cleanup on successful timeline creation):
```diff
diff --git a/pageserver/src/tenant.rs b/pageserver/src/tenant.rs
index 4beb2664..b3cdad8f 100644
--- a/pageserver/src/tenant.rs
+++ b/pageserver/src/tenant.rs
@@ -224,11 +224,6 @@ impl UninitializedTimeline<'_> {
                             )
                         })?;
                 }
-                uninit_mark.remove_uninit_mark().with_context(|| {
-                    format!(
-                        "Failed to remove uninit mark file for timeline {tenant_id}/{timeline_id}"
-                    )
-                })?;
                 v.insert(Arc::clone(&new_timeline));
 
                 new_timeline.maybe_spawn_flush_loop();
```
And perform the following steps:

```bash
neon_local init
neon_local start
neon_local tenant create
neon_local stop
neon_local start
```

The error is:
```log
INFO load{tenant_id=X}:blocking: Found an uninit mark file .neon/tenants/X/timelines/Y.___uninit, removing the timeline and its uninit mark
2023-06-09T18:43:41.664247Z ERROR load{tenant_id=X}: load failed, setting tenant state to Broken: failed to load metadata

Caused by:
    0: Failed to read metadata bytes from path .neon/tenants/X/timelines/Y/metadata
    1: No such file or directory (os error 2)
```

So uninit mark got deleted together with timeline directory but we still
got directory entry for it and tried to load it.

The bug prevented tenant from being successfully loaded.

## Summary of changes

Ideally I think we shouldnt place uninit marks in the same directory as timeline directories but move them to separate directory and
gather them as an input to actual listing, but that would be sort of an
on-disk format change, so just check whether entries are still valid
before operating on them.
2023-06-21 14:25:58 +03:00
Alek Westover
b4c5beff9f list_files function in remote_storage (#4522) 2023-06-20 15:36:28 -04:00
bojanserafimov
90e1f629e8 Add test for skip_pg_catalog_updates (#4530) 2023-06-20 11:38:59 -04:00
Alek Westover
2023e22ed3 Add RelationError error type to pageserver rather than string parsing error messages (#4508) 2023-06-19 13:14:20 -04:00
Christian Schwarz
036fda392f log timings for compact_level0_phase1 (#4527)
The data will help decide whether it's ok
to keep holding Timeline::layers in shared mode until
after we've calculated the holes.

Other timings are to understand the general breakdown
of timings in that function.

Context: https://github.com/neondatabase/neon/issues/4492
2023-06-19 17:25:57 +03:00
Arseny Sher
557abc18f3 Fix test_s3_wal_replay assertion flakiness.
Supposedly fixes https://github.com/neondatabase/neon/issues/4277
2023-06-19 16:08:20 +04:00
Arseny Sher
3b06a5bc54 Raise pageserver walreceiver timeouts.
I observe sporadic reconnections with ~10k idle computes. It looks like a
separate issue, probably walreceiver runtime gets blocked somewhere, but in any
case 2-3 seconds is too small.
2023-06-19 15:59:38 +04:00
Alexander Bayandin
1b947fc8af test_runner: workaround rerunfailures and timeout incompatibility (#4469)
## Problem

`pytest-timeout` and `pytest-rerunfailures` are incompatible (or rather
not fully compatible). Timeouts aren't set for reruns.

Ref https://github.com/pytest-dev/pytest-rerunfailures/issues/99

## Summary of changes
- Dynamically make timeouts `func_only` for tests that we're going to
retry. It applies timeouts for reruns as well.
2023-06-16 18:08:11 +01:00
Christian Schwarz
78082d0b9f create_delta_layer: avoid needless stat (#4489)
We already do it inside `frozen_layer.write_to_disk()`.

Context:
https://github.com/neondatabase/neon/pull/4441#discussion_r1228083959
2023-06-16 16:54:41 +02:00
Alexander Bayandin
190c3ba610 Add tags for releases (#4524)
## Problem

It's not a trivial task to find corresponding changes for a particular
release (for example, for 3371 — 🤷)

Ref:
https://neondb.slack.com/archives/C04BLQ4LW7K/p1686761537607649?thread_ts=1686736854.174559&cid=C04BLQ4LW7K

## Summary of changes
- Tag releases
- Add a manual trigger for the release workflow
2023-06-16 14:17:37 +01:00
Christian Schwarz
14d495ae14 create_delta_layer: improve misleading TODO comment (#4488)
Context:
https://github.com/neondatabase/neon/pull/4441#discussion_r1228086608
2023-06-16 14:23:55 +03:00
Dmitry Rodionov
472cc17b7a propagate lock guard to background deletion task (#4495)
## Problem

1. During the rollout we got a panic: "timeline that we were deleting
was concurrently removed from 'timelines' map" that was caused by lock
guard not being propagated to the background part of the deletion.
Existing test didnt catch it because failpoint that was used for
verification was placed earlier prior to background task spawning.
2. When looking at surrounding code one more bug was detected. We
removed timeline from the map before deletion is finished, which breaks
client retry logic, because it will indicate 404 before actual deletion
is completed which can lead to client stopping its retry poll earlier.

## Summary of changes

1. Carry the lock guard over to background deletion. Ensure existing
test case fails without applied patch (second deletion becomes stuck
without it, which eventually leads to a test failure).
2. Move delete_all call earlier so timeline is removed from the map is
the last thing done during deletion.

Additionally I've added timeline_id to the `update_gc_info` span,
because `debug_assert_current_span_has_tenant_and_timeline_id` in
`download_remote_layer` was firing when `update_gc_info` lead to
on-demand downloads via `find_lsn_for_timestamp` (caught by @problame).
This is not directly related to the PR but fixes possible flakiness.

Another smaller set of changes involves deletion wrapper used in python
tests. Now there is a simpler wrapper that waits for deletions to
complete `timeline_delete_wait_completed`. Most of the
test_delete_timeline.py tests make negative tests, i.e., "does
ps_http.timeline_delete() fail in this and that scenario".
These can be left alone. Other places when we actually do the deletions,
we need to use the helper that polls for completion.

Discussion
https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639

resolves #4496

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-06-15 17:30:12 +03:00
Arthur Petukhovsky
76413a0fb8 Revert reconnect_timeout to improve performance (#4512)
Default value for `wal_acceptor_reconnect_timeout` was changed in
https://github.com/neondatabase/neon/pull/4428 and it affected
performance up to 20% in some cases.

Revert the value back.
2023-06-15 15:26:59 +03:00
Alexander Bayandin
e60b70b475 Fix data ingestion scripts (#4515)
## Problem

When I switched `psycopg2.connect` from context manager to a regular
function call in https://github.com/neondatabase/neon/pull/4382 I 
embarrassingly forgot about commit, so it doesn't really put data into DB 😞

## Summary of changes
- Enable autocommit for data ingestion scripts
2023-06-15 15:01:06 +03:00
Alex Chi Z
2252c5c282 metrics: convert some metrics to pageserver-level (#4490)
## Problem

Some metrics are better to be observed at page-server level. Otherwise,
as we have a lot of tenants in production, we cannot do a sum b/c
Prometheus has limit on how many time series we can aggregate. This also
helps reduce metrics scraping size.

## Summary of changes

Some integration tests are likely not to pass as it will check the
existence of some metrics. Waiting for CI complete and fix them.

Metrics downgraded: page cache hit (where we are likely to have a
page-server level page cache in the future instead of per-tenant), and
reconstruct time (this would better be tenant-level, as we have one pg
replayer for each tenant, but now we make it page-server level as we do
not need that fine-grained data).

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-14 17:12:34 -04:00
Alexander Bayandin
94f315d490 Remove neon-image-depot job (#4506)
## Problem

`neon-image-depot` is an experimental job we use to compare with the
main `neon-image` job.
But it's not stable and right now we don't have the capacity to properly
fix and evaluate it.
We can come back to this later.

## Summary of changes

Remove `neon-image-depot` job
2023-06-14 19:03:09 +01:00
Christian Schwarz
cd3faa8c0c test_basic_eviction: avoid some sources of flakiness (#4504)
We've seen the download_layer() call return 304 in prod because of a
spurious on-demand download caused by a GetPage request from compute.

Avoid these and some other sources of on-demand downloads by shutting
down compute, SKs, and by disabling background loops.

CF
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4498/5258914461/index.html#suites/2599693fa27db8427603ba822bcf2a20/357808fd552fede3
2023-06-14 19:04:22 +02:00
Arthur Petukhovsky
a7a0c3cd27 Invalidate proxy cache in http-over-sql (#4500)
HTTP queries failed with errors `error connecting to server: failed to
lookup address information: Name or service not known\n\nCaused by:\n
failed to lookup address information: Name or service not known`

The fix reused cache invalidation logic in proxy from usual postgres
connections and added it to HTTP-over-SQL queries.

Also removed a timeout for HTTP request, because it almost never worked
on staging (50s+ time just to start the compute), and we can have the
similar case in production. Should be ok, since we have a limits for the
requests and responses.
2023-06-14 19:24:46 +03:00
Dmitry Rodionov
ee9a5bae43 Filter only active timelines for compaction (#4487)
Previously we may've included Stopping/Broken timelines here, which
leads to errors in logs -> causes tests to sporadically fail

resolves #4467
2023-06-14 19:07:42 +03:00
Alexander Bayandin
9484b96d7c GitHub Autocomment: do not fail the job (#4478)
## Problem

If the script fails to generate a test summary, the step also fails the
job/workflow (despite this could be a non-fatal problem).

## Summary of changes
- Separate JSON parsing and summarisation into separate functions
- Wrap functions calling into try..catch block, add an error message to
GitHub comment and do not fail the step
- Make `scripts/comment-test-report.js` a CLI script that can be run
locally (mock GitHub calls) to make it easier to debug issues locally
2023-06-14 15:07:30 +01:00
Shany Pozin
ebee8247b5 Move s3 delete_objects to use chunks of 1000 OIDs (#4463)
See
https://github.com/neondatabase/neon/pull/4461#pullrequestreview-1474240712
2023-06-14 15:38:01 +03:00
bojanserafimov
3164ad7052 compute_ctl: Spec parser forward compatibility test (#4494) 2023-06-13 21:48:09 -04:00
Alexander Bayandin
a0b3990411 Retry data ingestion scripts on connection errors (#4382)
## Problem

From time to time, we're catching a race condition when trying to upload
perf or regression test results.

Ref:
- https://neondb.slack.com/archives/C03H1K0PGKH/p1685462717870759
- https://github.com/neondatabase/cloud/issues/3686

## Summary of changes

Wrap `psycopg2.connect` method with `@backoff.on_exception`
contextmanager
2023-06-13 22:33:42 +01:00
Stas Kelvich
4385e0c291 Return more RowDescription fields via proxy json endpoint
As we aim to align client-side behavior with node-postgres, it's necessary for us to return these fields, because node-postgres does so as well.
2023-06-13 22:31:18 +03:00
Christian Schwarz
3693d1f431 turn Timeline::layers into tokio::sync::RwLock (#4441)
This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).

# Full Stack Of Preliminary PRs

Thanks to the countless preliminary PRs, this conversion is relatively
straight-forward.

1. Clean-ups
  * https://github.com/neondatabase/neon/pull/4316
  * https://github.com/neondatabase/neon/pull/4317
  * https://github.com/neondatabase/neon/pull/4318
  * https://github.com/neondatabase/neon/pull/4319
  * https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
   thought we'd need in my original plan where we would need to convert
   `Tenant::timelines` into an async locking primitive (#4333). In reviews,
   we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485

# Significant Changes In This PR

## `compact_level0_phase1` & `create_delta_layer`

This commit partially reverts

   "pgserver: spawn_blocking in compaction (#4265)"
    4e359db4c7.

Specifically, it reverts the `spawn_blocking`-ificiation of
`compact_level0_phase1`.
If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()`
inside `compact_level0_phase1`. That would use up a thread in the
`spawn_blocking` thread pool, which is hard-capped.

I considered wrapping the code that follows the second
`layers.read().await` into `spawn_blocking`, but there are lifetime
issues with `deltas_to_compact`.

Also, this PR switches the `create_delta_layer` _function_ back to
async, and uses `spawn_blocking` inside to run the code that does sync
IO, while keeping the code that needs to lock `Timeline::layers` async.

## `LayerIter` and `LayerKeyIter` `Send` bounds

I had to add a `Send` bound on the `dyn` type that `LayerIter`
and `LayerKeyIter` wrap. Why? Because we now have the second
`layers.read().await` inside `compact_level0_phase`, and these
iterator instances are held across that await-point.

More background:
https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960

## `DatadirModification::flush`

Needed to replace the `HashMap::retain` with a hand-rolled variant
because `TimelineWriter::put` is now async.
2023-06-13 18:38:41 +02:00
Christian Schwarz
fdf7a67ed2 init_empty_layer_map: use try_write (#4485)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
Or more specifically, #4441, where we turn Timeline::layers into a
tokio::sync::RwLock.

By using try_write() here, we can avoid turning init_empty_layer_map
async,
which is nice because much of its transitive call(er) graph isn't async.
2023-06-13 13:49:40 +02:00
Alexey Kondratov
1299df87d2 [compute_ctl] Fix logging if catalog updates are skipped (#4480)
Otherwise, it wasn't clear from the log when Postgres started up
completely if catalog updates were skipped.

Follow-up for 4936ab6
2023-06-13 13:34:56 +02:00
Christian Schwarz
754ceaefac make TimelineWriter Send by using tokio::sync Mutex internally (#4477)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).

There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.

That will require the `TimelineWriter` to become async, because at times
its functions need to lock `Timeline::layers` in order to freeze the
open layer.

While doing that, rustc complains that we're now holding
`Timeline::write_lock` across await points (lock order is that
`write_lock` must be acquired before `Timelines::layers`).

So, we need to switch it over to an async primitive.
2023-06-13 10:15:25 +02:00
Arseny Sher
143fa0da42 Remove timeout on test_close_on_connections_exit
We have 300s timeout on all tests, and doubling logic in popen.wait sometimes
exceeds 5s, making the test flaky.

ref https://github.com/neondatabase/neon/issues/4211
2023-06-13 06:26:03 +04:00
bojanserafimov
4936ab6842 compute_ctl: add flag to avoid config step (#4457)
Add backwards-compatible flag that cplane can use to speed up startup time
2023-06-12 13:57:02 -04:00
Christian Schwarz
939593d0d3 refactor check_checkpoint_distance to prepare for async Timeline::layers (#4476)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).

There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.

That will require the `TimelineWriter` to become async.

That will require `freeze_inmem_layer` to become async.

So, inside check_checkpoint_distance, we will have
`freeze_inmem_layer().await`.

But current rustc isn't smart enough to understand that we
`drop(layers)` earlier, and hence, will complain about the `!Send`
`layers` being held across the `freeze_inmem_layer().await`-point.

This patch puts the guard into a scope, so rustc will shut up in the
next patch where we make the transition for `TimelineWriter`.

obsoletes https://github.com/neondatabase/neon/pull/4474
2023-06-12 17:45:56 +01:00
Christian Schwarz
2011cc05cd make Delta{Value,Key}Iter Send (#4472)
... by switching the internal RwLock to a OnceCell.

This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).

See https://github.com/neondatabase/neon/pull/4462#issuecomment-1587398883
for more context.

fixes https://github.com/neondatabase/neon/issues/4471
2023-06-12 17:45:56 +01:00
Arthur Petukhovsky
b0286e3c46 Always truncate WAL after restart (#4464)
c058e1cec2 skipped `truncate_wal()` it if `write_lsn` is equal to
truncation position, but didn't took into account that `write_lsn` is
reset on restart.

Fixes regression looking like:

```
ERROR WAL acceptor{cid=22 ...}:panic{thread=WAL acceptor 19b6c1743666ec02991a7633c57178db/b07db8c88f4c76ea5ed0954c04cc1e74 location=safekeeper/src/wal_storage.rs:230:13}: unexpected write into non-partial segment file
```

This fix will prevent skipping WAL truncation when we are running for
the first time after restart.
2023-06-12 13:42:28 +00:00
Heikki Linnakangas
e4f05ce0a2 Enable sanity check that disk_consistent_lsn is valid on created timeline.
Commit `create_test_timeline: always put@initdb_lsn the minimum required keys`
already switched us over to using valid initdb_lsns.

All that's left to do is to actually flush the minimum keys so that
we move from disk_consistent_lsn=Lsn(0) to disk_consistent_lsn=initdb_lsn.

Co-authored-by: Christian Schwarz <christian@neon.tech>

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Heikki Linnakangas
8d106708d7 Clean up timeline initialization code.
Clarify who's responsible for initializing the layer map. There were
previously two different ways to do it:

- create_empty_timeline and bootstrap_timeline let prepare_timeline()
  initialize an empty layer map.

- branch_timeline passed a flag to initialize_with_lock() to tell
  initialize_with_lock to call load_layer_map(). Because it was a
  newly created timeline, load_layer_map() never found any layer
  files, so it just initialized an empty layer map.

With this commit, prepare_new_timeline() always does it. The LSN to
initialize it with is passed as argument.

Other changes per function:

prepare_timeline:
- rename to 'prepare_new_timeline' to make it clear that it's only used
  when creating a new timeline, not when loading an existing timeline
- always initialize an empty layer map. The caller can pass the LSN to
  initialize it with. (Previously, prepare_timeline would optionally
  load the layer map at 'initdb_lsn'. Some caller used that, while others
  let initialize_with_lock do it

initialize_with_lock:
- As mentioned above, remove the option to load the layer map
- Acquire the 'timelines' lock in the function itself. None of the callers
  did any other work while holding the lock.
- Rename it to finish_creation() to make its intent more clear. It's only
  used when creating a new timeline now.

create_timeline_data:
- Rename to create_timeline_struct() for clarity. It just initializes
  the Timeline struct, not any other "data"

create_timeline_files:
- use create_dir rather than create_dir_all, to be a little more strict.
  We know that the parent directory should already exist, and the timeline
  directory should not exist.
- Move the call to create_timeline_struct() to the caller. It was just
  being "passed through"

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Christian Schwarz
f450369b20 timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map
This patch inlines `initialize_with_lock` and then reorganizes the code
such that we can `load_layer_map` without holding the
`Tenant::timelines` lock.

As a nice aside, we can get rid of the dummy() uninit mark, which has
always been a terrible hack.

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Christian Schwarz
aad918fb56 create_test_timeline: tests for put@initdb_lsn optimization code 2023-06-12 11:04:49 +01:00
Christian Schwarz
86dd8c96d3 add infrastructure to expect use of initdb_lsn flush optimization 2023-06-12 11:04:49 +01:00
Christian Schwarz
6a65c4a4fe create_test_timeline: always put@initdb_lsn the minimum required keys (#4451)
See the added comment on `create_empty_timeline`.

The various test cases now need to set a valid `Lsn` instead of
`Lsn(0)`.

Rough context:
https://github.com/neondatabase/neon/pull/4364#discussion_r1221995691
2023-06-12 09:28:34 +00:00
Vadim Kharitonov
e9072ee178 Compile rdkit (#4442)
`rdkit` extension

```
postgres=# create extension rdkit;
CREATE EXTENSION
postgres=# select 'c1[o,s]ncn1'::qmol;
    qmol
-------------
 c1[o,s]ncn1
(1 row)
```
2023-06-12 11:13:33 +02:00
Joonas Koivunen
7e17979d7a feat: http request logging on safekeepers.
With RequestSpan, successfull GETs are not logged, but all others, errors and
warns on cancellations are.
2023-06-11 22:53:08 +04:00
Arseny Sher
227271ccad Switch safekeepers to async.
This is a full switch, fs io operations are also tokio ones, working through
thread pool. Similar to pageserver, we have multiple runtimes for easier `top`
usage and isolation.

Notable points:
- Now that guts of safekeeper.rs are full of .await's, we need to be very
  careful not to drop task at random point, leaving timeline in unclear
  state. Currently the only writer is walreceiver and we don't have top
  level cancellation there, so we are good. But to be safe probably we should
  add a fuse panicking if task is being dropped while operation on a timeline
  is in progress.
- Timeline lock is Tokio one now, as we do disk IO under it.
- Collecting metrics got a crutch: since prometheus Collector is
  synchronous, it spawns a thread with current thread runtime collecting data.
- Anything involving closures becomes significantly more complicated, as
  async fns are already kinda closures + 'async closures are unstable'.
- Main thread now tracks other main tasks, which got much easier.
- The only sync place left is initial data loading, as otherwise clippy
  complains on timeline map lock being held across await points -- which is
  not bad here as it happens only in single threaded runtime of main thread.
  But having it sync doesn't hurt either.

I'm concerned about performance of thread pool io offloading, async traits and
many await points; but we can try and see how it goes.

fixes https://github.com/neondatabase/neon/issues/3036
fixes https://github.com/neondatabase/neon/issues/3966
2023-06-11 22:53:08 +04:00
dependabot[bot]
fbf0367e27 build(deps): bump cryptography from 39.0.1 to 41.0.0 (#4409) 2023-06-11 19:14:30 +01:00
Arthur Petukhovsky
a21b55fe0b Use connect_timeout for broker::connect (#4452)
Use `storage_broker::connect` everywhere. Add a default 5 seconds
timeout for opening new connection.
2023-06-09 17:38:53 +03:00
Shany Pozin
add51e1372 Add delete_objects to storage api (#4449)
## Summary of changes
Add missing delete_objects API to support bulk deletes
2023-06-09 13:23:12 +03:00
Alex Chi Z
cdce04d721 pgserver: add local manifest for atomic operation (#4422)
## Problem

Part of https://github.com/neondatabase/neon/issues/4418

## Summary of changes

This PR implements the local manifest interfaces. After the refactor of
timeline is done, we can integrate this with the current storage. The
reader will stop at the first corrupted record.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-08 19:34:25 -04:00
bojanserafimov
6bac770811 Add cold start test (#4436) 2023-06-08 18:11:33 -04:00
Stas Kelvich
c82d19d8d6 Fix NULLs handling in proxy json endpoint
There were few problems with null handling:

* query_raw_txt() accepted vector of string so it always (erroneously)
treated "null" as a string instead of null. Change rust pg client
to accept the vector of Option<String> instead of just Strings. Adopt
coding here to pass nulls as None.

* pg_text_to_json() had a check that always interpreted "NULL" string
as null. That is wrong and nulls were already handled by match None.
This bug appeared as a bad attempt to parse arrays containing NULL
elements. Fix coding by checking presence of quotes while parsing an
array (no quotes -> null, quoted -> "null" string).

Array parser fix also slightly changes behavior by always cleaning
current entry when pushing to the resulting vector. This seems to be
an omission by previous coding, however looks like it was harmless
as entry was not cleared only at the end of the nested or to-level
array.
2023-06-08 16:00:18 +03:00
Stas Kelvich
d73639646e Add more output options to proxy json endpoint
With this commit client can pass following optional headers:

`Neon-Raw-Text-Output: true`. Return postgres values as text, without parsing them. So numbers, objects, booleans, nulls and arrays will be returned as text. That can be useful in cases when client code wants to implement it's own parsing or reuse parsing libraries from e.g. node-postgres.

`Neon-Array-Mode: true`. Return postgres rows as arrays instead of objects. That is more compact representation and also helps in some edge
cases where it is hard to use rows represented as objects (e.g. when several fields have the same name).
2023-06-08 16:00:18 +03:00
Dmitry Rodionov
d53f9ab3eb delete timelines from s3 (#4384)
Delete data from s3 when timeline deletion is requested

## Summary of changes

UploadQueue is altered to support scheduling of delete operations in
stopped state. This looks weird, and I'm thinking whether there are
better options/refactorings for upload client to make it look better.

Probably can be part of https://github.com/neondatabase/neon/issues/4378

Deletion is implemented directly in existing endpoint because changes are not
that significant. If we want more safety we can separate those or create
feature flag for new behavior.

resolves [#4193](https://github.com/neondatabase/neon/issues/4193)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-06-08 15:01:22 +03:00
Dmitry Rodionov
8560a98d68 fix openapi spec to pass swagger editor validation (#4445)
There shouldnt be a dash before `type: object`. Also added description.
2023-06-08 13:25:30 +03:00
Alex Chi Z
2e687bca5b refactor: use LayerDesc in layer map (part 1) (#4408)
## Problem

part of https://github.com/neondatabase/neon/issues/4392

## Summary of changes

This PR adds a new HashMap that maps persistent layer desc to the layer
object *inside* LayerMap. Originally I directly went towards adding such
layer cache in Timeline, but the changes are too many and cannot be
reviewed as a reasonably-sized PR. Therefore, we take this intermediate
step to change part of the codebase to use persistent layer desc, and
come up with other PRs to move this hash map of layer desc to the
timeline struct.

Also, file_size is now part of the layer desc.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-07 18:28:18 +03:00
Dmitry Rodionov
1a1019990a map TenantState::Broken to TenantAttachmentStatus::Failed (#4371)
## Problem

Attach failures are not reported in public part of the api (in
`attachment_status` field of TenantInfo).

## Summary of changes

Expose TenantState::Broken as TenantAttachmentStatus::Failed

In the way its written Failed status will be reported even if no
attachment happened. (I e if tenant become broken on startup). This is
in line with other members. I e Active will be resolved to Attached even
if no actual attach took place.

This can be tweaked if needed. At the current stage it would be overengineering without clear motivation

resolves #4344
2023-06-07 18:25:30 +03:00
Alex Chi Z
1c200bd15f fix: break dev dependencies between wal_craft and pg_ffi (#4424)
## Problem

close https://github.com/neondatabase/neon/issues/4266

## Summary of changes

With this PR, rust-analyzer should be able to give lints and auto
complete in `mod tests`, and this makes writing tests easier.
Previously, rust-analyzer cannot do auto completion.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-07 17:51:13 +03:00
Arseny Sher
37bf2cac4f Persist safekeeper control file once in a while.
It should make remote_consistent_lsn commonly up-to-date on non actively writing
projects, which removes spike or pageserver -> safekeeper reconnections on
storage nodes restart.
2023-06-07 17:23:37 +04:00
Joonas Koivunen
5761190e0d feat: three phased startup order (#4399)
Initial logical size calculation could still hinder our fast startup
efforts in #4397. See #4183. In deployment of 2023-06-06
about a 200 initial logical sizes were calculated on hosts which
took the longest to complete initial load (12s).

Implements the three step/tier initialization ordering described in
#4397:
1. load local tenants
2. do initial logical sizes per walreceivers for 10s
3. background tasks

Ordering is controlled by:
- waiting on `utils::completion::Barrier`s on background tasks
- having one attempt for each Timeline to do initial logical size
calculation
- `pageserver/src/bin/pageserver.rs` releasing background jobs after
timeout or completion of initial logical size calculation

The timeout is there just to safeguard in case a legitimate non-broken
timeline initial logical size calculation goes long. The timeout is
configurable, by default 10s, which I think would be fine for production
systems. In the test cases I've been looking at, it seems that these
steps are completed as fast as possible.

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-06-07 14:29:23 +03:00
Vadim Kharitonov
88f0cfc575 Fix pgx_ulid extension (#4431)
The issue was in the wrong `control` file name
2023-06-07 11:41:53 +02:00
Arseny Sher
6b3c020cd9 Don't warn on system id = 0 in walproposer greeting.
sync-safekeepers doesn't know it and sends 0.
2023-06-07 12:39:20 +04:00
Arseny Sher
c058e1cec2 Quick exit in truncate_wal if nothing to do.
ref https://github.com/neondatabase/neon/issues/4414
2023-06-07 12:39:20 +04:00
Arseny Sher
dc6a382873 Increase timeouts on compute -> sk connections.
context: https://github.com/neondatabase/neon/issues/4414

And improve messages/comments here and there.
2023-06-07 12:39:20 +04:00
Heikki Linnakangas
df3bae2ce3 Use compute_ctl to manage Postgres in tests. (#3886)
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.
    
There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:
    
- In the tests, the postgresql.conf file is included as one large
  string in the spec file, and it is written out as it is to the data
  directory.  I added a new field for that to the spec file. The real
  web console, however, sets all the necessary settings in the
  'settings' field, and 'compute_ctl' creates the postgresql.conf from
  those settings.

- In the tests, the information needed to connect to the storage, i.e.
  tenant_id, timeline_id, connection strings to pageserver and
  safekeepers, are now passed as new fields in the spec file. The real
  web console includes them as the GUCs in the 'settings' field. (Both
  of these are different from what the test control plane used to do:
  It used to write the GUCs directly in the postgresql.conf file). The
  plan is to change the control plane to use the new method, and
  remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.

Other changes:
- remove obsolete comments about PostgresNode;
- create standby.signal file for Static compute node;
- log output of `compute_ctl` and `postgres` is merged into
`endpoints/compute.log`.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-06-06 14:59:36 +01:00
Joonas Koivunen
0cef7e977d refactor: just one way to shutdown a tenant (#4407)
We have 2 ways of tenant shutdown, we should have just one.

Changes are mostly mechanical simple refactorings.

Added `warn!` on the "shutdown all remaining tasks" should trigger test
failures in the between time of not having solved the "tenant/timeline
owns all spawned tasks" issue.

Cc: #4327.
2023-06-06 15:30:55 +03:00
Joonas Koivunen
18a9d47f8e test: restore NotConnected being allowed globally (#4426)
Flakyness introduced by #4402 evidence [^1].

I had assumed the NotConnected would had been an expected io error, but
it's not. Restore the global `allowed_error`.

[^1]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4407/5185897757/index.html#suites/82004ab4e3720b47bf78f312dabe7c55/14f636d0ecd3939d/
2023-06-06 13:51:39 +03:00
Sasha Krassovsky
ac11e7c32d Remove arch-specific stuff from HNSW extension (#4423) 2023-06-05 22:04:15 -08:00
Konstantin Knizhnik
8e1b5e1224 Remove -ftree-vectorizer-verbose=0 option notrecognized by MaxOS/X c… (#4412)
…ompiler

## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-05 20:10:19 +03:00
Joonas Koivunen
e0bd81ce1f test: fix flaky warning on attach (#4415)
added the `allowed_error` to the `positive_env` so any tests completing
the attach are allowed have this print out. they are allowed to do so,
because the `random_init_delay` can produce close to zero and thus the
first run will be near attach. Though... Unsure if we ever really need
the eviction task to run **before** it can evict something, as in after
20min or 24h.

in the failed test case however period is 20s so interesting that we
didn't run into this sooner.

evidence of flaky:
https://github.com/neondatabase/neon/actions/runs/5175677035/jobs/9323705929?pr=4399#step:4:38536
2023-06-05 18:12:58 +03:00
Joonas Koivunen
77598f5d0a Better walreceiver logging (#4402)
walreceiver logs are a bit hard to understand because of partial span
usage, extra messages, ignored errors popping up as huge stacktraces.

Fixes #3330 (by spans, also demote info -> debug).

- arrange walreceivers spans into a hiearchy: 
    - `wal_connection_manager{tenant_id, timeline_id}` ->
      `connection{node_id}` -> `poller`
- unifies the error reporting inside `wal_receiver`:
- All ok errors are now `walreceiver connection handling ended: {e:#}`
- All unknown errors are still stacktraceful task_mgr reported errors
  with context `walreceiver connection handling failure`
- Remove `connect` special casing, was: `DB connection stream finished`
  for ok errors
- Remove `done replicating` special casing, was `Replication stream
  finished` for ok errors
- lowered log levels for (non-exhaustive list):
    - `WAL receiver manager started, connecting to broker` (at startup)
    - `WAL receiver shutdown requested, shutting down` (at shutdown)
    - `Connection manager loop ended, shutting down` (at shutdown)
    - `sender is dropped while join handle is still alive` (at lucky
      shutdown, see #2885)
    - `timeline entered terminal state {:?}, stopping wal connection manager
      loop` (at shutdown)
    - `connected!` (at startup)
- `Walreceiver db connection closed` (at disconnects?, was without span)
    - `Connection cancelled` (at shutdown, was without span)
- `observed timeline state change, new state is {new_state:?}` (never
  after Timeline::activate was made infallible)
- changed:
    - `Timeline dropped state updates sender, stopping wal connection
      manager loop`
    - was out of date; sender is not dropped but `Broken | Stopping` state
      transition
        - also made `debug!`
    - `Timeline dropped state updates sender before becoming active,
      stopping wal connection manager loop`
    - was out of date: sender is again not dropped but `Broken | Stopping`
      state transition
        - also made `debug!`
- log fixes:
    - stop double reporting panics via JoinError
2023-06-05 17:35:23 +03:00
Joonas Koivunen
8142edda01 test: Less flaky gc (#4416)
Solves a flaky test error in the wild[^1] by:

- Make the gc shutdown signal reading an `allowed_error`
- Note the gc shutdown signal readings as being in `allowed_error`s
- Allow passing tenant conf to init_start to avoid unncessary tenants

[^1]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4399/5176432780/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2cd76021ea011f93
2023-06-05 15:43:52 +03:00
Vadim Kharitonov
b9871158ba Compile PGX ULID extension (#4413)
Create pgx_ulid extension

```
postgres=# create extension ulid;
CREATE EXTENSION
postgres=# CREATE TABLE users (
  id ulid NOT NULL DEFAULT gen_ulid() PRIMARY KEY,
  name text NOT NULL
);
CREATE TABLE
postgres=# insert into users (name) values ('vadim');
INSERT 0 1
postgres=# select * from users;
             id             | name
----------------------------+-------
 01H25DDG3KYMYZTNR41X38E256 | vadim
```
2023-06-05 12:52:13 +03:00
Joonas Koivunen
8caef2c0c5 fix: delay eviction_task as well (#4397)
As seen on deployment of 2023-06-01 release, times were improving but
there were some outliers caused by:
- timelines `eviction_task` starting while activating and running
imitation
- timelines `initial logical size` calculation

This PR fixes it so that `eviction_task` is delayed like other
background tasks fixing an oversight from earlier #4372.

After this PR activation will be two phases:
1. load and activate tenants AND calculate some initial logical sizes
2. rest of initial logical sizes AND background tasks
- compaction, gc, disk usage based eviction, timelines `eviction_task`,
consumption metrics
2023-06-05 09:37:53 +03:00
Konstantin Knizhnik
04542826be Add HNSW extension (#4227)
## Describe your changes

Port HNSW implementation for ANN search top Postgres 

## Issue ticket number and link

https://www.pinecone.io/learn/hnsw

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-04 11:41:38 +03:00
bojanserafimov
4ba950a35a Add libcurl as dependency to readme (#4405) 2023-06-02 18:07:45 -04:00
Joonas Koivunen
a55c663848 chore: comment marker fixes (#4406)
Upgrading to rust 1.70 will require these.
2023-06-02 21:03:12 +03:00
Heikki Linnakangas
9787227c35 Shield HTTP request handlers from async cancellations. (#4314)
We now spawn a new task for every HTTP request, and wait on the
JoinHandle. If Hyper drops the Future, the spawned task will keep
running. This protects the rest of the pageserver code from unexpected
async cancellations.

This creates a CancellationToken for each request and passes it to the
handler function. If the HTTP request is dropped by the client, the
CancellationToken is signaled. None of the handler functions make use
for the CancellationToken currently, but they now they could.

The CancellationToken arguments also work like documentation. When
you're looking at a function signature and you see that it takes a
CancellationToken as argument, it's a nice hint that the function might
run for a long time, and won't be async cancelled. The default
assumption in the pageserver is now that async functions are not
cancellation-safe anyway, unless explictly marked as such, but this is a
nice extra reminder.

Spawning a task for each request is OK from a performance point of view
because spawning is very cheap in Tokio, and none of our HTTP requests
are very performance critical anyway.

Fixes issue #3478
2023-06-02 08:28:13 -04:00
Joonas Koivunen
ef80a902c8 pg_sni_router: add session_id to more messages (#4403)
See superceded #4390.

- capture log in test
- expand the span to cover init and error reporting
- remove obvious logging by logging only unexpected
2023-06-02 14:59:10 +03:00
Alex Chi Z
66cdba990a refactor: use PersistentLayerDesc for persistent layers (#4398)
## Problem

Part of https://github.com/neondatabase/neon/issues/4373

## Summary of changes

This PR adds `PersistentLayerDesc`, which will be used in LayerMap
mapping and probably layer cache. After this PR and after we change
LayerMap to map to layer desc, we can safely drop RemoteLayerDesc.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-01 22:06:28 +03:00
Alex Chi Z
82484e8241 pgserver: add more metrics for better observability (#4323)
## Problem

This PR includes doc changes to the current metrics as well as adding
new metrics. With the new set of metrics, we can quantitatively analyze
the read amp., write amp. and space amp. in the system, when used
together with https://github.com/neondatabase/neonbench

close https://github.com/neondatabase/neon/issues/4312
ref https://github.com/neondatabase/neon/issues/4347

compaction metrics TBD, a novel idea is to print L0 file number and
number of layers in the system, and we can do this in the future when we
start working on compaction.

## Summary of changes

* Add `READ_NUM_FS_LAYERS` for computing read amp.
* Add `MATERIALIZED_PAGE_CACHE_HIT_UPON_REQUEST`.
* Add `GET_RECONSTRUCT_DATA_TIME`. GET_RECONSTRUCT_DATA_TIME +
RECONSTRUCT_TIME + WAIT_LSN_TIME should be approximately total time of
reads.
* Add `5.0` and `10.0` to `STORAGE_IO_TIME_BUCKETS` given some fsync
runs slow (i.e., > 1s) in some cases.
* Some `WAL_REDO` metrics are only used when Postgres is involved in the
redo process.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-01 21:46:04 +03:00
Joonas Koivunen
36fee50f4d compute_ctl: enable tracing panic hook (#4375)
compute_ctl can panic, but `tracing` is used for logging. panic stderr
output can interleave with messages from normal logging. The fix is to
use the established way (pageserver, safekeeper, storage_broker) of using
`tracing` to report panics.
2023-06-01 20:12:07 +03:00
bojanserafimov
330083638f Fix stale and misleading comment in LayerMap (#4297) 2023-06-01 05:04:46 +03:00
Konstantin Knizhnik
952d6e43a2 Add pageserver parameter forced_image_creation_limit which can be used… (#4353)
This parameter can be use to restrict number of image layers generated
because of GC request (wanted image layers).
Been set to zero it completely eliminates creation of such image layers.
So it allows to avoid extra storage consumption after merging #3673

## Problem
PR #3673 forces generation of missed image layers. So i short term is
cause cause increase (in worst case up to two times) size of storage.
It was intended (by me) that GC period is comparable with PiTR interval.
But looks like it is not the case now - GC is performed much more
frequently. It may cause the problem with space exhaustion: GC forces
new image creation while large PiTR still prevent GC from collecting old
layers.

## Summary of changes

Add new pageserver parameter` forced_image_creation_limit` which
restrict number of created image layers which are requested by GC.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-05-31 21:37:20 +03:00
bojanserafimov
b6447462dc Fix layer map correctness bug (#4342) 2023-05-31 12:23:00 -04:00
Dmitry Rodionov
b190c3e6c3 reduce flakiness by allowing Compaction failed, retrying in X queue is in state Stopped. (#4379)
resolves https://github.com/neondatabase/neon/issues/4374 by adding the error to allowed_errors
2023-05-30 20:11:44 +03:00
Joonas Koivunen
f4db85de40 Continued startup speedup (#4372)
Startup continues to be slow, work towards to alleviate it.

Summary of changes:

- pretty the functional improvements from #4366 into
`utils::completion::{Completion, Barrier}`
- extend "initial load completion" usage up to tenant background tasks
    - previously only global background tasks
- spawn_blocking the tenant load directory traversal
- demote some logging
- remove some unwraps
- propagate some spans to `spawn_blocking`

Runtime effects should be major speedup to loading, but after that, the
`BACKGROUND_RUNTIME` will be blocked for a long time (minutes). Possible
follow-ups:
- complete initial tenant sizes before allowing background tasks to
block the `BACKGROUND_RUNTIME`
2023-05-30 16:25:07 +03:00
Arthur Petukhovsky
210be6b6ab Replace broker duration logs with metrics (#4370)
I've added logs for broker push duration after every iteration in https://github.com/neondatabase/neon/pull/4142. This log has not found any real issues, so we can replace it with metrics, to slightly reduce log volume.

LogQL query found that pushes longer that 500ms happened only 90 times for the last month. https://neonprod.grafana.net/goto/KTNj9UwVg?orgId=1

`{unit="safekeeper.service"} |= "timeline updates to broker in" | regexp "to broker in (?P<duration>.*)" | duration > 500ms`
2023-05-30 16:08:02 +03:00
Alexander Bayandin
daa79b150f Code Coverage: store lcov report (#4358)
## Problem

In the future, we want to compare code coverage on a PR with coverage on
the main branch.
Currently, we store only code coverage HTML reports, I suggest we start
storing reports in "lcov info" format that we can use/parse in the
future. Currently, the file size is ~7Mb (it's a text-based format and
could be compressed into a ~400Kb archive)

- More about "lcov info" format:
https://manpages.ubuntu.com/manpages/jammy/man1/geninfo.1.html#files
- Part of https://github.com/neondatabase/neon/issues/3543

## Summary of changes
- Change `scripts/coverage` to output lcov coverage to
`report/lcov.info` file instead of stdout (we already upload the whole
`report/` directory to S3)
2023-05-30 14:05:41 +01:00
Joonas Koivunen
db14355367 revert: static global init logical size limiter (#4368)
added in #4366. revert for testing without it; it may have unintenteded
side-effects, and it's very difficult to understand the results from the
10k load testing environments. earlier results:
https://github.com/neondatabase/neon/pull/4366#issuecomment-1567491064
2023-05-30 10:40:37 +03:00
Joonas Koivunen
cb83495744 try: startup speedup (#4366)
Startup can take a long time. We suspect it's the initial logical size
calculations. Long term solution is to not block the tokio executors but
do most of I/O in spawn_blocking.

See: #4025, #4183

Short-term solution to above:

- Delay global background tasks until initial tenant loads complete
- Just limit how many init logical size calculations can we have at the
same time to `cores / 2`

This PR is for trying in staging.
2023-05-29 21:48:38 +03:00
Christian Schwarz
f4f300732a refactor TenantState transitions (#4321)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
The motivation is to avoid locking `Tenant::timelines` in places that
can't be `async`, because in #4333 we want to convert Tenant::timelines
from `std::sync::Mutex` to `tokio::sync::Mutex`.

But, the changes here are useful in general because they clean up &
document tenant state transitions.
That also paves the way for #4350, which is an alternative to #4333 that
refactors the pageserver code so that we can keep the
`Tenant::timelines` mutex sync.

This patch consists of the following core insights and changes:

* spawn_load and spawn_attach own the tenant state until they're done
* once load()/attach() calls are done ...
* if they failed, transition them to Broken directly (we know that
there's no background activity because we didn't call activate yet)
* if they succeed, call activate. We can make it infallible. How? Later.
* set_broken() and set_stopping() are changed to wait for spawn_load() /
spawn_attach() to finish.
* This sounds scary because it might hinder detach or shutdown, but
actually, concurrent attach+detach, or attach+shutdown, or
load+shutdown, or attach+shutdown were just racy before this PR.
     So, with this change, they're not anymore.
In the future, we can add a `CancellationToken` stored in Tenant to
cancel `load` and `attach` faster, i.e., make `spawn_load` /
`spawn_attach` transition them to Broken state sooner.

See the doc comments on TenantState for the state transitions that are
now possible.
It might seem scary, but actually, this patch reduces the possible state
transitions.

We introduce a new state `TenantState::Activating` to avoid grabbing the
`Tenant::timelines` lock inside the `send_modify` closure.
These were the humble beginnings of this PR (see Motivation section),
and I think it's still the right thing to have this `Activating` state,
even if we decide against async `Tenant::timelines` mutex. The reason is
that `send_modify` locks internally, and by moving locking of
Tenant::timelines out of the closure, the internal locking of
`send_modify` becomes a leaf of the lock graph, and so, we eliminate
deadlock risk.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-05-29 17:52:41 +03:00
Em Sharnoff
ccf653c1f4 re-enable file cache integration for VM compute node (#4338)
#4155 inadvertently switched to a version of the VM builder that leaves
the file cache integration disabled by default. This re-enables the
vm-informant's file cache integration.

(as a refresher: The vm-informant is the autoscaling component that sits
inside the VM and manages postgres / compute_ctl)

See also: https://github.com/neondatabase/autoscaling/pull/265
2023-05-28 10:22:45 -07:00
Heikki Linnakangas
2d6a022bb8 Don't allow two timeline_delete operations to run concurrently. (#4313)
If the timeline is already being deleted, return an error. We used to
notice the duplicate request and error out in
persist_index_part_with_deleted_flag(), but it's better to detect it
earlier. Add an explicit lock for the deletion.

Note: This doesn't do anything about the async cancellation problem
(github issue #3478): if the original HTTP request dropped, because the
client disconnected, the timeline deletion stops half-way through the
operation. That needs to be fixed, too, but that's a separate story.

(This is a simpler replacement for PR #4194. I'm also working on the
cancellation shielding, see PR #4314.)
2023-05-27 15:55:43 +03:00
Heikki Linnakangas
2cdf07f12c Refactor RequestSpan into a function.
Previously, you used it like this:

    |r| RequestSpan(my_handler).handle(r)

But I don't see the point of the RequestSpan struct. It's just a
wrapper around the handler function. With this commit, the call
becomes:

    |r| request_span(r, my_handler)

Which seems a little simpler.

At first I thought that the RequestSpan struct would allow "chaining"
other kinds of decorators like RequestSpan, so that you could do
something like this:

    |r| CheckPermissions(RequestSpan(my_handler)).handle(r)

But it doesn't work like that. If each of those structs wrap a handler
*function*, it would actually look like this:

    |r| CheckPermissions(|r| RequestSpan(my_handler).handle(r))).handle(r)

This commit doesn't make that kind of chaining any easier, but seems a
little more straightforward anyway.
2023-05-27 11:47:22 +03:00
Heikki Linnakangas
200a520e6c Minor refactoring in RequestSpan
Require the error type to be ApiError. It implicitly required that
anyway, because the function used error::handler, which downcasted the
error to an ApiError. If the error was in fact anything else than
ApiError, it would just panic. Better to check it at compilation time.

Also make the last-resort error handler more forgiving, so that it
returns an 500 Internal Server error response, instead of panicking,
if a request handler returns some other error than an ApiError.
2023-05-27 11:47:22 +03:00
Alex Chi Z
4e359db4c7 pgserver: spawn_blocking in compaction (#4265)
Compaction is usually a compute-heavy process and might affect other
futures running on the thread of the compaction. Therefore, we add
`block_in_place` as a temporary solution to avoid blocking other futures
on the same thread as compaction in the runtime. As we are migrating
towards a fully-async-style pageserver, we can revert this change when
everything is async and when we move compaction to a separate runtime.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-26 17:15:47 -04:00
Joonas Koivunen
be177f82dc Revert "Allow for higher s3 concurrency (#4292)" (#4356)
This reverts commit 024109fbeb for it
failing to be speed up anything, but run into more errors.

See: #3698.
2023-05-26 18:37:17 +03:00
Alexander Bayandin
339a3e3146 GitHub Autocomment: comment commits for branches (#4335)
## Problem

GitHub Autocomment script posts a comment only for PRs. It's harder
to debug failed tests on main or release branches.

## Summary of changes

- Change the GitHub Autocomment script to be able to post a comment to
either a PR or a commit of a branch
2023-05-26 14:49:42 +01:00
Heikki Linnakangas
a560b28829 Make new tenant/timeline IDs mandatory in create APIs. (#4304)
We used to generate the ID, if the caller didn't specify it. That's bad
practice, however, because network is never fully reliable, so it's
possible we create a new tenant but the caller doesn't know about it,
and because it doesn't know the tenant ID, it has no way of retrying or
checking if it succeeded. To discourage that, make it mandatory. The web
control plane has not relied on the auto-generation for a long time.
2023-05-26 16:19:36 +03:00
Joonas Koivunen
024109fbeb Allow for higher s3 concurrency (#4292)
We currently have a semaphore based rate limiter which we hope will keep
us under S3 limits. However, the semaphore does not consider time, so
I've been hesitant to raise the concurrency limit of 100.

See #3698.

The PR Introduces a leaky-bucket based rate limiter instead of the
`tokio::sync::Semaphore` which will allow us to raise the limit later
on. The configuration changes are not contained here.
2023-05-26 13:35:50 +03:00
Alexander Bayandin
2b25f0dfa0 Fix flakiness of test_metric_collection (#4346)
## Problem

Test `test_metric_collection` become flaky:

```
AssertionError: assert not ['2023-05-25T14:03:41.644042Z ERROR metrics_collection: failed to send metrics: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(18022), path: "/billing/api/v1/usage_events", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" })) }',
                            ...]
```
I suspect it is caused by having 2 places when we define
`httpserver_listen_address` fixture (which is internally used by
`pytest-httpserver` plugin)

## Summary of changes
- Remove the definition of `httpserver_listen_address` from
`test_runner/regress/test_ddl_forwarding.py` and keep one in
`test_runner/fixtures/neon_fixtures.py`
- Also remote unused `httpserver_listen_address` parameter from
`test_proxy_metric_collection`
2023-05-26 00:05:11 +03:00
Christian Schwarz
057cceb559 refactor: make timeline activation infallible (#4319)
Timeline::activate() was only fallible because `launch_wal_receiver`
was.

`launch_wal_receiver` was fallible only because of some preliminary
checks in `WalReceiver::start`.

Turns out these checks can be shifted to the type system by delaying
creatinon of the `WalReceiver` struct to the point where we activate the
timeline.

The changes in this PR were enabled by my previous refactoring that
funneled the broker_client from pageserver startup to the activate()
call sites.

Patch series:

- #4316
- #4317
- #4318
- #4319
2023-05-25 20:26:43 +02:00
sharnoff
ae805b985d Bump vm-builder v0.7.3-alpha3 -> v0.8.0 (#4339)
Routine `vm-builder` version bump, from autoscaling repo release. You
can find the release notes here:
https://github.com/neondatabase/autoscaling/releases/tag/v0.8.0
The changes are from v0.7.2 — most of them were already included in
v0.7.3-alpha3.

Of particular note: This (finally) fixes the cgroup issues, so we should
now be able to scale up when we're about to run out of memory.

**NB:** This has the effect of limit the DB's memory usage in a way it
wasn't limited before. We may run into issues because of that. There is
currently no way to disable that behavior, other than switching the
endpoint back to the k8s-pod provisioner.
2023-05-25 09:33:18 -07:00
Joonas Koivunen
85e76090ea test: fix ancestor is stopping flakyness (#4234)
Flakyness most likely introduced in #4170, detected in
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4232/4980691289/index.html#suites/542b1248464b42cc5a4560f408115965/18e623585e47af33.

Opted to allow it globally because it can happen in other tests as well,
basically whenever compaction is enabled and we stop pageserver
gracefully.
2023-05-25 16:22:58 +00:00
Alexander Bayandin
08e7d2407b Storage: use Postgres 15 as default (#2809) 2023-05-25 15:55:46 +01:00
Alex Chi Z
ab2757f64a bump dependencies version (#4336)
proceeding https://github.com/neondatabase/neon/pull/4237, this PR bumps
AWS dependencies along with all other dependencies to the latest
compatible semver.

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-25 10:21:15 -04:00
Christian Schwarz
e5617021a7 refactor: eliminate global storage_broker client state (#4318)
(This is prep work to make `Timeline::activate` infallible.)

This patch removes the global storage_broker client instance from the
pageserver codebase.

Instead, pageserver startup instantiates it and passes it down to the
`Timeline::activate` function, which in turn passes it to the
WalReceiver, which is the entity that actually uses it.

Patch series:

- #4316
- #4317
- #4318
- #4319
2023-05-25 16:47:42 +03:00
Christian Schwarz
83ba02b431 tenant_status: don't InternalServerError if tenant not found (#4337)
Note this also changes the status code to the (correct) 404. Not sure if
that's relevant to Console.

Context:
https://neondb.slack.com/archives/C04PSBP2SAF/p1684746238831449?thread_ts=1684742106.169859&cid=C04PSBP2SAF

Atop #4300 because it cleans up the mgr::get_tenant() error type and I want eyes on that PR.
2023-05-25 11:38:04 +02:00
Christian Schwarz
37ecebe45b mgr::get_tenant: distinguished error type (#4300)
Before this patch, it would use error type `TenantStateError` which has
many more error variants than can actually happen with
`mgr::get_tenant`.

Along the way, I also introduced `SetNewTenantConfigError` because it
uses `mgr::get_tenant` and also can only fail in much fewer ways than
`TenantStateError` suggests.

The new `page_service.rs`'s `GetActiveTimelineError` and
`GetActiveTenantError` types were necessary to avoid an `Other` variant
on the `GetTenantError`.

This patch is a by-product of reading code that subscribes to
`Tenant::state` changes.
Can't really connect it to any given project.
2023-05-25 11:37:12 +02:00
Sasha Krassovsky
6052ecee07 Add connector extension to send Role/Database updates to console (#3891)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-05-25 12:36:57 +03:00
Christian Schwarz
e11ba24ec5 tenant loops: operate on the Arc<Tenant> directly (#4298)
(Instead of going through mgr every iteration.)

The `wait_for_active_tenant` function's `wait` argument could be removed
because it was only used for the loop that waits for the tenant to show
up in the tenants map. Since we're passing the tenant in, we now longer
need to get it from the tenants map.

NB that there's no guarantee that the tenant object is in the tenants
map at the time the background loop function starts running. But the
tenant mgr guarantees that it will be quite soon. See
`tenant_map_insert` way upwards in the call hierarchy for details.

This is prep work to eliminate `subscribe_for_state_updates` (PR #4299 )

Fixes: #3501
2023-05-25 10:49:09 +02:00
Alex Chi Z
f276f21636 ci: use eu-central-1 bucket (#4315)
Probably increase CI success rate.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-25 00:00:21 +03:00
Alex Chi Z
7126197000 pagectl: refactor ctl and support dump kv in delta (#4268)
This PR refactors the original page_binutils with a single tool pagectl,
use clap derive for better command line parsing, and adds the dump kv
tool to extract information from delta file. This helps me better
understand what's inside the page server. We can add support for other
types of file and more functionalities in the future.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-24 19:36:07 +03:00
Christian Schwarz
afc48e2cd9 refactor responsibility for tenant/timeline activation (#4317)
(This is prep work to make `Timeline::activate()` infallible.)

The current possibility for failure in `Timeline::activate()` is the
broker client's presence / absence. It should be an assert, but we're
careful with these. So, I'm planning to pass in the broker client to
activate(), thereby eliminating the possiblity of its absence.

In the unit tests, we don't have a broker client. So, I thought I'd be
in trouble because the unit tests also called `activate()` before this
PR.

However, closer inspection reveals a long-standing FIXME about this,
which is addressed by this patch.

It turns out that the unit tests don't actually need the background
loops to be running. They just need the state value to be `Active`. So,
for the tests, we just set it to that value but don't spawn the
background loops.

We'll need to revisit this if we ever do more Rust unit tests in the
future. But right now, this refactoring improves the code, so, let's
revisit when we get there.

Patch series:

- #4316
- #4317
- #4318
- #4319
2023-05-24 16:54:11 +02:00
Christian Schwarz
df52587bef attach-time tenant config (#4255)
This PR adds support for supplying the tenant config upon /attach.

Before this change, when relocating a tenant using `/detach` and
`/attach`, the tenant config after `/attach` would be the default config
from `pageserver.toml`.
That is undesirable for settings such as the PITR-interval: if the
tenant's config on the source was `30 days` and the default config on
the attach-side is `7 days`, then the first GC run would eradicate 23
days worth of PITR capability.

The API change is backwards-compatible: if the body is empty, we
continue to use the default config.
We'll remove that capability as soon as the cloud.git code is updated to
use attach-time tenant config
(https://github.com/neondatabase/neon/issues/4282 keeps track of this).

unblocks https://github.com/neondatabase/cloud/issues/5092 
fixes https://github.com/neondatabase/neon/issues/1555
part of https://github.com/neondatabase/neon/issues/886 (Tenant
Relocation)

Implementation
==============

The preliminary PRs for this work were (most-recent to least-recent)

* https://github.com/neondatabase/neon/pull/4279
* https://github.com/neondatabase/neon/pull/4267
* https://github.com/neondatabase/neon/pull/4252
* https://github.com/neondatabase/neon/pull/4235
2023-05-24 17:46:30 +03:00
Alexander Bayandin
35bb10757d scripts/ingest_perf_test_result.py: increase connection timeout (#4329)
## Problem

Sometimes default connection timeout is not enough to connect to the DB
with perf test results, [an
example](https://github.com/neondatabase/neon/actions/runs/5064263522/jobs/9091692868#step:10:332).

Similar changes were made for similar scripts:
- For `scripts/flaky_tests.py` in
https://github.com/neondatabase/neon/pull/4096
- For `scripts/ingest_regress_test_result.py` in
https://github.com/neondatabase/neon/pull/2367 (from the very
begginning)

## Summary of changes
- Connection timeout increased to 30s for
`scripts/ingest_perf_test_result.py`
2023-05-24 10:11:24 -04:00
Alexander Bayandin
2a3f54002c test_runner: update dependencies (#4328)
## Problem

`pytest` 6 truncates error messages and this is not configured.
It's fixed in `pytest` 7, it prints the whole message (truncating limit
is higher) if `--verbose` is set (it's set on CI).

## Summary of changes
- `pytest` and `pytest` plugins are updated to their latest versions
- linters (`black` and `ruff`) are updated to their latest versions
- `mypy` and types are updated to their latest versions, new warnings
are fixed
- while we're here, allure updated its latest version as well
2023-05-24 12:47:01 +01:00
Joonas Koivunen
f3769d45ae chore: upgrade tokio to 1.28.1 (#4294)
no major changes, but this is the most recent LTS release and will be
required by #4292.
2023-05-24 08:15:39 +03:00
Arseny Sher
c200ebc096 proxy: log endpoint name everywhere.
Checking out proxy logs for the endpoint is a frequent (often first) operation
during user issues investigation; let's remove endpoint id -> session id mapping
annoying extra step here.
2023-05-24 09:11:23 +04:00
Konstantin Knizhnik
417f37b2e8 Pass set of wanted image layers from GC to compaction (#3673)
## Describe your changes

Right now the only criteria for image layer generation is number of
delta layer since last image layer.
If we have "stairs" layout of delta layers (see link below) then it can
happen that there a lot of old delta layers which can not be reclaimed
by GC because are not fully covered with image layers.

This PR constructs list of "wanted" image layers in GC (which image
layers are needed to be able to remove old layers)
and pass this list to compaction task which performs generation of image
layers.
So right now except deltas count criteria we also take in account
"wishes" of GC.

## Issue ticket number and link

See 
https://neondb.slack.com/archives/C033RQ5SPDH/p1676914249982519


## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-05-24 08:01:41 +03:00
sharnoff
7f1973f8ac bump vm-builder, use Neon-specific version (#4155)
In the v0.6.0 release, vm-builder was changed to be Neon-specific, so
it's handling all the stuff that Dockerfile.vm-compute-node used to do.

This commit bumps vm-builder to v0.7.3-alpha3.
2023-05-23 15:20:20 -07:00
Christian Schwarz
00f7fc324d tenant_map_insert: don't expose the vacant entry to the closure (#4316)
This tightens up the API a little.
Byproduct of some refactoring work that I'm doing right now.
2023-05-23 15:16:12 -04:00
Stas Kelvich
dad3519351 Add SQL-over-HTTP endpoint to Proxy
This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON
response structure resembling that of the node-postgres driver. This method,
using HTTP POST, achieves smaller amortized latencies in edge setups due to
fewer round trips and an enhanced open connection reuse by the v8 engine.

This update involves several intricacies:
1. SQL injection protection: We employed the extended query protocol, modifying
   the rust-postgres driver to send queries in one roundtrip using a text
   protocol rather than binary, bypassing potential issues like those identified
   in https://github.com/sfackler/rust-postgres/issues/1030.

2. Postgres type compatibility: As not all postgres types have binary
   representations (e.g., acl's in pg_class), we adjusted rust-postgres to
   respond with text protocol, simplifying serialization and fixing queries with
   text-only types in response.

3. Data type conversion: Considering JSON supports fewer data types than
   Postgres, we perform conversions where possible, passing all other types as
   strings. Key conversions include:
   - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain
     text)
   - postgres bool, null, text -> json bool, null, string
   - postgres array -> json array
   - postgres json and jsonb -> json object

4. Alignment with node-postgres: To facilitate integration with js libraries,
   we've matched the response structure of node-postgres, returning command tags
   and column oids. Command tag capturing was added to the rust-postgres
   functionality as part of this change.
2023-05-23 20:01:40 +03:00
dependabot[bot]
d75b4e0f16 Bump requests from 2.28.1 to 2.31.0 (#4305) 2023-05-23 14:54:51 +01:00
Christian Schwarz
4d41b2d379 fix: max_lsn_wal_lag broken in tenant conf (#4279)
This patch fixes parsing of the `max_lsn_wal_lag` tenant config item.
We were incorrectly expecting a string before, but the type is a
NonZeroU64.

So, when setting it in the config, the (updated) test case would fail
with

```
 E       psycopg2.errors.InternalError_: Tenant a1fa9cc383e32ddafb73ff920de5f2e6 will not become active. Current state: Broken due to: Failed to parse config from file '.../repo/tenants/a1fa9cc383e32ddafb73ff920de5f2e6/config' as pageserver config: configure option max_lsn_wal_lag is not a string. Backtrace:
```

So, not even the assertions added are necessary.

The test coverage for tenant config is rather thin in general.
For example, the `test_tenant_conf.py` test doesn't cover all the
options.

I'll add a new regression test as part of attach-time-tenant-conf PR
https://github.com/neondatabase/neon/pull/4255
2023-05-23 16:29:59 +03:00
Shany Pozin
d6cf347670 Add an option to set "latest gc cutoff lsn" in pageserver binutils (#4290)
## Problem
[#2539](https://github.com/neondatabase/neon/issues/2539)
## Summary of changes
Add support  for latest_gc_cutoff_lsn update in pageserver_binutils
2023-05-23 15:48:43 +03:00
Joonas Koivunen
6388454375 test: allow benign warning in relation to startup ordering (#4262)
Allow the warning which happens because the disk usage based eviction
runs before tenants are loaded. Example failure:

https://neon-github-public-dev.s3.amazonaws.com/reports/main/5001582237/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/a711f5baf8f8bd8d/
2023-05-22 11:59:54 +03:00
Alexander Bayandin
3837fca7a2 compute-node-image: fix postgis download (#4280)
## Problem

`osgeo.org` is experiencing some problems with DNS resolving which
breaks `compute-node-image` (because it can't download postgis)

## Summary of changes
- Add `140.211.15.30 download.osgeo.org` to /etc/hosts by passing it via
the container option
2023-05-19 15:34:22 +01:00
Dmitry Rodionov
7529ee2ec7 rfc: the state of pageserver tenant relocation (#3868)
Summarize current state of tenant relocation related activities and implementation ideas
2023-05-19 14:35:33 +03:00
Christian Schwarz
b391c94440 tenant create / update-config: reject unknown fields (#4267)
This PR enforces that the tenant create / update-config APIs reject
requests with unknown fields.

This is a desirable property because some tenant config settings control
the lifetime of user data (e.g., GC horizon or PITR interval).

Suppose we inadvertently rename the `pitr_interval` field in the Rust
code.
Then, right now, a client that still uses the old name will send a
tenant config request to configure a new PITR interval.
Before this PR, we would accept such a request, ignore the old name
field, and use the pageserver.toml default value for what the new PITR
interval is.
With this PR, we will instead reject such a request.

One might argue that the client could simply check whether the config it
sent has been applied, using the `/v1/tenant/.../config` endpoint.
That is correct for tenant create and update-config.

But, attach will soon [^1] grow the ability to have attach-time config
as well.
If we ignore unknown fields and fall back to global defaults in that
case, we risk data loss.
Example:
1. Default PITR in pageservers is 7 days.
2. Create a tenant and set its PITR to 30 days.
3. For 30 days, fill the tenant continuously with data.
4. Detach the tenant.
5. Attach tenant.

Attach must use the 30-day PITR setting in this scenario.
If it were to fall back to the 7-day default value, we would lose 23
days of PITR capability for the tenant.

So, the PR that adds attach-time tenant config will build on the
(clunky) infrastructure added in this PR

[^1]: https://github.com/neondatabase/neon/pull/4255

Implementation Notes
====================

This could have been a simple `#[serde(deny_unknown_fields)]` but sadly,
that is documented- but silent-at-compile-time-incompatible with
`#[serde(flatten)]`. But we are still using this by adding on outer struct and use unit tests to ensure it is correct.

`neon_local tenant config` now uses the `.remove()` pattern + bail if
there are leftover config args. That's in line with what
`neon_local tenant create` does. We should dedupe that logic in a future
PR.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: Alex Chi <iskyzh@gmail.com>
2023-05-18 21:16:09 -04:00
Alexander Bayandin
5abc4514b7 Un-xfail fixed tests on Postgres 15 (#4275)
- https://github.com/neondatabase/neon/pull/4182  
- https://github.com/neondatabase/neon/pull/4213
2023-05-18 22:38:33 +01:00
Alexander Bayandin
1b2ece3715 Re-enable compatibility tests on Postgres 15 (#4274)
- Enable compatibility tests for Postgres 15
- Also add `PgVersion::v_prefixed` property to return the version number
with, _guess what,_ v-prefix!
2023-05-18 19:56:09 +01:00
Anastasia Lubennikova
8ebae74c6f Fix handling of XLOG_XACT_COMMIT/ABORT:
Previously we didn't handle XACT_XINFO_HAS_INVALS and XACT_XINFO_HAS_DROPPED_STAT correctly,
which led to getting incorrect value of twophase_xid for records with XACT_XINFO_HAS_TWOPHASE.
This caused 'twophase file for xid {} does not exist' errors in test_isolation
2023-05-18 14:36:45 +01:00
Vadim Kharitonov
fc886dc8c0 Compile pg_cron extension 2023-05-17 17:43:50 +02:00
Heikki Linnakangas
72346e102d Document that our code is mostly not async cancellation-safe.
We had a hot debate on whether we should try to make our code
cancellation-safe, or just accept that it's not, and make sure that
our Futures are driven to completion. The decision is that we drive
Futures to completion. This documents the decision, and summarizes the
reasoning for that.

Discussion that sparked this:
https://github.com/neondatabase/neon/pull/4198#discussion_r1190209316
2023-05-17 17:29:54 +03:00
Joonas Koivunen
918cd25453 ondemand_download_large_rel: solve flakyness (#3697)
Disable background tasks to not get compaction downloading all layers
but also stop safekeepers before checkpointing, use a readonly endpoint.

Fixes: #3666

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-05-17 16:19:02 +02:00
Alex Chi Z
9767432cff add cargo neon shortcut for neon_local (#4240)
Add `cargo neon` as a shortcut for compiling and running `neon_local`.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-17 16:48:00 +03:00
Anastasia Lubennikova
0c4dc55a39 Disable recovery_prefetch for Neon hot standby.
Prefetching of blocks referenced in WAL doesn't make sense for us,
because Neon hot standby anyway ignores pages that are not in the shared_buffers.
2023-05-17 13:35:56 +01:00
Alexander Bayandin
7b9e8be6e4 GitHub Autocomment: add a command to run all failed tests (#4200)
- Group tests by Postgres version
- Merge different build types
- Add a command to GitHub comment on how to rerun all failed tests
(different command for different Postgres versions)
- Restore a link to a test report in the build summary
2023-05-17 11:38:41 +01:00
Christian Schwarz
89307822b0 mgmt api: share a single tenant config model struct in Rust and OpenAPI (#4252)
This is prep for https://github.com/neondatabase/neon/pull/4255

[1/X] OpenAPI: share a single definition of TenantConfig

DRYs up the pageserver OpenAPI YAML's representation of
tenant config.

All the fields of tenant config are now located in a model schema
called TenantConfig.

The tenant create & config-change endpoints have separate schemas,
TenantCreateInfo and TenantConfigureArg, respectively.
These schemas inherit from TenantConfig, using allOf 1.

The tenant config-GET handler's response was previously named
TenantConfig.
It's now named TenantConfigResponse.

None of these changes affect how the request looks on the wire.

The generated Go code will change for Console because the OpenAPI code
generator maps `allOf` to a Go struct embedding.
Luckily, usage of tenant config in Console is still very lightweigt,
but that will change in the near future.
So, this is a good chance to set things straight.

The console changes are tracked in
 https://github.com/neondatabase/cloud/pull/5046

[2/x]: extract the tenant config parts of create & config requests

[3/x]: code movement: move TenantConfigRequestConfig next to
TenantCreateRequestConfig

[4/x] type-alias TenantConfigRequestConfig = TenantCreateRequestConfig;
They are exactly the same.

[5/x] switch to qualified use for tenant create/config request api
models

[6/x] rename models::TenantConfig{RequestConfig,} and remove the alias

[7/x] OpenAPI: sync tenant create & configure body names from Rust code

[8/x]: dedupe the two TryFrom<...> for TenantConfOpt impls
The only difference is that the TenantConfigRequest impl does

```
        tenant_conf.max_lsn_wal_lag = request_data.max_lsn_wal_lag;
        tenant_conf.trace_read_requests = request_data.trace_read_requests;
```

and the TenantCreateRequest impl does

```
        if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag {
            tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag);
        }
        if let Some(trace_read_requests) = request_data.trace_read_requests {
            tenant_conf.trace_read_requests = Some(trace_read_requests);
        }
```

As far as I can tell, these are identical.
2023-05-17 12:31:17 +02:00
Alexander Bayandin
30fe310602 Code Coverage: upload reports to S3 (#4256)
## Problem

`neondatabase/zenith-coverage-data` is too big:
- It takes ~6 minutes to clone and push the repo
- GitHub fails to publish an HTML report to github.io

Part of https://github.com/neondatabase/neon/issues/3543

## Summary of changes
Replace pushing code coverage report to
`neondatabase/zenith-coverage-data` with uploading it to S3
2023-05-17 11:30:07 +01:00
0x29a
ef41b63db7 docs: add links to the doc for better read experience (#4258)
add links to the doc  and refine links for better read experience
2023-05-17 12:25:01 +03:00
Christian Schwarz
1bceceac5a add helper to debug_assert that current span has a TenantId (#4248)
We already have `debug_assert_current_span_has_tenant_and_timeline_id`.
Have the same for just TenantId.
2023-05-17 11:03:46 +02:00
Christian Schwarz
4431779e32 refactor: attach: use create_tenant_files + schedule_local_tenant_processing (#4235)
With this patch, the attach handler now follows the same pattern as
tenant create with regards to instantiation of the new tenant:

1. Prepare on-disk state using `create_tenant_files`.
2. Use the same code path as pageserver startup to load it into memory
and start background loops (`schedule_local_tenant_processing`).

It's a bit sad we can't use the
`PageServerConfig::tenant_attaching_mark_file_path` method inside
`create_tenant_files` because it operates in a temporary directory.
However, it's a small price to pay for the gained simplicity.

During implementation, I noticed that we don't handle failures post
`create_tenant_files` well. I left TODO comments in the code linking to
the issue that I created for this [^1].

Also, I'll dedupe the spawn_load and spawn_attach code in a future
commit.

refs https://github.com/neondatabase/neon/issues/1555
part of https://github.com/neondatabase/neon/issues/886 (Tenant
Relocation)

[^1]: https://github.com/neondatabase/neon/issues/4233
2023-05-16 12:53:17 -04:00
Alexander Bayandin
131343ed45 Fix regress-tests job for Postgres 15 on release branch (#4253)
## Problem

Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).

Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129

## Summary of changes

Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
2023-05-16 17:18:56 +01:00
Joseph Koshakow
511b0945c3 Replace usages of wait_for_active_timeline (#4243)
This commit replaces all usages of connection_manager.rs:
wait_for_active_timeline with Timeline::wait_to_become_active.
wait_to_become_active is better and in the right module.

close https://github.com/neondatabase/neon/issues/4189

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-05-16 10:38:39 -04:00
Dmitry Rodionov
b7db62411b Make storage time operations an enum instead of an array (#4238)
Use an enum instead of an array. Before that there was no connection
between definition of the metric and point where it was used aside from
matching string literals. Now its possible to use IDE features to check
for references. Also this allows to avoid mismatch between set of
metrics that was defined and set of metrics that was actually used

What is interesting is that `init logical size` case is not used. I
think `LogicalSize` is a duplicate of `InitLogicalSize`. So removed the latter.
2023-05-16 16:54:29 +03:00
MMeent
efe9e131a7 Update vendored PostgreSQL to latest patch releases (#4208)
Conflicts:

- Changes in PG15's xlogrecovery.c resulted in non-substantial conflicts between
ecb01e6ebb5a67f3fc00840695682a8b1ba40461 and
aee72b7be903e52d9bdc6449aa4c17fb852d8708

Fixes #4207
2023-05-16 15:23:50 +02:00
Alex Chi Z
4a67f60a3b bump aws dep version (#4237)
This PR is simply the patch from
https://github.com/neondatabase/neon/issues/4008 except we enabled
`force_path_style` for custom endpoints. This is because at some
version, the s3 sdk by default uses the virtual-host style access, which
is not supported by MinIO in the default configuration. By enforcing
path style access for custom endpoints, we can pass all e2e test cases.

SDK 0.55 is not the latest version and we can bump it further later when
all flaky tests in this PR are resolved.

This PR also (hopefully) fixes flaky test
`test_ondemand_download_timetravel`.

close https://github.com/neondatabase/neon/issues/4008

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-16 09:09:50 -04:00
Alexander Bayandin
a65e0774a5 Increase shared memory size for regression test run (#4232)
Should fix flakiness caused by the error
```
FATAL:  could not resize shared memory segment "/PostgreSQL.3944613150" to 1048576 bytes: No space left on device
```
2023-05-16 14:06:47 +01:00
Dmitry Rodionov
a0b34e8c49 add create tenant metric to storage operations (#4231)
Add a metric to track time spent in create tenant requests

Originated from https://github.com/neondatabase/neon/pull/4204
2023-05-16 15:15:29 +03:00
bojanserafimov
fdc1c12fb0 Simplify github PR template (#4241) 2023-05-16 08:13:54 -04:00
Alexander Bayandin
0322e2720f Nightly Benchmarks: add neonvm to pgbench-compare (#4225) 2023-05-16 12:46:28 +01:00
Vadim Kharitonov
4f64be4a98 Add endpoint to connection string 2023-05-15 23:45:04 +02:00
Tristan Partin
e7514cc15e Wrap naked PQerrorMessage calls in libpagestore with pchomp (#4242) 2023-05-15 15:36:53 -05:00
Tristan Partin
6415dc791c Fix use-after-free issue in libpagestore (#4239)
## Describe your changes

`pageserver_disconnect()` calls `PQfinish()` which deallocates resources
on the connection structure. `PQerrorMessage()` hands back a pointer to
an allocated resource. Duplicate the error message prior to calling
`pageserver_disconnect()`.

## Issue ticket number and link

Fixes https://github.com/neondatabase/neon/issues/4214

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [x] Do not forget to reformat commit message to not include the above
checklist
2023-05-15 13:38:18 -05:00
Alexander Bayandin
a5615bd8ea Fix Allure reports for different benchmark jobs (#4229)
- Fix Allure report generation failure for Nightly Benchmarks
- Fix GitHub Autocomment for `run-benchmarks` label
(`build_and_test.yml::benchmarks` job)
2023-05-15 13:04:03 +01:00
Joonas Koivunen
4a76f2b8d6 upload new timeline index part json before 201 or on retry (#4204)
Await for upload to complete before returning 201 Created on
`branch_timeline` or when `bootstrap_timeline` happens. Should either of
those waits fail, then on the retried request await for uploads again.
This should work as expected assuming control-plane does not start to
use timeline creation as a wait_for_upload mechanism.

Fixes #3865, started from
https://github.com/neondatabase/neon/pull/3857/files#r1144468177

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-05-15 14:16:43 +03:00
Shany Pozin
9cd6f2ceeb Remove duplicated logic in creating TenantConfOpt (#4230)
## Describe your changes

Remove duplicated logic in creating TenantConfOpt in both TryFrom of
TenantConfigRequest and TenantCreateRequest
2023-05-15 10:08:44 +03:00
Heikki Linnakangas
2855c73990 Fix race condition after attaching tenant with branches. (#4170)
After tenant attach, there is a window where the child timeline is
loaded and accepts GetPage requests, but its parent is not. If a
GetPage request needs to traverse to the parent, it needs to wait for
the parent timeline to become active, or it might miss some records on
the parent timeline.

It's also possible that the parent timeline is active, but it hasn't
yet received all the WAL up to the branch point from the safekeeper.
This happens if a pageserver crashes soon after creating a timeline,
so that the WAL leading to the branch point has not yet been uploaded
to remote storage. After restart, the WAL will be re-streamed and
ingested from the safekeeper, but that takes a while. Because of that,
it's not enough to check that the parent timeline is active, we also
need to wait for the WAL to arrive on the parent timeline, just like
at the beginning of GetPage handling. We probably should change the
behavior at create_timeline so that a timeline can only be created
after all the WAL up to the branch point has been uploaded to remote
storage, but that's not currently the case and out of scope for this
PR (see github issue #4218).

@NanoBjorn encountered this while working on tenant migration. After
migrating a tenant with a parent and child branch, connecting to the
child branch failed with an error like:

```
FATAL:  "base/16385" is not a valid data directory
DETAIL:  File "base/16385/PG_VERSION" is missing.
```

This commit adds two tests that reproduce the bug, with slightly
different symptoms.
2023-05-13 10:44:11 +03:00
Christian Schwarz
edcf4d61a4 distinguish imitated from real size::gather_input calls in metrics (#4224)
Before this PR, the gather_inputs() calls made to imitate synthetic size
calculation accesses were accounted towards the real logical size
calculation metric.

This PR forces all callers to declare the cause for making logical size
calculations, making the decision which cause counts towards which
metric explicit.

This is follow-up to

```
commit 1d266a6365
Author: Christian Schwarz <christian@neon.tech>
Date:   Thu May 11 16:09:29 2023 +0200

    logical size calculation metrics: differentiate regular vs imitated (#4197)
```

After merging this patch, I hope to be able to explain why we have ca
30x more "logical size" ops in prod than "imitate logical size" for any
given observation interval.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-12 17:57:33 +00:00
Christian Schwarz
a2a9c598be add counter metric that increases whenever a background loop overruns its period (#4223)
We already have the warn!() log line for this condition. This PR adds a
corresponding metric on which we can have a dedicated alert. Cheaper and
more reliable than alerting on the logs, because, we run into log rate
limits from time to time these days.

refs https://github.com/neondatabase/neon/issues/4222
2023-05-12 19:00:06 +03:00
Alexander Bayandin
bb06d281ea Run regressions tests on both Postgres 14 and 15 (#4192)
This PR adds tests runs on Postgres 15 and created unified Allure report
with results for all tests.

- Split `.github/actions/allure-report` into
`.github/actions/allure-report-store` and
`.github/actions/allure-report-generate`
- Add debug or release pytest parameter for all tests (depending on
`BUILD_TYPE` env variable)
- Add Postgres version as a pytest parameter for all tests (depending on
`DEFAULT_PG_VERSION` env variable)
- Fix `test_wal_restore` and `restore_from_wal.sh` to support path with
`[`/`]` in it (fixed by applying spellcheck to the script and fixing all
warnings), `restore_from_wal_archive.sh` is deleted as unused.
- All known failures on Postgres 15 marked with xfail
2023-05-12 15:28:51 +01:00
Christian Schwarz
5869234290 logical size calculation: spawn with in_current_span (#4196)
While investigating https://github.com/neondatabase/neon/issues/4154 I
found that the `Calculating logical size for timeline` tracing events
created from within the logical size computation code are not always
attributable to the background task that caused it.

My goal is to be able to distinguish in the logs whether a `Calculating
logical size for timeline` was logged as part of a real synthetic size
calculation VS an imitation by the eviction task.

I want this distinction so I can prove my assumption that the disk IO
peaks which we see every 24h on prod are due to eviction's imitate
synthetic size calculations.

The alternative here, which I would have preferred, but is more work:
link RequestContext's into a child->parent list and dump this list when
we log `Calculating logical size for timeline`.

I would have preferred that over what we have in this PR because,
technically, the ondemand logical size computation can outlive the
caller that spawned it. This is against the idea of correctly nested
spans.

I guess in OpenTelemetry land, the correct modelling would be a link
between the caller's span and the task_mgr task's span.

Anyways, I think the case where we hang up on the spawned ondemand
logical size calculation is quite rare. So, I'm willing to tolerate
incorrectly nested spans for these edge-cases.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-12 15:36:30 +02:00
Rahul Modpur
ecfe4757d3 fix bogus at character context in log messages
Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
2023-05-11 23:31:42 +01:00
Christian Schwarz
845e296562 eviction: add global histogram for iteration durations (#4212)
I would like to know whether and by how much the eviction iterations
spike in the $period-sized window that happens every $threshold , when
all the timelines do the imitate accesses.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-11 18:02:19 +03:00
Heikki Linnakangas
1988cc5527 Fix failpoint_sleep_millis_async without use std::time::Duration (#4195)
I tried to use failpoint_sleep_millis_async(...) in a source file that
didn't do `use std::time::Duration`, and got a compiler error:

```
error[E0433]: failed to resolve: use of undeclared type `Duration`
   --> pageserver/src/walingest.rs:316:17
    |
316 |                 utils::failpoint_sleep_millis_async!("wal-ingest-logical-message-sleep");
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in this scope
    |
    = note: this error originates in the macro `utils::failpoint_sleep_millis_async` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing one of these items
    |
24  | use chrono::Duration;
    |
24  | use core::time::Duration;
    |
24  | use humantime::Duration;
    |
24  | use serde_with::__private__::Duration;
    |
      and 2 other candidates
```
2023-05-11 17:53:42 +03:00
Christian Schwarz
1d266a6365 logical size calculation metrics: differentiate regular vs imitated (#4197)
I want this distinction so I can prove my assumption that the disk IO
peaks which we see every 24h on prod are due to eviction's imitate
synthetic size calculations.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-11 17:09:29 +03:00
Christian Schwarz
80522a1b9d replace has_in_progress_downloads with new attachment_status field (#4168)
Control Plane currently [^1] polls for `has_in_progress_downloads ==
false` after /attach to determine that an attach operation succeeded.

As pointed out in the OpenAPI spec as of neon#4151, polling for
`has_in_progress_downloads` is incorrect.

This patch changes the situation by
- removing `has_in_progress_downloads`
- adding a new field `attachment_status.`
- changing instructions for `/attach` to poll for `attachment_status ==
attached`.

This makes the instructions in `/attach` actionable for Control Plane.
NB that we don't expose the TenantState in the OpenAPI docs, even though
we expose it in the endpoint. That is with good reason because we don't
want to commit to a fixed set of tenant states forever. Hence, the
separate `attachment_status` field that exposes the bare minimum
required to make /attach + subsequent polling 100% safe wrt split brain.

It would have been nice to report failures explicitly, but the problem
is that we lose that state when we restart. So, we return `attached`
upon attach failure. The tenant is Broken in that case, causing Control
Plane's subsequent health check will fail. Control Plane can roll back
the relocation operation then.
NB: the reliance on the subsequent health check is no change to what we
had before this patch!
NB: we can always add additional TenantAttachmentStatus'es in the future
to communicate failure.

This PR also moves the attach-marker file's creation to the API
handler's synchronous part. That was done to avoid the need to
distinguish
* `Attaching but marker not yet written => AttachmentStatus::Maybe` from
* `Attaching, marker written, but attach failed for other reason =>
AttachmentStatus::Attached`

Coincidentally, this also adds more transactionality to the /attach API
because we only return 202 once we've written the marker file. But, in
the end, it doesn't affect how the control plane interacts with us or
how it needs to do retries. So, we don't mention any of this in the API
docs.

[^1]: The one-click tenant relocation PR cloud#4740, currently WIP, is
      the first real user.
2023-05-11 16:53:46 +03:00
Joonas Koivunen
ecced13d90 try: higher page_service timeouts to isolate an issue (#4206)
See #4205.
2023-05-11 16:14:42 +03:00
Alexander Bayandin
59510f6449 scripts/flaky_tests.py: use retriesStatusChange from Allure 2023-05-10 16:59:03 +01:00
Alexander Bayandin
7fc778d251 GitHub Autocomment: fix flaky test notifications 2023-05-10 16:59:03 +01:00
Alexander Bayandin
1d490b2311 Make benchmark_fixture less noisy 2023-05-10 16:59:03 +01:00
Dmitry Rodionov
eb3a8be933 keep track of timeline deletion status in IndexPart to prevent timeline resurrection (#3919)
Before this patch, the following sequence would lead to the resurrection of a deleted timeline:

- create timeline
- wait for its index part to reach s3
- delete timeline
- wait an arbitrary amount of time, including 0 seconds
- detach tenant
- attach tenant
- the timeline is there and Active again

This happens because we only kept track of the deletion in the tenant dir (by deleting the timeline dir) but not in S3.

The solution is to turn the deleted timeline's IndexPart into a tombstone.
The deletion status of the timeline is expressed in the `deleted_at: Option<NativeDateTime>` field of IndexPart.
It's `None` while the timeline is alive and `Some(deletion time stamp)` if it is deleted.

We change the timeline deletion handler to upload this tombstoned IndexPart.
The handler does not return success if the upload fails.

Coincidentally, this fixes the long-stanging TODO about the `std::fs::remove_dir_all` being not atomic.
It need not be atomic anymore because we set the `deleted_at=Some()` before starting the `remove_dir_all`.

The tombstone is in the IndexPart only, not in the `metadata`.
So, we only have the tombstone and the `remove_dir_all` benefits mentioned above if remote storage is configured.
This was a conscious trade-off because there's no good format evolution story for the current metadata file format.

The introduction of this additional step into `delete_timeline` was painful because delete_timeline needs to be
1. cancel-safe
2. idempotent
3. safe to call concurrently
These are mostly self-inflicted limitations that can be avoided by using request-coalescing.
PR https://github.com/neondatabase/neon/pull/4159 will do that.

fixes https://github.com/neondatabase/neon/issues/3560

refs https://github.com/neondatabase/neon/issues/3889 (part of tenant relocation)


Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-05-10 10:27:12 +02:00
Christian Schwarz
3ec52088dd eviction_task: tracing::instrument the imitate-access calls (#4180)
Currently, if we unexpectly download from the eviction task, the log
lines look like what we have in
https://github.com/neondatabase/neon/issues/4154

```
2023-05-04T14:42:57.586772Z  WARN eviction_task{tenant_id=$TENANT timeline_id=$TIMELINE}:eviction_iteration{policy_kind="LayerAccessThreshold"}: unexpectedly on-demand downloading remote layer remote $TIMELINE/000000067F000032AC0000400C00FFFFFFFF-000000067F000032AC000040140000000008__0000000001696070-0000000003DC76E9 for task kind Eviction
```

We know these are caused by the imitate accesses.
But we don't know which one (my bet is on update_gc_info).

I didn't want to pollute the other tasks' logs with the additional
spans, so, using `.instrument()` when we call non-eviction-task code.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-09 18:16:22 +02:00
Heikki Linnakangas
66b06e416a Pass tracing context in env variables instead of the spec file. (#4174)
If compute_ctl is launched without a spec file, it fetches it from the
control plane with an HTTP request. We cannot get the startup tracing
context from the compute spec in that case, because we don't have it
available on start. We could still read the tracing context from the
compute spec after we have fetched it, but that would leave the fetch
itself out of the context. Pass the tracing context in environment
variables instead.
2023-05-09 17:08:02 +03:00
Arthur Petukhovsky
d62315327a Allow parallel backup in safekeepers (#4177)
Add `wal_backup_parallel_jobs` cmdline argument to specify the max count
of parallel segments upload. New default value is 5, meaning that
safekeepers will try to upload 5 segments concurrently if they are
available. Setting this value to 1 will be equivalent to the sequential
upload that we had before.

Part of the https://github.com/neondatabase/neon/issues/3957
2023-05-09 12:20:35 +03:00
Anastasia Lubennikova
4bd7b1daf2 Bump vendor/postgres:
Fix entering hot standby mode for Neon postgres v15
2023-05-08 21:25:47 +01:00
Sergey Melnikov
0d3d022eb1 Remove deploy workflows (#4157)
## Describe your changes
Removing deploy workflows (moving to aws repo)
2023-05-08 17:30:16 +02:00
Raouf Chebri
e85cbddd2e Update neondatabase banner in README.md (#4176)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-05-08 17:12:42 +02:00
Anton Chaporgin
51ff9f9359 pg-sni-router nlb is internal (#4164) 2023-05-08 18:03:50 +03:00
Vadim Kharitonov
0f8b2d8f0a Compile kq_imcx extension (#3568)
## Describe your changes
Compiles kq_imcx extension

At this moment, there are some issues with the extension:
1. I'm cloning it directly from the master branch. It's better to fetch
tag/archive
2. PG14:
```
postgres=# CREATE EXTENSION IF NOT EXISTS kq_imcx;
postgres=# select * from kq_calendar_cache_info();
2023-02-08 13:55:22.853 UTC [412] ERROR:  relation "ketteq.slice_type" does not exist at character 34
2023-02-08 13:55:22.853 UTC [412] QUERY:  select min(s.id), max(s.id) from ketteq.slice_type s
2023-02-08 13:55:22.853 UTC [412] STATEMENT:  select * from kq_calendar_cache_info();
ERROR:  relation "ketteq.slice_type" does not exist
LINE 1: select min(s.id), max(s.id) from ketteq.slice_type s
```
3. PG15:
`cannot request additional shared memory outside shmem_request_hook`

Note: I don't think we need to publish info about this extension in the
docs.

## Issue ticket number and link
neondatabase/cloud#3387
2023-05-08 15:56:08 +02:00
Gleb Novikov
9860d59aa2 Public docker image repository by default 2023-05-08 15:51:54 +04:00
Christian Schwarz
411c71b486 document current tenant attach API semantics (#4151)
We currently return 202 as soon as the tenant is allocated in memory
before we've written out the marker file. So, the /attach API currently
does not have a transactional character. For example, it can happen that
we respond with a 202 and then crash before writing out the marker file.
In such a case, it is important that the client

1. observes the lost attach (by polling tenant status and observing 404)
2. and consequently retries the attach.

It has to do it in this loop until it observes the tenant as "Active" in
the tenant status. If the client doesn't follow this protocol and
instead goes to another pageserver to attach the tenant, we risk a
split-brain situation where both the first and second pageserver write
to the tenant's S3 state.

The improved description highlights the consequences of this behavior
for clients that use the /attach endpoint.

The tenant relocation that is currently being implemented in cloud#4740
implements retries of Attach and it does poll afterwards, but, it polls
`has_in_progress_downloads`.
That is incorrect, as described in the patch body.

The motivation for this write-up is that, in a future PR, we'll extend
the /attach endpoint with an option to provide the tenant config. If we
decide to leave the non-transactional behavior of /attach unmodified, we
will be able to avoid persisting the tenant config. Conversely, if we
decide that the /attach API should become transactional, we'll need to
persist the tenant config in the attach-marker-file before acknowledging
receipt of the /attach operation.

refs https://github.com/neondatabase/cloud/pull/4740
refs https://github.com/neondatabase/neon/issues/2238
refs https://github.com/neondatabase/neon/issues/1555
2023-05-05 19:32:41 +03:00
Alexey Kondratov
dd4fd89dc6 [compute_ctl] Do not initialize last_active on start (#4137)
Our scale-to-zero logic was optimized for short auto-suspend intervals,
e.g. minutes or hours. In this case, if compute was restarted by k8s due
to some reason (OOM, k8s node went down, pod relocation, etc.),
`last_active` got bumped, we start counting auto-suspend timeout again.
It's not a big deal, i.e. we suspend completely idle compute not after 5
minutes, but after 10 minutes or so.

Yet, some clients may want days or even weeks. And chance that compute
could be restarted during this interval is pretty high, but in this case
we could be not able to suspend some computes for weeks.

After this commit, we won't initialize `last_active` on start, so
`/status` could return an unset attribute. This means that there was no
user activity since start. Control-plane should deal with it by taking
`max()` out of all available activity timestamps: `started_at`,
`last_active`, etc.

compute_ctl part of neondatabase/cloud#4853
2023-05-05 11:45:37 +02:00
Alexander Bayandin
653e633c59 test_runner: add --pg-version pytest argument (#4037)
- allows setting Postgres version for testing using --pg-version argument
- fixes tests for the non-default Postgres version.
2023-05-05 02:57:47 +03:00
Alexander Bayandin
291b4f0d41 Update client libs for test_runner/pg_clients to their latest versions (#4092)
Also, use Workaround D for `swift/PostgresClientKitExample`, 
which supports neither SNI nor connections options
2023-05-04 18:22:04 +01:00
Christian Schwarz
88f39c11d4 refactor: the code that builds TenantConfOpt from mgmt API requests (#4152)
- extract code that builds TenantConfOpt from requests into a From<>
impl
- move map_err(ApiError::BadRequest) into callers
2023-05-04 18:10:40 +03:00
Heikki Linnakangas
b627fa71e4 Make read-only replicas explicit in compute spec (#4136)
This builds on top of PR #4058, and supersedes #4018
2023-05-04 17:41:42 +03:00
Christian Schwarz
7dd9553bbb eviction: regression test + distinguish layer write from map insert (#4005)
This patch adds a regression test for the threshold-based layer
eviction.
The test asserts the basic invariant that, if left alone, the residence
statuses will stabilize, with some layers resident and some layers
evicted.
Thereby, we cover both the aspect of last-access-time-threshold-based
eviction, and the "imitate access" hacks that we put in recently.

The aggressive `period` and `threshold` values revealed a subtle bug
which is also fixed in this patch.
The symptom was that, without the Rust changes of this patch, there
would be occasional test failures due to `WARN... unexpectedly
downloading` log messages.
These log messages were caused by the "imitate access" calls of the
eviction task.
But, the whole point of the "imitate access" hack was to prevent
eviction of the layers that we access there.
After some digging, I found the root cause, which is the following race
condition:

1. Compact: Write out an L1 layer from several L0 layers. This records
residence event `LayerCreate` with the current timestamp.
2. Eviction: imitate access logical size calculation. This accesses the
L0 layers because the L1 layer is not yet in the layer map.
3. Compact: Grab layer map lock, add the new L1 to layer map and remove
the L0s, release layer map lock.
4. Eviction: observes the new L1 layer whose only activity timestamp is
the `LayerCreate` event.

The L1 layer had no chance of being accessed until after (3).
So, if enough time passes between (1) and (3), then (4) will observe a
layer with `now-last_activity > threshold` and evict it

The fix is to require the first `record_residence_event` to happen while
we already hold the layer map lock.
The API requires a ref to a `BatchedUpdates` as a witness that we are
inside a layer map lock.
That is not fool-proof, e.g., new call sites for `insert_historic` could
just completely forget to record the residence event.
It would be nice to prevent this at the type level.
In the meantime, we have a rate-limited log messages to warn us, if such
an implementation error sneaks in in the future.

fixes https://github.com/neondatabase/neon/issues/3593
fixes https://github.com/neondatabase/neon/issues/3942

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-05-04 16:16:48 +02:00
Heikki Linnakangas
b5d64a1e32 Rename field, to match field name in XLogData struct and in rust-postgres (#4149)
The field means the same thing as the `wal_end` field in the XLogData
struct. And in the postgres-protocol crate's corresponding
PrimaryKeepAlive struct, it's also called `wal_end`. Let's be
consistent.

As noted by Arthur at
https://github.com/neondatabase/neon/pull/4144#pullrequestreview-1411031881
2023-05-04 14:41:15 +03:00
Christian Schwarz
f9839a0dd9 import_basebackup_from_tar: don't load local layers twice (#4111)
PR #4104 removed these bits as part of a revert of a larger change.

follow-up to
https://github.com/neondatabase/neon/pull/4104#discussion_r1180444952

---

Let's not merge this before the release.
2023-05-04 09:23:49 +02:00
Arthur Petukhovsky
ce1bbc9fa7 Always send the latest commit_lsn in send_wal (#4150)
When a new connection is established to the safekeeper, the 'end_pos'
field is initially set to Lsn::INVALID (i.e 0/0). If there is no WAL to
send to the client, we send KeepAlive messages with Lsn::INVALID. That
confuses the pageserver: it thinks that safekeeper is lagging very much
behind the tip of the branch, and will reconnect to a different
safekeeper. Then the same thing happens with the new safekeeper, until
some WAL is streamed which sets 'end_pos' to a valid value.

This fix always sets `end_pos` to the most recent `commit_lsn` value.
This is useful to send the latest `commit_lsn` to the receiver, so it
will know how advanced this safekeeper compared to the others.

Fixes https://github.com/neondatabase/neon/issues/3972
Supersedes https://github.com/neondatabase/neon/pull/4144
2023-05-04 00:07:45 +03:00
Alexander Bayandin
b114ef26c2 GitHub Autocomment: add a note if no tests were run (#4109)
- Always (if not cancelled) add a comment to a PR
- Mention in the comment if no tests were run / reports were not
generated.
2023-05-03 15:38:49 +01:00
Arthur Petukhovsky
3ceef7b17a Add more safekeeper and walreceiver metrics (#4142)
Add essential safekeeper and pageserver::walreceiver metrics. Mostly
counters, such as the number of received queries, broker messages,
removed WAL segments, or connection switches events in walreceiver.
Also logs broker push loop duration.
2023-05-03 17:07:41 +03:00
Kirill Bulatov
586e6e55f8 Print WalReceiver context on WAL waiting timeout (#4090)
Closes https://github.com/neondatabase/neon/issues/2106

Before:
```
Extracting base backup to create postgres instance: path=/Users/someonetoignore/work/neon/neon_main/test_output/test_pageserver_lsn_wait_error_safekeeper_stop/repo/endpoints/ep-2/pgdata port=15017

              stderr: command failed: page server 'basebackup' command failed

Caused by:
    0: db error: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50
    1: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50

Stack backtrace:
```

After:
```
Extracting base backup to create postgres instance: path=/Users/someonetoignore/work/neon/neon/test_output/test_pageserver_lsn_wait_error_safekeeper_stop/repo/endpoints/ep-2/pgdata port=15011

              stderr: command failed: page server 'basebackup' command failed

Caused by:
    0: db error: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50, WalReceiver status (update 2023-04-26 14:20:39): streaming WAL from node 12346, commit|streaming Lsn: 0/A2C3F58|0/A2C3F58, safekeeper candidates (id|update_time|commit_lsn): [(12348|14:20:40|0/A2C3F58), (12346|14:20:40|0/A2C3F58), (12347|14:20:40|0/A2C3F58)]
    1: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50, WalReceiver status (update 2023-04-26 14:20:39): streaming WAL from node 12346, commit|streaming Lsn: 0/A2C3F58|0/A2C3F58, safekeeper candidates (id|update_time|commit_lsn): [(12348|14:20:40|0/A2C3F58), (12346|14:20:40|0/A2C3F58), (12347|14:20:40|0/A2C3F58)]

Stack backtrace:
```

As the issue requests, the PR adds the context in logs only, but I think
we should expose the context via HTTP management API similar way — it
should be simple with the new API, but better be done in a separate PR.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-05-03 16:25:19 +03:00
Anton Chaporgin
db81242f4a add debug to pg-sni-router install (#4143) 2023-05-03 16:14:16 +03:00
dependabot[bot]
39ca7c7c09 Bump flask from 2.1.3 to 2.2.5 (#4138) 2023-05-03 10:40:35 +01:00
Heikki Linnakangas
ecc0cf8cd6 Treat EPIPE as an expected error. (#4141)
If the other end of a TCP connection closes its read end of the socket,
you get an EPIPE when you try to send. I saw that happen in the CI once:


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4136/release/4869464644/index.html#suites/c19bc2126511ef8cb145cca25c438215/7ec87b016c0b4b50/

```
2023-05-03T07:53:22.394152Z ERROR Task 'serving compute connection task' tenant_id: Some(c204447079e02e7ba8f593cb8bc57e76), timeline_id: Some(b666f26600e6deaa9f43e1aeee5bacb7) exited with error: Postgres connection error

Caused by:
    Broken pipe (os error 32)

Stack backtrace:
   0: pageserver::page_service::page_service_conn_main::{{closure}}
             at /__w/neon/neon/pageserver/src/page_service.rs:282:17
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panic/unwind_safe.rs:296:9
      <futures_util::future::future::catch_unwind::CatchUnwind<Fut> as core::future::future::Future>::poll::{{closure}}
             at /__w/neon/neon/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.28/src/future/future/catch_unwind.rs:36:42
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panic/unwind_safe.rs:271:9
   ...
```

In the passing, add a comment to explain what the "expected" in the
`is_expected_io_error` function means.
2023-05-03 12:03:13 +03:00
Joonas Koivunen
faebe3177b wal_craft: cleanup to keep editable (#4121)
wal_craft had accumulated some trouble by using `use anyhow::*;`. Fixes
that, removes redundant conversions (never need to convert a Path to
OsStr), especially at the `Process` args.

Originally in #4100 but we merged a later PR instead for the fixes. I
dropped the `postmaster.pid` polling in favor of just having a longer
connect timeout.
2023-05-03 11:11:55 +03:00
Joonas Koivunen
474f69c1c0 fix: omit cancellation logging when panicking (#4125)
noticed while describing `RequestSpan`, this fix will omit the otherwise
logged message about request being cancelled when panicking in the
request handler. this was missed on #4064.
2023-05-03 10:56:49 +03:00
Konstantin Knizhnik
47521693ed Fix file_cache build warnings (#4118)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-05-02 22:28:21 +03:00
Heikki Linnakangas
4d55d61807 Store basic endpoint info in endpoint.json file. (#4058)
It's more convenient than parsing the postgresql.conf file.

Extracted from PR #3886. I started working on another patch (to make it
safe to run two "neon_local endpoint create" commands concurrently), and
realized that this change will make that simpler too.
2023-05-02 20:36:11 +03:00
Sergey Melnikov
093fafd6bd Deploy pg-sni-router (#4132) 2023-05-01 17:18:45 +02:00
Shany Pozin
e3ae2661ee Add 2 new sets of safekeepers to us-west2 (#4130)
## Describe your changes
TF output:       
module.safekeeper-us-west-2.aws_instance.this["3"]: Creation complete
after 13s [id=i-089f6b9ef426dff76]
module.safekeeper-us-west-2.aws_instance.this["4"]: Creation complete
after 13s [id=i-0fe6bf912c4710c82]
module.safekeeper-us-west-2.aws_instance.this["5"]: Creation complete
after 13s [id=i-0a83c1c46d2b4e409]
module.safekeeper-us-west-2.aws_instance.this["6"]: Creation complete
after 13s [id=i-0fef5317b8fdc9f8d]
module.safekeeper-us-west-2.aws_instance.this["7"]: Creation complete
after 13s [id=i-0be739190d4289bf9]
module.safekeeper-us-west-2.aws_instance.this["8"]: Creation complete
after 13s [id=i-00e851803669e5cfe]
2023-05-01 14:22:59 +03:00
Anton Chaporgin
7e368f3edf build pg-sni-router binary (#4129)
## Describe your changes
This adds pg-sni-router binary to the build pipeline and neon image.

## Issue ticket number and link

https://github.com/neondatabase/cloud/issues/1461
2023-05-01 12:14:31 +02:00
Joonas Koivunen
138bc028ed fix: quick and dirty panic avoidance on drop path (#4128)
Sentry caught a panic on load testing server related to metric removals:
https://neondatabase.sentry.io/issues/4142396994

Turn the `expect` into logging, but also add logging for each removal,
so we could identify in which cases we do double-remove. The
double-removal (or never adding) cause is not obvious or expected.

Original added in #3837.
2023-05-01 11:54:09 +03:00
Stas Kelvich
d53f81b449 Add one more pageserver to staging 2023-04-30 22:39:34 +03:00
Joonas Koivunen
6f472df0d0 fix: restore not logging ignored io errors as errors (#4120)
the fix is rather indirect due to the accidental applying of too much
`anyhow`: if handle_pagerequests returns a `QueryError` it will now be
bubbled up as-is `QueryError`. `QueryError` allows the inner
`std::io::Error` to be inspected and thus we can filter certain error
kinds which are perfectly normal without a huge log message.

for a very long time (b2f5102) the errors were converted to `anyhow` by
mistake which made this difficult or impossible, even though from the
types it would *appear* that we propagate wrapped `std::io::Error`s and
can filter them.

Fixes #4113, most likely filters some other errors as well.
2023-04-30 14:34:55 +03:00
Rahul Patil
21eb944b5e Staging: Add safekeeper nodes [3-8] to eu-west-1 (#4123) 2023-04-29 15:25:57 +03:00
Arthur Petukhovsky
95244912c5 Override sharded-slab to increase MAX_THREADS (#4122)
Add patch directive to Cargo.toml to use patched version of
sharded-slab:
98d16753ab

Patch changes the MAX_THREADS limit from 4096 to 32768. This is a
temporary workaround for using tracing from many threads in safekeepers
code, until async safekeepers patch is merged to the main.

Note that patch can affect other rust services, not only the safekeeper
binary.
2023-04-29 13:31:04 +03:00
Shany Pozin
2617e70008 Add 4 new Pageservers for retool launch (#4115)
## Describe your changes
Adding 4 new pageserves to us-west

TF apply output:
module.pageserver-us-west-2.aws_instance.this["7"]: Creation complete
after 21s [id=i-02eec9b40617db5bc]
module.pageserver-us-west-2.aws_instance.this["5"]: Creation complete
after 21s [id=i-00ca6417c7bf96820]
module.pageserver-us-west-2.aws_instance.this["4"]: Creation complete
after 21s [id=i-013263dd1c239adcc]
module.pageserver-us-west-2.aws_instance.this["6"]: Creation complete
after 22s [id=i-01cdf7d2bc1433b6a]
2023-04-29 11:42:52 +02:00
Arthur Petukhovsky
8543485e92 Pull clone timeline from peer safekeepers (#4089)
Add HTTP endpoint to initialize safekeeper timeline from peer
safekeepers. This is useful for initializing new safekeeper to replace
failed safekeeper. Not fully "correct" in all cases, but should work in
most.

This code is not suitable for production workloads but can be tested on
staging to get started. New endpoint is separated from usual cases and
should not affect anything if no one explicitly uses a new endpoint. We
can rollback this commit in case of issues.
2023-04-28 14:20:46 +00:00
Joonas Koivunen
ec53c5ca2e revert: "Add check for duplicates of generated image layers" (#4104)
This reverts commit 732acc5.

Reverted PR: #3869

As noted in PR #4094, we do in fact try to insert duplicates to the
layer map, if L0->L1 compaction is interrupted. We do not have a proper
fix for that right now, and we are in a hurry to make a release to
production, so revert the changes related to this to the state that we
have in production currently. We know that we have a bug here, but
better to live with the bug that we've had in production for a long
time, than rush a fix to production without testing it in staging first.

Cc: #4094, #4088
2023-04-28 17:20:18 +03:00
Stas Kelvich
94d612195a bump rust-postgres version, after merging PR in rust-postgres 2023-04-28 17:15:43 +03:00
Stas Kelvich
b1329db495 fix sigterm handling 2023-04-28 17:15:43 +03:00
Stas Kelvich
5bb971d64e fix more python tests 2023-04-28 17:15:43 +03:00
Stas Kelvich
0364f77b9a fix python styling 2023-04-28 17:15:43 +03:00
Stas Kelvich
4ac6a9f089 add backward compatibility to proxy 2023-04-28 17:15:43 +03:00
Stas Kelvich
9486d76b2a Add tests for link auth to compute connection 2023-04-28 17:15:43 +03:00
Stas Kelvich
040f736909 remove changes in main proxy that are now not needed 2023-04-28 17:15:43 +03:00
Stas Kelvich
645e4f6ab9 use TLS in link proxy 2023-04-28 17:15:43 +03:00
Heikki Linnakangas
e947cc119b Add a small test case for pg_sni_router 2023-04-28 17:15:43 +03:00
Heikki Linnakangas
53e5d18da5 Start passthrough earlier
As soon as we have received the SSLRequest packet, and have figured
out the hostname to connect to from the SNI, we can start passing
through data. We don't need to parse the StartupPacket that the client
will send next.
2023-04-28 17:15:43 +03:00
Heikki Linnakangas
3813c703c9 Add an option for destination port.
Makes it easier to test locally.
2023-04-28 17:15:43 +03:00
Heikki Linnakangas
b15204fa8c Fix --help, and required args 2023-04-28 17:15:43 +03:00
Alexey Kondratov
81c75586ab Take port from SNI, formatting, make clippy happy 2023-04-28 17:15:43 +03:00
Anton Chaporgin
556fb1642a fixed the way hostname is parsed 2023-04-28 17:15:43 +03:00
Stas Kelvich
23aca81943 Add SNI-based proxy router
In order to not to create NodePorts for each compute we can setup
services that accept connections on wildcard domains and then use
information from domain name to route connection to some internal
service. There are ready solutions for HTTPS and TLS connections
but postgresql protocol uses opportunistic TLS and we haven't found
any ready solutions.

This patch introduces `pg_sni_router` which routes connections to
`aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`.

In the long run we can avoid console -> compute psql communications,
but now this router seems to be the easier way forward.
2023-04-28 17:15:43 +03:00
Arseny Sher
42798e6adc Increase connection_timeout to PG in find end of WAL test.
And log postgres to stdout.

Probably fixes https://github.com/neondatabase/neon/issues/3778
2023-04-28 16:17:23 +04:00
Arthur Petukhovsky
b03143dfc8 Use serde_as DisplayFromStr everywhere (#4103)
We used `display_serialize` previously, but it works only for Serialize.
`DisplayFromStr` does the same, but also works for Deserialize.
2023-04-28 13:55:07 +03:00
Arseny Sher
fdacfaabfd Move PageserverFeedback to utils.
It allows to replace u64 with proper Lsn and pretty print PageserverFeedback
with serde(_json). Now walsenders on safekeepers queried with debug_dump look
like

"walsenders": [
  {
    "ttid": "fafe0cf39a99c608c872706149de9d2a/b4fb3be6f576935e7f0fcb84bdb909a1",
    "addr": "127.0.0.1:48774",
    "conn_id": 3,
    "appname": "pageserver",
    "feedback": {
      "Pageserver": {
	"current_timeline_size": 32096256,
	"last_received_lsn": "0/2415298",
	"disk_consistent_lsn": "0/1696628",
	"remote_consistent_lsn": "0/0",
	"replytime": "2023-04-12T13:54:53.958856+00:00"
      }
    }
  }
],
2023-04-28 06:22:13 +04:00
Arseny Sher
b2a3981ead Move tracking of walsenders out of Timeline.
Refactors walsenders out of timeline.rs to makes it less convoluted into
separate WalSenders with its own lock, but otherwise having the same structure.
Tracking of in-memory remote_consistent_lsn is also moved there as it is mainly
received from pageserver.

State of walsender (feedback) is also restructured to be cleaner; now it is
either PageserverFeedback or StandbyFeedback(StandbyReply, HotStandbyFeedback),
but not both.
2023-04-28 06:22:13 +04:00
Joonas Koivunen
fe0b616299 feat(page_service): read timeouts (#4093)
Introduce read timeouts to our `page_service` connections. Without read
timeouts, we essentially leak connections.

This is a port of #3995. Split the refactorings to the other PR: #4097.

Fixes #4028.
2023-04-27 17:55:35 +00:00
Alexander Bayandin
c4e1cafb63 scripts/flaky_tests.py: handle connection error (#4096)
- Increase `connect_timeout` to 30s, which should be enough for 
most of the cases
- If the script cannot connect to the DB (or any other
`psycopg2.OperationalError` occur) — do not fail the script, log
the error and proceed. Problems with fetching flaky tests shouldn't
block the PR
2023-04-27 17:08:00 +01:00
Joonas Koivunen
fdf5e4db5e refactor: Cleanup page service (#4097)
Refactoring part of #4093.

Numerious `Send + Sync` bounds were a distraction, that were not needed
at all. The proper `Bytes` usage and one `"error_message".to_string()`
are just drive-by fixes.

Not using the `PostgresBackendTCP` allows us to start setting read
timeouts (and more). `PostgresBackendTCP` is still used from proxy, so
it cannot be removed.
2023-04-27 18:51:57 +03:00
Heikki Linnakangas
d1e86d65dc Run rustfmt to fix whitespace.
Commit e6ec2400fc introduced some trivial whitespace issues.
2023-04-27 18:45:22 +03:00
Arseny Sher
f5b4697c90 Log session_id when proxy per client task errors out. 2023-04-27 19:08:22 +04:00
Christian Schwarz
3be81dd36b fix clippy --release failure introduced in #4030 (#4095)
PR `build: run clippy for powerset of features (#4077)` brought us a
`clippy --release` pass.

It was merged after #4030, which fails under `clippy --release` with

```
error: static `TENANT_ID_EXTRACTOR` is never used
    --> pageserver/src/tenant/timeline.rs:4270:16
     |
4270 |     pub static TENANT_ID_EXTRACTOR: once_cell::sync::Lazy<
     |                ^^^^^^^^^^^^^^^^^^^
     |
     = note: `-D dead-code` implied by `-D warnings`

error: static `TIMELINE_ID_EXTRACTOR` is never used
    --> pageserver/src/tenant/timeline.rs:4276:16
     |
4276 |     pub static TIMELINE_ID_EXTRACTOR: once_cell::sync::Lazy<
     |                ^^^^^^^^^^^^^^^^^^^^^
```

A merge queue would have prevented this.
2023-04-27 17:07:25 +03:00
MMeent
e6ec2400fc Enable hot standby PostgreSQL replicas.
Notes:
 - This still needs UI support from the Console
 - I've not tuned any GUCs for PostgreSQL to make this work better
 - Safekeeper has gotten a tweak in which WAL is sent and how: It now
sends zero-ed WAL data from the start of the timeline's first segment up to
the first byte of the timeline to be compatible with normal PostgreSQL
WAL streaming.
 - This includes the commits of #3714 

Fixes one part of https://github.com/neondatabase/neon/issues/769

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-04-27 15:26:44 +02:00
Christian Schwarz
5b911e1f9f build: run clippy for powerset of features (#4077)
This will catch compiler & clippy warnings in all feature combinations.

We should probably use cargo hack for build and test as well, but,
that's quite expensive and would add to overall CI wait times.

obsoletes https://github.com/neondatabase/neon/pull/4073
refs https://github.com/neondatabase/neon/pull/4070
2023-04-27 15:01:27 +03:00
Christian Schwarz
9ea7b5dd38 clean up logging around on-demand downloads (#4030)
- Remove repeated tenant & timeline from span
- Demote logging of the path to debug level
- Log completion at info level, in the same function where we log errors
- distinguish between layer file download success & on-demand download
succeeding as a whole in the log message wording
- Assert that the span contains a tenant id and a timeline id

fixes https://github.com/neondatabase/neon/issues/3945

Before:

```
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: download complete: /storage/pageserver/data/tenants/$TENANT_ID/timelines/$TIMELINE_ID/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates.
```

After:

```
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: layer file download finished
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates.
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: on-demand download successful
```
2023-04-27 11:54:48 +02:00
Arseny Sher
0112a602e1 Add timeout on proxy -> compute connection establishment.
Otherwise we sit up to default tcp_syn_retries (about 2+ min) before gettings os
error 110 if compute has been migrated to another pod.
2023-04-27 09:50:52 +04:00
Anastasia Lubennikova
92214578af Fix proxy_io_bytes_per_client metric: use branch_id identifier properly. (#4084)
It fixes the miscalculation of the metric for projects that use multiple
branches for the same endpoint.
We were under billing users with such projects. So we need to
communicate the change in Release Notes.
2023-04-26 17:47:54 +03:00
Christian Schwarz
6861259be7 add global metric for unexpected on-demand downloads (#4069)
Until we have toned down the prod logs to zero WARN and ERROR, we want a
dedicated metric for which we can have a dedicated alert.

fixes https://github.com/neondatabase/neon/issues/3924
2023-04-26 15:18:26 +02:00
Sergey Melnikov
11df2ee5d7 Add safekeeper-3.us-east-2.aws.neon.build (#4085) 2023-04-26 14:40:36 +03:00
Arseny Sher
31a3910fd9 Remove wait_for_sk_commit_lsn_to_reach_remote_storage.
It had a couple of inherent races:

1) Even if compute is killed before the call, some more data might still arrive
to safekeepers after commit_lsn on them is polled, advancing it. Then checkpoint
on pageserver might not include this tail, and so upload of expected LSN won't
happen until one more checkpoint.

2) commit_lsn is updated asynchronously -- compute can commit transaction before
communicating commit_lsn to even single safekeeper (sync-safekeepers can be used
to forces the advancement). This makes semantics of
wait_for_sk_commit_lsn_to_reach_remote_storage quite complicated.

Replace it with last_flush_lsn_upload which
1) Learns last flush LSN on compute;
2) Waits for it to arrive to pageserver;
3) Checkpoints it;
4) Waits for the upload.

In some tests this keeps compute alive longer than before, but this doesn't seem
to be important.

There is a chance this fixes https://github.com/neondatabase/neon/issues/3209
2023-04-26 13:46:33 +04:00
Joonas Koivunen
381c8fca4f feat: log how long tenant activation takes (#4080)
Adds just a counter counting up from the creation to the tenant, logged
after activation. Might help guide us with the investigation of #4025.
2023-04-26 12:39:17 +03:00
Joonas Koivunen
4625da3164 build: remove busted sk-1.us-east-2 from staging hosts (#4082)
this should give us complete deployments while a new one is being
brought up.
2023-04-26 09:07:45 +00:00
Joonas Koivunen
850f6b1cb9 refactor: drop pageserver_ondisk_layers (#4071)
I didn't get through #3775 fast enough so we wanted to remove this
metric.

Fixes #3705.
2023-04-26 11:49:29 +03:00
Sergey Melnikov
f19b70b379 Configure extra domain for us-east-1 (#4078) 2023-04-26 09:36:26 +02:00
Sergey Melnikov
9d0cf08d5f Fix new storage-broker deploy for eu-central-1 (#4079) 2023-04-26 10:29:44 +03:00
Alexander Bayandin
2d6fd72177 GitHub Workflows: Fix crane for several registries (#4076)
Follow-up fix after https://github.com/neondatabase/neon/pull/4067

```
+ crane tag neondatabase/vm-compute-node-v14:3064 latest
Error: fetching "neondatabase/vm-compute-node-v14:3064": GET https://index.docker.io/v2/neondatabase/vm-compute-node-v14/manifests/3064: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3064
```

I reverted back the previous approach for promoting images
(login to one registry, save images to local fs, logout and login to
another registry, and push images from local fs). It turns out what
works for one Google project (kaniko), doesn't work for another (crane)
[sigh]
2023-04-25 23:58:59 +01:00
Heikki Linnakangas
8945fbdb31 Enable OpenTelemetry tracing in proxy in staging. (#4065)
Depends on https://github.com/neondatabase/helm-charts/pull/32

Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>
2023-04-25 20:45:36 +03:00
Alexander Bayandin
05ac0e2493 Login to ECR and Docker Hub at once (#4067)
- Update kaniko to 1.9.2 (from 1.7.0), problem with reproducible build is fixed
- Login to ECR and Docker Hub at once, so we can push to several
registries, it makes job `push-docker-hub` unneeded
- `push-docker-hub` replaced with `promote-images` in `needs:` clause,
Pushing images to production ECR moved to `promote-images` job
2023-04-25 17:54:10 +01:00
Joonas Koivunen
bfd45dd671 test_tenant_config: allow ERROR from eviction task (#4074) 2023-04-25 18:41:09 +03:00
Joonas Koivunen
7f80230fd2 fix: stop dead_code rustc lint (#4070)
only happens without `--all-features` which is what `./run_clippy.sh`
uses.
2023-04-25 17:07:04 +02:00
Sergey Melnikov
78bbbccadb Deploy proxies for preview enviroments (#4052)
## Describe your changes
Deploy `main` proxies to the preview environments
We don't deploy storage there yet, as it's tricky.

## Issue ticket number and link
https://github.com/neondatabase/cloud/issues/4737
2023-04-25 16:46:52 +02:00
Christian Schwarz
dbbe032c39 neon_local: fix tenant create -c eviction_policy:... (#4004)
And add corresponding unit test.

The fix is to use `.remove()` instead of `.get()` when processing the
arugments hash map.
The code uses emptiness of the hash map to determine whether all
arguments have been processed.
This was likely a copy-paste error.

    
refs https://github.com/neondatabase/neon/issues/3942
2023-04-25 15:33:30 +02:00
Joonas Koivunen
cb9473928d feat: add rough timings for basebackup (#4062)
just record the time needed for waiting the lsn and then the basebackup
in a log message in millis. this is related to ongoing investigations to
cold start performance.

this could also be a a counter. it cannot be added next to smgr
histograms, because we don't want another histogram per timeline.

the aim is to allow drilling deeper into which timelines were slow, and
to understand why some need two basebackups.
2023-04-25 13:22:16 +00:00
Christian Schwarz
fa20e37574 add gauge for in-flight layer uploads (#3951)
For the "worst-case /storage usage panel", we need to compute
```
remote size + local-only size
```

We currently don't have a metric for local-only layers.

The number of in-flight layers in the upload queue is just that, so, let
Prometheus scrape it.

The metric is two counters (started and finished).
The delta is the amount of in-flight uploads in the queue.

The metrics are incremented in the respective `call_unfinished_metric_*`
functions.
These track ongoing operations by file_kind and op_kind.
We only need this metric for layer uploads, so, there's the new
RemoteTimelineClientMetricsCallTrackSize type that forces all call sites
to decide whether they want the size tracked or not.
If we find that other file_kinds or op_kinds are interesting (metadata
uploads, layer downloads, layer deletes) are interesting, we can just
enable them, and they'll be just another label combination within the
metrics that this PR adds.

fixes https://github.com/neondatabase/neon/issues/3922
2023-04-25 14:22:48 +02:00
Joonas Koivunen
4911d7ce6f feat: warn when requests get cancelled (#4064)
Add a simple disarmable dropguard to log if request is cancelled before
it is completed. We currently don't have this, and it makes for
difficult to know when the request was dropped.
2023-04-25 15:22:23 +03:00
Christian Schwarz
e83684b868 add libmetric metric for each logged log message (#4055)
This patch extends the libmetrics logging setup functionality with a
`tracing` layer that increments a Prometheus counter each time we log a
log message. We have the counter per tracing event level. This allows
for monitoring WARN and ERR log volume without parsing the log. Also, it
would allow cross-checking whether logs got dropped on the way into
Loki.

It would be nicer if we could hook deeper into the tracing logging
layer, to avoid evaluating the filter twice.
But I don't know how to do it.
2023-04-25 14:10:18 +02:00
Eduard Dyckman
afbbc61036 Adding synthetic size to pageserver swagger (#4049)
## Describe your changes

I added synthetic size response to the console swagger. Now I am syncing
it back to neon
2023-04-24 16:19:25 +03:00
Alexey Kondratov
7ba5c286b7 [compute_ctl] Improve 'empty' compute startup sequence (#4034)
Do several attempts to get spec from the control-plane and retry network
errors and all reasonable HTTP response codes. Do not hang waiting for
spec without confirmation from the control-plane that compute is known
and is in the `Empty` state.

Adjust the way we track `total_startup_ms` metric, it should be
calculated since the moment we received spec, not from the moment
`compute_ctl` started. Also introduce a new `wait_for_spec_ms` metric
to track the time spent sleeping and waiting for spec to be delivered
from control-plane.

Part of neondatabase/cloud#3533
2023-04-21 11:10:48 +02:00
sharnoff
02b28ae0b1 fix vm-informant dbname: "neondb" -> "postgres" (#4046)
Changes the vm-informant's postgres connection string's dbname from
"neondb" (which sometimes doesn't exist) to "postgres" (which
_hopefully_ should exist more often?).

Currently there are a handful of VMs in prod that aren't working with
autoscaling because they don't have the "neondb" database.

The vm-informant doesn't require any database in particular; it's just
connecting as `cloud_admin` to be able to adjust the file cache
settings.
2023-04-18 18:54:32 +03:00
Cihan Demirci
0bfbae2d73 Add storage broker deployment to us-east-1 (#4048) 2023-04-18 18:41:09 +03:00
fcdm
f1b7dc4064 Update pageserver instances in us-east-1 2023-04-18 14:08:12 +01:00
Alexander Bayandin
e2a5177e89 Bump h2 from 0.3.17 to 0.3.18 (#4045) 2023-04-18 16:04:10 +03:00
Cihan Demirci
0c083564ce Add us-east-1 hosts file and update regions (#4042)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-04-17 15:25:27 +03:00
fcdm
d8dd60dc81 Add helm values for us-east-1 2023-04-17 11:59:38 +01:00
Arthur Petukhovsky
73f34eaa5e Send AppendResponse keepalive once per second (#4036)
Walproposer sends AppendRequest at least once per second. This patch
adds a response to these requests once per second.

Fixes https://github.com/neondatabase/neon/issues/4017
2023-04-17 11:24:57 +03:00
Matt Nappo
c2496c7ef2 Added black_box in layer_map benches (fix #3396) 2023-04-16 16:33:37 +03:00
Kirill Bulatov
ebea298415 Update most of the dependencies to their latest versions (#4026)
See https://github.com/neondatabase/neon/pull/3991

Brings the changes back with the right way to use new `toml_edit` to
deserialize values and a unit test for this.

All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated.

3 sets of crates remain unupdated:

* `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13).
* `opentelemetry` and `opentelemetry-*` crates

```
error[E0308]: mismatched types
  --> libs/tracing-utils/src/http.rs:65:21
   |
65 |     span.set_parent(parent_ctx);
   |          ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context`
   |          |
   |          arguments to this method are incorrect
   |
   = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types
note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
   = note: perhaps two different versions of crate `opentelemetry_api` are being used?
note: associated function defined here
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8
   |
43 |     fn set_parent(&self, cx: Context);
   |        ^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `tracing-utils` due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `tracing-utils` due to previous error
```

`tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need.

* similarly, `rustls`, `tokio-rustls`, `rustls-*` and `tls-listener` crates have similar issue:

```
error[E0308]: mismatched types
   --> libs/postgres_backend/tests/simple_select.rs:112:78
    |
112 |     let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg);
    |                                --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig`
    |                                |
    |                                arguments to this function are incorrect
    |
    = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types
note: struct `ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1
    |
125 | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1
    |
91  | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
    = note: perhaps two different versions of crate `rustls` are being used?
note: associated function defined here
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12
    |
23  |     pub fn new(config: ClientConfig) -> Self {
    |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `postgres_backend` due to previous error
warning: build failed, waiting for other jobs to finish...
```

* aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed.
Other our tests passed, further investigation is worth to be done in https://github.com/neondatabase/neon/issues/4008
2023-04-14 18:28:54 +03:00
Vadim Kharitonov
5ffa20dd82 [proxy] adjust proxy sleep timeout 2023-04-14 15:08:07 +03:00
Vadim Kharitonov
75ea8106ec Add procps into compute containers 2023-04-14 15:02:26 +03:00
Vadim Kharitonov
017d3a390d Compile postgres with lz4 and zstd support 2023-04-14 15:02:26 +03:00
Alexey Kondratov
589cf1ed21 [compute_ctl] Do not create availability checker data on each start (#4019)
Initially, idea was to ensure that when we come and check data
availability, special service table already contains one row. So if we
loose it for some reason, we will error out.

Yet, to do availability check we anyway start compute first! So it
doesn't really add some value, but we affect each compute start as we
update at least one row in the database. Also this writes some WAL, so
if timeline is close to `neon.max_cluster_size` it could prevent compute
from starting up.

That said, do CREATE TABLE IF NOT EXISTS + UPSERT right in the
`/check_writability` handler.
2023-04-14 13:05:07 +02:00
Alexander Bayandin
0c82ff3d98 test_runner: add Timeline Inspector to Grafana links (#4021) 2023-04-14 11:46:47 +01:00
Christian Schwarz
8895f28dae make evictions_low_residence_duration_metric_threshold per-tenant (#3949)
Before this patch, if a tenant would override its eviction_policy
setting to use a lower LayerAccessThreshold::threshold than the
`evictions_low_residence_duration_metric_threshold`, the evictions done
for that tenant would count towards the
`evictions_with_low_residence_duration` metric.

That metric is used to identify pre-mature evictions, commonly triggered
by disk-usage-based eviction under disk pressure.

We don't want that to happen for the legitimate evictions of the tenant
that overrides its eviction_policy.

So, this patch
- moves the setting into TenantConf
- adds test coverage
- updates the staging & prod yamls

Forward Compatibility:
Software before this patch will ignore the new tenant conf field and use
the global one instead.
So we can roll back safely.

Backward Compatibility:
Parsing old configs with software as of this patch will fail in
`PageServerConf::parse_and_validate` with error 
`unrecognized pageserver option 'evictions_low_residence_duration_metric_threshold'`
if the option is still present in the global section.
We deal with this by updating the configs in Ansible.

fixes https://github.com/neondatabase/neon/issues/3940
2023-04-14 13:25:45 +03:00
dependabot[bot]
b6c7c3290f Bump h2 from 0.3.15 to 0.3.17 (#4020) 2023-04-13 20:03:24 +01:00
Sasha Krassovsky
fd31fafeee Make proxy shutdown when all connections are closed (#3764)
## Describe your changes
Makes Proxy start draining connections on SIGTERM.
## Issue ticket number and link
#3333
2023-04-13 19:31:30 +03:00
Alexey Kondratov
db8dd6f380 [compute_ctl] Implement live reconfiguration (#3980)
With this commit one can request compute reconfiguration
from the running `compute_ctl` with compute in `Running` state
by sending a new spec:
```shell
curl -d "{\"spec\": $(cat ./compute-spec-new.json)}" http://localhost:3080/configure
```

Internally, we start a separate configurator thread that is waiting on
`Condvar` for `ConfigurationPending` compute state in a loop. Then it does
reconfiguration, sets compute back to `Running` state and notifies other
waiters.

It will need some follow-ups, e.g. for retry logic for control-plane
requests, but should be useful for testing in the current state. This
shouldn't affect any existing environment, since computes are configured
in a different way there.

Resolves neondatabase/cloud#4433
2023-04-13 18:07:29 +02:00
Alexander Bayandin
36c20946b4 Verify extensions checksums (#4014)
To not be taken by surprise by upstream git re-tag or by malicious activity,
let's verify the checksum for extensions we download

Also, unify the installation of `pg_graphql` and `pg_tiktoken` 
with other extensions.
2023-04-13 15:25:09 +01:00
Heikki Linnakangas
89b5589b1b Tenant size should never be zero. Simplify test.
Looking at the git history of this test, I think "size == 0" used to
have a special meaning earlier, but now it should never happen.
2023-04-13 16:57:31 +03:00
Heikki Linnakangas
53f438a8a8 Rename "Postgres nodes" in control_plane to endpoints.
We use the term "endpoint" in for compute Postgres nodes in the web UI
and user-facing documentation now. Adjust the nomenclature in the code.

This changes the name of the "neon_local pg" command to "neon_local
endpoint". Also adjust names of classes, variables etc. in the python
tests accordingly.

This also changes the directory structure so that endpoints are now
stored in:

    .neon/endpoints/<endpoint id>

instead of:

    .neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name>

The tenant ID is no longer part of the path. That means that you
cannot have two endpoints with the same name/ID in two different
tenants anymore. That's consistent with how we treat endpoints in the
real control plane and proxy: the endpoint ID must be globally unique.
2023-04-13 14:34:29 +03:00
Vadim Kharitonov
356439aa33 Add note about manual_release_instructions label (#4015)
## Describe your changes
Do not forget to process required manual stuff after release

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-04-13 13:13:24 +03:00
Vadim Kharitonov
c237a2f5fb Compile pg_hint_plan extension 2023-04-13 12:59:46 +03:00
Dmitry Rodionov
15d1f85552 Add reason to TenantState::Broken (#3954)
Reason and backtrace are added to the Broken state. Backtrace is automatically collected when tenant entered the broken state. The format for API, CLI and metrics is changed and unified to return tenant state name in camel case. Previously snake case was used for metrics and camel case was used for everything else. Now tenant state field in TenantInfo swagger spec is changed to contain state name in "slug" field and other fields (currently only reason and backtrace for Broken variant in "data" field). To allow for this breaking change state was removed from TenantInfo swagger spec because it was not used anywhere.

Please note that the tenant's broken reason is not persisted on disk so the reason is lost when pageserver is restarted.

Requires changes to grafana dashboard that monitors tenant states.

Closes #3001

---------

Co-authored-by: theirix <theirix@gmail.com>
2023-04-13 12:11:43 +03:00
Konstantin Knizhnik
732acc54c1 Add check for duplicates of generated image layers (#3869)
## Describe your changes

## Issue ticket number and link

#3673

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-04-13 10:19:34 +03:00
Stas Kelvich
5d0ecadf7c Add support for non-SNI case in multi-cert proxy
When no SNI is provided use the default certificate, otherwise we can't
get to the options parameter which can be used to set endpoint name too.
That means that non-SNI flow will not work for CNAME domains in verify-full
mode.
2023-04-12 18:16:49 +03:00
Kirill Bulatov
f7995b3c70 Revert "Update most of the dependencies to their latest versions (#3991)" (#4013)
This reverts commit a64044a7a9.

See https://neondb.slack.com/archives/C03H1K0PGKH/p1681306682795559
2023-04-12 14:51:59 +00:00
Alexander Bayandin
13e53e5dc8 GitHub Workflows: use '!cancelled' instead of 'success or failure' 2023-04-12 15:22:18 +01:00
Alexander Bayandin
c94b8998be GitHub Workflows: print error messages to stderr 2023-04-12 15:22:18 +01:00
Alexander Bayandin
218062ceba GitHub Workflows: use ref_name instead of ref 2023-04-12 15:22:18 +01:00
Sam Gaw
8d295780cb Add support for ip4r extension 2023-04-12 16:40:02 +03:00
Kirill Bulatov
a64044a7a9 Update most of the dependencies to their latest versions (#3991)
All non-trivial updates extracted into separate commits, also `carho
hakari` data and its manifest format were updated.

3 sets of crates remain unupdated:

* `base64` — touches proxy in a lot of places and changed its api (by
0.21 version) quite strongly since our version (0.13).
* `opentelemetry` and `opentelemetry-*` crates

```
error[E0308]: mismatched types
  --> libs/tracing-utils/src/http.rs:65:21
   |
65 |     span.set_parent(parent_ctx);
   |          ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context`
   |          |
   |          arguments to this method are incorrect
   |
   = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types
note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
   = note: perhaps two different versions of crate `opentelemetry_api` are being used?
note: associated function defined here
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8
   |
43 |     fn set_parent(&self, cx: Context);
   |        ^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `tracing-utils` due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `tracing-utils` due to previous error
```

`tracing-opentelemetry` of version `0.19` is not yet released, that is
supposed to have the update we need.

* similarly, `rustls`, `tokio-rustls`, `rustls-*` and `tls-listener`
crates have similar issue:

```
error[E0308]: mismatched types
   --> libs/postgres_backend/tests/simple_select.rs:112:78
    |
112 |     let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg);
    |                                --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig`
    |                                |
    |                                arguments to this function are incorrect
    |
    = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types
note: struct `ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1
    |
125 | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1
    |
91  | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
    = note: perhaps two different versions of crate `rustls` are being used?
note: associated function defined here
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12
    |
23  |     pub fn new(config: ClientConfig) -> Self {
    |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `postgres_backend` due to previous error
warning: build failed, waiting for other jobs to finish...
```

* aws crates: I could not make new API to work with bucket endpoint
overload, and console e2e tests failed.
Other our tests passed, further investigation is worth to be done in
https://github.com/neondatabase/neon/issues/4008
2023-04-12 15:32:38 +03:00
Kirill Bulatov
d8939d4162 Move walreceiver start and stop behind a struct (#3973)
The PR changes module function-based walreceiver interface with a
`WalReceiver` struct that exposes a few public methods, `new`, `start`
and `stop` now.

Later, the same struct is planned to be used for getting walreceiver
stats (and, maybe, other extra data) to display during missing wal
errors for https://github.com/neondatabase/neon/issues/2106

Now though, the change required extra logic changes:

* due to the `WalReceiver` struct added, it became easier to pass `ctx`
and later do a `detached_child` instead of

bfee412701/pageserver/src/tenant/timeline.rs (L1379-L1381)

* `WalReceiver::start` which is now the public API to start the
walreceiver, could return an `Err` which now may turn a tenant into
`Broken`, same as the timeline that it tries to load during startup.

* `WalReceiverConf` was added to group walreceiver parameters from
pageserver's tenant config
2023-04-12 12:39:02 +03:00
Heikki Linnakangas
06ce83c912 Tolerate missing 'operation_uuid' field in spec file.
'compute_ctl' doesn't use the operation_uuid for anything, it just prints
it to the log.
2023-04-12 12:11:22 +03:00
Heikki Linnakangas
8ace7a7515 Remove unused 'timestamp' field from ComputeSpec struct. 2023-04-12 12:11:22 +03:00
Heikki Linnakangas
ef68321b31 Use Lsn, TenantId, TimelineId types in compute_ctl.
Stronger types are generally nicer.
2023-04-12 12:11:22 +03:00
Heikki Linnakangas
6064a26963 Refactor 'spec' in ComputeState.
Sometimes, it contained real values, sometimes just defaults if the
spec was not received yet. Make the state more clear by making it an
Option instead.

One consequence is that if some of the required settings like
neon.tenant_id are missing from the spec file sent to the /configure
endpoint, it is spotted earlier and you get an immediate HTTP error
response. Not that it matters very much, but it's nicer nevertheless.
2023-04-12 01:55:40 +03:00
Stas Kelvich
3c9f42a2e2 Support aarch64 in walredo seccomp code (#3996)
Aarch64 doesn't implement some old syscalls like open and select. Use
openat instead of open to check if seccomp is supported. Leave both
select and pselect6 in the allowlist since we don't call select syscall
directly and may hope that libc will call pselect6 on aarch64.

To check whether some syscall is supported it is possible to use
`scmp_sys_resolver` from seccopm package:

```
> apt install seccopm
> scmp_sys_resolver -a x86_64 select
23
> scmp_sys_resolver -a aarch64 select
-10101
> scmp_sys_resolver -a aarch64 pselect6
72
```

Negative value means that syscall is not supported.

Another cross-check is to look up for the actuall syscall table in
`unistd.h`. To resolve all the macroses one can use `gcc -E` as it is
done in `dump_sys_aarch64()` function in
libseccomp/src/arch-syscall-validate.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-04-11 19:28:18 +00:00
Alexey Kondratov
40a68e9077 [compute_ctl] Add timeout for tracing_utils::shutdown_tracing() (#3982)
Shutting down OTEL tracing provider may hang for quite some time, see,
for example:
- https://github.com/open-telemetry/opentelemetry-rust/issues/868
- and our problems with staging
https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636

Yet, we want computes to shut down fast enough, as we may need a new one
for the same timeline ASAP. So wait no longer than 2s for the shutdown
to complete, then just error out and exit the main thread.

Related to neondatabase/cloud#3707
2023-04-11 15:05:35 +02:00
Stas Kelvich
de99ee2c0d Add more proxy cnames 2023-04-11 14:54:09 +03:00
Alexander Bayandin
c79d5a947c Nightly Benchmarks: run third-party benchmarks once a week (#3987) 2023-04-11 10:58:04 +01:00
Arseny Sher
7ad5a5e847 Enable timeout on reading from socket in safekeeper WAL service.
TCP_KEEPALIVE is not enabled by default, so this prevents hanged up connections
in case of abrupt client termination. Add 'closed' flag to PostgresBackendReader
and pass it during handles join to prevent attempts to read from socket if we
errored out previously -- now with timeouts this is a common situation.

It looks like
2023-04-10T18:08:37.493448Z  INFO {cid=68}:WAL
receiver{ttid=59f91ad4e821ab374f9ccdf918da3a85/16438f99d61572c72f0c7b0ed772785d}:
terminated: timed out

Presumably fixes https://github.com/neondatabase/neon/issues/3971
2023-04-11 11:45:43 +04:00
Stas Kelvich
22c890b71c Add more cnames to proxies 2023-04-11 01:55:25 +03:00
Stas Kelvich
83549a8d40 Revert "Support aarch64 in walredo seccomp code"
This reverts commit 98df7db094.
2023-04-11 00:08:01 +03:00
Stas Kelvich
98df7db094 Support aarch64 in walredo seccomp code
Aarch64 doesn't implement some old syscalls like open and select. Use
openat instead of open to check if seccomp is supported. Leave both
select and pselect6 in the allowlist since we don't call select syscall
directly and may hope that libc will call pselect6 on aarch64.

To check whether some syscall is supported it is possible to use
`scmp_sys_resolver` from seccopm package:

```
> apt install seccopm
> scmp_sys_resolver -a x86_64 select
23
> scmp_sys_resolver -a aarch64 select
-10101
> scmp_sys_resolver -a aarch64 pselect6
72
```

Negative value means that syscall is not supported.

Another cross-check is to look up for the actuall syscall table
in `unistd.h`. To resolve all the macroses one can use `gcc -E` as
it is done in `dump_sys_aarch64()` function in
libseccomp/src/arch-syscall-validate.
2023-04-10 23:54:16 +03:00
Heikki Linnakangas
f0b2e076d9 Move compute_ctl structs used in HTTP API and spec file to separate crate.
This is in preparation of using compute_ctl to launch postgres nodes
in the neon_local control plane. And seems like a good idea to
separate the public interfaces anyway.

One non-mechanical change here is that the 'metrics' field is moved
under the Mutex, instead of using atomics. We were not using atomics
for performance but for convenience here, and it seems more clear to
not use atomics in the model for the HTTP response type.
2023-04-09 21:52:28 +03:00
Alexander Bayandin
818e341af0 Nightly Benchmarks: replace neon-captest-prefetch with -new/-reuse (#3970)
We have enabled prefetch by default, let's use this in Nightly
Benchmarks:
- effective_io_concurrency=100 by default (instead of 32)
- maintenance_io_concurrency=100 by default (instead of 32)

Rename `neon-captest-prefetch` to `neon-captest-new` (for pgbench with
initialisation) and `neon-captest-reuse` (for OLAP scenarios)
2023-04-09 12:52:49 +01:00
Kirill Bulatov
dec58092e8 Replace Box<dyn> with impl in RemoteStorage upload (#3984)
Replaces `Box<(dyn io::AsyncRead + Unpin + Send + Sync + 'static)>` with
`impl io::AsyncRead + Unpin + Send + Sync + 'static` usages in the
`RemoteStorage` interface, to make it closer to
[`#![feature(async_fn_in_trait)]`](https://blog.rust-lang.org/inside-rust/2022/11/17/async-fn-in-trait-nightly.html)

For `GenericRemoteStorage`, replaces `type Target = dyn RemoteStorage`
with another impl with `RemoteStorage` methods inside it.
We can reuse the trait, that would require importing the trait in every
file where it's used and makes us farther from the unstable feature.
After this PR, I've manged to create a patch with the changes:

https://github.com/neondatabase/neon/compare/kb/less-dyn-storage...kb/nightly-async-trait?expand=1

Current rust implementation does not like recursive async trait calls,
so `UnreliableWrapper` was removed: it contained a
`GenericRemoteStorage` that implemented the `RemoteStorage` trait, and
itself implemented the trait, which nightly rustc did not like and
proposed to box the future.
Similarly, `GenericRemoteStorage` cannot implement `RemoteStorage` for
nightly rustc to work, since calls various remote storages' methods from
inside.

I've compiled current `main` and the nightly branch both with `time env
RUSTC_WRAPPER="" cargo +nightly build --all --timings` command, and got
```
    Finished dev [optimized + debuginfo] target(s) in 2m 04s
env RUSTC_WRAPPER="" cargo +nightly build --all --timings  1283.19s user 50.40s system 1074% cpu 2:04.15 total

for the new feature tried and

    Finished dev [optimized + debuginfo] target(s) in 2m 40s
env RUSTC_WRAPPER="" cargo +nightly build --all --timings  1288.59s user 52.06s system 834% cpu 2:40.71 total

for the old async_trait approach.
```

On my machine, the `remote_storage` lib compilation takes ~10 less time
with the nightly feature (left) than the regular main (right).


![image](https://user-images.githubusercontent.com/2690773/230620797-163d8b89-dac8-4366-bcf6-cd1cdddcd22c.png)

Full cargo reports are available at
[timings.zip](https://github.com/neondatabase/neon/files/11179369/timings.zip)
2023-04-07 21:39:49 +03:00
Stas Kelvich
0bf70e113f Add extra cnames to staging proxy 2023-04-07 19:18:19 +03:00
Vadim Kharitonov
31f2cdeb1e Update Dockerfile.compute-node
Co-authored-by: MMeent <matthias@neon.tech>
2023-04-07 15:26:22 +02:00
Vadim Kharitonov
979fa8b1ba Compile timescaledb 2023-04-07 15:26:22 +02:00
Konstantin Knizhnik
bfee412701 Trigger tests for index scan implementation (#3968)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-04-07 14:26:21 +03:00
Dmitry Rodionov
bfeb428d1b tests: make neon_fixtures a bit thinner by splitting out some pageserver related helpers (#3977)
neon_fixture is quite big and messy, lets clean it up a bit.
2023-04-07 13:47:28 +03:00
Stas Kelvich
b1c2a6384a Set non-wildcard common names in link auth proxy
Old coding here ignored non-wildcard common names and passed None instead. With my recent changes
I started throwing an error in that case. Old logic doesn't seem to be a great choice, so instead
of passing None I actually set non-wildcard common names too. That way it is possible to avoid handling
cases with None in downstream code.
2023-04-07 01:24:27 +03:00
Anastasia Lubennikova
6d01d835a8 [proxy] Report error if proxy_io_bytes_per_client metric has decreased 2023-04-06 23:14:07 +03:00
Alexey Kondratov
e42982fb1e [compute_ctl] Empty computes and /configure API (#3963)
This commit adds an option to start compute without spec and then pass
it a valid spec via `POST /configure` API endpoint. This is a main
prerequisite for maintaining the pool of compute nodes in the
control-plane.

For example:

1. Start compute with
   ```shell
   cargo run --bin compute_ctl -- -i no-compute \
    -p http://localhost:9095 \
    -D compute_pgdata \
    -C "postgresql://cloud_admin@127.0.0.1:5434/postgres" \
    -b ./pg_install/v15/bin/postgres
   ```

2. Configure it with
   ```shell
   curl -d "{\"spec\": $(cat ./compute-spec.json)}" http://localhost:3080/configure
   ```

Internally, it's implemented using a `Condvar` + `Mutex`. Compute spec
is moved under Mutex, as it's now could be updated in the http handler.
Also `RwLock` was replaced with `Mutex` because the latter works well
with `Condvar`.

First part of the neondatabase/cloud#4433
2023-04-06 21:21:58 +02:00
Dmitry Rodionov
b45c92e533 tests: exclude compatibility tests by default (#3975)
This allows to skip compatibility tests based on `CHECK_ONDISK_DATA_COMPATIBILITY` environment variable. When the variable is missing (default) compatibility tests wont be run.
2023-04-06 21:21:39 +03:00
Arthur Petukhovsky
ba4a96fdb1 Eagerly update wal_backup_lsn after each segment offload (#3976)
Otherwise it can lag a lot, preventing WAL segments cleanup. Also max
wal_backup_lsn on update, pulling it down is pointless.

Should help with https://github.com/neondatabase/neon/issues/3957, but
will not fix it completely.
2023-04-06 20:57:06 +03:00
Alexander Bayandin
4d64edf8a5 Nightly Benchmarks: Add free tier sized compute (#3969)
- Add support for VMs and CU
- Add free tier limited benchmark (0.25 CU)
- Ensure we use 1 CU by default for pgbench workload
2023-04-06 19:18:24 +03:00
Kirill Bulatov
102746bc8f Apply clippy rule exclusion locally instead of a global approach (#3974) 2023-04-06 18:57:48 +03:00
Alexander Bayandin
887cee64e2 test_runner: add links to grafana for remote tests (#3961)
Add Grafana links to allure reports to make it easier to debug perf
test failures
2023-04-06 13:52:41 +01:00
Vadim Kharitonov
2ce973c72f Allow installation of pg_stat_statements 2023-04-06 13:26:40 +02:00
Gleb Novikov
9db70f6232 Added disk_size and instance_type to payload (#3918)
## Describe your changes

In https://github.com/neondatabase/cloud/issues/4354 we are making
scheduling of projects based on available disk space and overcommit, so
we need to know disk size and just in case instance type of the
pageserver

## Issue ticket number and link

https://github.com/neondatabase/cloud/issues/4354

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] ~If it is a core feature, I have added thorough tests.~
- [ ] ~Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?~
- [ ] ~If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.~
2023-04-06 14:02:56 +04:00
Joonas Koivunen
b17c24fa38 fix: settle down to configured percent (#3947)
in real env testing we noted that the disk-usage based eviction sails 1
percentage point above the configured value, which might be a source of
confusion, so it might be better to get rid of that confusion now.

confusion: "I configured 85% but pageserver sails at 86%".

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-04-06 12:47:21 +03:00
Alexander Bayandin
9310949b44 GitHub Autocomment: Retry on server errors (#3958)
Retry posting/updating a comment in case of 5XX errors from GitHub API
2023-04-05 22:08:06 +03:00
Stas Kelvich
d8df5237fa Aligne extra certificate name with default cert-manager names 2023-04-05 21:29:21 +03:00
Stas Kelvich
c3ca48c62b Support extra domain names for proxy.
Make it possible to specify directory where proxy will look up for
extra certificates. Proxy will iterate through subdirs of that directory
and load `key.pem` and `cert.pem` files from each subdir. Certs directory
structure may look like that:

  certs
  |--example.com
  |  |--key.pem
  |  |--cert.pem
  |--foo.bar
     |--key.pem
     |--cert.pem

Actual domain names are taken from certs and key, subdir names are
ignored.
2023-04-05 20:06:48 +03:00
Alexander Bayandin
957acb51b5 GitHub Autocomment: Fix the link to the latest commit (#3952) 2023-04-04 19:06:10 +03:00
Alexander Bayandin
1d23b5d1de Comment PR with test results (#3907)
This PR adds posting a comment with test results. Each workflow run
updates the comment with new results.
The layout and the information that we post can be changed to our needs,
right now, it contains failed tests and test which changes status after
rerun (i.e. flaky tests)
2023-04-04 12:22:47 +01:00
Alexander Bayandin
105b8bb9d3 test_runner: automatically rerun flaky tests (#3880)
This PR adds a plugin that automatically reruns (up to 3 times) flaky
tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and
`pytest-rerunfailures` plugin.

As the first approximation we consider the test flaky if it has failed on 
the main branch in the last 10 days.

Flaky tests are fetched by `scripts/flaky_tests.py` script (it's
possible to use it in a standalone mode to learn which tests are flaky),
stored to a JSON file, and then the file is passed to the pytest plugin.
2023-04-04 12:21:54 +01:00
Kirill Bulatov
846532112c Remove unused S3 list operation (#3936)
In S3, pageserver only lists tenants (prefixes) on S3, no other keys.
Remove the list operation from the API, since S3 impl does not seem to
work normally and not used anyway,
2023-04-03 23:44:38 +03:00
Dmitry Ivanov
f85a61ceac [proxy] Fix regression in logging
For some reason, `tracing::instrument` proc_macro doesn't always print
elements specified via `fields()` or even show that it's impossible
(e.g. there's no Display impl).

Work around this using the `?foo` notation.

Before:
2023-04-03T14:48:06.017504Z  INFO handle_client🤝 received SslRequest

After:
2023-04-03T14:51:24.424176Z  INFO handle_client{session_id=7bd07be8-3462-404e-8ccc-0a5332bf3ace}🤝 received SslRequest
2023-04-03 18:49:30 +03:00
Christian Schwarz
45bf76eb05 enable layer eviction by default in prod (#3933)
Leave disk_usage_based_eviction above the current max usage in prod
(82%ish), so that deploying this commit won't trigger
disk_usage_based_eviction.

As indicated in the TODO, we'll decrease the value to 80% later.

Also update the staging YAMLs to use the anchor syntax for
`evictions_low_residence_duration_metric_threshold` like we do in the
prod YAMLs as of this patch.
2023-04-03 14:57:36 +02:00
Joonas Koivunen
a415670bc3 feat: log evictions (#3930)
this will help log analysis with the counterpart of already logging all
remote download needs and downloads. ended up with a easily regexable
output in the final round.
2023-04-03 14:15:41 +03:00
Joonas Koivunen
cf5cfe6d71 fix: metric used for alerting threshold on staging (#3932)
This should remove the too eager alerts from staging.
2023-04-03 13:26:45 +03:00
Arseny Sher
d733bc54b8 Rename ReplicationFeedback and its fields.
This is the the feedback originating from pageserver, so change previous
confusing names to
s/ReplicationFeedback/PageserverFeedback
s/ps_writelsn/last_receive_lsn
s/ps_flushlsn/disk_consistent_lsn
s/ps_apply_lsn/remote_consistent_lsn

I haven't changed on the wire format to keep compatibility. However,
understanding of new field names is added to compute, so once all computes
receive this patch we can change the wire names as well. Safekeepers/pageservers
are deployed roughly at the same time and it is ok to live without feedbacks
during the short period, so this is not a problem there.
2023-04-03 01:52:41 +04:00
Arthur Petukhovsky
814abd9f84 Switch to safekeeper in the same AZ (#3883)
Add a condition to switch walreceiver connection to safekeeper that is
located in the same availability zone. Switch happens when commit_lsn of
a candidate is not less than commit_lsn from the active connection. This
condition is expected not to trigger instantly, because commit_lsn of a
current connection is usually greater than commit_lsn of updates from
the broker. That means that if WAL is written continuously, switch can
take a lot of time, but it should happen eventually.

Now protoc 3.15+ is required for building neon.

Fixes https://github.com/neondatabase/neon/issues/3200
2023-04-02 11:32:27 +03:00
Alexander Bayandin
75ffe34b17 check-macos-build: fix cache key (#3926)
We don't have `${{ matrix.build_type }}` there, so it gets resolved to
an empty substring and looks like this

[`v1-macOS--pg-f8a650e49b06d39ad131b860117504044b01f312-dcccd010ff851b9f72bb451f28243fa3a341f07028034bbb46ea802413b36d80`](https://github.com/neondatabase/neon/actions/runs/4575422427/jobs/8078231907#step:26:2)
2023-03-31 21:45:59 +03:00
Christian Schwarz
d2aa31f0ce fix pageserver_evictions_with_low_residence_duration metric (#3925)
It was doing the comparison in the wrong way.
2023-03-31 19:25:53 +03:00
Dmitry Rodionov
22f9ea5fe2 Remind people to clean up merge commit message in PR template (#3920) 2023-03-31 16:11:34 +03:00
Joonas Koivunen
d0711d0896 build: fix git perms for deploy job (#3921)
copy pasted from `build-neon` job. it is interesting that this is only
needed by `build-neon` and `deploy`.

Fixes:
https://github.com/neondatabase/neon/actions/runs/4568077915/jobs/8070960178
which seems to have been going for a while.
2023-03-31 16:05:15 +03:00
Arseny Sher
271f6a6e99 Always sync-safekeepers in neon_local on compute start.
Instead of checking neon.safekeepers GUC value in existing pg node data dir,
just always run sync-safekeepers when safekeepers are configured. Without this
change, creation of new compute didn't run it. That's ok for new
timeline/branch (it doesn't return anything useful anyway, and LSN is known by
pageserver), but restart of compute for existing timeline bore the risk of
getting basebackup not on the latest LSN, i.e. basically broken -- it might not
have prev_lsn, and even if it had, walproposer would complain anyway.

fixes https://github.com/neondatabase/neon/issues/2963
2023-03-31 16:15:06 +04:00
Christian Schwarz
a64dd3ecb5 disk-usage-based layer eviction (#3809)
This patch adds a pageserver-global background loop that evicts layers
in response to a shortage of available bytes in the $repo/tenants
directory's filesystem.

The loop runs periodically at a configurable `period`.

Each loop iteration uses `statvfs` to determine filesystem-level space
usage. It compares the returned usage data against two different types
of thresholds. The iteration tries to evict layers until app-internal
accounting says we should be below the thresholds. We cross-check this
internal accounting with the real world by making another `statvfs` at
the end of the iteration. We're good if that second statvfs shows that
we're _actually_ below the configured thresholds. If we're still above
one or more thresholds, we emit a warning log message, leaving it to the
operator to investigate further.

There are two thresholds:
- `max_usage_pct` is the relative available space, expressed in percent
of the total filesystem space. If the actual usage is higher, the
threshold is exceeded.
- `min_avail_bytes` is the absolute available space in bytes. If the
actual usage is lower, the threshold is exceeded.

The iteration evicts layers in LRU fashion with a reservation of up to
`tenant_min_resident_size` bytes of the most recent layers per tenant.
The layers not part of the per-tenant reservation are evicted
least-recently-used first until we're below all thresholds. The
`tenant_min_resident_size` can be overridden per tenant as
`min_resident_size_override` (bytes).

In addition to the loop, there is also an HTTP endpoint to perform one
loop iteration synchronous to the request. The endpoint takes an
absolute number of bytes that the iteration needs to evict before
pressure is relieved. The tests use this endpoint, which is a great
simplification over setting up loopback-mounts in the tests, which would
be required to test the statvfs part of the implementation. We will rely
on manual testing in staging to test the statvfs parts.

The HTTP endpoint is also handy in emergencies where an operator wants
the pageserver to evict a given amount of space _now. Hence, it's
arguments documented in openapi_spec.yml. The response type isn't
documented though because we don't consider it stable. The endpoint
should _not_ be used by Console but it could be used by on-call.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-31 14:47:57 +03:00
Konstantin Knizhnik
bf46237fc2 Fix prefetch for parallel bitmap scan (#3875)
## Describe your changes

Fix prefetch for parallel bitmap scan

## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-30 22:07:19 +03:00
Lassi Pölönen
41d364a8f1 Add more detailed logging to compute_ctl's shutdown (#3915)
Currently we don't see from the logs, if shutting down tracing takes
long time or not. We do see that shutting down computes gets delayed for
some reason and hits thhe grace period limit. Moving the shutdown
message to slightly later, when we don't have anything else than just
exit left.
## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-30 22:02:39 +03:00
Christian Schwarz
fa54a57ca2 random_init_delay: remove the minimum of 10 seconds (#3914)
Before this patch, the range from which the random delay is picked is at
minimum 10 seconds.
With this patch, they delay is bounded to whatever the given `period`
is, and zero, if period id Duration::ZERO.

Motivation for this: the disk usage eviction tests that we'll add in
https://github.com/neondatabase/neon/pull/3905 need to wait for the disk
usage eviction background loop to do its job.
They set a period of 1s.
It seems wasteful to wait 10 seconds in the tests.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-30 18:38:45 +02:00
Lassi Pölönen
1c1bb904ed Rename zenith_* labels to neon_* (#3911)
## Describe your changes
Get rid of the legacy labeling. Aslo `neon_region_slug` with the same
value as `neon_region` doesn't make much sense, so just drop it. This
allows us to drop the relabeling from zenith to neon in the log
collector.
2023-03-30 16:24:47 +03:00
Gleb Novikov
b26c837ed6 Fixed pageserver openapi spec properties reference (#3904)
## Describe your changes

In [this linter
run](https://github.com/neondatabase/cloud/actions/runs/4553032319/jobs/8029101300?pr=4391)
accidentally found out that spec is invalid. Reference other schemas in
properties should be done the way I changed.

Could not find documentation specifically for schemas embedding in
`components.schemas`, but it seems like the approach is inherited from
json schema:
https://json-schema.org/understanding-json-schema/structuring.html#ref

## Issue ticket number and link
-

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] ~If it is a core feature, I have added thorough tests.~
- [ ] ~Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?~
- [ ] ~If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.~
2023-03-29 19:18:44 +04:00
Kirill Bulatov
ac9c7e8c4a Replace pin! from tokio to the std one (#3903)
With fresh rustc brought by
https://github.com/neondatabase/neon/pull/3902, we can use
`std::pin::pin!` macro instead of the tokio one.
One place did not need the macro at all, other places were adjusted.
2023-03-29 14:14:56 +03:00
Vadim Kharitonov
f1b174dc6a Update rust version to 1.68.2 2023-03-29 12:50:04 +04:00
Kirill Bulatov
9d714a8413 Split $CARGO_FLAGS and $CARGO_FEATURES to make e2e tests work 2023-03-29 00:08:30 +03:00
Kirill Bulatov
6c84cbbb58 Run new Rust IT test in CI 2023-03-29 00:08:30 +03:00
Kirill Bulatov
1300dc9239 Replace Python IT test with the Rust one 2023-03-29 00:08:30 +03:00
Kirill Bulatov
018c8b0e2b Use proper tokens and delimeters when listing S3 2023-03-29 00:08:30 +03:00
Arseny Sher
b52389f228 Cleanly exit on any shutdown signal in storage_broker.
neon_local sends SIGQUIT, which otherwise dumps core by default. Also, remove
obsolete install_shutdown_handlers; in all binaries it was overridden by
ShutdownSignals::handle later.

ref https://github.com/neondatabase/neon/issues/3847
2023-03-28 22:29:42 +04:00
Heikki Linnakangas
5a123b56e5 Remove obsolete hack to rename neon-specific GUCs.
I checked the console database, we don't have any of these left in
production.
2023-03-28 17:57:22 +03:00
Arthur Petukhovsky
7456e5b71c Add script to collect state from safekeepers (#3835)
Add an ansible script to collect
https://github.com/neondatabase/neon/pull/3710 state JSON from all
safekeeper nodes and upload them to a postgres table.
2023-03-28 17:04:02 +03:00
Konstantin Knizhnik
9798737ec6 Update pgxn/neon/file_cache.c
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-28 14:43:34 +04:00
Konstantin Knizhnik
35ecb139dc Use stavfs instead inof statfs to fix MacOS build 2023-03-28 14:43:34 +04:00
Arseny Sher
278d0f117d Rename neon_local sk logs s/safekeeper 1.log/safekeeper-1.log.
I don't like spaces in file names.
2023-03-28 14:28:56 +04:00
Arseny Sher
c30b9e6eb1 Show full path to pg_ctl invokation when it fails. 2023-03-28 12:06:06 +04:00
Konstantin Knizhnik
82a4777046 Add local free space monitor (#3832)
## Describe your changes

Monitor free spae in local file system and shrink local file cache size
if it is under watermark.
Neon is using local storage for temp files (temp table + intermediate
results), unlogged relations
and local file cache.

Ideally all space not used for temporary files should be used for local
file cache.
Temporary files and even unlogged relation are intended to have small
life time (because
them can be lost at  any moment in case of compute restart).

So the policy is to overcommit local cache size and shrink it if there
is not enough free space.
As far as temporary files are expected to be needed for a short time,
there i no need
to permanently shrink local file cache size. Instead of it, we just
throw away least recently accessed elements
from local file cache, releasing some space on the local disk.

## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: sharnoff <sharnoff@neon.tech>
2023-03-28 08:27:50 +03:00
Dmitry Rodionov
6efea43449 Use precondition failed code in delete_timeline when tenant is missing (#3884)
This allows client to differentiate between missing tenant and missing
timeline cases
2023-03-27 21:01:46 +03:00
Joonas Koivunen
f14895b48e eviction: avoid post-restart download by synthetic_size (#3871)
As of #3867, we do artificial layer accesses to layers that will be
needed after the next restart, but not until then because of caches.

With this patch, we also do that for the accesses that the synthetic
size calculation worker does if consumption metrics are enabled.

The actual size calculation is not of importance, but we need to
calculate all of the sizes, so we only call tenant::size::gather_inputs.

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-03-27 19:20:23 +02:00
Christian Schwarz
fe15624570 eviction_task: only refresh layer accesses once per p.threshold (#3877)
Without this, we run it every p.period, which can be quite low. For
example, the running experiment with 3000 tenants in prod uses a period
of 1 minute.

Doing it once per p.threshold is enough to prevent eviction.
2023-03-27 14:33:40 +03:00
Christian Schwarz
ff51e96fbd fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn (#3874)
fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn

Assume a single-timeline project.
If the gc_horizon covers all WAL (last_record_lsn < gc_horizon)
but we have written more data than just initdb, the synthetic
size calculation worker needs to calculate the logical size
at LSN initdb_lsn (Segment BranchStart).

Before this patch, that calculation would incorrectly return
the initial logical size calculation result that we cache in
the Timeline::initial_logical_size. Presumably, because there
was confusion around initdb_lsn vs. initial size calculation.

The fix is to only hand out the initialized_size() only if
the LSN matches.

The distinction in the metrics between "init logical size" and "logical
size" was also incorrect because of the above. So, remove it.

There was a special case for `size != 0`. This was to cover the case of
LogicalSize::empty_initial(), but `initial_part_end` is `None` in that
case, so the new `LogicalSize::initialized_size()` will return None
in that case as well.

Lastly, to prevent confusion like this in the future, rename all
occurrences of `init_lsn` to either just `lsn` or a more specific name.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-27 12:45:10 +02:00
Vadim Kharitonov
e3cbcc2ea7 Revert "Add neondatabase/release team as a default reviewers for storage"
This reverts commit daeaa767c4.
2023-03-27 14:10:18 +04:00
Heikki Linnakangas
8d78329991 Remove some dead code.
whoami() was never called, 'is_test' was never set.

'restart()' might be useful, but it wasn't hooked up the CLI so it was
dead code. It's not clear what kind of a restart it should perform,
anyway: just restart Postgres, or re-initialize the data directory
from a fresh basebackup like "stop"+"start" does.
2023-03-27 12:24:35 +03:00
Dmitry Rodionov
4d8c765485 remove redundant dyn (#3878)
remove redundant dyn
2023-03-27 12:04:48 +03:00
dependabot[bot]
4071ff8c7b Bump openssl from 0.10.45 to 0.10.48 in /test_runner/pg_clients/rust/tokio-postgres (#3879) 2023-03-25 12:33:39 +00:00
Dmitry Rodionov
870ba43a1f return proper http codes in timeline delete endpoint (#3876)
return proper http codes in timeline delete endpoint
+ fix openapi spec for detach to include 404 responses
2023-03-24 19:25:39 +02:00
Joonas Koivunen
f5ca897292 fix: less logging at shutdown (#3866)
Log less during shutdown; don't log anything for quickly (less than 1s)
exiting tasks.
2023-03-23 12:00:52 +02:00
Kirill Bulatov
8bd565e09e Ensure branches with no layers have their remote storage counterpart created eventually (#3857)
Discovered during writing a test for
https://github.com/neondatabase/neon/pull/3843
2023-03-22 17:42:31 +02:00
Joonas Koivunen
6033dfdf4a Re-access layers before threshold eviction (#3867)
To avoid re-downloading evicted files on restart, re-compute logical
size and partitioning before each threshold based eviction run.

Cc: #3802

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-03-22 16:26:27 +02:00
mikecaat
14a40c9ca6 Fix minor things for the docker-compose file (#3862)
* Add the REPOSITORY env to build args to avoid the following error when
executing without the credentials for the repository.

```
ERROR: Service 'compute' failed to build: Head
"https://369495373322.dkr.ecr.eu-central-1.amazonaws.com/v2/compute-node-v15/manifests/2221":
no basic auth credentials
```

* update the tag version in the documentation to support storage broker
2023-03-22 08:10:53 +00:00
Shany Pozin
0f7de84785 Allow calling detach on ignored tenant (#3834)
## Describe your changes
Added a query param to detach API
Allow to remove local state of a tenant even if its not in the memory
(following ignore API)
## Issue ticket number and link
#3828
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-03-22 07:17:00 +00:00
Kirill Bulatov
dd22c87100 Remove older layer metadata format support code (#3854)
The PR enforces current newest `index_part.json` format in the type
system (version `1`), not allowing any previous forms of it, that were
used in the past.
Similarly, the code to mitigate the
https://github.com/neondatabase/neon/issues/3024 issue is now also
removed.

Current code does not produce old formats and extra files in the
index_part.json, in the future we will be able to use
https://github.com/neondatabase/aversion or other approach to make
version transitions more explicit.

See https://neondb.slack.com/archives/C033RQ5SPDH/p1679134185248119 for
the justification on the breaking changes.
2023-03-21 23:33:28 +02:00
Heikki Linnakangas
6fdd9c10d1 Read storage auth token from spec file.
We read the pageserver connection string from the spec file, so let's
read the auth token from the same place.

We've been talking about pre-launching compute nodes that are not
associated with any particular tenant at startup, so that the spec
file is delivered to the compute node later. We cannot change the env
variables after the process has been launched.

We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN
env variable, but compute_ctl is now responsible for setting it.
2023-03-21 20:12:09 +02:00
Dmitry Rodionov
4158e24e60 rfc: delete pageserver data from s3 (#3792)
[Rendered](https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-21 20:03:27 +02:00
Shany Pozin
809acb5fa9 Move neon-image-depot to a larger runner (#3860)
## Describe your changes
https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059
## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-21 19:32:36 +02:00
Heikki Linnakangas
299db9d028 Simplify and clean up the $NEON_AUTH_TOKEN stuff in compute
- Remove the neon.safekeeper_token_env GUC. It was used to set the
  name of an environment variable, which was then used in pageserver
  and safekeeper connection strings to in place of the
  password. Instead, always look up the environment variable called
  NEON_AUTH_TOKEN. That's what neon.safekeeper_token_env was always
  set to in practice, and I don't see the need for the extra level of
  indirection or configurability.

- Instead of substituting $NEON_AUTH_TOKEN in the connection strings,
  pass $NEON_AUTH_TOKEN "out-of-band" as the password, when we connect
  to the pageserver or safekeepers. That's simpler.

- Also use the password from $NEON_AUTH_TOKEN in compute_ctl, when it
  connects to the pageserver to get the "base backup".
2023-03-21 00:15:04 +02:00
Heikki Linnakangas
5a786fab4f Remove duplicated global variables in neon extension.
Walproposer used to live in the backend, while pagestore_smgr was an
extension. But now that both are part of the neon extension,
walproposer can access the same 'neon_tenant' and 'neon_timeline'
variables as the pageserver_smgr code.
2023-03-21 00:15:04 +02:00
Arseny Sher
699f200811 Send error context chain to the client when Copy stream errors. 2023-03-21 01:22:02 +04:00
Christian Schwarz
881356c417 add metrics to detect eviction-induced thrashing (#3837)
This patch adds two metrics that will enable us to detect *thrashing* of
layers, i.e., repetitions of `eviction, on-demand-download, eviction,
... ` for a given layer.

The first metric counts all layer evictions per timeline. It requires no
further explanation. The second metric counts the layer evictions where
the layer was resident for less than a given threshold.

We can alert on increments to the second metric. The first metric will
serve as a baseline, and further, it's generally interesting, outside of
thrashing.

The second metric's threshold is configurable in PageServerConf and
defaults to 24h. The threshold value is reproduced as a label in the
metric because the counter's value is semantically tied to that
threshold. Since changes to the config and hence the label value are
infrequent, this will have low storage overhead in the metrics storage.

The data source to determine the time that the layer was resident is the
file's `mtime`. Using `mtime` is more of a crutch. It would be better if
Pageserver did its own persistent bookkeeping of residence change events
instead of relying on the filesystem. We had some discussion about this:
https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900

My position is that `mtime` is good enough for now. It can theoretically
jump forward if someone copies files without resetting `mtime`. But that
shouldn't happen in practice. Note that moving files back and forth
doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`,
which is typically used for filesystem-level backup / restore, correctly
syncs `mtime`.

I've added a label that identifies the data source to keep options open
for a future, better data source than `mtime`. Since this value will
stay the same for the time being, it's not a problem for metrics
storage.

refs https://github.com/neondatabase/neon/issues/3728
2023-03-20 16:11:36 +01:00
Heikki Linnakangas
fea4b5f551 Switch to EdDSA algorithm for the storage JWT authentication tokens.
The control plane currently only supports EdDSA. We need to either teach
the storage to use EdDSA, or the control plane to use RSA. EdDSA is more
modern, so let's use that.

We could support both, but it would require a little more code and tests,
and we don't really need the flexibility since we control both sides.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
77107607f3 Allow JWT key generation to fail if authentication is not enabled.
This allows you to run without the 'openssl' binary as long as you
don't enable authentication. This becomes more important with the next
commit, which switches the JWT algorithm to EdDSA. LibreSSL does not
support EdDSA, and LibreSSL comes with macOS, so the next commit makes
it much more likely for the key generation to fail for macOS users.

To allow running without a keypair, don't generate the authentication
token in the 'neon_local init' step. Instead, generate a new token on
every request that needs one, using the private key.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
1da963b2f9 Remove some unused code in control plane. 2023-03-20 16:28:01 +02:00
Heikki Linnakangas
1ddb9249aa Reduce the # of histogram buckets in metrics. (#3850)
Shrinks the total number of metrics collected for each timeline by
about 50%.

See https://github.com/neondatabase/neon/issues/2848. This doesn't fully
solve the problem, we still collect a lot of metrics even with this, but
this gives us a lot of headroom.
2023-03-20 15:49:16 +02:00
Joonas Koivunen
0c1228c37a feat: store initial timeline in env fixture (#3839)
minor change, but will allow more use in future for the default
tenants.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-20 11:57:27 +02:00
Christian Schwarz
3c15874c48 allow specifying eviction_policy in TenantCreateRequest
This was on oversight from 175a577ad4.

Nothing uses this AFAIK, but, let's fix it anyways.

Noticed while working on https://github.com/neondatabase/neon/issues/3728
2023-03-20 10:43:53 +01:00
Shany Pozin
93f3f4ab5f Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818)
## Describe your changes
Add Error enum for tenant state response to allow better error handling
in mgmt api
## Issue ticket number and link
#2238
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-19 10:44:42 +02:00
sharnoff
f6e2e0042d Fix + re-enable VM cgroup creation + running (#3820)
Re-enable cgroup shenanigans in VMs, with some special care taken to
make sure that our version of cgroup-tools supports cgroup v2 (debian
bullseye does not, and probably won't because it requires a breaking
change in libcgroup).

This involves manually building libcgroup  / cgroup-tools from source,
then copying the output into the final build stage.

We originally considered pulling the package from debian's testing repo
(which is up-to-date), but decided against it. Refer to the PR for more
details.

Prior work, for reference:

* 2153d2e0 - Run compute_ctl in a cgroup in VMs
* 1360361f - Fix missing VM cgconfig.conf
* 8dae8799 - Disable VM cgroup shenanigans
2023-03-16 17:09:45 -07:00
Christian Schwarz
b917270c67 remove unused TenantConfig::update function 2023-03-16 16:25:35 +01:00
Arthur Petukhovsky
b067378d0d Measure cross-AZ traffic in safekeepers (#3806)
Create `safekeeper_pg_io_bytes_total` metric to track total amount of
bytes written/read in a postgres connections to safekeepers. This metric
has the following labels:
- `client_az` – availability zone of the connection initiator, or
`"unknown"`
- `sk_az` – availability zone of the safekeeper, or `"unknown"`
- `app_name` – `application_name` of the postgres client
- `dir` – data direction, either `"read"` or `"write"`
- `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from
`client_az` and `sk_az`, exists purely for convenience.

This is implemented by passing availability zone in the connection
string, like this: `-c tenant_id=AAA timeline_id=BBB
availability-zone=AZ-1`.

Update ansible deployment scripts to add availability_zone argument
to safekeeper and pageserver in systemd service files.
2023-03-16 17:24:01 +03:00
Joonas Koivunen
768c8d9972 test: allow gc to get unlucky (#3826)
this failure case was probably introduced by b220ba6, because earlier
the gc would always have run fast enough for restart every 1s. however,
test got added later, so we have just been lucky.

fixes #3824 by allowing this error to happen.
2023-03-16 11:03:21 +02:00
Rahul Patil
f1d960d2c2 Add new pageserver to eu-central-1 (#3829)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-15 16:58:28 +01:00
Arthur Petukhovsky
cd17802b1f Add pg_clients test for serverless driver (#3827)
Fixes #3819
2023-03-15 16:18:38 +03:00
Heikki Linnakangas
10a5d36af8 Separate mgmt and libpq authentication configs in pageserver. (#3773)
This makes it possible to enable authentication only for the mgmt HTTP
API or the compute API. The HTTP API doesn't need to be directly
accessible from compute nodes, and it can be secured through network
policies. This also allows rolling out authentication in a piecemeal
fashion.
2023-03-15 13:52:29 +02:00
Arseny Sher
a7ab53c80c Forward framed read buf contents to compute before proxy pass.
Otherwise they get lost. Normally buffer is empty before proxy pass, but this is
not the case with pipeline mode of out npm driver; fixes connection hangup
introduced by b80fe41af3 for it.

fixes https://github.com/neondatabase/neon/issues/3822
2023-03-15 14:32:41 +03:00
Heikki Linnakangas
2672fd09d8 Make test independent of the order of config lines. 2023-03-14 20:10:34 +02:00
Heikki Linnakangas
4a92799f24 Fix check for trailing garbage in basebackup import.
There was a warning for trailing garbage after end-of-tar archive, but
it didn't always work. The reason is that we created a StreamReader
over the original copyin-stream, but performed the check for garbage
on the copyin-stream. There could be some garbage bytes buffered in
the StreamReader, which were not caught by the warning.

I considered turning the the warning into a fatal error, aborting the
import, but I wasn't sure if we handle aborting the import properly.
Do we clean up the timeline directory on error? If we don't, we should
make that more robust, but that's a different story.

Also, normally a valid tar archive ends with two 512-byte blocks of zeros.
The tokio_tar crate stops at the first all-zeros block. Read and check
the second all-zeros block, and error out if it's not there, or contains
something unexpected.
2023-03-14 19:45:33 +02:00
Konstantin Knizhnik
5396273541 Avoid holes between generated image layers (#3771)
## Describe your changes

When we perform partitioning of the whole key space, we take in account
actual ranges of relation present in the database. So if we have
relation with relid=1 and size 100 and relation with relid=2 with size
200 then result of KeySpace::partition may contain partitions
<100000000..100000099> and <200000000..200000199>. Generated image
layers will contain the same boundaries.
But when GC is checking image coverage to find out of old layer is fully
covered by newer image layer and so can be deleted, it takes in account
only full key range. I.e. if there is delta layer <100000000..300000000>
then it never be garbage collected because image layers
<100000000..100000099> and <200000000..200000199> are not completely
covering it.This is how it looks in practice:


000000067F000032AC00000A300000000000-000000067F000032AC00000A330000000000__000000000F761828

000000067F000032AC00000A31000000001F-000000067F000032AC00000A620000000005__0000000001696070-000000000442A551

000000067F000032AC00000A3300FFFFFFFF-000000067F000032AC00000A650100000000__000000000F761828

So there are two image layers covering delta layer but ... there is a
hole: A330000000000...A3300FFFFFFFF and as a result delta layer is not
collected.

## Issue ticket number and link

This PR is deeply related with #3673 because it is addressing the same
problem: old layers are not utilized by GC.
The test test_gc_old_layers.py in #3673 can be used to see effect of
this patch.

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-14 16:15:07 +02:00
Joonas Koivunen
c23c8946a3 chore: clippies introduced with rust 1.68 (#3781)
- handle automatically fixable future clippies
- tune run-clippy.sh to remove macos specifics which we no longer have

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-14 15:29:02 +02:00
Joonas Koivunen
15b692ccc9 test: more strict finding of WARN, ERROR lines (#3798)
this prevents flakyness when `WARN|ERROR` appears in some other part of
the line, for example in a random filename.
2023-03-14 15:27:52 +02:00
Alexander Bayandin
3d869cbcde Replace flake8 and isort with ruff (#3810)
- Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort
- Update mypy and black
2023-03-14 13:25:44 +00:00
Lassi Pölönen
68ae020b37 Use RollingUpdate strategy also for legacy proxy (#3814)
## Describe your changes
We have previously changed the neon-proxy to use RollingUpdate. This
should be enabled in legacy proxy too in order to avoid breaking
connections for the clients and allow for example backups to run even
during deployment. (https://github.com/neondatabase/neon/pull/3683)

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-03-14 13:23:46 +00:00
Joonas Koivunen
d6bb8caad4 refactor: correct return value for not found L0's on LayerMap::replace (#3805)
in prev implementation, an `ok_or_else(...)?` is used to cause a
"precondition error" on LayerMap::replace, however we only see this
particular error if an L0 for which replace fails is not in the layermap
because it is not in `l0_delta_layers`. changes or fixes this to be
Replacement::NotFound instead, making it more clear that an error would
only be raised for actual preconditions, like trying to replace layer
with completly unrelated layer.
2023-03-14 13:18:26 +02:00
Alexander Bayandin
319402fc74 postgres_ffi: restore POSTGRES_INSTALL_DIR support (#3811)
Fix path construction to `pg_config`: `pg_install_dir_versioned` 
already includes `pg_version`
2023-03-14 10:25:48 +00:00
Dmitry Ivanov
2e4bf7cee4 [proxy] Immediately log all compute node connection errors. 2023-03-14 01:45:57 +03:00
Dmitry Ivanov
15ed6af5f2 Add descriptions to proxy's python tests. 2023-03-14 01:32:37 +03:00
Dmitry Rodionov
50476a7cc7 test: update to match current interfaces 2023-03-13 17:50:10 +02:00
andres
d7ab69f303 add test for getting branchpoints from an inactive timeline 2023-03-13 17:50:10 +02:00
sharnoff
582620274a Enable file cache handling by vm-informant (#3794)
Enables the VM informant's file cache integration.

See also: https://github.com/neondatabase/autoscaling/pull/47
2023-03-13 07:16:39 -07:00
Vadim Kharitonov
daeaa767c4 Add neondatabase/release team as a default reviewers for storage
releases
2023-03-13 13:40:15 +01:00
Konstantin Knizhnik
f0573f5991 Remove block cursor cache (#3740)
## Describe your changes

Do not pin current block in BlockCursor

## Issue ticket number and link

See #3712 

There are places (see get_reconstruct_data) in our code when thread is
holding read layers lock and then try to read file and so lock page
cache slot. So we have edge in dependency graph layers->page cache slot.
At the same time (as Christian noticed) we can lock page cache slot in
BlockCursor and then try obtain shard lock on layers.
So there is backward edge in dependency graph page cache slot>layers
which forms loop and may cause deadlock.

There are three possible fixes of the problem:
1. Perform compaction under `layers` shared lock. See PR #3732. It fixes
the problem but make it not possible to append any data to pageserver
until compaction is completed.
2. Do not hold `layers` lock while accessing layers (not sure if it is
possible to do because it definitely introduce some new race
conditions).
3. Do not pin current pages in BockCursor (this PR).

My experiments shows that this cache in BlockCursor is not so useful:
the number of hits/misses for cursor cache on pgbench workload (-i -s
10/-c 10 -T 100/-c 10 -S -T 100):
```
hits: 163011
misses: 1023602
```
So number of cache misses is 10x times  larger.
And results for read-only pgbench are mostly the same:
```
with cache: 14581
w/out cache: 14429
```
 
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-13 14:07:35 +02:00
Nikita Kalyanov
07dcf679de set content type explicitly (#3799)
I moved management API v2 to ogen and the generated code seems to be
more strict about content type. Let's set it properly as it is json
after all

## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-13 14:00:01 +02:00
Heikki Linnakangas
e0ee138a8b Add a test for tokio-postgres client to the driver test suite.
It is fully supported. To enable TLS, though, it requires some extra
glue code, and a dependency to a TLS library.
2023-03-13 12:14:37 +02:00
Arthur Petukhovsky
d9a1329834 Make postgres_backend use generic IO type (#3789)
- Support measuring inbound and outbound traffic in MeasuredStream
- Start using MeasuredStream in safekeepers code
2023-03-13 12:18:10 +03:00
Joonas Koivunen
8699342249 Ondemand rx bytes and layer count (#3777)
Adds two new *global* metrics:
- pageserver_remote_ondemand_downloaded_layers_total
- pageserver_remote_ondemand_downloaded_bytes_total

An existing test is repurposed once more to check that we do get some
reasonable counts. These are to replace guessing from the nic RX bytes
metric how much was on-demand downloaded.

First part of #3745: This does not add the "(un)?avoidable" metric,
which I plan to add as a new metric, which will be a subset of the
counts of the metrics added here.
2023-03-13 09:26:49 +02:00
Joonas Koivunen
ce8fbbd910 Fix allowed error again (#3790)
Fixes #3360 again, this time checking all other "Error processing HTTP
request" messages and aligning the regex with the two others.
2023-03-10 17:44:12 +00:00
Vadim Kharitonov
1401021b21 Be able to get number of CPUs (#3774)
After enabling autoscaling, we faced the issue that customers are not
able to get the number of CPUs they use at this moment. Therefore I've
added these two options:

1. Postgresql function to allow customers to call it whenever they want
2. `compute_ctl` endpoint to show these number in console
2023-03-10 19:00:20 +02:00
Stas Kelvich
252b3685a2 Use unsafe-postgres feature to build pgx extension
Recently added `unsafe-postgres` feature allows to build pgx extensions
against postgres forks that decided to change their ABI name (like us).
With that we can build extensions without forking them and using stock
pgx. As this feature is new few manual version bumps were required.
2023-03-10 17:40:45 +02:00
Heikki Linnakangas
34d3385b2e Add unit tests for JWT encoding and decoding. 2023-03-10 16:09:32 +02:00
Heikki Linnakangas
b00530df2a Add section in internal docs on the JWT payload.
Just copied from the code comments. Could be improved, but this is a
start.
2023-03-10 16:09:32 +02:00
Heikki Linnakangas
bebf76c461 Accept RS384 and RS512 JWT tokens.
Previously, we only accepted RS256. Seems like a pointless limitation,
and when I was testing it with RS512 tokens, it took me a while to
understand why it wasn't working.
2023-03-10 16:09:32 +02:00
Vadim Kharitonov
2ceef91da1 Compile pg_tiktoken extension 2023-03-10 14:59:26 +01:00
Anastasia Lubennikova
b7fddfa70d Add branch_id field to proxy_io_bytes_per_client metric.
Since we allow switching endpoints between different branches, it is important to use composite key.
Otherwise, we may try to calculate delta between metric values for two different branches.
2023-03-10 15:00:48 +02:00
Heikki Linnakangas
d1537a49fa Fix escaping in postgresql.conf that we generate at compute startup
If there are any config options that contain single quotes or backslashes,
they need to be escaped
2023-03-10 14:59:21 +02:00
Heikki Linnakangas
856d01ff68 Add newline at end of postgresql.conf 2023-03-10 14:59:21 +02:00
Heikki Linnakangas
42ec79fb0d Make expected test output nicer to read.
By using Rust raw string literal.
2023-03-10 14:59:21 +02:00
Rory de Zoete
3c4f5af1b9 Try depot.dev for image building (#3768)
To see if it is faster. Run side-by-side for a while so we can gather
enough data.
2023-03-10 11:11:39 +01:00
Arseny Sher
290884ea3b Fix too many arguments in read_network clippy complain. 2023-03-10 10:50:03 +03:00
Arseny Sher
965837df53 Log connection ids in safekeeper instead of thread ids.
Fixes build on macOS (which doesn't have nix gettid) after 0d8ced8534.
2023-03-10 10:50:03 +03:00
Arseny Sher
d1a0f2f0eb Fix example why manual-range-contains is disabled. 2023-03-10 00:03:17 +03:00
Konstantin Knizhnik
a34e78d084 Retry attempt to connect to pageserver in order to make pageserver restart transparent for clients (#3700)
…start transparent for clients

## Describe your changes

Try to reestablish connection with pageserver if send is failed to be
able to make pageserver restart transparent for client

## Issue ticket number and link

https://github.com/neondatabase/neon/issues/1138

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-09 22:15:46 +02:00
Arseny Sher
b80fe41af3 Refactor postgres protocol parsing.
1) Remove allocation and data copy during each message read. Instead, parsing
functions now accept BytesMut from which data they form messages, with
pointers (e.g. in CopyData) pointing directly into BytesMut buffer. Accordingly,
move ConnectionError containing IO error subtype into framed.rs providing this
and leave in pq_proto only ProtocolError.

2) Remove anyhow from pq_proto.

3) Move FeStartupPacket out of FeMessage. Now FeStartupPacket::parse returns it
directly, eliminating dead code where user wants startup packet but has to match
for others.

proxy stream.rs is adapted to framed.rs with minimal changes. It also benefits
from framed.rs improvements described above.
2023-03-09 20:45:56 +03:00
Arseny Sher
0d8ced8534 Remove sync postgres_backend, tidy up its split usage.
- Add support for splitting async postgres_backend into read and write halfes.
  Safekeeper needs this for bidirectional streams. To this end, encapsulate
  reading-writing postgres messages to framed.rs with split support without any
  additional changes (relying on BufRead for reading and BytesMut out buffer for
  writing).
- Use async postgres_backend throughout safekeeper (and in proxy auth link
  part).
- In both safekeeper COPY streams, do read-write from the same thread/task with
  select! for easier error handling.
- Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL
  -- join split parts back catching errors from them before returning.

Initially I hoped to do that read-write without split at all, through polling
IO:
https://github.com/neondatabase/neon/pull/3522
However that turned out to be more complicated than I initially expected
due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>>
which is Send construct just to satisfy the checker; 2) can be workaround with
transmute. But this is so messy that I decided to leave split.
2023-03-09 20:45:56 +03:00
Arseny Sher
7627d85345 Move async postgres_backend to its own crate.
To untie cyclic dependency between sync and async versions of postgres_backend,
copy QueryError and some logging/error routines to postgres_backend.rs. This is
temporal glue to make commits smaller, sync version will be dropped by the
upcoming commit completely.
2023-03-09 20:45:56 +03:00
Arseny Sher
3f11a647c0 Rename write_message to write_message_noflush in postgres_backend_async.rs
To make it unifrom across the project; proxy stream.rs and older
postgres_backend uses write_message_noflush.
2023-03-09 20:45:56 +03:00
Alexey Kondratov
e43c413a3f [compute_tools] Add /insights endpoint to compute_ctl (#3704)
This commit adds a basic HTTP API endpoint that allows scraping the
`pg_stat_statements` data and getting a list of slow queries. New
insights like cache hit rate and so on could be added later.

Extension `pg_stat_statements` is checked / created only if compute
tries to load the corresponding shared library. The latter is configured
by control-plane and currently covered with feature flag.

Co-authored by Eduard Dyckman (bird.duskpoet@gmail.com)
2023-03-09 14:21:10 +01:00
Heikki Linnakangas
8459e0265e Add performance test for compaction and image layer creation 2023-03-09 14:30:12 +02:00
Kirill Bulatov
03a2ce9d13 Add tracing spans with request_id into pageserver management API handlers (#3755)
Adds a newtype that creates a span with request_id from
https://github.com/neondatabase/neon/pull/3708 for every HTTP request
served.

Moves request logging and error handlers under the new wrapper, so every request-related event now is logged under the request span.
For compatibility reasons, error handler is left on the general router, since not every service uses the new handler wrappers yet.
2023-03-09 09:24:01 +02:00
Heikki Linnakangas
ccf92df4da Remove deprecated support to handle ZENITH_AUTH_TOKEN.
It's not used anywhere anymore.
2023-03-09 00:53:13 +02:00
Vadim Kharitonov
37bc6d9be4 Compile plpgsql_check extension 2023-03-08 23:20:24 +01:00
Vadim Kharitonov
177f986795 Compile hll extension 2023-03-08 12:03:09 +01:00
Heikki Linnakangas
fb1581d0b9 Fix setting "image_creation_threshold" setting in tenant config. (#3762)
We have a few tests that try to set image_creation_threshold, but it
didn't actually have any effect because we were missing some critical
code to load the setting from config file into memory.

The two modified tests in `test_remote_storage.py perform
compaction and GC, and assert that GC removes some layers. That
only happens if new image layers are created by the
compaction. The tests explicitly disabled image layer creation by
setting image_creation_threshold to a high value, but it didn't
take effect because reading image_creation_threshold from config
file was broken, which is why the test worked. Fix the test to
set image_creation_threshold low, instead, so that GC has work to
do.

Change 'test_tenant_conf.py' so that it exercises the added code.

This might explain why we're apparently missing test coverage for GC
(issue #3415), although I didn't try to address that here, nor did I
check if this improves the it.
2023-03-08 11:39:30 +02:00
Sasha Krassovsky
02b8e0e5af Add OpenAPI spec for do_gc (#3756)
## Describe your changes
Adds a field to the OpenAPI spec for the page server which describes the
`do_gc` command.
## Issue ticket number and link
#3669
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-07 09:08:46 -08:00
Vadim Kharitonov
1b16de0d0f Compile prefix extension 2023-03-07 17:42:53 +01:00
Stas Kelvich
069b5b0a06 Make postgres --wal-redo more embeddable.
* Stop allocating and maintaining 128MB hash table for last written

  LSN cache as it is not needed in wal-redo.

* Do not require access to the initialized data directory. That
  saves few dozens megabytes of empty but initialized data directory.
  Currently such directories do occupy about 10% of the disk space
  on the pageservers as most of tenants are empty.

* Move shmem-initialization code to the extension instead of postgres
2023-03-07 15:01:14 +02:00
Joonas Koivunen
b05e94e4ff fix: allow ERROR log to appear per allowed failure (#3696)
The test already allows the background thread trying to checkpoint to
fail, however the resulting log message is currently not allowed thus
causing flakyness.
2023-03-07 12:44:04 +00:00
Arseny Sher
0acf9ace9a Return 404 if timeline is not found in safekeeper HTTP API. 2023-03-07 16:34:20 +04:00
Arseny Sher
ca85646df4 Max peer_horizon_lsn before adopting it.
Before this patch, persistent peer_horizon_lsn was always sent to walproposer,
making it initially calculate it equal to max of persistent values and in turn
pulling back the in memory value. Send instead in memory value and take max when
safekeeper sets it.

closes https://github.com/neondatabase/neon/issues/3752
2023-03-07 10:16:54 +04:00
Shany Pozin
7b9057ad01 Add timeout to download copy (#3675)
## Describe your changes
Adding a timeout handling for the remote download of layers of 120
seconds for each operation
Note that these downloads are being retried for N times 

## Issue ticket number and link
Fixes: #3672

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-06 18:52:59 +02:00
Konstantin Knizhnik
96f65fad68 Handle crash of walredo process and retry applying wal records (#3739)
## Describe your changes

Restart walredo process an d retry applying walredo records i case of
abnormal walredo process termination

## Issue ticket number and link

See #1700


## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-06 10:10:58 +02:00
Christian Schwarz
9cada8b59d fix benchmarks, broken by PR #3737
Benchmarks only run on `main` branch, so, the pre-commit tests didn't
catch these.
2023-03-03 18:47:57 +01:00
Christian Schwarz
66a5159511 fix: compaction: no index upload scheduled if no on-demand downloads
Commit

    0cf7fd0fb8
    Compaction with on-demand download (#3598)

introduced a subtle bug: if we don't have to do on-demand downloads,
we only take one ROUND in fn compact() and exit early.
Thereby, we miss scheduling the index part upload for any layers
created by fn compact_inner().

Before that commit, we didn't have this problem.
So, this patch fixes it.

Since no regression test caught this, I went ahead and extended the
timeline size tests to assert that, if remote storage is configured,
1. pageserver_remote_physical_size matches the other physical sizes
2. file sizes reported by the layer map info endpoint match the other
   physical size metrics

Without the pageserver code fix, the regression test would
fail at the physical size assertion, complaining that
any of the resident physical size != remote physical size metric
50790400.0 != 18399232.0
I figured out what the problem is by comparing the remote storage
and local directories like so, and noticed that the image layer
in the local directory wasn't present on the remote side.
It's size was exactly the difference
    50790400.0 - 18399232.0  =32391168.0

fixes https://github.com/neondatabase/neon/issues/3738
2023-03-03 16:11:54 +01:00
Christian Schwarz
d1a0a907ff tests: use parse_metrics everywhere (#3737)
- use parse_metrics() in all places where we parse Prometheus metrics
- query_all: make `filter` argument optional
- encourage using properly parsed, typed metrics by changing get_metrics()
  to return already-parsed metrics. The new get_metric_str() method,
  like in the Safekeeper type, returns the raw text response.
2023-03-03 14:53:27 +01:00
Christian Schwarz
1b780fa752 timeline_checkpoint_handler: add span with tenant and timeline id
Before this patch, the logs written by freeze_and_flush() and compact()
didn't have any span, which made the test logs annoying to read.
2023-03-03 12:10:24 +01:00
Christian Schwarz
38022ff11c gc: only decrement resident size if GC'd layer is resident
Before this patch, GC would call PersistentLayer::delete()
on every GC'ed layer.
RemoteLayer::delete() returned Ok(()) unconditionally.
GC would then proceed by decrementing the resident size metric,
even though the layer is a RemoteLayer.

This patch makes the following changes:
- Rename PersistentLayer::delete() to delete_resident_layer_file().
  That name is unambiguous.
- Make RemoteLayer::delete_resident_layer_file return an Err().
  We would have uncovered this bug if we had done that from the start.
- Change GC / Timeline::delete_historic_layer check whether
  the layer is remote or not, and only call delete_resident_layer_file()
  if it's not remote. This brings us in line with how eviction does it.
- Add a regression test.

fixes https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
1b9b9d60d4 eviction: add comment explaining resident size decrement on error
https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
68141a924d eviction: remove needless if-let around resident size decrement
The branch was always taken at runtime, so, this should not
constitute a behavioral change.

refs https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
764d27f696 fix checkpoint_timeout serialization in TenantConf
Without this change, when actually setting this conf opt, the tenant
would become Broken next time we load it.
Why?
The serde_toml representation that persist_tenant_conf would write out
would be a TOML inline table of `secs` and `nsecs`.
But our hand-rolled TenantConf parser expects a TOML string.

I checked that all other `Duration` values in TenantConfOpt
use the humantime serialization.

Issues like this would likely be systematically prevent by
https://github.com/neondatabase/neon/issues/3682
2023-03-03 12:10:24 +01:00
Arthur Petukhovsky
b23742e09c Create /v1/debug_dump safekeepers endpoint (#3710)
Add HTTP endpoint to get full safekeeper state of all existing timelines
(all in-memory values and info about all files stored on disk).

Example:
https://gist.github.com/petuhovskiy/3cbb8f870401e9f486731d145161c286
2023-03-03 14:01:05 +03:00
Vadim Kharitonov
5e514b8465 Compile pgTAP extension 2023-03-03 10:14:38 +01:00
Vadim Kharitonov
a60f687ce2 Compile rum extension 2023-03-02 12:28:20 +01:00
sharnoff
8dae879994 Disable VM cgroup shenanigans (#3730)
As discussed - temporary, so it can unblock releasing autoscaling.
Cleaner to fully remove, then add back rather than commenting it out.
2023-03-01 23:58:43 -08:00
Shany Pozin
d19c5248c9 Add UUID header to mgmt API (#3708)
## Describe your changes

## Issue ticket number and link
#3479
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-01 18:09:08 +02:00
sharnoff
1360361f60 Fix missing VM cgconfig.conf (#3718)
It was being added to the wrong stage in the dockerfile. This should fix
it, and resolves an ongoing issue on staging.
2023-02-28 21:11:00 -08:00
Alexander Bayandin
000eb1b069 Bump tempfile from 3.3.0 to 3.4.0 (#3709)
Update `tempfile` crate to get rid of `remove_dir_all` dependency
Ref https://github.com/neondatabase/neon/security/dependabot/15
2023-02-27 12:44:08 +00:00
Heikki Linnakangas
f51b48fa49 Fix UNLOGGED tables.
Instead of trying to create missing files on the way, send init fork contents as
main fork from pageserver during basebackup. Add test for that. Call
put_rel_drop for init forks; previously they weren't removed. Bump
vendor/postgres to revert previous approach on Postgres side.

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>

ref https://github.com/neondatabase/postgres/pull/264
ref https://github.com/neondatabase/postgres/pull/259
ref https://github.com/neondatabase/neon/issues/1222
2023-02-24 23:30:02 +04:00
Sergey Melnikov
9f906ff236 Add pageserver-2.us-east-2.aws.neon.tech (#3701) 2023-02-23 19:56:21 +01:00
Sam Kleinman
c79dd8d458 compute_ctl: support for fetching spec from control plane (#3610) 2023-02-23 13:19:39 -05:00
Vadim Kharitonov
ec4ecdd543 Enable postgres SPI extensions 2023-02-23 16:49:37 +01:00
MMeent
20a4d817ce Update vendored PostgreSQL versions to 14.7 and 15.2 (#3581)
## Describe your changes
Rebase vendored PostgreSQL onto 14.7 and 15.2

## Issue ticket number and link

#3579

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
    ```
The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL
14 and 15.2 for PostgreSQL 15.
    ```
2023-02-23 16:10:22 +02:00
Vadim Kharitonov
5ebf7e5619 Fix pg_jsonschema and pg_graphql 2023-02-23 10:43:46 +01:00
Arseny Sher
0692fffbf3 Bump vendor/postgres to include hotfix for unlogged tables with indexes.
https://github.com/neondatabase/postgres/pull/259
https://github.com/neondatabase/postgres/pull/262
2023-02-23 01:34:59 +04:00
Vadim Kharitonov
093570af20 Compile pg_hashids extension 2023-02-22 21:00:25 +01:00
Dmitry Rodionov
eb403da814 Use debug level for successful GET http requests (#3681)
We started rather frequently scrap some apis for metadata. This includes
layer eviction tester, I believe console does that too.

It should eliminate these logs:
https://neonprod.grafana.net/goto/rr_ace1Vz?orgId=1 (Note the rate
around 2k messages per minute)
2023-02-22 22:19:05 +03:00
Vadim Kharitonov
f3ad635911 Compile pgrouting extension 2023-02-22 20:16:11 +01:00
Vadim Kharitonov
a8d7360881 Compile hypopg extension 2023-02-22 20:14:30 +01:00
Lassi Pölönen
b0311cfdeb Change the production neon-proxy-scram update strategy to RollingUpdate (#3683)
## Describe your changes
The same change in production as was done in staging by
https://github.com/neondatabase/neon/pull/3678

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-02-22 20:15:37 +02:00
Konstantin Knizhnik
412e0aa985 Skip largest N holes during compaction (#3597)
## Describe your changes

This is yet another attempt to address problem with storage size
ballooning #2948
Previous PR #3348 tries to address this problem by maintaining list of
holes for each layer.
The problem with this approach is that we have to load all layer on
pageserver start.
Lazy loading of layers is not possible any more.

This PR tries to collect information of N largest holes on compaction
time and exclude this holes from produced layers.
It can cause generation of larger number of layers (up to 2 times) and
producing small layers.
But it requires minimal changes in code and doesn't affect storage
format.

For graphical explanation please see thread:
https://github.com/neondatabase/neon/pull/3597#discussion_r1112704451

## Issue ticket number and link

#2948
#3348

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-22 18:28:01 +02:00
Lassi Pölönen
965b4f4ae2 Change the staging neon-proxy-scram update strategy to RollingUpdate (#3678)
## Describe your changes
When we deploy the proxy with the default Recreate strategy, there's
always some downtime and existing connections will be shut down. Change
the strategy to RollingUpdate and delay the kill signal by one week. AWS
Network Loadbalancer keeps the existing connections alive for as long as
the pods are alive, but will direct new connections to new pods.


## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-02-22 16:50:07 +02:00
Arthur Petukhovsky
95018672fa Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671)
We migrated all timelines to
`safekeeper-3.ap-southeast-1.aws.neon.tech`, now old instance can be
removed.
2023-02-22 11:55:41 +02:00
Sergey Melnikov
2caece2077 Add -v to ansible invocations (#3670)
To get more debug output on failures
2023-02-21 23:11:52 +03:00
Joonas Koivunen
b8b8c19fb4 fix: hold permit until GetObject eof (#3663)
previously we applied the ratelimiting only up to receiving the headers
from s3, or somewhere near it. the commit adds an adapter which carries
the permit until the AsyncRead has been disposed.

fixes #3662.
2023-02-21 21:14:08 +02:00
Joonas Koivunen
225add041f calculate_logical_size: no longer use spawn_blocking (#3664)
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. Use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file which lead to deadlock.

Fixes: #3657
2023-02-21 21:09:31 +02:00
Joonas Koivunen
5d001b1e5a chore: ignore all compaction inactive tenant errors (#3665)
these are happening in tests because of #3655 but they sure took some
time to appear.

makes the `Compaction failed, retrying in 2s: Cannot run compaction
iteration on inactive tenant` into a globally allowed error, because it
has been seen failing on different test cases.
2023-02-21 20:20:13 +02:00
Joonas Koivunen
fe462de85b fix: log download failed error (#3661)
Fixes #3659
2023-02-21 19:31:53 +02:00
Vadim Kharitonov
c0de7f5cd8 Build pg_jsonschema and pg_graphql extensions (#3535)
## Describe your changes
Layer for building pg extensions written on Rust

It required forking:
* `cargo-pgx` (in order not to catch an ABI mismatch error (`cargo-pgx`
hardcoded ABI tcdi/pgx#1032)
* `pg_jsonschema` (to use forked `cargo-pgx` version)
* `pgx-contrib-spiext` (to use forked `cargo-pgx`)
* `pg_graphql` (to use forked `cargo-pgx` and `pgx-contrib-spiext`
version)

Before the patch:

```
postgres=# create extension pg_jsonschema;
2023-02-02 17:45:23.120 UTC [35] ERROR:  incompatible library "/usr/local/lib/pg_jsonschema.so": ABI mismatch
2023-02-02 17:45:23.120 UTC [35] DETAIL:  Server has ABI "Neon Postgres", library has "PostgreSQL".
2023-02-02 17:45:23.120 UTC [35] STATEMENT:  create extension pg_jsonschema;
ERROR:  incompatible library "/usr/local/lib/pg_jsonschema.so": ABI mismatch
DETAIL:  Server has ABI "Neon Postgres", library has "PostgreSQL".
```

After 

```
postgres=# create extension pg_jsonschema;
CREATE EXTENSION
postgres=# select json_matches_schema('{"type": "object"}', '{}');
 json_matches_schema
---------------------
 t
postgres=# create extension pg_graphql;
CREATE EXTENSION
postgres=# create table book(id int primary key, title text);
CREATE TABLE
postgres=# insert into book(id, title) values (1, 'book 1');
INSERT 0 1
postgres=# select graphql.resolve($$
query {
  bookCollection {
    edges {
      node {
        id
      }
    }
  }
}
$$);
                            resolve
----------------------------------------------------------------
 {"data": {"bookCollection": {"edges": [{"node": {"id": 1}}]}}}
(1 row)
```

## Issue ticket number and link
Closes #3429, #3096

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

`pg_jsonschema` extension will be available for our customers
2023-02-21 17:31:23 +01:00
Joonas Koivunen
b220ba6cd1 add random init delay for background tasks (#3655)
Fixes #3649.
2023-02-21 12:42:11 +01:00
Joonas Koivunen
7de373210d Warn when background tasks exceed their configured period (#3654)
Fixes #3648.
2023-02-21 13:02:19 +02:00
Vadim Kharitonov
5c5b03ce08 Compile xml2 extension 2023-02-21 10:34:45 +01:00
Joonas Koivunen
d7d3f451f0 Use tracing panic hook in all binaries (#3634)
Enables tracing panic hook in addition to pageserver introduced in
#3475:

- proxy
- safekeeper
- storage_broker

For proxy, a drop guard which resets the original std panic hook was
added on the first commit. Other binaries don't need it so they never
reset anything by `disarm`ing the drop guard.

The aim of the change is to make sure all panics a) have span
information b) are logged similar to other messages, not interleaved
with other messages as happens right now. Interleaving happens right now
because std prints panics to stderr, and other logging happens in
stdout. If this was handled gracefully by some utility, the log message
splitter would treat panics as belonging to the previous message because
it expects a message to start with a timestamp.

Cc: #3468
2023-02-21 10:03:55 +02:00
Keanu Ashwell
bc7d3c6476 docs: add dependency requirements for arch based systems (#3588)
This pull request adds information on building neon on Arch based system
such as Artix, Manjaro, Antergos, etc.
2023-02-20 22:51:54 +03:00
Sergey Melnikov
e3d75879c0 Use fqdn to access console management API on production (#3651)
console-release.local is legacy manual CNAME to
neon-internal-api.aws.neon.tech in r53
We could use neon-internal-api.aws.neon.tech name directly

This already was deployed to staging in
https://github.com/neondatabase/neon/pull/3642
2023-02-20 18:11:06 +01:00
Christian Schwarz
485b269674 eviction: tone down logs to debug!() level if there were no evictions
fixes #3647
2023-02-20 18:01:59 +01:00
Christian Schwarz
ee1eda9921 eviction: remove EvictionStats::not_considered_due_to_clock_skew
Rationale: see the block comment added in this patch.

fixes #3641
2023-02-20 18:01:59 +01:00
Christian Schwarz
e363911c85 timeline: propagate span to download_remote_layer (#3644)
fixes #3643
refs #3604
2023-02-20 17:18:13 +02:00
Sergey Melnikov
d5d690c044 Use fqdn for staging console management API (#3642)
`console-staging.local` is legacy manual CNAME to
`neon-internal-api.aws.neon.build` in r53
We could use `neon-internal-api.aws.neon.build` name directly
2023-02-20 16:05:21 +01:00
Arthur Petukhovsky
8f557477c6 Add new safekeeper to ap-southeast-1 prod (#3645) 2023-02-20 17:51:27 +03:00
Shany Pozin
af210c8b42 Allow running do_gc in non testing env (#3639)
## Describe your changes
Since the current default gc period is set to 1 hour, whenever there is
an immediate need to reduce PITR and run gc, the user has to wait 1 hour
for PITR change to take effect
By enabling this API the user can configure PITR and immediately call
the do_gc API to trigger gc
## Issue ticket number and link
#3590
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-20 13:23:13 +02:00
sharnoff
2153d2e00a Run compute_ctl in a cgroup in VMs (#3577) 2023-02-17 14:14:41 -08:00
Alexander Bayandin
564fa11244 Update Postgres extensions (#3615)
- Update postgis from 3.3.1 from 3.3.2
- Update plv8 from 3.1.4 to 3.1.5
- Update h3-pg from 4.0.1 to 4.1.2 (and underlying h3 from 4.0.1 to 4.1.0)
2023-02-17 18:18:23 +00:00
Christian Schwarz
8d28a24b26 staging: enable automatic layer eviction at 20m threshold + period (#3636)
What it says on the tin.

Part of #2476
2023-02-17 18:32:01 +02:00
Anastasia Lubennikova
53128d56d9 Fix make clean:
Use correct paths in neon-pg-ext-clean
2023-02-17 17:57:45 +02:00
Anastasia Lubennikova
40799d8ae7 Add debug messages to catch abnormal consumption metric values 2023-02-17 17:57:45 +02:00
Konstantin Knizhnik
b242b0ad67 Fix flaky tests (#3616)
## Describe your changes

test_on_demand_download is flaky because not waiting until created image
layer is transferred to S3.
test_tenants_with_remote_storage just leaves garbage at the end of
overwritten file.

Right solution for test_on_demand_download is to add some API call to
wait completion of synchronization with S3 (not just based on last
record LSN). But right now it is solved using sleep.

## Issue ticket number and link

#3209 

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-17 15:56:56 +02:00
Dmitry Ivanov
d90cd36bcc [proxy] Improve tracing spans here and there. 2023-02-17 15:32:14 +03:00
Dmitry Ivanov
956b6f17ca [proxy] Handle some unix signals.
On the surface, this doesn't add much, but there are some benefits:

* We can do graceful shutdowns and thus record more code coverage data.

* We now have a foundation for the more interesting behaviors, e.g. "stop
  accepting new connections after SIGTERM but keep serving the existing ones".

* We give the otel machinery a chance to flush trace events before
  finally shutting down.
2023-02-17 15:32:14 +03:00
Heikki Linnakangas
6f9af0aa8c [proxy] Enable OpenTelemetry tracing.
This commit sets up OpenTelemetry tracing and exporter, so that they
can be exported as OpenTelemetry traces as well.

All outgoing HTTP requests will be traced. A separate (child)
span is created for each outgoing HTTP request, and the tracing
context is also propagated to the server in the HTTP headers.

If tracing is enabled in the control plane and compute node too, you
can now get an end-to-end distributed trace of what happens when a new
connection is established, starting from the handshake with the
client, creating the 'start_compute' operation in the control plane,
starting the compute node, all the way to down to fetching the base
backup and the availability checks in compute_ctl.

Co-authored-by: Dmitry Ivanov <dima@neon.tech>
2023-02-17 15:32:14 +03:00
Joonas Koivunen
8e6b27bf7c fix: avoid busy loop on replacement failure (#3613)
Add an AtomicBool per RemoteLayer, use it to mark together with closed
semaphore that remotelayer is unusable until restart or ignore+load.

https://github.com/neondatabase/neon/issues/3533#issuecomment-1431481554
2023-02-17 14:15:29 +02:00
Joonas Koivunen
ae3eff1ad2 Tracing panic hook (#3475)
Fixes #3468.

This does change how the panics look, and most importantly, make sure
they are not interleaved with other messages. Adds a `GET /v1/panic`
endpoint for panic testing (useful for sentry dedup and this hook
testing).

The panics are now logged within a new error level span called `panic`
which separates it from other error level events. The panic info is
unpacked into span fields:
- thread=mgmt request worker
- location="pageserver/src/http/routes.rs:898:9"

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-17 13:56:00 +02:00
Joonas Koivunen
501702b27c fix: flaky test_compaction_downloads_on_demand_with_image_creation (#3629)
fix is to stop postgres before the final checkpoint to ensure no
inmemory layer gets created.

Fixes #3627.
2023-02-17 13:34:26 +02:00
Alexander Bayandin
526f8b76aa Bump werkzeug from 2.1.2 to 2.2.3 (#3631)
## Describe your changes

```
$ poetry add werkzeug@latest "moto[server]@latest"
Using version ^2.2.3 for werkzeug
Using version ^4.1.2 for moto

Updating dependencies
Resolving dependencies... (1.6s)

Writing lock file

Package operations: 0 installs, 2 updates, 1 removal

  • Removing pytz (2022.1)
  • Updating werkzeug (2.1.2 -> 2.2.3)
  • Updating moto (3.1.18 -> 4.1.2)
```

Resolves:
- https://github.com/neondatabase/neon/security/dependabot/14
- https://github.com/neondatabase/neon/security/dependabot/13

`@dependabot` failed to create a PR for some reason (I guess because it
also needed to handle `moto` dependency)

## Issue ticket number and link
N/A

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-17 08:29:52 +00:00
Sergey Melnikov
a1b062123b Do not deploy storage to old account (#3630)
It's gone
2023-02-16 20:28:53 +00:00
Dmitry Ivanov
a4d5c8085b Move hacks to a dedicated module. 2023-02-16 22:10:56 +03:00
Dmitry Ivanov
edffe0dd9d Extract password hack & cleartext hack 2023-02-16 22:10:56 +03:00
Heikki Linnakangas
d9c518b2cc Refactor use_cleartext_password_flow.
It's not a property of the credentials that we receive from the
client, so remove it from ClientCredentials. Instead, pass it as an
argument directly to 'authenticate' function, where it's actually
used. All the rest of the changes is just plumbing to pass it through
the call stack to 'authenticate'
2023-02-16 22:10:56 +03:00
Anastasia Lubennikova
0d3aefb274 Only use active timelines in synthetic_size calculation 2023-02-16 17:58:53 +02:00
Anastasia Lubennikova
6139e8e426 Revert "Add debug messages around sending cached metrics"
This reverts commit a839860c2e.
2023-02-16 17:46:15 +02:00
Anastasia Lubennikova
d9ba3c5f5e Revert "Add debug messages around timeline.get_current_logical_size"
This reverts commit a5ce2b5330.
2023-02-16 17:46:15 +02:00
Joonas Koivunen
0cf7fd0fb8 Compaction with on-demand download (#3598)
Repeatedly (twice) try to download the compaction targeted layers before actual
compaction. Adds tests for both L0 compaction downloading layers and image
creation downloading layers. Image creation support existed already.

Fixes #3591

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-16 15:36:13 +02:00
Kirill Bulatov
f0b41e7750 Propose less verbose way to build neon (#3624)
Closes https://github.com/neondatabase/neon/issues/3518 and might help
https://github.com/neondatabase/neon/issues/3611 and the future build
attempts.

Propose `-s` flag in the Readme when building via `make` command, to
help people to spot build errors easier.
2023-02-16 14:25:35 +02:00
Vadim Kharitonov
5082d84f5b Compile pgjwt extension 2023-02-16 10:54:22 +01:00
Anastasia Lubennikova
7991bd3b69 Fix periodic metric sending: don't reset timer on every iteration (#3617)
Previously timer was reset on every collect_metrics_iteration and sending of cached metrics was never triggered.

This is a follow-up for a69da4a7.
2023-02-16 10:56:42 +02:00
Heikki Linnakangas
ddbdcdddd7 Tenant size calculation: refactor, rewrite, and add SVG (#2817)
Refactor the tenant_size_model code. Segment now contains just the
minimum amount of information needed to calculate the size. Other
information that is useful for building up the segment tree, and for
display purposes, is now kept elsewhere. The code in 'main.rs' has a new
ScenarioBuilder struct for that.

Calculating which Segments are "needed" is now the responsibility of the
caller of tenant_size_mode, not part of the calculation itself. So it's
up to the caller to make all the decisions with retention periods for
each branch.

The output of the sizing calculation is now a Vec of SizeResults, rather
than a tree. It uses a tree representation internally, when doing the
calculation, but it's not exposed to the caller anymore.

Refactor the way the recursive calculation is performed.

Rewrite the code in size.rs that builds the Segment model. Get rid of
the intermediate representation with Update structs. Build the Segments
directly, with some local HashMaps and Vecs to track branch points to
help with that.

retention_period is now an input to gather_inputs(), rather than an
output.

Update pageserver http API: rename /size endpoint to /synthetic_size
with following parameters:
    - /synthetic_size?inputs_only to get debug info;
- /synthetic_size?retention_period=0 to override cutoff that is used to
calculate the size;
pass header -H "Accept: text/html" to get HTML output, otherwise JSON is
returned

Update python tests and openapi spec.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-02-16 10:53:46 +02:00
Shany Pozin
7b182e2605 Update settings.md with latest PITR and gc period values (#3618)
## Describe your changes
Updates PITR and GC_PERIOD default value doc
## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-16 10:33:04 +02:00
Dmitry Ivanov
1d9d7c02db [proxy] Don't forward empty options to compute nodes
Clients may specify endpoint/project name via `options=project=...`,
so we should not only remove `project=` from `options` but also
drop `options` entirely, because connection pools don't support it.

Discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1676464382670119
2023-02-15 22:05:03 +03:00
Anna Stepanyan
a974602f9f fix the logical size term definition (#3609)
a size of a *database* cannot be a sum of the sizes of *all databases*
indicating that a logical size is calculated for a branch

## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] i checked the suggested changes
- [x] this is not a core feature
- [x] this is just a docs update, does not require analytics
- [x] this PR does not require a public announcement
2023-02-15 15:11:06 +01:00
Anastasia Lubennikova
a839860c2e Add debug messages around sending cached metrics 2023-02-15 16:02:02 +02:00
Anastasia Lubennikova
a5ce2b5330 Add debug messages around timeline.get_current_logical_size 2023-02-15 16:02:02 +02:00
Dmitry Ivanov
3569c1bacd [proxy] Fix: don't cache user & dbname in node info cache
Upstream proxy erroneously stores user & dbname in compute node info
cache entries, thus causing "funny" connection problems if such an entry
is reused while connecting to e.g. a different DB on the same compute node.

This PR fixes the problem but doesn't eliminate the root cause just yet.
I'll revisit this code and make it more type-safe in the upcoming PR.
2023-02-14 17:54:01 +03:00
Vadim Kharitonov
86681b92aa Enable plls and plcoffee extensions 2023-02-14 13:33:27 +01:00
Sergey Melnikov
eb21d9969d Add pageserver-3.us-west-2.aws.neon.tech (#3603) 2023-02-14 12:56:03 +01:00
Joonas Koivunen
e6618f1cc0 Update current logical size gauge (#3592)
Alternative to #3586.

Introduces usage of current_logical_size.current_size as a boundary after which we start to update the metric gauge on ingested wal. Previously any incremented value (ingested wal) would had updated the gauge, but this would had left the metric at zero for timelines which never receive any wal even if size had been calculated. Now the gauge is updated right away as the calculation completes, not requiring any wal to be received.
2023-02-14 13:17:34 +02:00
Dmitry Ivanov
eaff14da5f [proxy] Restore INFO as the default tracing level
Also move tracing init to its own function.
2023-02-13 17:09:43 +03:00
Arthur Petukhovsky
f383b4d540 Enable TCP_NODELAY for wss connections 2023-02-10 21:40:28 +03:00
Dmitry Ivanov
694150ce40 [proxy] Respect the magic RUST_LOG env variable
Usage: `RUST_LOG=trace proxy ...`
2023-02-10 18:49:32 +03:00
Vadim Kharitonov
f4359b688c Backport cargo fmt diff from release branch into main 2023-02-10 14:20:55 +01:00
Vadim Kharitonov
948f047f0a Compile pgvector extension 2023-02-10 13:37:52 +01:00
Vadim Kharitonov
4175cfbdac Create folder for file_cache 2023-02-10 13:03:12 +01:00
Dmitry Ivanov
9657459d80 [proxy] Fix possible unsoundness in the websocket machinery (#3569)
This PR replaces the ill-advised `unsafe Sync` impl with a de-facto
standard way to solve the underlying problem.

TLDR:
- tokio::task::spawn requires future to be Send
- ∀t. (t : Sync) <=> (&t : Send)
- ∀t. (t : Send + !Sync) => (&t : !Send)
2023-02-10 12:45:38 +03:00
Christian Schwarz
a4256b3250 allow on-demand downloads in walreceiver connection handler
Without this patch, basebackup fails if we evict all layers before that.

This slipped in as part of

   commit 01b4b0c2f3
   Author: Christian Schwarz <christian@neon.tech>
   Date:   Fri Jan 13 17:02:22 2023 +0100

       Introduce RequestContext
2023-02-09 13:39:04 +01:00
Christian Schwarz
175a577ad4 automatic layer eviction
This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.

Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold

The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated periodically at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.

This feature has no explicit killswitch since it's off by default.
2023-02-09 13:33:55 +01:00
Joonas Koivunen
1fdf01e3bc fix: readable Debug for Layers (#3575)
#3536 added the custom Debug implementations but it using derived Debug
on Key lead to too verbose output. Instead of making `Key`'s `Debug`
unconditionally or conditionally do the `Display` variant (for table
space'd keys), opted to build a newtype to provide `Debug` for
`Range<Key>` via `Display` which seemed to work unconditionally.

Also orders Key to have: 1. comment, 2. derive, 3. `struct Key`.
2023-02-09 13:55:37 +02:00
Christian Schwarz
446a39e969 make LayerAccesStatFullDetails Copy
Method to_api_model renamed to as_api_model because of Clippy complaint:
https://rust-lang.github.io/rust-clippy/master/index.html#wrong_self_convention
2023-02-09 12:35:45 +01:00
Arthur Petukhovsky
7ed9eb4a56 Add script for safekeeper tenants cleanup (#3452)
This script can be used to remove tenant directories on safekeepers for
projects which do not longer exist (deleted in the console).

To run this script you need to upload it to safekeeper (i.e. with SSH),
and run it with python3. Ansible can be used to run this script on
multiple safekeepers.

Fixes https://github.com/neondatabase/cloud/issues/3356
2023-02-09 13:28:20 +02:00
Joonas Koivunen
f07d6433b6 fix: one leftover Arc::ptr_eq (#3573)
@knizhnik noticed that one instance of `Arc::<dyn
PersistentLayer>::ptr_eq` was missed in #3558.

Now all `ptr_eq` which remain are in comments.
2023-02-09 13:02:07 +02:00
Heikki Linnakangas
2040db98ef Add docs for synthetic size calculation (#3328)
---------
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-02-09 11:20:10 +02:00
dependabot[bot]
371493ae32 Bump cryptography from 38.0.3 to 39.0.1 (#3565) 2023-02-08 16:08:01 +00:00
Rory de Zoete
1b9e5e84aa Add new storage hosts for placement group test (#3561)
To test the placement group setup
2023-02-08 16:48:29 +01:00
Christian Schwarz
7ed93fff06 refactor: allow for eviction of layers in a batch
The auto-eviction PR (#3552) operates in two phaes:

  1. find candidate layers
  2. evict them.

For (2), a batch API like the one added in this commit is useful.

Note that this PR requires #3558 to be merged first.
Otherwise, the tests won't pass.
2023-02-08 14:40:47 +01:00
Joonas Koivunen
a6dffb6ef9 fix: stop using Arc::ptr_eq with dyn Trait (#3558)
This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's
`LayerMap` not to use `Arc::ptr_eq` which has been witnessed in
development of #3557 to yield wrong results. It gives wrong results
because it compares fat pointers, which are `(object, vtable)` tuples
for `dyn Trait` and there are no guarantees that the `vtable`s are
unique. As in there were multiple vtables for `RemoteLayer` which is why
the comparison failed in #3557.

This is a known issue in rust, clippy warns against it and rust std
might be moving to the solution which has been reproduced on this PR:
compare only object pointers by "casting out" the vtable pointer.
2023-02-08 12:25:25 +00:00
Sergey Melnikov
c5c14368e3 Fix deploy-prod.yml syntax (#3556) 2023-02-07 15:27:31 +01:00
Sergey Melnikov
1254dc7ee2 Fix production deploy: run as root to access docker (#3555) 2023-02-07 15:21:15 +01:00
Joonas Koivunen
fcb905f519 Use LayerMap::replace in eviction (#3544)
Follow-up to #3536, to actually use the new `Debug` in replacing the
layers, and use replacement with manual eviction endpoint.

Turns out the two paths share a lot of handling of `Replacement` but
didn't unify the two (need 3). There are also upcoming refactorings
from other PRs to this.
2023-02-07 11:08:55 +02:00
697 changed files with 112554 additions and 36980 deletions

View File

@@ -1,16 +1,8 @@
# The binaries are really slow, if you compile them in 'dev' mode with the defaults.
# Enable some optimizations even in 'dev' mode, to make tests faster. The basic
# optimizations enabled by "opt-level=1" don't affect debuggability too much.
#
# See https://www.reddit.com/r/rust/comments/gvrgca/this_is_a_neat_trick_for_getting_good_runtime/
#
[profile.dev.package."*"]
# Set the default for dependencies in Development mode.
opt-level = 3
[profile.dev]
# Turn on a small amount of optimization in Development mode.
opt-level = 1
[build]
# This is only present for local builds, as it will be overridden
# by the RUSTDOCFLAGS env var in CI.
rustdocflags = ["-Arustdoc::private_intra_doc_links"]
[alias]
build_testing = ["build", "--features", "testing"]
neon = ["run", "--bin", "neon_local"]

View File

@@ -4,7 +4,7 @@
hakari-package = "workspace_hack"
# Format for `workspace-hack = ...` lines in other Cargo.tomls. Requires cargo-hakari 0.9.8 or above.
dep-format-version = "3"
dep-format-version = "4"
# Setting workspace.resolver = "2" in the root Cargo.toml is HIGHLY recommended.
# Hakari works much better with the new feature resolver.
@@ -22,5 +22,11 @@ platforms = [
# "x86_64-pc-windows-msvc",
]
[final-excludes]
# vm_monitor benefits from the same Cargo.lock as the rest of our artifacts, but
# it is built primarly in separate repo neondatabase/autoscaling and thus is excluded
# from depending on workspace-hack because most of the dependencies are not used.
workspace-members = ["vm_monitor"]
# Write out exact versions rather than a semver range. (Defaults to false.)
# exact-versions = true

View File

@@ -14,10 +14,14 @@
!pgxn/
!proxy/
!safekeeper/
!s3_scrubber/
!storage_broker/
!trace/
!vendor/postgres-v14/
!vendor/postgres-v15/
!vendor/postgres-v16/
!workspace_hack/
!neon_local/
!scripts/ninstall.sh
!scripts/combine_control_files.py
!vm-cgconfig.conf

View File

@@ -17,8 +17,9 @@ assignees: ''
## Implementation ideas
## Tasks
- [ ]
```[tasklist]
### Tasks
```
## Other related tasks and Epics

View File

@@ -3,13 +3,14 @@
**NB: this PR must be merged only by 'Create a merge commit'!**
### Checklist when preparing for release
- [ ] Read or refresh [the release flow guide](https://github.com/neondatabase/cloud/wiki/Release:-general-flow)
- [ ] Read or refresh [the release flow guide](https://www.notion.so/neondatabase/Release-general-flow-61f2e39fd45d4d14a70c7749604bd70b)
- [ ] Ask in the [cloud Slack channel](https://neondb.slack.com/archives/C033A2WE6BZ) that you are going to rollout the release. Any blockers?
- [ ] Does this release contain any db migrations? Destructive ones? What is the rollback plan?
<!-- List everything that should be done **before** release, any issues / setting changes / etc -->
### Checklist after release
- [ ] Make sure instructions from PRs included in this release and labeled `manual_release_instructions` are executed (either by you or by people who wrote them).
- [ ] Based on the merged commits write release notes and open a PR into `website` repo ([example](https://github.com/neondatabase/website/pull/219/files))
- [ ] Check [#dev-production-stream](https://neondb.slack.com/archives/C03F5SM1N02) Slack channel
- [ ] Check [stuck projects page](https://console.neon.tech/admin/projects?sort=last_active&order=desc&stuck=true)

12
.github/actionlint.yml vendored Normal file
View File

@@ -0,0 +1,12 @@
self-hosted-runner:
labels:
- arm64
- dev
- gen3
- large
- small
- us-east-2
config-variables:
- REMOTE_STORAGE_AZURE_CONTAINER
- REMOTE_STORAGE_AZURE_REGION
- SLACK_UPCOMING_RELEASE_CHANNEL_ID

View File

@@ -0,0 +1,245 @@
name: 'Create Allure report'
description: 'Generate Allure report from uploaded by actions/allure-report-store tests results'
inputs:
store-test-results-into-db:
description: 'Whether to store test results into the database. TEST_RESULT_CONNSTR/TEST_RESULT_CONNSTR_NEW should be set'
type: boolean
required: false
default: false
outputs:
base-url:
description: 'Base URL for Allure report'
value: ${{ steps.generate-report.outputs.base-url }}
base-s3-url:
description: 'Base S3 URL for Allure report'
value: ${{ steps.generate-report.outputs.base-s3-url }}
report-url:
description: 'Allure report URL'
value: ${{ steps.generate-report.outputs.report-url }}
report-json-url:
description: 'Allure report JSON URL'
value: ${{ steps.generate-report.outputs.report-json-url }}
runs:
using: "composite"
steps:
# We're using some of env variables quite offen, so let's set them once.
#
# It would be nice to have them set in common runs.env[0] section, but it doesn't work[1]
#
# - [0] https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runsenv
# - [1] https://github.com/neondatabase/neon/pull/3907#discussion_r1154703456
#
- name: Set variables
shell: bash -euxo pipefail {0}
run: |
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
BRANCH_OR_PR=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")
fi
LOCK_FILE=reports/${BRANCH_OR_PR}/lock.txt
WORKDIR=/tmp/${BRANCH_OR_PR}-$(date +%s)
mkdir -p ${WORKDIR}
echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV
echo "LOCK_FILE=${LOCK_FILE}" >> $GITHUB_ENV
echo "WORKDIR=${WORKDIR}" >> $GITHUB_ENV
echo "BUCKET=${BUCKET}" >> $GITHUB_ENV
env:
BUCKET: neon-github-public-dev
# TODO: We can replace with a special docker image with Java and Allure pre-installed
- uses: actions/setup-java@v3
with:
distribution: 'temurin'
java-version: '17'
- name: Install Allure
shell: bash -euxo pipefail {0}
run: |
if ! which allure; then
ALLURE_ZIP=allure-${ALLURE_VERSION}.zip
wget -q https://github.com/allure-framework/allure2/releases/download/${ALLURE_VERSION}/${ALLURE_ZIP}
echo "${ALLURE_ZIP_SHA256} ${ALLURE_ZIP}" | sha256sum --check
unzip -q ${ALLURE_ZIP}
echo "$(pwd)/allure-${ALLURE_VERSION}/bin" >> $GITHUB_PATH
rm -f ${ALLURE_ZIP}
fi
env:
ALLURE_VERSION: 2.24.0
ALLURE_ZIP_SHA256: 60b1d6ce65d9ef24b23cf9c2c19fd736a123487c38e54759f1ed1a7a77353c90
# Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this
- name: Acquire lock
shell: bash -euxo pipefail {0}
run: |
LOCK_TIMEOUT=300 # seconds
LOCK_CONTENT="${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
echo ${LOCK_CONTENT} > ${WORKDIR}/lock.txt
# Do it up to 5 times to avoid race condition
for _ in $(seq 1 5); do
for i in $(seq 1 ${LOCK_TIMEOUT}); do
LOCK_ACQUIRED=$(aws s3api head-object --bucket neon-github-public-dev --key ${LOCK_FILE} | jq --raw-output '.LastModified' || true)
# `date --date="..."` is supported only by gnu date (i.e. it doesn't work on BSD/macOS)
if [ -z "${LOCK_ACQUIRED}" ] || [ "$(( $(date +%s) - $(date --date="${LOCK_ACQUIRED}" +%s) ))" -gt "${LOCK_TIMEOUT}" ]; then
break
fi
sleep 1
done
aws s3 mv --only-show-errors ${WORKDIR}/lock.txt "s3://${BUCKET}/${LOCK_FILE}"
# Double-check that exactly THIS run has acquired the lock
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt
if [ "$(cat lock.txt)" = "${LOCK_CONTENT}" ]; then
break
fi
done
- name: Generate and publish final Allure report
id: generate-report
shell: bash -euxo pipefail {0}
run: |
REPORT_PREFIX=reports/${BRANCH_OR_PR}
RAW_PREFIX=reports-raw/${BRANCH_OR_PR}/${GITHUB_RUN_ID}
BASE_URL=https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}
BASE_S3_URL=s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}
REPORT_URL=${BASE_URL}/index.html
REPORT_JSON_URL=${BASE_URL}/data/suites.json
# Get previously uploaded data for this run
ZSTD_NBTHREADS=0
S3_FILEPATHS=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${RAW_PREFIX}/ | jq --raw-output '.Contents[]?.Key')
if [ -z "$S3_FILEPATHS" ]; then
# There's no previously uploaded data for this $GITHUB_RUN_ID
exit 0
fi
time aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${RAW_PREFIX}/" "${WORKDIR}/"
for archive in $(find ${WORKDIR} -name "*.tar.zst"); do
mkdir -p ${archive%.tar.zst}
time tar -xf ${archive} -C ${archive%.tar.zst}
rm -f ${archive}
done
# Get history trend
time aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${REPORT_PREFIX}/latest/history" "${WORKDIR}/latest/history" || true
# Generate report
time allure generate --clean --output ${WORKDIR}/report ${WORKDIR}/*
# Replace a logo link with a redirect to the latest version of the report
sed -i 's|<a href="." class=|<a href="https://'${BUCKET}'.s3.amazonaws.com/'${REPORT_PREFIX}'/latest/index.html?nocache='"'+Date.now()+'"'" class=|g' ${WORKDIR}/report/app.js
# Upload a history and the final report (in this particular order to not to have duplicated history in 2 places)
time aws s3 mv --recursive --only-show-errors "${WORKDIR}/report/history" "s3://${BUCKET}/${REPORT_PREFIX}/latest/history"
# Use aws s3 cp (instead of aws s3 sync) to keep files from previous runs to make old URLs work,
# and to keep files on the host to upload them to the database
time aws s3 cp --recursive --only-show-errors "${WORKDIR}/report" "s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}"
# Generate redirect
cat <<EOF > ${WORKDIR}/index.html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Redirecting to ${REPORT_URL}</title>
<meta http-equiv="refresh" content="0; URL=${REPORT_URL}">
EOF
time aws s3 cp --only-show-errors ${WORKDIR}/index.html "s3://${BUCKET}/${REPORT_PREFIX}/latest/index.html"
echo "base-url=${BASE_URL}" >> $GITHUB_OUTPUT
echo "base-s3-url=${BASE_S3_URL}" >> $GITHUB_OUTPUT
echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT
echo "report-json-url=${REPORT_JSON_URL}" >> $GITHUB_OUTPUT
echo "[Allure Report](${REPORT_URL})" >> ${GITHUB_STEP_SUMMARY}
- name: Release lock
if: always()
shell: bash -euxo pipefail {0}
run: |
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt || exit 0
if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}" ]; then
aws s3 rm "s3://${BUCKET}/${LOCK_FILE}"
fi
- name: Store Allure test stat in the DB
if: ${{ !cancelled() && inputs.store-test-results-into-db == 'true' }}
shell: bash -euxo pipefail {0}
env:
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
REPORT_JSON_URL: ${{ steps.generate-report.outputs.report-json-url }}
run: |
export DATABASE_URL=${REGRESS_TEST_RESULT_CONNSTR}
./scripts/pysync
poetry run python3 scripts/ingest_regress_test_result.py \
--revision ${COMMIT_SHA} \
--reference ${GITHUB_REF} \
--build-type unified \
--ingest ${WORKDIR}/report/data/suites.json
- name: Store Allure test stat in the DB (new)
if: ${{ !cancelled() && inputs.store-test-results-into-db == 'true' }}
shell: bash -euxo pipefail {0}
env:
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
BASE_S3_URL: ${{ steps.generate-report.outputs.base-s3-url }}
run: |
if [ ! -d "${WORKDIR}/report/data/test-cases" ]; then
exit 0
fi
export DATABASE_URL=${REGRESS_TEST_RESULT_CONNSTR_NEW}
./scripts/pysync
poetry run python3 scripts/ingest_regress_test_result-new-format.py \
--reference ${GITHUB_REF} \
--revision ${COMMIT_SHA} \
--run-id ${GITHUB_RUN_ID} \
--run-attempt ${GITHUB_RUN_ATTEMPT} \
--test-cases-dir ${WORKDIR}/report/data/test-cases
- name: Cleanup
if: always()
shell: bash -euxo pipefail {0}
run: |
if [ -d "${WORKDIR}" ]; then
rm -rf ${WORKDIR}
fi
- uses: actions/github-script@v6
if: always()
env:
REPORT_URL: ${{ steps.generate-report.outputs.report-url }}
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
script: |
const { REPORT_URL, COMMIT_SHA } = process.env
await github.rest.repos.createCommitStatus({
owner: context.repo.owner,
repo: context.repo.repo,
sha: `${COMMIT_SHA}`,
state: 'success',
target_url: `${REPORT_URL}`,
context: 'Allure report',
})

View File

@@ -0,0 +1,72 @@
name: 'Store Allure results'
description: 'Upload test results to be used by actions/allure-report-generate'
inputs:
report-dir:
description: 'directory with test results generated by tests'
required: true
unique-key:
description: 'string to distinguish different results in the same run'
required: true
runs:
using: "composite"
steps:
- name: Set variables
shell: bash -euxo pipefail {0}
run: |
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
BRANCH_OR_PR=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")
fi
echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV
echo "REPORT_DIR=${REPORT_DIR}" >> $GITHUB_ENV
env:
REPORT_DIR: ${{ inputs.report-dir }}
- name: Upload test results
shell: bash -euxo pipefail {0}
run: |
REPORT_PREFIX=reports/${BRANCH_OR_PR}
RAW_PREFIX=reports-raw/${BRANCH_OR_PR}/${GITHUB_RUN_ID}
# Add metadata
cat <<EOF > ${REPORT_DIR}/executor.json
{
"name": "GitHub Actions",
"type": "github",
"url": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/latest/index.html",
"buildOrder": ${GITHUB_RUN_ID},
"buildName": "GitHub Actions Run #${GITHUB_RUN_NUMBER}/${GITHUB_RUN_ATTEMPT}",
"buildUrl": "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT}",
"reportUrl": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html",
"reportName": "Allure Report"
}
EOF
cat <<EOF > ${REPORT_DIR}/environment.properties
COMMIT_SHA=${COMMIT_SHA}
EOF
ARCHIVE="${UNIQUE_KEY}-${GITHUB_RUN_ATTEMPT}-$(date +%s).tar.zst"
ZSTD_NBTHREADS=0
time tar -C ${REPORT_DIR} -cf ${ARCHIVE} --zstd .
time aws s3 mv --only-show-errors ${ARCHIVE} "s3://${BUCKET}/${RAW_PREFIX}/${ARCHIVE}"
env:
UNIQUE_KEY: ${{ inputs.unique-key }}
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
BUCKET: neon-github-public-dev
- name: Cleanup
if: always()
shell: bash -euxo pipefail {0}
run: |
rm -rf ${REPORT_DIR}

View File

@@ -1,232 +0,0 @@
name: 'Create Allure report'
description: 'Create and publish Allure report'
inputs:
action:
desctiption: 'generate or store'
required: true
build_type:
description: '`build_type` from run-python-test-set action'
required: true
test_selection:
description: '`test_selector` from run-python-test-set action'
required: false
outputs:
report-url:
description: 'Allure report URL'
value: ${{ steps.generate-report.outputs.report-url }}
runs:
using: "composite"
steps:
- name: Validate input parameters
shell: bash -euxo pipefail {0}
run: |
if [ "${{ inputs.action }}" != "store" ] && [ "${{ inputs.action }}" != "generate" ]; then
echo 2>&1 "Unknown inputs.action type '${{ inputs.action }}'; allowed 'generate' or 'store' only"
exit 1
fi
if [ -z "${{ inputs.test_selection }}" ] && [ "${{ inputs.action }}" == "store" ]; then
echo 2>&1 "inputs.test_selection must be set for 'store' action"
exit 2
fi
- name: Calculate variables
id: calculate-vars
shell: bash -euxo pipefail {0}
run: |
# TODO: for manually triggered workflows (via workflow_dispatch) we need to have a separate key
pr_number=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${pr_number}" != "null" ]; then
key=pr-${pr_number}
elif [ "${GITHUB_REF_NAME}" = "main" ]; then
# Shortcut for a special branch
key=main
elif [ "${GITHUB_REF_NAME}" = "release" ]; then
# Shortcut for a special branch
key=release
else
key=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")
fi
echo "KEY=${key}" >> $GITHUB_OUTPUT
# Sanitize test selection to remove `/` and any other special characters
# Use printf instead of echo to avoid having `\n` at the end of the string
test_selection=$(printf "${{ inputs.test_selection }}" | tr -c "[:alnum:]._-" "-" )
echo "TEST_SELECTION=${test_selection}" >> $GITHUB_OUTPUT
- uses: actions/setup-java@v3
if: ${{ inputs.action == 'generate' }}
with:
distribution: 'temurin'
java-version: '17'
- name: Install Allure
if: ${{ inputs.action == 'generate' }}
shell: bash -euxo pipefail {0}
run: |
if ! which allure; then
ALLURE_ZIP=allure-${ALLURE_VERSION}.zip
wget -q https://github.com/allure-framework/allure2/releases/download/${ALLURE_VERSION}/${ALLURE_ZIP}
echo "${ALLURE_ZIP_MD5} ${ALLURE_ZIP}" | md5sum -c
unzip -q ${ALLURE_ZIP}
echo "$(pwd)/allure-${ALLURE_VERSION}/bin" >> $GITHUB_PATH
rm -f ${ALLURE_ZIP}
fi
env:
ALLURE_VERSION: 2.19.0
ALLURE_ZIP_MD5: ced21401a1a8b9dfb68cee9e4c210464
- name: Upload Allure results
if: ${{ inputs.action == 'store' }}
env:
REPORT_PREFIX: reports/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
RAW_PREFIX: reports-raw/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
TEST_OUTPUT: /tmp/test_output
BUCKET: neon-github-public-dev
TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}
shell: bash -euxo pipefail {0}
run: |
# Add metadata
cat <<EOF > $TEST_OUTPUT/allure/results/executor.json
{
"name": "GitHub Actions",
"type": "github",
"url": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/latest/index.html",
"buildOrder": ${GITHUB_RUN_ID},
"buildName": "GitHub Actions Run #${{ github.run_number }}/${GITHUB_RUN_ATTEMPT}",
"buildUrl": "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT}",
"reportUrl": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html",
"reportName": "Allure Report"
}
EOF
cat <<EOF > $TEST_OUTPUT/allure/results/environment.properties
TEST_SELECTION=${{ inputs.test_selection }}
BUILD_TYPE=${{ inputs.build_type }}
EOF
ARCHIVE="${GITHUB_RUN_ID}-${TEST_SELECTION}-${GITHUB_RUN_ATTEMPT}-$(date +%s).tar.zst"
ZSTD_NBTHREADS=0
tar -C ${TEST_OUTPUT}/allure/results -cf ${ARCHIVE} --zstd .
aws s3 mv --only-show-errors ${ARCHIVE} "s3://${BUCKET}/${RAW_PREFIX}/${ARCHIVE}"
# Potentially we could have several running build for the same key (for example for the main branch), so we use improvised lock for this
- name: Acquire Allure lock
if: ${{ inputs.action == 'generate' }}
shell: bash -euxo pipefail {0}
env:
LOCK_FILE: reports/${{ steps.calculate-vars.outputs.KEY }}/lock.txt
BUCKET: neon-github-public-dev
TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}
run: |
LOCK_TIMEOUT=300 # seconds
for _ in $(seq 1 5); do
for i in $(seq 1 ${LOCK_TIMEOUT}); do
LOCK_ADDED=$(aws s3api head-object --bucket neon-github-public-dev --key ${LOCK_FILE} | jq --raw-output '.LastModified' || true)
# `date --date="..."` is supported only by gnu date (i.e. it doesn't work on BSD/macOS)
if [ -z "${LOCK_ADDED}" ] || [ "$(( $(date +%s) - $(date --date="${LOCK_ADDED}" +%s) ))" -gt "${LOCK_TIMEOUT}" ]; then
break
fi
sleep 1
done
echo "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" > lock.txt
aws s3 mv --only-show-errors lock.txt "s3://${BUCKET}/${LOCK_FILE}"
# A double-check that exactly WE have acquired the lock
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt
if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" ]; then
break
fi
done
- name: Generate and publish final Allure report
if: ${{ inputs.action == 'generate' }}
id: generate-report
env:
REPORT_PREFIX: reports/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
RAW_PREFIX: reports-raw/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
TEST_OUTPUT: /tmp/test_output
BUCKET: neon-github-public-dev
shell: bash -euxo pipefail {0}
run: |
# Get previously uploaded data for this run
ZSTD_NBTHREADS=0
s3_filepaths=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${RAW_PREFIX}/${GITHUB_RUN_ID}- | jq --raw-output '.Contents[].Key')
if [ -z "$s3_filepaths" ]; then
# There's no previously uploaded data for this run
exit 0
fi
for s3_filepath in ${s3_filepaths}; do
aws s3 cp --only-show-errors "s3://${BUCKET}/${s3_filepath}" "${TEST_OUTPUT}/allure/"
archive=${TEST_OUTPUT}/allure/$(basename $s3_filepath)
mkdir -p ${archive%.tar.zst}
tar -xf ${archive} -C ${archive%.tar.zst}
rm -f ${archive}
done
# Get history trend
aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${REPORT_PREFIX}/latest/history" "${TEST_OUTPUT}/allure/latest/history" || true
# Generate report
allure generate --clean --output $TEST_OUTPUT/allure/report $TEST_OUTPUT/allure/*
# Replace a logo link with a redirect to the latest version of the report
sed -i 's|<a href="." class=|<a href="https://'${BUCKET}'.s3.amazonaws.com/'${REPORT_PREFIX}'/latest/index.html" class=|g' $TEST_OUTPUT/allure/report/app.js
# Upload a history and the final report (in this particular order to not to have duplicated history in 2 places)
aws s3 mv --recursive --only-show-errors "${TEST_OUTPUT}/allure/report/history" "s3://${BUCKET}/${REPORT_PREFIX}/latest/history"
aws s3 mv --recursive --only-show-errors "${TEST_OUTPUT}/allure/report" "s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}"
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html
# Generate redirect
cat <<EOF > ./index.html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Redirecting to ${REPORT_URL}</title>
<meta http-equiv="refresh" content="0; URL=${REPORT_URL}">
EOF
aws s3 cp --only-show-errors ./index.html "s3://${BUCKET}/${REPORT_PREFIX}/latest/index.html"
echo "[Allure Report](${REPORT_URL})" >> ${GITHUB_STEP_SUMMARY}
echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT
- name: Release Allure lock
if: ${{ inputs.action == 'generate' && always() }}
shell: bash -euxo pipefail {0}
env:
LOCK_FILE: reports/${{ steps.calculate-vars.outputs.KEY }}/lock.txt
BUCKET: neon-github-public-dev
TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}
run: |
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt || exit 0
if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" ]; then
aws s3 rm "s3://${BUCKET}/${LOCK_FILE}"
fi
- uses: actions/github-script@v6
if: ${{ inputs.action == 'generate' && always() }}
env:
REPORT_URL: ${{ steps.generate-report.outputs.report-url }}
BUILD_TYPE: ${{ inputs.build_type }}
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
script: |
const { REPORT_URL, BUILD_TYPE, SHA } = process.env
await github.rest.repos.createCommitStatus({
owner: context.repo.owner,
repo: context.repo.repo,
sha: `${SHA}`,
state: 'success',
target_url: `${REPORT_URL}`,
context: `Allure report / ${BUILD_TYPE}`,
})

View File

@@ -31,13 +31,13 @@ runs:
BUCKET=neon-github-public-dev
FILENAME=$(basename $ARCHIVE)
S3_KEY=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${PREFIX%$GITHUB_RUN_ATTEMPT} | jq -r '.Contents[].Key' | grep ${FILENAME} | sort --version-sort | tail -1 || true)
S3_KEY=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${PREFIX%$GITHUB_RUN_ATTEMPT} | jq -r '.Contents[]?.Key' | grep ${FILENAME} | sort --version-sort | tail -1 || true)
if [ -z "${S3_KEY}" ]; then
if [ "${SKIP_IF_DOES_NOT_EXIST}" = "true" ]; then
echo 'SKIPPED=true' >> $GITHUB_OUTPUT
exit 0
else
echo 2>&1 "Neither s3://${BUCKET}/${PREFIX}/${FILENAME} nor its version from previous attempts exist"
echo >&2 "Neither s3://${BUCKET}/${PREFIX}/${FILENAME} nor its version from previous attempts exist"
exit 1
fi
fi

View File

@@ -58,7 +58,7 @@ runs:
done
if [ -z "${branch_id}" ] || [ "${branch_id}" == "null" ]; then
echo 2>&1 "Failed to create branch after 10 attempts, the latest response was: ${branch}"
echo >&2 "Failed to create branch after 10 attempts, the latest response was: ${branch}"
exit 1
fi
@@ -122,7 +122,7 @@ runs:
done
if [ -z "${password}" ] || [ "${password}" == "null" ]; then
echo 2>&1 "Failed to reset password after 10 attempts, the latest response was: ${reset_password}"
echo >&2 "Failed to reset password after 10 attempts, the latest response was: ${reset_password}"
exit 1
fi

View File

@@ -48,7 +48,7 @@ runs:
done
if [ -z "${branch_id}" ] || [ "${branch_id}" == "null" ]; then
echo 2>&1 "Failed to delete branch after 10 attempts, the latest response was: ${deleted_branch}"
echo >&2 "Failed to delete branch after 10 attempts, the latest response was: ${deleted_branch}"
exit 1
fi
env:

View File

@@ -14,6 +14,12 @@ inputs:
api_host:
desctiption: 'Neon API host'
default: console.stage.neon.tech
provisioner:
desctiption: 'k8s-pod or k8s-neonvm'
default: 'k8s-pod'
compute_units:
desctiption: '[Min, Max] compute units; Min and Max are used for k8s-neonvm with autoscaling, for k8s-pod values Min and Max should be equal'
default: '[1, 1]'
outputs:
dsn:
@@ -31,6 +37,10 @@ runs:
# A shell without `set -x` to not to expose password/dsn in logs
shell: bash -euo pipefail {0}
run: |
if [ "${PROVISIONER}" == "k8s-pod" ] && [ "${MIN_CU}" != "${MAX_CU}" ]; then
echo >&2 "For k8s-pod provisioner MIN_CU should be equal to MAX_CU"
fi
project=$(curl \
"https://${API_HOST}/api/v2/projects" \
--fail \
@@ -42,6 +52,9 @@ runs:
\"name\": \"Created by actions/neon-project-create; GITHUB_RUN_ID=${GITHUB_RUN_ID}\",
\"pg_version\": ${POSTGRES_VERSION},
\"region_id\": \"${REGION_ID}\",
\"provisioner\": \"${PROVISIONER}\",
\"autoscaling_limit_min_cu\": ${MIN_CU},
\"autoscaling_limit_max_cu\": ${MAX_CU},
\"settings\": { }
}
}")
@@ -62,3 +75,6 @@ runs:
API_KEY: ${{ inputs.api_key }}
REGION_ID: ${{ inputs.region_id }}
POSTGRES_VERSION: ${{ inputs.postgres_version }}
PROVISIONER: ${{ inputs.provisioner }}
MIN_CU: ${{ fromJSON(inputs.compute_units)[0] }}
MAX_CU: ${{ fromJSON(inputs.compute_units)[1] }}

View File

@@ -36,14 +36,14 @@ inputs:
description: 'Region name for real s3 tests'
required: false
default: ''
real_s3_access_key_id:
description: 'Access key id'
rerun_flaky:
description: 'Whether to rerun flaky tests'
required: false
default: ''
real_s3_secret_access_key:
description: 'Secret access key'
default: 'false'
pg_version:
description: 'Postgres version to use for tests'
required: false
default: ''
default: 'v14'
runs:
using: "composite"
@@ -63,13 +63,16 @@ runs:
path: /tmp/neon-previous
prefix: latest
- name: Download compatibility snapshot for Postgres 14
- name: Download compatibility snapshot
if: inputs.build_type != 'remote'
uses: ./.github/actions/download
with:
name: compatibility-snapshot-${{ inputs.build_type }}-pg14
path: /tmp/compatibility_snapshot_pg14
name: compatibility-snapshot-${{ inputs.build_type }}-pg${{ inputs.pg_version }}
path: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}
prefix: latest
# The lack of compatibility snapshot (for example, for the new Postgres version)
# shouldn't fail the whole job. Only relevant test should fail.
skip-if-does-not-exist: true
- name: Checkout
if: inputs.needs_postgres_source == 'true'
@@ -96,18 +99,18 @@ runs:
COMPATIBILITY_POSTGRES_DISTRIB_DIR: /tmp/neon-previous/pg_install
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: ${{ inputs.build_type }}
AWS_ACCESS_KEY_ID: ${{ inputs.real_s3_access_key_id }}
AWS_SECRET_ACCESS_KEY: ${{ inputs.real_s3_secret_access_key }}
COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg14
COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}
ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'backward compatibility breakage')
ALLOW_FORWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'forward compatibility breakage')
RERUN_FLAKY: ${{ inputs.rerun_flaky }}
PG_VERSION: ${{ inputs.pg_version }}
shell: bash -euxo pipefail {0}
run: |
# PLATFORM will be embedded in the perf test report
# and it is needed to distinguish different environments
export PLATFORM=${PLATFORM:-github-actions-selfhosted}
export POSTGRES_DISTRIB_DIR=${POSTGRES_DISTRIB_DIR:-/tmp/neon/pg_install}
export DEFAULT_PG_VERSION=${DEFAULT_PG_VERSION:-14}
export DEFAULT_PG_VERSION=${PG_VERSION#v}
if [ "${BUILD_TYPE}" = "remote" ]; then
export REMOTE_ENV=1
@@ -143,6 +146,25 @@ runs:
EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
fi
if [ "${RERUN_FLAKY}" == "true" ]; then
mkdir -p $TEST_OUTPUT
poetry run ./scripts/flaky_tests.py "${TEST_RESULT_CONNSTR}" \
--days 7 \
--output "$TEST_OUTPUT/flaky.json" \
--pg-version "${DEFAULT_PG_VERSION}" \
--build-type "${BUILD_TYPE}"
EXTRA_PARAMS="--flaky-tests-json $TEST_OUTPUT/flaky.json $EXTRA_PARAMS"
fi
# We use pytest-split plugin to run benchmarks in parallel on different CI runners
if [ "${TEST_SELECTION}" = "test_runner/performance" ] && [ "${{ inputs.build_type }}" != "remote" ]; then
mkdir -p $TEST_OUTPUT
poetry run ./scripts/benchmark_durations.py "${TEST_RESULT_CONNSTR}" --days 10 --output "$TEST_OUTPUT/benchmark_durations.json"
EXTRA_PARAMS="--durations-path $TEST_OUTPUT/benchmark_durations.json $EXTRA_PARAMS"
fi
if [[ "${{ inputs.build_type }}" == "debug" ]]; then
cov_prefix=(scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage run)
elif [[ "${{ inputs.build_type }}" == "release" ]]; then
@@ -180,19 +202,18 @@ runs:
scripts/generate_and_push_perf_report.sh
fi
- name: Upload compatibility snapshot for Postgres 14
- name: Upload compatibility snapshot
if: github.ref_name == 'release'
uses: ./.github/actions/upload
with:
name: compatibility-snapshot-${{ inputs.build_type }}-pg14-${{ github.run_id }}
# The path includes a test name (test_create_snapshot) and directory that the test creates (compatibility_snapshot_pg14), keep the path in sync with the test
path: /tmp/test_output/test_create_snapshot/compatibility_snapshot_pg14/
name: compatibility-snapshot-${{ inputs.build_type }}-pg${{ inputs.pg_version }}-${{ github.run_id }}
# Directory is created by test_compatibility.py::test_create_snapshot, keep the path in sync with the test
path: /tmp/test_output/compatibility_snapshot_pg${{ inputs.pg_version }}/
prefix: latest
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
- name: Upload test results
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-store
with:
action: store
build_type: ${{ inputs.build_type }}
test_selection: ${{ inputs.test_selection }}
report-dir: /tmp/test_output/allure/results
unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}

View File

@@ -23,7 +23,7 @@ runs:
mkdir -p $(dirname $ARCHIVE)
if [ -f ${ARCHIVE} ]; then
echo 2>&1 "File ${ARCHIVE} already exist. Something went wrong before"
echo >&2 "File ${ARCHIVE} already exist. Something went wrong before"
exit 1
fi
@@ -33,10 +33,10 @@ runs:
elif [ -f ${SOURCE} ]; then
time tar -cf ${ARCHIVE} --zstd ${SOURCE}
elif ! ls ${SOURCE} > /dev/null 2>&1; then
echo 2>&1 "${SOURCE} does not exist"
echo >&2 "${SOURCE} does not exist"
exit 2
else
echo 2>&1 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
exit 3
fi

View File

@@ -1,5 +0,0 @@
neon_install.tar.gz
.neon_current_version
collections/*
!collections/.keep

View File

@@ -1,12 +0,0 @@
[defaults]
localhost_warning = False
host_key_checking = False
timeout = 30
[ssh_connection]
ssh_args = -F ./ansible.ssh.cfg
# teleport doesn't support sftp yet https://github.com/gravitational/teleport/issues/7127
# and scp neither worked for me
transfer_method = piped
pipelining = True

View File

@@ -1,15 +0,0 @@
# Remove this once https://github.com/gravitational/teleport/issues/10918 is fixed
# (use pre 8.5 option name to cope with old ssh in CI)
PubkeyAcceptedKeyTypes +ssh-rsa-cert-v01@openssh.com
Host tele.zenith.tech
User admin
Port 3023
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
Host * !tele.zenith.tech
User admin
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
ProxyJump tele.zenith.tech

View File

@@ -1,193 +0,0 @@
- name: Upload Neon binaries
hosts: storage
gather_facts: False
remote_user: "{{ remote_user }}"
tasks:
- name: get latest version of Neon binaries
register: current_version_file
set_fact:
current_version: "{{ lookup('file', '.neon_current_version') | trim }}"
tags:
- pageserver
- safekeeper
- name: inform about versions
debug:
msg: "Version to deploy - {{ current_version }}"
tags:
- pageserver
- safekeeper
- name: upload and extract Neon binaries to /usr/local
ansible.builtin.unarchive:
owner: root
group: root
src: neon_install.tar.gz
dest: /usr/local
become: true
tags:
- pageserver
- safekeeper
- binaries
- putbinaries
- name: Deploy pageserver
hosts: pageservers
gather_facts: False
remote_user: "{{ remote_user }}"
tasks:
- name: upload init script
when: console_mgmt_base_url is defined
ansible.builtin.template:
src: scripts/init_pageserver.sh
dest: /tmp/init_pageserver.sh
owner: root
group: root
mode: '0755'
become: true
tags:
- pageserver
- name: init pageserver
shell:
cmd: /tmp/init_pageserver.sh
args:
creates: "/storage/pageserver/data/tenants"
environment:
NEON_REPO_DIR: "/storage/pageserver/data"
LD_LIBRARY_PATH: "/usr/local/v14/lib"
become: true
tags:
- pageserver
- name: read the existing remote pageserver config
ansible.builtin.slurp:
src: /storage/pageserver/data/pageserver.toml
register: _remote_ps_config
tags:
- pageserver
- name: parse the existing pageserver configuration
ansible.builtin.set_fact:
_existing_ps_config: "{{ _remote_ps_config['content'] | b64decode | sivel.toiletwater.from_toml }}"
tags:
- pageserver
- name: construct the final pageserver configuration dict
ansible.builtin.set_fact:
pageserver_config: "{{ pageserver_config_stub | combine({'id': _existing_ps_config.id }) }}"
tags:
- pageserver
- name: template the pageserver config
template:
src: templates/pageserver.toml.j2
dest: /storage/pageserver/data/pageserver.toml
become: true
tags:
- pageserver
- name: upload systemd service definition
ansible.builtin.template:
src: systemd/pageserver.service
dest: /etc/systemd/system/pageserver.service
owner: root
group: root
mode: '0644'
become: true
tags:
- pageserver
- name: start systemd service
ansible.builtin.systemd:
daemon_reload: yes
name: pageserver
enabled: yes
state: restarted
become: true
tags:
- pageserver
- name: post version to console
when: console_mgmt_base_url is defined
shell:
cmd: |
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/pageservers
tags:
- pageserver
- name: Deploy safekeeper
hosts: safekeepers
gather_facts: False
remote_user: "{{ remote_user }}"
tasks:
- name: upload init script
when: console_mgmt_base_url is defined
ansible.builtin.template:
src: scripts/init_safekeeper.sh
dest: /tmp/init_safekeeper.sh
owner: root
group: root
mode: '0755'
become: true
tags:
- safekeeper
- name: init safekeeper
shell:
cmd: /tmp/init_safekeeper.sh
args:
creates: "/storage/safekeeper/data/safekeeper.id"
environment:
NEON_REPO_DIR: "/storage/safekeeper/data"
LD_LIBRARY_PATH: "/usr/local/v14/lib"
become: true
tags:
- safekeeper
# in the future safekeepers should discover pageservers byself
# but currently use first pageserver that was discovered
- name: set first pageserver var for safekeepers
set_fact:
first_pageserver: "{{ hostvars[groups['pageservers'][0]]['inventory_hostname'] }}"
tags:
- safekeeper
- name: upload systemd service definition
ansible.builtin.template:
src: systemd/safekeeper.service
dest: /etc/systemd/system/safekeeper.service
owner: root
group: root
mode: '0644'
become: true
tags:
- safekeeper
- name: start systemd service
ansible.builtin.systemd:
daemon_reload: yes
name: safekeeper
enabled: yes
state: restarted
become: true
tags:
- safekeeper
- name: post version to console
when: console_mgmt_base_url is defined
shell:
cmd: |
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/safekeepers
tags:
- safekeeper

View File

@@ -1,42 +0,0 @@
#!/bin/bash
set -e
if [ -n "${DOCKER_TAG}" ]; then
# Verson is DOCKER_TAG but without prefix
VERSION=$(echo $DOCKER_TAG | sed 's/^.*-//g')
else
echo "Please set DOCKER_TAG environment variable"
exit 1
fi
# do initial cleanup
rm -rf neon_install postgres_install.tar.gz neon_install.tar.gz .neon_current_version
mkdir neon_install
# retrieve binaries from docker image
echo "getting binaries from docker image"
docker pull --quiet neondatabase/neon:${DOCKER_TAG}
ID=$(docker create neondatabase/neon:${DOCKER_TAG})
docker cp ${ID}:/data/postgres_install.tar.gz .
tar -xzf postgres_install.tar.gz -C neon_install
mkdir neon_install/bin/
docker cp ${ID}:/usr/local/bin/pageserver neon_install/bin/
docker cp ${ID}:/usr/local/bin/pageserver_binutils neon_install/bin/
docker cp ${ID}:/usr/local/bin/safekeeper neon_install/bin/
docker cp ${ID}:/usr/local/bin/storage_broker neon_install/bin/
docker cp ${ID}:/usr/local/bin/proxy neon_install/bin/
docker cp ${ID}:/usr/local/v14/bin/ neon_install/v14/bin/
docker cp ${ID}:/usr/local/v15/bin/ neon_install/v15/bin/
docker cp ${ID}:/usr/local/v14/lib/ neon_install/v14/lib/
docker cp ${ID}:/usr/local/v15/lib/ neon_install/v15/lib/
docker rm -vf ${ID}
# store version to file (for ansible playbooks) and create binaries tarball
echo ${VERSION} > neon_install/.neon_current_version
echo ${VERSION} > .neon_current_version
tar -czf neon_install.tar.gz -C neon_install .
# do final cleaup
rm -rf neon_install postgres_install.tar.gz

View File

@@ -1,38 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-ap-southeast-1
bucket_region: ap-southeast-1
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.epsilon.ap-southeast-1.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: ap-southeast-1
ansible_aws_ssm_bucket_name: neon-prod-storage-ap-southeast-1
console_region_id: aws-ap-southeast-1
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.ap-southeast-1.aws.neon.tech:
ansible_host: i-064de8ea28bdb495b
pageserver-1.ap-southeast-1.aws.neon.tech:
ansible_host: i-0b180defcaeeb6b93
safekeepers:
hosts:
safekeeper-0.ap-southeast-1.aws.neon.tech:
ansible_host: i-0d6f1dc5161eef894
safekeeper-1.ap-southeast-1.aws.neon.tech:
ansible_host: i-0e338adda8eb2d19f
safekeeper-2.ap-southeast-1.aws.neon.tech:
ansible_host: i-04fb63634e4679eb9

View File

@@ -1,38 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-eu-central-1
bucket_region: eu-central-1
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.gamma.eu-central-1.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: eu-central-1
ansible_aws_ssm_bucket_name: neon-prod-storage-eu-central-1
console_region_id: aws-eu-central-1
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.eu-central-1.aws.neon.tech:
ansible_host: i-0cd8d316ecbb715be
pageserver-1.eu-central-1.aws.neon.tech:
ansible_host: i-090044ed3d383fef0
safekeepers:
hosts:
safekeeper-0.eu-central-1.aws.neon.tech:
ansible_host: i-0b238612d2318a050
safekeeper-1.eu-central-1.aws.neon.tech:
ansible_host: i-07b9c45e5c2637cd4
safekeeper-2.eu-central-1.aws.neon.tech:
ansible_host: i-020257302c3c93d88

View File

@@ -1,39 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-us-east-2
bucket_region: us-east-2
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.delta.us-east-2.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: us-east-2
ansible_aws_ssm_bucket_name: neon-prod-storage-us-east-2
console_region_id: aws-us-east-2
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.us-east-2.aws.neon.tech:
ansible_host: i-062227ba7f119eb8c
pageserver-1.us-east-2.aws.neon.tech:
ansible_host: i-0b3ec0afab5968938
safekeepers:
hosts:
safekeeper-0.us-east-2.aws.neon.tech:
ansible_host: i-0e94224750c57d346
safekeeper-1.us-east-2.aws.neon.tech:
ansible_host: i-06d113fb73bfddeb0
safekeeper-2.us-east-2.aws.neon.tech:
ansible_host: i-09f66c8e04afff2e8

View File

@@ -1,41 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-us-west-2
bucket_region: us-west-2
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.eta.us-west-2.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: us-west-2
ansible_aws_ssm_bucket_name: neon-prod-storage-us-west-2
console_region_id: aws-us-west-2-new
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.us-west-2.aws.neon.tech:
ansible_host: i-0d9f6dfae0e1c780d
pageserver-1.us-west-2.aws.neon.tech:
ansible_host: i-0c834be1dddba8b3f
pageserver-2.us-west-2.aws.neon.tech:
ansible_host: i-051642d372c0a4f32
safekeepers:
hosts:
safekeeper-0.us-west-2.aws.neon.tech:
ansible_host: i-00719d8a74986fda6
safekeeper-1.us-west-2.aws.neon.tech:
ansible_host: i-074682f9d3c712e7c
safekeeper-2.us-west-2.aws.neon.tech:
ansible_host: i-042b7efb1729d7966

View File

@@ -1,40 +0,0 @@
---
storage:
vars:
console_mgmt_base_url: http://console-release.local
bucket_name: zenith-storage-oregon
bucket_region: us-west-2
broker_endpoint: http://storage-broker.prod.local:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "{{ inventory_hostname }}"
safekeeper_s3_prefix: prod-1/wal
hostname_suffix: ".local"
remote_user: admin
sentry_environment: production
children:
pageservers:
hosts:
zenith-1-ps-2:
console_region_id: aws-us-west-2
zenith-1-ps-3:
console_region_id: aws-us-west-2
zenith-1-ps-4:
console_region_id: aws-us-west-2
zenith-1-ps-5:
console_region_id: aws-us-west-2
safekeepers:
hosts:
zenith-1-sk-1:
console_region_id: aws-us-west-2
zenith-1-sk-2:
console_region_id: aws-us-west-2
zenith-1-sk-4:
console_region_id: aws-us-west-2

View File

@@ -1,33 +0,0 @@
#!/bin/sh
# fetch params from meta-data service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# store fqdn hostname in var
HOST=$(hostname -f)
cat <<EOF | tee /tmp/payload
{
"version": 1,
"host": "${HOST}",
"port": 6400,
"region_id": "{{ console_region_id }}",
"instance_id": "${INSTANCE_ID}",
"http_host": "${HOST}",
"http_port": 9898,
"active": false,
"availability_zone_id": "${AZ_ID}"
}
EOF
# check if pageserver already registered or not
if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/${INSTANCE_ID} -o /dev/null; then
# not registered, so register it now
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers -d@/tmp/payload | jq -r '.id')
# init pageserver
sudo -u pageserver /usr/local/bin/pageserver -c "id=${ID}" -c "pg_distrib_dir='/usr/local'" --init -D /storage/pageserver/data
fi

View File

@@ -1,31 +0,0 @@
#!/bin/sh
# fetch params from meta-data service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# store fqdn hostname in var
HOST=$(hostname -f)
cat <<EOF | tee /tmp/payload
{
"version": 1,
"host": "${HOST}",
"port": 6500,
"http_port": 7676,
"region_id": "{{ console_region_id }}",
"instance_id": "${INSTANCE_ID}",
"availability_zone_id": "${AZ_ID}",
"active": false
}
EOF
# check if safekeeper already registered or not
if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/${INSTANCE_ID} -o /dev/null; then
# not registered, so register it now
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers -d@/tmp/payload | jq -r '.id')
# init safekeeper
sudo -u safekeeper /usr/local/bin/safekeeper --id ${ID} --init -D /storage/safekeeper/data
fi

View File

@@ -1,2 +0,0 @@
ansible_connection: aws_ssm
ansible_python_interpreter: /usr/bin/python3

View File

@@ -1,36 +0,0 @@
storage:
vars:
bucket_name: neon-dev-storage-eu-west-1
bucket_region: eu-west-1
console_mgmt_base_url: http://console-staging.local
broker_endpoint: http://storage-broker-lb.zeta.eu-west-1.internal.aws.neon.build:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-staging.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: eu-west-1
ansible_aws_ssm_bucket_name: neon-dev-storage-eu-west-1
console_region_id: aws-eu-west-1
sentry_environment: staging
children:
pageservers:
hosts:
pageserver-0.eu-west-1.aws.neon.build:
ansible_host: i-01d496c5041c7f34c
safekeepers:
hosts:
safekeeper-0.eu-west-1.aws.neon.build:
ansible_host: i-05226ef85722831bf
safekeeper-1.eu-west-1.aws.neon.build:
ansible_host: i-06969ee1bf2958bfc
safekeeper-2.eu-west-1.aws.neon.build:
ansible_host: i-087892e9625984a0b

View File

@@ -1,42 +0,0 @@
storage:
vars:
bucket_name: neon-staging-storage-us-east-2
bucket_region: us-east-2
console_mgmt_base_url: http://console-staging.local
broker_endpoint: http://storage-broker-lb.beta.us-east-2.internal.aws.neon.build:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-staging.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: us-east-2
ansible_aws_ssm_bucket_name: neon-staging-storage-us-east-2
console_region_id: aws-us-east-2
sentry_environment: staging
children:
pageservers:
hosts:
pageserver-0.us-east-2.aws.neon.build:
ansible_host: i-0c3e70929edb5d691
pageserver-1.us-east-2.aws.neon.build:
ansible_host: i-0565a8b4008aa3f40
pageserver-2.us-east-2.aws.neon.build:
ansible_host: i-01e31cdf7e970586a
pageserver-3.us-east-2.aws.neon.build:
ansible_host: i-0602a0291365ef7cc
safekeepers:
hosts:
safekeeper-0.us-east-2.aws.neon.build:
ansible_host: i-027662bd552bf5db0
safekeeper-1.us-east-2.aws.neon.build:
ansible_host: i-0171efc3604a7b907
safekeeper-2.us-east-2.aws.neon.build:
ansible_host: i-0de0b03a51676a6ce

View File

@@ -1,18 +0,0 @@
[Unit]
Description=Neon pageserver
After=network.target auditd.service
[Service]
Type=simple
User=pageserver
Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/pageserver LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_PAGESERVER }} SENTRY_ENVIRONMENT={{ sentry_environment }}
ExecStart=/usr/local/bin/pageserver -c "pg_distrib_dir='/usr/local'" -c "listen_pg_addr='0.0.0.0:6400'" -c "listen_http_addr='0.0.0.0:9898'" -c "broker_endpoint='{{ broker_endpoint }}'" -D /storage/pageserver/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
Restart=on-failure
TimeoutSec=10
LimitNOFILE=30000000
[Install]
WantedBy=multi-user.target

View File

@@ -1,18 +0,0 @@
[Unit]
Description=Neon safekeeper
After=network.target auditd.service
[Service]
Type=simple
User=safekeeper
Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/safekeeper/data LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_SAFEKEEPER }} SENTRY_ENVIRONMENT={{ sentry_environment }}
ExecStart=/usr/local/bin/safekeeper -l {{ inventory_hostname }}{{ hostname_suffix }}:6500 --listen-http {{ inventory_hostname }}{{ hostname_suffix }}:7676 -D /storage/safekeeper/data --broker-endpoint={{ broker_endpoint }} --remote-storage='{bucket_name="{{bucket_name}}", bucket_region="{{bucket_region}}", prefix_in_bucket="{{ safekeeper_s3_prefix }}"}'
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
Restart=on-failure
TimeoutSec=10
LimitNOFILE=30000000
[Install]
WantedBy=multi-user.target

View File

@@ -1 +0,0 @@
{{ pageserver_config | sivel.toiletwater.to_toml }}

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-staging.local/management/api/v2"
domain: "*.eu-west-1.aws.neon.build"
sentryEnvironment: "staging"
wssPort: 8443
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: dev
zenith_region: eu-west-1
zenith_region_slug: eu-west-1
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: eu-west-1.aws.neon.build
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: staging
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.zeta.eu-west-1.internal.aws.neon.build
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "staging"

View File

@@ -1,68 +0,0 @@
# Helm chart values for neon-proxy-link.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "link"
authEndpoint: "https://console.stage.neon.tech/authenticate_proxy_request/"
uri: "https://console.stage.neon.tech/psql_session/"
domain: "pg.neon.build"
sentryEnvironment: "staging"
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy-link pods
podLabels:
zenith_service: proxy
zenith_env: dev
zenith_region: us-east-2
zenith_region_slug: us-east-2
service:
type: LoadBalancer
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
external-dns.alpha.kubernetes.io/hostname: neon-proxy-link-mgmt.beta.us-east-2.aws.neon.build
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: neon-proxy-link.beta.us-east-2.aws.neon.build
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-staging.local/management/api/v2"
domain: "*.cloud.stage.neon.tech"
sentryEnvironment: "staging"
wssPort: 8443
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram-legacy
zenith_env: dev
zenith_region: us-east-2
zenith_region_slug: us-east-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: neon-proxy-scram-legacy.beta.us-east-2.aws.neon.build
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-staging.local/management/api/v2"
domain: "*.us-east-2.aws.neon.build"
sentryEnvironment: "staging"
wssPort: 8443
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: dev
zenith_region: us-east-2
zenith_region_slug: us-east-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: us-east-2.aws.neon.build
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: staging
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.beta.us-east-2.internal.aws.neon.build
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "staging"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.ap-southeast-1.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: ap-southeast-1
zenith_region_slug: ap-southeast-1
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: ap-southeast-1.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.epsilon.ap-southeast-1.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.eu-central-1.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: eu-central-1
zenith_region_slug: eu-central-1
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: eu-central-1.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.gamma.eu-central-1.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,59 +0,0 @@
# Helm chart values for neon-proxy-link.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "link"
authEndpoint: "https://console.neon.tech/authenticate_proxy_request/"
uri: "https://console.neon.tech/psql_session/"
domain: "pg.neon.tech"
sentryEnvironment: "production"
# -- Additional labels for zenith-proxy pods
podLabels:
zenith_service: proxy
zenith_env: production
zenith_region: us-east-2
zenith_region_slug: us-east-2
service:
type: LoadBalancer
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
external-dns.alpha.kubernetes.io/hostname: neon-proxy-link-mgmt.delta.us-east-2.aws.neon.tech
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: neon-proxy-link.delta.us-east-2.aws.neon.tech
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.us-east-2.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: us-east-2
zenith_region_slug: us-east-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: us-east-2.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.delta.us-east-2.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.cloud.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: us-west-2
zenith_region_slug: us-west-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: neon-proxy-scram-legacy.eta.us-west-2.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.us-west-2.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: us-west-2
zenith_region_slug: us-west-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: us-west-2.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.eta.us-west-2.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,56 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker.prod.local
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: true
serviceMonitor:
enabled: true
selector:
release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,10 +1,14 @@
## Describe your changes
## Problem
## Issue ticket number and link
## Summary of changes
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above checklist

31
.github/workflows/actionlint.yml vendored Normal file
View File

@@ -0,0 +1,31 @@
name: Lint GitHub Workflows
on:
push:
branches:
- main
- release
paths:
- '.github/workflows/*.ya?ml'
pull_request:
paths:
- '.github/workflows/*.ya?ml'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
jobs:
actionlint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: reviewdog/action-actionlint@v1
env:
# SC2046 - Quote this to prevent word splitting. - https://www.shellcheck.net/wiki/SC2046
# SC2086 - Double quote to prevent globbing and word splitting. - https://www.shellcheck.net/wiki/SC2086
SHELLCHECK_OPTS: --exclude=SC2046,SC2086
with:
fail_on_error: true
filter_mode: nofilter
level: error

View File

@@ -0,0 +1,118 @@
name: Handle `approved-for-ci-run` label
# This workflow helps to run CI pipeline for PRs made by external contributors (from forks).
on:
pull_request_target:
branches:
- main
types:
# Default types that triggers a workflow ([1]):
# - [1] https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request
- opened
- synchronize
- reopened
# Types that we wand to handle in addition to keep labels tidy:
- closed
# Actual magic happens here:
- labeled
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.pull_request.number }}
BRANCH: "ci-run/pr-${{ github.event.pull_request.number }}"
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
defaults:
run:
shell: bash -euo pipefail {0}
jobs:
remove-label:
# Remove `approved-for-ci-run` label if the workflow is triggered by changes in a PR.
# The PR should be reviewed and labelled manually again.
permissions:
pull-requests: write # For `gh pr edit`
if: |
contains(fromJSON('["opened", "synchronize", "reopened", "closed"]'), github.event.action) &&
contains(github.event.pull_request.labels.*.name, 'approved-for-ci-run')
runs-on: ubuntu-latest
steps:
- run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"
create-or-update-pr-for-ci-run:
# Create local PR for an `approved-for-ci-run` labelled PR to run CI pipeline in it.
permissions:
pull-requests: write # for `gh pr edit`
# For `git push` and `gh pr create` we use CI_ACCESS_TOKEN
if: |
github.event.action == 'labeled' &&
contains(github.event.pull_request.labels.*.name, 'approved-for-ci-run')
runs-on: ubuntu-latest
steps:
- run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"
- uses: actions/checkout@v3
with:
ref: main
token: ${{ secrets.CI_ACCESS_TOKEN }}
- run: gh pr checkout "${PR_NUMBER}"
- run: git checkout -b "${BRANCH}"
- run: git push --force origin "${BRANCH}"
- name: Create a Pull Request for CI run (if required)
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
cat << EOF > body.md
This Pull Request is created automatically to run the CI pipeline for #${PR_NUMBER}
Please do not alter or merge/close it.
Feel free to review/comment/discuss the original PR #${PR_NUMBER}.
EOF
ALREADY_CREATED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --base main --json number --jq '.[].number')"
if [ -z "${ALREADY_CREATED}" ]; then
gh pr --repo "${GITHUB_REPOSITORY}" create --title "CI run for PR #${PR_NUMBER}" \
--body-file "body.md" \
--head "${BRANCH}" \
--base "main" \
--draft
fi
cleanup:
# Close PRs and delete branchs if the original PR is closed.
permissions:
contents: write # for `--delete-branch` flag in `gh pr close`
pull-requests: write # for `gh pr close`
if: |
github.event.action == 'closed' &&
github.event.pull_request.head.repo.full_name != github.repository
runs-on: ubuntu-latest
steps:
- name: Close PR and delete `ci-run/pr-${{ env.PR_NUMBER }}` branch
run: |
CLOSED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --json 'closed' --jq '.[].closed')"
if [ "${CLOSED}" == "false" ]; then
gh pr --repo "${GITHUB_REPOSITORY}" close "${BRANCH}" --delete-branch
fi

View File

@@ -16,12 +16,12 @@ on:
workflow_dispatch: # adds ability to run this manually
inputs:
region_id:
description: 'Use a particular region. If not set the default region will be used'
description: 'Project region id. If not set, the default region will be used'
required: false
default: 'aws-us-east-2'
save_perf_report:
type: boolean
description: 'Publish perf report or not. If not set, the report is published only for the main branch'
description: 'Publish perf report. If not set, the report will be published only for the main branch'
required: false
defaults:
@@ -30,7 +30,7 @@ defaults:
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
jobs:
@@ -42,7 +42,7 @@ jobs:
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: "neon-staging"
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -92,11 +92,8 @@ jobs:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -107,25 +104,86 @@ jobs:
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
generate-matrices:
# Create matrices for the benchmarking jobs, so we run benchmarks on rds only once a week (on Saturday)
#
# Available platforms:
# - neon-captest-new: Freshly created project (1 CU)
# - neon-captest-freetier: Use freetier-sized compute (0.25 CU)
# - neon-captest-reuse: Reusing existing project
# - rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# - rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
runs-on: ubuntu-latest
outputs:
pgbench-compare-matrix: ${{ steps.pgbench-compare-matrix.outputs.matrix }}
olap-compare-matrix: ${{ steps.olap-compare-matrix.outputs.matrix }}
tpch-compare-matrix: ${{ steps.tpch-compare-matrix.outputs.matrix }}
steps:
- name: Generate matrix for pgbench benchmark
id: pgbench-compare-matrix
run: |
matrix='{
"platform": [
"neon-captest-new",
"neon-captest-reuse",
"neonvm-captest-new"
],
"db_size": [ "10gb" ],
"include": [{ "platform": "neon-captest-freetier", "db_size": "3gb" },
{ "platform": "neon-captest-new", "db_size": "50gb" },
{ "platform": "neonvm-captest-freetier", "db_size": "3gb" },
{ "platform": "neonvm-captest-new", "db_size": "50gb" }]
}'
if [ "$(date +%A)" = "Saturday" ]; then
matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres", "db_size": "10gb"},
{ "platform": "rds-aurora", "db_size": "50gb"}]')
fi
echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT
- name: Generate matrix for OLAP benchmarks
id: olap-compare-matrix
run: |
matrix='{
"platform": [
"neon-captest-reuse"
]
}'
if [ "$(date +%A)" = "Saturday" ]; then
matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres" },
{ "platform": "rds-aurora" }]')
fi
echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT
- name: Generate matrix for TPC-H benchmarks
id: tpch-compare-matrix
run: |
matrix='{
"platform": [
"neon-captest-reuse"
],
"scale": [
"10"
]
}'
if [ "$(date +%A)" = "Saturday" ]; then
matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres", "scale": "10" },
{ "platform": "rds-aurora", "scale": "10" }]')
fi
echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT
pgbench-compare:
needs: [ generate-matrices ]
strategy:
fail-fast: false
matrix:
# neon-captest-new: Run pgbench in a freshly created project
# neon-captest-reuse: Same, but reusing existing project
# neon-captest-prefetch: Same, with prefetching enabled (new project)
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-reuse, neon-captest-prefetch, rds-postgres ]
db_size: [ 10gb ]
runner: [ us-east-2 ]
include:
- platform: neon-captest-prefetch
db_size: 50gb
runner: us-east-2
- platform: rds-aurora
db_size: 50gb
runner: us-east-2
matrix: ${{fromJson(needs.generate-matrices.outputs.pgbench-compare-matrix)}}
env:
TEST_PG_BENCH_DURATIONS_MATRIX: "60m"
@@ -134,15 +192,16 @@ jobs:
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, "${{ matrix.runner }}", x64 ]
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
# Increase timeout to 8h, default timeout is 6h
timeout-minutes: 480
steps:
- uses: actions/checkout@v3
@@ -160,13 +219,15 @@ jobs:
echo "${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin" >> $GITHUB_PATH
- name: Create Neon Project
if: contains(fromJson('["neon-captest-new", "neon-captest-prefetch"]'), matrix.platform)
if: contains(fromJson('["neon-captest-new", "neon-captest-freetier", "neonvm-captest-new", "neonvm-captest-freetier"]'), matrix.platform)
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
region_id: ${{ github.event.inputs.region_id || 'aws-us-east-2' }}
postgres_version: ${{ env.DEFAULT_PG_VERSION }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
compute_units: ${{ (matrix.platform == 'neon-captest-freetier' && '[0.25, 0.25]') || '[1, 1]' }}
provisioner: ${{ (contains(matrix.platform, 'neonvm-') && 'k8s-neonvm') || 'k8s-pod' }}
- name: Set up Connection String
id: set-up-connstr
@@ -175,7 +236,7 @@ jobs:
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}
;;
neon-captest-new | neon-captest-prefetch)
neon-captest-new | neon-captest-freetier | neonvm-captest-new | neonvm-captest-freetier)
CONNSTR=${{ steps.create-neon-project.outputs.dsn }}
;;
rds-aurora)
@@ -185,25 +246,18 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'neon-captest-new', 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}"
exit 1
;;
esac
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
QUERY="SELECT version();"
if [[ "${PLATFORM}" = "neon"* ]]; then
QUERY="${QUERY} SHOW neon.tenant_id; SHOW neon.timeline_id;"
fi
psql ${CONNSTR} -c "${QUERY}"
- name: Benchmark init
uses: ./.github/actions/run-python-test-set
@@ -252,11 +306,8 @@ jobs:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -275,23 +326,19 @@ jobs:
#
# *_CLICKBENCH_CONNSTR: Genuine ClickBench DB with ~100M rows
# *_CLICKBENCH_10M_CONNSTR: DB with the first 10M rows of ClickBench DB
if: success() || failure()
needs: [ pgbench-compare ]
if: ${{ !cancelled() }}
needs: [ generate-matrices, pgbench-compare ]
strategy:
fail-fast: false
matrix:
# neon-captest-prefetch: We have pre-created projects with prefetch enabled
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]
matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -299,8 +346,6 @@ jobs:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
steps:
- uses: actions/checkout@v3
@@ -320,7 +365,7 @@ jobs:
id: set-up-connstr
run: |
case "${PLATFORM}" in
neon-captest-prefetch)
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CLICKBENCH_10M_CONNSTR }}
;;
rds-aurora)
@@ -330,25 +375,18 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CLICKBENCH_10M_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"
exit 1
;;
esac
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
QUERY="SELECT version();"
if [[ "${PLATFORM}" = "neon"* ]]; then
QUERY="${QUERY} SHOW neon.tenant_id; SHOW neon.timeline_id;"
fi
psql ${CONNSTR} -c "${QUERY}"
- name: ClickBench benchmark
uses: ./.github/actions/run-python-test-set
@@ -362,13 +400,11 @@ jobs:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
TEST_OLAP_SCALE: 10
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -386,32 +422,27 @@ jobs:
# We might change it after https://github.com/neondatabase/neon/issues/2900.
#
# *_TPCH_S10_CONNSTR: DB generated with scale factor 10 (~10 GB)
if: success() || failure()
needs: [ clickbench-compare ]
if: ${{ !cancelled() }}
needs: [ generate-matrices, clickbench-compare ]
strategy:
fail-fast: false
matrix:
# neon-captest-prefetch: We have pre-created projects with prefetch enabled
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]
matrix: ${{ fromJson(needs.generate-matrices.outputs.tpch-compare-matrix) }}
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
TEST_OLAP_SCALE: ${{ matrix.scale }}
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
steps:
- uses: actions/checkout@v3
@@ -427,39 +458,39 @@ jobs:
${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin/pgbench --version
echo "${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin" >> $GITHUB_PATH
- name: Set up Connection String
id: set-up-connstr
- name: Get Connstring Secret Name
run: |
case "${PLATFORM}" in
neon-captest-prefetch)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_TPCH_S10_CONNSTR }}
neon-captest-reuse)
ENV_PLATFORM=CAPTEST_TPCH
;;
rds-aurora)
CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_TPCH_S10_CONNSTR }}
ENV_PLATFORM=RDS_AURORA_TPCH
;;
rds-postgres)
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_TPCH_S10_CONNSTR }}
ENV_PLATFORM=RDS_AURORA_TPCH
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"
exit 1
;;
esac
CONNSTR_SECRET_NAME="BENCHMARK_${ENV_PLATFORM}_S${TEST_OLAP_SCALE}_CONNSTR"
echo "CONNSTR_SECRET_NAME=${CONNSTR_SECRET_NAME}" >> $GITHUB_ENV
- name: Set up Connection String
id: set-up-connstr
run: |
CONNSTR=${{ secrets[env.CONNSTR_SECRET_NAME] }}
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
QUERY="SELECT version();"
if [[ "${PLATFORM}" = "neon"* ]]; then
QUERY="${QUERY} SHOW neon.tenant_id; SHOW neon.timeline_id;"
fi
psql ${CONNSTR} -c "${QUERY}"
- name: Run TPC-H benchmark
uses: ./.github/actions/run-python-test-set
@@ -473,13 +504,11 @@ jobs:
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
TEST_OLAP_SCALE: ${{ matrix.scale }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -491,23 +520,19 @@ jobs:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
user-examples-compare:
if: success() || failure()
needs: [ tpch-compare ]
if: ${{ !cancelled() }}
needs: [ generate-matrices, tpch-compare ]
strategy:
fail-fast: false
matrix:
# neon-captest-prefetch: We have pre-created projects with prefetch enabled
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]
matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -515,8 +540,6 @@ jobs:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
steps:
- uses: actions/checkout@v3
@@ -536,7 +559,7 @@ jobs:
id: set-up-connstr
run: |
case "${PLATFORM}" in
neon-captest-prefetch)
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_CAPTEST_CONNSTR }}
;;
rds-aurora)
@@ -546,25 +569,18 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_RDS_POSTGRES_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"
exit 1
;;
esac
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
QUERY="SELECT version();"
if [[ "${PLATFORM}" = "neon"* ]]; then
QUERY="${QUERY} SHOW neon.tenant_id; SHOW neon.timeline_id;"
fi
psql ${CONNSTR} -c "${QUERY}"
- name: Run user examples
uses: ./.github/actions/run-python-test-set
@@ -580,17 +596,14 @@ jobs:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
with:
channel-id: "C033QLM5P7D" # dev-staging-stream
slack-message: "Periodic TPC-H perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
slack-message: "Periodic User example perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

File diff suppressed because it is too large Load Diff

View File

@@ -1,179 +0,0 @@
name: Neon Deploy dev
on:
workflow_dispatch:
inputs:
dockerTag:
description: 'Docker tag to deploy'
required: true
type: string
branch:
description: 'Branch or commit used for deploy scripts and configs'
required: true
type: string
default: 'main'
deployStorage:
description: 'Deploy storage'
required: true
type: boolean
default: true
deployProxy:
description: 'Deploy proxy'
required: true
type: boolean
default: true
deployStorageBroker:
description: 'Deploy storage-broker'
required: true
type: boolean
default: true
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
concurrency:
group: deploy-dev
cancel-in-progress: false
jobs:
deploy-storage-new:
runs-on: [ self-hosted, gen3, small ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned
options: --user root --privileged
if: inputs.deployStorage
defaults:
run:
shell: bash
strategy:
matrix:
target_region: [ eu-west-1, us-east-2 ]
environment:
name: dev-${{ matrix.target_region }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Redeploy
run: |
export DOCKER_TAG=${{ inputs.dockerTag }}
cd "$(pwd)/.github/ansible"
./get_binaries.sh
ansible-galaxy collection install sivel.toiletwater
ansible-playbook deploy.yaml -i staging.${{ matrix.target_region }}.hosts.yaml -e @ssm_config -e CONSOLE_API_TOKEN=${{ secrets.NEON_STAGING_API_KEY }} -e SENTRY_URL_PAGESERVER=${{ secrets.SENTRY_URL_PAGESERVER }} -e SENTRY_URL_SAFEKEEPER=${{ secrets.SENTRY_URL_SAFEKEEPER }}
rm -f neon_install.tar.gz .neon_current_version
- name: Cleanup ansible folder
run: rm -rf ~/.ansible
deploy-proxy-new:
runs-on: [ self-hosted, gen3, small ]
container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned
if: inputs.deployProxy
defaults:
run:
shell: bash
strategy:
matrix:
include:
- target_region: us-east-2
target_cluster: dev-us-east-2-beta
deploy_link_proxy: true
deploy_legacy_scram_proxy: true
- target_region: eu-west-1
target_cluster: dev-eu-west-1-zeta
deploy_link_proxy: false
deploy_legacy_scram_proxy: false
environment:
name: dev-${{ matrix.target_region }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1-node16
with:
role-to-assume: arn:aws:iam::369495373322:role/github-runner
aws-region: eu-central-1
role-skip-session-tagging: true
role-duration-seconds: 1800
- name: Configure environment
run: |
helm repo add neondatabase https://neondatabase.github.io/helm-charts
aws --region ${{ matrix.target_region }} eks update-kubeconfig --name ${{ matrix.target_cluster }}
- name: Re-deploy scram proxy
run: |
DOCKER_TAG=${{ inputs.dockerTag }}
helm upgrade neon-proxy-scram neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s
- name: Re-deploy link proxy
if: matrix.deploy_link_proxy
run: |
DOCKER_TAG=${{ inputs.dockerTag }}
helm upgrade neon-proxy-link neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-link.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s
- name: Re-deploy legacy scram proxy
if: matrix.deploy_legacy_scram_proxy
run: |
DOCKER_TAG=${{ inputs.dockerTag }}
helm upgrade neon-proxy-scram-legacy neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram-legacy.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s
- name: Cleanup helm folder
run: rm -rf ~/.cache
deploy-storage-broker-new:
runs-on: [ self-hosted, gen3, small ]
container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned
if: inputs.deployStorageBroker
defaults:
run:
shell: bash
strategy:
matrix:
include:
- target_region: us-east-2
target_cluster: dev-us-east-2-beta
- target_region: eu-west-1
target_cluster: dev-eu-west-1-zeta
environment:
name: dev-${{ matrix.target_region }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1-node16
with:
role-to-assume: arn:aws:iam::369495373322:role/github-runner
aws-region: eu-central-1
role-skip-session-tagging: true
role-duration-seconds: 1800
- name: Configure environment
run: |
helm repo add neondatabase https://neondatabase.github.io/helm-charts
aws --region ${{ matrix.target_region }} eks update-kubeconfig --name ${{ matrix.target_cluster }}
- name: Deploy storage-broker
run:
helm upgrade neon-storage-broker-lb neondatabase/neon-storage-broker --namespace neon-storage-broker-lb --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-storage-broker.yaml --set image.tag=${{ inputs.dockerTag }} --set settings.sentryUrl=${{ secrets.SENTRY_URL_BROKER }} --wait --timeout 5m0s
- name: Cleanup helm folder
run: rm -rf ~/.cache

View File

@@ -1,240 +0,0 @@
name: Neon Deploy prod
on:
workflow_dispatch:
inputs:
dockerTag:
description: 'Docker tag to deploy'
required: true
type: string
branch:
description: 'Branch or commit used for deploy scripts and configs'
required: true
type: string
default: 'release'
deployStorage:
description: 'Deploy storage'
required: true
type: boolean
default: true
deployProxy:
description: 'Deploy proxy'
required: true
type: boolean
default: true
deployStorageBroker:
description: 'Deploy storage-broker'
required: true
type: boolean
default: true
disclamerAcknowledged:
description: 'I confirm that there is an emergency and I can not use regular release workflow'
required: true
type: boolean
default: false
concurrency:
group: deploy-prod
cancel-in-progress: false
jobs:
deploy-prod-new:
runs-on: prod
container: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest
if: inputs.deployStorage && inputs.disclamerAcknowledged
defaults:
run:
shell: bash
strategy:
matrix:
target_region: [ us-east-2, us-west-2, eu-central-1, ap-southeast-1 ]
environment:
name: prod-${{ matrix.target_region }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Redeploy
run: |
export DOCKER_TAG=${{ inputs.dockerTag }}
cd "$(pwd)/.github/ansible"
./get_binaries.sh
ansible-galaxy collection install sivel.toiletwater
ansible-playbook deploy.yaml -i prod.${{ matrix.target_region }}.hosts.yaml -e @ssm_config -e CONSOLE_API_TOKEN=${{ secrets.NEON_PRODUCTION_API_KEY }} -e SENTRY_URL_PAGESERVER=${{ secrets.SENTRY_URL_PAGESERVER }} -e SENTRY_URL_SAFEKEEPER=${{ secrets.SENTRY_URL_SAFEKEEPER }}
rm -f neon_install.tar.gz .neon_current_version
deploy-proxy-prod-new:
runs-on: prod
container: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest
if: inputs.deployProxy && inputs.disclamerAcknowledged
defaults:
run:
shell: bash
strategy:
matrix:
include:
- target_region: us-east-2
target_cluster: prod-us-east-2-delta
deploy_link_proxy: true
deploy_legacy_scram_proxy: false
- target_region: us-west-2
target_cluster: prod-us-west-2-eta
deploy_link_proxy: false
deploy_legacy_scram_proxy: true
- target_region: eu-central-1
target_cluster: prod-eu-central-1-gamma
deploy_link_proxy: false
deploy_legacy_scram_proxy: false
- target_region: ap-southeast-1
target_cluster: prod-ap-southeast-1-epsilon
deploy_link_proxy: false
deploy_legacy_scram_proxy: false
environment:
name: prod-${{ matrix.target_region }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Configure environment
run: |
helm repo add neondatabase https://neondatabase.github.io/helm-charts
aws --region ${{ matrix.target_region }} eks update-kubeconfig --name ${{ matrix.target_cluster }}
- name: Re-deploy scram proxy
run: |
DOCKER_TAG=${{ inputs.dockerTag }}
helm upgrade neon-proxy-scram neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s
- name: Re-deploy link proxy
if: matrix.deploy_link_proxy
run: |
DOCKER_TAG=${{ inputs.dockerTag }}
helm upgrade neon-proxy-link neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-link.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s
- name: Re-deploy legacy scram proxy
if: matrix.deploy_legacy_scram_proxy
run: |
DOCKER_TAG=${{ inputs.dockerTag }}
helm upgrade neon-proxy-scram-legacy neondatabase/neon-proxy --namespace neon-proxy --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-proxy-scram-legacy.yaml --set image.tag=${DOCKER_TAG} --set settings.sentryUrl=${{ secrets.SENTRY_URL_PROXY }} --wait --timeout 15m0s
deploy-storage-broker-prod-new:
runs-on: prod
container: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest
if: inputs.deployStorageBroker && inputs.disclamerAcknowledged
defaults:
run:
shell: bash
strategy:
matrix:
include:
- target_region: us-east-2
target_cluster: prod-us-east-2-delta
- target_region: us-west-2
target_cluster: prod-us-west-2-eta
- target_region: eu-central-1
target_cluster: prod-eu-central-1-gamma
- target_region: ap-southeast-1
target_cluster: prod-ap-southeast-1-epsilon
environment:
name: prod-${{ matrix.target_region }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Configure environment
run: |
helm repo add neondatabase https://neondatabase.github.io/helm-charts
aws --region ${{ matrix.target_region }} eks update-kubeconfig --name ${{ matrix.target_cluster }}
- name: Deploy storage-broker
run:
helm upgrade neon-storage-broker-lb neondatabase/neon-storage-broker --namespace neon-storage-broker-lb --create-namespace --install --atomic -f .github/helm-values/${{ matrix.target_cluster }}.neon-storage-broker.yaml --set image.tag=${{ inputs.dockerTag }} --set settings.sentryUrl=${{ secrets.SENTRY_URL_BROKER }} --wait --timeout 5m0s
# Deploy to old account below
deploy:
runs-on: prod
container: 093970136003.dkr.ecr.eu-central-1.amazonaws.com/ansible:latest
if: inputs.deployStorage && inputs.disclamerAcknowledged
defaults:
run:
shell: bash
environment:
name: prod-old
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Redeploy
run: |
export DOCKER_TAG=${{ inputs.dockerTag }}
cd "$(pwd)/.github/ansible"
./get_binaries.sh
eval $(ssh-agent)
echo "${{ secrets.TELEPORT_SSH_KEY }}" | tr -d '\n'| base64 --decode >ssh-key
echo "${{ secrets.TELEPORT_SSH_CERT }}" | tr -d '\n'| base64 --decode >ssh-key-cert.pub
chmod 0600 ssh-key
ssh-add ssh-key
rm -f ssh-key ssh-key-cert.pub
ANSIBLE_CONFIG=./ansible.cfg ansible-galaxy collection install sivel.toiletwater
ANSIBLE_CONFIG=./ansible.cfg ansible-playbook deploy.yaml -i production.hosts.yaml -e CONSOLE_API_TOKEN=${{ secrets.NEON_PRODUCTION_API_KEY }} -e SENTRY_URL_PAGESERVER=${{ secrets.SENTRY_URL_PAGESERVER }} -e SENTRY_URL_SAFEKEEPER=${{ secrets.SENTRY_URL_SAFEKEEPER }}
rm -f neon_install.tar.gz .neon_current_version
# Cleanup script fails otherwise - rm: cannot remove '/nvme/actions-runner/_work/_temp/_github_home/.ansible/collections': Permission denied
- name: Cleanup ansible folder
run: rm -rf ~/.ansible
deploy-storage-broker:
name: deploy storage broker on old staging and old prod
runs-on: [ self-hosted, gen3, small ]
container: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/ansible:pinned
if: inputs.deployStorageBroker && inputs.disclamerAcknowledged
defaults:
run:
shell: bash
environment:
name: prod-old
env:
KUBECONFIG: .kubeconfig
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
fetch-depth: 0
ref: ${{ inputs.branch }}
- name: Store kubeconfig file
run: |
echo "${{ secrets.PRODUCTION_KUBECONFIG_DATA }}" | base64 --decode > ${KUBECONFIG}
chmod 0600 ${KUBECONFIG}
- name: Add neon helm chart
run: helm repo add neondatabase https://neondatabase.github.io/helm-charts
- name: Deploy storage-broker
run:
helm upgrade neon-storage-broker neondatabase/neon-storage-broker --namespace neon-storage-broker --create-namespace --install --atomic -f .github/helm-values/production.neon-storage-broker.yaml --set image.tag=${{ inputs.dockerTag }} --set settings.sentryUrl=${{ secrets.SENTRY_URL_BROKER }} --wait --timeout 5m0s
- name: Cleanup helm folder
run: rm -rf ~/.cache

View File

@@ -3,7 +3,7 @@ name: Check neon with extra platform builds
on:
push:
branches:
- main
- main
pull_request:
defaults:
@@ -12,7 +12,7 @@ defaults:
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
env:
@@ -21,7 +21,10 @@ env:
jobs:
check-macos-build:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 90
runs-on: macos-latest
@@ -32,13 +35,13 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
submodules: true
fetch-depth: 1
- name: Install macOS postgres dependencies
run: brew install flex bison openssl protobuf
run: brew install flex bison openssl protobuf icu4c pkg-config
- name: Set pg 14 revision for caching
id: pg_v14_rev
@@ -48,19 +51,30 @@ jobs:
id: pg_v15_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT
- name: Set pg 16 revision for caching
id: pg_v16_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v3
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
key: v1-${{ runner.os }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v3
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
key: v1-${{ runner.os }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v3
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Set extra env for macOS
run: |
@@ -79,14 +93,21 @@ jobs:
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: make postgres-v14 -j$(nproc)
run: make postgres-v14 -j$(sysctl -n hw.ncpu)
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: make postgres-v15 -j$(nproc)
run: make postgres-v15 -j$(sysctl -n hw.ncpu)
- name: Build postgres v16
if: steps.cache_pg_16.outputs.cache-hit != 'true'
run: make postgres-v16 -j$(sysctl -n hw.ncpu)
- name: Build neon extensions
run: make neon-pg-ext -j$(nproc)
run: make neon-pg-ext -j$(sysctl -n hw.ncpu)
- name: Build walproposer-lib
run: make walproposer-lib -j$(sysctl -n hw.ncpu)
- name: Run cargo build
run: cargo build --all --release
@@ -94,8 +115,200 @@ jobs:
- name: Check that no warnings are produced
run: ./run_clippy.sh
check-linux-arm-build:
timeout-minutes: 90
runs-on: [ self-hosted, dev, arm64 ]
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
CARGO_FEATURES: --features testing
CARGO_FLAGS: --locked --release
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
steps:
- name: Fix git ownership
run: |
# Workaround for `fatal: detected dubious ownership in repository at ...`
#
# Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers
# Ref https://github.com/actions/checkout/issues/785
#
git config --global --add safe.directory ${{ github.workspace }}
git config --global --add safe.directory ${GITHUB_WORKSPACE}
for r in 14 15 16; do
git config --global --add safe.directory "${{ github.workspace }}/vendor/postgres-v$r"
git config --global --add safe.directory "${GITHUB_WORKSPACE}/vendor/postgres-v$r"
done
- name: Checkout
uses: actions/checkout@v4
with:
submodules: true
fetch-depth: 1
- name: Set pg 14 revision for caching
id: pg_v14_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) >> $GITHUB_OUTPUT
- name: Set pg 15 revision for caching
id: pg_v15_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT
- name: Set pg 16 revision for caching
id: pg_v16_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT
- name: Set env variables
run: |
echo "CARGO_HOME=${GITHUB_WORKSPACE}/.cargo" >> $GITHUB_ENV
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v3
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v3
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v3
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: mold -run make postgres-v14 -j$(nproc)
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: mold -run make postgres-v15 -j$(nproc)
- name: Build postgres v16
if: steps.cache_pg_16.outputs.cache-hit != 'true'
run: mold -run make postgres-v16 -j$(nproc)
- name: Build neon extensions
run: mold -run make neon-pg-ext -j$(nproc)
- name: Build walproposer-lib
run: mold -run make walproposer-lib -j$(nproc)
- name: Run cargo build
run: |
mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins --tests
- name: Run cargo test
run: |
cargo test $CARGO_FLAGS $CARGO_FEATURES
# Run separate tests for real S3
export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty
export REMOTE_STORAGE_S3_BUCKET=neon-github-public-dev
export REMOTE_STORAGE_S3_REGION=eu-central-1
# Avoid `$CARGO_FEATURES` since there's no `testing` feature in the e2e tests now
cargo test $CARGO_FLAGS --package remote_storage --test test_real_s3
# Run separate tests for real Azure Blob Storage
# XXX: replace region with `eu-central-1`-like region
export ENABLE_REAL_AZURE_REMOTE_STORAGE=y
export AZURE_STORAGE_ACCOUNT="${{ secrets.AZURE_STORAGE_ACCOUNT_DEV }}"
export AZURE_STORAGE_ACCESS_KEY="${{ secrets.AZURE_STORAGE_ACCESS_KEY_DEV }}"
export REMOTE_STORAGE_AZURE_CONTAINER="${{ vars.REMOTE_STORAGE_AZURE_CONTAINER }}"
export REMOTE_STORAGE_AZURE_REGION="${{ vars.REMOTE_STORAGE_AZURE_REGION }}"
# Avoid `$CARGO_FEATURES` since there's no `testing` feature in the e2e tests now
cargo test $CARGO_FLAGS --package remote_storage --test test_real_azure
check-codestyle-rust-arm:
timeout-minutes: 90
runs-on: [ self-hosted, dev, arm64 ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
steps:
- name: Fix git ownership
run: |
# Workaround for `fatal: detected dubious ownership in repository at ...`
#
# Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers
# Ref https://github.com/actions/checkout/issues/785
#
git config --global --add safe.directory ${{ github.workspace }}
git config --global --add safe.directory ${GITHUB_WORKSPACE}
for r in 14 15 16; do
git config --global --add safe.directory "${{ github.workspace }}/vendor/postgres-v$r"
git config --global --add safe.directory "${GITHUB_WORKSPACE}/vendor/postgres-v$r"
done
- name: Checkout
uses: actions/checkout@v4
with:
submodules: true
fetch-depth: 1
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
# cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.
# This will catch compiler & clippy warnings in all feature combinations.
# TODO: use cargo hack for build and test as well, but, that's quite expensive.
# NB: keep clippy args in sync with ./run_clippy.sh
- run: |
CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"
if [ "$CLIPPY_COMMON_ARGS" = "" ]; then
echo "No clippy args found in .neon_clippy_args"
exit 1
fi
echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV
- name: Run cargo clippy (debug)
run: cargo hack --feature-powerset clippy $CLIPPY_COMMON_ARGS
- name: Run cargo clippy (release)
run: cargo hack --feature-powerset clippy --release $CLIPPY_COMMON_ARGS
- name: Check documentation generation
run: cargo doc --workspace --no-deps --document-private-items
env:
RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"
# Use `${{ !cancelled() }}` to run quck tests after the longer clippy run
- name: Check formatting
if: ${{ !cancelled() }}
run: cargo fmt --all -- --check
# https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci
- name: Check rust dependencies
if: ${{ !cancelled() }}
run: |
cargo hakari generate --diff # workspace-hack Cargo.toml is up-to-date
cargo hakari manage-deps --dry-run # all workspace crates depend on workspace-hack
# https://github.com/EmbarkStudios/cargo-deny
- name: Check rust licenses/bans/advisories/sources
if: ${{ !cancelled() }}
run: cargo deny check
gather-rust-build-stats:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats')
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
runs-on: [ self-hosted, gen3, large ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
@@ -111,7 +324,7 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
submodules: true
fetch-depth: 1
@@ -120,6 +333,9 @@ jobs:
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
- name: Build walproposer-lib
run: make walproposer-lib -j$(nproc)
- name: Produce the build stats
run: cargo build --all --release --timings

View File

@@ -14,7 +14,7 @@ on:
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
jobs:

29
.github/workflows/release-notify.yml vendored Normal file
View File

@@ -0,0 +1,29 @@
name: Notify Slack channel about upcoming release
concurrency:
group: ${{ github.workflow }}-${{ github.event.number }}
cancel-in-progress: true
on:
pull_request:
branches:
- release
types:
# Default types that triggers a workflow:
# - https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request
- opened
- synchronize
- reopened
# Additional types that we want to handle:
- closed
jobs:
notify:
runs-on: [ ubuntu-latest ]
steps:
- uses: neondatabase/dev-actions/release-pr-notify@main
with:
slack-token: ${{ secrets.SLACK_BOT_TOKEN }}
slack-channel-id: ${{ vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID || 'C05QQ9J1BRC' }} # if not set, then `#test-release-notifications`
github-token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -2,15 +2,19 @@ name: Create Release Branch
on:
schedule:
- cron: '0 10 * * 2'
- cron: '0 6 * * 1'
workflow_dispatch:
jobs:
create_release_branch:
runs-on: [ubuntu-latest]
runs-on: [ ubuntu-latest ]
permissions:
contents: write # for `git push`
steps:
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
ref: main
@@ -25,9 +29,16 @@ jobs:
run: git push origin releases/${{ steps.date.outputs.date }}
- name: Create pull request into release
uses: thomaseizinger/create-pull-request@e3972219c86a56550fb70708d96800d8e24ba862 # 1.3.0
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
head: releases/${{ steps.date.outputs.date }}
base: release
title: Release ${{ steps.date.outputs.date }}
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
cat << EOF > body.md
## Release ${{ steps.date.outputs.date }}
**Please merge this PR using 'Create a merge commit'!**
EOF
gh pr create --title "Release ${{ steps.date.outputs.date }}" \
--body-file "body.md" \
--head "releases/${{ steps.date.outputs.date }}" \
--base "release"

3
.gitignore vendored
View File

@@ -18,3 +18,6 @@ test_output/
*.o
*.so
*.Po
# pgindent typedef lists
*.list

4
.gitmodules vendored
View File

@@ -6,3 +6,7 @@
path = vendor/postgres-v15
url = https://github.com/neondatabase/postgres.git
branch = REL_15_STABLE_neon
[submodule "vendor/postgres-v16"]
path = vendor/postgres-v16
url = https://github.com/neondatabase/postgres.git
branch = REL_16_STABLE_neon

4
.neon_clippy_args Normal file
View File

@@ -0,0 +1,4 @@
# * `-A unknown_lints` do not warn about unknown lint suppressions
# that people with newer toolchains might use
# * `-D warnings` - fail on any warnings (`cargo` returns non-zero exit status)
export CLIPPY_COMMON_ARGS="--locked --workspace --all-targets -- -A unknown_lints -D warnings"

View File

@@ -1,11 +1,12 @@
/compute_tools/ @neondatabase/control-plane
/compute_tools/ @neondatabase/control-plane @neondatabase/compute
/control_plane/ @neondatabase/compute @neondatabase/storage
/libs/pageserver_api/ @neondatabase/compute @neondatabase/storage
/libs/postgres_ffi/ @neondatabase/compute
/libs/remote_storage/ @neondatabase/storage
/libs/safekeeper_api/ @neondatabase/safekeepers
/pageserver/ @neondatabase/compute @neondatabase/storage
/libs/postgres_ffi/ @neondatabase/compute
/libs/remote_storage/ @neondatabase/storage
/libs/safekeeper_api/ @neondatabase/safekeepers
/libs/vm_monitor/ @neondatabase/autoscaling @neondatabase/compute
/pageserver/ @neondatabase/storage
/pgxn/ @neondatabase/compute
/proxy/ @neondatabase/control-plane
/proxy/ @neondatabase/proxy
/safekeeper/ @neondatabase/safekeepers
/vendor/ @neondatabase/compute

View File

@@ -2,13 +2,31 @@
Howdy! Usual good software engineering practices apply. Write
tests. Write comments. Follow standard Rust coding practices where
possible. Use 'cargo fmt' and 'clippy' to tidy up formatting.
possible. Use `cargo fmt` and `cargo clippy` to tidy up formatting.
There are soft spots in the code, which could use cleanup,
refactoring, additional comments, and so forth. Let's try to raise the
bar, and clean things up as we go. Try to leave code in a better shape
than it was before.
## Pre-commit hook
We have a sample pre-commit hook in `pre-commit.py`.
To set it up, run:
```bash
ln -s ../../pre-commit.py .git/hooks/pre-commit
```
This will run following checks on staged files before each commit:
- `rustfmt`
- checks for python files, see [obligatory checks](/docs/sourcetree.md#obligatory-checks).
There is also a separate script `./run_clippy.sh` that runs `cargo clippy` on the whole project
and `./scripts/reformat` that runs all formatting tools to ensure the project is up to date.
If you want to skip the hook, run `git commit` with `--no-verify` option.
## Submitting changes
1. Get at least one +1 on your PR before you push.
@@ -27,3 +45,28 @@ your patch's fault. Help to fix the root cause if something else has
broken the CI, before pushing.
*Happy Hacking!*
# How to run a CI pipeline on Pull Requests from external contributors
_An instruction for maintainers_
## TL;DR:
- Review the PR
- If and only if it looks **safe** (i.e. it doesn't contain any malicious code which could expose secrets or harm the CI), then:
- Press the "Approve and run" button in GitHub UI
- Add the `approved-for-ci-run` label to the PR
Repeat all steps after any change to the PR.
- When the changes are ready to get merged — merge the original PR (not the internal one)
## Longer version:
GitHub Actions triggered by the `pull_request` event don't share repository secrets with the forks (for security reasons).
So, passing the CI pipeline on Pull Requests from external contributors is impossible.
We're using the following approach to make it work:
- After the review, assign the `approved-for-ci-run` label to the PR if changes look safe
- A GitHub Action will create an internal branch and a new PR with the same changes (for example, for a PR `#1234`, it'll be a branch `ci-run/pr-1234`)
- Because the PR is created from the internal branch, it is able to access repository secrets (that's why it's crucial to make sure that the PR doesn't contain any malicious code that could expose our secrets or intentionally harm the CI)
- The label gets removed automatically, so to run CI again with new changes, the label should be added again (after the review)
For details see [`approved-for-ci-run.yml`](.github/workflows/approved-for-ci-run.yml)

3557
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,14 +1,33 @@
[workspace]
resolver = "2"
members = [
"compute_tools",
"control_plane",
"pageserver",
"pageserver/ctl",
"pageserver/client",
"proxy",
"safekeeper",
"storage_broker",
"s3_scrubber",
"workspace_hack",
"trace",
"libs/*",
"libs/compute_api",
"libs/pageserver_api",
"libs/postgres_ffi",
"libs/safekeeper_api",
"libs/utils",
"libs/consumption_metrics",
"libs/postgres_backend",
"libs/pq_proto",
"libs/tenant_size_model",
"libs/metrics",
"libs/postgres_connection",
"libs/remote_storage",
"libs/tracing-utils",
"libs/postgres_ffi/wal_craft",
"libs/vm_monitor",
"libs/walproposer",
]
[workspace.package]
@@ -18,19 +37,29 @@ license = "Apache-2.0"
## All dependency versions, used in the project
[workspace.dependencies]
anyhow = { version = "1.0", features = ["backtrace"] }
arc-swap = "1.6"
async-compression = { version = "0.4.0", features = ["tokio", "gzip", "zstd"] }
azure_core = "0.18"
azure_identity = "0.18"
azure_storage = "0.18"
azure_storage_blobs = "0.18"
flate2 = "1.0.26"
async-stream = "0.3"
async-trait = "0.1"
atty = "0.2.14"
aws-config = { version = "0.51.0", default-features = false, features=["rustls"] }
aws-sdk-s3 = "0.21.0"
aws-smithy-http = "0.51.0"
aws-types = "0.51.0"
aws-config = { version = "1.0", default-features = false, features=["rustls"] }
aws-sdk-s3 = "1.0"
aws-smithy-async = { version = "1.0", default-features = false, features=["rt-tokio"] }
aws-smithy-types = "1.0"
aws-credential-types = "1.0"
axum = { version = "0.6.20", features = ["ws"] }
base64 = "0.13.0"
bincode = "1.3"
bindgen = "0.61"
bindgen = "0.65"
bstr = "1.0"
byteorder = "1.4"
bytes = "1.0"
camino = "1.1.6"
cfg-if = "1.0.0"
chrono = { version = "0.4", default-features = false, features = ["clock"] }
clap = { version = "4.0", features = ["derive"] }
close_fds = "0.3.2"
@@ -38,6 +67,8 @@ comfy-table = "6.1"
const_format = "0.2"
crc32c = "0.6"
crossbeam-utils = "0.8.5"
dashmap = { version = "5.5.0", features = ["raw-api"] }
either = "1.8"
enum-map = "2.4.2"
enumset = "1.0.12"
fail = "0.5.0"
@@ -49,86 +80,111 @@ git-version = "0.3"
hashbrown = "0.13"
hashlink = "0.8.1"
hex = "0.4"
hex-literal = "0.3"
hex-literal = "0.4"
hmac = "0.12.1"
hostname = "0.3.1"
http-types = { version = "2", default-features = false }
humantime = "2.1"
humantime-serde = "1.1.1"
hyper = "0.14"
hyper-tungstenite = "0.9"
hyper-tungstenite = "0.11"
inotify = "0.10.2"
ipnet = "2.9.0"
itertools = "0.10"
jsonwebtoken = "8"
libc = "0.2"
md5 = "0.7.0"
memoffset = "0.8"
native-tls = "0.2"
nix = "0.26"
notify = "5.0.0"
num_cpus = "1.15"
num-traits = "0.2.15"
once_cell = "1.13"
opentelemetry = "0.18.0"
opentelemetry-otlp = { version = "0.11.0", default_features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.10.0"
tracing-opentelemetry = "0.18.0"
opentelemetry = "0.19.0"
opentelemetry-otlp = { version = "0.12.0", default_features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.11.0"
parking_lot = "0.12"
pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }
pin-project-lite = "0.2"
prometheus = {version = "0.13", default_features=false, features = ["process"]} # removes protobuf dependency
prost = "0.11"
rand = "0.8"
regex = "1.4"
regex = "1.10.2"
reqwest = { version = "0.11", default-features = false, features = ["rustls-tls"] }
reqwest-tracing = { version = "0.4.0", features = ["opentelemetry_0_19"] }
reqwest-middleware = "0.2.0"
reqwest-retry = "0.2.2"
routerify = "3"
rpds = "0.12.0"
rustls = "0.20"
rpds = "0.13"
rustc-hash = "1.1.0"
rustls = "0.21"
rustls-pemfile = "1"
rustls-split = "0.3"
scopeguard = "1.1"
sentry = { version = "0.29", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
sysinfo = "0.29.2"
sd-notify = "0.4.1"
sentry = { version = "0.31", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"
serde_path_to_error = "0.1"
serde_with = "2.0"
serde_assert = "0.5.0"
sha2 = "0.10.2"
signal-hook = "0.3"
socket2 = "0.4.4"
smallvec = "1.11"
smol_str = { version = "0.2.0", features = ["serde"] }
socket2 = "0.5"
strum = "0.24"
strum_macros = "0.24"
svg_fmt = "0.4.1"
sync_wrapper = "0.1.2"
tar = "0.4"
task-local-extensions = "0.1.4"
test-context = "0.1"
thiserror = "1.0"
tls-listener = { version = "0.6", features = ["rustls", "hyper-h1"] }
tls-listener = { version = "0.7", features = ["rustls", "hyper-h1"] }
tokio = { version = "1.17", features = ["macros"] }
tokio-postgres-rustls = "0.9.0"
tokio-rustls = "0.23"
tokio-io-timeout = "1.2.0"
tokio-postgres-rustls = "0.10.0"
tokio-rustls = "0.24"
tokio-stream = "0.1"
tokio-util = { version = "0.7", features = ["io"] }
toml = "0.5"
toml_edit = { version = "0.17", features = ["easy"] }
tonic = {version = "0.8", features = ["tls", "tls-roots"]}
tokio-tar = "0.3"
tokio-util = { version = "0.7.10", features = ["io", "rt"] }
toml = "0.7"
toml_edit = "0.19"
tonic = {version = "0.9", features = ["tls", "tls-roots"]}
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tracing-error = "0.2.0"
tracing-opentelemetry = "0.19.0"
tracing-subscriber = { version = "0.3", default_features = false, features = ["smallvec", "fmt", "tracing-log", "std", "env-filter", "json"] }
url = "2.2"
uuid = { version = "1.2", features = ["v4", "serde"] }
walkdir = "2.3.2"
webpki-roots = "0.22.5"
x509-parser = "0.14"
webpki-roots = "0.25"
x509-parser = "0.15"
## TODO replace this with tracing
env_logger = "0.10"
log = "0.4"
## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
tokio-tar = { git = "https://github.com/neondatabase/tokio-tar.git", rev="404df61437de0feef49ba2ccdbdd94eb8ad6e142" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }
postgres-native-tls = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }
## Other git libraries
heapless = { default-features=false, features=[], git = "https://github.com/japaric/heapless.git", rev = "644653bf3b831c6bb4963be2de24804acf5e5001" } # upstream release pending
## Local libraries
compute_api = { version = "0.1", path = "./libs/compute_api/" }
consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }
metrics = { version = "0.1", path = "./libs/metrics/" }
pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }
pageserver_client = { path = "./pageserver/client" }
postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }
postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }
postgres_ffi = { version = "0.1", path = "./libs/postgres_ffi/" }
pq_proto = { version = "0.1", path = "./libs/pq_proto/" }
@@ -138,21 +194,24 @@ storage_broker = { version = "0.1", path = "./storage_broker/" } # Note: main br
tenant_size_model = { version = "0.1", path = "./libs/tenant_size_model/" }
tracing-utils = { version = "0.1", path = "./libs/tracing-utils/" }
utils = { version = "0.1", path = "./libs/utils/" }
vm_monitor = { version = "0.1", path = "./libs/vm_monitor/" }
walproposer = { version = "0.1", path = "./libs/walproposer/" }
## Common library dependency
workspace_hack = { version = "0.1", path = "./workspace_hack/" }
## Build dependencies
criterion = "0.4"
rcgen = "0.10"
rstest = "0.16"
tempfile = "3.2"
tonic-build = "0.8"
criterion = "0.5.1"
rcgen = "0.11"
rstest = "0.18"
camino-tempfile = "1.0.2"
tonic-build = "0.9"
[patch.crates-io]
# This is only needed for proxy's tests.
# TODO: we should probably fork `tokio-postgres-rustls` instead.
[patch.crates-io]
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch="neon" }
################# Binary contents sections

View File

@@ -2,7 +2,7 @@
### The image itself is mainly used as a container for the binaries and for starting e2e tests with custom parameters.
### By default, the binaries inside the image have some mock parameters and can start, but are not intended to be used
### inside this image in the real deployments.
ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com
ARG REPOSITORY=neondatabase
ARG IMAGE=rust
ARG TAG=pinned
@@ -12,6 +12,7 @@ WORKDIR /home/nonroot
COPY --chown=nonroot vendor/postgres-v14 vendor/postgres-v14
COPY --chown=nonroot vendor/postgres-v15 vendor/postgres-v15
COPY --chown=nonroot vendor/postgres-v16 vendor/postgres-v16
COPY --chown=nonroot pgxn pgxn
COPY --chown=nonroot Makefile Makefile
COPY --chown=nonroot scripts/ninstall.sh scripts/ninstall.sh
@@ -26,6 +27,7 @@ RUN set -e \
FROM $REPOSITORY/$IMAGE:$TAG AS build
WORKDIR /home/nonroot
ARG GIT_VERSION=local
ARG BUILD_TAG
# Enable https://github.com/paritytech/cachepot to cache Rust crates' compilation results in Docker builds.
# Set up cachepot to use an AWS S3 bucket for cache results, to reuse it between `docker build` invocations.
@@ -39,12 +41,21 @@ ARG CACHEPOT_BUCKET=neon-github-dev
COPY --from=pg-build /home/nonroot/pg_install/v14/include/postgresql/server pg_install/v14/include/postgresql/server
COPY --from=pg-build /home/nonroot/pg_install/v15/include/postgresql/server pg_install/v15/include/postgresql/server
COPY . .
COPY --from=pg-build /home/nonroot/pg_install/v16/include/postgresql/server pg_install/v16/include/postgresql/server
COPY --chown=nonroot . .
# Show build caching stats to check if it was used in the end.
# Has to be the part of the same RUN since cachepot daemon is killed in the end of this RUN, losing the compilation stats.
RUN set -e \
&& mold -run cargo build --bin pageserver --bin pageserver_binutils --bin draw_timeline_dir --bin safekeeper --bin storage_broker --bin proxy --locked --release \
&& mold -run cargo build \
--bin pg_sni_router \
--bin pageserver \
--bin pagectl \
--bin safekeeper \
--bin storage_broker \
--bin proxy \
--bin neon_local \
--locked --release \
&& cachepot -s
# Build final image
@@ -57,21 +68,24 @@ RUN set -e \
&& apt install -y \
libreadline-dev \
libseccomp-dev \
libicu67 \
openssl \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
&& useradd -d /data neon \
&& chown -R neon:neon /data
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pg_sni_router /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver_binutils /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/draw_timeline_dir /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pagectl /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/safekeeper /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_broker /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_broker /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/proxy /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/neon_local /usr/local/bin
COPY --from=pg-build /home/nonroot/pg_install/v14 /usr/local/v14/
COPY --from=pg-build /home/nonroot/pg_install/v15 /usr/local/v15/
COPY --from=pg-build /home/nonroot/pg_install/v16 /usr/local/v16/
COPY --from=pg-build /home/nonroot/postgres_install.tar.gz /data/
# By default, pageserver uses `.neon/` working directory in WORKDIR, so create one and fill it with the dummy config.

View File

@@ -1,6 +1,8 @@
ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com
ARG PG_VERSION
ARG REPOSITORY=neondatabase
ARG IMAGE=rust
ARG TAG=pinned
ARG BUILD_TAG
#########################################################################################
#
@@ -11,7 +13,7 @@ FROM debian:bullseye-slim AS build-deps
RUN apt update && \
apt install -y git autoconf automake libtool build-essential bison flex libreadline-dev \
zlib1g-dev libxml2-dev libcurl4-openssl-dev libossp-uuid-dev wget pkg-config libssl-dev \
libicu-dev
libicu-dev libxslt1-dev liblz4-dev libzstd-dev zstd
#########################################################################################
#
@@ -23,18 +25,30 @@ FROM build-deps AS pg-build
ARG PG_VERSION
COPY vendor/postgres-${PG_VERSION} postgres
RUN cd postgres && \
./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp --with-icu && \
export CONFIGURE_CMD="./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp \
--with-icu --with-libxml --with-libxslt --with-lz4" && \
if [ "${PG_VERSION}" != "v14" ]; then \
# zstd is available only from PG15
export CONFIGURE_CMD="${CONFIGURE_CMD} --with-zstd"; \
fi && \
eval $CONFIGURE_CMD && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C contrib/ install && \
# Install headers
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/include install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/interfaces/libpq install && \
# Enable some of contrib extensions
echo 'trusted = true' >> /usr/local/pgsql/share/extension/autoinc.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/bloom.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrowlocks.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/earthdistance.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/insert_username.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/intagg.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/moddatetime.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_stat_statements.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrowlocks.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgstattuple.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/earthdistance.control
echo 'trusted = true' >> /usr/local/pgsql/share/extension/refint.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/xml2.control
#########################################################################################
#
@@ -50,17 +64,21 @@ RUN apt update && \
libcgal-dev libgdal-dev libgmp-dev libmpfr-dev libopenscenegraph-dev libprotobuf-c-dev \
protobuf-c-compiler xsltproc
RUN wget https://gitlab.com/Oslandia/SFCGAL/-/archive/v1.3.10/SFCGAL-v1.3.10.tar.gz && \
tar zxvf SFCGAL-v1.3.10.tar.gz && \
cd SFCGAL-v1.3.10 && cmake . && make -j $(getconf _NPROCESSORS_ONLN) && \
# SFCGAL > 1.3 requires CGAL > 5.2, Bullseye's libcgal-dev is 5.2
RUN wget https://gitlab.com/Oslandia/SFCGAL/-/archive/v1.3.10/SFCGAL-v1.3.10.tar.gz -O SFCGAL.tar.gz && \
echo "4e39b3b2adada6254a7bdba6d297bb28e1a9835a9f879b74f37e2dab70203232 SFCGAL.tar.gz" | sha256sum --check && \
mkdir sfcgal-src && cd sfcgal-src && tar xvzf ../SFCGAL.tar.gz --strip-components=1 -C . && \
cmake -DCMAKE_BUILD_TYPE=Release . && make -j $(getconf _NPROCESSORS_ONLN) && \
DESTDIR=/sfcgal make install -j $(getconf _NPROCESSORS_ONLN) && \
make clean && cp -R /sfcgal/* /
RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.1.tar.gz && \
tar xvzf postgis-3.3.1.tar.gz && \
cd postgis-3.3.1 && \
ENV PATH "/usr/local/pgsql/bin:$PATH"
RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.3.tar.gz -O postgis.tar.gz && \
echo "74eb356e3f85f14233791013360881b6748f78081cc688ff9d6f0f673a762d13 postgis.tar.gz" | sha256sum --check && \
mkdir postgis-src && cd postgis-src && tar xvzf ../postgis.tar.gz --strip-components=1 -C . && \
find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /before.txt &&\
./autogen.sh && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
./configure --with-sfcgal=/usr/local/bin/sfcgal-config && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
cd extensions/postgis && \
@@ -72,7 +90,28 @@ RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.1.tar.gz && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_topology.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control && \
mkdir -p /extensions/postgis && \
cp /usr/local/pgsql/share/extension/postgis.control /extensions/postgis && \
cp /usr/local/pgsql/share/extension/postgis_raster.control /extensions/postgis && \
cp /usr/local/pgsql/share/extension/postgis_sfcgal.control /extensions/postgis && \
cp /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control /extensions/postgis && \
cp /usr/local/pgsql/share/extension/postgis_topology.control /extensions/postgis && \
cp /usr/local/pgsql/share/extension/address_standardizer.control /extensions/postgis && \
cp /usr/local/pgsql/share/extension/address_standardizer_data_us.control /extensions/postgis
RUN wget https://github.com/pgRouting/pgrouting/archive/v3.4.2.tar.gz -O pgrouting.tar.gz && \
echo "cac297c07d34460887c4f3b522b35c470138760fe358e351ad1db4edb6ee306e pgrouting.tar.gz" | sha256sum --check && \
mkdir pgrouting-src && cd pgrouting-src && tar xvzf ../pgrouting.tar.gz --strip-components=1 -C . && \
mkdir build && cd build && \
cmake -DCMAKE_BUILD_TYPE=Release .. && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrouting.control && \
find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /after.txt &&\
cp /usr/local/pgsql/share/extension/pgrouting.control /extensions/postgis && \
sort -o /before.txt /before.txt && sort -o /after.txt /after.txt && \
comm -13 /before.txt /after.txt | tar --directory=/usr/local/pgsql --zstd -cf /extensions/postgis.tar.zst -T -
#########################################################################################
#
@@ -83,30 +122,31 @@ RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.1.tar.gz && \
FROM build-deps AS plv8-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y ninja-build python3-dev libc++-dev libc++abi-dev libncurses5 binutils
apt install -y ninja-build python3-dev libncurses5 binutils clang
# https://github.com/plv8/plv8/issues/475:
# v8 uses gold for linking and sets `--thread-count=4` which breaks
# gold version <= 1.35 (https://sourceware.org/bugzilla/show_bug.cgi?id=23607)
# Install newer gold version manually as debian-testing binutils version updates
# libc version, which in turn breaks other extension built against non-testing libc.
RUN wget https://ftp.gnu.org/gnu/binutils/binutils-2.38.tar.gz && \
tar xvzf binutils-2.38.tar.gz && \
cd binutils-2.38 && \
cd libiberty && ./configure && make -j $(getconf _NPROCESSORS_ONLN) && \
cd ../bfd && ./configure && make bfdver.h && \
cd ../gold && ./configure && make -j $(getconf _NPROCESSORS_ONLN) && make install && \
cp /usr/local/bin/ld.gold /usr/bin/gold
# Sed is used to patch for https://github.com/plv8/plv8/issues/503
RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.4.tar.gz && \
tar xvzf v3.1.4.tar.gz && \
cd plv8-3.1.4 && \
RUN case "${PG_VERSION}" in \
"v14" | "v15") \
export PLV8_VERSION=3.1.5 \
export PLV8_CHECKSUM=1e108d5df639e4c189e1c5bdfa2432a521c126ca89e7e5a969d46899ca7bf106 \
;; \
"v16") \
export PLV8_VERSION=3.1.8 \
export PLV8_CHECKSUM=92b10c7db39afdae97ff748c9ec54713826af222c459084ad002571b79eb3f49 \
;; \
*) \
echo "Export the valid PG_VERSION variable" && exit 1 \
;; \
esac && \
wget https://github.com/plv8/plv8/archive/refs/tags/v${PLV8_VERSION}.tar.gz -O plv8.tar.gz && \
echo "${PLV8_CHECKSUM} plv8.tar.gz" | sha256sum --check && \
mkdir plv8-src && cd plv8-src && tar xvzf ../plv8.tar.gz --strip-components=1 -C . && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
sed -i 's/MemoryContextAlloc(/MemoryContextAllocZero(/' plv8.cc && \
make DOCKER=1 -j $(getconf _NPROCESSORS_ONLN) install && \
rm -rf /plv8-* && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control
find /usr/local/pgsql/ -name "plv8-*.so" | xargs strip && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plcoffee.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plls.control
#########################################################################################
#
@@ -117,27 +157,37 @@ RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.4.tar.gz && \
FROM build-deps AS h3-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# packaged cmake is too old
RUN wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-x86_64.sh \
RUN case "$(uname -m)" in \
"x86_64") \
export CMAKE_CHECKSUM=739d372726cb23129d57a539ce1432453448816e345e1545f6127296926b6754 \
;; \
"aarch64") \
export CMAKE_CHECKSUM=281b42627c9a1beed03e29706574d04c6c53fae4994472e90985ef018dd29c02 \
;; \
*) \
echo "Unsupported architecture '$(uname -m)'. Supported are x86_64 and aarch64" && exit 1 \
;; \
esac && \
wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-$(uname -m).sh \
-q -O /tmp/cmake-install.sh \
&& echo "${CMAKE_CHECKSUM} /tmp/cmake-install.sh" | sha256sum --check \
&& chmod u+x /tmp/cmake-install.sh \
&& /tmp/cmake-install.sh --skip-license --prefix=/usr/local/ \
&& rm /tmp/cmake-install.sh
RUN wget https://github.com/uber/h3/archive/refs/tags/v4.0.1.tar.gz -O h3.tgz && \
tar xvzf h3.tgz && \
cd h3-4.0.1 && \
mkdir build && \
cd build && \
RUN wget https://github.com/uber/h3/archive/refs/tags/v4.1.0.tar.gz -O h3.tar.gz && \
echo "ec99f1f5974846bde64f4513cf8d2ea1b8d172d2218ab41803bf6a63532272bc h3.tar.gz" | sha256sum --check && \
mkdir h3-src && cd h3-src && tar xvzf ../h3.tar.gz --strip-components=1 -C . && \
mkdir build && cd build && \
cmake .. -DCMAKE_BUILD_TYPE=Release && \
make -j $(getconf _NPROCESSORS_ONLN) && \
DESTDIR=/h3 make install && \
cp -R /h3/usr / && \
rm -rf build
RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.0.1.tar.gz -O h3-pg.tgz && \
tar xvzf h3-pg.tgz && \
cd h3-pg-4.0.1 && \
RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.1.3.tar.gz -O h3-pg.tar.gz && \
echo "5c17f09a820859ffe949f847bebf1be98511fb8f1bd86f94932512c00479e324 h3-pg.tar.gz" | sha256sum --check && \
mkdir h3-pg-src && cd h3-pg-src && tar xvzf ../h3-pg.tar.gz --strip-components=1 -C . && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
@@ -153,9 +203,9 @@ RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.0.1.tar.gz -O h3
FROM build-deps AS unit-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz && \
tar xvzf 7.7.tar.gz && \
cd postgresql-unit-7.7 && \
RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz -O postgresql-unit.tar.gz && \
echo "411d05beeb97e5a4abf17572bfcfbb5a68d98d1018918feff995f6ee3bb03e79 postgresql-unit.tar.gz" | sha256sum --check && \
mkdir postgresql-unit-src && cd postgresql-unit-src && tar xvzf ../postgresql-unit.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
# unit extension's "create extension" script relies on absolute install path to fill some reference tables.
@@ -165,6 +215,524 @@ RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz &
find /usr/local/pgsql/share/extension/ -name "unit*.sql" -print0 | xargs -0 sed -i "s|pgsql/||g" && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/unit.control
#########################################################################################
#
# Layer "vector-pg-build"
# compile pgvector extension
#
#########################################################################################
FROM build-deps AS vector-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.5.1.tar.gz -O pgvector.tar.gz && \
echo "cc7a8e034a96e30a819911ac79d32f6bc47bdd1aa2de4d7d4904e26b83209dc8 pgvector.tar.gz" | sha256sum --check && \
mkdir pgvector-src && cd pgvector-src && tar xvzf ../pgvector.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/vector.control
#########################################################################################
#
# Layer "pgjwt-pg-build"
# compile pgjwt extension
#
#########################################################################################
FROM build-deps AS pgjwt-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# 9742dab1b2f297ad3811120db7b21451bca2d3c9 made on 13/11/2021
RUN wget https://github.com/michelp/pgjwt/archive/9742dab1b2f297ad3811120db7b21451bca2d3c9.tar.gz -O pgjwt.tar.gz && \
echo "cfdefb15007286f67d3d45510f04a6a7a495004be5b3aecb12cda667e774203f pgjwt.tar.gz" | sha256sum --check && \
mkdir pgjwt-src && cd pgjwt-src && tar xvzf ../pgjwt.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgjwt.control
#########################################################################################
#
# Layer "hypopg-pg-build"
# compile hypopg extension
#
#########################################################################################
FROM build-deps AS hypopg-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/HypoPG/hypopg/archive/refs/tags/1.4.0.tar.gz -O hypopg.tar.gz && \
echo "0821011743083226fc9b813c1f2ef5897a91901b57b6bea85a78e466187c6819 hypopg.tar.gz" | sha256sum --check && \
mkdir hypopg-src && cd hypopg-src && tar xvzf ../hypopg.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/hypopg.control
#########################################################################################
#
# Layer "pg-hashids-pg-build"
# compile pg_hashids extension
#
#########################################################################################
FROM build-deps AS pg-hashids-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/iCyberon/pg_hashids/archive/refs/tags/v1.2.1.tar.gz -O pg_hashids.tar.gz && \
echo "74576b992d9277c92196dd8d816baa2cc2d8046fe102f3dcd7f3c3febed6822a pg_hashids.tar.gz" | sha256sum --check && \
mkdir pg_hashids-src && cd pg_hashids-src && tar xvzf ../pg_hashids.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_hashids.control
#########################################################################################
#
# Layer "rum-pg-build"
# compile rum extension
#
#########################################################################################
FROM build-deps AS rum-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/postgrespro/rum/archive/refs/tags/1.3.13.tar.gz -O rum.tar.gz && \
echo "6ab370532c965568df6210bd844ac6ba649f53055e48243525b0b7e5c4d69a7d rum.tar.gz" | sha256sum --check && \
mkdir rum-src && cd rum-src && tar xvzf ../rum.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/rum.control
#########################################################################################
#
# Layer "pgtap-pg-build"
# compile pgTAP extension
#
#########################################################################################
FROM build-deps AS pgtap-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/theory/pgtap/archive/refs/tags/v1.2.0.tar.gz -O pgtap.tar.gz && \
echo "9c7c3de67ea41638e14f06da5da57bac6f5bd03fea05c165a0ec862205a5c052 pgtap.tar.gz" | sha256sum --check && \
mkdir pgtap-src && cd pgtap-src && tar xvzf ../pgtap.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgtap.control
#########################################################################################
#
# Layer "ip4r-pg-build"
# compile ip4r extension
#
#########################################################################################
FROM build-deps AS ip4r-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/RhodiumToad/ip4r/archive/refs/tags/2.4.2.tar.gz -O ip4r.tar.gz && \
echo "0f7b1f159974f49a47842a8ab6751aecca1ed1142b6d5e38d81b064b2ead1b4b ip4r.tar.gz" | sha256sum --check && \
mkdir ip4r-src && cd ip4r-src && tar xvzf ../ip4r.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/ip4r.control
#########################################################################################
#
# Layer "prefix-pg-build"
# compile Prefix extension
#
#########################################################################################
FROM build-deps AS prefix-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/dimitri/prefix/archive/refs/tags/v1.2.10.tar.gz -O prefix.tar.gz && \
echo "4342f251432a5f6fb05b8597139d3ccde8dcf87e8ca1498e7ee931ca057a8575 prefix.tar.gz" | sha256sum --check && \
mkdir prefix-src && cd prefix-src && tar xvzf ../prefix.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/prefix.control
#########################################################################################
#
# Layer "hll-pg-build"
# compile hll extension
#
#########################################################################################
FROM build-deps AS hll-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/citusdata/postgresql-hll/archive/refs/tags/v2.18.tar.gz -O hll.tar.gz && \
echo "e2f55a6f4c4ab95ee4f1b4a2b73280258c5136b161fe9d059559556079694f0e hll.tar.gz" | sha256sum --check && \
mkdir hll-src && cd hll-src && tar xvzf ../hll.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/hll.control
#########################################################################################
#
# Layer "plpgsql-check-pg-build"
# compile plpgsql_check extension
#
#########################################################################################
FROM build-deps AS plpgsql-check-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/okbob/plpgsql_check/archive/refs/tags/v2.5.3.tar.gz -O plpgsql_check.tar.gz && \
echo "6631ec3e7fb3769eaaf56e3dfedb829aa761abf163d13dba354b4c218508e1c0 plpgsql_check.tar.gz" | sha256sum --check && \
mkdir plpgsql_check-src && cd plpgsql_check-src && tar xvzf ../plpgsql_check.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plpgsql_check.control
#########################################################################################
#
# Layer "timescaledb-pg-build"
# compile timescaledb extension
#
#########################################################################################
FROM build-deps AS timescaledb-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ARG PG_VERSION
ENV PATH "/usr/local/pgsql/bin:$PATH"
RUN case "${PG_VERSION}" in \
"v14" | "v15") \
export TIMESCALEDB_VERSION=2.10.1 \
export TIMESCALEDB_CHECKSUM=6fca72a6ed0f6d32d2b3523951ede73dc5f9b0077b38450a029a5f411fdb8c73 \
;; \
*) \
export TIMESCALEDB_VERSION=2.13.0 \
export TIMESCALEDB_CHECKSUM=584a351c7775f0e067eaa0e7277ea88cab9077cc4c455cbbf09a5d9723dce95d \
;; \
esac && \
apt-get update && \
apt-get install -y cmake && \
wget https://github.com/timescale/timescaledb/archive/refs/tags/${TIMESCALEDB_VERSION}.tar.gz -O timescaledb.tar.gz && \
echo "${TIMESCALEDB_CHECKSUM} timescaledb.tar.gz" | sha256sum --check && \
mkdir timescaledb-src && cd timescaledb-src && tar xvzf ../timescaledb.tar.gz --strip-components=1 -C . && \
./bootstrap -DSEND_TELEMETRY_DEFAULT:BOOL=OFF -DUSE_TELEMETRY:BOOL=OFF -DAPACHE_ONLY:BOOL=ON -DCMAKE_BUILD_TYPE=Release && \
cd build && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make install -j $(getconf _NPROCESSORS_ONLN) && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/timescaledb.control
#########################################################################################
#
# Layer "pg-hint-plan-pg-build"
# compile pg_hint_plan extension
#
#########################################################################################
FROM build-deps AS pg-hint-plan-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ARG PG_VERSION
ENV PATH "/usr/local/pgsql/bin:$PATH"
RUN case "${PG_VERSION}" in \
"v14") \
export PG_HINT_PLAN_VERSION=14_1_4_1 \
export PG_HINT_PLAN_CHECKSUM=c3501becf70ead27f70626bce80ea401ceac6a77e2083ee5f3ff1f1444ec1ad1 \
;; \
"v15") \
export PG_HINT_PLAN_VERSION=15_1_5_0 \
export PG_HINT_PLAN_CHECKSUM=564cbbf4820973ffece63fbf76e3c0af62c4ab23543142c7caaa682bc48918be \
;; \
"v16") \
export PG_HINT_PLAN_VERSION=16_1_6_0 \
export PG_HINT_PLAN_CHECKSUM=fc85a9212e7d2819d4ae4ac75817481101833c3cfa9f0fe1f980984e12347d00 \
;; \
*) \
echo "Export the valid PG_HINT_PLAN_VERSION variable" && exit 1 \
;; \
esac && \
wget https://github.com/ossc-db/pg_hint_plan/archive/refs/tags/REL${PG_HINT_PLAN_VERSION}.tar.gz -O pg_hint_plan.tar.gz && \
echo "${PG_HINT_PLAN_CHECKSUM} pg_hint_plan.tar.gz" | sha256sum --check && \
mkdir pg_hint_plan-src && cd pg_hint_plan-src && tar xvzf ../pg_hint_plan.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make install -j $(getconf _NPROCESSORS_ONLN) && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_hint_plan.control
#########################################################################################
#
# Layer "kq-imcx-pg-build"
# compile kq_imcx extension
#
#########################################################################################
FROM build-deps AS kq-imcx-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN apt-get update && \
apt-get install -y git libgtk2.0-dev libpq-dev libpam-dev libxslt-dev libkrb5-dev cmake && \
wget https://github.com/ketteq-neon/postgres-exts/archive/e0bd1a9d9313d7120c1b9c7bb15c48c0dede4c4e.tar.gz -O kq_imcx.tar.gz && \
echo "dc93a97ff32d152d32737ba7e196d9687041cda15e58ab31344c2f2de8855336 kq_imcx.tar.gz" | sha256sum --check && \
mkdir kq_imcx-src && cd kq_imcx-src && tar xvzf ../kq_imcx.tar.gz --strip-components=1 -C . && \
find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /before.txt &&\
mkdir build && cd build && \
cmake -DCMAKE_BUILD_TYPE=Release .. && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/kq_imcx.control && \
find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /after.txt &&\
mkdir -p /extensions/kq_imcx && cp /usr/local/pgsql/share/extension/kq_imcx.control /extensions/kq_imcx && \
sort -o /before.txt /before.txt && sort -o /after.txt /after.txt && \
comm -13 /before.txt /after.txt | tar --directory=/usr/local/pgsql --zstd -cf /extensions/kq_imcx.tar.zst -T -
#########################################################################################
#
# Layer "pg-cron-pg-build"
# compile pg_cron extension
#
#########################################################################################
FROM build-deps AS pg-cron-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/citusdata/pg_cron/archive/refs/tags/v1.6.0.tar.gz -O pg_cron.tar.gz && \
echo "383a627867d730222c272bfd25cd5e151c578d73f696d32910c7db8c665cc7db pg_cron.tar.gz" | sha256sum --check && \
mkdir pg_cron-src && cd pg_cron-src && tar xvzf ../pg_cron.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_cron.control
#########################################################################################
#
# Layer "rdkit-pg-build"
# compile rdkit extension
#
#########################################################################################
FROM build-deps AS rdkit-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt-get update && \
apt-get install -y \
cmake \
libboost-iostreams1.74-dev \
libboost-regex1.74-dev \
libboost-serialization1.74-dev \
libboost-system1.74-dev \
libeigen3-dev \
libfreetype6-dev
ENV PATH "/usr/local/pgsql/bin/:/usr/local/pgsql/:$PATH"
RUN wget https://github.com/rdkit/rdkit/archive/refs/tags/Release_2023_03_3.tar.gz -O rdkit.tar.gz && \
echo "bdbf9a2e6988526bfeb8c56ce3cdfe2998d60ac289078e2215374288185e8c8d rdkit.tar.gz" | sha256sum --check && \
mkdir rdkit-src && cd rdkit-src && tar xvzf ../rdkit.tar.gz --strip-components=1 -C . && \
cmake \
-D RDK_BUILD_CAIRO_SUPPORT=OFF \
-D RDK_BUILD_INCHI_SUPPORT=ON \
-D RDK_BUILD_AVALON_SUPPORT=ON \
-D RDK_BUILD_PYTHON_WRAPPERS=OFF \
-D RDK_BUILD_DESCRIPTORS3D=OFF \
-D RDK_BUILD_FREESASA_SUPPORT=OFF \
-D RDK_BUILD_COORDGEN_SUPPORT=ON \
-D RDK_BUILD_MOLINTERCHANGE_SUPPORT=OFF \
-D RDK_BUILD_YAEHMOP_SUPPORT=OFF \
-D RDK_BUILD_STRUCTCHECKER_SUPPORT=OFF \
-D RDK_USE_URF=OFF \
-D RDK_BUILD_PGSQL=ON \
-D RDK_PGSQL_STATIC=ON \
-D PostgreSQL_CONFIG=pg_config \
-D PostgreSQL_INCLUDE_DIR=`pg_config --includedir` \
-D PostgreSQL_TYPE_INCLUDE_DIR=`pg_config --includedir-server` \
-D PostgreSQL_LIBRARY_DIR=`pg_config --libdir` \
-D RDK_INSTALL_INTREE=OFF \
-D CMAKE_BUILD_TYPE=Release \
. && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/rdkit.control
#########################################################################################
#
# Layer "pg-uuidv7-pg-build"
# compile pg_uuidv7 extension
#
#########################################################################################
FROM build-deps AS pg-uuidv7-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/fboulnois/pg_uuidv7/archive/refs/tags/v1.0.1.tar.gz -O pg_uuidv7.tar.gz && \
echo "0d0759ab01b7fb23851ecffb0bce27822e1868a4a5819bfd276101c716637a7a pg_uuidv7.tar.gz" | sha256sum --check && \
mkdir pg_uuidv7-src && cd pg_uuidv7-src && tar xvzf ../pg_uuidv7.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_uuidv7.control
#########################################################################################
#
# Layer "pg-roaringbitmap-pg-build"
# compile pg_roaringbitmap extension
#
#########################################################################################
FROM build-deps AS pg-roaringbitmap-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/ChenHuajun/pg_roaringbitmap/archive/refs/tags/v0.5.4.tar.gz -O pg_roaringbitmap.tar.gz && \
echo "b75201efcb1c2d1b014ec4ae6a22769cc7a224e6e406a587f5784a37b6b5a2aa pg_roaringbitmap.tar.gz" | sha256sum --check && \
mkdir pg_roaringbitmap-src && cd pg_roaringbitmap-src && tar xvzf ../pg_roaringbitmap.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/roaringbitmap.control
#########################################################################################
#
# Layer "pg-embedding-pg-build"
# compile pg_embedding extension
#
#########################################################################################
FROM build-deps AS pg-embedding-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in \
"v14" | "v15") \
export PG_EMBEDDING_VERSION=0.3.5 \
export PG_EMBEDDING_CHECKSUM=0e95b27b8b6196e2cf0a0c9ec143fe2219b82e54c5bb4ee064e76398cbe69ae9 \
;; \
*) \
echo "pg_embedding not supported on this PostgreSQL version. Use pgvector instead." && exit 0;; \
esac && \
wget https://github.com/neondatabase/pg_embedding/archive/refs/tags/${PG_EMBEDDING_VERSION}.tar.gz -O pg_embedding.tar.gz && \
echo "${PG_EMBEDDING_CHECKSUM} pg_embedding.tar.gz" | sha256sum --check && \
mkdir pg_embedding-src && cd pg_embedding-src && tar xvzf ../pg_embedding.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install
#########################################################################################
#
# Layer "pg-anon-pg-build"
# compile anon extension
#
#########################################################################################
FROM build-deps AS pg-anon-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://gitlab.com/dalibo/postgresql_anonymizer/-/archive/1.1.0/postgresql_anonymizer-1.1.0.tar.gz -O pg_anon.tar.gz && \
echo "08b09d2ff9b962f96c60db7e6f8e79cf7253eb8772516998fc35ece08633d3ad pg_anon.tar.gz" | sha256sum --check && \
mkdir pg_anon-src && cd pg_anon-src && tar xvzf ../pg_anon.tar.gz --strip-components=1 -C . && \
find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /before.txt &&\
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/anon.control && \
find /usr/local/pgsql -type f | sed 's|^/usr/local/pgsql/||' > /after.txt &&\
mkdir -p /extensions/anon && cp /usr/local/pgsql/share/extension/anon.control /extensions/anon && \
sort -o /before.txt /before.txt && sort -o /after.txt /after.txt && \
comm -13 /before.txt /after.txt | tar --directory=/usr/local/pgsql --zstd -cf /extensions/anon.tar.zst -T -
#########################################################################################
#
# Layer "rust extensions"
# This layer is used to build `pgrx` deps
#
#########################################################################################
FROM build-deps AS rust-extensions-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt-get update && \
apt-get install -y curl libclang-dev cmake && \
useradd -ms /bin/bash nonroot -b /home
ENV HOME=/home/nonroot
ENV PATH="/home/nonroot/.cargo/bin:/usr/local/pgsql/bin/:$PATH"
USER nonroot
WORKDIR /home/nonroot
ARG PG_VERSION
RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && \
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain stable && \
rm rustup-init && \
cargo install --locked --version 0.10.2 cargo-pgrx && \
/bin/bash -c 'cargo pgrx init --pg${PG_VERSION:1}=/usr/local/pgsql/bin/pg_config'
USER root
#########################################################################################
#
# Layer "pg-jsonschema-pg-build"
# Compile "pg_jsonschema" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-jsonschema-pg-build
ARG PG_VERSION
RUN wget https://github.com/supabase/pg_jsonschema/archive/refs/tags/v0.2.0.tar.gz -O pg_jsonschema.tar.gz && \
echo "9118fc508a6e231e7a39acaa6f066fcd79af17a5db757b47d2eefbe14f7794f0 pg_jsonschema.tar.gz" | sha256sum --check && \
mkdir pg_jsonschema-src && cd pg_jsonschema-src && tar xvzf ../pg_jsonschema.tar.gz --strip-components=1 -C . && \
sed -i 's/pgrx = "0.10.2"/pgrx = { version = "0.10.2", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgrx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_jsonschema.control
#########################################################################################
#
# Layer "pg-graphql-pg-build"
# Compile "pg_graphql" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-graphql-pg-build
ARG PG_VERSION
RUN wget https://github.com/supabase/pg_graphql/archive/refs/tags/v1.4.0.tar.gz -O pg_graphql.tar.gz && \
echo "bd8dc7230282b3efa9ae5baf053a54151ed0e66881c7c53750e2d0c765776edc pg_graphql.tar.gz" | sha256sum --check && \
mkdir pg_graphql-src && cd pg_graphql-src && tar xvzf ../pg_graphql.tar.gz --strip-components=1 -C . && \
sed -i 's/pgrx = "=0.10.2"/pgrx = { version = "0.10.2", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgrx install --release && \
# it's needed to enable extension because it uses untrusted C language
sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_graphql.control && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_graphql.control
#########################################################################################
#
# Layer "pg-tiktoken-build"
# Compile "pg_tiktoken" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-tiktoken-pg-build
ARG PG_VERSION
# 26806147b17b60763039c6a6878884c41a262318 made on 26/09/2023
RUN wget https://github.com/kelvich/pg_tiktoken/archive/26806147b17b60763039c6a6878884c41a262318.tar.gz -O pg_tiktoken.tar.gz && \
echo "e64e55aaa38c259512d3e27c572da22c4637418cf124caba904cd50944e5004e pg_tiktoken.tar.gz" | sha256sum --check && \
mkdir pg_tiktoken-src && cd pg_tiktoken-src && tar xvzf ../pg_tiktoken.tar.gz --strip-components=1 -C . && \
cargo pgrx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_tiktoken.control
#########################################################################################
#
# Layer "pg-pgx-ulid-build"
# Compile "pgx_ulid" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-pgx-ulid-build
ARG PG_VERSION
RUN wget https://github.com/pksunkara/pgx_ulid/archive/refs/tags/v0.1.3.tar.gz -O pgx_ulid.tar.gz && \
echo "ee5db82945d2d9f2d15597a80cf32de9dca67b897f605beb830561705f12683c pgx_ulid.tar.gz" | sha256sum --check && \
mkdir pgx_ulid-src && cd pgx_ulid-src && tar xvzf ../pgx_ulid.tar.gz --strip-components=1 -C . && \
echo "******************* Apply a patch for Postgres 16 support; delete in the next release ******************" && \
wget https://github.com/pksunkara/pgx_ulid/commit/f84954cf63fc8c80d964ac970d9eceed3c791196.patch && \
patch -p1 < f84954cf63fc8c80d964ac970d9eceed3c791196.patch && \
echo "********************************************************************************************************" && \
sed -i 's/pgrx = "=0.10.2"/pgrx = { version = "=0.10.2", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgrx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/ulid.control
#########################################################################################
#
# Layer "wal2json-build"
# Compile "wal2json" extension
#
#########################################################################################
FROM build-deps AS wal2json-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/eulerto/wal2json/archive/refs/tags/wal2json_2_5.tar.gz && \
echo "b516653575541cf221b99cf3f8be9b6821f6dbcfc125675c85f35090f824f00e wal2json_2_5.tar.gz" | sha256sum --check && \
mkdir wal2json-src && cd wal2json-src && tar xvzf ../wal2json_2_5.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install
#########################################################################################
#
# Layer "neon-pg-ext-build"
@@ -172,17 +740,63 @@ RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz &
#
#########################################################################################
FROM build-deps AS neon-pg-ext-build
# Public extensions
COPY --from=postgis-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=postgis-build /sfcgal/* /
COPY --from=plv8-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /h3/usr /
COPY --from=unit-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=vector-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pgjwt-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-jsonschema-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-graphql-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-tiktoken-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=hypopg-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-hashids-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=rum-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pgtap-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=ip4r-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=prefix-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=hll-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=plpgsql-check-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=timescaledb-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-hint-plan-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=kq-imcx-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-cron-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-pgx-ulid-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=rdkit-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-uuidv7-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-roaringbitmap-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-embedding-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=wal2json-pg-build /usr/local/pgsql /usr/local/pgsql
COPY pgxn/ pgxn/
RUN make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon \
-s install && \
make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon_utils \
-s install && \
make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon_rmgr \
-s install && \
case "${PG_VERSION}" in \
"v14" | "v15") \
;; \
"v16") \
echo "Skipping HNSW for PostgreSQL 16" && exit 0 \
;; \
*) \
echo "unexpected PostgreSQL version" && exit 1 \
;; \
esac && \
make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/hnsw \
-s install
#########################################################################################
@@ -191,6 +805,9 @@ RUN make -j $(getconf _NPROCESSORS_ONLN) \
#
#########################################################################################
FROM $REPOSITORY/$IMAGE:$TAG AS compute-tools
ARG BUILD_TAG
ENV BUILD_TAG=$BUILD_TAG
USER nonroot
# Copy entire project to get Cargo.* files with proper dependencies for the whole project
COPY --chown=nonroot . .
@@ -228,28 +845,48 @@ RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
mkdir /var/db/postgres/compute && mkdir /var/db/postgres/specs && \
chown -R postgres:postgres /var/db/postgres && \
chmod 0750 /var/db/postgres/compute && \
echo '/usr/local/lib' >> /etc/ld.so.conf && /sbin/ldconfig
echo '/usr/local/lib' >> /etc/ld.so.conf && /sbin/ldconfig && \
# create folder for file cache
mkdir -p -m 777 /neon/cache
COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
# Install:
# libreadline8 for psql
# libicu67, locales for collations (including ICU)
# libicu67, locales for collations (including ICU and plpgsql_check)
# liblz4-1 for lz4
# libossp-uuid16 for extension ossp-uuid
# libgeos, libgdal, libsfcgal1, libproj and libprotobuf-c1 for PostGIS
# libxml2, libxslt1.1 for xml2
# libzstd1 for zstd
# libboost*, libfreetype6, and zlib1g for rdkit
# ca-certificates for communicating with s3 by compute_ctl
RUN apt update && \
apt install --no-install-recommends -y \
locales \
gdb \
libicu67 \
liblz4-1 \
libreadline8 \
libboost-iostreams1.74.0 \
libboost-regex1.74.0 \
libboost-serialization1.74.0 \
libboost-system1.74.0 \
libossp-uuid16 \
libfreetype6 \
libgeos-c1v5 \
libgdal28 \
libproj19 \
libprotobuf-c1 \
libsfcgal1 \
gdb && \
libxml2 \
libxslt1.1 \
libzstd1 \
libcurl4-openssl-dev \
locales \
procps \
zlib1g \
ca-certificates && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8

View File

@@ -1,8 +1,9 @@
# First transient image to build compute_tools binaries
# NB: keep in sync with rust image version in .github/workflows/build_and_test.yml
ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com
ARG REPOSITORY=neondatabase
ARG IMAGE=rust
ARG TAG=pinned
ARG BUILD_TAG
FROM $REPOSITORY/$IMAGE:$TAG AS rust-build
WORKDIR /home/nonroot
@@ -16,6 +17,8 @@ ENV CACHEPOT_S3_KEY_PREFIX=cachepot
ARG CACHEPOT_BUCKET=neon-github-dev
#ARG AWS_ACCESS_KEY_ID
#ARG AWS_SECRET_ACCESS_KEY
ARG BUILD_TAG
ENV BUILD_TAG=$BUILD_TAG
COPY . .

135
Makefile
View File

@@ -29,6 +29,7 @@ else ifeq ($(UNAME_S),Darwin)
# It can be configured with OPENSSL_PREFIX variable
OPENSSL_PREFIX ?= $(shell brew --prefix openssl@3)
PG_CONFIGURE_OPTS += --with-includes=$(OPENSSL_PREFIX)/include --with-libraries=$(OPENSSL_PREFIX)/lib
PG_CONFIGURE_OPTS += PKG_CONFIG_PATH=$(shell brew --prefix icu4c)/lib/pkgconfig
# macOS already has bison and flex in the system, but they are old and result in postgres-v14 target failure
# brew formulae are keg-only and not symlinked into HOMEBREW_PREFIX, force their usage
EXTRA_PATH_OVERRIDES += $(shell brew --prefix bison)/bin/:$(shell brew --prefix flex)/bin/:
@@ -61,7 +62,7 @@ all: neon postgres neon-pg-ext
#
# The 'postgres_ffi' depends on the Postgres headers.
.PHONY: neon
neon: postgres-headers
neon: postgres-headers walproposer-lib
+@echo "Compiling Neon"
$(CARGO_CMD_PREFIX) cargo build $(CARGO_BUILD_FLAGS)
@@ -71,6 +72,10 @@ neon: postgres-headers
#
$(POSTGRES_INSTALL_DIR)/build/%/config.status:
+@echo "Configuring Postgres $* build"
@test -s $(ROOT_PROJECT_DIR)/vendor/postgres-$*/configure || { \
echo "\nPostgres submodule not found in $(ROOT_PROJECT_DIR)/vendor/postgres-$*/, execute "; \
echo "'git submodule update --init --recursive --depth 2 --progress .' in project root.\n"; \
exit 1; }
mkdir -p $(POSTGRES_INSTALL_DIR)/build/$*
(cd $(POSTGRES_INSTALL_DIR)/build/$* && \
env PATH="$(EXTRA_PATH_OVERRIDES):$$PATH" $(ROOT_PROJECT_DIR)/vendor/postgres-$*/configure \
@@ -83,6 +88,8 @@ $(POSTGRES_INSTALL_DIR)/build/%/config.status:
# I'm not sure why it wouldn't work, but this is the only place (apart from
# the "build-all-versions" entry points) where direct mention of PostgreSQL
# versions is used.
.PHONY: postgres-configure-v16
postgres-configure-v16: $(POSTGRES_INSTALL_DIR)/build/v16/config.status
.PHONY: postgres-configure-v15
postgres-configure-v15: $(POSTGRES_INSTALL_DIR)/build/v15/config.status
.PHONY: postgres-configure-v14
@@ -108,6 +115,8 @@ postgres-%: postgres-configure-% \
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_buffercache install
+@echo "Compiling pageinspect $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect install
+@echo "Compiling amcheck $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/amcheck install
.PHONY: postgres-clean-%
postgres-clean-%:
@@ -116,6 +125,10 @@ postgres-clean-%:
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/interfaces/libpq clean
.PHONY: postgres-check-%
postgres-check-%: postgres-%
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$* MAKELEVEL=0 check
.PHONY: neon-pg-ext-%
neon-pg-ext-%: postgres-%
+@echo "Compiling neon $*"
@@ -128,43 +141,109 @@ neon-pg-ext-%: postgres-%
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile install
+@echo "Compiling neon_rmgr $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_rmgr/Makefile install
+@echo "Compiling neon_test_utils $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile install
+@echo "Compiling neon_utils $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-utils-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile install
.PHONY: neon-pg-ext-clean-%
neon-pg-ext-clean-%:
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/pgxn/neon-$* -f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/pgxn/neon_walredo-$* -f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/pgxn/neon_test_utils-$* -f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile clean
# Build walproposer as a static library. walproposer source code is located
# in the pgxn/neon directory.
#
# We also need to include libpgport.a and libpgcommon.a, because walproposer
# uses some functions from those libraries.
#
# Some object files are removed from libpgport.a and libpgcommon.a because
# they depend on openssl and other libraries that are not included in our
# Rust build.
.PHONY: walproposer-lib
walproposer-lib: neon-pg-ext-v16
+@echo "Compiling walproposer-lib"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/walproposer-lib
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile walproposer-lib
cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgport.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib
cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgcommon.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib
ifeq ($(UNAME_S),Linux)
$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgport.a \
pg_strong_random.o
$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgcommon.a \
pg_crc32c.o \
hmac_openssl.o \
cryptohash_openssl.o \
scram-common.o \
md5_common.o \
checksum_helper.o
endif
.PHONY: walproposer-lib-clean
walproposer-lib-clean:
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
.PHONY: neon-pg-ext
neon-pg-ext: \
neon-pg-ext-v14 \
neon-pg-ext-v15
neon-pg-ext-v15 \
neon-pg-ext-v16
.PHONY: neon-pg-ext-clean
neon-pg-ext-clean: \
neon-pg-ext-clean-v14 \
neon-pg-ext-clean-v15
neon-pg-ext-clean-v15 \
neon-pg-ext-clean-v16
# shorthand to build all Postgres versions
.PHONY: postgres
postgres: \
postgres-v14 \
postgres-v15
postgres-v15 \
postgres-v16
.PHONY: postgres-headers
postgres-headers: \
postgres-headers-v14 \
postgres-headers-v15
postgres-headers-v15 \
postgres-headers-v16
.PHONY: postgres-clean
postgres-clean: \
postgres-clean-v14 \
postgres-clean-v15
postgres-clean-v15 \
postgres-clean-v16
.PHONY: postgres-check
postgres-check: \
postgres-check-v14 \
postgres-check-v15 \
postgres-check-v16
# This doesn't remove the effects of 'configure'.
.PHONY: clean
@@ -181,6 +260,44 @@ distclean:
fmt:
./pre-commit.py --fix-inplace
postgres-%-pg-bsd-indent: postgres-%
+@echo "Compiling pg_bsd_indent"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/tools/pg_bsd_indent/
# Create typedef list for the core. Note that generally it should be combined with
# buildfarm one to cover platform specific stuff.
# https://wiki.postgresql.org/wiki/Running_pgindent_on_non-core_code_or_development_code
postgres-%-typedefs.list: postgres-%
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/find_typedef $(POSTGRES_INSTALL_DIR)/$*/bin > $@
# Indent postgres. See src/tools/pgindent/README for details.
.PHONY: postgres-%-pgindent
postgres-%-pgindent: postgres-%-pg-bsd-indent postgres-%-typedefs.list
+@echo merge with buildfarm typedef to cover all platforms
+@echo note: I first tried to download from pgbuildfarm.org, but for unclear reason e.g. \
REL_16_STABLE list misses PGSemaphoreData
# wget -q -O - "http://www.pgbuildfarm.org/cgi-bin/typedefs.pl?branch=REL_16_STABLE" |\
# cat - postgres-$*-typedefs.list | sort | uniq > postgres-$*-typedefs-full.list
cat $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/typedefs.list |\
cat - postgres-$*-typedefs.list | sort | uniq > postgres-$*-typedefs-full.list
+@echo note: you might want to run it on selected files/dirs instead.
INDENT=$(POSTGRES_INSTALL_DIR)/build/$*/src/tools/pg_bsd_indent/pg_bsd_indent \
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/pgindent --typedefs postgres-$*-typedefs-full.list \
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/ \
--excludes $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/exclude_file_patterns
rm -f pg*.BAK
# Indent pxgn/neon.
.PHONY: pgindent
neon-pgindent: postgres-v16-pg-bsd-indent neon-pg-ext-v16
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
FIND_TYPEDEF=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/find_typedef \
INDENT=$(POSTGRES_INSTALL_DIR)/build/v16/src/tools/pg_bsd_indent/pg_bsd_indent \
PGINDENT_SCRIPT=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/pgindent/pgindent \
-C $(POSTGRES_INSTALL_DIR)/build/neon-v16 \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile pgindent
.PHONY: setup-pre-commit-hook
setup-pre-commit-hook:
ln -s -f $(ROOT_PROJECT_DIR)/pre-commit.py .git/hooks/pre-commit

4
NOTICE
View File

@@ -1,5 +1,5 @@
Neon
Copyright 2022 Neon Inc.
The PostgreSQL submodules in vendor/postgres-v14 and vendor/postgres-v15 are licensed under the
PostgreSQL license. See vendor/postgres-v14/COPYRIGHT and vendor/postgres-v15/COPYRIGHT.
The PostgreSQL submodules in vendor/ are licensed under the PostgreSQL license.
See vendor/postgres-vX/COPYRIGHT for details.

101
README.md
View File

@@ -1,3 +1,5 @@
[![Neon](https://user-images.githubusercontent.com/13738772/236813940-dcfdcb5b-69d3-449b-a686-013febe834d4.png)](https://neon.tech)
# Neon
Neon is a serverless open-source alternative to AWS Aurora Postgres. It separates storage and compute and substitutes the PostgreSQL storage layer by redistributing data across a cluster of nodes.
@@ -15,7 +17,7 @@ The Neon storage engine consists of two major components:
- Pageserver. Scalable storage backend for the compute nodes.
- Safekeepers. The safekeepers form a redundant WAL service that received WAL from the compute node, and stores it durably until it has been processed by the pageserver and uploaded to cloud storage.
See developer documentation in [/docs/SUMMARY.md](/docs/SUMMARY.md) for more information.
See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more information.
## Running local installation
@@ -26,14 +28,23 @@ See developer documentation in [/docs/SUMMARY.md](/docs/SUMMARY.md) for more inf
* On Ubuntu or Debian, this set of packages should be sufficient to build the code:
```bash
apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \
libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler
libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler \
libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev
```
* On Fedora, these packages are needed:
```bash
dnf install flex bison readline-devel zlib-devel openssl-devel \
libseccomp-devel perl clang cmake postgresql postgresql-contrib protobuf-compiler \
protobuf-devel
protobuf-devel libcurl-devel openssl poetry lsof libicu-devel libpq-devel python3-devel \
libffi-devel
```
* On Arch based systems, these packages are needed:
```bash
pacman -S base-devel readline zlib libseccomp openssl clang \
postgresql-libs cmake postgresql protobuf curl lsof
```
Building Neon requires 3.15+ version of `protoc` (protobuf-compiler). If your distribution provides an older version, you can install a newer version from [here](https://github.com/protocolbuffers/protobuf/releases).
2. [Install Rust](https://www.rust-lang.org/tools/install)
```
@@ -41,11 +52,14 @@ dnf install flex bison readline-devel zlib-devel openssl-devel \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
#### Installing dependencies on OSX (12.3.1)
#### Installing dependencies on macOS (12.3.1)
1. Install XCode and dependencies
```
xcode-select --install
brew install protobuf openssl flex bison
brew install protobuf openssl flex bison icu4c pkg-config
# add openssl to PATH, required for ed25519 keys generation in neon_local
echo 'export PATH="$(brew --prefix openssl)/bin:$PATH"' >> ~/.zshrc
```
2. [Install Rust](https://www.rust-lang.org/tools/install)
@@ -83,9 +97,10 @@ cd neon
# The preferred and default is to make a debug build. This will create a
# demonstrably slower build than a release build. For a release build,
# use "BUILD_TYPE=release make -j`nproc`"
# use "BUILD_TYPE=release make -j`nproc` -s"
# Remove -s for the verbose build log
make -j`nproc`
make -j`nproc` -s
```
#### Building on OSX
@@ -99,9 +114,10 @@ cd neon
# The preferred and default is to make a debug build. This will create a
# demonstrably slower build than a release build. For a release build,
# use "BUILD_TYPE=release make -j`sysctl -n hw.logicalcpu`"
# use "BUILD_TYPE=release make -j`sysctl -n hw.logicalcpu` -s"
# Remove -s for the verbose build log
make -j`sysctl -n hw.logicalcpu`
make -j`sysctl -n hw.logicalcpu` -s
```
#### Dependency installation notes
@@ -116,34 +132,36 @@ Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (r
```sh
# Create repository in .neon with proper paths to binaries and data
# Later that would be responsibility of a package install script
> ./target/debug/neon_local init
Starting pageserver at '127.0.0.1:64000' in '.neon'.
> cargo neon init
Initializing pageserver node 1 at '127.0.0.1:64000' in ".neon"
# start pageserver, safekeeper, and broker for their intercommunication
> ./target/debug/neon_local start
Starting neon broker at 127.0.0.1:50051
> cargo neon start
Starting neon broker at 127.0.0.1:50051.
storage_broker started, pid: 2918372
Starting pageserver at '127.0.0.1:64000' in '.neon'.
Starting pageserver node 1 at '127.0.0.1:64000' in ".neon".
pageserver started, pid: 2918386
Starting safekeeper at '127.0.0.1:5454' in '.neon/safekeepers/sk1'.
safekeeper 1 started, pid: 2918437
# create initial tenant and use it as a default for every future neon_local invocation
> ./target/debug/neon_local tenant create --set-default
> cargo neon tenant create --set-default
tenant 9ef87a5bf0d92544f6fafeeb3239695c successfully created on the pageserver
Created an initial timeline 'de200bd42b49cc1814412c7e592dd6e9' at Lsn 0/16B5A50 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c
Setting tenant 9ef87a5bf0d92544f6fafeeb3239695c as a default one
# create postgres compute node
> cargo neon endpoint create main
# start postgres compute node
> ./target/debug/neon_local pg start main
Starting new postgres (v14) main on timeline de200bd42b49cc1814412c7e592dd6e9 ...
Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/main port=55432
Starting postgres node at 'host=127.0.0.1 port=55432 user=cloud_admin dbname=postgres'
> cargo neon endpoint start main
Starting new endpoint main (PostgreSQL v14) on timeline de200bd42b49cc1814412c7e592dd6e9 ...
Starting postgres at 'postgresql://cloud_admin@127.0.0.1:55432/postgres'
# check list of running postgres instances
> ./target/debug/neon_local pg list
NODE ADDRESS TIMELINE BRANCH NAME LSN STATUS
main 127.0.0.1:55432 de200bd42b49cc1814412c7e592dd6e9 main 0/16B5BA8 running
> cargo neon endpoint list
ENDPOINT ADDRESS TIMELINE BRANCH NAME LSN STATUS
main 127.0.0.1:55432 de200bd42b49cc1814412c7e592dd6e9 main 0/16B5BA8 running
```
2. Now, it is possible to connect to postgres and run some queries:
@@ -163,29 +181,31 @@ postgres=# select * from t;
3. And create branches and run postgres on them:
```sh
# create branch named migration_check
> ./target/debug/neon_local timeline branch --branch-name migration_check
> cargo neon timeline branch --branch-name migration_check
Created timeline 'b3b863fa45fa9e57e615f9f2d944e601' at Lsn 0/16F9A00 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c. Ancestor timeline: 'main'
# check branches tree
> ./target/debug/neon_local timeline list
> cargo neon timeline list
(L) main [de200bd42b49cc1814412c7e592dd6e9]
(L) ┗━ @0/16F9A00: migration_check [b3b863fa45fa9e57e615f9f2d944e601]
# create postgres on that branch
> cargo neon endpoint create migration_check --branch-name migration_check
# start postgres on that branch
> ./target/debug/neon_local pg start migration_check --branch-name migration_check
Starting new postgres migration_check on timeline b3b863fa45fa9e57e615f9f2d944e601 ...
Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/migration_check port=55433
Starting postgres node at 'host=127.0.0.1 port=55433 user=cloud_admin dbname=postgres'
> cargo neon endpoint start migration_check
Starting new endpoint migration_check (PostgreSQL v14) on timeline b3b863fa45fa9e57e615f9f2d944e601 ...
Starting postgres at 'postgresql://cloud_admin@127.0.0.1:55434/postgres'
# check the new list of running postgres instances
> ./target/debug/neon_local pg list
NODE ADDRESS TIMELINE BRANCH NAME LSN STATUS
> cargo neon endpoint list
ENDPOINT ADDRESS TIMELINE BRANCH NAME LSN STATUS
main 127.0.0.1:55432 de200bd42b49cc1814412c7e592dd6e9 main 0/16F9A38 running
migration_check 127.0.0.1:55433 b3b863fa45fa9e57e615f9f2d944e601 migration_check 0/16F9A70 running
migration_check 127.0.0.1:55434 b3b863fa45fa9e57e615f9f2d944e601 migration_check 0/16F9A70 running
# this new postgres instance will have all the data from 'main' postgres,
# but all modifications would not affect data in original postgres
> psql -p55433 -h 127.0.0.1 -U cloud_admin postgres
> psql -p55434 -h 127.0.0.1 -U cloud_admin postgres
postgres=# select * from t;
key | value
-----+-------
@@ -207,7 +227,7 @@ postgres=# select * from t;
4. If you want to run tests afterward (see below), you must stop all the running of the pageserver, safekeeper, and postgres instances
you have just started. You can terminate them all with one command:
```sh
> ./target/debug/neon_local stop
> cargo neon stop
```
## Running tests
@@ -222,11 +242,18 @@ CARGO_BUILD_FLAGS="--features=testing" make
./scripts/pytest
```
By default, this runs both debug and release modes, and all supported postgres versions. When
testing locally, it is convenient to run just run one set of permutations, like this:
```sh
DEFAULT_PG_VERSION=15 BUILD_TYPE=release ./scripts/pytest
```
## Documentation
[/docs/](/docs/) Contains a top-level overview of all available markdown documentation.
[docs](/docs) Contains a top-level overview of all available markdown documentation.
- [/docs/sourcetree.md](/docs/sourcetree.md) contains overview of source tree layout.
- [sourcetree.md](/docs/sourcetree.md) contains overview of source tree layout.
To view your `rustdoc` documentation in a browser, try running `cargo doc --no-deps --open`
@@ -251,6 +278,6 @@ To get more familiar with this aspect, refer to:
## Join the development
- Read `CONTRIBUTING.md` to learn about project code style and practices.
- To get familiar with a source tree layout, use [/docs/sourcetree.md](/docs/sourcetree.md).
- Read [CONTRIBUTING.md](/CONTRIBUTING.md) to learn about project code style and practices.
- To get familiar with a source tree layout, use [sourcetree.md](/docs/sourcetree.md).
- To learn more about PostgreSQL internals, check http://www.interdb.jp/pg/index.html

5
clippy.toml Normal file
View File

@@ -0,0 +1,5 @@
disallowed-methods = [
"tokio::task::block_in_place",
# Allow this for now, to deny it later once we stop using Handle::block_on completely
# "tokio::runtime::Handle::block_on",
]

View File

@@ -6,23 +6,36 @@ license.workspace = true
[dependencies]
anyhow.workspace = true
async-compression.workspace = true
chrono.workspace = true
cfg-if.workspace = true
clap.workspace = true
flate2.workspace = true
futures.workspace = true
hyper = { workspace = true, features = ["full"] }
notify.workspace = true
num_cpus.workspace = true
opentelemetry.workspace = true
postgres.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
tar.workspace = true
reqwest = { workspace = true, features = ["json"] }
tokio = { workspace = true, features = ["rt", "rt-multi-thread"] }
tokio-postgres.workspace = true
tokio-util.workspace = true
tracing.workspace = true
tracing-opentelemetry.workspace = true
tracing-subscriber.workspace = true
tracing-utils.workspace = true
url.workspace = true
compute_api.workspace = true
utils.workspace = true
workspace_hack.workspace = true
toml_edit.workspace = true
remote_storage = { version = "0.1", path = "../libs/remote_storage/" }
vm_monitor = { version = "0.1", path = "../libs/vm_monitor/" }
zstd = "0.12.4"
bytes = "1.0"

View File

@@ -19,9 +19,10 @@ Also `compute_ctl` spawns two separate service threads:
- `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the
last activity requests.
If the `vm-informant` binary is present at `/bin/vm-informant`, it will also be started. For VM
compute nodes, `vm-informant` communicates with the VM autoscaling system. It coordinates
downscaling and (eventually) will request immediate upscaling under resource pressure.
If `AUTOSCALING` environment variable is set, `compute_ctl` will start the
`vm-monitor` located in [`neon/libs/vm_monitor`]. For VM compute nodes,
`vm-monitor` communicates with the VM autoscaling system. It coordinates
downscaling and requests immediate upscaling under resource pressure.
Usage example:
```sh

View File

@@ -5,6 +5,8 @@
//! - `compute_ctl` accepts cluster (compute node) specification as a JSON file.
//! - Every start is a fresh start, so the data directory is removed and
//! initialized again on each run.
//! - If remote_extension_config is provided, it will be used to fetch extensions list
//! and download `shared_preload_libraries` from the remote storage.
//! - Next it will put configuration files into the `PGDATA` directory.
//! - Sync safekeepers and get commit LSN.
//! - Get `basebackup` from pageserver using the returned on the previous step LSN.
@@ -18,148 +20,316 @@
//! - `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the
//! last activity requests.
//!
//! If the `vm-informant` binary is present at `/bin/vm-informant`, it will also be started. For VM
//! compute nodes, `vm-informant` communicates with the VM autoscaling system. It coordinates
//! downscaling and (eventually) will request immediate upscaling under resource pressure.
//! If `AUTOSCALING` environment variable is set, `compute_ctl` will start the
//! `vm-monitor` located in [`neon/libs/vm_monitor`]. For VM compute nodes,
//! `vm-monitor` communicates with the VM autoscaling system. It coordinates
//! downscaling and requests immediate upscaling under resource pressure.
//!
//! Usage example:
//! ```sh
//! compute_ctl -D /var/db/postgres/compute \
//! -C 'postgresql://cloud_admin@localhost/postgres' \
//! -S /var/db/postgres/specs/current.json \
//! -b /usr/local/bin/postgres
//! -b /usr/local/bin/postgres \
//! -r http://pg-ext-s3-gateway
//! ```
//!
use std::collections::HashMap;
use std::fs::File;
use std::panic;
use std::path::Path;
use std::process::exit;
use std::sync::{Arc, RwLock};
use std::sync::{mpsc, Arc, Condvar, Mutex, RwLock};
use std::{thread, time::Duration};
use anyhow::{Context, Result};
use chrono::Utc;
use clap::Arg;
use tracing::{error, info};
use url::Url;
use compute_tools::compute::{ComputeMetrics, ComputeNode, ComputeState, ComputeStatus};
use compute_api::responses::ComputeStatus;
use compute_tools::compute::{ComputeNode, ComputeState, ParsedSpec};
use compute_tools::configurator::launch_configurator;
use compute_tools::extension_server::get_pg_version;
use compute_tools::http::api::launch_http_server;
use compute_tools::informant::spawn_vm_informant_if_present;
use compute_tools::logger::*;
use compute_tools::monitor::launch_monitor;
use compute_tools::params::*;
use compute_tools::pg_helpers::*;
use compute_tools::spec::*;
use url::Url;
// this is an arbitrary build tag. Fine as a default / for testing purposes
// in-case of not-set environment var
const BUILD_TAG_DEFAULT: &str = "latest";
fn main() -> Result<()> {
init_tracing_and_logging(DEFAULT_LOG_LEVEL)?;
let matches = cli().get_matches();
let build_tag = option_env!("BUILD_TAG")
.unwrap_or(BUILD_TAG_DEFAULT)
.to_string();
info!("build_tag: {build_tag}");
let matches = cli().get_matches();
let pgbin_default = String::from("postgres");
let pgbin = matches.get_one::<String>("pgbin").unwrap_or(&pgbin_default);
let ext_remote_storage = matches
.get_one::<String>("remote-ext-config")
// Compatibility hack: if the control plane specified any remote-ext-config
// use the default value for extension storage proxy gateway.
// Remove this once the control plane is updated to pass the gateway URL
.map(|conf| {
if conf.starts_with("http") {
conf.trim_end_matches('/')
} else {
"http://pg-ext-s3-gateway"
}
});
let http_port = *matches
.get_one::<u16>("http-port")
.expect("http-port is required");
let pgdata = matches
.get_one::<String>("pgdata")
.expect("PGDATA path is required");
let connstr = matches
.get_one::<String>("connstr")
.expect("Postgres connection string is required");
let spec = matches.get_one::<String>("spec");
let spec_json = matches.get_one::<String>("spec");
let spec_path = matches.get_one::<String>("spec-path");
// Try to use just 'postgres' if no path is provided
let pgbin = matches.get_one::<String>("pgbin").unwrap();
let spec: ComputeSpec = match spec {
// First, try to get cluster spec from the cli argument
Some(json) => serde_json::from_str(json)?,
None => {
// Second, try to read it from the file if path is provided
if let Some(sp) = spec_path {
let path = Path::new(sp);
let file = File::open(path)?;
serde_json::from_reader(file)?
} else {
panic!("cluster spec should be provided via --spec or --spec-path argument");
}
}
};
// Extract OpenTelemetry context for the startup actions from the spec, and
// attach it to the current tracing context.
// Extract OpenTelemetry context for the startup actions from the
// TRACEPARENT and TRACESTATE env variables, and attach it to the current
// tracing context.
//
// This is used to propagate the context for the 'start_compute' operation
// from the neon control plane. This allows linking together the wider
// 'start_compute' operation that creates the compute container, with the
// startup actions here within the container.
//
// There is no standard for passing context in env variables, but a lot of
// tools use TRACEPARENT/TRACESTATE, so we use that convention too. See
// https://github.com/open-telemetry/opentelemetry-specification/issues/740
//
// Switch to the startup context here, and exit it once the startup has
// completed and Postgres is up and running.
//
// If this pod is pre-created without binding it to any particular endpoint
// yet, this isn't the right place to enter the startup context. In that
// case, the control plane should pass the tracing context as part of the
// /configure API call.
//
// NOTE: This is supposed to only cover the *startup* actions. Once
// postgres is configured and up-and-running, we exit this span. Any other
// actions that are performed on incoming HTTP requests, for example, are
// performed in separate spans.
let startup_context_guard = if let Some(ref carrier) = spec.startup_tracing_context {
//
// XXX: If the pod is restarted, we perform the startup actions in the same
// context as the original startup actions, which probably doesn't make
// sense.
let mut startup_tracing_carrier: HashMap<String, String> = HashMap::new();
if let Ok(val) = std::env::var("TRACEPARENT") {
startup_tracing_carrier.insert("traceparent".to_string(), val);
}
if let Ok(val) = std::env::var("TRACESTATE") {
startup_tracing_carrier.insert("tracestate".to_string(), val);
}
let startup_context_guard = if !startup_tracing_carrier.is_empty() {
use opentelemetry::propagation::TextMapPropagator;
use opentelemetry::sdk::propagation::TraceContextPropagator;
Some(TraceContextPropagator::new().extract(carrier).attach())
let guard = TraceContextPropagator::new()
.extract(&startup_tracing_carrier)
.attach();
info!("startup tracing context attached");
Some(guard)
} else {
None
};
let pageserver_connstr = spec
.cluster
.settings
.find("neon.pageserver_connstring")
.expect("pageserver connstr should be provided");
let tenant = spec
.cluster
.settings
.find("neon.tenant_id")
.expect("tenant id should be provided");
let timeline = spec
.cluster
.settings
.find("neon.timeline_id")
.expect("tenant id should be provided");
let compute_id = matches.get_one::<String>("compute-id");
let control_plane_uri = matches.get_one::<String>("control-plane-uri");
let compute_state = ComputeNode {
start_time: Utc::now(),
let spec;
let mut live_config_allowed = false;
match spec_json {
// First, try to get cluster spec from the cli argument
Some(json) => {
info!("got spec from cli argument {}", json);
spec = Some(serde_json::from_str(json)?);
}
None => {
// Second, try to read it from the file if path is provided
if let Some(sp) = spec_path {
let path = Path::new(sp);
let file = File::open(path)?;
spec = Some(serde_json::from_reader(file)?);
live_config_allowed = true;
} else if let Some(id) = compute_id {
if let Some(cp_base) = control_plane_uri {
live_config_allowed = true;
spec = match get_spec_from_control_plane(cp_base, id) {
Ok(s) => s,
Err(e) => {
error!("cannot get response from control plane: {}", e);
panic!("neither spec nor confirmation that compute is in the Empty state was received");
}
};
} else {
panic!("must specify both --control-plane-uri and --compute-id or none");
}
} else {
panic!(
"compute spec should be provided by one of the following ways: \
--spec OR --spec-path OR --control-plane-uri and --compute-id"
);
}
}
};
let mut new_state = ComputeState::new();
let spec_set;
if let Some(spec) = spec {
let pspec = ParsedSpec::try_from(spec).map_err(|msg| anyhow::anyhow!(msg))?;
info!("new pspec.spec: {:?}", pspec.spec);
new_state.pspec = Some(pspec);
spec_set = true;
} else {
spec_set = false;
}
let compute_node = ComputeNode {
connstr: Url::parse(connstr).context("cannot parse connstr as a URL")?,
pgdata: pgdata.to_string(),
pgbin: pgbin.to_string(),
spec,
tenant,
timeline,
pageserver_connstr,
metrics: ComputeMetrics::default(),
state: RwLock::new(ComputeState::new()),
pgversion: get_pg_version(pgbin),
live_config_allowed,
state: Mutex::new(new_state),
state_changed: Condvar::new(),
ext_remote_storage: ext_remote_storage.map(|s| s.to_string()),
ext_download_progress: RwLock::new(HashMap::new()),
build_tag,
};
let compute = Arc::new(compute_state);
let compute = Arc::new(compute_node);
// Launch service threads first, so we were able to serve availability
// If this is a pooled VM, prewarm before starting HTTP server and becoming
// available for binding. Prewarming helps postgres start quicker later,
// because QEMU will already have it's memory allocated from the host, and
// the necessary binaries will alreaady be cached.
if !spec_set {
compute.prewarm_postgres()?;
}
// Launch http service first, so we were able to serve control-plane
// requests, while configuration is still in progress.
let _http_handle = launch_http_server(&compute).expect("cannot launch http endpoint thread");
let _monitor_handle = launch_monitor(&compute).expect("cannot launch compute monitor thread");
// Also spawn the thread responsible for handling the VM informant -- if it's present
let _vm_informant_handle = spawn_vm_informant_if_present().expect("cannot launch VM informant");
let _http_handle =
launch_http_server(http_port, &compute).expect("cannot launch http endpoint thread");
let extension_server_port: u16 = http_port;
if !spec_set {
// No spec provided, hang waiting for it.
info!("no compute spec provided, waiting");
let mut state = compute.state.lock().unwrap();
while state.status != ComputeStatus::ConfigurationPending {
state = compute.state_changed.wait(state).unwrap();
if state.status == ComputeStatus::ConfigurationPending {
info!("got spec, continue configuration");
// Spec is already set by the http server handler.
break;
}
}
}
// We got all we need, update the state.
let mut state = compute.state.lock().unwrap();
// Record for how long we slept waiting for the spec.
state.metrics.wait_for_spec_ms = Utc::now()
.signed_duration_since(state.start_time)
.to_std()
.unwrap()
.as_millis() as u64;
// Reset start time to the actual start of the configuration, so that
// total startup time was properly measured at the end.
state.start_time = Utc::now();
state.status = ComputeStatus::Init;
compute.state_changed.notify_all();
drop(state);
// Launch remaining service threads
let _monitor_handle = launch_monitor(&compute);
let _configurator_handle = launch_configurator(&compute);
// Start Postgres
let mut delay_exit = false;
let mut exit_code = None;
let pg = match compute.start_compute() {
let pg = match compute.start_compute(extension_server_port) {
Ok(pg) => Some(pg),
Err(err) => {
error!("could not start the compute node: {:?}", err);
let mut state = compute.state.write().unwrap();
let mut state = compute.state.lock().unwrap();
state.error = Some(format!("{:?}", err));
state.status = ComputeStatus::Failed;
drop(state);
// Notify others that Postgres failed to start. In case of configuring the
// empty compute, it's likely that API handler is still waiting for compute
// state change. With this we will notify it that compute is in Failed state,
// so control plane will know about it earlier and record proper error instead
// of timeout.
compute.state_changed.notify_all();
drop(state); // unlock
delay_exit = true;
None
}
};
// Start the vm-monitor if directed to. The vm-monitor only runs on linux
// because it requires cgroups.
cfg_if::cfg_if! {
if #[cfg(target_os = "linux")] {
use std::env;
use tokio_util::sync::CancellationToken;
let vm_monitor_addr = matches
.get_one::<String>("vm-monitor-addr")
.expect("--vm-monitor-addr should always be set because it has a default arg");
let file_cache_connstr = matches.get_one::<String>("filecache-connstr");
let cgroup = matches.get_one::<String>("cgroup");
// Only make a runtime if we need to.
// Note: it seems like you can make a runtime in an inner scope and
// if you start a task in it it won't be dropped. However, make it
// in the outermost scope just to be safe.
let rt = if env::var_os("AUTOSCALING").is_some() {
Some(
tokio::runtime::Builder::new_multi_thread()
.worker_threads(4)
.enable_all()
.build()
.expect("failed to create tokio runtime for monitor")
)
} else {
None
};
// This token is used internally by the monitor to clean up all threads
let token = CancellationToken::new();
let vm_monitor = &rt.as_ref().map(|rt| {
rt.spawn(vm_monitor::start(
Box::leak(Box::new(vm_monitor::Args {
cgroup: cgroup.cloned(),
pgconnstr: file_cache_connstr.cloned(),
addr: vm_monitor_addr.clone(),
})),
token.clone(),
))
});
}
}
// Wait for the child Postgres process forever. In this state Ctrl+C will
// propagate to Postgres and it will be shut down as well.
if let Some(mut pg) = pg {
@@ -173,6 +343,34 @@ fn main() -> Result<()> {
exit_code = ecode.code()
}
// Terminate the vm_monitor so it releases the file watcher on
// /sys/fs/cgroup/neon-postgres.
// Note: the vm-monitor only runs on linux because it requires cgroups.
cfg_if::cfg_if! {
if #[cfg(target_os = "linux")] {
if let Some(handle) = vm_monitor {
// Kills all threads spawned by the monitor
token.cancel();
// Kills the actual task running the monitor
handle.abort();
// If handle is some, rt must have been used to produce it, and
// hence is also some
rt.unwrap().shutdown_timeout(Duration::from_secs(2));
}
}
}
// Maybe sync safekeepers again, to speed up next startup
let compute_state = compute.state.lock().unwrap().clone();
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
if matches!(pspec.spec.mode, compute_api::spec::ComputeMode::Primary) {
info!("syncing safekeepers on shutdown");
let storage_auth_token = pspec.storage_auth_token.clone();
let lsn = compute.sync_safekeepers(storage_auth_token)?;
info!("synced safekeepers at lsn {lsn}");
}
if let Err(err) = compute.check_for_core_dumps() {
error!("error while checking for core dumps: {err:?}");
}
@@ -182,13 +380,29 @@ fn main() -> Result<()> {
if delay_exit {
info!("giving control plane 30s to collect the error before shutdown");
thread::sleep(Duration::from_secs(30));
info!("shutting down");
}
// Shutdown trace pipeline gracefully, so that it has a chance to send any
// pending traces before we exit.
tracing_utils::shutdown_tracing();
// pending traces before we exit. Shutting down OTEL tracing provider may
// hang for quite some time, see, for example:
// - https://github.com/open-telemetry/opentelemetry-rust/issues/868
// - and our problems with staging https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636
//
// Yet, we want computes to shut down fast enough, as we may need a new one
// for the same timeline ASAP. So wait no longer than 2s for the shutdown to
// complete, then just error out and exit the main thread.
info!("shutting down tracing");
let (sender, receiver) = mpsc::channel();
let _ = thread::spawn(move || {
tracing_utils::shutdown_tracing();
sender.send(()).ok()
});
let shutdown_res = receiver.recv_timeout(Duration::from_millis(2000));
if shutdown_res.is_err() {
error!("timed out while shutting down tracing, exiting anyway");
}
info!("shutting down");
exit(exit_code.unwrap_or(1))
}
@@ -197,6 +411,14 @@ fn cli() -> clap::Command {
let version = option_env!("CARGO_PKG_VERSION").unwrap_or("unknown");
clap::Command::new("compute_ctl")
.version(version)
.arg(
Arg::new("http-port")
.long("http-port")
.value_name("HTTP_PORT")
.default_value("3080")
.value_parser(clap::value_parser!(u16))
.required(false),
)
.arg(
Arg::new("connstr")
.short('C')
@@ -230,6 +452,47 @@ fn cli() -> clap::Command {
.long("spec-path")
.value_name("SPEC_PATH"),
)
.arg(
Arg::new("compute-id")
.short('i')
.long("compute-id")
.value_name("COMPUTE_ID"),
)
.arg(
Arg::new("control-plane-uri")
.short('p')
.long("control-plane-uri")
.value_name("CONTROL_PLANE_API_BASE_URI"),
)
.arg(
Arg::new("remote-ext-config")
.short('r')
.long("remote-ext-config")
.value_name("REMOTE_EXT_CONFIG"),
)
// TODO(fprasx): we currently have default arguments because the cloud PR
// to pass them in hasn't been merged yet. We should get rid of them once
// the PR is merged.
.arg(
Arg::new("vm-monitor-addr")
.long("vm-monitor-addr")
.default_value("0.0.0.0:10301")
.value_name("VM_MONITOR_ADDR"),
)
.arg(
Arg::new("cgroup")
.long("cgroup")
.default_value("neon-postgres")
.value_name("CGROUP"),
)
.arg(
Arg::new("filecache-connstr")
.long("filecache-connstr")
.default_value(
"host=localhost port=5432 dbname=postgres user=cloud_admin sslmode=disable",
)
.value_name("FILECACHE_CONNSTR"),
)
}
#[test]

View File

@@ -1,45 +1,79 @@
use anyhow::{anyhow, Result};
use anyhow::{anyhow, Ok, Result};
use postgres::Client;
use tokio_postgres::NoTls;
use tracing::{error, instrument};
use tracing::{error, instrument, warn};
use crate::compute::ComputeNode;
#[instrument(skip_all)]
pub fn create_writability_check_data(client: &mut Client) -> Result<()> {
/// Create a special service table for availability checks
/// only if it does not exist already.
pub fn create_availability_check_data(client: &mut Client) -> Result<()> {
let query = "
CREATE TABLE IF NOT EXISTS health_check (
id serial primary key,
updated_at timestamptz default now()
);
INSERT INTO health_check VALUES (1, now())
ON CONFLICT (id) DO UPDATE
SET updated_at = now();";
let result = client.simple_query(query)?;
if result.len() < 2 {
return Err(anyhow::format_err!("executed {} queries", result.len()));
}
DO $$
BEGIN
IF NOT EXISTS(
SELECT 1
FROM pg_catalog.pg_tables
WHERE tablename = 'health_check'
)
THEN
CREATE TABLE health_check (
id serial primary key,
updated_at timestamptz default now()
);
INSERT INTO health_check VALUES (1, now())
ON CONFLICT (id) DO UPDATE
SET updated_at = now();
END IF;
END
$$;";
client.execute(query, &[])?;
Ok(())
}
/// Update timestamp in a row in a special service table to check
/// that we can actually write some data in this particular timeline.
#[instrument(skip_all)]
pub async fn check_writability(compute: &ComputeNode) -> Result<()> {
// Connect to the database.
let (client, connection) = tokio_postgres::connect(compute.connstr.as_str(), NoTls).await?;
if client.is_closed() {
return Err(anyhow!("connection to postgres closed"));
}
// The connection object performs the actual communication with the database,
// so spawn it off to run on its own.
tokio::spawn(async move {
if let Err(e) = connection.await {
error!("connection error: {}", e);
}
});
let result = client
.simple_query("UPDATE health_check SET updated_at = now() WHERE id = 1;")
.await?;
let query = "
INSERT INTO health_check VALUES (1, now())
ON CONFLICT (id) DO UPDATE
SET updated_at = now();";
if result.len() != 1 {
return Err(anyhow!("statement can't be executed"));
match client.simple_query(query).await {
Result::Ok(result) => {
if result.len() != 1 {
return Err(anyhow::anyhow!(
"expected 1 query results, but got {}",
result.len()
));
}
}
Err(err) => {
if let Some(state) = err.code() {
if state == &tokio_postgres::error::SqlState::DISK_FULL {
warn!("Tenant disk is full");
return Ok(());
}
}
return Err(err.into());
}
}
Ok(())
}

File diff suppressed because it is too large Load Diff

View File

@@ -5,8 +5,9 @@ use std::path::Path;
use anyhow::Result;
use crate::pg_helpers::escape_conf_value;
use crate::pg_helpers::PgOptionsSerialize;
use crate::spec::ComputeSpec;
use compute_api::spec::{ComputeMode, ComputeSpec};
/// Check that `line` is inside a text file and put it there if it is not.
/// Create file if it doesn't exist.
@@ -32,20 +33,85 @@ pub fn line_in_file(path: &Path, line: &str) -> Result<bool> {
}
/// Create or completely rewrite configuration file specified by `path`
pub fn write_postgres_conf(path: &Path, spec: &ComputeSpec) -> Result<()> {
pub fn write_postgres_conf(
path: &Path,
spec: &ComputeSpec,
extension_server_port: Option<u16>,
) -> Result<()> {
// File::create() destroys the file content if it exists.
let mut postgres_conf = File::create(path)?;
let mut file = File::create(path)?;
write_auto_managed_block(&mut postgres_conf, &spec.cluster.settings.as_pg_settings())?;
// Write the postgresql.conf content from the spec file as is.
if let Some(conf) = &spec.cluster.postgresql_conf {
writeln!(file, "{}", conf)?;
}
// Add options for connecting to storage
writeln!(file, "# Neon storage settings")?;
if let Some(s) = &spec.pageserver_connstring {
writeln!(file, "neon.pageserver_connstring={}", escape_conf_value(s))?;
}
if !spec.safekeeper_connstrings.is_empty() {
writeln!(
file,
"neon.safekeepers={}",
escape_conf_value(&spec.safekeeper_connstrings.join(","))
)?;
}
if let Some(s) = &spec.tenant_id {
writeln!(file, "neon.tenant_id={}", escape_conf_value(&s.to_string()))?;
}
if let Some(s) = &spec.timeline_id {
writeln!(
file,
"neon.timeline_id={}",
escape_conf_value(&s.to_string())
)?;
}
match spec.mode {
ComputeMode::Primary => {}
ComputeMode::Static(lsn) => {
// hot_standby is 'on' by default, but let's be explicit
writeln!(file, "hot_standby=on")?;
writeln!(file, "recovery_target_lsn='{lsn}'")?;
}
ComputeMode::Replica => {
// hot_standby is 'on' by default, but let's be explicit
writeln!(file, "hot_standby=on")?;
}
}
// If there are any extra options in the 'settings' field, append those
if spec.cluster.settings.is_some() {
writeln!(file, "# Managed by compute_ctl: begin")?;
write!(file, "{}", spec.cluster.settings.as_pg_settings())?;
writeln!(file, "# Managed by compute_ctl: end")?;
}
if let Some(port) = extension_server_port {
writeln!(file, "neon.extension_server_port={}", port)?;
}
// This is essential to keep this line at the end of the file,
// because it is intended to override any settings above.
writeln!(file, "include_if_exists = 'compute_ctl_temp_override.conf'")?;
Ok(())
}
// Write Postgres config block wrapped with generated comment section
fn write_auto_managed_block(file: &mut File, buf: &str) -> Result<()> {
writeln!(file, "# Managed by compute_ctl: begin")?;
writeln!(file, "{}", buf)?;
writeln!(file, "# Managed by compute_ctl: end")?;
/// create file compute_ctl_temp_override.conf in pgdata_dir
/// add provided options to this file
pub fn compute_ctl_temp_override_create(pgdata_path: &Path, options: &str) -> Result<()> {
let path = pgdata_path.join("compute_ctl_temp_override.conf");
let mut file = File::create(path)?;
write!(file, "{}", options)?;
Ok(())
}
/// remove file compute_ctl_temp_override.conf in pgdata_dir
pub fn compute_ctl_temp_override_remove(pgdata_path: &Path) -> Result<()> {
let path = pgdata_path.join("compute_ctl_temp_override.conf");
std::fs::remove_file(path)?;
Ok(())
}

View File

@@ -0,0 +1,54 @@
use std::sync::Arc;
use std::thread;
use tracing::{error, info, instrument};
use compute_api::responses::ComputeStatus;
use crate::compute::ComputeNode;
#[instrument(skip_all)]
fn configurator_main_loop(compute: &Arc<ComputeNode>) {
info!("waiting for reconfiguration requests");
loop {
let state = compute.state.lock().unwrap();
let mut state = compute.state_changed.wait(state).unwrap();
if state.status == ComputeStatus::ConfigurationPending {
info!("got configuration request");
state.status = ComputeStatus::Configuration;
compute.state_changed.notify_all();
drop(state);
let mut new_status = ComputeStatus::Failed;
if let Err(e) = compute.reconfigure() {
error!("could not configure compute node: {}", e);
} else {
new_status = ComputeStatus::Running;
info!("compute node configured");
}
// XXX: used to test that API is blocking
// std::thread::sleep(std::time::Duration::from_millis(10000));
compute.set_status(new_status);
} else if state.status == ComputeStatus::Failed {
info!("compute node is now in Failed state, exiting");
break;
} else {
info!("woken up for compute status: {:?}, sleeping", state.status);
}
}
}
pub fn launch_configurator(compute: &Arc<ComputeNode>) -> thread::JoinHandle<()> {
let compute = Arc::clone(compute);
thread::Builder::new()
.name("compute-configurator".into())
.spawn(move || {
configurator_main_loop(&compute);
info!("configurator thread is exited");
})
.expect("cannot launch configurator thread")
}

View File

@@ -0,0 +1,296 @@
// Download extension files from the extension store
// and put them in the right place in the postgres directory (share / lib)
/*
The layout of the S3 bucket is as follows:
5615610098 // this is an extension build number
├── v14
│   ├── extensions
│   │   ├── anon.tar.zst
│   │   └── embedding.tar.zst
│   └── ext_index.json
└── v15
├── extensions
│   ├── anon.tar.zst
│   └── embedding.tar.zst
└── ext_index.json
5615261079
├── v14
│   ├── extensions
│   │   └── anon.tar.zst
│   └── ext_index.json
└── v15
├── extensions
│   └── anon.tar.zst
└── ext_index.json
5623261088
├── v14
│   ├── extensions
│   │   └── embedding.tar.zst
│   └── ext_index.json
└── v15
├── extensions
│   └── embedding.tar.zst
└── ext_index.json
Note that build number cannot be part of prefix because we might need extensions
from other build numbers.
ext_index.json stores the control files and location of extension archives
It also stores a list of public extensions and a library_index
We don't need to duplicate extension.tar.zst files.
We only need to upload a new one if it is updated.
(Although currently we just upload every time anyways, hopefully will change
this sometime)
*access* is controlled by spec
More specifically, here is an example ext_index.json
{
"public_extensions": [
"anon",
"pg_buffercache"
],
"library_index": {
"anon": "anon",
"pg_buffercache": "pg_buffercache"
},
"extension_data": {
"pg_buffercache": {
"control_data": {
"pg_buffercache.control": "# pg_buffercache extension \ncomment = 'examine the shared buffer cache' \ndefault_version = '1.3' \nmodule_pathname = '$libdir/pg_buffercache' \nrelocatable = true \ntrusted=true"
},
"archive_path": "5670669815/v14/extensions/pg_buffercache.tar.zst"
},
"anon": {
"control_data": {
"anon.control": "# PostgreSQL Anonymizer (anon) extension \ncomment = 'Data anonymization tools' \ndefault_version = '1.1.0' \ndirectory='extension/anon' \nrelocatable = false \nrequires = 'pgcrypto' \nsuperuser = false \nmodule_pathname = '$libdir/anon' \ntrusted = true \n"
},
"archive_path": "5670669815/v14/extensions/anon.tar.zst"
}
}
}
*/
use anyhow::{self, Result};
use anyhow::{bail, Context};
use bytes::Bytes;
use compute_api::spec::RemoteExtSpec;
use regex::Regex;
use remote_storage::*;
use reqwest::StatusCode;
use std::path::Path;
use std::str;
use tar::Archive;
use tracing::info;
use tracing::log::warn;
use zstd::stream::read::Decoder;
fn get_pg_config(argument: &str, pgbin: &str) -> String {
// gives the result of `pg_config [argument]`
// where argument is a flag like `--version` or `--sharedir`
let pgconfig = pgbin
.strip_suffix("postgres")
.expect("bad pgbin")
.to_owned()
+ "/pg_config";
let config_output = std::process::Command::new(pgconfig)
.arg(argument)
.output()
.expect("pg_config error");
std::str::from_utf8(&config_output.stdout)
.expect("pg_config error")
.trim()
.to_string()
}
pub fn get_pg_version(pgbin: &str) -> String {
// pg_config --version returns a (platform specific) human readable string
// such as "PostgreSQL 15.4". We parse this to v14/v15/v16 etc.
let human_version = get_pg_config("--version", pgbin);
return parse_pg_version(&human_version).to_string();
}
fn parse_pg_version(human_version: &str) -> &str {
// Normal releases have version strings like "PostgreSQL 15.4". But there
// are also pre-release versions like "PostgreSQL 17devel" or "PostgreSQL
// 16beta2" or "PostgreSQL 17rc1". And with the --with-extra-version
// configure option, you can tack any string to the version number,
// e.g. "PostgreSQL 15.4foobar".
match Regex::new(r"^PostgreSQL (?<major>\d+).+")
.unwrap()
.captures(human_version)
{
Some(captures) if captures.len() == 2 => match &captures["major"] {
"14" => return "v14",
"15" => return "v15",
"16" => return "v16",
_ => {}
},
_ => {}
}
panic!("Unsuported postgres version {human_version}");
}
// download the archive for a given extension,
// unzip it, and place files in the appropriate locations (share/lib)
pub async fn download_extension(
ext_name: &str,
ext_path: &RemotePath,
ext_remote_storage: &str,
pgbin: &str,
) -> Result<u64> {
info!("Download extension {:?} from {:?}", ext_name, ext_path);
// TODO add retry logic
let download_buffer =
match download_extension_tar(ext_remote_storage, &ext_path.to_string()).await {
Ok(buffer) => buffer,
Err(error_message) => {
return Err(anyhow::anyhow!(
"error downloading extension {:?}: {:?}",
ext_name,
error_message
));
}
};
let download_size = download_buffer.len() as u64;
info!("Download size {:?}", download_size);
// it's unclear whether it is more performant to decompress into memory or not
// TODO: decompressing into memory can be avoided
let decoder = Decoder::new(download_buffer.as_ref())?;
let mut archive = Archive::new(decoder);
let unzip_dest = pgbin
.strip_suffix("/bin/postgres")
.expect("bad pgbin")
.to_string()
+ "/download_extensions";
archive.unpack(&unzip_dest)?;
info!("Download + unzip {:?} completed successfully", &ext_path);
let sharedir_paths = (
unzip_dest.to_string() + "/share/extension",
Path::new(&get_pg_config("--sharedir", pgbin)).join("extension"),
);
let libdir_paths = (
unzip_dest.to_string() + "/lib",
Path::new(&get_pg_config("--pkglibdir", pgbin)).to_path_buf(),
);
// move contents of the libdir / sharedir in unzipped archive to the correct local paths
for paths in [sharedir_paths, libdir_paths] {
let (zip_dir, real_dir) = paths;
info!("mv {zip_dir:?}/* {real_dir:?}");
for file in std::fs::read_dir(zip_dir)? {
let old_file = file?.path();
let new_file =
Path::new(&real_dir).join(old_file.file_name().context("error parsing file")?);
info!("moving {old_file:?} to {new_file:?}");
// extension download failed: Directory not empty (os error 39)
match std::fs::rename(old_file, new_file) {
Ok(()) => info!("move succeeded"),
Err(e) => {
warn!("move failed, probably because the extension already exists: {e}")
}
}
}
}
info!("done moving extension {ext_name}");
Ok(download_size)
}
// Create extension control files from spec
pub fn create_control_files(remote_extensions: &RemoteExtSpec, pgbin: &str) {
let local_sharedir = Path::new(&get_pg_config("--sharedir", pgbin)).join("extension");
for (ext_name, ext_data) in remote_extensions.extension_data.iter() {
// Check if extension is present in public or custom.
// If not, then it is not allowed to be used by this compute.
if let Some(public_extensions) = &remote_extensions.public_extensions {
if !public_extensions.contains(ext_name) {
if let Some(custom_extensions) = &remote_extensions.custom_extensions {
if !custom_extensions.contains(ext_name) {
continue; // skip this extension, it is not allowed
}
}
}
}
for (control_name, control_content) in &ext_data.control_data {
let control_path = local_sharedir.join(control_name);
if !control_path.exists() {
info!("writing file {:?}{:?}", control_path, control_content);
std::fs::write(control_path, control_content).unwrap();
} else {
warn!("control file {:?} exists both locally and remotely. ignoring the remote version.", control_path);
}
}
}
}
// Do request to extension storage proxy, i.e.
// curl http://pg-ext-s3-gateway/latest/v15/extensions/anon.tar.zst
// using HHTP GET
// and return the response body as bytes
//
async fn download_extension_tar(ext_remote_storage: &str, ext_path: &str) -> Result<Bytes> {
let uri = format!("{}/{}", ext_remote_storage, ext_path);
info!("Download extension {:?} from uri {:?}", ext_path, uri);
let resp = reqwest::get(uri).await?;
match resp.status() {
StatusCode::OK => match resp.bytes().await {
Ok(resp) => {
info!("Download extension {:?} completed successfully", ext_path);
Ok(resp)
}
Err(e) => bail!("could not deserialize remote extension response: {}", e),
},
StatusCode::SERVICE_UNAVAILABLE => bail!("remote extension is temporarily unavailable"),
_ => bail!(
"unexpected remote extension response status code: {}",
resp.status()
),
}
}
#[cfg(test)]
mod tests {
use super::parse_pg_version;
#[test]
fn test_parse_pg_version() {
assert_eq!(parse_pg_version("PostgreSQL 15.4"), "v15");
assert_eq!(parse_pg_version("PostgreSQL 15.14"), "v15");
assert_eq!(
parse_pg_version("PostgreSQL 15.4 (Ubuntu 15.4-0ubuntu0.23.04.1)"),
"v15"
);
assert_eq!(parse_pg_version("PostgreSQL 14.15"), "v14");
assert_eq!(parse_pg_version("PostgreSQL 14.0"), "v14");
assert_eq!(
parse_pg_version("PostgreSQL 14.9 (Debian 14.9-1.pgdg120+1"),
"v14"
);
assert_eq!(parse_pg_version("PostgreSQL 16devel"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16beta1"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16rc2"), "v16");
assert_eq!(parse_pg_version("PostgreSQL 16extra"), "v16");
}
#[test]
#[should_panic]
fn test_parse_pg_unsupported_version() {
parse_pg_version("PostgreSQL 13.14");
}
#[test]
#[should_panic]
fn test_parse_pg_incorrect_version_format() {
parse_pg_version("PostgreSQL 14");
}
}

View File

@@ -1,16 +1,39 @@
use std::convert::Infallible;
use std::net::IpAddr;
use std::net::Ipv6Addr;
use std::net::SocketAddr;
use std::sync::Arc;
use std::thread;
use crate::compute::{ComputeNode, ComputeState, ParsedSpec};
use compute_api::requests::ConfigurationRequest;
use compute_api::responses::{ComputeStatus, ComputeStatusResponse, GenericAPIError};
use anyhow::Result;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use num_cpus;
use serde_json;
use tracing::{error, info};
use tokio::task;
use tracing::{error, info, warn};
use tracing_utils::http::OtelName;
use crate::compute::ComputeNode;
fn status_response_from_state(state: &ComputeState) -> ComputeStatusResponse {
ComputeStatusResponse {
start_time: state.start_time,
tenant: state
.pspec
.as_ref()
.map(|pspec| pspec.tenant_id.to_string()),
timeline: state
.pspec
.as_ref()
.map(|pspec| pspec.timeline_id.to_string()),
status: state.status,
last_active: state.last_active,
error: state.error.clone(),
}
}
// Service function to handle all available routes.
async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body> {
@@ -23,23 +46,157 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body
// Serialized compute state.
(&Method::GET, "/status") => {
info!("serving /status GET request");
let state = compute.state.read().unwrap();
Response::new(Body::from(serde_json::to_string(&*state).unwrap()))
let state = compute.state.lock().unwrap();
let status_response = status_response_from_state(&state);
Response::new(Body::from(serde_json::to_string(&status_response).unwrap()))
}
// Startup metrics in JSON format. Keep /metrics reserved for a possible
// future use for Prometheus metrics format.
(&Method::GET, "/metrics.json") => {
info!("serving /metrics.json GET request");
Response::new(Body::from(serde_json::to_string(&compute.metrics).unwrap()))
let metrics = compute.state.lock().unwrap().metrics.clone();
Response::new(Body::from(serde_json::to_string(&metrics).unwrap()))
}
// Collect Postgres current usage insights
(&Method::GET, "/insights") => {
info!("serving /insights GET request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!("compute is not running, current status: {:?}", status);
error!(msg);
return Response::new(Body::from(msg));
}
let insights = compute.collect_insights().await;
Response::new(Body::from(insights))
}
(&Method::POST, "/check_writability") => {
info!("serving /check_writability POST request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for check_writability request: {:?}",
status
);
error!(msg);
return Response::new(Body::from(msg));
}
let res = crate::checker::check_writability(compute).await;
match res {
Ok(_) => Response::new(Body::from("true")),
Err(e) => Response::new(Body::from(e.to_string())),
Err(e) => {
error!("check_writability failed: {}", e);
Response::new(Body::from(e.to_string()))
}
}
}
(&Method::GET, "/info") => {
let num_cpus = num_cpus::get_physical();
info!("serving /info GET request. num_cpus: {}", num_cpus);
Response::new(Body::from(
serde_json::json!({
"num_cpus": num_cpus,
})
.to_string(),
))
}
// Accept spec in JSON format and request compute configuration. If
// anything goes wrong after we set the compute status to `ConfigurationPending`
// and update compute state with new spec, we basically leave compute
// in the potentially wrong state. That said, it's control-plane's
// responsibility to watch compute state after reconfiguration request
// and to clean restart in case of errors.
(&Method::POST, "/configure") => {
info!("serving /configure POST request");
match handle_configure_request(req, compute).await {
Ok(msg) => Response::new(Body::from(msg)),
Err((msg, code)) => {
error!("error handling /configure request: {msg}");
render_json_error(&msg, code)
}
}
}
// download extension files from remote extension storage on demand
(&Method::POST, route) if route.starts_with("/extension_server/") => {
info!("serving {:?} POST request", route);
info!("req.uri {:?}", req.uri());
// don't even try to download extensions
// if no remote storage is configured
if compute.ext_remote_storage.is_none() {
info!("no extensions remote storage configured");
let mut resp = Response::new(Body::from("no remote storage configured"));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
return resp;
}
let mut is_library = false;
if let Some(params) = req.uri().query() {
info!("serving {:?} POST request with params: {}", route, params);
if params == "is_library=true" {
is_library = true;
} else {
let mut resp = Response::new(Body::from("Wrong request parameters"));
*resp.status_mut() = StatusCode::BAD_REQUEST;
return resp;
}
}
let filename = route.split('/').last().unwrap().to_string();
info!("serving /extension_server POST request, filename: {filename:?} is_library: {is_library}");
// get ext_name and path from spec
// don't lock compute_state for too long
let ext = {
let compute_state = compute.state.lock().unwrap();
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
let spec = &pspec.spec;
// debug only
info!("spec: {:?}", spec);
let remote_extensions = match spec.remote_extensions.as_ref() {
Some(r) => r,
None => {
info!("no remote extensions spec was provided");
let mut resp = Response::new(Body::from("no remote storage configured"));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
return resp;
}
};
remote_extensions.get_ext(
&filename,
is_library,
&compute.build_tag,
&compute.pgversion,
)
};
match ext {
Ok((ext_name, ext_path)) => {
match compute.download_extension(ext_name, ext_path).await {
Ok(_) => Response::new(Body::from("OK")),
Err(e) => {
error!("extension download failed: {}", e);
let mut resp = Response::new(Body::from(e.to_string()));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
resp
}
}
}
Err(e) => {
warn!("extension download failed to find extension: {}", e);
let mut resp = Response::new(Body::from("failed to find file"));
*resp.status_mut() = StatusCode::INTERNAL_SERVER_ERROR;
resp
}
}
}
@@ -52,10 +209,100 @@ async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body
}
}
async fn handle_configure_request(
req: Request<Body>,
compute: &Arc<ComputeNode>,
) -> Result<String, (String, StatusCode)> {
if !compute.live_config_allowed {
return Err((
"live configuration is not allowed for this compute node".to_string(),
StatusCode::PRECONDITION_FAILED,
));
}
let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap();
let spec_raw = String::from_utf8(body_bytes.to_vec()).unwrap();
if let Ok(request) = serde_json::from_str::<ConfigurationRequest>(&spec_raw) {
let spec = request.spec;
let parsed_spec = match ParsedSpec::try_from(spec) {
Ok(ps) => ps,
Err(msg) => return Err((msg, StatusCode::BAD_REQUEST)),
};
// XXX: wrap state update under lock in code blocks. Otherwise,
// we will try to `Send` `mut state` into the spawned thread
// bellow, which will cause error:
// ```
// error: future cannot be sent between threads safely
// ```
{
let mut state = compute.state.lock().unwrap();
if state.status != ComputeStatus::Empty && state.status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for configuration request: {:?}",
state.status.clone()
);
return Err((msg, StatusCode::PRECONDITION_FAILED));
}
state.pspec = Some(parsed_spec);
state.status = ComputeStatus::ConfigurationPending;
compute.state_changed.notify_all();
drop(state);
info!("set new spec and notified waiters");
}
// Spawn a blocking thread to wait for compute to become Running.
// This is needed to do not block the main pool of workers and
// be able to serve other requests while some particular request
// is waiting for compute to finish configuration.
let c = compute.clone();
task::spawn_blocking(move || {
let mut state = c.state.lock().unwrap();
while state.status != ComputeStatus::Running {
state = c.state_changed.wait(state).unwrap();
info!(
"waiting for compute to become Running, current status: {:?}",
state.status
);
if state.status == ComputeStatus::Failed {
let err = state.error.as_ref().map_or("unknown error", |x| x);
let msg = format!("compute configuration failed: {:?}", err);
return Err((msg, StatusCode::INTERNAL_SERVER_ERROR));
}
}
Ok(())
})
.await
.unwrap()?;
// Return current compute state if everything went well.
let state = compute.state.lock().unwrap().clone();
let status_response = status_response_from_state(&state);
Ok(serde_json::to_string(&status_response).unwrap())
} else {
Err(("invalid spec".to_string(), StatusCode::BAD_REQUEST))
}
}
fn render_json_error(e: &str, status: StatusCode) -> Response<Body> {
let error = GenericAPIError {
error: e.to_string(),
};
Response::builder()
.status(status)
.body(Body::from(serde_json::to_string(&error).unwrap()))
.unwrap()
}
// Main Hyper HTTP server function that runs it and blocks waiting on it forever.
#[tokio::main]
async fn serve(state: Arc<ComputeNode>) {
let addr = SocketAddr::from(([0, 0, 0, 0], 3080));
async fn serve(port: u16, state: Arc<ComputeNode>) {
// this usually binds to both IPv4 and IPv6 on linux
// see e.g. https://github.com/rust-lang/rust/pull/34440
let addr = SocketAddr::new(IpAddr::from(Ipv6Addr::UNSPECIFIED), port);
let make_service = make_service_fn(move |_conn| {
let state = state.clone();
@@ -90,10 +337,10 @@ async fn serve(state: Arc<ComputeNode>) {
}
/// Launch a separate Hyper HTTP API server thread and return its `JoinHandle`.
pub fn launch_http_server(state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
pub fn launch_http_server(port: u16, state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
let state = Arc::clone(state);
Ok(thread::Builder::new()
.name("http-endpoint".into())
.spawn(move || serve(state))?)
.spawn(move || serve(port, state))?)
}

View File

@@ -10,12 +10,12 @@ paths:
/status:
get:
tags:
- "info"
summary: Get compute node internal status
- Info
summary: Get compute node internal status.
description: ""
operationId: getComputeStatus
responses:
"200":
200:
description: ComputeState
content:
application/json:
@@ -25,35 +25,149 @@ paths:
/metrics.json:
get:
tags:
- "info"
summary: Get compute node startup metrics in JSON format
- Info
summary: Get compute node startup metrics in JSON format.
description: ""
operationId: getComputeMetricsJSON
responses:
"200":
200:
description: ComputeMetrics
content:
application/json:
schema:
$ref: "#/components/schemas/ComputeMetrics"
/insights:
get:
tags:
- Info
summary: Get current compute insights in JSON format.
description: |
Note, that this doesn't include any historical data.
operationId: getComputeInsights
responses:
200:
description: Compute insights
content:
application/json:
schema:
$ref: "#/components/schemas/ComputeInsights"
/info:
get:
tags:
- Info
summary: Get info about the compute pod / VM.
description: ""
operationId: getInfo
responses:
200:
description: Info
content:
application/json:
schema:
$ref: "#/components/schemas/Info"
/check_writability:
post:
tags:
- "check"
summary: Check that we can write new data on this compute
- Check
summary: Check that we can write new data on this compute.
description: ""
operationId: checkComputeWritability
responses:
"200":
200:
description: Check result
content:
text/plain:
schema:
type: string
description: Error text or 'true' if check passed
description: Error text or 'true' if check passed.
example: "true"
/configure:
post:
tags:
- Configure
summary: Perform compute node configuration.
description: |
This is a blocking API endpoint, i.e. it blocks waiting until
compute is finished configuration and is in `Running` state.
Optional non-blocking mode could be added later.
operationId: configureCompute
requestBody:
description: Configuration request.
required: true
content:
application/json:
schema:
type: object
required:
- spec
properties:
spec:
# XXX: I don't want to explain current spec in the OpenAPI format,
# as it could be changed really soon. Consider doing it later.
type: object
responses:
200:
description: Compute configuration finished.
content:
application/json:
schema:
$ref: "#/components/schemas/ComputeState"
400:
description: Provided spec is invalid.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
412:
description: |
It's not possible to do live-configuration of the compute.
It's either in the wrong state, or compute doesn't use pull
mode of configuration.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
500:
description: |
Compute configuration request was processed, but error
occurred. Compute will likely shutdown soon.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
/extension_server:
post:
tags:
- Extension
summary: Download extension from S3 to local folder.
description: ""
operationId: downloadExtension
responses:
200:
description: Extension downloaded
content:
text/plain:
schema:
type: string
description: Error text or 'OK' if download succeeded.
example: "OK"
400:
description: Request is invalid.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
500:
description: Extension download request failed.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
components:
securitySchemes:
JWT:
@@ -64,13 +178,16 @@ components:
schemas:
ComputeMetrics:
type: object
description: Compute startup metrics
description: Compute startup metrics.
required:
- wait_for_spec_ms
- sync_safekeepers_ms
- basebackup_ms
- config_ms
- total_startup_ms
properties:
wait_for_spec_ms:
type: integer
sync_safekeepers_ms:
type: integer
basebackup_ms:
@@ -80,28 +197,80 @@ components:
total_startup_ms:
type: integer
Info:
type: object
description: Information about VM/Pod.
required:
- num_cpus
properties:
num_cpus:
type: integer
ComputeState:
type: object
required:
- start_time
- status
- last_active
properties:
start_time:
type: string
description: |
Time when compute was started. If initially compute was started in the `empty`
state and then provided with valid spec, `start_time` will be reset to the
moment, when spec was received.
example: "2022-10-12T07:20:50.52Z"
status:
$ref: '#/components/schemas/ComputeStatus'
last_active:
type: string
description: The last detected compute activity timestamp in UTC and RFC3339 format
description: |
The last detected compute activity timestamp in UTC and RFC3339 format.
It could be empty if compute was never used by user since start.
example: "2022-10-12T07:20:50.52Z"
error:
type: string
description: Text of the error during compute startup, if any
description: Text of the error during compute startup or reconfiguration, if any.
example: ""
tenant:
type: string
description: Identifier of the current tenant served by compute node, if any.
example: c9269c359e9a199fad1ea0981246a78f
timeline:
type: string
description: Identifier of the current timeline served by compute node, if any.
example: ece7de74d4b8cbe5433a68ce4d1b97b4
ComputeInsights:
type: object
properties:
pg_stat_statements:
description: Contains raw output from pg_stat_statements in JSON format.
type: array
items:
type: object
ComputeStatus:
type: string
enum:
- empty
- init
- failed
- running
- configuration_pending
- configuration
example: running
#
# Errors
#
GenericError:
type: object
required:
- error
properties:
error:
type: string
security:
- JWT: []

View File

@@ -1,50 +0,0 @@
use std::path::Path;
use std::process;
use std::thread;
use std::time::Duration;
use tracing::{info, warn};
use anyhow::{Context, Result};
const VM_INFORMANT_PATH: &str = "/bin/vm-informant";
const RESTART_INFORMANT_AFTER_MILLIS: u64 = 5000;
/// Launch a thread to start the VM informant if it's present (and restart, on failure)
pub fn spawn_vm_informant_if_present() -> Result<Option<thread::JoinHandle<()>>> {
let exists = Path::new(VM_INFORMANT_PATH)
.try_exists()
.context("could not check if path exists")?;
if !exists {
return Ok(None);
}
Ok(Some(
thread::Builder::new()
.name("run-vm-informant".into())
.spawn(move || run_informant())?,
))
}
fn run_informant() -> ! {
let restart_wait = Duration::from_millis(RESTART_INFORMANT_AFTER_MILLIS);
info!("starting VM informant");
loop {
let mut cmd = process::Command::new(VM_INFORMANT_PATH);
// Block on subprocess:
let result = cmd.status();
match result {
Err(e) => warn!("failed to run VM informant at {VM_INFORMANT_PATH:?}: {e}"),
Ok(status) if !status.success() => {
warn!("{VM_INFORMANT_PATH} exited with code {status:?}, retrying")
}
Ok(_) => info!("{VM_INFORMANT_PATH} ended gracefully (unexpectedly). Retrying"),
}
// Wait before retrying
thread::sleep(restart_wait);
}
}

View File

@@ -1,15 +1,17 @@
//!
//! Various tools and helpers to handle cluster / compute node (Postgres)
//! configuration.
//!
#![deny(unsafe_code)]
#![deny(clippy::undocumented_unsafe_blocks)]
pub mod checker;
pub mod config;
pub mod configurator;
pub mod http;
#[macro_use]
pub mod logger;
pub mod compute;
pub mod informant;
pub mod extension_server;
pub mod monitor;
pub mod params;
pub mod pg_helpers;
pub mod spec;
pub mod sync_sk;

View File

@@ -18,6 +18,7 @@ pub fn init_tracing_and_logging(default_log_level: &str) -> anyhow::Result<()> {
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new(default_log_level));
let fmt_layer = tracing_subscriber::fmt::layer()
.with_ansi(false)
.with_target(false)
.with_writer(std::io::stderr);
@@ -33,5 +34,7 @@ pub fn init_tracing_and_logging(default_log_level: &str) -> anyhow::Result<()> {
.init();
tracing::info!("logging and tracing started");
utils::logging::replace_panic_hook_with_tracing_panic_hook().forget();
Ok(())
}

View File

@@ -1,14 +1,13 @@
use std::sync::Arc;
use std::{thread, time};
use std::{thread, time::Duration};
use anyhow::Result;
use chrono::{DateTime, Utc};
use postgres::{Client, NoTls};
use tracing::{debug, info};
use crate::compute::ComputeNode;
const MONITOR_CHECK_INTERVAL: u64 = 500; // milliseconds
const MONITOR_CHECK_INTERVAL: Duration = Duration::from_millis(500);
// Spin in a loop and figure out the last activity time in the Postgres.
// Then update it in the shared state. This function never errors out.
@@ -18,13 +17,12 @@ fn watch_compute_activity(compute: &ComputeNode) {
let connstr = compute.connstr.as_str();
// Define `client` outside of the loop to reuse existing connection if it's active.
let mut client = Client::connect(connstr, NoTls);
let timeout = time::Duration::from_millis(MONITOR_CHECK_INTERVAL);
info!("watching Postgres activity at {}", connstr);
loop {
// Should be outside of the write lock to allow others to read while we sleep.
thread::sleep(timeout);
thread::sleep(MONITOR_CHECK_INTERVAL);
match &mut client {
Ok(cli) => {
@@ -46,7 +44,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
AND usename != 'cloud_admin';", // XXX: find a better way to filter other monitors?
&[],
);
let mut last_active = compute.state.read().unwrap().last_active;
let mut last_active = compute.state.lock().unwrap().last_active;
if let Ok(backs) = backends {
let mut idle_backs: Vec<DateTime<Utc>> = vec![];
@@ -74,7 +72,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
// Found non-idle backend, so the last activity is NOW.
// Save it and exit the for loop. Also clear the idle backend
// `state_change` timestamps array as it doesn't matter now.
last_active = Utc::now();
last_active = Some(Utc::now());
idle_backs.clear();
break;
}
@@ -82,15 +80,16 @@ fn watch_compute_activity(compute: &ComputeNode) {
// Get idle backend `state_change` with the max timestamp.
if let Some(last) = idle_backs.iter().max() {
last_active = *last;
last_active = Some(*last);
}
}
// Update the last activity in the shared state if we got a more recent one.
let mut state = compute.state.write().unwrap();
let mut state = compute.state.lock().unwrap();
// NB: `Some(<DateTime>)` is always greater than `None`.
if last_active > state.last_active {
state.last_active = last_active;
debug!("set the last compute activity time to: {}", last_active);
debug!("set the last compute activity time to: {:?}", last_active);
}
}
Err(e) => {
@@ -104,10 +103,11 @@ fn watch_compute_activity(compute: &ComputeNode) {
}
/// Launch a separate compute monitor thread and return its `JoinHandle`.
pub fn launch_monitor(state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
pub fn launch_monitor(state: &Arc<ComputeNode>) -> thread::JoinHandle<()> {
let state = Arc::clone(state);
Ok(thread::Builder::new()
thread::Builder::new()
.name("compute-monitor".into())
.spawn(move || watch_compute_activity(&state))?)
.spawn(move || watch_compute_activity(&state))
.expect("cannot launch compute monitor thread")
}

View File

@@ -6,4 +6,4 @@ pub const DEFAULT_LOG_LEVEL: &str = "info";
// https://www.postgresql.org/docs/15/auth-password.html
//
// So it's safe to set md5 here, as `control-plane` anyway uses SCRAM for all roles.
pub const PG_HBA_ALL_MD5: &str = "host\tall\t\tall\t\t0.0.0.0/0\t\tmd5";
pub const PG_HBA_ALL_MD5: &str = "host\tall\t\tall\t\tall\t\tmd5";

View File

@@ -1,3 +1,4 @@
use std::collections::HashMap;
use std::fmt::Write;
use std::fs;
use std::fs::File;
@@ -10,49 +11,45 @@ use std::time::{Duration, Instant};
use anyhow::{bail, Result};
use notify::{RecursiveMode, Watcher};
use postgres::{Client, Transaction};
use serde::Deserialize;
use tracing::{debug, instrument};
use compute_api::spec::{Database, GenericOption, GenericOptions, PgIdent, Role};
const POSTGRES_WAIT_TIMEOUT: Duration = Duration::from_millis(60 * 1000); // milliseconds
/// Rust representation of Postgres role info with only those fields
/// that matter for us.
#[derive(Clone, Deserialize)]
pub struct Role {
pub name: PgIdent,
pub encrypted_password: Option<String>,
pub options: GenericOptions,
/// Escape a string for including it in a SQL literal. Wrapping the result
/// with `E'{}'` or `'{}'` is not required, as it returns a ready-to-use
/// SQL string literal, e.g. `'db'''` or `E'db\\'`.
/// See <https://github.com/postgres/postgres/blob/da98d005cdbcd45af563d0c4ac86d0e9772cd15f/src/backend/utils/adt/quote.c#L47>
/// for the original implementation.
pub fn escape_literal(s: &str) -> String {
let res = s.replace('\'', "''").replace('\\', "\\\\");
if res.contains('\\') {
format!("E'{}'", res)
} else {
format!("'{}'", res)
}
}
/// Rust representation of Postgres database info with only those fields
/// that matter for us.
#[derive(Clone, Deserialize)]
pub struct Database {
pub name: PgIdent,
pub owner: PgIdent,
pub options: GenericOptions,
/// Escape a string so that it can be used in postgresql.conf. Wrapping the result
/// with `'{}'` is not required, as it returns a ready-to-use config string.
pub fn escape_conf_value(s: &str) -> String {
let res = s.replace('\'', "''").replace('\\', "\\\\");
format!("'{}'", res)
}
/// Common type representing both SQL statement params with or without value,
/// like `LOGIN` or `OWNER username` in the `CREATE/ALTER ROLE`, and config
/// options like `wal_level = logical`.
#[derive(Clone, Deserialize)]
pub struct GenericOption {
pub name: String,
pub value: Option<String>,
pub vartype: String,
trait GenericOptionExt {
fn to_pg_option(&self) -> String;
fn to_pg_setting(&self) -> String;
}
/// Optional collection of `GenericOption`'s. Type alias allows us to
/// declare a `trait` on it.
pub type GenericOptions = Option<Vec<GenericOption>>;
impl GenericOption {
impl GenericOptionExt for GenericOption {
/// Represent `GenericOption` as SQL statement parameter.
pub fn to_pg_option(&self) -> String {
fn to_pg_option(&self) -> String {
if let Some(val) = &self.value {
match self.vartype.as_ref() {
"string" => format!("{} '{}'", self.name, val),
"string" => format!("{} {}", self.name, escape_literal(val)),
_ => format!("{} {}", self.name, val),
}
} else {
@@ -61,18 +58,11 @@ impl GenericOption {
}
/// Represent `GenericOption` as configuration option.
pub fn to_pg_setting(&self) -> String {
fn to_pg_setting(&self) -> String {
if let Some(val) = &self.value {
let name = match self.name.as_str() {
"safekeepers" => "neon.safekeepers",
"wal_acceptor_reconnect" => "neon.safekeeper_reconnect_timeout",
"wal_acceptor_connection_timeout" => "neon.safekeeper_connection_timeout",
it => it,
};
match self.vartype.as_ref() {
"string" => format!("{} = '{}'", name, val),
_ => format!("{} = {}", name, val),
"string" => format!("{} = {}", self.name, escape_conf_value(val)),
_ => format!("{} = {}", self.name, val),
}
} else {
self.name.to_owned()
@@ -107,6 +97,7 @@ impl PgOptionsSerialize for GenericOptions {
.map(|op| op.to_pg_setting())
.collect::<Vec<String>>()
.join("\n")
+ "\n" // newline after last setting
} else {
"".to_string()
}
@@ -115,6 +106,7 @@ impl PgOptionsSerialize for GenericOptions {
pub trait GenericOptionsSearch {
fn find(&self, name: &str) -> Option<String>;
fn find_ref(&self, name: &str) -> Option<&GenericOption>;
}
impl GenericOptionsSearch for GenericOptions {
@@ -124,16 +116,25 @@ impl GenericOptionsSearch for GenericOptions {
let op = ops.iter().find(|s| s.name == name)?;
op.value.clone()
}
/// Lookup option by name, returning ref
fn find_ref(&self, name: &str) -> Option<&GenericOption> {
let ops = self.as_ref()?;
ops.iter().find(|s| s.name == name)
}
}
impl Role {
pub trait RoleExt {
fn to_pg_options(&self) -> String;
}
impl RoleExt for Role {
/// Serialize a list of role parameters into a Postgres-acceptable
/// string of arguments.
pub fn to_pg_options(&self) -> String {
fn to_pg_options(&self) -> String {
// XXX: consider putting LOGIN as a default option somewhere higher, e.g. in control-plane.
// For now, we do not use generic `options` for roles. Once used, add
// `self.options.as_pg_options()` somewhere here.
let mut params: String = "LOGIN".to_string();
let mut params: String = self.options.as_pg_options();
params.push_str(" LOGIN");
if let Some(pass) = &self.encrypted_password {
// Some time ago we supported only md5 and treated all encrypted_password as md5.
@@ -154,21 +155,17 @@ impl Role {
}
}
impl Database {
pub fn new(name: PgIdent, owner: PgIdent) -> Self {
Self {
name,
owner,
options: None,
}
}
pub trait DatabaseExt {
fn to_pg_options(&self) -> String;
}
impl DatabaseExt for Database {
/// Serialize a list of database parameters into a Postgres-acceptable
/// string of arguments.
/// NB: `TEMPLATE` is actually also an identifier, but so far we only need
/// to use `template0` and `template1`, so it is not a problem. Yet in the future
/// it may require a proper quoting too.
pub fn to_pg_options(&self) -> String {
fn to_pg_options(&self) -> String {
let mut params: String = self.options.as_pg_options();
write!(params, " OWNER {}", &self.owner.pg_quote())
.expect("String is documented to not to error during write operations");
@@ -177,10 +174,6 @@ impl Database {
}
}
/// String type alias representing Postgres identifier and
/// intended to be used for DB / role names.
pub type PgIdent = String;
/// Generic trait used to provide quoting / encoding for strings used in the
/// Postgres SQL queries and DATABASE_URL.
pub trait Escaping {
@@ -213,24 +206,43 @@ pub fn get_existing_roles(xact: &mut Transaction<'_>) -> Result<Vec<Role>> {
}
/// Build a list of existing Postgres databases
pub fn get_existing_dbs(client: &mut Client) -> Result<Vec<Database>> {
let postgres_dbs = client
pub fn get_existing_dbs(client: &mut Client) -> Result<HashMap<String, Database>> {
// `pg_database.datconnlimit = -2` means that the database is in the
// invalid state. See:
// https://github.com/postgres/postgres/commit/a4b4cc1d60f7e8ccfcc8ff8cb80c28ee411ad9a9
let postgres_dbs: Vec<Database> = client
.query(
"SELECT datname, datdba::regrole::text as owner
FROM pg_catalog.pg_database;",
"SELECT
datname AS name,
datdba::regrole::text AS owner,
NOT datallowconn AS restrict_conn,
datconnlimit = - 2 AS invalid
FROM
pg_catalog.pg_database;",
&[],
)?
.iter()
.map(|row| Database::new(row.get("datname"), row.get("owner")))
.map(|row| Database {
name: row.get("name"),
owner: row.get("owner"),
restrict_conn: row.get("restrict_conn"),
invalid: row.get("invalid"),
options: None,
})
.collect();
Ok(postgres_dbs)
let dbs_map = postgres_dbs
.iter()
.map(|db| (db.name.clone(), db.clone()))
.collect::<HashMap<_, _>>();
Ok(dbs_map)
}
/// Wait for Postgres to become ready to accept connections. It's ready to
/// accept connections when the state-field in `pgdata/postmaster.pid` says
/// 'ready'.
#[instrument(skip(pg))]
#[instrument(skip_all, fields(pgdata = %pgdata.display()))]
pub fn wait_for_postgres(pg: &mut Child, pgdata: &Path) -> Result<()> {
let pid_path = pgdata.join("postmaster.pid");

View File

@@ -1,68 +1,121 @@
use std::collections::HashMap;
use std::fs::File;
use std::path::Path;
use std::str::FromStr;
use anyhow::Result;
use anyhow::{anyhow, bail, Result};
use postgres::config::Config;
use postgres::{Client, NoTls};
use serde::Deserialize;
use tracing::{info, info_span, instrument, span_enabled, warn, Level};
use reqwest::StatusCode;
use tracing::{error, info, info_span, instrument, span_enabled, warn, Level};
use crate::compute::ComputeNode;
use crate::config;
use crate::params::PG_HBA_ALL_MD5;
use crate::pg_helpers::*;
/// Cluster spec or configuration represented as an optional number of
/// delta operations + final cluster state description.
#[derive(Clone, Deserialize)]
pub struct ComputeSpec {
pub format_version: f32,
pub timestamp: String,
pub operation_uuid: Option<String>,
/// Expected cluster state at the end of transition process.
pub cluster: Cluster,
pub delta_operations: Option<Vec<DeltaOp>>,
use compute_api::responses::{ControlPlaneComputeStatus, ControlPlaneSpecResponse};
use compute_api::spec::{ComputeSpec, PgIdent, Role};
pub startup_tracing_context: Option<HashMap<String, String>>,
// Do control plane request and return response if any. In case of error it
// returns a bool flag indicating whether it makes sense to retry the request
// and a string with error message.
fn do_control_plane_request(
uri: &str,
jwt: &str,
) -> Result<ControlPlaneSpecResponse, (bool, String)> {
let resp = reqwest::blocking::Client::new()
.get(uri)
.header("Authorization", format!("Bearer {}", jwt))
.send()
.map_err(|e| {
(
true,
format!("could not perform spec request to control plane: {}", e),
)
})?;
match resp.status() {
StatusCode::OK => match resp.json::<ControlPlaneSpecResponse>() {
Ok(spec_resp) => Ok(spec_resp),
Err(e) => Err((
true,
format!("could not deserialize control plane response: {}", e),
)),
},
StatusCode::SERVICE_UNAVAILABLE => {
Err((true, "control plane is temporarily unavailable".to_string()))
}
StatusCode::BAD_GATEWAY => {
// We have a problem with intermittent 502 errors now
// https://github.com/neondatabase/cloud/issues/2353
// It's fine to retry GET request in this case.
Err((true, "control plane request failed with 502".to_string()))
}
// Another code, likely 500 or 404, means that compute is unknown to the control plane
// or some internal failure happened. Doesn't make much sense to retry in this case.
_ => Err((
false,
format!(
"unexpected control plane response status code: {}",
resp.status()
),
)),
}
}
/// Cluster state seen from the perspective of the external tools
/// like Rails web console.
#[derive(Clone, Deserialize)]
pub struct Cluster {
pub cluster_id: String,
pub name: String,
pub state: Option<String>,
pub roles: Vec<Role>,
pub databases: Vec<Database>,
pub settings: GenericOptions,
}
/// Request spec from the control-plane by compute_id. If `NEON_CONTROL_PLANE_TOKEN`
/// env variable is set, it will be used for authorization.
pub fn get_spec_from_control_plane(
base_uri: &str,
compute_id: &str,
) -> Result<Option<ComputeSpec>> {
let cp_uri = format!("{base_uri}/compute/api/v2/computes/{compute_id}/spec");
let jwt: String = match std::env::var("NEON_CONTROL_PLANE_TOKEN") {
Ok(v) => v,
Err(_) => "".to_string(),
};
let mut attempt = 1;
let mut spec: Result<Option<ComputeSpec>> = Ok(None);
/// Single cluster state changing operation that could not be represented as
/// a static `Cluster` structure. For example:
/// - DROP DATABASE
/// - DROP ROLE
/// - ALTER ROLE name RENAME TO new_name
/// - ALTER DATABASE name RENAME TO new_name
#[derive(Clone, Deserialize)]
pub struct DeltaOp {
pub action: String,
pub name: PgIdent,
pub new_name: Option<PgIdent>,
}
info!("getting spec from control plane: {}", cp_uri);
/// It takes cluster specification and does the following:
/// - Serialize cluster config and put it into `postgresql.conf` completely rewriting the file.
/// - Update `pg_hba.conf` to allow external connections.
pub fn handle_configuration(spec: &ComputeSpec, pgdata_path: &Path) -> Result<()> {
// File `postgresql.conf` is no longer included into `basebackup`, so just
// always write all config into it creating new file.
config::write_postgres_conf(&pgdata_path.join("postgresql.conf"), spec)?;
// Do 3 attempts to get spec from the control plane using the following logic:
// - network error -> then retry
// - compute id is unknown or any other error -> bail out
// - no spec for compute yet (Empty state) -> return Ok(None)
// - got spec -> return Ok(Some(spec))
while attempt < 4 {
spec = match do_control_plane_request(&cp_uri, &jwt) {
Ok(spec_resp) => match spec_resp.status {
ControlPlaneComputeStatus::Empty => Ok(None),
ControlPlaneComputeStatus::Attached => {
if let Some(spec) = spec_resp.spec {
Ok(Some(spec))
} else {
bail!("compute is attached, but spec is empty")
}
}
},
Err((retry, msg)) => {
if retry {
Err(anyhow!(msg))
} else {
bail!(msg);
}
}
};
update_pg_hba(pgdata_path)?;
if let Err(e) = &spec {
error!("attempt {} to get spec failed with: {}", attempt, e);
} else {
return spec;
}
Ok(())
attempt += 1;
std::thread::sleep(std::time::Duration::from_millis(100));
}
// All attempts failed, return error.
spec
}
/// Check `pg_hba.conf` and update if needed to allow external connections.
@@ -80,6 +133,53 @@ pub fn update_pg_hba(pgdata_path: &Path) -> Result<()> {
Ok(())
}
/// Create a standby.signal file
pub fn add_standby_signal(pgdata_path: &Path) -> Result<()> {
// XXX: consider making it a part of spec.json
info!("adding standby.signal");
let signalfile = pgdata_path.join("standby.signal");
if !signalfile.exists() {
info!("created standby.signal");
File::create(signalfile)?;
} else {
info!("reused pre-existing standby.signal");
}
Ok(())
}
/// Compute could be unexpectedly shut down, for example, during the
/// database dropping. This leaves the database in the invalid state,
/// which prevents new db creation with the same name. This function
/// will clean it up before proceeding with catalog updates. All
/// possible future cleanup operations may go here too.
#[instrument(skip_all)]
pub fn cleanup_instance(client: &mut Client) -> Result<()> {
let existing_dbs = get_existing_dbs(client)?;
for (_, db) in existing_dbs {
if db.invalid {
// After recent commit in Postgres, interrupted DROP DATABASE
// leaves the database in the invalid state. According to the
// commit message, the only option for user is to drop it again.
// See:
// https://github.com/postgres/postgres/commit/a4b4cc1d60f7e8ccfcc8ff8cb80c28ee411ad9a9
//
// Postgres Neon extension is done the way, that db is de-registered
// in the control plane metadata only after it is dropped. So there is
// a chance that it still thinks that db should exist. This means
// that it will be re-created by `handle_databases()`. Yet, it's fine
// as user can just repeat drop (in vanilla Postgres they would need
// to do the same, btw).
let query = format!("DROP DATABASE IF EXISTS {}", db.name.pg_quote());
info!("dropping invalid database {}", db.name);
client.execute(query.as_str(), &[])?;
}
}
Ok(())
}
/// Given a cluster spec json and open transaction it handles roles creation,
/// deletion and update.
#[instrument(skip_all)]
@@ -183,22 +283,27 @@ pub fn handle_roles(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
match action {
RoleAction::None => {}
RoleAction::Update => {
// This can be run on /every/ role! Not just ones created through the console.
// This means that if you add some funny ALTER here that adds a permission,
// this will get run even on user-created roles! This will result in different
// behavior before and after a spec gets reapplied. The below ALTER as it stands
// now only grants LOGIN and changes the password. Please do not allow this branch
// to do anything silly.
let mut query: String = format!("ALTER ROLE {} ", name.pg_quote());
query.push_str(&role.to_pg_options());
xact.execute(query.as_str(), &[])?;
}
RoleAction::Create => {
let mut query: String = format!("CREATE ROLE {} ", name.pg_quote());
// This branch only runs when roles are created through the console, so it is
// safe to add more permissions here. BYPASSRLS and REPLICATION are inherited
// from neon_superuser.
let mut query: String = format!(
"CREATE ROLE {} INHERIT CREATEROLE CREATEDB BYPASSRLS REPLICATION IN ROLE neon_superuser",
name.pg_quote()
);
info!("role create query: '{}'", &query);
query.push_str(&role.to_pg_options());
xact.execute(query.as_str(), &[])?;
let grant_query = format!(
"GRANT pg_read_all_data, pg_write_all_data TO {}",
name.pg_quote()
);
xact.execute(grant_query.as_str(), &[])?;
info!("role grant query: '{}'", &grant_query);
}
}
@@ -224,8 +329,8 @@ pub fn handle_roles(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
/// Reassign all dependent objects and delete requested roles.
#[instrument(skip_all)]
pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<()> {
if let Some(ops) = &node.spec.delta_operations {
pub fn handle_role_deletions(spec: &ComputeSpec, connstr: &str, client: &mut Client) -> Result<()> {
if let Some(ops) = &spec.delta_operations {
// First, reassign all dependent objects to db owners.
info!("reassigning dependent objects of to-be-deleted roles");
@@ -242,7 +347,7 @@ pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<
// Check that role is still present in Postgres, as this could be a
// restart with the same spec after role deletion.
if op.action == "delete_role" && existing_roles.iter().any(|r| r.name == op.name) {
reassign_owned_objects(node, &op.name)?;
reassign_owned_objects(spec, connstr, &op.name)?;
}
}
@@ -266,10 +371,10 @@ pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<
}
// Reassign all owned objects in all databases to the owner of the database.
fn reassign_owned_objects(node: &ComputeNode, role_name: &PgIdent) -> Result<()> {
for db in &node.spec.cluster.databases {
fn reassign_owned_objects(spec: &ComputeSpec, connstr: &str, role_name: &PgIdent) -> Result<()> {
for db in &spec.cluster.databases {
if db.owner != *role_name {
let mut conf = Config::from_str(node.connstr.as_str())?;
let mut conf = Config::from_str(connstr)?;
conf.dbname(&db.name);
let mut client = conf.connect(NoTls)?;
@@ -302,13 +407,13 @@ fn reassign_owned_objects(node: &ComputeNode, role_name: &PgIdent) -> Result<()>
/// which together provide us idempotency.
#[instrument(skip_all)]
pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
let existing_dbs: Vec<Database> = get_existing_dbs(client)?;
let existing_dbs = get_existing_dbs(client)?;
// Print a list of existing Postgres databases (only in debug mode)
if span_enabled!(Level::INFO) {
info!("postgres databases:");
for r in &existing_dbs {
info!(" {}:{}", r.name, r.owner);
for (dbname, db) in &existing_dbs {
info!(" {}:{}", dbname, db.owner);
}
}
@@ -320,16 +425,49 @@ pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
// We do not check either DB exists or not,
// Postgres will take care of it for us
"delete_db" => {
let query: String = format!("DROP DATABASE IF EXISTS {}", &op.name.pg_quote());
// In Postgres we can't drop a database if it is a template.
// So we need to unset the template flag first, but it could
// be a retry, so we could've already dropped the database.
// Check that database exists first to make it idempotent.
let unset_template_query: String = format!(
"
DO $$
BEGIN
IF EXISTS(
SELECT 1
FROM pg_catalog.pg_database
WHERE datname = {}
)
THEN
ALTER DATABASE {} is_template false;
END IF;
END
$$;",
escape_literal(&op.name),
&op.name.pg_quote()
);
// Use FORCE to drop database even if there are active connections.
// We run this from `cloud_admin`, so it should have enough privileges.
// NB: there could be other db states, which prevent us from dropping
// the database. For example, if db is used by any active subscription
// or replication slot.
// TODO: deal with it once we allow logical replication. Proper fix should
// involve returning an error code to the control plane, so it could
// figure out that this is a non-retryable error, return it to the user
// and fail operation permanently.
let drop_db_query: String = format!(
"DROP DATABASE IF EXISTS {} WITH (FORCE)",
&op.name.pg_quote()
);
warn!("deleting database '{}'", &op.name);
client.execute(query.as_str(), &[])?;
client.execute(unset_template_query.as_str(), &[])?;
client.execute(drop_db_query.as_str(), &[])?;
}
"rename_db" => {
let new_name = op.new_name.as_ref().unwrap();
// XXX: with a limited number of roles it is fine, but consider making it a HashMap
if existing_dbs.iter().any(|r| r.name == op.name) {
if existing_dbs.get(&op.name).is_some() {
let query: String = format!(
"ALTER DATABASE {} RENAME TO {}",
op.name.pg_quote(),
@@ -346,14 +484,12 @@ pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
}
// Refresh Postgres databases info to handle possible renames
let existing_dbs: Vec<Database> = get_existing_dbs(client)?;
let existing_dbs = get_existing_dbs(client)?;
info!("cluster spec databases:");
for db in &spec.cluster.databases {
let name = &db.name;
// XXX: with a limited number of databases it is fine, but consider making it a HashMap
let pg_db = existing_dbs.iter().find(|r| r.name == *name);
let pg_db = existing_dbs.get(name);
enum DatabaseAction {
None,
@@ -395,6 +531,11 @@ pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
query.push_str(&db.to_pg_options());
let _guard = info_span!("executing", query).entered();
client.execute(query.as_str(), &[])?;
let grant_query: String = format!(
"GRANT ALL PRIVILEGES ON DATABASE {} TO neon_superuser",
name.pg_quote()
);
client.execute(grant_query.as_str(), &[])?;
}
};
@@ -414,42 +555,33 @@ pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
/// Grant CREATE ON DATABASE to the database owner and do some other alters and grants
/// to allow users creating trusted extensions and re-creating `public` schema, for example.
#[instrument(skip_all)]
pub fn handle_grants(node: &ComputeNode, client: &mut Client) -> Result<()> {
let spec = &node.spec;
info!("cluster spec grants:");
// We now have a separate `web_access` role to connect to the database
// via the web interface and proxy link auth. And also we grant a
// read / write all data privilege to every role. So also grant
// create to everyone.
// XXX: later we should stop messing with Postgres ACL in such horrible
// ways.
let roles = spec
.cluster
.roles
.iter()
.map(|r| r.name.pg_quote())
.collect::<Vec<_>>();
for db in &spec.cluster.databases {
let dbname = &db.name;
let query: String = format!(
"GRANT CREATE ON DATABASE {} TO {}",
dbname.pg_quote(),
roles.join(", ")
);
info!("grant query {}", &query);
client.execute(query.as_str(), &[])?;
}
pub fn handle_grants(spec: &ComputeSpec, client: &mut Client, connstr: &str) -> Result<()> {
info!("modifying database permissions");
let existing_dbs = get_existing_dbs(client)?;
// Do some per-database access adjustments. We'd better do this at db creation time,
// but CREATE DATABASE isn't transactional. So we cannot create db + do some grants
// atomically.
for db in &node.spec.cluster.databases {
let mut conf = Config::from_str(node.connstr.as_str())?;
for db in &spec.cluster.databases {
match existing_dbs.get(&db.name) {
Some(pg_db) => {
if pg_db.restrict_conn || pg_db.invalid {
info!(
"skipping grants for db {} (invalid: {}, connections not allowed: {})",
db.name, pg_db.invalid, pg_db.restrict_conn
);
continue;
}
}
None => {
bail!(
"database {} doesn't exist in Postgres after handle_databases()",
db.name
);
}
}
let mut conf = Config::from_str(connstr)?;
conf.dbname(&db.name);
let mut db_client = conf.connect(NoTls)?;
@@ -487,6 +619,11 @@ pub fn handle_grants(node: &ComputeNode, client: &mut Client) -> Result<()> {
// Explicitly grant CREATE ON SCHEMA PUBLIC to the web_access user.
// This is needed because since postgres 15 this privilege is removed by default.
// TODO: web_access isn't created for almost 1 year. It could be that we have
// active users of 1 year old projects, but hopefully not, so check it and
// remove this code if possible. The worst thing that could happen is that
// user won't be able to use public schema in NEW databases created in the
// very OLD project.
let grant_query = "DO $$\n\
BEGIN\n\
IF EXISTS(\n\
@@ -515,3 +652,48 @@ pub fn handle_grants(node: &ComputeNode, client: &mut Client) -> Result<()> {
Ok(())
}
/// Create required system extensions
#[instrument(skip_all)]
pub fn handle_extensions(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
if let Some(libs) = spec.cluster.settings.find("shared_preload_libraries") {
if libs.contains("pg_stat_statements") {
// Create extension only if this compute really needs it
let query = "CREATE EXTENSION IF NOT EXISTS pg_stat_statements";
info!("creating system extensions with query: {}", query);
client.simple_query(query)?;
}
}
Ok(())
}
/// Run CREATE and ALTER EXTENSION neon UPDATE for postgres database
#[instrument(skip_all)]
pub fn handle_extension_neon(client: &mut Client) -> Result<()> {
info!("handle extension neon");
let mut query = "CREATE SCHEMA IF NOT EXISTS neon";
client.simple_query(query)?;
query = "CREATE EXTENSION IF NOT EXISTS neon WITH SCHEMA neon";
info!("create neon extension with query: {}", query);
client.simple_query(query)?;
query = "UPDATE pg_extension SET extrelocatable = true WHERE extname = 'neon'";
client.simple_query(query)?;
query = "ALTER EXTENSION neon SET SCHEMA neon";
info!("alter neon extension schema with query: {}", query);
client.simple_query(query)?;
// this will be a no-op if extension is already up to date,
// which may happen in two cases:
// - extension was just installed
// - extension was already installed and is up to date
let query = "ALTER EXTENSION neon UPDATE";
info!("update neon extension schema with query: {}", query);
client.simple_query(query)?;
Ok(())
}

View File

@@ -0,0 +1,98 @@
// Utils for running sync_safekeepers
use anyhow::Result;
use tracing::info;
use utils::lsn::Lsn;
#[derive(Copy, Clone, Debug)]
pub enum TimelineStatusResponse {
NotFound,
Ok(TimelineStatusOkResponse),
}
#[derive(Copy, Clone, Debug)]
pub struct TimelineStatusOkResponse {
flush_lsn: Lsn,
commit_lsn: Lsn,
}
/// Get a safekeeper's metadata for our timeline. The id is only used for logging
pub async fn ping_safekeeper(
id: String,
config: tokio_postgres::Config,
) -> Result<TimelineStatusResponse> {
// TODO add retries
// Connect
info!("connecting to {}", id);
let (client, conn) = config.connect(tokio_postgres::NoTls).await?;
tokio::spawn(async move {
if let Err(e) = conn.await {
eprintln!("connection error: {}", e);
}
});
// Query
info!("querying {}", id);
let result = client.simple_query("TIMELINE_STATUS").await?;
// Parse result
info!("done with {}", id);
if let postgres::SimpleQueryMessage::Row(row) = &result[0] {
use std::str::FromStr;
let response = TimelineStatusResponse::Ok(TimelineStatusOkResponse {
flush_lsn: Lsn::from_str(row.get("flush_lsn").unwrap())?,
commit_lsn: Lsn::from_str(row.get("commit_lsn").unwrap())?,
});
Ok(response)
} else {
// Timeline doesn't exist
Ok(TimelineStatusResponse::NotFound)
}
}
/// Given a quorum of responses, check if safekeepers are synced at some Lsn
pub fn check_if_synced(responses: Vec<TimelineStatusResponse>) -> Option<Lsn> {
// Check if all responses are ok
let ok_responses: Vec<TimelineStatusOkResponse> = responses
.iter()
.filter_map(|r| match r {
TimelineStatusResponse::Ok(ok_response) => Some(ok_response),
_ => None,
})
.cloned()
.collect();
if ok_responses.len() < responses.len() {
info!(
"not synced. Only {} out of {} know about this timeline",
ok_responses.len(),
responses.len()
);
return None;
}
// Get the min and the max of everything
let commit: Vec<Lsn> = ok_responses.iter().map(|r| r.commit_lsn).collect();
let flush: Vec<Lsn> = ok_responses.iter().map(|r| r.flush_lsn).collect();
let commit_max = commit.iter().max().unwrap();
let commit_min = commit.iter().min().unwrap();
let flush_max = flush.iter().max().unwrap();
let flush_min = flush.iter().min().unwrap();
// Check that all values are equal
if commit_min != commit_max {
info!("not synced. {:?} {:?}", commit_min, commit_max);
return None;
}
if flush_min != flush_max {
info!("not synced. {:?} {:?}", flush_min, flush_max);
return None;
}
// Check that commit == flush
if commit_max != flush_max {
info!("not synced. {:?} {:?}", commit_max, flush_max);
return None;
}
Some(*commit_max)
}

View File

@@ -1,14 +1,13 @@
#[cfg(test)]
mod pg_helpers_tests {
use std::fs::File;
use compute_api::spec::{ComputeSpec, GenericOption, GenericOptions, PgIdent};
use compute_tools::pg_helpers::*;
use compute_tools::spec::ComputeSpec;
#[test]
fn params_serialize() {
let file = File::open("tests/cluster_spec.json").unwrap();
let file = File::open("../libs/compute_api/tests/cluster_spec.json").unwrap();
let spec: ComputeSpec = serde_json::from_reader(file).unwrap();
assert_eq!(
@@ -17,18 +16,41 @@ mod pg_helpers_tests {
);
assert_eq!(
spec.cluster.roles.first().unwrap().to_pg_options(),
"LOGIN PASSWORD 'md56b1d16b78004bbd51fa06af9eda75972'"
" LOGIN PASSWORD 'md56b1d16b78004bbd51fa06af9eda75972'"
);
}
#[test]
fn settings_serialize() {
let file = File::open("tests/cluster_spec.json").unwrap();
let file = File::open("../libs/compute_api/tests/cluster_spec.json").unwrap();
let spec: ComputeSpec = serde_json::from_reader(file).unwrap();
assert_eq!(
spec.cluster.settings.as_pg_settings(),
"fsync = off\nwal_level = replica\nhot_standby = on\nneon.safekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'\nwal_log_hints = on\nlog_connections = on\nshared_buffers = 32768\nport = 55432\nmax_connections = 100\nmax_wal_senders = 10\nlisten_addresses = '0.0.0.0'\nwal_sender_timeout = 0\npassword_encryption = md5\nmaintenance_work_mem = 65536\nmax_parallel_workers = 8\nmax_worker_processes = 8\nneon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'\nmax_replication_slots = 10\nneon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'\nshared_preload_libraries = 'neon'\nsynchronous_standby_names = 'walproposer'\nneon.pageserver_connstring = 'host=127.0.0.1 port=6400'"
r#"fsync = off
wal_level = logical
hot_standby = on
neon.safekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'
wal_log_hints = on
log_connections = on
shared_buffers = 32768
port = 55432
max_connections = 100
max_wal_senders = 10
listen_addresses = '0.0.0.0'
wal_sender_timeout = 0
password_encryption = md5
maintenance_work_mem = 65536
max_parallel_workers = 8
max_worker_processes = 8
neon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'
max_replication_slots = 10
neon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'
shared_preload_libraries = 'neon'
synchronous_standby_names = 'walproposer'
neon.pageserver_connstring = 'host=127.0.0.1 port=6400'
test.escaping = 'here''s a backslash \\ and a quote '' and a double-quote " hooray'
"#
);
}
@@ -67,4 +89,12 @@ mod pg_helpers_tests {
assert_eq!(none_generic_options.find("missed_value"), None);
assert_eq!(none_generic_options.find("invalid_value"), None);
}
#[test]
fn test_escape_literal() {
assert_eq!(escape_literal("test"), "'test'");
assert_eq!(escape_literal("test'"), "'test'''");
assert_eq!(escape_literal("test\\'"), "E'test\\\\'''");
assert_eq!(escape_literal("test\\'\\'"), "E'test\\\\''\\\\'''");
}
}

View File

@@ -6,26 +6,37 @@ license.workspace = true
[dependencies]
anyhow.workspace = true
async-trait.workspace = true
camino.workspace = true
clap.workspace = true
comfy-table.workspace = true
futures.workspace = true
git-version.workspace = true
nix.workspace = true
once_cell.workspace = true
postgres.workspace = true
hex.workspace = true
hyper.workspace = true
regex.workspace = true
reqwest = { workspace = true, features = ["blocking", "json"] }
serde.workspace = true
serde_json.workspace = true
serde_with.workspace = true
tar.workspace = true
thiserror.workspace = true
toml.workspace = true
tokio.workspace = true
tokio-postgres.workspace = true
tokio-util.workspace = true
url.workspace = true
# Note: Do not directly depend on pageserver or safekeeper; use pageserver_api or safekeeper_api
# instead, so that recompile times are better.
pageserver_api.workspace = true
pageserver_client.workspace = true
postgres_backend.workspace = true
safekeeper_api.workspace = true
postgres_connection.workspace = true
storage_broker.workspace = true
utils.workspace = true
compute_api.workspace = true
workspace_hack.workspace = true
tracing.workspace = true

View File

@@ -2,7 +2,8 @@
[pageserver]
listen_pg_addr = '127.0.0.1:64000'
listen_http_addr = '127.0.0.1:9898'
auth_type = 'Trust'
pg_auth_type = 'Trust'
http_auth_type = 'Trust'
[[safekeepers]]
id = 1

View File

@@ -1,9 +1,11 @@
# Minimal neon environment with one safekeeper. This is equivalent to the built-in
# defaults that you get with no --config
[pageserver]
[[pageservers]]
id=1
listen_pg_addr = '127.0.0.1:64000'
listen_http_addr = '127.0.0.1:9898'
auth_type = 'Trust'
pg_auth_type = 'Trust'
http_auth_type = 'Trust'
[[safekeepers]]
id = 1

View File

@@ -0,0 +1,138 @@
use crate::{background_process, local_env::LocalEnv};
use anyhow::anyhow;
use camino::Utf8PathBuf;
use serde::{Deserialize, Serialize};
use std::{path::PathBuf, process::Child};
use utils::id::{NodeId, TenantId};
pub struct AttachmentService {
env: LocalEnv,
listen: String,
path: PathBuf,
client: reqwest::Client,
}
const COMMAND: &str = "attachment_service";
#[derive(Serialize, Deserialize)]
pub struct AttachHookRequest {
pub tenant_id: TenantId,
pub node_id: Option<NodeId>,
}
#[derive(Serialize, Deserialize)]
pub struct AttachHookResponse {
pub gen: Option<u32>,
}
#[derive(Serialize, Deserialize)]
pub struct InspectRequest {
pub tenant_id: TenantId,
}
#[derive(Serialize, Deserialize)]
pub struct InspectResponse {
pub attachment: Option<(u32, NodeId)>,
}
impl AttachmentService {
pub fn from_env(env: &LocalEnv) -> Self {
let path = env.base_data_dir.join("attachments.json");
// Makes no sense to construct this if pageservers aren't going to use it: assume
// pageservers have control plane API set
let listen_url = env.control_plane_api.clone().unwrap();
let listen = format!(
"{}:{}",
listen_url.host_str().unwrap(),
listen_url.port().unwrap()
);
Self {
env: env.clone(),
path,
listen,
client: reqwest::ClientBuilder::new()
.build()
.expect("Failed to construct http client"),
}
}
fn pid_file(&self) -> Utf8PathBuf {
Utf8PathBuf::from_path_buf(self.env.base_data_dir.join("attachment_service.pid"))
.expect("non-Unicode path")
}
pub async fn start(&self) -> anyhow::Result<Child> {
let path_str = self.path.to_string_lossy();
background_process::start_process(
COMMAND,
&self.env.base_data_dir,
&self.env.attachment_service_bin(),
["-l", &self.listen, "-p", &path_str],
[],
background_process::InitialPidFile::Create(self.pid_file()),
// TODO: a real status check
|| async move { anyhow::Ok(true) },
)
.await
}
pub fn stop(&self, immediate: bool) -> anyhow::Result<()> {
background_process::stop_process(immediate, COMMAND, &self.pid_file())
}
/// Call into the attach_hook API, for use before handing out attachments to pageservers
pub async fn attach_hook(
&self,
tenant_id: TenantId,
pageserver_id: NodeId,
) -> anyhow::Result<Option<u32>> {
use hyper::StatusCode;
let url = self
.env
.control_plane_api
.clone()
.unwrap()
.join("attach-hook")
.unwrap();
let request = AttachHookRequest {
tenant_id,
node_id: Some(pageserver_id),
};
let response = self.client.post(url).json(&request).send().await?;
if response.status() != StatusCode::OK {
return Err(anyhow!("Unexpected status {}", response.status()));
}
let response = response.json::<AttachHookResponse>().await?;
Ok(response.gen)
}
pub async fn inspect(&self, tenant_id: TenantId) -> anyhow::Result<Option<(u32, NodeId)>> {
use hyper::StatusCode;
let url = self
.env
.control_plane_api
.clone()
.unwrap()
.join("inspect")
.unwrap();
let request = InspectRequest { tenant_id };
let response = self.client.post(url).json(&request).send().await?;
if response.status() != StatusCode::OK {
return Err(anyhow!("Unexpected status {}", response.status()));
}
let response = response.json::<InspectResponse>().await?;
Ok(response.attachment)
}
}

View File

@@ -10,18 +10,19 @@
//! (non-Neon binaries don't necessarily follow our pidfile conventions).
//! The pid stored in the file is later used to stop the service.
//!
//! See [`lock_file`] module for more info.
//! See the [`lock_file`](utils::lock_file) module for more info.
use std::ffi::OsStr;
use std::io::Write;
use std::os::unix::prelude::AsRawFd;
use std::os::unix::process::CommandExt;
use std::path::{Path, PathBuf};
use std::path::Path;
use std::process::{Child, Command};
use std::time::Duration;
use std::{fs, io, thread};
use anyhow::Context;
use camino::{Utf8Path, Utf8PathBuf};
use nix::errno::Errno;
use nix::fcntl::{FcntlArg, FdFlag};
use nix::sys::signal::{kill, Signal};
@@ -43,15 +44,15 @@ const NOTICE_AFTER_RETRIES: u64 = 50;
/// Argument to `start_process`, to indicate whether it should create pidfile or if the process creates
/// it itself.
pub enum InitialPidFile<'t> {
pub enum InitialPidFile {
/// Create a pidfile, to allow future CLI invocations to manipulate the process.
Create(&'t Path),
Create(Utf8PathBuf),
/// The process will create the pidfile itself, need to wait for that event.
Expect(&'t Path),
Expect(Utf8PathBuf),
}
/// Start a background child process using the parameters given.
pub fn start_process<F, AI, A, EI>(
pub async fn start_process<F, Fut, AI, A, EI>(
process_name: &str,
datadir: &Path,
command: &Path,
@@ -61,7 +62,8 @@ pub fn start_process<F, AI, A, EI>(
process_status_check: F,
) -> anyhow::Result<Child>
where
F: Fn() -> anyhow::Result<bool>,
F: Fn() -> Fut,
Fut: std::future::Future<Output = anyhow::Result<bool>>,
AI: IntoIterator<Item = A>,
A: AsRef<OsStr>,
// Not generic AsRef<OsStr>, otherwise empty `envs` prevents type inference
@@ -85,10 +87,10 @@ where
.stdout(process_log_file)
.stderr(same_file_for_stderr)
.args(args);
let filled_cmd = fill_aws_secrets_vars(fill_rust_env_vars(background_command));
let filled_cmd = fill_remote_storage_secrets_vars(fill_rust_env_vars(background_command));
filled_cmd.envs(envs);
let pid_file_to_check = match initial_pid_file {
let pid_file_to_check = match &initial_pid_file {
InitialPidFile::Create(path) => {
pre_exec_create_pidfile(filled_cmd, path);
path
@@ -106,7 +108,7 @@ where
);
for retries in 0..RETRIES {
match process_started(pid, Some(pid_file_to_check), &process_status_check) {
match process_started(pid, pid_file_to_check, &process_status_check).await {
Ok(true) => {
println!("\n{process_name} started, pid: {pid}");
return Ok(spawned_process);
@@ -137,7 +139,11 @@ where
}
/// Stops the process, using the pid file given. Returns Ok also if the process is already not running.
pub fn stop_process(immediate: bool, process_name: &str, pid_file: &Path) -> anyhow::Result<()> {
pub fn stop_process(
immediate: bool,
process_name: &str,
pid_file: &Utf8Path,
) -> anyhow::Result<()> {
let pid = match pid_file::read(pid_file)
.with_context(|| format!("read pid_file {pid_file:?}"))?
{
@@ -180,6 +186,11 @@ pub fn stop_process(immediate: bool, process_name: &str, pid_file: &Path) -> any
}
// Wait until process is gone
wait_until_stopped(process_name, pid)?;
Ok(())
}
pub fn wait_until_stopped(process_name: &str, pid: Pid) -> anyhow::Result<()> {
for retries in 0..RETRIES {
match process_has_stopped(pid) {
Ok(true) => {
@@ -228,11 +239,13 @@ fn fill_rust_env_vars(cmd: &mut Command) -> &mut Command {
filled_cmd
}
fn fill_aws_secrets_vars(mut cmd: &mut Command) -> &mut Command {
fn fill_remote_storage_secrets_vars(mut cmd: &mut Command) -> &mut Command {
for env_key in [
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"AZURE_STORAGE_ACCOUNT",
"AZURE_STORAGE_ACCESS_KEY",
] {
if let Ok(value) = std::env::var(env_key) {
cmd = cmd.env(env_key, value);
@@ -247,10 +260,10 @@ fn fill_aws_secrets_vars(mut cmd: &mut Command) -> &mut Command {
/// will remain held until the cmd exits.
fn pre_exec_create_pidfile<P>(cmd: &mut Command, path: P) -> &mut Command
where
P: Into<PathBuf>,
P: Into<Utf8PathBuf>,
{
let path: PathBuf = path.into();
// SAFETY
let path: Utf8PathBuf = path.into();
// SAFETY:
// pre_exec is marked unsafe because it runs between fork and exec.
// Why is that dangerous in various ways?
// Long answer: https://github.com/rust-lang/rust/issues/39575
@@ -304,22 +317,20 @@ where
cmd
}
fn process_started<F>(
async fn process_started<F, Fut>(
pid: Pid,
pid_file_to_check: Option<&Path>,
pid_file_to_check: &Utf8Path,
status_check: &F,
) -> anyhow::Result<bool>
where
F: Fn() -> anyhow::Result<bool>,
F: Fn() -> Fut,
Fut: std::future::Future<Output = anyhow::Result<bool>>,
{
match status_check() {
Ok(true) => match pid_file_to_check {
Some(pid_file_path) => match pid_file::read(pid_file_path)? {
PidFileRead::NotExist => Ok(false),
PidFileRead::LockedByOtherProcess(pid_in_file) => Ok(pid_in_file == pid),
PidFileRead::NotHeldByAnyProcess(_) => Ok(false),
},
None => Ok(true),
match status_check().await {
Ok(true) => match pid_file::read(pid_file_to_check)? {
PidFileRead::NotExist => Ok(false),
PidFileRead::LockedByOtherProcess(pid_in_file) => Ok(pid_in_file == pid),
PidFileRead::NotHeldByAnyProcess(_) => Ok(false),
},
Ok(false) => Ok(false),
Err(e) => anyhow::bail!("process failed to start: {e}"),

View File

@@ -0,0 +1,337 @@
/// The attachment service mimics the aspects of the control plane API
/// that are required for a pageserver to operate.
///
/// This enables running & testing pageservers without a full-blown
/// deployment of the Neon cloud platform.
///
use anyhow::anyhow;
use clap::Parser;
use hex::FromHex;
use hyper::StatusCode;
use hyper::{Body, Request, Response};
use pageserver_api::shard::TenantShardId;
use serde::{Deserialize, Serialize};
use std::path::{Path, PathBuf};
use std::{collections::HashMap, sync::Arc};
use utils::http::endpoint::request_span;
use utils::logging::{self, LogFormat};
use utils::signals::{ShutdownSignals, Signal};
use utils::{
http::{
endpoint::{self},
error::ApiError,
json::{json_request, json_response},
RequestExt, RouterBuilder,
},
id::{NodeId, TenantId},
tcp_listener,
};
use pageserver_api::control_api::{
ReAttachRequest, ReAttachResponse, ReAttachResponseTenant, ValidateRequest, ValidateResponse,
ValidateResponseTenant,
};
use control_plane::attachment_service::{
AttachHookRequest, AttachHookResponse, InspectRequest, InspectResponse,
};
#[derive(Parser)]
#[command(author, version, about, long_about = None)]
#[command(arg_required_else_help(true))]
struct Cli {
/// Host and port to listen on, like `127.0.0.1:1234`
#[arg(short, long)]
listen: std::net::SocketAddr,
/// Path to the .json file to store state (will be created if it doesn't exist)
#[arg(short, long)]
path: PathBuf,
}
// The persistent state of each Tenant
#[derive(Serialize, Deserialize, Clone)]
struct TenantState {
// Currently attached pageserver
pageserver: Option<NodeId>,
// Latest generation number: next time we attach, increment this
// and use the incremented number when attaching
generation: u32,
}
fn to_hex_map<S, V>(input: &HashMap<TenantId, V>, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
V: Clone + Serialize,
{
let transformed = input.iter().map(|(k, v)| (hex::encode(k), v.clone()));
transformed
.collect::<HashMap<String, V>>()
.serialize(serializer)
}
fn from_hex_map<'de, D, V>(deserializer: D) -> Result<HashMap<TenantId, V>, D::Error>
where
D: serde::de::Deserializer<'de>,
V: Deserialize<'de>,
{
let hex_map = HashMap::<String, V>::deserialize(deserializer)?;
hex_map
.into_iter()
.map(|(k, v)| {
TenantId::from_hex(k)
.map(|k| (k, v))
.map_err(serde::de::Error::custom)
})
.collect()
}
// Top level state available to all HTTP handlers
#[derive(Serialize, Deserialize)]
struct PersistentState {
#[serde(serialize_with = "to_hex_map", deserialize_with = "from_hex_map")]
tenants: HashMap<TenantId, TenantState>,
#[serde(skip)]
path: PathBuf,
}
impl PersistentState {
async fn save(&self) -> anyhow::Result<()> {
let bytes = serde_json::to_vec(self)?;
tokio::fs::write(&self.path, &bytes).await?;
Ok(())
}
async fn load(path: &Path) -> anyhow::Result<Self> {
let bytes = tokio::fs::read(path).await?;
let mut decoded = serde_json::from_slice::<Self>(&bytes)?;
decoded.path = path.to_owned();
Ok(decoded)
}
async fn load_or_new(path: &Path) -> Self {
match Self::load(path).await {
Ok(s) => {
tracing::info!("Loaded state file at {}", path.display());
s
}
Err(e)
if e.downcast_ref::<std::io::Error>()
.map(|e| e.kind() == std::io::ErrorKind::NotFound)
.unwrap_or(false) =>
{
tracing::info!("Will create state file at {}", path.display());
Self {
tenants: HashMap::new(),
path: path.to_owned(),
}
}
Err(e) => {
panic!("Failed to load state from '{}': {e:#} (maybe your .neon/ dir was written by an older version?)", path.display())
}
}
}
}
/// State available to HTTP request handlers
#[derive(Clone)]
struct State {
inner: Arc<tokio::sync::RwLock<PersistentState>>,
}
impl State {
fn new(persistent_state: PersistentState) -> State {
Self {
inner: Arc::new(tokio::sync::RwLock::new(persistent_state)),
}
}
}
#[inline(always)]
fn get_state(request: &Request<Body>) -> &State {
request
.data::<Arc<State>>()
.expect("unknown state type")
.as_ref()
}
/// Pageserver calls into this on startup, to learn which tenants it should attach
async fn handle_re_attach(mut req: Request<Body>) -> Result<Response<Body>, ApiError> {
let reattach_req = json_request::<ReAttachRequest>(&mut req).await?;
let state = get_state(&req).inner.clone();
let mut locked = state.write().await;
let mut response = ReAttachResponse {
tenants: Vec::new(),
};
for (t, state) in &mut locked.tenants {
if state.pageserver == Some(reattach_req.node_id) {
state.generation += 1;
response.tenants.push(ReAttachResponseTenant {
// TODO(sharding): make this shard-aware
id: TenantShardId::unsharded(*t),
gen: state.generation,
});
}
}
locked.save().await.map_err(ApiError::InternalServerError)?;
json_response(StatusCode::OK, response)
}
/// Pageserver calls into this before doing deletions, to confirm that it still
/// holds the latest generation for the tenants with deletions enqueued
async fn handle_validate(mut req: Request<Body>) -> Result<Response<Body>, ApiError> {
let validate_req = json_request::<ValidateRequest>(&mut req).await?;
let locked = get_state(&req).inner.read().await;
let mut response = ValidateResponse {
tenants: Vec::new(),
};
for req_tenant in validate_req.tenants {
// TODO(sharding): make this shard-aware
if let Some(tenant_state) = locked.tenants.get(&req_tenant.id.tenant_id) {
let valid = tenant_state.generation == req_tenant.gen;
tracing::info!(
"handle_validate: {}(gen {}): valid={valid} (latest {})",
req_tenant.id,
req_tenant.gen,
tenant_state.generation
);
response.tenants.push(ValidateResponseTenant {
id: req_tenant.id,
valid,
});
}
}
json_response(StatusCode::OK, response)
}
/// Call into this before attaching a tenant to a pageserver, to acquire a generation number
/// (in the real control plane this is unnecessary, because the same program is managing
/// generation numbers and doing attachments).
async fn handle_attach_hook(mut req: Request<Body>) -> Result<Response<Body>, ApiError> {
let attach_req = json_request::<AttachHookRequest>(&mut req).await?;
let state = get_state(&req).inner.clone();
let mut locked = state.write().await;
let tenant_state = locked
.tenants
.entry(attach_req.tenant_id)
.or_insert_with(|| TenantState {
pageserver: attach_req.node_id,
generation: 0,
});
if let Some(attaching_pageserver) = attach_req.node_id.as_ref() {
tenant_state.generation += 1;
tracing::info!(
tenant_id = %attach_req.tenant_id,
ps_id = %attaching_pageserver,
generation = %tenant_state.generation,
"issuing",
);
} else if let Some(ps_id) = tenant_state.pageserver {
tracing::info!(
tenant_id = %attach_req.tenant_id,
%ps_id,
generation = %tenant_state.generation,
"dropping",
);
} else {
tracing::info!(
tenant_id = %attach_req.tenant_id,
"no-op: tenant already has no pageserver");
}
tenant_state.pageserver = attach_req.node_id;
let generation = tenant_state.generation;
tracing::info!(
"handle_attach_hook: tenant {} set generation {}, pageserver {}",
attach_req.tenant_id,
tenant_state.generation,
attach_req.node_id.unwrap_or(utils::id::NodeId(0xfffffff))
);
locked.save().await.map_err(ApiError::InternalServerError)?;
json_response(
StatusCode::OK,
AttachHookResponse {
gen: attach_req.node_id.map(|_| generation),
},
)
}
async fn handle_inspect(mut req: Request<Body>) -> Result<Response<Body>, ApiError> {
let inspect_req = json_request::<InspectRequest>(&mut req).await?;
let state = get_state(&req).inner.clone();
let locked = state.write().await;
let tenant_state = locked.tenants.get(&inspect_req.tenant_id);
json_response(
StatusCode::OK,
InspectResponse {
attachment: tenant_state.and_then(|s| s.pageserver.map(|ps| (s.generation, ps))),
},
)
}
fn make_router(persistent_state: PersistentState) -> RouterBuilder<hyper::Body, ApiError> {
endpoint::make_router()
.data(Arc::new(State::new(persistent_state)))
.post("/re-attach", |r| request_span(r, handle_re_attach))
.post("/validate", |r| request_span(r, handle_validate))
.post("/attach-hook", |r| request_span(r, handle_attach_hook))
.post("/inspect", |r| request_span(r, handle_inspect))
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
logging::init(
LogFormat::Plain,
logging::TracingErrorLayerEnablement::Disabled,
logging::Output::Stdout,
)?;
let args = Cli::parse();
tracing::info!(
"Starting, state at {}, listening on {}",
args.path.to_string_lossy(),
args.listen
);
let persistent_state = PersistentState::load_or_new(&args.path).await;
let http_listener = tcp_listener::bind(args.listen)?;
let router = make_router(persistent_state)
.build()
.map_err(|err| anyhow!(err))?;
let service = utils::http::RouterService::new(router).unwrap();
let server = hyper::Server::from_tcp(http_listener)?.serve(service);
tracing::info!("Serving on {0}", args.listen);
tokio::task::spawn(server);
ShutdownSignals::handle(|signal| match signal {
Signal::Interrupt | Signal::Terminate | Signal::Quit => {
tracing::info!("Got {}. Terminating", signal.name());
// We're just a test helper: no graceful shutdown.
std::process::exit(0);
}
})?;
Ok(())
}

Some files were not shown because too many files have changed in this diff Show More