Compare commits

...

740 Commits

Author SHA1 Message Date
Alek Westover
9317aec580 not working 2023-07-19 11:21:56 -04:00
Alek Westover
f025f3adbf WIP 2023-07-19 10:35:29 -04:00
Alek Westover
b7a8eba4de revert to python script 2023-07-18 15:14:29 -04:00
Alek Westover
f8c69804f7 manually upload index file 2023-07-12 13:29:14 -04:00
Alek Westover
550d0ef946 get rid of container 2023-07-12 12:05:16 -04:00
Alek Westover
88c5c5b41e make it bash not python 2023-07-12 11:42:35 -04:00
Alek Westover
cc8bc91881 style python 2023-07-12 10:29:42 -04:00
Alek Westover
ebc48ed67c delete notes file 2023-07-12 10:26:44 -04:00
Alek Westover
a6a88d2b9f fix path 2023-07-12 10:24:30 -04:00
Alek Westover
7619a169c1 I guess poetry needs a containr to run int 2023-07-11 14:34:16 -04:00
Alek Westover
e9700d4022 fix style 2023-07-11 14:22:50 -04:00
Alek Westover
5d1c4d10fa I much preffer this organizational scheme 2023-07-11 14:21:06 -04:00
Alek Westover
4fecf82291 black style 2023-07-11 13:58:26 -04:00
Alek Westover
f37980a026 fix directory structure 2023-07-11 13:52:19 -04:00
Alek Westover
13fa5285af remove recursive flag 2023-07-11 12:53:12 -04:00
Alek Westover
f132904546 actually commnet out logic 2023-07-11 12:14:47 -04:00
Alek Westover
ad582470e4 fix bad thing 2023-07-11 10:47:22 -04:00
Alek Westover
de8b914bfe Merge branch 'zip_ext' of github.com:neondatabase/neon into zip_ext 2023-07-11 10:46:40 -04:00
Alek Westover
005e9eee3c upload files 2023-07-11 10:46:23 -04:00
Alek Westover
fdcbadd552 style fix 2023-07-10 18:47:12 -04:00
Alek Westover
e88a3ae640 Merge branch 'zip_ext' of github.com:neondatabase/neon into zip_ext 2023-07-10 18:42:55 -04:00
Alek Westover
8c634622cf try to put the files in the bucket 2023-07-10 18:42:45 -04:00
Alek Westover
7bfde488b6 fixed python script 2023-07-10 13:59:00 -04:00
Alek Westover
26a1daf1bf remember to create and upload index.json 2023-07-10 13:46:32 -04:00
Alek Westover
b06dcaa88a fix again 2023-07-10 13:13:46 -04:00
Alek Westover
4b96e19e44 move script file 2023-07-10 13:13:01 -04:00
Alek Westover
d33bfb130d fix pahs 2023-07-10 13:10:01 -04:00
Alek Westover
8a40044659 merge 2023-07-10 12:39:23 -04:00
Alek Westover
7ccdd0c14e fix typo 2023-07-10 12:38:22 -04:00
Alek Westover
ce14c50a4b try to create folder 2023-07-10 11:41:30 -04:00
Alek Westover
ed95dc1c7a try to do it with just copy-pasting a lot of code 2023-07-10 11:04:10 -04:00
Alek Westover
7f028266b7 control 2023-07-07 13:48:37 -04:00
Alek Westover
e308de8224 dont zip control files 2023-07-07 13:46:43 -04:00
Alek Westover
ae349c7874 fix syntax error 2023-07-07 13:40:55 -04:00
Alek Westover
807ca84e54 copy lib tar gz 2023-07-07 13:37:14 -04:00
Alek Westover
f07c03d5ff fix 2023-07-07 13:15:00 -04:00
Alek Westover
977572bff8 fix 2023-07-07 13:11:45 -04:00
Alek Westover
ed183d1720 change to targz 2023-07-07 11:24:30 -04:00
Alek Westover
afa797d534 zip anon extension. only upload zipped extensions 2023-07-07 11:14:03 -04:00
Dmitry Rodionov
20137d9588 Polish tracing helpers (#4651)
Context: comments here: https://github.com/neondatabase/neon/pull/4645
2023-07-06 19:49:14 +03:00
Arseny Sher
634be4f4e0 Fix async write in safekeepers.
General Rust Write trait semantics (as well as its async brother) is that write
definitely happens only after Write::flush(). This wasn't needed in sync where
rust write calls the syscall directly, but is required in async.

Also fix setting initial end_pos in walsender, sometimes it was from the future.

fixes https://github.com/neondatabase/neon/issues/4518
2023-07-06 19:56:28 +04:00
Alex Chi Z
d340cf3721 dump more info in layer map (#4567)
A simple commit extracted from
https://github.com/neondatabase/neon/pull/4539

This PR adds more info for layer dumps (is_delta, is_incremental, size).

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-07-06 18:21:45 +03:00
Konstantin Knizhnik
1741edf933 Pageserver reconnect v2 (#4519)
## Problem

Compute is not always able to reconnect to pages server.
First of all it caused by long time of restart of pageserver.
So number of attempts is increased from 5 (hardcoded) to 60 (GUC).
Also we do not perform flush after each command to increase performance
(it is especially critical for prefetch).
Unfortunately such pending flush makes it not possible to transparently
reconnect to restarted pageserver.
What we can do is to try to minimzie such probabilty.
Most likely broken connection will be detected in first sens command
after some idle period.
This is why max_flush_delay parameter is added which force flush to be
performed after first request after some idle period.

See #4497

## Summary of changes

Add neon.max_reconnect_attempts and neon.max_glush_delay GUCs which
contol when flush has to be done
and when it is possible to try to reconnect to page server


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-07-06 12:47:43 +03:00
Joonas Koivunen
269e20aeab fix: filter out zero synthetic sizes (#4639)
Apparently sending the synthetic_size == 0 is causing problems, so
filter out sending zeros.

Slack discussion:
https://neondb.slack.com/archives/C03F5SM1N02/p1688574285718989?thread_ts=1687874910.681049&cid=C03F5SM1N02
2023-07-06 12:32:34 +03:00
Tomoka Hayashi
91435006bd Fix docker-compose file and document (#4621)
## Problem

- Running the command according to docker.md gives warning and error.
- Warning `permissions should be u=rw (0600) or less` is output when
executing `psql -h localhost -p 55433 -U cloud_admin`.
- `FATAL: password authentication failed for user "root”` is output in
compute logs.

## Summary of changes

- Add `$ chmod 600 ~/.pgpass` in docker.md to avoid warning.
- Add username (cloud_admin) to pg_isready command in docker-compose.yml
to avoid error.

---------

Co-authored-by: Tomoka Hayashi <tomoka.hayashi@ntt.com>
2023-07-06 10:11:24 +01:00
Dmitry Rodionov
b263510866 move some logical size bits to separate logical_size.rs 2023-07-06 11:58:41 +03:00
Dmitry Rodionov
e418fc6dc3 move some tracing related assertions to separate module for tenant and timeline 2023-07-06 11:58:41 +03:00
Dmitry Rodionov
434eaadbe3 Move uninitialized timeline from tenant.rs to timeline/uninit.rs 2023-07-06 11:58:41 +03:00
Alexander Bayandin
6fb7edf494 Compile pg_embedding extension (#4634)
```
CREATE EXTENSION embedding;
CREATE TABLE t (val real[]);
INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL);
CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3);
INSERT INTO t (val) VALUES (array[1,2,4]);

SELECT * FROM t ORDER BY val <-> array[3,3,3];
   val   
---------
 {1,2,3}
 {1,2,4}
 {1,1,1}
 {0,0,0}
 
(5 rows)
```
2023-07-05 18:40:25 +01:00
Christian Schwarz
505aa242ac page cache: add size metrics (#4629)
Make them a member of `struct PageCache` to prepare for a future
where there's no global state.
2023-07-05 15:36:42 +03:00
arpad-m
1c516906e7 Impl Display for LayerFileName and require it for Layer (#4630)
Does three things:

* add a `Display` impl for `LayerFileName` equal to the `short_id`
* based on that, replace the `Layer::short_id` function by a requirement
for a `Display` impl
* use that `Display` impl in the places where the `short_id` and `file_name()` functions were used instead

Fixes #4145
2023-07-05 14:27:50 +02:00
Christian Schwarz
7d7cd8375c callers of task_mgr::spawn: some top-level async blocks were missing tenant/timeline id (#4283)
Looking at logs from staging and prod, I found there are a bunch of log
lines without tenant / timeline context.

Manully walk through all task_mgr::spawn lines and fix that using the
least amount of work required.

While doing it, remove some redundant `shutting down` messages. 

refs https://github.com/neondatabase/neon/issues/4222
2023-07-05 14:04:05 +02:00
Vadim Kharitonov
c92b7543b5 Update pgvector to 0.4.4 (#4632)
After announcing `hnsw`, there is a hypothesis that the community will
start comparing it with `pgvector` by themselves. Therefore, let's have
an actual version of `pgvector` in Neon.
2023-07-05 13:39:51 +03:00
Stas Kelvich
dbf88cf2d7 Minimalistic pool for http endpoint compute connections (under opt-in flag)
Cache up to 20 connections per endpoint. Once all pooled connections
are used current implementation can open an extra connection, so the
maximum number of simultaneous connections is not enforced.

There are more things to do here, especially with background clean-up
of closed connections, and checks for transaction state. But current
implementation allows to check for smaller coonection latencies that
this cache should bring.
2023-07-05 12:00:03 +03:00
Konstantin Knizhnik
f1db87ac36 Check if there is enough memory for HNSW index (#4602)
## Problem

HNSW index is created in memory.
Try to prevent OOM by checking of available RAM.


## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-07-05 11:40:38 +03:00
Christian Schwarz
3f9defbfb4 page cache: add access & hit rate metrics (#4628)
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-07-05 10:38:32 +02:00
bojanserafimov
c7143dbde6 compute_ctl: Fix misleading metric (#4608) 2023-07-04 19:07:36 -04:00
Stas Kelvich
cbf9a40889 Set a shorter timeout for the initial connection attempti in proxy.
In case we try to connect to an outdated address that is no longer valid, the
default behavior of Kubernetes is to drop the packets, causing us to wait for
the entire timeout period. We want to fail fast in such cases.

A specific case to consider is when we have cached compute node information
with a 5-minute TTL (Time To Live), but the user has executed a `/suspend` API
call, resulting in the nonexistence of the compute node.
2023-07-04 20:34:22 +03:00
Joonas Koivunen
10aba174c9 metrics: Remove comments regarding upgradeable rwlocks (#4622)
Closes #4001 by removing the comments alluding towards
upgradeable/downgradeable RwLocks.
2023-07-04 17:40:51 +03:00
Conrad Ludgate
ab2ea8cfa5 use pbkdf2 crate (#4626)
## Problem

While pbkdf2 is a simple algorithm, we should probably use a well tested
implementation

## Summary of changes

* Use pbkdf2 crate
* Use arrays like the hmac comment says

## Checklist before requesting a review

- [X] I have performed a self-review of my code.
- [X] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-07-04 14:54:59 +01:00
arpad-m
9c8c55e819 Add _cached and _bytes to pageserver_tenant_synthetic_size metric name (#4616)
This renames the `pageserver_tenant_synthetic_size` metric to
`pageserver_tenant_synthetic_cached_size_bytes`, as was requested on
slack (link in the linked issue).

* `_cached` to hint that it is not incrementally calculated
* `_bytes` to indicate the unit the size is measured in

Fixes #3748
2023-07-03 19:34:07 +02:00
Conrad Ludgate
10110bee69 fix setup instructions (#4615)
## Problem

1. The local endpoints provision 2 ports (postgres and HTTP) which means
the migration_check endpoint has a different port than what the setup
implies
2. psycopg2-binary 2.9.3 has a deprecated poetry config and doesn't
install.

## Summary of changes

Update psycopg2-binary and update the endpoint ports in the readme

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-03 18:10:44 +03:00
Joonas Koivunen
cff7ae0b0d fix: no more ansi colored logs (#4613)
Allure does not support ansi colored logs, yet `compute_ctl` has them.

Upgrade criterion to get rid of atty dependency, disable ansi colors,
remove atty dependency and disable ansi feature of tracing-subscriber.

This is a heavy-handed approach. I am not aware of a workflow where
you'd want to connect a terminal directly to for example `compute_ctl`,
usually you find the logs in a file. If someone had been using colors,
they will now need to:
- turn the `tracing-subscriber.default-features` to `true`
- edit their wanted project to have colors

I decided to explicitly disable ansi colors in case we would have in
future a dependency accidentally enabling the feature on
`tracing-subscriber`, which would be quite surprising but not
unimagineable.

By getting rid of `atty` from dependencies we get rid of
<https://github.com/advisories/GHSA-g98v-hv3f-hcfr>.
2023-07-03 16:37:02 +03:00
Alexander Bayandin
78a7f68902 Make pg_version and build_type regular parameters (#4311)
## Problem

All tests have already been parametrised by Postgres version and build
type (to have them distinguishable in the Allure report), but despite
it, it's anyway required to have DEFAULT_PG_VERSION and BUILD_TYPE env
vars set to corresponding values, for example to
run`test_timeline_deletion_with_files_stuck_in_upload_queue[release-pg14-local_fs]`
test it's required to set `DEFAULT_PG_VERSION=14` and
`BUILD_TYPE=release`.
This PR makes the test framework pick up parameters from the test name
itself.

## Summary of changes
- Postgres version and build type related fixtures now are
function-scoped (instead of being sessions scoped before)
- Deprecate `--pg-version` argument in favour of DEFAULT_PG_VERSION env
variable (it's easier to parse)
- GitHub autocomment now includes only one command with all the failed
tests + runs them in parallel
2023-07-03 13:51:40 +01:00
Christian Schwarz
24eaa3b7ca timeline creation: reflect failures due to ancestor LSN issues in status code (#4600)
Before, it was a `500` and control plane would retry, wereas it actually
should have stopped retrying.

(Stacked on top of https://github.com/neondatabase/neon/pull/4597 )

fixes https://github.com/neondatabase/neon/issues/4595
part of https://github.com/neondatabase/cloud/issues/5626

---------

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-07-03 15:21:10 +03:00
Shany Pozin
26828560a8 Add timeouts and retries to consumption metrics reporting client (#4563)
## Problem
#4528
## Summary of changes
Add a 60 seconds default timeout to the reqwest client 
Add retries for up to 3 times to call into the metric consumption
endpoint

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-07-03 15:20:49 +03:00
Alek Westover
86604b3b7d Delete Unnecessary files in Extension Bucket (#4606)
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-07-03 07:37:30 -04:00
Joonas Koivunen
4957bb2d48 fix(proxy): stray span enter globbers up logs (#4612)
Prod logs have deep accidential span nesting. Introduced in #3759 and
has been untouched since, maybe no one watches proxy logs? :) I found it
by accident when looking to see if proxy logs have ansi colors with
`{neon_service="proxy"}`.

The solution is to mostly stop using `Span::enter` or `Span::entered` in
async code. Kept on `Span::entered` in cancel on shutdown related path.
2023-07-03 11:53:57 +01:00
Sasha Krassovsky
ff1a1aea86 Make control plane connector send encrypted password (#4607)
Control plane needs the encrypted hash that postgres itself generates
2023-06-30 14:17:44 -07:00
Em Sharnoff
c9f05d418d Bump vm-builder v0.11.0 -> v0.11.1 (#4605)
This applies the fix from https://github.com/neondatabase/autoscaling/pull/367,
which should resolve the "leaking cloud_admin connections" issue that
has been  observed for some customers.
2023-06-30 23:49:06 +03:00
bojanserafimov
9de1a6fb14 cold starts: Run sync_safekeepers on compute_ctl shutdown (#4588) 2023-06-30 16:29:47 -04:00
Konstantin Knizhnik
fbd37740c5 Make file_cache logging less verbose (#4601)
## Problem


Message "set local file cache limit to..." polutes compute logs.

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-06-30 21:55:56 +03:00
Anastasia Lubennikova
3e55d9dec6 Bump vendor/postgres
Point to REL_14_STABLE_neon and REL_15_STABLE_neon.
This change updates PostgreSQL versions to 14.8 and 15.3
2023-06-30 16:29:10 +03:00
Christian Schwarz
f558f88a08 refactor: distinguished error type for timeline creation failure (#4597)
refs https://github.com/neondatabase/neon/issues/4595
2023-06-30 14:53:21 +02:00
Dmitry Rodionov
b990200496 tests: use shortcut everywhere to get timeline path (#4586) 2023-06-30 15:01:06 +03:00
Alex Chi Z
7e20b49da4 refactor: use LayerDesc in LayerMap (part 2) (#4437)
## Problem

part of https://github.com/neondatabase/neon/issues/4392, continuation
of https://github.com/neondatabase/neon/pull/4408

## Summary of changes

This PR removes all layer objects from LayerMap and moves it to the
timeline struct. In timeline struct, LayerFileManager maps a layer
descriptor to a layer object, and it is stored in the same RwLock as
LayerMap to avoid behavior difference.

Key changes:

* LayerMap now does not have generic, and only stores descriptors.
* In Timeline, we add a new struct called layer mapping.
* Currently, layer mapping is stored in the same lock with layer map.
Every time we retrieve data from the layer map, we will need to map the
descriptor to the actual object.
* Replace_historic is moved to layer mapping's replace, and the return
value behavior is different from before. I'm a little bit unsure about
this part and it would be good to have some comments on that.
* Some test cases are rewritten to adapt to the new interface, and we
can decide whether to remove it in the future because it does not make
much sense now.
* LayerDescriptor is moved to `tests` module and should only be intended
for unit testing / benchmarks.
* Because we now have a usage pattern like "take the guard of lock, then
get the reference of two fields", we want to avoid dropping the
incorrect object when we intend to unlock the lock guard. Therefore, a
new set of helper function `drop_r/wlock` is added. This can be removed
in the future when we finish the refactor.

TODOs after this PR: fully remove RemoteLayer, and move LayerMapping to
a separate LayerCache.

all refactor PRs:

```
#4437 --- #4479 ------------ #4510 (refactor done at this point)
      \-- #4455 -- #4502 --/
```

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-29 15:06:07 -04:00
Alek Westover
032b603011 Fix: Wrong Enum Variant (#4589) 2023-06-29 10:55:02 -04:00
Alex Chi Z
ca0e0781c8 use const instead of magic number for repartition threshold (#4286)
There is a magic number about how often we repartition and therefore
affecting how often we compact. This PR makes this number `10` a global
constant and add docs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-29 16:56:17 +03:00
Alexander Bayandin
b2a5e91a88 Upload custom extensions to S3 (#4585)
## Problem

We want to have a number of custom extensions that will not be available
by default, as an example of such is [Postgres
Anonymizer](https://postgresql-anonymizer.readthedocs.io/en/stable/)
(please note that the extension should be added to
`shared_preload_libraries`). To distinguish them, custom extensions
should be added to a different S3 path:

```
s3://<bucket>/<release version>/<postgres_version>/<ext_name>/share/extensions/
s3://<bucket>/<release version>/<postgres_version>/<ext_name>/lib

where <ext_name> is an extension name
```

Resolves https://github.com/neondatabase/neon/issues/4582

## Summary of changes
- Add Postgres Anonymizer extension to Dockerfile (it's included only to
postgres-extensions target)
- Build extensions image from postgres-extensions target in a workflow
- Upload custom extensions to S3 (different directory)
2023-06-29 16:33:26 +03:00
Joonas Koivunen
44e7d5132f fix: hide token from logs (#4584)
fixes #4583 and also changes all needlessly arg listing places to use
`skip_all`.
2023-06-29 15:53:16 +03:00
Alex Chi Z
c19681bc12 neon_local: support force init (#4363)
When we use local SSD for bench and create `.neon` directory before we
do `cargo neon init`, the initialization process will error due to
directory already exists. This PR adds a flag `--force` that removes
everything inside the directory if `.neon` already exists.

---------

Signed-off-by: Alex Chi Z. <chi@neon.tech>
2023-06-28 11:39:07 -04:00
Shany Pozin
ec9b585837 Add Activating as a possible state for attaching a tenant in test_tenant_relocation validation (#4581)
Fix flakyness of test_tenant_relocation
2023-06-28 12:35:52 +03:00
Joonas Koivunen
02ef246db6 refactor: to pattern of await after timeout (#4432)
Refactor the `!completed` to be about `Option<_>` instead, side-stepping
any boolean true/false or false/true. As discussed on
https://github.com/neondatabase/neon/pull/4399#discussion_r1219321848
2023-06-28 06:18:45 +00:00
Konstantin Knizhnik
195d4932c6 Set LwLSN after WAL records when redo is performed or skipped (#4579)
## Problem

See #4516

Inspecting log it is possible to notice that if lwlsn is set to the
beginning of applied WAL record, then
incorrect version of the page is loaded:

```
2023-06-27 18:36:51.930 GMT [3273945] CONTEXT:  WAL redo at 0/14AF6F0 for Heap/INSERT: off 2 flags 0x01; blkref #0: rel 1663/5/1259, blk 0 FPW
2023-06-27 18:36:51.930 GMT [3273945] LOG:  Do REDO block 0 of rel 1663/5/1259 fork 0 at LSN 0/**014AF6F0**..0/014AFA60
2023-06-27 18:37:02.173 GMT [3273963] LOG:  Read blk 0 in rel 1663/5/1259 fork 0 (request LSN 0/**014AF6F0**): lsn=0/**0143C7F8** at character 22
2023-06-27 18:37:47.780 GMT [3273945] LOG:  apply WAL record at 0/1BB8F38 xl_tot_len=188, xl_prev=0/1BB8EF8 
2023-06-27 18:37:47.780 GMT [3273945] CONTEXT:  WAL redo at 0/1BB8F38 for Heap/INPLACE: off 2; blkref #0: rel 1663/5/1259, blk 0
2023-06-27 18:37:47.780 GMT [3273945] LOG:  Do REDO block 0 of rel 1663/5/1259 fork 0 at LSN 0/01BB8F38..0/01BB8FF8
2023-06-27 18:37:47.780 GMT [3273945] CONTEXT:  WAL redo at 0/1BB8F38 for Heap/INPLACE: off 2; blkref #0: rel 1663/5/1259, blk 0
2023-06-27 18:37:47.780 GMT [3273945] PANIC:  invalid lp
```

## Summary of changes

1. Use end record LSN  for both cases
2. Update lwlsn for relation metadata

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-28 09:04:13 +03:00
Alexander Bayandin
7fe0a4bf1a Fix promote-images job (#4577)
## Problem

```
+ crane tag neondatabase/extensions:3337 latest
Error: fetching "neondatabase/extensions:3337": GET https://index.docker.io/v2/neondatabase/extensions/manifests/3337: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3337
```

We don't build `neondatabase/extensions` image yet (broken in
https://github.com/neondatabase/neon/pull/4505)

## Summary of changes
- Do not try to interact with `neondatabase/extensions`
2023-06-27 20:05:10 +03:00
bojanserafimov
ef2b9ffbcb Basebackup forward compatibility (#4572) 2023-06-27 12:05:27 -04:00
Alexander Bayandin
250a27fb85 Upload Postgres Extensions to S3 (#4505)
## Problem

We want to store Postgres Extensions in S3 (resolves
https://github.com/neondatabase/neon/issues/4493).

Proposed solution:
- Create a separate docker image (from scratch) that contains only
extensions
- Do not include extensions into compute-node (except for neon
extensions)*
- For release and main builds upload extract extension from the image
and upload to S3 (`s3://<bucket>/<release
version>/<postgres_version>/...`)**

*) We're not doing it until the feature is not fully implemented
**) This differs from the initial proposal in
https://github.com/neondatabase/neon/issues/4493 of putting extensions
straight into `s3://<bucket>/<postgres_version>/...`, because we can't
upload directory atomicly. A drawback of this is that we end up with
unnecessary copies of files ~2.1 GB per release (i.e. +2.1 GB for each
commit in main also). We don't really need to update extensions for each
release if there're no relevant changes, but this requires extra work.

## Summary of changes
- Created a separate stage in Dockerfile.compute-node
`postgres-extensions` that contains only extensions
- Added a separate step in a workflow that builds `postgres-extensions`
image (because of a bug in kaniko this step is commented out because it
takes way too long to get built)
- Extract extensions from the image and upload files to S3 for release
and main builds
- Upload extenstions only for staging (for now)
2023-06-27 16:23:22 +01:00
Shany Pozin
d748615c1f RemoteTimelineClient::delete_all() to use s3::delete_objects (#4461)
## Problem
[#4325](https://github.com/neondatabase/neon/issues/4325)
## Summary of changes
Use delete_objects() method
2023-06-27 15:01:32 +03:00
Dmitry Rodionov
681c6910c2 Straighten the spec for timeline delete (#4538)
## Problem

Lets keep 500 for unusual stuff that is not considered normal. Came up
during one of the discussions around console logs now seeing this 500's.

## Summary of changes

- Return 409 Conflict instead of 500
- Remove 200 ok status because it is not used anymore
2023-06-27 13:56:32 +03:00
Vadim Kharitonov
148f0f9b21 Compile pg_roaringbitmap extension (#4564)
## Problem

```
postgres=# create extension roaringbitmap;
CREATE EXTENSION
postgres=# select roaringbitmap('{1,100,10}');
                 roaringbitmap
------------------------------------------------
 \x3a30000001000000000002001000000001000a006400
(1 row)
```
2023-06-27 10:55:03 +01:00
Shany Pozin
a7f3f5f356 Revert "run Layer::get_value_reconstruct_data in spawn_blocking#4498" (#4569)
This reverts commit 1faf69a698.
2023-06-27 10:57:28 +03:00
Felix Prasanna
00d1cfa503 bump VM_BUILDER_VERSION to 0.11.0 (#4566)
Routine bump of autoscaling version `0.8.0` -> `0.11.0`
2023-06-26 14:10:27 -04:00
Christian Schwarz
1faf69a698 run Layer::get_value_reconstruct_data in spawn_blocking (#4498)
This PR concludes the "async `Layer::get_value_reconstruct_data`"
project.

The problem we're solving is that, before this patch, we'd execute
`Layer::get_value_reconstruct_data` on the tokio executor threads.
This function is IO- and/or CPU-intensive.
The IO is using VirtualFile / std::fs; hence it's blocking.
This results in unfairness towards other tokio tasks, especially under
(disk) load.

Some context can be found at
https://github.com/neondatabase/neon/issues/4154
where I suspect (but can't prove) load spikes of logical size
calculation to
cause heavy eviction skew.

Sadly we don't have tokio runtime/scheduler metrics to quantify the
unfairness.
But generally, we know blocking the executor threads on std::fs IO is
bad.
So, let's have this change and watch out for severe perf regressions in
staging & during rollout.

## Changes

* rename `Layer::get_value_reconstruct_data` to
`Layer::get_value_reconstruct_data_blocking`
* add a new blanket impl'd `Layer::get_value_reconstruct_data`
`async_trait` method that runs `get_value_reconstruct_data_blocking`
inside `spawn_blocking`.
* The `spawn_blocking` requires `'static` lifetime of the captured
variables; hence I had to change the data flow to _move_ the
`ValueReconstructState` into and back out of get_value_reconstruct_data
instead of passing a reference. It's a small struct, so I don't expect a
big performance penalty.

## Performance

Fundamentally, the code changes cause the following performance-relevant
changes:

* Latency & allocations: each `get_value_reconstruct_data` call now
makes a short-lived allocation because `async_trait` is just sugar for
boxed futures under the hood
* Latency: `spawn_blocking` adds some latency because it needs to move
the work to a thread pool
* using `spawn_blocking` plus the existing synchronous code inside is
probably more efficient better than switching all the synchronous code
to tokio::fs because _each_ tokio::fs call does `spawn_blocking` under
the hood.
* Throughput: the `spawn_blocking` thread pool is much larger than the
async executor thread pool. Hence, as long as the disks can keep up,
which they should according to AWS specs, we will be able to deliver
higher `get_value_reconstruct_data` throughput.
* Disk IOPS utilization: we will see higher disk utilization if we get
more throughput. Not a problem because the disks in prod are currently
under-utilized, according to node_exporter metrics & the AWS specs.
* CPU utilization: at higher throughput, CPU utilization will be higher.

Slightly higher latency under regular load is acceptable given the
throughput gains and expected better fairness during disk load peaks,
such as logical size calculation peaks uncovered in #4154.


## Full Stack Of Preliminary PRs

This PR builds on top of the following preliminary PRs

1. Clean-ups
  * https://github.com/neondatabase/neon/pull/4316
  * https://github.com/neondatabase/neon/pull/4317
  * https://github.com/neondatabase/neon/pull/4318
  * https://github.com/neondatabase/neon/pull/4319
  * https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
thought we'd need in my original plan where we would need to convert
`Tenant::timelines` into an async locking primitive (#4333). In reviews,
we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485
7. https://github.com/neondatabase/neon/pull/4441
2023-06-26 11:43:11 +02:00
Christian Schwarz
44a441080d bring back spawn_blocking for compact_level0_phase1 (#4537)
The stats for `compact_level0_phase` that I added in #4527 show the
following breakdown (24h data from prod, only looking at compactions
with > 1 L1 produced):

* 10%ish of wall-clock time spent between the two read locks
* I learned that the `DeltaLayer::iter()` and `DeltaLayer::key_iter()`
calls actually do IO, even before we call `.next()`. I suspect that is
why they take so much time between the locks.
* 80+% of wall-clock time spent writing layer files
* Lock acquisition time is irrelevant (low double-digit microseconds at
most)
* The generation of the holes holds the read lock for a relatively long
time and it's proportional to the amount of keys / IO required to
iterate over them (max: 110ms in prod; staging (nightly benchmarks):
multiple seconds).

Find below screenshots from my ad-hoc spreadsheet + some graphs.

<img width="1182" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/81398b3f-6fa1-40dd-9887-46a4715d9194">

<img width="901" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/e4ac0393-f2c1-4187-a5e5-39a8b0c394c9">

<img width="210" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/7977ade7-6aa5-4773-a0a2-f9729aecee0d">


## Changes In This PR

This PR makes the following changes:

* rearrange the `compact_level0_phase1` code such that we build the
`all_keys_iter` and `all_values_iter` later than before
* only grab the `Timeline::layers` lock once, and hold it until we've
computed the holes
* run compact_level0_phase1 in spawn_blocking, pre-grabbing the
`Timeline::layers` lock in the async code and passing it in as an
`OwnedRwLockReadGuard`.
* the code inside spawn_blocking drops this guard after computing the
holds
* the `OwnedRwLockReadGuard` requires the `Timeline::layers` to be
wrapped in an `Arc`. I think that's Ok, the locking for the RwLock is
more heavy-weight than an additional pointer indirection.
 
## Alternatives Considered

The naive alternative is to throw the entire function into
`spawn_blocking`, and use `blocking_read` for `Timeline::layers` access.

What I've done in this PR is better because, with this alternative,
1. while we `blocking_read()`, we'd waste one slot in the spawn_blocking
pool
2. there's deadlock risk because the spawn_blocking pool is a finite
resource


![image](https://github.com/neondatabase/neon/assets/956573/46c419f1-6695-467e-b315-9d1fc0949058)

## Metadata

Fixes https://github.com/neondatabase/neon/issues/4492
2023-06-26 11:42:17 +02:00
Sasha Krassovsky
c215389f1c quote_ident identifiers when creating neon_superuser (#4562)
## Problem
2023-06-24 10:34:15 +03:00
Sasha Krassovsky
b1477b4448 Create neon_superuser role, grant it to roles created from control plane (#4425)
## Problem
Currently, if a user creates a role, it won't by default have any grants
applied to it. If the compute restarts, the grants get applied. This
gives a very strange UX of being able to drop roles/not have any access
to anything at first, and then once something triggers a config
application, suddenly grants are applied. This removes these grants.
2023-06-24 01:38:27 +03:00
Christian Schwarz
a500bb06fb use preinitialize_metrics to initialize page cache metrics (#4557)
This is follow-up to

```
commit 2252c5c282
Author: Alex Chi Z <iskyzh@gmail.com>
Date:   Wed Jun 14 17:12:34 2023 -0400

    metrics: convert some metrics to pageserver-level (#4490)
```
2023-06-23 16:40:50 -04:00
Christian Schwarz
15456625c2 don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker (#4560)
The consumption metrics synthetic size worker does logical size
calculation. Logical size calculation currently does synchronous disk
IO. This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving
other futures.

While there's work on the way to move the synchronous disk IO into
spawn_blocking, the quickfix here is to use the BACKGROUND_RUNTIME
instead of MGMT_REQUEST_RUNTIME.

Actually it's not just a quickfix. We simply shouldn't be blocking
MGMT_REQUEST_RUNTIME executor threads on CPU or sync disk IO.
That work isn't done yet, as many of the mgmt tasks still _do_ disk IO.
But it's not as intensive as the logical size calculations that we're
fixing here.

While we're at it, fix disk-usage-based eviction in a similar way. It
wasn't the culprit here, according to prod logs, but it can
theoretically be a little CPU-intensive.

More context, including graphs from Prod:
https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949
2023-06-23 15:40:36 -04:00
Vadim Kharitonov
a3f0dd2d30 Compile pg_uuidv7 (#4558)
Doc says that it should be added into `shared_preload_libraries`, but,
practically, it's not required.

```
postgres=# create extension pg_uuidv7;
CREATE EXTENSION
postgres=# SELECT uuid_generate_v7();
           uuid_generate_v7
--------------------------------------
 0188e823-3f8f-796c-a92c-833b0b2d1746
(1 row)
```
2023-06-23 15:56:49 +01:00
Christian Schwarz
76718472be add pageserver-global histogram for basebackup latency (#4559)
The histogram distinguishes by ok/err.

I took the liberty to create a small abstraction for such use cases.
It helps keep the label values inside `metrics.rs`, right next
to the place where the metric and its labels are declared.
2023-06-23 16:42:12 +02:00
Alexander Bayandin
c07b6ffbdc Fix git tag name for release (#4545)
## Problem

A git tag for a release has an extra `release-` prefix (it looks like
`release-release-3439`).
 
## Summary of changes
- Do not add `release-` prefix when create git tag
2023-06-23 12:52:17 +01:00
Alexander Bayandin
6c3605fc24 Nightly Benchmarks: Increase timeout for pgbench-compare job (#4551)
## Problem

In the test environment vacuum duration fluctuates from ~1h to ~5h, along
with another two 1h benchmarks (`select-only` and `simple-update`) it
could be up to 7h which is longer than 6h timeout.

## Summary of changes
- Increase timeout for pgbench-compare job to 8h
- Remove 6h timeouts from Nightly Benchmarks (this is a default value)
2023-06-23 12:47:37 +01:00
Vadim Kharitonov
d96d51a3b7 Update rust to 1.70.0 (#4550) 2023-06-23 13:09:04 +02:00
Alex Chi Z
a010b2108a pgserver: better template config file (#4554)
* `compaction_threshold` should be an integer, not a string.
* uncomment `[section]` so that if a user needs to modify the config,
they can simply uncomment the corresponding line. Otherwise it's easy
for us to forget uncommenting the `[section]` when uncommenting the
config item we want to configure.

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-23 10:18:06 +03:00
Anastasia Lubennikova
2f618f46be Use BUILD_TAG in compute_ctl binary. (#4541)
Pass BUILD_TAG to compute_ctl binary. 
We need it to access versioned extension storage.
2023-06-22 17:06:16 +03:00
Alexander Bayandin
d3aa8a48ea Update client libs for test_runner/pg_clients to their latest versions (#4547)
Resolves https://github.com/neondatabase/neon/security/dependabot/27
2023-06-21 16:20:35 +01:00
Christian Schwarz
e4da76f021 update_gc_info: fix typo in timeline_id tracing field (#4546)
Commit

```
commit 472cc17b7a
Author: Dmitry Rodionov <dmitry@neon.tech>
Date:   Thu Jun 15 17:30:12 2023 +0300

    propagate lock guard to background deletion task (#4495)
```

did a drive-by fix, but, the drive-by had a typo.

```
gc_loop{tenant_id=2e2f2bff091b258ac22a4c4dd39bd25d}:update_gc_info{timline_id=837c688fd37c903639b9aa0a6dd3f1f1}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000024DA0D1-000000000443FB51}:panic{thread=background op worker location=pageserver/src/tenant/timeline.rs:4843:25}: missing extractors: ["TimelineId"]

Stack backtrace:
   0: utils::logging::tracing_panic_hook
             at /libs/utils/src/logging.rs:166:21
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9
   2: std::panicking::rust_panic_with_hook
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:579:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18
   5: rust_begin_unwind
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
   6: core::panicking::panic_fmt
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
   7: pageserver::tenant::timeline::debug_assert_current_span_has_tenant_and_timeline_id
             at /pageserver/src/tenant/timeline.rs:4843:25
   8: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}
             at /pageserver/src/tenant/timeline.rs:4368:9
   9: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}> as core::future::future::Future>::poll
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
  10: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}
             at /pageserver/src/tenant/timeline.rs:4363:5
  11: <pageserver::tenant::timeline::Timeline>::get_reconstruct_data::{closure#0}
             at /pageserver/src/tenant/timeline.rs:2618:69
  12: <pageserver::tenant::timeline::Timeline>::get::{closure#0}
             at /pageserver/src/tenant/timeline.rs:565:13
  13: <pageserver::tenant::timeline::Timeline>::list_slru_segments::{closure#0}
             at /pageserver/src/pgdatadir_mapping.rs:427:42
  14: <pageserver::tenant::timeline::Timeline>::is_latest_commit_timestamp_ge_than::{closure#0}
             at /pageserver/src/pgdatadir_mapping.rs:390:13
  15: <pageserver::tenant::timeline::Timeline>::find_lsn_for_timestamp::{closure#0}
             at /pageserver/src/pgdatadir_mapping.rs:338:17
  16: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}
             at /pageserver/src/tenant/timeline.rs:3967:71
  17: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}> as core::future::future::Future>::poll
             at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
  18: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}
             at /pageserver/src/tenant/timeline.rs:3948:5
  19: <pageserver::tenant::Tenant>::refresh_gc_info_internal::{closure#0}
             at /pageserver/src/tenant.rs:2687:21
  20: <pageserver::tenant::Tenant>::gc_iteration_internal::{closure#0}
             at /pageserver/src/tenant.rs:2551:13
  21: <pageserver::tenant::Tenant>::gc_iteration::{closure#0}
             at /pageserver/src/tenant.rs:1490:13
  22: pageserver::tenant::tasks::gc_loop::{closure#0}::{closure#0}
             at /pageserver/src/tenant/tasks.rs:187:21
  23: pageserver::tenant::tasks::gc_loop::{closure#0}
             at /pageserver/src/tenant/tasks.rs:208:5
```

## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-21 18:00:14 +03:00
Christian Schwarz
870740c949 cargo update -p openssl (#4542)
To unblock release
https://github.com/neondatabase/neon/pull/4536#issuecomment-1600678054

Context: https://rustsec.org/advisories/RUSTSEC-2023-0044
2023-06-21 15:50:52 +03:00
Dmitry Rodionov
75d583c04a Tenant::load: fix uninit timeline marker processing (#4458)
## Problem

During timeline creation we create special mark file which presense
indicates that initialization didnt complete successfully. In case of a
crash restart we can remove such half-initialized timeline and following
retry from control plane side should perform another attempt.

So in case of a possible crash restart during initial loading we have
following picture:

```
timelines
| - <timeline_id>___uninit
| - <timeline_id>
| - | <timeline files>
```

We call `std::fs::read_dir` to walk files in `timelines` directory one
by one. If we see uninit file
we proceed with deletion of both, timeline directory and uninit file. If
we see timeline we check if uninit file exists and do the same cleanup.
But in fact its possible to get both branches to be true at the same
time. Result of readdir doesnt reflect following directory state
modifications. So you can still get "valid" entry on the next iteration
of the loop despite the fact that it was deleted in one of the previous
iterations of the loop.

To see that you can apply the following patch (it disables uninit mark
cleanup on successful timeline creation):
```diff
diff --git a/pageserver/src/tenant.rs b/pageserver/src/tenant.rs
index 4beb2664..b3cdad8f 100644
--- a/pageserver/src/tenant.rs
+++ b/pageserver/src/tenant.rs
@@ -224,11 +224,6 @@ impl UninitializedTimeline<'_> {
                             )
                         })?;
                 }
-                uninit_mark.remove_uninit_mark().with_context(|| {
-                    format!(
-                        "Failed to remove uninit mark file for timeline {tenant_id}/{timeline_id}"
-                    )
-                })?;
                 v.insert(Arc::clone(&new_timeline));
 
                 new_timeline.maybe_spawn_flush_loop();
```
And perform the following steps:

```bash
neon_local init
neon_local start
neon_local tenant create
neon_local stop
neon_local start
```

The error is:
```log
INFO load{tenant_id=X}:blocking: Found an uninit mark file .neon/tenants/X/timelines/Y.___uninit, removing the timeline and its uninit mark
2023-06-09T18:43:41.664247Z ERROR load{tenant_id=X}: load failed, setting tenant state to Broken: failed to load metadata

Caused by:
    0: Failed to read metadata bytes from path .neon/tenants/X/timelines/Y/metadata
    1: No such file or directory (os error 2)
```

So uninit mark got deleted together with timeline directory but we still
got directory entry for it and tried to load it.

The bug prevented tenant from being successfully loaded.

## Summary of changes

Ideally I think we shouldnt place uninit marks in the same directory as timeline directories but move them to separate directory and
gather them as an input to actual listing, but that would be sort of an
on-disk format change, so just check whether entries are still valid
before operating on them.
2023-06-21 14:25:58 +03:00
Alek Westover
b4c5beff9f list_files function in remote_storage (#4522) 2023-06-20 15:36:28 -04:00
bojanserafimov
90e1f629e8 Add test for skip_pg_catalog_updates (#4530) 2023-06-20 11:38:59 -04:00
Alek Westover
2023e22ed3 Add RelationError error type to pageserver rather than string parsing error messages (#4508) 2023-06-19 13:14:20 -04:00
Christian Schwarz
036fda392f log timings for compact_level0_phase1 (#4527)
The data will help decide whether it's ok
to keep holding Timeline::layers in shared mode until
after we've calculated the holes.

Other timings are to understand the general breakdown
of timings in that function.

Context: https://github.com/neondatabase/neon/issues/4492
2023-06-19 17:25:57 +03:00
Arseny Sher
557abc18f3 Fix test_s3_wal_replay assertion flakiness.
Supposedly fixes https://github.com/neondatabase/neon/issues/4277
2023-06-19 16:08:20 +04:00
Arseny Sher
3b06a5bc54 Raise pageserver walreceiver timeouts.
I observe sporadic reconnections with ~10k idle computes. It looks like a
separate issue, probably walreceiver runtime gets blocked somewhere, but in any
case 2-3 seconds is too small.
2023-06-19 15:59:38 +04:00
Alexander Bayandin
1b947fc8af test_runner: workaround rerunfailures and timeout incompatibility (#4469)
## Problem

`pytest-timeout` and `pytest-rerunfailures` are incompatible (or rather
not fully compatible). Timeouts aren't set for reruns.

Ref https://github.com/pytest-dev/pytest-rerunfailures/issues/99

## Summary of changes
- Dynamically make timeouts `func_only` for tests that we're going to
retry. It applies timeouts for reruns as well.
2023-06-16 18:08:11 +01:00
Christian Schwarz
78082d0b9f create_delta_layer: avoid needless stat (#4489)
We already do it inside `frozen_layer.write_to_disk()`.

Context:
https://github.com/neondatabase/neon/pull/4441#discussion_r1228083959
2023-06-16 16:54:41 +02:00
Alexander Bayandin
190c3ba610 Add tags for releases (#4524)
## Problem

It's not a trivial task to find corresponding changes for a particular
release (for example, for 3371 — 🤷)

Ref:
https://neondb.slack.com/archives/C04BLQ4LW7K/p1686761537607649?thread_ts=1686736854.174559&cid=C04BLQ4LW7K

## Summary of changes
- Tag releases
- Add a manual trigger for the release workflow
2023-06-16 14:17:37 +01:00
Christian Schwarz
14d495ae14 create_delta_layer: improve misleading TODO comment (#4488)
Context:
https://github.com/neondatabase/neon/pull/4441#discussion_r1228086608
2023-06-16 14:23:55 +03:00
Dmitry Rodionov
472cc17b7a propagate lock guard to background deletion task (#4495)
## Problem

1. During the rollout we got a panic: "timeline that we were deleting
was concurrently removed from 'timelines' map" that was caused by lock
guard not being propagated to the background part of the deletion.
Existing test didnt catch it because failpoint that was used for
verification was placed earlier prior to background task spawning.
2. When looking at surrounding code one more bug was detected. We
removed timeline from the map before deletion is finished, which breaks
client retry logic, because it will indicate 404 before actual deletion
is completed which can lead to client stopping its retry poll earlier.

## Summary of changes

1. Carry the lock guard over to background deletion. Ensure existing
test case fails without applied patch (second deletion becomes stuck
without it, which eventually leads to a test failure).
2. Move delete_all call earlier so timeline is removed from the map is
the last thing done during deletion.

Additionally I've added timeline_id to the `update_gc_info` span,
because `debug_assert_current_span_has_tenant_and_timeline_id` in
`download_remote_layer` was firing when `update_gc_info` lead to
on-demand downloads via `find_lsn_for_timestamp` (caught by @problame).
This is not directly related to the PR but fixes possible flakiness.

Another smaller set of changes involves deletion wrapper used in python
tests. Now there is a simpler wrapper that waits for deletions to
complete `timeline_delete_wait_completed`. Most of the
test_delete_timeline.py tests make negative tests, i.e., "does
ps_http.timeline_delete() fail in this and that scenario".
These can be left alone. Other places when we actually do the deletions,
we need to use the helper that polls for completion.

Discussion
https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639

resolves #4496

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-06-15 17:30:12 +03:00
Arthur Petukhovsky
76413a0fb8 Revert reconnect_timeout to improve performance (#4512)
Default value for `wal_acceptor_reconnect_timeout` was changed in
https://github.com/neondatabase/neon/pull/4428 and it affected
performance up to 20% in some cases.

Revert the value back.
2023-06-15 15:26:59 +03:00
Alexander Bayandin
e60b70b475 Fix data ingestion scripts (#4515)
## Problem

When I switched `psycopg2.connect` from context manager to a regular
function call in https://github.com/neondatabase/neon/pull/4382 I 
embarrassingly forgot about commit, so it doesn't really put data into DB 😞

## Summary of changes
- Enable autocommit for data ingestion scripts
2023-06-15 15:01:06 +03:00
Alex Chi Z
2252c5c282 metrics: convert some metrics to pageserver-level (#4490)
## Problem

Some metrics are better to be observed at page-server level. Otherwise,
as we have a lot of tenants in production, we cannot do a sum b/c
Prometheus has limit on how many time series we can aggregate. This also
helps reduce metrics scraping size.

## Summary of changes

Some integration tests are likely not to pass as it will check the
existence of some metrics. Waiting for CI complete and fix them.

Metrics downgraded: page cache hit (where we are likely to have a
page-server level page cache in the future instead of per-tenant), and
reconstruct time (this would better be tenant-level, as we have one pg
replayer for each tenant, but now we make it page-server level as we do
not need that fine-grained data).

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-14 17:12:34 -04:00
Alexander Bayandin
94f315d490 Remove neon-image-depot job (#4506)
## Problem

`neon-image-depot` is an experimental job we use to compare with the
main `neon-image` job.
But it's not stable and right now we don't have the capacity to properly
fix and evaluate it.
We can come back to this later.

## Summary of changes

Remove `neon-image-depot` job
2023-06-14 19:03:09 +01:00
Christian Schwarz
cd3faa8c0c test_basic_eviction: avoid some sources of flakiness (#4504)
We've seen the download_layer() call return 304 in prod because of a
spurious on-demand download caused by a GetPage request from compute.

Avoid these and some other sources of on-demand downloads by shutting
down compute, SKs, and by disabling background loops.

CF
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4498/5258914461/index.html#suites/2599693fa27db8427603ba822bcf2a20/357808fd552fede3
2023-06-14 19:04:22 +02:00
Arthur Petukhovsky
a7a0c3cd27 Invalidate proxy cache in http-over-sql (#4500)
HTTP queries failed with errors `error connecting to server: failed to
lookup address information: Name or service not known\n\nCaused by:\n
failed to lookup address information: Name or service not known`

The fix reused cache invalidation logic in proxy from usual postgres
connections and added it to HTTP-over-SQL queries.

Also removed a timeout for HTTP request, because it almost never worked
on staging (50s+ time just to start the compute), and we can have the
similar case in production. Should be ok, since we have a limits for the
requests and responses.
2023-06-14 19:24:46 +03:00
Dmitry Rodionov
ee9a5bae43 Filter only active timelines for compaction (#4487)
Previously we may've included Stopping/Broken timelines here, which
leads to errors in logs -> causes tests to sporadically fail

resolves #4467
2023-06-14 19:07:42 +03:00
Alexander Bayandin
9484b96d7c GitHub Autocomment: do not fail the job (#4478)
## Problem

If the script fails to generate a test summary, the step also fails the
job/workflow (despite this could be a non-fatal problem).

## Summary of changes
- Separate JSON parsing and summarisation into separate functions
- Wrap functions calling into try..catch block, add an error message to
GitHub comment and do not fail the step
- Make `scripts/comment-test-report.js` a CLI script that can be run
locally (mock GitHub calls) to make it easier to debug issues locally
2023-06-14 15:07:30 +01:00
Shany Pozin
ebee8247b5 Move s3 delete_objects to use chunks of 1000 OIDs (#4463)
See
https://github.com/neondatabase/neon/pull/4461#pullrequestreview-1474240712
2023-06-14 15:38:01 +03:00
bojanserafimov
3164ad7052 compute_ctl: Spec parser forward compatibility test (#4494) 2023-06-13 21:48:09 -04:00
Alexander Bayandin
a0b3990411 Retry data ingestion scripts on connection errors (#4382)
## Problem

From time to time, we're catching a race condition when trying to upload
perf or regression test results.

Ref:
- https://neondb.slack.com/archives/C03H1K0PGKH/p1685462717870759
- https://github.com/neondatabase/cloud/issues/3686

## Summary of changes

Wrap `psycopg2.connect` method with `@backoff.on_exception`
contextmanager
2023-06-13 22:33:42 +01:00
Stas Kelvich
4385e0c291 Return more RowDescription fields via proxy json endpoint
As we aim to align client-side behavior with node-postgres, it's necessary for us to return these fields, because node-postgres does so as well.
2023-06-13 22:31:18 +03:00
Christian Schwarz
3693d1f431 turn Timeline::layers into tokio::sync::RwLock (#4441)
This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).

# Full Stack Of Preliminary PRs

Thanks to the countless preliminary PRs, this conversion is relatively
straight-forward.

1. Clean-ups
  * https://github.com/neondatabase/neon/pull/4316
  * https://github.com/neondatabase/neon/pull/4317
  * https://github.com/neondatabase/neon/pull/4318
  * https://github.com/neondatabase/neon/pull/4319
  * https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
   thought we'd need in my original plan where we would need to convert
   `Tenant::timelines` into an async locking primitive (#4333). In reviews,
   we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485

# Significant Changes In This PR

## `compact_level0_phase1` & `create_delta_layer`

This commit partially reverts

   "pgserver: spawn_blocking in compaction (#4265)"
    4e359db4c7.

Specifically, it reverts the `spawn_blocking`-ificiation of
`compact_level0_phase1`.
If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()`
inside `compact_level0_phase1`. That would use up a thread in the
`spawn_blocking` thread pool, which is hard-capped.

I considered wrapping the code that follows the second
`layers.read().await` into `spawn_blocking`, but there are lifetime
issues with `deltas_to_compact`.

Also, this PR switches the `create_delta_layer` _function_ back to
async, and uses `spawn_blocking` inside to run the code that does sync
IO, while keeping the code that needs to lock `Timeline::layers` async.

## `LayerIter` and `LayerKeyIter` `Send` bounds

I had to add a `Send` bound on the `dyn` type that `LayerIter`
and `LayerKeyIter` wrap. Why? Because we now have the second
`layers.read().await` inside `compact_level0_phase`, and these
iterator instances are held across that await-point.

More background:
https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960

## `DatadirModification::flush`

Needed to replace the `HashMap::retain` with a hand-rolled variant
because `TimelineWriter::put` is now async.
2023-06-13 18:38:41 +02:00
Christian Schwarz
fdf7a67ed2 init_empty_layer_map: use try_write (#4485)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
Or more specifically, #4441, where we turn Timeline::layers into a
tokio::sync::RwLock.

By using try_write() here, we can avoid turning init_empty_layer_map
async,
which is nice because much of its transitive call(er) graph isn't async.
2023-06-13 13:49:40 +02:00
Alexey Kondratov
1299df87d2 [compute_ctl] Fix logging if catalog updates are skipped (#4480)
Otherwise, it wasn't clear from the log when Postgres started up
completely if catalog updates were skipped.

Follow-up for 4936ab6
2023-06-13 13:34:56 +02:00
Christian Schwarz
754ceaefac make TimelineWriter Send by using tokio::sync Mutex internally (#4477)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).

There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.

That will require the `TimelineWriter` to become async, because at times
its functions need to lock `Timeline::layers` in order to freeze the
open layer.

While doing that, rustc complains that we're now holding
`Timeline::write_lock` across await points (lock order is that
`write_lock` must be acquired before `Timelines::layers`).

So, we need to switch it over to an async primitive.
2023-06-13 10:15:25 +02:00
Arseny Sher
143fa0da42 Remove timeout on test_close_on_connections_exit
We have 300s timeout on all tests, and doubling logic in popen.wait sometimes
exceeds 5s, making the test flaky.

ref https://github.com/neondatabase/neon/issues/4211
2023-06-13 06:26:03 +04:00
bojanserafimov
4936ab6842 compute_ctl: add flag to avoid config step (#4457)
Add backwards-compatible flag that cplane can use to speed up startup time
2023-06-12 13:57:02 -04:00
Christian Schwarz
939593d0d3 refactor check_checkpoint_distance to prepare for async Timeline::layers (#4476)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).

There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.

That will require the `TimelineWriter` to become async.

That will require `freeze_inmem_layer` to become async.

So, inside check_checkpoint_distance, we will have
`freeze_inmem_layer().await`.

But current rustc isn't smart enough to understand that we
`drop(layers)` earlier, and hence, will complain about the `!Send`
`layers` being held across the `freeze_inmem_layer().await`-point.

This patch puts the guard into a scope, so rustc will shut up in the
next patch where we make the transition for `TimelineWriter`.

obsoletes https://github.com/neondatabase/neon/pull/4474
2023-06-12 17:45:56 +01:00
Christian Schwarz
2011cc05cd make Delta{Value,Key}Iter Send (#4472)
... by switching the internal RwLock to a OnceCell.

This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).

See https://github.com/neondatabase/neon/pull/4462#issuecomment-1587398883
for more context.

fixes https://github.com/neondatabase/neon/issues/4471
2023-06-12 17:45:56 +01:00
Arthur Petukhovsky
b0286e3c46 Always truncate WAL after restart (#4464)
c058e1cec2 skipped `truncate_wal()` it if `write_lsn` is equal to
truncation position, but didn't took into account that `write_lsn` is
reset on restart.

Fixes regression looking like:

```
ERROR WAL acceptor{cid=22 ...}:panic{thread=WAL acceptor 19b6c1743666ec02991a7633c57178db/b07db8c88f4c76ea5ed0954c04cc1e74 location=safekeeper/src/wal_storage.rs:230:13}: unexpected write into non-partial segment file
```

This fix will prevent skipping WAL truncation when we are running for
the first time after restart.
2023-06-12 13:42:28 +00:00
Heikki Linnakangas
e4f05ce0a2 Enable sanity check that disk_consistent_lsn is valid on created timeline.
Commit `create_test_timeline: always put@initdb_lsn the minimum required keys`
already switched us over to using valid initdb_lsns.

All that's left to do is to actually flush the minimum keys so that
we move from disk_consistent_lsn=Lsn(0) to disk_consistent_lsn=initdb_lsn.

Co-authored-by: Christian Schwarz <christian@neon.tech>

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Heikki Linnakangas
8d106708d7 Clean up timeline initialization code.
Clarify who's responsible for initializing the layer map. There were
previously two different ways to do it:

- create_empty_timeline and bootstrap_timeline let prepare_timeline()
  initialize an empty layer map.

- branch_timeline passed a flag to initialize_with_lock() to tell
  initialize_with_lock to call load_layer_map(). Because it was a
  newly created timeline, load_layer_map() never found any layer
  files, so it just initialized an empty layer map.

With this commit, prepare_new_timeline() always does it. The LSN to
initialize it with is passed as argument.

Other changes per function:

prepare_timeline:
- rename to 'prepare_new_timeline' to make it clear that it's only used
  when creating a new timeline, not when loading an existing timeline
- always initialize an empty layer map. The caller can pass the LSN to
  initialize it with. (Previously, prepare_timeline would optionally
  load the layer map at 'initdb_lsn'. Some caller used that, while others
  let initialize_with_lock do it

initialize_with_lock:
- As mentioned above, remove the option to load the layer map
- Acquire the 'timelines' lock in the function itself. None of the callers
  did any other work while holding the lock.
- Rename it to finish_creation() to make its intent more clear. It's only
  used when creating a new timeline now.

create_timeline_data:
- Rename to create_timeline_struct() for clarity. It just initializes
  the Timeline struct, not any other "data"

create_timeline_files:
- use create_dir rather than create_dir_all, to be a little more strict.
  We know that the parent directory should already exist, and the timeline
  directory should not exist.
- Move the call to create_timeline_struct() to the caller. It was just
  being "passed through"

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Christian Schwarz
f450369b20 timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map
This patch inlines `initialize_with_lock` and then reorganizes the code
such that we can `load_layer_map` without holding the
`Tenant::timelines` lock.

As a nice aside, we can get rid of the dummy() uninit mark, which has
always been a terrible hack.

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Christian Schwarz
aad918fb56 create_test_timeline: tests for put@initdb_lsn optimization code 2023-06-12 11:04:49 +01:00
Christian Schwarz
86dd8c96d3 add infrastructure to expect use of initdb_lsn flush optimization 2023-06-12 11:04:49 +01:00
Christian Schwarz
6a65c4a4fe create_test_timeline: always put@initdb_lsn the minimum required keys (#4451)
See the added comment on `create_empty_timeline`.

The various test cases now need to set a valid `Lsn` instead of
`Lsn(0)`.

Rough context:
https://github.com/neondatabase/neon/pull/4364#discussion_r1221995691
2023-06-12 09:28:34 +00:00
Vadim Kharitonov
e9072ee178 Compile rdkit (#4442)
`rdkit` extension

```
postgres=# create extension rdkit;
CREATE EXTENSION
postgres=# select 'c1[o,s]ncn1'::qmol;
    qmol
-------------
 c1[o,s]ncn1
(1 row)
```
2023-06-12 11:13:33 +02:00
Joonas Koivunen
7e17979d7a feat: http request logging on safekeepers.
With RequestSpan, successfull GETs are not logged, but all others, errors and
warns on cancellations are.
2023-06-11 22:53:08 +04:00
Arseny Sher
227271ccad Switch safekeepers to async.
This is a full switch, fs io operations are also tokio ones, working through
thread pool. Similar to pageserver, we have multiple runtimes for easier `top`
usage and isolation.

Notable points:
- Now that guts of safekeeper.rs are full of .await's, we need to be very
  careful not to drop task at random point, leaving timeline in unclear
  state. Currently the only writer is walreceiver and we don't have top
  level cancellation there, so we are good. But to be safe probably we should
  add a fuse panicking if task is being dropped while operation on a timeline
  is in progress.
- Timeline lock is Tokio one now, as we do disk IO under it.
- Collecting metrics got a crutch: since prometheus Collector is
  synchronous, it spawns a thread with current thread runtime collecting data.
- Anything involving closures becomes significantly more complicated, as
  async fns are already kinda closures + 'async closures are unstable'.
- Main thread now tracks other main tasks, which got much easier.
- The only sync place left is initial data loading, as otherwise clippy
  complains on timeline map lock being held across await points -- which is
  not bad here as it happens only in single threaded runtime of main thread.
  But having it sync doesn't hurt either.

I'm concerned about performance of thread pool io offloading, async traits and
many await points; but we can try and see how it goes.

fixes https://github.com/neondatabase/neon/issues/3036
fixes https://github.com/neondatabase/neon/issues/3966
2023-06-11 22:53:08 +04:00
dependabot[bot]
fbf0367e27 build(deps): bump cryptography from 39.0.1 to 41.0.0 (#4409) 2023-06-11 19:14:30 +01:00
Arthur Petukhovsky
a21b55fe0b Use connect_timeout for broker::connect (#4452)
Use `storage_broker::connect` everywhere. Add a default 5 seconds
timeout for opening new connection.
2023-06-09 17:38:53 +03:00
Shany Pozin
add51e1372 Add delete_objects to storage api (#4449)
## Summary of changes
Add missing delete_objects API to support bulk deletes
2023-06-09 13:23:12 +03:00
Alex Chi Z
cdce04d721 pgserver: add local manifest for atomic operation (#4422)
## Problem

Part of https://github.com/neondatabase/neon/issues/4418

## Summary of changes

This PR implements the local manifest interfaces. After the refactor of
timeline is done, we can integrate this with the current storage. The
reader will stop at the first corrupted record.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-08 19:34:25 -04:00
bojanserafimov
6bac770811 Add cold start test (#4436) 2023-06-08 18:11:33 -04:00
Stas Kelvich
c82d19d8d6 Fix NULLs handling in proxy json endpoint
There were few problems with null handling:

* query_raw_txt() accepted vector of string so it always (erroneously)
treated "null" as a string instead of null. Change rust pg client
to accept the vector of Option<String> instead of just Strings. Adopt
coding here to pass nulls as None.

* pg_text_to_json() had a check that always interpreted "NULL" string
as null. That is wrong and nulls were already handled by match None.
This bug appeared as a bad attempt to parse arrays containing NULL
elements. Fix coding by checking presence of quotes while parsing an
array (no quotes -> null, quoted -> "null" string).

Array parser fix also slightly changes behavior by always cleaning
current entry when pushing to the resulting vector. This seems to be
an omission by previous coding, however looks like it was harmless
as entry was not cleared only at the end of the nested or to-level
array.
2023-06-08 16:00:18 +03:00
Stas Kelvich
d73639646e Add more output options to proxy json endpoint
With this commit client can pass following optional headers:

`Neon-Raw-Text-Output: true`. Return postgres values as text, without parsing them. So numbers, objects, booleans, nulls and arrays will be returned as text. That can be useful in cases when client code wants to implement it's own parsing or reuse parsing libraries from e.g. node-postgres.

`Neon-Array-Mode: true`. Return postgres rows as arrays instead of objects. That is more compact representation and also helps in some edge
cases where it is hard to use rows represented as objects (e.g. when several fields have the same name).
2023-06-08 16:00:18 +03:00
Dmitry Rodionov
d53f9ab3eb delete timelines from s3 (#4384)
Delete data from s3 when timeline deletion is requested

## Summary of changes

UploadQueue is altered to support scheduling of delete operations in
stopped state. This looks weird, and I'm thinking whether there are
better options/refactorings for upload client to make it look better.

Probably can be part of https://github.com/neondatabase/neon/issues/4378

Deletion is implemented directly in existing endpoint because changes are not
that significant. If we want more safety we can separate those or create
feature flag for new behavior.

resolves [#4193](https://github.com/neondatabase/neon/issues/4193)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-06-08 15:01:22 +03:00
Dmitry Rodionov
8560a98d68 fix openapi spec to pass swagger editor validation (#4445)
There shouldnt be a dash before `type: object`. Also added description.
2023-06-08 13:25:30 +03:00
Alex Chi Z
2e687bca5b refactor: use LayerDesc in layer map (part 1) (#4408)
## Problem

part of https://github.com/neondatabase/neon/issues/4392

## Summary of changes

This PR adds a new HashMap that maps persistent layer desc to the layer
object *inside* LayerMap. Originally I directly went towards adding such
layer cache in Timeline, but the changes are too many and cannot be
reviewed as a reasonably-sized PR. Therefore, we take this intermediate
step to change part of the codebase to use persistent layer desc, and
come up with other PRs to move this hash map of layer desc to the
timeline struct.

Also, file_size is now part of the layer desc.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-07 18:28:18 +03:00
Dmitry Rodionov
1a1019990a map TenantState::Broken to TenantAttachmentStatus::Failed (#4371)
## Problem

Attach failures are not reported in public part of the api (in
`attachment_status` field of TenantInfo).

## Summary of changes

Expose TenantState::Broken as TenantAttachmentStatus::Failed

In the way its written Failed status will be reported even if no
attachment happened. (I e if tenant become broken on startup). This is
in line with other members. I e Active will be resolved to Attached even
if no actual attach took place.

This can be tweaked if needed. At the current stage it would be overengineering without clear motivation

resolves #4344
2023-06-07 18:25:30 +03:00
Alex Chi Z
1c200bd15f fix: break dev dependencies between wal_craft and pg_ffi (#4424)
## Problem

close https://github.com/neondatabase/neon/issues/4266

## Summary of changes

With this PR, rust-analyzer should be able to give lints and auto
complete in `mod tests`, and this makes writing tests easier.
Previously, rust-analyzer cannot do auto completion.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-07 17:51:13 +03:00
Arseny Sher
37bf2cac4f Persist safekeeper control file once in a while.
It should make remote_consistent_lsn commonly up-to-date on non actively writing
projects, which removes spike or pageserver -> safekeeper reconnections on
storage nodes restart.
2023-06-07 17:23:37 +04:00
Joonas Koivunen
5761190e0d feat: three phased startup order (#4399)
Initial logical size calculation could still hinder our fast startup
efforts in #4397. See #4183. In deployment of 2023-06-06
about a 200 initial logical sizes were calculated on hosts which
took the longest to complete initial load (12s).

Implements the three step/tier initialization ordering described in
#4397:
1. load local tenants
2. do initial logical sizes per walreceivers for 10s
3. background tasks

Ordering is controlled by:
- waiting on `utils::completion::Barrier`s on background tasks
- having one attempt for each Timeline to do initial logical size
calculation
- `pageserver/src/bin/pageserver.rs` releasing background jobs after
timeout or completion of initial logical size calculation

The timeout is there just to safeguard in case a legitimate non-broken
timeline initial logical size calculation goes long. The timeout is
configurable, by default 10s, which I think would be fine for production
systems. In the test cases I've been looking at, it seems that these
steps are completed as fast as possible.

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-06-07 14:29:23 +03:00
Vadim Kharitonov
88f0cfc575 Fix pgx_ulid extension (#4431)
The issue was in the wrong `control` file name
2023-06-07 11:41:53 +02:00
Arseny Sher
6b3c020cd9 Don't warn on system id = 0 in walproposer greeting.
sync-safekeepers doesn't know it and sends 0.
2023-06-07 12:39:20 +04:00
Arseny Sher
c058e1cec2 Quick exit in truncate_wal if nothing to do.
ref https://github.com/neondatabase/neon/issues/4414
2023-06-07 12:39:20 +04:00
Arseny Sher
dc6a382873 Increase timeouts on compute -> sk connections.
context: https://github.com/neondatabase/neon/issues/4414

And improve messages/comments here and there.
2023-06-07 12:39:20 +04:00
Heikki Linnakangas
df3bae2ce3 Use compute_ctl to manage Postgres in tests. (#3886)
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.
    
There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:
    
- In the tests, the postgresql.conf file is included as one large
  string in the spec file, and it is written out as it is to the data
  directory.  I added a new field for that to the spec file. The real
  web console, however, sets all the necessary settings in the
  'settings' field, and 'compute_ctl' creates the postgresql.conf from
  those settings.

- In the tests, the information needed to connect to the storage, i.e.
  tenant_id, timeline_id, connection strings to pageserver and
  safekeepers, are now passed as new fields in the spec file. The real
  web console includes them as the GUCs in the 'settings' field. (Both
  of these are different from what the test control plane used to do:
  It used to write the GUCs directly in the postgresql.conf file). The
  plan is to change the control plane to use the new method, and
  remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.

Other changes:
- remove obsolete comments about PostgresNode;
- create standby.signal file for Static compute node;
- log output of `compute_ctl` and `postgres` is merged into
`endpoints/compute.log`.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-06-06 14:59:36 +01:00
Joonas Koivunen
0cef7e977d refactor: just one way to shutdown a tenant (#4407)
We have 2 ways of tenant shutdown, we should have just one.

Changes are mostly mechanical simple refactorings.

Added `warn!` on the "shutdown all remaining tasks" should trigger test
failures in the between time of not having solved the "tenant/timeline
owns all spawned tasks" issue.

Cc: #4327.
2023-06-06 15:30:55 +03:00
Joonas Koivunen
18a9d47f8e test: restore NotConnected being allowed globally (#4426)
Flakyness introduced by #4402 evidence [^1].

I had assumed the NotConnected would had been an expected io error, but
it's not. Restore the global `allowed_error`.

[^1]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4407/5185897757/index.html#suites/82004ab4e3720b47bf78f312dabe7c55/14f636d0ecd3939d/
2023-06-06 13:51:39 +03:00
Sasha Krassovsky
ac11e7c32d Remove arch-specific stuff from HNSW extension (#4423) 2023-06-05 22:04:15 -08:00
Konstantin Knizhnik
8e1b5e1224 Remove -ftree-vectorizer-verbose=0 option notrecognized by MaxOS/X c… (#4412)
…ompiler

## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-05 20:10:19 +03:00
Joonas Koivunen
e0bd81ce1f test: fix flaky warning on attach (#4415)
added the `allowed_error` to the `positive_env` so any tests completing
the attach are allowed have this print out. they are allowed to do so,
because the `random_init_delay` can produce close to zero and thus the
first run will be near attach. Though... Unsure if we ever really need
the eviction task to run **before** it can evict something, as in after
20min or 24h.

in the failed test case however period is 20s so interesting that we
didn't run into this sooner.

evidence of flaky:
https://github.com/neondatabase/neon/actions/runs/5175677035/jobs/9323705929?pr=4399#step:4:38536
2023-06-05 18:12:58 +03:00
Joonas Koivunen
77598f5d0a Better walreceiver logging (#4402)
walreceiver logs are a bit hard to understand because of partial span
usage, extra messages, ignored errors popping up as huge stacktraces.

Fixes #3330 (by spans, also demote info -> debug).

- arrange walreceivers spans into a hiearchy: 
    - `wal_connection_manager{tenant_id, timeline_id}` ->
      `connection{node_id}` -> `poller`
- unifies the error reporting inside `wal_receiver`:
- All ok errors are now `walreceiver connection handling ended: {e:#}`
- All unknown errors are still stacktraceful task_mgr reported errors
  with context `walreceiver connection handling failure`
- Remove `connect` special casing, was: `DB connection stream finished`
  for ok errors
- Remove `done replicating` special casing, was `Replication stream
  finished` for ok errors
- lowered log levels for (non-exhaustive list):
    - `WAL receiver manager started, connecting to broker` (at startup)
    - `WAL receiver shutdown requested, shutting down` (at shutdown)
    - `Connection manager loop ended, shutting down` (at shutdown)
    - `sender is dropped while join handle is still alive` (at lucky
      shutdown, see #2885)
    - `timeline entered terminal state {:?}, stopping wal connection manager
      loop` (at shutdown)
    - `connected!` (at startup)
- `Walreceiver db connection closed` (at disconnects?, was without span)
    - `Connection cancelled` (at shutdown, was without span)
- `observed timeline state change, new state is {new_state:?}` (never
  after Timeline::activate was made infallible)
- changed:
    - `Timeline dropped state updates sender, stopping wal connection
      manager loop`
    - was out of date; sender is not dropped but `Broken | Stopping` state
      transition
        - also made `debug!`
    - `Timeline dropped state updates sender before becoming active,
      stopping wal connection manager loop`
    - was out of date: sender is again not dropped but `Broken | Stopping`
      state transition
        - also made `debug!`
- log fixes:
    - stop double reporting panics via JoinError
2023-06-05 17:35:23 +03:00
Joonas Koivunen
8142edda01 test: Less flaky gc (#4416)
Solves a flaky test error in the wild[^1] by:

- Make the gc shutdown signal reading an `allowed_error`
- Note the gc shutdown signal readings as being in `allowed_error`s
- Allow passing tenant conf to init_start to avoid unncessary tenants

[^1]:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4399/5176432780/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2cd76021ea011f93
2023-06-05 15:43:52 +03:00
Vadim Kharitonov
b9871158ba Compile PGX ULID extension (#4413)
Create pgx_ulid extension

```
postgres=# create extension ulid;
CREATE EXTENSION
postgres=# CREATE TABLE users (
  id ulid NOT NULL DEFAULT gen_ulid() PRIMARY KEY,
  name text NOT NULL
);
CREATE TABLE
postgres=# insert into users (name) values ('vadim');
INSERT 0 1
postgres=# select * from users;
             id             | name
----------------------------+-------
 01H25DDG3KYMYZTNR41X38E256 | vadim
```
2023-06-05 12:52:13 +03:00
Joonas Koivunen
8caef2c0c5 fix: delay eviction_task as well (#4397)
As seen on deployment of 2023-06-01 release, times were improving but
there were some outliers caused by:
- timelines `eviction_task` starting while activating and running
imitation
- timelines `initial logical size` calculation

This PR fixes it so that `eviction_task` is delayed like other
background tasks fixing an oversight from earlier #4372.

After this PR activation will be two phases:
1. load and activate tenants AND calculate some initial logical sizes
2. rest of initial logical sizes AND background tasks
- compaction, gc, disk usage based eviction, timelines `eviction_task`,
consumption metrics
2023-06-05 09:37:53 +03:00
Konstantin Knizhnik
04542826be Add HNSW extension (#4227)
## Describe your changes

Port HNSW implementation for ANN search top Postgres 

## Issue ticket number and link

https://www.pinecone.io/learn/hnsw

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-06-04 11:41:38 +03:00
bojanserafimov
4ba950a35a Add libcurl as dependency to readme (#4405) 2023-06-02 18:07:45 -04:00
Joonas Koivunen
a55c663848 chore: comment marker fixes (#4406)
Upgrading to rust 1.70 will require these.
2023-06-02 21:03:12 +03:00
Heikki Linnakangas
9787227c35 Shield HTTP request handlers from async cancellations. (#4314)
We now spawn a new task for every HTTP request, and wait on the
JoinHandle. If Hyper drops the Future, the spawned task will keep
running. This protects the rest of the pageserver code from unexpected
async cancellations.

This creates a CancellationToken for each request and passes it to the
handler function. If the HTTP request is dropped by the client, the
CancellationToken is signaled. None of the handler functions make use
for the CancellationToken currently, but they now they could.

The CancellationToken arguments also work like documentation. When
you're looking at a function signature and you see that it takes a
CancellationToken as argument, it's a nice hint that the function might
run for a long time, and won't be async cancelled. The default
assumption in the pageserver is now that async functions are not
cancellation-safe anyway, unless explictly marked as such, but this is a
nice extra reminder.

Spawning a task for each request is OK from a performance point of view
because spawning is very cheap in Tokio, and none of our HTTP requests
are very performance critical anyway.

Fixes issue #3478
2023-06-02 08:28:13 -04:00
Joonas Koivunen
ef80a902c8 pg_sni_router: add session_id to more messages (#4403)
See superceded #4390.

- capture log in test
- expand the span to cover init and error reporting
- remove obvious logging by logging only unexpected
2023-06-02 14:59:10 +03:00
Alex Chi Z
66cdba990a refactor: use PersistentLayerDesc for persistent layers (#4398)
## Problem

Part of https://github.com/neondatabase/neon/issues/4373

## Summary of changes

This PR adds `PersistentLayerDesc`, which will be used in LayerMap
mapping and probably layer cache. After this PR and after we change
LayerMap to map to layer desc, we can safely drop RemoteLayerDesc.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-01 22:06:28 +03:00
Alex Chi Z
82484e8241 pgserver: add more metrics for better observability (#4323)
## Problem

This PR includes doc changes to the current metrics as well as adding
new metrics. With the new set of metrics, we can quantitatively analyze
the read amp., write amp. and space amp. in the system, when used
together with https://github.com/neondatabase/neonbench

close https://github.com/neondatabase/neon/issues/4312
ref https://github.com/neondatabase/neon/issues/4347

compaction metrics TBD, a novel idea is to print L0 file number and
number of layers in the system, and we can do this in the future when we
start working on compaction.

## Summary of changes

* Add `READ_NUM_FS_LAYERS` for computing read amp.
* Add `MATERIALIZED_PAGE_CACHE_HIT_UPON_REQUEST`.
* Add `GET_RECONSTRUCT_DATA_TIME`. GET_RECONSTRUCT_DATA_TIME +
RECONSTRUCT_TIME + WAIT_LSN_TIME should be approximately total time of
reads.
* Add `5.0` and `10.0` to `STORAGE_IO_TIME_BUCKETS` given some fsync
runs slow (i.e., > 1s) in some cases.
* Some `WAL_REDO` metrics are only used when Postgres is involved in the
redo process.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-01 21:46:04 +03:00
Joonas Koivunen
36fee50f4d compute_ctl: enable tracing panic hook (#4375)
compute_ctl can panic, but `tracing` is used for logging. panic stderr
output can interleave with messages from normal logging. The fix is to
use the established way (pageserver, safekeeper, storage_broker) of using
`tracing` to report panics.
2023-06-01 20:12:07 +03:00
bojanserafimov
330083638f Fix stale and misleading comment in LayerMap (#4297) 2023-06-01 05:04:46 +03:00
Konstantin Knizhnik
952d6e43a2 Add pageserver parameter forced_image_creation_limit which can be used… (#4353)
This parameter can be use to restrict number of image layers generated
because of GC request (wanted image layers).
Been set to zero it completely eliminates creation of such image layers.
So it allows to avoid extra storage consumption after merging #3673

## Problem
PR #3673 forces generation of missed image layers. So i short term is
cause cause increase (in worst case up to two times) size of storage.
It was intended (by me) that GC period is comparable with PiTR interval.
But looks like it is not the case now - GC is performed much more
frequently. It may cause the problem with space exhaustion: GC forces
new image creation while large PiTR still prevent GC from collecting old
layers.

## Summary of changes

Add new pageserver parameter` forced_image_creation_limit` which
restrict number of created image layers which are requested by GC.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-05-31 21:37:20 +03:00
bojanserafimov
b6447462dc Fix layer map correctness bug (#4342) 2023-05-31 12:23:00 -04:00
Dmitry Rodionov
b190c3e6c3 reduce flakiness by allowing Compaction failed, retrying in X queue is in state Stopped. (#4379)
resolves https://github.com/neondatabase/neon/issues/4374 by adding the error to allowed_errors
2023-05-30 20:11:44 +03:00
Joonas Koivunen
f4db85de40 Continued startup speedup (#4372)
Startup continues to be slow, work towards to alleviate it.

Summary of changes:

- pretty the functional improvements from #4366 into
`utils::completion::{Completion, Barrier}`
- extend "initial load completion" usage up to tenant background tasks
    - previously only global background tasks
- spawn_blocking the tenant load directory traversal
- demote some logging
- remove some unwraps
- propagate some spans to `spawn_blocking`

Runtime effects should be major speedup to loading, but after that, the
`BACKGROUND_RUNTIME` will be blocked for a long time (minutes). Possible
follow-ups:
- complete initial tenant sizes before allowing background tasks to
block the `BACKGROUND_RUNTIME`
2023-05-30 16:25:07 +03:00
Arthur Petukhovsky
210be6b6ab Replace broker duration logs with metrics (#4370)
I've added logs for broker push duration after every iteration in https://github.com/neondatabase/neon/pull/4142. This log has not found any real issues, so we can replace it with metrics, to slightly reduce log volume.

LogQL query found that pushes longer that 500ms happened only 90 times for the last month. https://neonprod.grafana.net/goto/KTNj9UwVg?orgId=1

`{unit="safekeeper.service"} |= "timeline updates to broker in" | regexp "to broker in (?P<duration>.*)" | duration > 500ms`
2023-05-30 16:08:02 +03:00
Alexander Bayandin
daa79b150f Code Coverage: store lcov report (#4358)
## Problem

In the future, we want to compare code coverage on a PR with coverage on
the main branch.
Currently, we store only code coverage HTML reports, I suggest we start
storing reports in "lcov info" format that we can use/parse in the
future. Currently, the file size is ~7Mb (it's a text-based format and
could be compressed into a ~400Kb archive)

- More about "lcov info" format:
https://manpages.ubuntu.com/manpages/jammy/man1/geninfo.1.html#files
- Part of https://github.com/neondatabase/neon/issues/3543

## Summary of changes
- Change `scripts/coverage` to output lcov coverage to
`report/lcov.info` file instead of stdout (we already upload the whole
`report/` directory to S3)
2023-05-30 14:05:41 +01:00
Joonas Koivunen
db14355367 revert: static global init logical size limiter (#4368)
added in #4366. revert for testing without it; it may have unintenteded
side-effects, and it's very difficult to understand the results from the
10k load testing environments. earlier results:
https://github.com/neondatabase/neon/pull/4366#issuecomment-1567491064
2023-05-30 10:40:37 +03:00
Joonas Koivunen
cb83495744 try: startup speedup (#4366)
Startup can take a long time. We suspect it's the initial logical size
calculations. Long term solution is to not block the tokio executors but
do most of I/O in spawn_blocking.

See: #4025, #4183

Short-term solution to above:

- Delay global background tasks until initial tenant loads complete
- Just limit how many init logical size calculations can we have at the
same time to `cores / 2`

This PR is for trying in staging.
2023-05-29 21:48:38 +03:00
Christian Schwarz
f4f300732a refactor TenantState transitions (#4321)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
The motivation is to avoid locking `Tenant::timelines` in places that
can't be `async`, because in #4333 we want to convert Tenant::timelines
from `std::sync::Mutex` to `tokio::sync::Mutex`.

But, the changes here are useful in general because they clean up &
document tenant state transitions.
That also paves the way for #4350, which is an alternative to #4333 that
refactors the pageserver code so that we can keep the
`Tenant::timelines` mutex sync.

This patch consists of the following core insights and changes:

* spawn_load and spawn_attach own the tenant state until they're done
* once load()/attach() calls are done ...
* if they failed, transition them to Broken directly (we know that
there's no background activity because we didn't call activate yet)
* if they succeed, call activate. We can make it infallible. How? Later.
* set_broken() and set_stopping() are changed to wait for spawn_load() /
spawn_attach() to finish.
* This sounds scary because it might hinder detach or shutdown, but
actually, concurrent attach+detach, or attach+shutdown, or
load+shutdown, or attach+shutdown were just racy before this PR.
     So, with this change, they're not anymore.
In the future, we can add a `CancellationToken` stored in Tenant to
cancel `load` and `attach` faster, i.e., make `spawn_load` /
`spawn_attach` transition them to Broken state sooner.

See the doc comments on TenantState for the state transitions that are
now possible.
It might seem scary, but actually, this patch reduces the possible state
transitions.

We introduce a new state `TenantState::Activating` to avoid grabbing the
`Tenant::timelines` lock inside the `send_modify` closure.
These were the humble beginnings of this PR (see Motivation section),
and I think it's still the right thing to have this `Activating` state,
even if we decide against async `Tenant::timelines` mutex. The reason is
that `send_modify` locks internally, and by moving locking of
Tenant::timelines out of the closure, the internal locking of
`send_modify` becomes a leaf of the lock graph, and so, we eliminate
deadlock risk.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-05-29 17:52:41 +03:00
Em Sharnoff
ccf653c1f4 re-enable file cache integration for VM compute node (#4338)
#4155 inadvertently switched to a version of the VM builder that leaves
the file cache integration disabled by default. This re-enables the
vm-informant's file cache integration.

(as a refresher: The vm-informant is the autoscaling component that sits
inside the VM and manages postgres / compute_ctl)

See also: https://github.com/neondatabase/autoscaling/pull/265
2023-05-28 10:22:45 -07:00
Heikki Linnakangas
2d6a022bb8 Don't allow two timeline_delete operations to run concurrently. (#4313)
If the timeline is already being deleted, return an error. We used to
notice the duplicate request and error out in
persist_index_part_with_deleted_flag(), but it's better to detect it
earlier. Add an explicit lock for the deletion.

Note: This doesn't do anything about the async cancellation problem
(github issue #3478): if the original HTTP request dropped, because the
client disconnected, the timeline deletion stops half-way through the
operation. That needs to be fixed, too, but that's a separate story.

(This is a simpler replacement for PR #4194. I'm also working on the
cancellation shielding, see PR #4314.)
2023-05-27 15:55:43 +03:00
Heikki Linnakangas
2cdf07f12c Refactor RequestSpan into a function.
Previously, you used it like this:

    |r| RequestSpan(my_handler).handle(r)

But I don't see the point of the RequestSpan struct. It's just a
wrapper around the handler function. With this commit, the call
becomes:

    |r| request_span(r, my_handler)

Which seems a little simpler.

At first I thought that the RequestSpan struct would allow "chaining"
other kinds of decorators like RequestSpan, so that you could do
something like this:

    |r| CheckPermissions(RequestSpan(my_handler)).handle(r)

But it doesn't work like that. If each of those structs wrap a handler
*function*, it would actually look like this:

    |r| CheckPermissions(|r| RequestSpan(my_handler).handle(r))).handle(r)

This commit doesn't make that kind of chaining any easier, but seems a
little more straightforward anyway.
2023-05-27 11:47:22 +03:00
Heikki Linnakangas
200a520e6c Minor refactoring in RequestSpan
Require the error type to be ApiError. It implicitly required that
anyway, because the function used error::handler, which downcasted the
error to an ApiError. If the error was in fact anything else than
ApiError, it would just panic. Better to check it at compilation time.

Also make the last-resort error handler more forgiving, so that it
returns an 500 Internal Server error response, instead of panicking,
if a request handler returns some other error than an ApiError.
2023-05-27 11:47:22 +03:00
Alex Chi Z
4e359db4c7 pgserver: spawn_blocking in compaction (#4265)
Compaction is usually a compute-heavy process and might affect other
futures running on the thread of the compaction. Therefore, we add
`block_in_place` as a temporary solution to avoid blocking other futures
on the same thread as compaction in the runtime. As we are migrating
towards a fully-async-style pageserver, we can revert this change when
everything is async and when we move compaction to a separate runtime.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-26 17:15:47 -04:00
Joonas Koivunen
be177f82dc Revert "Allow for higher s3 concurrency (#4292)" (#4356)
This reverts commit 024109fbeb for it
failing to be speed up anything, but run into more errors.

See: #3698.
2023-05-26 18:37:17 +03:00
Alexander Bayandin
339a3e3146 GitHub Autocomment: comment commits for branches (#4335)
## Problem

GitHub Autocomment script posts a comment only for PRs. It's harder
to debug failed tests on main or release branches.

## Summary of changes

- Change the GitHub Autocomment script to be able to post a comment to
either a PR or a commit of a branch
2023-05-26 14:49:42 +01:00
Heikki Linnakangas
a560b28829 Make new tenant/timeline IDs mandatory in create APIs. (#4304)
We used to generate the ID, if the caller didn't specify it. That's bad
practice, however, because network is never fully reliable, so it's
possible we create a new tenant but the caller doesn't know about it,
and because it doesn't know the tenant ID, it has no way of retrying or
checking if it succeeded. To discourage that, make it mandatory. The web
control plane has not relied on the auto-generation for a long time.
2023-05-26 16:19:36 +03:00
Joonas Koivunen
024109fbeb Allow for higher s3 concurrency (#4292)
We currently have a semaphore based rate limiter which we hope will keep
us under S3 limits. However, the semaphore does not consider time, so
I've been hesitant to raise the concurrency limit of 100.

See #3698.

The PR Introduces a leaky-bucket based rate limiter instead of the
`tokio::sync::Semaphore` which will allow us to raise the limit later
on. The configuration changes are not contained here.
2023-05-26 13:35:50 +03:00
Alexander Bayandin
2b25f0dfa0 Fix flakiness of test_metric_collection (#4346)
## Problem

Test `test_metric_collection` become flaky:

```
AssertionError: assert not ['2023-05-25T14:03:41.644042Z ERROR metrics_collection: failed to send metrics: reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("localhost")), port: Some(18022), path: "/billing/api/v1/usage_events", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" })) }',
                            ...]
```
I suspect it is caused by having 2 places when we define
`httpserver_listen_address` fixture (which is internally used by
`pytest-httpserver` plugin)

## Summary of changes
- Remove the definition of `httpserver_listen_address` from
`test_runner/regress/test_ddl_forwarding.py` and keep one in
`test_runner/fixtures/neon_fixtures.py`
- Also remote unused `httpserver_listen_address` parameter from
`test_proxy_metric_collection`
2023-05-26 00:05:11 +03:00
Christian Schwarz
057cceb559 refactor: make timeline activation infallible (#4319)
Timeline::activate() was only fallible because `launch_wal_receiver`
was.

`launch_wal_receiver` was fallible only because of some preliminary
checks in `WalReceiver::start`.

Turns out these checks can be shifted to the type system by delaying
creatinon of the `WalReceiver` struct to the point where we activate the
timeline.

The changes in this PR were enabled by my previous refactoring that
funneled the broker_client from pageserver startup to the activate()
call sites.

Patch series:

- #4316
- #4317
- #4318
- #4319
2023-05-25 20:26:43 +02:00
sharnoff
ae805b985d Bump vm-builder v0.7.3-alpha3 -> v0.8.0 (#4339)
Routine `vm-builder` version bump, from autoscaling repo release. You
can find the release notes here:
https://github.com/neondatabase/autoscaling/releases/tag/v0.8.0
The changes are from v0.7.2 — most of them were already included in
v0.7.3-alpha3.

Of particular note: This (finally) fixes the cgroup issues, so we should
now be able to scale up when we're about to run out of memory.

**NB:** This has the effect of limit the DB's memory usage in a way it
wasn't limited before. We may run into issues because of that. There is
currently no way to disable that behavior, other than switching the
endpoint back to the k8s-pod provisioner.
2023-05-25 09:33:18 -07:00
Joonas Koivunen
85e76090ea test: fix ancestor is stopping flakyness (#4234)
Flakyness most likely introduced in #4170, detected in
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4232/4980691289/index.html#suites/542b1248464b42cc5a4560f408115965/18e623585e47af33.

Opted to allow it globally because it can happen in other tests as well,
basically whenever compaction is enabled and we stop pageserver
gracefully.
2023-05-25 16:22:58 +00:00
Alexander Bayandin
08e7d2407b Storage: use Postgres 15 as default (#2809) 2023-05-25 15:55:46 +01:00
Alex Chi Z
ab2757f64a bump dependencies version (#4336)
proceeding https://github.com/neondatabase/neon/pull/4237, this PR bumps
AWS dependencies along with all other dependencies to the latest
compatible semver.

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-25 10:21:15 -04:00
Christian Schwarz
e5617021a7 refactor: eliminate global storage_broker client state (#4318)
(This is prep work to make `Timeline::activate` infallible.)

This patch removes the global storage_broker client instance from the
pageserver codebase.

Instead, pageserver startup instantiates it and passes it down to the
`Timeline::activate` function, which in turn passes it to the
WalReceiver, which is the entity that actually uses it.

Patch series:

- #4316
- #4317
- #4318
- #4319
2023-05-25 16:47:42 +03:00
Christian Schwarz
83ba02b431 tenant_status: don't InternalServerError if tenant not found (#4337)
Note this also changes the status code to the (correct) 404. Not sure if
that's relevant to Console.

Context:
https://neondb.slack.com/archives/C04PSBP2SAF/p1684746238831449?thread_ts=1684742106.169859&cid=C04PSBP2SAF

Atop #4300 because it cleans up the mgr::get_tenant() error type and I want eyes on that PR.
2023-05-25 11:38:04 +02:00
Christian Schwarz
37ecebe45b mgr::get_tenant: distinguished error type (#4300)
Before this patch, it would use error type `TenantStateError` which has
many more error variants than can actually happen with
`mgr::get_tenant`.

Along the way, I also introduced `SetNewTenantConfigError` because it
uses `mgr::get_tenant` and also can only fail in much fewer ways than
`TenantStateError` suggests.

The new `page_service.rs`'s `GetActiveTimelineError` and
`GetActiveTenantError` types were necessary to avoid an `Other` variant
on the `GetTenantError`.

This patch is a by-product of reading code that subscribes to
`Tenant::state` changes.
Can't really connect it to any given project.
2023-05-25 11:37:12 +02:00
Sasha Krassovsky
6052ecee07 Add connector extension to send Role/Database updates to console (#3891)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-05-25 12:36:57 +03:00
Christian Schwarz
e11ba24ec5 tenant loops: operate on the Arc<Tenant> directly (#4298)
(Instead of going through mgr every iteration.)

The `wait_for_active_tenant` function's `wait` argument could be removed
because it was only used for the loop that waits for the tenant to show
up in the tenants map. Since we're passing the tenant in, we now longer
need to get it from the tenants map.

NB that there's no guarantee that the tenant object is in the tenants
map at the time the background loop function starts running. But the
tenant mgr guarantees that it will be quite soon. See
`tenant_map_insert` way upwards in the call hierarchy for details.

This is prep work to eliminate `subscribe_for_state_updates` (PR #4299 )

Fixes: #3501
2023-05-25 10:49:09 +02:00
Alex Chi Z
f276f21636 ci: use eu-central-1 bucket (#4315)
Probably increase CI success rate.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-25 00:00:21 +03:00
Alex Chi Z
7126197000 pagectl: refactor ctl and support dump kv in delta (#4268)
This PR refactors the original page_binutils with a single tool pagectl,
use clap derive for better command line parsing, and adds the dump kv
tool to extract information from delta file. This helps me better
understand what's inside the page server. We can add support for other
types of file and more functionalities in the future.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-24 19:36:07 +03:00
Christian Schwarz
afc48e2cd9 refactor responsibility for tenant/timeline activation (#4317)
(This is prep work to make `Timeline::activate()` infallible.)

The current possibility for failure in `Timeline::activate()` is the
broker client's presence / absence. It should be an assert, but we're
careful with these. So, I'm planning to pass in the broker client to
activate(), thereby eliminating the possiblity of its absence.

In the unit tests, we don't have a broker client. So, I thought I'd be
in trouble because the unit tests also called `activate()` before this
PR.

However, closer inspection reveals a long-standing FIXME about this,
which is addressed by this patch.

It turns out that the unit tests don't actually need the background
loops to be running. They just need the state value to be `Active`. So,
for the tests, we just set it to that value but don't spawn the
background loops.

We'll need to revisit this if we ever do more Rust unit tests in the
future. But right now, this refactoring improves the code, so, let's
revisit when we get there.

Patch series:

- #4316
- #4317
- #4318
- #4319
2023-05-24 16:54:11 +02:00
Christian Schwarz
df52587bef attach-time tenant config (#4255)
This PR adds support for supplying the tenant config upon /attach.

Before this change, when relocating a tenant using `/detach` and
`/attach`, the tenant config after `/attach` would be the default config
from `pageserver.toml`.
That is undesirable for settings such as the PITR-interval: if the
tenant's config on the source was `30 days` and the default config on
the attach-side is `7 days`, then the first GC run would eradicate 23
days worth of PITR capability.

The API change is backwards-compatible: if the body is empty, we
continue to use the default config.
We'll remove that capability as soon as the cloud.git code is updated to
use attach-time tenant config
(https://github.com/neondatabase/neon/issues/4282 keeps track of this).

unblocks https://github.com/neondatabase/cloud/issues/5092 
fixes https://github.com/neondatabase/neon/issues/1555
part of https://github.com/neondatabase/neon/issues/886 (Tenant
Relocation)

Implementation
==============

The preliminary PRs for this work were (most-recent to least-recent)

* https://github.com/neondatabase/neon/pull/4279
* https://github.com/neondatabase/neon/pull/4267
* https://github.com/neondatabase/neon/pull/4252
* https://github.com/neondatabase/neon/pull/4235
2023-05-24 17:46:30 +03:00
Alexander Bayandin
35bb10757d scripts/ingest_perf_test_result.py: increase connection timeout (#4329)
## Problem

Sometimes default connection timeout is not enough to connect to the DB
with perf test results, [an
example](https://github.com/neondatabase/neon/actions/runs/5064263522/jobs/9091692868#step:10:332).

Similar changes were made for similar scripts:
- For `scripts/flaky_tests.py` in
https://github.com/neondatabase/neon/pull/4096
- For `scripts/ingest_regress_test_result.py` in
https://github.com/neondatabase/neon/pull/2367 (from the very
begginning)

## Summary of changes
- Connection timeout increased to 30s for
`scripts/ingest_perf_test_result.py`
2023-05-24 10:11:24 -04:00
Alexander Bayandin
2a3f54002c test_runner: update dependencies (#4328)
## Problem

`pytest` 6 truncates error messages and this is not configured.
It's fixed in `pytest` 7, it prints the whole message (truncating limit
is higher) if `--verbose` is set (it's set on CI).

## Summary of changes
- `pytest` and `pytest` plugins are updated to their latest versions
- linters (`black` and `ruff`) are updated to their latest versions
- `mypy` and types are updated to their latest versions, new warnings
are fixed
- while we're here, allure updated its latest version as well
2023-05-24 12:47:01 +01:00
Joonas Koivunen
f3769d45ae chore: upgrade tokio to 1.28.1 (#4294)
no major changes, but this is the most recent LTS release and will be
required by #4292.
2023-05-24 08:15:39 +03:00
Arseny Sher
c200ebc096 proxy: log endpoint name everywhere.
Checking out proxy logs for the endpoint is a frequent (often first) operation
during user issues investigation; let's remove endpoint id -> session id mapping
annoying extra step here.
2023-05-24 09:11:23 +04:00
Konstantin Knizhnik
417f37b2e8 Pass set of wanted image layers from GC to compaction (#3673)
## Describe your changes

Right now the only criteria for image layer generation is number of
delta layer since last image layer.
If we have "stairs" layout of delta layers (see link below) then it can
happen that there a lot of old delta layers which can not be reclaimed
by GC because are not fully covered with image layers.

This PR constructs list of "wanted" image layers in GC (which image
layers are needed to be able to remove old layers)
and pass this list to compaction task which performs generation of image
layers.
So right now except deltas count criteria we also take in account
"wishes" of GC.

## Issue ticket number and link

See 
https://neondb.slack.com/archives/C033RQ5SPDH/p1676914249982519


## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-05-24 08:01:41 +03:00
sharnoff
7f1973f8ac bump vm-builder, use Neon-specific version (#4155)
In the v0.6.0 release, vm-builder was changed to be Neon-specific, so
it's handling all the stuff that Dockerfile.vm-compute-node used to do.

This commit bumps vm-builder to v0.7.3-alpha3.
2023-05-23 15:20:20 -07:00
Christian Schwarz
00f7fc324d tenant_map_insert: don't expose the vacant entry to the closure (#4316)
This tightens up the API a little.
Byproduct of some refactoring work that I'm doing right now.
2023-05-23 15:16:12 -04:00
Stas Kelvich
dad3519351 Add SQL-over-HTTP endpoint to Proxy
This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON
response structure resembling that of the node-postgres driver. This method,
using HTTP POST, achieves smaller amortized latencies in edge setups due to
fewer round trips and an enhanced open connection reuse by the v8 engine.

This update involves several intricacies:
1. SQL injection protection: We employed the extended query protocol, modifying
   the rust-postgres driver to send queries in one roundtrip using a text
   protocol rather than binary, bypassing potential issues like those identified
   in https://github.com/sfackler/rust-postgres/issues/1030.

2. Postgres type compatibility: As not all postgres types have binary
   representations (e.g., acl's in pg_class), we adjusted rust-postgres to
   respond with text protocol, simplifying serialization and fixing queries with
   text-only types in response.

3. Data type conversion: Considering JSON supports fewer data types than
   Postgres, we perform conversions where possible, passing all other types as
   strings. Key conversions include:
   - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain
     text)
   - postgres bool, null, text -> json bool, null, string
   - postgres array -> json array
   - postgres json and jsonb -> json object

4. Alignment with node-postgres: To facilitate integration with js libraries,
   we've matched the response structure of node-postgres, returning command tags
   and column oids. Command tag capturing was added to the rust-postgres
   functionality as part of this change.
2023-05-23 20:01:40 +03:00
dependabot[bot]
d75b4e0f16 Bump requests from 2.28.1 to 2.31.0 (#4305) 2023-05-23 14:54:51 +01:00
Christian Schwarz
4d41b2d379 fix: max_lsn_wal_lag broken in tenant conf (#4279)
This patch fixes parsing of the `max_lsn_wal_lag` tenant config item.
We were incorrectly expecting a string before, but the type is a
NonZeroU64.

So, when setting it in the config, the (updated) test case would fail
with

```
 E       psycopg2.errors.InternalError_: Tenant a1fa9cc383e32ddafb73ff920de5f2e6 will not become active. Current state: Broken due to: Failed to parse config from file '.../repo/tenants/a1fa9cc383e32ddafb73ff920de5f2e6/config' as pageserver config: configure option max_lsn_wal_lag is not a string. Backtrace:
```

So, not even the assertions added are necessary.

The test coverage for tenant config is rather thin in general.
For example, the `test_tenant_conf.py` test doesn't cover all the
options.

I'll add a new regression test as part of attach-time-tenant-conf PR
https://github.com/neondatabase/neon/pull/4255
2023-05-23 16:29:59 +03:00
Shany Pozin
d6cf347670 Add an option to set "latest gc cutoff lsn" in pageserver binutils (#4290)
## Problem
[#2539](https://github.com/neondatabase/neon/issues/2539)
## Summary of changes
Add support  for latest_gc_cutoff_lsn update in pageserver_binutils
2023-05-23 15:48:43 +03:00
Joonas Koivunen
6388454375 test: allow benign warning in relation to startup ordering (#4262)
Allow the warning which happens because the disk usage based eviction
runs before tenants are loaded. Example failure:

https://neon-github-public-dev.s3.amazonaws.com/reports/main/5001582237/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/a711f5baf8f8bd8d/
2023-05-22 11:59:54 +03:00
Alexander Bayandin
3837fca7a2 compute-node-image: fix postgis download (#4280)
## Problem

`osgeo.org` is experiencing some problems with DNS resolving which
breaks `compute-node-image` (because it can't download postgis)

## Summary of changes
- Add `140.211.15.30 download.osgeo.org` to /etc/hosts by passing it via
the container option
2023-05-19 15:34:22 +01:00
Dmitry Rodionov
7529ee2ec7 rfc: the state of pageserver tenant relocation (#3868)
Summarize current state of tenant relocation related activities and implementation ideas
2023-05-19 14:35:33 +03:00
Christian Schwarz
b391c94440 tenant create / update-config: reject unknown fields (#4267)
This PR enforces that the tenant create / update-config APIs reject
requests with unknown fields.

This is a desirable property because some tenant config settings control
the lifetime of user data (e.g., GC horizon or PITR interval).

Suppose we inadvertently rename the `pitr_interval` field in the Rust
code.
Then, right now, a client that still uses the old name will send a
tenant config request to configure a new PITR interval.
Before this PR, we would accept such a request, ignore the old name
field, and use the pageserver.toml default value for what the new PITR
interval is.
With this PR, we will instead reject such a request.

One might argue that the client could simply check whether the config it
sent has been applied, using the `/v1/tenant/.../config` endpoint.
That is correct for tenant create and update-config.

But, attach will soon [^1] grow the ability to have attach-time config
as well.
If we ignore unknown fields and fall back to global defaults in that
case, we risk data loss.
Example:
1. Default PITR in pageservers is 7 days.
2. Create a tenant and set its PITR to 30 days.
3. For 30 days, fill the tenant continuously with data.
4. Detach the tenant.
5. Attach tenant.

Attach must use the 30-day PITR setting in this scenario.
If it were to fall back to the 7-day default value, we would lose 23
days of PITR capability for the tenant.

So, the PR that adds attach-time tenant config will build on the
(clunky) infrastructure added in this PR

[^1]: https://github.com/neondatabase/neon/pull/4255

Implementation Notes
====================

This could have been a simple `#[serde(deny_unknown_fields)]` but sadly,
that is documented- but silent-at-compile-time-incompatible with
`#[serde(flatten)]`. But we are still using this by adding on outer struct and use unit tests to ensure it is correct.

`neon_local tenant config` now uses the `.remove()` pattern + bail if
there are leftover config args. That's in line with what
`neon_local tenant create` does. We should dedupe that logic in a future
PR.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: Alex Chi <iskyzh@gmail.com>
2023-05-18 21:16:09 -04:00
Alexander Bayandin
5abc4514b7 Un-xfail fixed tests on Postgres 15 (#4275)
- https://github.com/neondatabase/neon/pull/4182  
- https://github.com/neondatabase/neon/pull/4213
2023-05-18 22:38:33 +01:00
Alexander Bayandin
1b2ece3715 Re-enable compatibility tests on Postgres 15 (#4274)
- Enable compatibility tests for Postgres 15
- Also add `PgVersion::v_prefixed` property to return the version number
with, _guess what,_ v-prefix!
2023-05-18 19:56:09 +01:00
Anastasia Lubennikova
8ebae74c6f Fix handling of XLOG_XACT_COMMIT/ABORT:
Previously we didn't handle XACT_XINFO_HAS_INVALS and XACT_XINFO_HAS_DROPPED_STAT correctly,
which led to getting incorrect value of twophase_xid for records with XACT_XINFO_HAS_TWOPHASE.
This caused 'twophase file for xid {} does not exist' errors in test_isolation
2023-05-18 14:36:45 +01:00
Vadim Kharitonov
fc886dc8c0 Compile pg_cron extension 2023-05-17 17:43:50 +02:00
Heikki Linnakangas
72346e102d Document that our code is mostly not async cancellation-safe.
We had a hot debate on whether we should try to make our code
cancellation-safe, or just accept that it's not, and make sure that
our Futures are driven to completion. The decision is that we drive
Futures to completion. This documents the decision, and summarizes the
reasoning for that.

Discussion that sparked this:
https://github.com/neondatabase/neon/pull/4198#discussion_r1190209316
2023-05-17 17:29:54 +03:00
Joonas Koivunen
918cd25453 ondemand_download_large_rel: solve flakyness (#3697)
Disable background tasks to not get compaction downloading all layers
but also stop safekeepers before checkpointing, use a readonly endpoint.

Fixes: #3666

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-05-17 16:19:02 +02:00
Alex Chi Z
9767432cff add cargo neon shortcut for neon_local (#4240)
Add `cargo neon` as a shortcut for compiling and running `neon_local`.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-17 16:48:00 +03:00
Anastasia Lubennikova
0c4dc55a39 Disable recovery_prefetch for Neon hot standby.
Prefetching of blocks referenced in WAL doesn't make sense for us,
because Neon hot standby anyway ignores pages that are not in the shared_buffers.
2023-05-17 13:35:56 +01:00
Alexander Bayandin
7b9e8be6e4 GitHub Autocomment: add a command to run all failed tests (#4200)
- Group tests by Postgres version
- Merge different build types
- Add a command to GitHub comment on how to rerun all failed tests
(different command for different Postgres versions)
- Restore a link to a test report in the build summary
2023-05-17 11:38:41 +01:00
Christian Schwarz
89307822b0 mgmt api: share a single tenant config model struct in Rust and OpenAPI (#4252)
This is prep for https://github.com/neondatabase/neon/pull/4255

[1/X] OpenAPI: share a single definition of TenantConfig

DRYs up the pageserver OpenAPI YAML's representation of
tenant config.

All the fields of tenant config are now located in a model schema
called TenantConfig.

The tenant create & config-change endpoints have separate schemas,
TenantCreateInfo and TenantConfigureArg, respectively.
These schemas inherit from TenantConfig, using allOf 1.

The tenant config-GET handler's response was previously named
TenantConfig.
It's now named TenantConfigResponse.

None of these changes affect how the request looks on the wire.

The generated Go code will change for Console because the OpenAPI code
generator maps `allOf` to a Go struct embedding.
Luckily, usage of tenant config in Console is still very lightweigt,
but that will change in the near future.
So, this is a good chance to set things straight.

The console changes are tracked in
 https://github.com/neondatabase/cloud/pull/5046

[2/x]: extract the tenant config parts of create & config requests

[3/x]: code movement: move TenantConfigRequestConfig next to
TenantCreateRequestConfig

[4/x] type-alias TenantConfigRequestConfig = TenantCreateRequestConfig;
They are exactly the same.

[5/x] switch to qualified use for tenant create/config request api
models

[6/x] rename models::TenantConfig{RequestConfig,} and remove the alias

[7/x] OpenAPI: sync tenant create & configure body names from Rust code

[8/x]: dedupe the two TryFrom<...> for TenantConfOpt impls
The only difference is that the TenantConfigRequest impl does

```
        tenant_conf.max_lsn_wal_lag = request_data.max_lsn_wal_lag;
        tenant_conf.trace_read_requests = request_data.trace_read_requests;
```

and the TenantCreateRequest impl does

```
        if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag {
            tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag);
        }
        if let Some(trace_read_requests) = request_data.trace_read_requests {
            tenant_conf.trace_read_requests = Some(trace_read_requests);
        }
```

As far as I can tell, these are identical.
2023-05-17 12:31:17 +02:00
Alexander Bayandin
30fe310602 Code Coverage: upload reports to S3 (#4256)
## Problem

`neondatabase/zenith-coverage-data` is too big:
- It takes ~6 minutes to clone and push the repo
- GitHub fails to publish an HTML report to github.io

Part of https://github.com/neondatabase/neon/issues/3543

## Summary of changes
Replace pushing code coverage report to
`neondatabase/zenith-coverage-data` with uploading it to S3
2023-05-17 11:30:07 +01:00
0x29a
ef41b63db7 docs: add links to the doc for better read experience (#4258)
add links to the doc  and refine links for better read experience
2023-05-17 12:25:01 +03:00
Christian Schwarz
1bceceac5a add helper to debug_assert that current span has a TenantId (#4248)
We already have `debug_assert_current_span_has_tenant_and_timeline_id`.
Have the same for just TenantId.
2023-05-17 11:03:46 +02:00
Christian Schwarz
4431779e32 refactor: attach: use create_tenant_files + schedule_local_tenant_processing (#4235)
With this patch, the attach handler now follows the same pattern as
tenant create with regards to instantiation of the new tenant:

1. Prepare on-disk state using `create_tenant_files`.
2. Use the same code path as pageserver startup to load it into memory
and start background loops (`schedule_local_tenant_processing`).

It's a bit sad we can't use the
`PageServerConfig::tenant_attaching_mark_file_path` method inside
`create_tenant_files` because it operates in a temporary directory.
However, it's a small price to pay for the gained simplicity.

During implementation, I noticed that we don't handle failures post
`create_tenant_files` well. I left TODO comments in the code linking to
the issue that I created for this [^1].

Also, I'll dedupe the spawn_load and spawn_attach code in a future
commit.

refs https://github.com/neondatabase/neon/issues/1555
part of https://github.com/neondatabase/neon/issues/886 (Tenant
Relocation)

[^1]: https://github.com/neondatabase/neon/issues/4233
2023-05-16 12:53:17 -04:00
Alexander Bayandin
131343ed45 Fix regress-tests job for Postgres 15 on release branch (#4253)
## Problem

Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).

Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129

## Summary of changes

Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
2023-05-16 17:18:56 +01:00
Joseph Koshakow
511b0945c3 Replace usages of wait_for_active_timeline (#4243)
This commit replaces all usages of connection_manager.rs:
wait_for_active_timeline with Timeline::wait_to_become_active.
wait_to_become_active is better and in the right module.

close https://github.com/neondatabase/neon/issues/4189

Co-authored-by: Shany Pozin <shany@neon.tech>
2023-05-16 10:38:39 -04:00
Dmitry Rodionov
b7db62411b Make storage time operations an enum instead of an array (#4238)
Use an enum instead of an array. Before that there was no connection
between definition of the metric and point where it was used aside from
matching string literals. Now its possible to use IDE features to check
for references. Also this allows to avoid mismatch between set of
metrics that was defined and set of metrics that was actually used

What is interesting is that `init logical size` case is not used. I
think `LogicalSize` is a duplicate of `InitLogicalSize`. So removed the latter.
2023-05-16 16:54:29 +03:00
MMeent
efe9e131a7 Update vendored PostgreSQL to latest patch releases (#4208)
Conflicts:

- Changes in PG15's xlogrecovery.c resulted in non-substantial conflicts between
ecb01e6ebb5a67f3fc00840695682a8b1ba40461 and
aee72b7be903e52d9bdc6449aa4c17fb852d8708

Fixes #4207
2023-05-16 15:23:50 +02:00
Alex Chi Z
4a67f60a3b bump aws dep version (#4237)
This PR is simply the patch from
https://github.com/neondatabase/neon/issues/4008 except we enabled
`force_path_style` for custom endpoints. This is because at some
version, the s3 sdk by default uses the virtual-host style access, which
is not supported by MinIO in the default configuration. By enforcing
path style access for custom endpoints, we can pass all e2e test cases.

SDK 0.55 is not the latest version and we can bump it further later when
all flaky tests in this PR are resolved.

This PR also (hopefully) fixes flaky test
`test_ondemand_download_timetravel`.

close https://github.com/neondatabase/neon/issues/4008

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-16 09:09:50 -04:00
Alexander Bayandin
a65e0774a5 Increase shared memory size for regression test run (#4232)
Should fix flakiness caused by the error
```
FATAL:  could not resize shared memory segment "/PostgreSQL.3944613150" to 1048576 bytes: No space left on device
```
2023-05-16 14:06:47 +01:00
Dmitry Rodionov
a0b34e8c49 add create tenant metric to storage operations (#4231)
Add a metric to track time spent in create tenant requests

Originated from https://github.com/neondatabase/neon/pull/4204
2023-05-16 15:15:29 +03:00
bojanserafimov
fdc1c12fb0 Simplify github PR template (#4241) 2023-05-16 08:13:54 -04:00
Alexander Bayandin
0322e2720f Nightly Benchmarks: add neonvm to pgbench-compare (#4225) 2023-05-16 12:46:28 +01:00
Vadim Kharitonov
4f64be4a98 Add endpoint to connection string 2023-05-15 23:45:04 +02:00
Tristan Partin
e7514cc15e Wrap naked PQerrorMessage calls in libpagestore with pchomp (#4242) 2023-05-15 15:36:53 -05:00
Tristan Partin
6415dc791c Fix use-after-free issue in libpagestore (#4239)
## Describe your changes

`pageserver_disconnect()` calls `PQfinish()` which deallocates resources
on the connection structure. `PQerrorMessage()` hands back a pointer to
an allocated resource. Duplicate the error message prior to calling
`pageserver_disconnect()`.

## Issue ticket number and link

Fixes https://github.com/neondatabase/neon/issues/4214

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [x] Do not forget to reformat commit message to not include the above
checklist
2023-05-15 13:38:18 -05:00
Alexander Bayandin
a5615bd8ea Fix Allure reports for different benchmark jobs (#4229)
- Fix Allure report generation failure for Nightly Benchmarks
- Fix GitHub Autocomment for `run-benchmarks` label
(`build_and_test.yml::benchmarks` job)
2023-05-15 13:04:03 +01:00
Joonas Koivunen
4a76f2b8d6 upload new timeline index part json before 201 or on retry (#4204)
Await for upload to complete before returning 201 Created on
`branch_timeline` or when `bootstrap_timeline` happens. Should either of
those waits fail, then on the retried request await for uploads again.
This should work as expected assuming control-plane does not start to
use timeline creation as a wait_for_upload mechanism.

Fixes #3865, started from
https://github.com/neondatabase/neon/pull/3857/files#r1144468177

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-05-15 14:16:43 +03:00
Shany Pozin
9cd6f2ceeb Remove duplicated logic in creating TenantConfOpt (#4230)
## Describe your changes

Remove duplicated logic in creating TenantConfOpt in both TryFrom of
TenantConfigRequest and TenantCreateRequest
2023-05-15 10:08:44 +03:00
Heikki Linnakangas
2855c73990 Fix race condition after attaching tenant with branches. (#4170)
After tenant attach, there is a window where the child timeline is
loaded and accepts GetPage requests, but its parent is not. If a
GetPage request needs to traverse to the parent, it needs to wait for
the parent timeline to become active, or it might miss some records on
the parent timeline.

It's also possible that the parent timeline is active, but it hasn't
yet received all the WAL up to the branch point from the safekeeper.
This happens if a pageserver crashes soon after creating a timeline,
so that the WAL leading to the branch point has not yet been uploaded
to remote storage. After restart, the WAL will be re-streamed and
ingested from the safekeeper, but that takes a while. Because of that,
it's not enough to check that the parent timeline is active, we also
need to wait for the WAL to arrive on the parent timeline, just like
at the beginning of GetPage handling. We probably should change the
behavior at create_timeline so that a timeline can only be created
after all the WAL up to the branch point has been uploaded to remote
storage, but that's not currently the case and out of scope for this
PR (see github issue #4218).

@NanoBjorn encountered this while working on tenant migration. After
migrating a tenant with a parent and child branch, connecting to the
child branch failed with an error like:

```
FATAL:  "base/16385" is not a valid data directory
DETAIL:  File "base/16385/PG_VERSION" is missing.
```

This commit adds two tests that reproduce the bug, with slightly
different symptoms.
2023-05-13 10:44:11 +03:00
Christian Schwarz
edcf4d61a4 distinguish imitated from real size::gather_input calls in metrics (#4224)
Before this PR, the gather_inputs() calls made to imitate synthetic size
calculation accesses were accounted towards the real logical size
calculation metric.

This PR forces all callers to declare the cause for making logical size
calculations, making the decision which cause counts towards which
metric explicit.

This is follow-up to

```
commit 1d266a6365
Author: Christian Schwarz <christian@neon.tech>
Date:   Thu May 11 16:09:29 2023 +0200

    logical size calculation metrics: differentiate regular vs imitated (#4197)
```

After merging this patch, I hope to be able to explain why we have ca
30x more "logical size" ops in prod than "imitate logical size" for any
given observation interval.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-12 17:57:33 +00:00
Christian Schwarz
a2a9c598be add counter metric that increases whenever a background loop overruns its period (#4223)
We already have the warn!() log line for this condition. This PR adds a
corresponding metric on which we can have a dedicated alert. Cheaper and
more reliable than alerting on the logs, because, we run into log rate
limits from time to time these days.

refs https://github.com/neondatabase/neon/issues/4222
2023-05-12 19:00:06 +03:00
Alexander Bayandin
bb06d281ea Run regressions tests on both Postgres 14 and 15 (#4192)
This PR adds tests runs on Postgres 15 and created unified Allure report
with results for all tests.

- Split `.github/actions/allure-report` into
`.github/actions/allure-report-store` and
`.github/actions/allure-report-generate`
- Add debug or release pytest parameter for all tests (depending on
`BUILD_TYPE` env variable)
- Add Postgres version as a pytest parameter for all tests (depending on
`DEFAULT_PG_VERSION` env variable)
- Fix `test_wal_restore` and `restore_from_wal.sh` to support path with
`[`/`]` in it (fixed by applying spellcheck to the script and fixing all
warnings), `restore_from_wal_archive.sh` is deleted as unused.
- All known failures on Postgres 15 marked with xfail
2023-05-12 15:28:51 +01:00
Christian Schwarz
5869234290 logical size calculation: spawn with in_current_span (#4196)
While investigating https://github.com/neondatabase/neon/issues/4154 I
found that the `Calculating logical size for timeline` tracing events
created from within the logical size computation code are not always
attributable to the background task that caused it.

My goal is to be able to distinguish in the logs whether a `Calculating
logical size for timeline` was logged as part of a real synthetic size
calculation VS an imitation by the eviction task.

I want this distinction so I can prove my assumption that the disk IO
peaks which we see every 24h on prod are due to eviction's imitate
synthetic size calculations.

The alternative here, which I would have preferred, but is more work:
link RequestContext's into a child->parent list and dump this list when
we log `Calculating logical size for timeline`.

I would have preferred that over what we have in this PR because,
technically, the ondemand logical size computation can outlive the
caller that spawned it. This is against the idea of correctly nested
spans.

I guess in OpenTelemetry land, the correct modelling would be a link
between the caller's span and the task_mgr task's span.

Anyways, I think the case where we hang up on the spawned ondemand
logical size calculation is quite rare. So, I'm willing to tolerate
incorrectly nested spans for these edge-cases.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-12 15:36:30 +02:00
Rahul Modpur
ecfe4757d3 fix bogus at character context in log messages
Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
2023-05-11 23:31:42 +01:00
Christian Schwarz
845e296562 eviction: add global histogram for iteration durations (#4212)
I would like to know whether and by how much the eviction iterations
spike in the $period-sized window that happens every $threshold , when
all the timelines do the imitate accesses.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-11 18:02:19 +03:00
Heikki Linnakangas
1988cc5527 Fix failpoint_sleep_millis_async without use std::time::Duration (#4195)
I tried to use failpoint_sleep_millis_async(...) in a source file that
didn't do `use std::time::Duration`, and got a compiler error:

```
error[E0433]: failed to resolve: use of undeclared type `Duration`
   --> pageserver/src/walingest.rs:316:17
    |
316 |                 utils::failpoint_sleep_millis_async!("wal-ingest-logical-message-sleep");
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ not found in this scope
    |
    = note: this error originates in the macro `utils::failpoint_sleep_millis_async` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider importing one of these items
    |
24  | use chrono::Duration;
    |
24  | use core::time::Duration;
    |
24  | use humantime::Duration;
    |
24  | use serde_with::__private__::Duration;
    |
      and 2 other candidates
```
2023-05-11 17:53:42 +03:00
Christian Schwarz
1d266a6365 logical size calculation metrics: differentiate regular vs imitated (#4197)
I want this distinction so I can prove my assumption that the disk IO
peaks which we see every 24h on prod are due to eviction's imitate
synthetic size calculations.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-11 17:09:29 +03:00
Christian Schwarz
80522a1b9d replace has_in_progress_downloads with new attachment_status field (#4168)
Control Plane currently [^1] polls for `has_in_progress_downloads ==
false` after /attach to determine that an attach operation succeeded.

As pointed out in the OpenAPI spec as of neon#4151, polling for
`has_in_progress_downloads` is incorrect.

This patch changes the situation by
- removing `has_in_progress_downloads`
- adding a new field `attachment_status.`
- changing instructions for `/attach` to poll for `attachment_status ==
attached`.

This makes the instructions in `/attach` actionable for Control Plane.
NB that we don't expose the TenantState in the OpenAPI docs, even though
we expose it in the endpoint. That is with good reason because we don't
want to commit to a fixed set of tenant states forever. Hence, the
separate `attachment_status` field that exposes the bare minimum
required to make /attach + subsequent polling 100% safe wrt split brain.

It would have been nice to report failures explicitly, but the problem
is that we lose that state when we restart. So, we return `attached`
upon attach failure. The tenant is Broken in that case, causing Control
Plane's subsequent health check will fail. Control Plane can roll back
the relocation operation then.
NB: the reliance on the subsequent health check is no change to what we
had before this patch!
NB: we can always add additional TenantAttachmentStatus'es in the future
to communicate failure.

This PR also moves the attach-marker file's creation to the API
handler's synchronous part. That was done to avoid the need to
distinguish
* `Attaching but marker not yet written => AttachmentStatus::Maybe` from
* `Attaching, marker written, but attach failed for other reason =>
AttachmentStatus::Attached`

Coincidentally, this also adds more transactionality to the /attach API
because we only return 202 once we've written the marker file. But, in
the end, it doesn't affect how the control plane interacts with us or
how it needs to do retries. So, we don't mention any of this in the API
docs.

[^1]: The one-click tenant relocation PR cloud#4740, currently WIP, is
      the first real user.
2023-05-11 16:53:46 +03:00
Joonas Koivunen
ecced13d90 try: higher page_service timeouts to isolate an issue (#4206)
See #4205.
2023-05-11 16:14:42 +03:00
Alexander Bayandin
59510f6449 scripts/flaky_tests.py: use retriesStatusChange from Allure 2023-05-10 16:59:03 +01:00
Alexander Bayandin
7fc778d251 GitHub Autocomment: fix flaky test notifications 2023-05-10 16:59:03 +01:00
Alexander Bayandin
1d490b2311 Make benchmark_fixture less noisy 2023-05-10 16:59:03 +01:00
Dmitry Rodionov
eb3a8be933 keep track of timeline deletion status in IndexPart to prevent timeline resurrection (#3919)
Before this patch, the following sequence would lead to the resurrection of a deleted timeline:

- create timeline
- wait for its index part to reach s3
- delete timeline
- wait an arbitrary amount of time, including 0 seconds
- detach tenant
- attach tenant
- the timeline is there and Active again

This happens because we only kept track of the deletion in the tenant dir (by deleting the timeline dir) but not in S3.

The solution is to turn the deleted timeline's IndexPart into a tombstone.
The deletion status of the timeline is expressed in the `deleted_at: Option<NativeDateTime>` field of IndexPart.
It's `None` while the timeline is alive and `Some(deletion time stamp)` if it is deleted.

We change the timeline deletion handler to upload this tombstoned IndexPart.
The handler does not return success if the upload fails.

Coincidentally, this fixes the long-stanging TODO about the `std::fs::remove_dir_all` being not atomic.
It need not be atomic anymore because we set the `deleted_at=Some()` before starting the `remove_dir_all`.

The tombstone is in the IndexPart only, not in the `metadata`.
So, we only have the tombstone and the `remove_dir_all` benefits mentioned above if remote storage is configured.
This was a conscious trade-off because there's no good format evolution story for the current metadata file format.

The introduction of this additional step into `delete_timeline` was painful because delete_timeline needs to be
1. cancel-safe
2. idempotent
3. safe to call concurrently
These are mostly self-inflicted limitations that can be avoided by using request-coalescing.
PR https://github.com/neondatabase/neon/pull/4159 will do that.

fixes https://github.com/neondatabase/neon/issues/3560

refs https://github.com/neondatabase/neon/issues/3889 (part of tenant relocation)


Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-05-10 10:27:12 +02:00
Christian Schwarz
3ec52088dd eviction_task: tracing::instrument the imitate-access calls (#4180)
Currently, if we unexpectly download from the eviction task, the log
lines look like what we have in
https://github.com/neondatabase/neon/issues/4154

```
2023-05-04T14:42:57.586772Z  WARN eviction_task{tenant_id=$TENANT timeline_id=$TIMELINE}:eviction_iteration{policy_kind="LayerAccessThreshold"}: unexpectedly on-demand downloading remote layer remote $TIMELINE/000000067F000032AC0000400C00FFFFFFFF-000000067F000032AC000040140000000008__0000000001696070-0000000003DC76E9 for task kind Eviction
```

We know these are caused by the imitate accesses.
But we don't know which one (my bet is on update_gc_info).

I didn't want to pollute the other tasks' logs with the additional
spans, so, using `.instrument()` when we call non-eviction-task code.

refs https://github.com/neondatabase/neon/issues/4154
2023-05-09 18:16:22 +02:00
Heikki Linnakangas
66b06e416a Pass tracing context in env variables instead of the spec file. (#4174)
If compute_ctl is launched without a spec file, it fetches it from the
control plane with an HTTP request. We cannot get the startup tracing
context from the compute spec in that case, because we don't have it
available on start. We could still read the tracing context from the
compute spec after we have fetched it, but that would leave the fetch
itself out of the context. Pass the tracing context in environment
variables instead.
2023-05-09 17:08:02 +03:00
Arthur Petukhovsky
d62315327a Allow parallel backup in safekeepers (#4177)
Add `wal_backup_parallel_jobs` cmdline argument to specify the max count
of parallel segments upload. New default value is 5, meaning that
safekeepers will try to upload 5 segments concurrently if they are
available. Setting this value to 1 will be equivalent to the sequential
upload that we had before.

Part of the https://github.com/neondatabase/neon/issues/3957
2023-05-09 12:20:35 +03:00
Anastasia Lubennikova
4bd7b1daf2 Bump vendor/postgres:
Fix entering hot standby mode for Neon postgres v15
2023-05-08 21:25:47 +01:00
Sergey Melnikov
0d3d022eb1 Remove deploy workflows (#4157)
## Describe your changes
Removing deploy workflows (moving to aws repo)
2023-05-08 17:30:16 +02:00
Raouf Chebri
e85cbddd2e Update neondatabase banner in README.md (#4176)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-05-08 17:12:42 +02:00
Anton Chaporgin
51ff9f9359 pg-sni-router nlb is internal (#4164) 2023-05-08 18:03:50 +03:00
Vadim Kharitonov
0f8b2d8f0a Compile kq_imcx extension (#3568)
## Describe your changes
Compiles kq_imcx extension

At this moment, there are some issues with the extension:
1. I'm cloning it directly from the master branch. It's better to fetch
tag/archive
2. PG14:
```
postgres=# CREATE EXTENSION IF NOT EXISTS kq_imcx;
postgres=# select * from kq_calendar_cache_info();
2023-02-08 13:55:22.853 UTC [412] ERROR:  relation "ketteq.slice_type" does not exist at character 34
2023-02-08 13:55:22.853 UTC [412] QUERY:  select min(s.id), max(s.id) from ketteq.slice_type s
2023-02-08 13:55:22.853 UTC [412] STATEMENT:  select * from kq_calendar_cache_info();
ERROR:  relation "ketteq.slice_type" does not exist
LINE 1: select min(s.id), max(s.id) from ketteq.slice_type s
```
3. PG15:
`cannot request additional shared memory outside shmem_request_hook`

Note: I don't think we need to publish info about this extension in the
docs.

## Issue ticket number and link
neondatabase/cloud#3387
2023-05-08 15:56:08 +02:00
Gleb Novikov
9860d59aa2 Public docker image repository by default 2023-05-08 15:51:54 +04:00
Christian Schwarz
411c71b486 document current tenant attach API semantics (#4151)
We currently return 202 as soon as the tenant is allocated in memory
before we've written out the marker file. So, the /attach API currently
does not have a transactional character. For example, it can happen that
we respond with a 202 and then crash before writing out the marker file.
In such a case, it is important that the client

1. observes the lost attach (by polling tenant status and observing 404)
2. and consequently retries the attach.

It has to do it in this loop until it observes the tenant as "Active" in
the tenant status. If the client doesn't follow this protocol and
instead goes to another pageserver to attach the tenant, we risk a
split-brain situation where both the first and second pageserver write
to the tenant's S3 state.

The improved description highlights the consequences of this behavior
for clients that use the /attach endpoint.

The tenant relocation that is currently being implemented in cloud#4740
implements retries of Attach and it does poll afterwards, but, it polls
`has_in_progress_downloads`.
That is incorrect, as described in the patch body.

The motivation for this write-up is that, in a future PR, we'll extend
the /attach endpoint with an option to provide the tenant config. If we
decide to leave the non-transactional behavior of /attach unmodified, we
will be able to avoid persisting the tenant config. Conversely, if we
decide that the /attach API should become transactional, we'll need to
persist the tenant config in the attach-marker-file before acknowledging
receipt of the /attach operation.

refs https://github.com/neondatabase/cloud/pull/4740
refs https://github.com/neondatabase/neon/issues/2238
refs https://github.com/neondatabase/neon/issues/1555
2023-05-05 19:32:41 +03:00
Alexey Kondratov
dd4fd89dc6 [compute_ctl] Do not initialize last_active on start (#4137)
Our scale-to-zero logic was optimized for short auto-suspend intervals,
e.g. minutes or hours. In this case, if compute was restarted by k8s due
to some reason (OOM, k8s node went down, pod relocation, etc.),
`last_active` got bumped, we start counting auto-suspend timeout again.
It's not a big deal, i.e. we suspend completely idle compute not after 5
minutes, but after 10 minutes or so.

Yet, some clients may want days or even weeks. And chance that compute
could be restarted during this interval is pretty high, but in this case
we could be not able to suspend some computes for weeks.

After this commit, we won't initialize `last_active` on start, so
`/status` could return an unset attribute. This means that there was no
user activity since start. Control-plane should deal with it by taking
`max()` out of all available activity timestamps: `started_at`,
`last_active`, etc.

compute_ctl part of neondatabase/cloud#4853
2023-05-05 11:45:37 +02:00
Alexander Bayandin
653e633c59 test_runner: add --pg-version pytest argument (#4037)
- allows setting Postgres version for testing using --pg-version argument
- fixes tests for the non-default Postgres version.
2023-05-05 02:57:47 +03:00
Alexander Bayandin
291b4f0d41 Update client libs for test_runner/pg_clients to their latest versions (#4092)
Also, use Workaround D for `swift/PostgresClientKitExample`, 
which supports neither SNI nor connections options
2023-05-04 18:22:04 +01:00
Christian Schwarz
88f39c11d4 refactor: the code that builds TenantConfOpt from mgmt API requests (#4152)
- extract code that builds TenantConfOpt from requests into a From<>
impl
- move map_err(ApiError::BadRequest) into callers
2023-05-04 18:10:40 +03:00
Heikki Linnakangas
b627fa71e4 Make read-only replicas explicit in compute spec (#4136)
This builds on top of PR #4058, and supersedes #4018
2023-05-04 17:41:42 +03:00
Christian Schwarz
7dd9553bbb eviction: regression test + distinguish layer write from map insert (#4005)
This patch adds a regression test for the threshold-based layer
eviction.
The test asserts the basic invariant that, if left alone, the residence
statuses will stabilize, with some layers resident and some layers
evicted.
Thereby, we cover both the aspect of last-access-time-threshold-based
eviction, and the "imitate access" hacks that we put in recently.

The aggressive `period` and `threshold` values revealed a subtle bug
which is also fixed in this patch.
The symptom was that, without the Rust changes of this patch, there
would be occasional test failures due to `WARN... unexpectedly
downloading` log messages.
These log messages were caused by the "imitate access" calls of the
eviction task.
But, the whole point of the "imitate access" hack was to prevent
eviction of the layers that we access there.
After some digging, I found the root cause, which is the following race
condition:

1. Compact: Write out an L1 layer from several L0 layers. This records
residence event `LayerCreate` with the current timestamp.
2. Eviction: imitate access logical size calculation. This accesses the
L0 layers because the L1 layer is not yet in the layer map.
3. Compact: Grab layer map lock, add the new L1 to layer map and remove
the L0s, release layer map lock.
4. Eviction: observes the new L1 layer whose only activity timestamp is
the `LayerCreate` event.

The L1 layer had no chance of being accessed until after (3).
So, if enough time passes between (1) and (3), then (4) will observe a
layer with `now-last_activity > threshold` and evict it

The fix is to require the first `record_residence_event` to happen while
we already hold the layer map lock.
The API requires a ref to a `BatchedUpdates` as a witness that we are
inside a layer map lock.
That is not fool-proof, e.g., new call sites for `insert_historic` could
just completely forget to record the residence event.
It would be nice to prevent this at the type level.
In the meantime, we have a rate-limited log messages to warn us, if such
an implementation error sneaks in in the future.

fixes https://github.com/neondatabase/neon/issues/3593
fixes https://github.com/neondatabase/neon/issues/3942

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-05-04 16:16:48 +02:00
Heikki Linnakangas
b5d64a1e32 Rename field, to match field name in XLogData struct and in rust-postgres (#4149)
The field means the same thing as the `wal_end` field in the XLogData
struct. And in the postgres-protocol crate's corresponding
PrimaryKeepAlive struct, it's also called `wal_end`. Let's be
consistent.

As noted by Arthur at
https://github.com/neondatabase/neon/pull/4144#pullrequestreview-1411031881
2023-05-04 14:41:15 +03:00
Christian Schwarz
f9839a0dd9 import_basebackup_from_tar: don't load local layers twice (#4111)
PR #4104 removed these bits as part of a revert of a larger change.

follow-up to
https://github.com/neondatabase/neon/pull/4104#discussion_r1180444952

---

Let's not merge this before the release.
2023-05-04 09:23:49 +02:00
Arthur Petukhovsky
ce1bbc9fa7 Always send the latest commit_lsn in send_wal (#4150)
When a new connection is established to the safekeeper, the 'end_pos'
field is initially set to Lsn::INVALID (i.e 0/0). If there is no WAL to
send to the client, we send KeepAlive messages with Lsn::INVALID. That
confuses the pageserver: it thinks that safekeeper is lagging very much
behind the tip of the branch, and will reconnect to a different
safekeeper. Then the same thing happens with the new safekeeper, until
some WAL is streamed which sets 'end_pos' to a valid value.

This fix always sets `end_pos` to the most recent `commit_lsn` value.
This is useful to send the latest `commit_lsn` to the receiver, so it
will know how advanced this safekeeper compared to the others.

Fixes https://github.com/neondatabase/neon/issues/3972
Supersedes https://github.com/neondatabase/neon/pull/4144
2023-05-04 00:07:45 +03:00
Alexander Bayandin
b114ef26c2 GitHub Autocomment: add a note if no tests were run (#4109)
- Always (if not cancelled) add a comment to a PR
- Mention in the comment if no tests were run / reports were not
generated.
2023-05-03 15:38:49 +01:00
Arthur Petukhovsky
3ceef7b17a Add more safekeeper and walreceiver metrics (#4142)
Add essential safekeeper and pageserver::walreceiver metrics. Mostly
counters, such as the number of received queries, broker messages,
removed WAL segments, or connection switches events in walreceiver.
Also logs broker push loop duration.
2023-05-03 17:07:41 +03:00
Kirill Bulatov
586e6e55f8 Print WalReceiver context on WAL waiting timeout (#4090)
Closes https://github.com/neondatabase/neon/issues/2106

Before:
```
Extracting base backup to create postgres instance: path=/Users/someonetoignore/work/neon/neon_main/test_output/test_pageserver_lsn_wait_error_safekeeper_stop/repo/endpoints/ep-2/pgdata port=15017

              stderr: command failed: page server 'basebackup' command failed

Caused by:
    0: db error: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50
    1: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50

Stack backtrace:
```

After:
```
Extracting base backup to create postgres instance: path=/Users/someonetoignore/work/neon/neon/test_output/test_pageserver_lsn_wait_error_safekeeper_stop/repo/endpoints/ep-2/pgdata port=15011

              stderr: command failed: page server 'basebackup' command failed

Caused by:
    0: db error: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50, WalReceiver status (update 2023-04-26 14:20:39): streaming WAL from node 12346, commit|streaming Lsn: 0/A2C3F58|0/A2C3F58, safekeeper candidates (id|update_time|commit_lsn): [(12348|14:20:40|0/A2C3F58), (12346|14:20:40|0/A2C3F58), (12347|14:20:40|0/A2C3F58)]
    1: ERROR: Timed out while waiting for WAL record at LSN 0/FFFFFFFF to arrive, last_record_lsn 0/A2C3F58 disk consistent LSN=0/16B5A50, WalReceiver status (update 2023-04-26 14:20:39): streaming WAL from node 12346, commit|streaming Lsn: 0/A2C3F58|0/A2C3F58, safekeeper candidates (id|update_time|commit_lsn): [(12348|14:20:40|0/A2C3F58), (12346|14:20:40|0/A2C3F58), (12347|14:20:40|0/A2C3F58)]

Stack backtrace:
```

As the issue requests, the PR adds the context in logs only, but I think
we should expose the context via HTTP management API similar way — it
should be simple with the new API, but better be done in a separate PR.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-05-03 16:25:19 +03:00
Anton Chaporgin
db81242f4a add debug to pg-sni-router install (#4143) 2023-05-03 16:14:16 +03:00
dependabot[bot]
39ca7c7c09 Bump flask from 2.1.3 to 2.2.5 (#4138) 2023-05-03 10:40:35 +01:00
Heikki Linnakangas
ecc0cf8cd6 Treat EPIPE as an expected error. (#4141)
If the other end of a TCP connection closes its read end of the socket,
you get an EPIPE when you try to send. I saw that happen in the CI once:


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4136/release/4869464644/index.html#suites/c19bc2126511ef8cb145cca25c438215/7ec87b016c0b4b50/

```
2023-05-03T07:53:22.394152Z ERROR Task 'serving compute connection task' tenant_id: Some(c204447079e02e7ba8f593cb8bc57e76), timeline_id: Some(b666f26600e6deaa9f43e1aeee5bacb7) exited with error: Postgres connection error

Caused by:
    Broken pipe (os error 32)

Stack backtrace:
   0: pageserver::page_service::page_service_conn_main::{{closure}}
             at /__w/neon/neon/pageserver/src/page_service.rs:282:17
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::future::future::Future>::poll
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panic/unwind_safe.rs:296:9
      <futures_util::future::future::catch_unwind::CatchUnwind<Fut> as core::future::future::Future>::poll::{{closure}}
             at /__w/neon/neon/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.28/src/future/future/catch_unwind.rs:36:42
      <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panic/unwind_safe.rs:271:9
   ...
```

In the passing, add a comment to explain what the "expected" in the
`is_expected_io_error` function means.
2023-05-03 12:03:13 +03:00
Joonas Koivunen
faebe3177b wal_craft: cleanup to keep editable (#4121)
wal_craft had accumulated some trouble by using `use anyhow::*;`. Fixes
that, removes redundant conversions (never need to convert a Path to
OsStr), especially at the `Process` args.

Originally in #4100 but we merged a later PR instead for the fixes. I
dropped the `postmaster.pid` polling in favor of just having a longer
connect timeout.
2023-05-03 11:11:55 +03:00
Joonas Koivunen
474f69c1c0 fix: omit cancellation logging when panicking (#4125)
noticed while describing `RequestSpan`, this fix will omit the otherwise
logged message about request being cancelled when panicking in the
request handler. this was missed on #4064.
2023-05-03 10:56:49 +03:00
Konstantin Knizhnik
47521693ed Fix file_cache build warnings (#4118)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-05-02 22:28:21 +03:00
Heikki Linnakangas
4d55d61807 Store basic endpoint info in endpoint.json file. (#4058)
It's more convenient than parsing the postgresql.conf file.

Extracted from PR #3886. I started working on another patch (to make it
safe to run two "neon_local endpoint create" commands concurrently), and
realized that this change will make that simpler too.
2023-05-02 20:36:11 +03:00
Sergey Melnikov
093fafd6bd Deploy pg-sni-router (#4132) 2023-05-01 17:18:45 +02:00
Shany Pozin
e3ae2661ee Add 2 new sets of safekeepers to us-west2 (#4130)
## Describe your changes
TF output:       
module.safekeeper-us-west-2.aws_instance.this["3"]: Creation complete
after 13s [id=i-089f6b9ef426dff76]
module.safekeeper-us-west-2.aws_instance.this["4"]: Creation complete
after 13s [id=i-0fe6bf912c4710c82]
module.safekeeper-us-west-2.aws_instance.this["5"]: Creation complete
after 13s [id=i-0a83c1c46d2b4e409]
module.safekeeper-us-west-2.aws_instance.this["6"]: Creation complete
after 13s [id=i-0fef5317b8fdc9f8d]
module.safekeeper-us-west-2.aws_instance.this["7"]: Creation complete
after 13s [id=i-0be739190d4289bf9]
module.safekeeper-us-west-2.aws_instance.this["8"]: Creation complete
after 13s [id=i-00e851803669e5cfe]
2023-05-01 14:22:59 +03:00
Anton Chaporgin
7e368f3edf build pg-sni-router binary (#4129)
## Describe your changes
This adds pg-sni-router binary to the build pipeline and neon image.

## Issue ticket number and link

https://github.com/neondatabase/cloud/issues/1461
2023-05-01 12:14:31 +02:00
Joonas Koivunen
138bc028ed fix: quick and dirty panic avoidance on drop path (#4128)
Sentry caught a panic on load testing server related to metric removals:
https://neondatabase.sentry.io/issues/4142396994

Turn the `expect` into logging, but also add logging for each removal,
so we could identify in which cases we do double-remove. The
double-removal (or never adding) cause is not obvious or expected.

Original added in #3837.
2023-05-01 11:54:09 +03:00
Stas Kelvich
d53f81b449 Add one more pageserver to staging 2023-04-30 22:39:34 +03:00
Joonas Koivunen
6f472df0d0 fix: restore not logging ignored io errors as errors (#4120)
the fix is rather indirect due to the accidental applying of too much
`anyhow`: if handle_pagerequests returns a `QueryError` it will now be
bubbled up as-is `QueryError`. `QueryError` allows the inner
`std::io::Error` to be inspected and thus we can filter certain error
kinds which are perfectly normal without a huge log message.

for a very long time (b2f5102) the errors were converted to `anyhow` by
mistake which made this difficult or impossible, even though from the
types it would *appear* that we propagate wrapped `std::io::Error`s and
can filter them.

Fixes #4113, most likely filters some other errors as well.
2023-04-30 14:34:55 +03:00
Rahul Patil
21eb944b5e Staging: Add safekeeper nodes [3-8] to eu-west-1 (#4123) 2023-04-29 15:25:57 +03:00
Arthur Petukhovsky
95244912c5 Override sharded-slab to increase MAX_THREADS (#4122)
Add patch directive to Cargo.toml to use patched version of
sharded-slab:
98d16753ab

Patch changes the MAX_THREADS limit from 4096 to 32768. This is a
temporary workaround for using tracing from many threads in safekeepers
code, until async safekeepers patch is merged to the main.

Note that patch can affect other rust services, not only the safekeeper
binary.
2023-04-29 13:31:04 +03:00
Shany Pozin
2617e70008 Add 4 new Pageservers for retool launch (#4115)
## Describe your changes
Adding 4 new pageserves to us-west

TF apply output:
module.pageserver-us-west-2.aws_instance.this["7"]: Creation complete
after 21s [id=i-02eec9b40617db5bc]
module.pageserver-us-west-2.aws_instance.this["5"]: Creation complete
after 21s [id=i-00ca6417c7bf96820]
module.pageserver-us-west-2.aws_instance.this["4"]: Creation complete
after 21s [id=i-013263dd1c239adcc]
module.pageserver-us-west-2.aws_instance.this["6"]: Creation complete
after 22s [id=i-01cdf7d2bc1433b6a]
2023-04-29 11:42:52 +02:00
Arthur Petukhovsky
8543485e92 Pull clone timeline from peer safekeepers (#4089)
Add HTTP endpoint to initialize safekeeper timeline from peer
safekeepers. This is useful for initializing new safekeeper to replace
failed safekeeper. Not fully "correct" in all cases, but should work in
most.

This code is not suitable for production workloads but can be tested on
staging to get started. New endpoint is separated from usual cases and
should not affect anything if no one explicitly uses a new endpoint. We
can rollback this commit in case of issues.
2023-04-28 14:20:46 +00:00
Joonas Koivunen
ec53c5ca2e revert: "Add check for duplicates of generated image layers" (#4104)
This reverts commit 732acc5.

Reverted PR: #3869

As noted in PR #4094, we do in fact try to insert duplicates to the
layer map, if L0->L1 compaction is interrupted. We do not have a proper
fix for that right now, and we are in a hurry to make a release to
production, so revert the changes related to this to the state that we
have in production currently. We know that we have a bug here, but
better to live with the bug that we've had in production for a long
time, than rush a fix to production without testing it in staging first.

Cc: #4094, #4088
2023-04-28 17:20:18 +03:00
Stas Kelvich
94d612195a bump rust-postgres version, after merging PR in rust-postgres 2023-04-28 17:15:43 +03:00
Stas Kelvich
b1329db495 fix sigterm handling 2023-04-28 17:15:43 +03:00
Stas Kelvich
5bb971d64e fix more python tests 2023-04-28 17:15:43 +03:00
Stas Kelvich
0364f77b9a fix python styling 2023-04-28 17:15:43 +03:00
Stas Kelvich
4ac6a9f089 add backward compatibility to proxy 2023-04-28 17:15:43 +03:00
Stas Kelvich
9486d76b2a Add tests for link auth to compute connection 2023-04-28 17:15:43 +03:00
Stas Kelvich
040f736909 remove changes in main proxy that are now not needed 2023-04-28 17:15:43 +03:00
Stas Kelvich
645e4f6ab9 use TLS in link proxy 2023-04-28 17:15:43 +03:00
Heikki Linnakangas
e947cc119b Add a small test case for pg_sni_router 2023-04-28 17:15:43 +03:00
Heikki Linnakangas
53e5d18da5 Start passthrough earlier
As soon as we have received the SSLRequest packet, and have figured
out the hostname to connect to from the SNI, we can start passing
through data. We don't need to parse the StartupPacket that the client
will send next.
2023-04-28 17:15:43 +03:00
Heikki Linnakangas
3813c703c9 Add an option for destination port.
Makes it easier to test locally.
2023-04-28 17:15:43 +03:00
Heikki Linnakangas
b15204fa8c Fix --help, and required args 2023-04-28 17:15:43 +03:00
Alexey Kondratov
81c75586ab Take port from SNI, formatting, make clippy happy 2023-04-28 17:15:43 +03:00
Anton Chaporgin
556fb1642a fixed the way hostname is parsed 2023-04-28 17:15:43 +03:00
Stas Kelvich
23aca81943 Add SNI-based proxy router
In order to not to create NodePorts for each compute we can setup
services that accept connections on wildcard domains and then use
information from domain name to route connection to some internal
service. There are ready solutions for HTTPS and TLS connections
but postgresql protocol uses opportunistic TLS and we haven't found
any ready solutions.

This patch introduces `pg_sni_router` which routes connections to
`aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`.

In the long run we can avoid console -> compute psql communications,
but now this router seems to be the easier way forward.
2023-04-28 17:15:43 +03:00
Arseny Sher
42798e6adc Increase connection_timeout to PG in find end of WAL test.
And log postgres to stdout.

Probably fixes https://github.com/neondatabase/neon/issues/3778
2023-04-28 16:17:23 +04:00
Arthur Petukhovsky
b03143dfc8 Use serde_as DisplayFromStr everywhere (#4103)
We used `display_serialize` previously, but it works only for Serialize.
`DisplayFromStr` does the same, but also works for Deserialize.
2023-04-28 13:55:07 +03:00
Arseny Sher
fdacfaabfd Move PageserverFeedback to utils.
It allows to replace u64 with proper Lsn and pretty print PageserverFeedback
with serde(_json). Now walsenders on safekeepers queried with debug_dump look
like

"walsenders": [
  {
    "ttid": "fafe0cf39a99c608c872706149de9d2a/b4fb3be6f576935e7f0fcb84bdb909a1",
    "addr": "127.0.0.1:48774",
    "conn_id": 3,
    "appname": "pageserver",
    "feedback": {
      "Pageserver": {
	"current_timeline_size": 32096256,
	"last_received_lsn": "0/2415298",
	"disk_consistent_lsn": "0/1696628",
	"remote_consistent_lsn": "0/0",
	"replytime": "2023-04-12T13:54:53.958856+00:00"
      }
    }
  }
],
2023-04-28 06:22:13 +04:00
Arseny Sher
b2a3981ead Move tracking of walsenders out of Timeline.
Refactors walsenders out of timeline.rs to makes it less convoluted into
separate WalSenders with its own lock, but otherwise having the same structure.
Tracking of in-memory remote_consistent_lsn is also moved there as it is mainly
received from pageserver.

State of walsender (feedback) is also restructured to be cleaner; now it is
either PageserverFeedback or StandbyFeedback(StandbyReply, HotStandbyFeedback),
but not both.
2023-04-28 06:22:13 +04:00
Joonas Koivunen
fe0b616299 feat(page_service): read timeouts (#4093)
Introduce read timeouts to our `page_service` connections. Without read
timeouts, we essentially leak connections.

This is a port of #3995. Split the refactorings to the other PR: #4097.

Fixes #4028.
2023-04-27 17:55:35 +00:00
Alexander Bayandin
c4e1cafb63 scripts/flaky_tests.py: handle connection error (#4096)
- Increase `connect_timeout` to 30s, which should be enough for 
most of the cases
- If the script cannot connect to the DB (or any other
`psycopg2.OperationalError` occur) — do not fail the script, log
the error and proceed. Problems with fetching flaky tests shouldn't
block the PR
2023-04-27 17:08:00 +01:00
Joonas Koivunen
fdf5e4db5e refactor: Cleanup page service (#4097)
Refactoring part of #4093.

Numerious `Send + Sync` bounds were a distraction, that were not needed
at all. The proper `Bytes` usage and one `"error_message".to_string()`
are just drive-by fixes.

Not using the `PostgresBackendTCP` allows us to start setting read
timeouts (and more). `PostgresBackendTCP` is still used from proxy, so
it cannot be removed.
2023-04-27 18:51:57 +03:00
Heikki Linnakangas
d1e86d65dc Run rustfmt to fix whitespace.
Commit e6ec2400fc introduced some trivial whitespace issues.
2023-04-27 18:45:22 +03:00
Arseny Sher
f5b4697c90 Log session_id when proxy per client task errors out. 2023-04-27 19:08:22 +04:00
Christian Schwarz
3be81dd36b fix clippy --release failure introduced in #4030 (#4095)
PR `build: run clippy for powerset of features (#4077)` brought us a
`clippy --release` pass.

It was merged after #4030, which fails under `clippy --release` with

```
error: static `TENANT_ID_EXTRACTOR` is never used
    --> pageserver/src/tenant/timeline.rs:4270:16
     |
4270 |     pub static TENANT_ID_EXTRACTOR: once_cell::sync::Lazy<
     |                ^^^^^^^^^^^^^^^^^^^
     |
     = note: `-D dead-code` implied by `-D warnings`

error: static `TIMELINE_ID_EXTRACTOR` is never used
    --> pageserver/src/tenant/timeline.rs:4276:16
     |
4276 |     pub static TIMELINE_ID_EXTRACTOR: once_cell::sync::Lazy<
     |                ^^^^^^^^^^^^^^^^^^^^^
```

A merge queue would have prevented this.
2023-04-27 17:07:25 +03:00
MMeent
e6ec2400fc Enable hot standby PostgreSQL replicas.
Notes:
 - This still needs UI support from the Console
 - I've not tuned any GUCs for PostgreSQL to make this work better
 - Safekeeper has gotten a tweak in which WAL is sent and how: It now
sends zero-ed WAL data from the start of the timeline's first segment up to
the first byte of the timeline to be compatible with normal PostgreSQL
WAL streaming.
 - This includes the commits of #3714 

Fixes one part of https://github.com/neondatabase/neon/issues/769

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-04-27 15:26:44 +02:00
Christian Schwarz
5b911e1f9f build: run clippy for powerset of features (#4077)
This will catch compiler & clippy warnings in all feature combinations.

We should probably use cargo hack for build and test as well, but,
that's quite expensive and would add to overall CI wait times.

obsoletes https://github.com/neondatabase/neon/pull/4073
refs https://github.com/neondatabase/neon/pull/4070
2023-04-27 15:01:27 +03:00
Christian Schwarz
9ea7b5dd38 clean up logging around on-demand downloads (#4030)
- Remove repeated tenant & timeline from span
- Demote logging of the path to debug level
- Log completion at info level, in the same function where we log errors
- distinguish between layer file download success & on-demand download
succeeding as a whole in the log message wording
- Assert that the span contains a tenant id and a timeline id

fixes https://github.com/neondatabase/neon/issues/3945

Before:

```
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: download complete: /storage/pageserver/data/tenants/$TENANT_ID/timelines/$TIMELINE_ID/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates.
```

After:

```
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: layer file download finished
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates.
  INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: on-demand download successful
```
2023-04-27 11:54:48 +02:00
Arseny Sher
0112a602e1 Add timeout on proxy -> compute connection establishment.
Otherwise we sit up to default tcp_syn_retries (about 2+ min) before gettings os
error 110 if compute has been migrated to another pod.
2023-04-27 09:50:52 +04:00
Anastasia Lubennikova
92214578af Fix proxy_io_bytes_per_client metric: use branch_id identifier properly. (#4084)
It fixes the miscalculation of the metric for projects that use multiple
branches for the same endpoint.
We were under billing users with such projects. So we need to
communicate the change in Release Notes.
2023-04-26 17:47:54 +03:00
Christian Schwarz
6861259be7 add global metric for unexpected on-demand downloads (#4069)
Until we have toned down the prod logs to zero WARN and ERROR, we want a
dedicated metric for which we can have a dedicated alert.

fixes https://github.com/neondatabase/neon/issues/3924
2023-04-26 15:18:26 +02:00
Sergey Melnikov
11df2ee5d7 Add safekeeper-3.us-east-2.aws.neon.build (#4085) 2023-04-26 14:40:36 +03:00
Arseny Sher
31a3910fd9 Remove wait_for_sk_commit_lsn_to_reach_remote_storage.
It had a couple of inherent races:

1) Even if compute is killed before the call, some more data might still arrive
to safekeepers after commit_lsn on them is polled, advancing it. Then checkpoint
on pageserver might not include this tail, and so upload of expected LSN won't
happen until one more checkpoint.

2) commit_lsn is updated asynchronously -- compute can commit transaction before
communicating commit_lsn to even single safekeeper (sync-safekeepers can be used
to forces the advancement). This makes semantics of
wait_for_sk_commit_lsn_to_reach_remote_storage quite complicated.

Replace it with last_flush_lsn_upload which
1) Learns last flush LSN on compute;
2) Waits for it to arrive to pageserver;
3) Checkpoints it;
4) Waits for the upload.

In some tests this keeps compute alive longer than before, but this doesn't seem
to be important.

There is a chance this fixes https://github.com/neondatabase/neon/issues/3209
2023-04-26 13:46:33 +04:00
Joonas Koivunen
381c8fca4f feat: log how long tenant activation takes (#4080)
Adds just a counter counting up from the creation to the tenant, logged
after activation. Might help guide us with the investigation of #4025.
2023-04-26 12:39:17 +03:00
Joonas Koivunen
4625da3164 build: remove busted sk-1.us-east-2 from staging hosts (#4082)
this should give us complete deployments while a new one is being
brought up.
2023-04-26 09:07:45 +00:00
Joonas Koivunen
850f6b1cb9 refactor: drop pageserver_ondisk_layers (#4071)
I didn't get through #3775 fast enough so we wanted to remove this
metric.

Fixes #3705.
2023-04-26 11:49:29 +03:00
Sergey Melnikov
f19b70b379 Configure extra domain for us-east-1 (#4078) 2023-04-26 09:36:26 +02:00
Sergey Melnikov
9d0cf08d5f Fix new storage-broker deploy for eu-central-1 (#4079) 2023-04-26 10:29:44 +03:00
Alexander Bayandin
2d6fd72177 GitHub Workflows: Fix crane for several registries (#4076)
Follow-up fix after https://github.com/neondatabase/neon/pull/4067

```
+ crane tag neondatabase/vm-compute-node-v14:3064 latest
Error: fetching "neondatabase/vm-compute-node-v14:3064": GET https://index.docker.io/v2/neondatabase/vm-compute-node-v14/manifests/3064: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3064
```

I reverted back the previous approach for promoting images
(login to one registry, save images to local fs, logout and login to
another registry, and push images from local fs). It turns out what
works for one Google project (kaniko), doesn't work for another (crane)
[sigh]
2023-04-25 23:58:59 +01:00
Heikki Linnakangas
8945fbdb31 Enable OpenTelemetry tracing in proxy in staging. (#4065)
Depends on https://github.com/neondatabase/helm-charts/pull/32

Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>
2023-04-25 20:45:36 +03:00
Alexander Bayandin
05ac0e2493 Login to ECR and Docker Hub at once (#4067)
- Update kaniko to 1.9.2 (from 1.7.0), problem with reproducible build is fixed
- Login to ECR and Docker Hub at once, so we can push to several
registries, it makes job `push-docker-hub` unneeded
- `push-docker-hub` replaced with `promote-images` in `needs:` clause,
Pushing images to production ECR moved to `promote-images` job
2023-04-25 17:54:10 +01:00
Joonas Koivunen
bfd45dd671 test_tenant_config: allow ERROR from eviction task (#4074) 2023-04-25 18:41:09 +03:00
Joonas Koivunen
7f80230fd2 fix: stop dead_code rustc lint (#4070)
only happens without `--all-features` which is what `./run_clippy.sh`
uses.
2023-04-25 17:07:04 +02:00
Sergey Melnikov
78bbbccadb Deploy proxies for preview enviroments (#4052)
## Describe your changes
Deploy `main` proxies to the preview environments
We don't deploy storage there yet, as it's tricky.

## Issue ticket number and link
https://github.com/neondatabase/cloud/issues/4737
2023-04-25 16:46:52 +02:00
Christian Schwarz
dbbe032c39 neon_local: fix tenant create -c eviction_policy:... (#4004)
And add corresponding unit test.

The fix is to use `.remove()` instead of `.get()` when processing the
arugments hash map.
The code uses emptiness of the hash map to determine whether all
arguments have been processed.
This was likely a copy-paste error.

    
refs https://github.com/neondatabase/neon/issues/3942
2023-04-25 15:33:30 +02:00
Joonas Koivunen
cb9473928d feat: add rough timings for basebackup (#4062)
just record the time needed for waiting the lsn and then the basebackup
in a log message in millis. this is related to ongoing investigations to
cold start performance.

this could also be a a counter. it cannot be added next to smgr
histograms, because we don't want another histogram per timeline.

the aim is to allow drilling deeper into which timelines were slow, and
to understand why some need two basebackups.
2023-04-25 13:22:16 +00:00
Christian Schwarz
fa20e37574 add gauge for in-flight layer uploads (#3951)
For the "worst-case /storage usage panel", we need to compute
```
remote size + local-only size
```

We currently don't have a metric for local-only layers.

The number of in-flight layers in the upload queue is just that, so, let
Prometheus scrape it.

The metric is two counters (started and finished).
The delta is the amount of in-flight uploads in the queue.

The metrics are incremented in the respective `call_unfinished_metric_*`
functions.
These track ongoing operations by file_kind and op_kind.
We only need this metric for layer uploads, so, there's the new
RemoteTimelineClientMetricsCallTrackSize type that forces all call sites
to decide whether they want the size tracked or not.
If we find that other file_kinds or op_kinds are interesting (metadata
uploads, layer downloads, layer deletes) are interesting, we can just
enable them, and they'll be just another label combination within the
metrics that this PR adds.

fixes https://github.com/neondatabase/neon/issues/3922
2023-04-25 14:22:48 +02:00
Joonas Koivunen
4911d7ce6f feat: warn when requests get cancelled (#4064)
Add a simple disarmable dropguard to log if request is cancelled before
it is completed. We currently don't have this, and it makes for
difficult to know when the request was dropped.
2023-04-25 15:22:23 +03:00
Christian Schwarz
e83684b868 add libmetric metric for each logged log message (#4055)
This patch extends the libmetrics logging setup functionality with a
`tracing` layer that increments a Prometheus counter each time we log a
log message. We have the counter per tracing event level. This allows
for monitoring WARN and ERR log volume without parsing the log. Also, it
would allow cross-checking whether logs got dropped on the way into
Loki.

It would be nicer if we could hook deeper into the tracing logging
layer, to avoid evaluating the filter twice.
But I don't know how to do it.
2023-04-25 14:10:18 +02:00
Eduard Dyckman
afbbc61036 Adding synthetic size to pageserver swagger (#4049)
## Describe your changes

I added synthetic size response to the console swagger. Now I am syncing
it back to neon
2023-04-24 16:19:25 +03:00
Alexey Kondratov
7ba5c286b7 [compute_ctl] Improve 'empty' compute startup sequence (#4034)
Do several attempts to get spec from the control-plane and retry network
errors and all reasonable HTTP response codes. Do not hang waiting for
spec without confirmation from the control-plane that compute is known
and is in the `Empty` state.

Adjust the way we track `total_startup_ms` metric, it should be
calculated since the moment we received spec, not from the moment
`compute_ctl` started. Also introduce a new `wait_for_spec_ms` metric
to track the time spent sleeping and waiting for spec to be delivered
from control-plane.

Part of neondatabase/cloud#3533
2023-04-21 11:10:48 +02:00
sharnoff
02b28ae0b1 fix vm-informant dbname: "neondb" -> "postgres" (#4046)
Changes the vm-informant's postgres connection string's dbname from
"neondb" (which sometimes doesn't exist) to "postgres" (which
_hopefully_ should exist more often?).

Currently there are a handful of VMs in prod that aren't working with
autoscaling because they don't have the "neondb" database.

The vm-informant doesn't require any database in particular; it's just
connecting as `cloud_admin` to be able to adjust the file cache
settings.
2023-04-18 18:54:32 +03:00
Cihan Demirci
0bfbae2d73 Add storage broker deployment to us-east-1 (#4048) 2023-04-18 18:41:09 +03:00
fcdm
f1b7dc4064 Update pageserver instances in us-east-1 2023-04-18 14:08:12 +01:00
Alexander Bayandin
e2a5177e89 Bump h2 from 0.3.17 to 0.3.18 (#4045) 2023-04-18 16:04:10 +03:00
Cihan Demirci
0c083564ce Add us-east-1 hosts file and update regions (#4042)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-04-17 15:25:27 +03:00
fcdm
d8dd60dc81 Add helm values for us-east-1 2023-04-17 11:59:38 +01:00
Arthur Petukhovsky
73f34eaa5e Send AppendResponse keepalive once per second (#4036)
Walproposer sends AppendRequest at least once per second. This patch
adds a response to these requests once per second.

Fixes https://github.com/neondatabase/neon/issues/4017
2023-04-17 11:24:57 +03:00
Matt Nappo
c2496c7ef2 Added black_box in layer_map benches (fix #3396) 2023-04-16 16:33:37 +03:00
Kirill Bulatov
ebea298415 Update most of the dependencies to their latest versions (#4026)
See https://github.com/neondatabase/neon/pull/3991

Brings the changes back with the right way to use new `toml_edit` to
deserialize values and a unit test for this.

All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated.

3 sets of crates remain unupdated:

* `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13).
* `opentelemetry` and `opentelemetry-*` crates

```
error[E0308]: mismatched types
  --> libs/tracing-utils/src/http.rs:65:21
   |
65 |     span.set_parent(parent_ctx);
   |          ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context`
   |          |
   |          arguments to this method are incorrect
   |
   = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types
note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
   = note: perhaps two different versions of crate `opentelemetry_api` are being used?
note: associated function defined here
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8
   |
43 |     fn set_parent(&self, cx: Context);
   |        ^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `tracing-utils` due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `tracing-utils` due to previous error
```

`tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need.

* similarly, `rustls`, `tokio-rustls`, `rustls-*` and `tls-listener` crates have similar issue:

```
error[E0308]: mismatched types
   --> libs/postgres_backend/tests/simple_select.rs:112:78
    |
112 |     let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg);
    |                                --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig`
    |                                |
    |                                arguments to this function are incorrect
    |
    = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types
note: struct `ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1
    |
125 | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1
    |
91  | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
    = note: perhaps two different versions of crate `rustls` are being used?
note: associated function defined here
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12
    |
23  |     pub fn new(config: ClientConfig) -> Self {
    |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `postgres_backend` due to previous error
warning: build failed, waiting for other jobs to finish...
```

* aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed.
Other our tests passed, further investigation is worth to be done in https://github.com/neondatabase/neon/issues/4008
2023-04-14 18:28:54 +03:00
Vadim Kharitonov
5ffa20dd82 [proxy] adjust proxy sleep timeout 2023-04-14 15:08:07 +03:00
Vadim Kharitonov
75ea8106ec Add procps into compute containers 2023-04-14 15:02:26 +03:00
Vadim Kharitonov
017d3a390d Compile postgres with lz4 and zstd support 2023-04-14 15:02:26 +03:00
Alexey Kondratov
589cf1ed21 [compute_ctl] Do not create availability checker data on each start (#4019)
Initially, idea was to ensure that when we come and check data
availability, special service table already contains one row. So if we
loose it for some reason, we will error out.

Yet, to do availability check we anyway start compute first! So it
doesn't really add some value, but we affect each compute start as we
update at least one row in the database. Also this writes some WAL, so
if timeline is close to `neon.max_cluster_size` it could prevent compute
from starting up.

That said, do CREATE TABLE IF NOT EXISTS + UPSERT right in the
`/check_writability` handler.
2023-04-14 13:05:07 +02:00
Alexander Bayandin
0c82ff3d98 test_runner: add Timeline Inspector to Grafana links (#4021) 2023-04-14 11:46:47 +01:00
Christian Schwarz
8895f28dae make evictions_low_residence_duration_metric_threshold per-tenant (#3949)
Before this patch, if a tenant would override its eviction_policy
setting to use a lower LayerAccessThreshold::threshold than the
`evictions_low_residence_duration_metric_threshold`, the evictions done
for that tenant would count towards the
`evictions_with_low_residence_duration` metric.

That metric is used to identify pre-mature evictions, commonly triggered
by disk-usage-based eviction under disk pressure.

We don't want that to happen for the legitimate evictions of the tenant
that overrides its eviction_policy.

So, this patch
- moves the setting into TenantConf
- adds test coverage
- updates the staging & prod yamls

Forward Compatibility:
Software before this patch will ignore the new tenant conf field and use
the global one instead.
So we can roll back safely.

Backward Compatibility:
Parsing old configs with software as of this patch will fail in
`PageServerConf::parse_and_validate` with error 
`unrecognized pageserver option 'evictions_low_residence_duration_metric_threshold'`
if the option is still present in the global section.
We deal with this by updating the configs in Ansible.

fixes https://github.com/neondatabase/neon/issues/3940
2023-04-14 13:25:45 +03:00
dependabot[bot]
b6c7c3290f Bump h2 from 0.3.15 to 0.3.17 (#4020) 2023-04-13 20:03:24 +01:00
Sasha Krassovsky
fd31fafeee Make proxy shutdown when all connections are closed (#3764)
## Describe your changes
Makes Proxy start draining connections on SIGTERM.
## Issue ticket number and link
#3333
2023-04-13 19:31:30 +03:00
Alexey Kondratov
db8dd6f380 [compute_ctl] Implement live reconfiguration (#3980)
With this commit one can request compute reconfiguration
from the running `compute_ctl` with compute in `Running` state
by sending a new spec:
```shell
curl -d "{\"spec\": $(cat ./compute-spec-new.json)}" http://localhost:3080/configure
```

Internally, we start a separate configurator thread that is waiting on
`Condvar` for `ConfigurationPending` compute state in a loop. Then it does
reconfiguration, sets compute back to `Running` state and notifies other
waiters.

It will need some follow-ups, e.g. for retry logic for control-plane
requests, but should be useful for testing in the current state. This
shouldn't affect any existing environment, since computes are configured
in a different way there.

Resolves neondatabase/cloud#4433
2023-04-13 18:07:29 +02:00
Alexander Bayandin
36c20946b4 Verify extensions checksums (#4014)
To not be taken by surprise by upstream git re-tag or by malicious activity,
let's verify the checksum for extensions we download

Also, unify the installation of `pg_graphql` and `pg_tiktoken` 
with other extensions.
2023-04-13 15:25:09 +01:00
Heikki Linnakangas
89b5589b1b Tenant size should never be zero. Simplify test.
Looking at the git history of this test, I think "size == 0" used to
have a special meaning earlier, but now it should never happen.
2023-04-13 16:57:31 +03:00
Heikki Linnakangas
53f438a8a8 Rename "Postgres nodes" in control_plane to endpoints.
We use the term "endpoint" in for compute Postgres nodes in the web UI
and user-facing documentation now. Adjust the nomenclature in the code.

This changes the name of the "neon_local pg" command to "neon_local
endpoint". Also adjust names of classes, variables etc. in the python
tests accordingly.

This also changes the directory structure so that endpoints are now
stored in:

    .neon/endpoints/<endpoint id>

instead of:

    .neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name>

The tenant ID is no longer part of the path. That means that you
cannot have two endpoints with the same name/ID in two different
tenants anymore. That's consistent with how we treat endpoints in the
real control plane and proxy: the endpoint ID must be globally unique.
2023-04-13 14:34:29 +03:00
Vadim Kharitonov
356439aa33 Add note about manual_release_instructions label (#4015)
## Describe your changes
Do not forget to process required manual stuff after release

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-04-13 13:13:24 +03:00
Vadim Kharitonov
c237a2f5fb Compile pg_hint_plan extension 2023-04-13 12:59:46 +03:00
Dmitry Rodionov
15d1f85552 Add reason to TenantState::Broken (#3954)
Reason and backtrace are added to the Broken state. Backtrace is automatically collected when tenant entered the broken state. The format for API, CLI and metrics is changed and unified to return tenant state name in camel case. Previously snake case was used for metrics and camel case was used for everything else. Now tenant state field in TenantInfo swagger spec is changed to contain state name in "slug" field and other fields (currently only reason and backtrace for Broken variant in "data" field). To allow for this breaking change state was removed from TenantInfo swagger spec because it was not used anywhere.

Please note that the tenant's broken reason is not persisted on disk so the reason is lost when pageserver is restarted.

Requires changes to grafana dashboard that monitors tenant states.

Closes #3001

---------

Co-authored-by: theirix <theirix@gmail.com>
2023-04-13 12:11:43 +03:00
Konstantin Knizhnik
732acc54c1 Add check for duplicates of generated image layers (#3869)
## Describe your changes

## Issue ticket number and link

#3673

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-04-13 10:19:34 +03:00
Stas Kelvich
5d0ecadf7c Add support for non-SNI case in multi-cert proxy
When no SNI is provided use the default certificate, otherwise we can't
get to the options parameter which can be used to set endpoint name too.
That means that non-SNI flow will not work for CNAME domains in verify-full
mode.
2023-04-12 18:16:49 +03:00
Kirill Bulatov
f7995b3c70 Revert "Update most of the dependencies to their latest versions (#3991)" (#4013)
This reverts commit a64044a7a9.

See https://neondb.slack.com/archives/C03H1K0PGKH/p1681306682795559
2023-04-12 14:51:59 +00:00
Alexander Bayandin
13e53e5dc8 GitHub Workflows: use '!cancelled' instead of 'success or failure' 2023-04-12 15:22:18 +01:00
Alexander Bayandin
c94b8998be GitHub Workflows: print error messages to stderr 2023-04-12 15:22:18 +01:00
Alexander Bayandin
218062ceba GitHub Workflows: use ref_name instead of ref 2023-04-12 15:22:18 +01:00
Sam Gaw
8d295780cb Add support for ip4r extension 2023-04-12 16:40:02 +03:00
Kirill Bulatov
a64044a7a9 Update most of the dependencies to their latest versions (#3991)
All non-trivial updates extracted into separate commits, also `carho
hakari` data and its manifest format were updated.

3 sets of crates remain unupdated:

* `base64` — touches proxy in a lot of places and changed its api (by
0.21 version) quite strongly since our version (0.13).
* `opentelemetry` and `opentelemetry-*` crates

```
error[E0308]: mismatched types
  --> libs/tracing-utils/src/http.rs:65:21
   |
65 |     span.set_parent(parent_ctx);
   |          ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context`
   |          |
   |          arguments to this method are incorrect
   |
   = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types
note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api`
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1
   |
77 | pub struct Context {
   | ^^^^^^^^^^^^^^^^^^
   = note: perhaps two different versions of crate `opentelemetry_api` are being used?
note: associated function defined here
  --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8
   |
43 |     fn set_parent(&self, cx: Context);
   |        ^^^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `tracing-utils` due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `tracing-utils` due to previous error
```

`tracing-opentelemetry` of version `0.19` is not yet released, that is
supposed to have the update we need.

* similarly, `rustls`, `tokio-rustls`, `rustls-*` and `tls-listener`
crates have similar issue:

```
error[E0308]: mismatched types
   --> libs/postgres_backend/tests/simple_select.rs:112:78
    |
112 |     let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg);
    |                                --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig`
    |                                |
    |                                arguments to this function are incorrect
    |
    = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types
note: struct `ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1
    |
125 | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls`
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1
    |
91  | pub struct ClientConfig {
    | ^^^^^^^^^^^^^^^^^^^^^^^
    = note: perhaps two different versions of crate `rustls` are being used?
note: associated function defined here
   --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12
    |
23  |     pub fn new(config: ClientConfig) -> Self {
    |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `postgres_backend` due to previous error
warning: build failed, waiting for other jobs to finish...
```

* aws crates: I could not make new API to work with bucket endpoint
overload, and console e2e tests failed.
Other our tests passed, further investigation is worth to be done in
https://github.com/neondatabase/neon/issues/4008
2023-04-12 15:32:38 +03:00
Kirill Bulatov
d8939d4162 Move walreceiver start and stop behind a struct (#3973)
The PR changes module function-based walreceiver interface with a
`WalReceiver` struct that exposes a few public methods, `new`, `start`
and `stop` now.

Later, the same struct is planned to be used for getting walreceiver
stats (and, maybe, other extra data) to display during missing wal
errors for https://github.com/neondatabase/neon/issues/2106

Now though, the change required extra logic changes:

* due to the `WalReceiver` struct added, it became easier to pass `ctx`
and later do a `detached_child` instead of

bfee412701/pageserver/src/tenant/timeline.rs (L1379-L1381)

* `WalReceiver::start` which is now the public API to start the
walreceiver, could return an `Err` which now may turn a tenant into
`Broken`, same as the timeline that it tries to load during startup.

* `WalReceiverConf` was added to group walreceiver parameters from
pageserver's tenant config
2023-04-12 12:39:02 +03:00
Heikki Linnakangas
06ce83c912 Tolerate missing 'operation_uuid' field in spec file.
'compute_ctl' doesn't use the operation_uuid for anything, it just prints
it to the log.
2023-04-12 12:11:22 +03:00
Heikki Linnakangas
8ace7a7515 Remove unused 'timestamp' field from ComputeSpec struct. 2023-04-12 12:11:22 +03:00
Heikki Linnakangas
ef68321b31 Use Lsn, TenantId, TimelineId types in compute_ctl.
Stronger types are generally nicer.
2023-04-12 12:11:22 +03:00
Heikki Linnakangas
6064a26963 Refactor 'spec' in ComputeState.
Sometimes, it contained real values, sometimes just defaults if the
spec was not received yet. Make the state more clear by making it an
Option instead.

One consequence is that if some of the required settings like
neon.tenant_id are missing from the spec file sent to the /configure
endpoint, it is spotted earlier and you get an immediate HTTP error
response. Not that it matters very much, but it's nicer nevertheless.
2023-04-12 01:55:40 +03:00
Stas Kelvich
3c9f42a2e2 Support aarch64 in walredo seccomp code (#3996)
Aarch64 doesn't implement some old syscalls like open and select. Use
openat instead of open to check if seccomp is supported. Leave both
select and pselect6 in the allowlist since we don't call select syscall
directly and may hope that libc will call pselect6 on aarch64.

To check whether some syscall is supported it is possible to use
`scmp_sys_resolver` from seccopm package:

```
> apt install seccopm
> scmp_sys_resolver -a x86_64 select
23
> scmp_sys_resolver -a aarch64 select
-10101
> scmp_sys_resolver -a aarch64 pselect6
72
```

Negative value means that syscall is not supported.

Another cross-check is to look up for the actuall syscall table in
`unistd.h`. To resolve all the macroses one can use `gcc -E` as it is
done in `dump_sys_aarch64()` function in
libseccomp/src/arch-syscall-validate.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-04-11 19:28:18 +00:00
Alexey Kondratov
40a68e9077 [compute_ctl] Add timeout for tracing_utils::shutdown_tracing() (#3982)
Shutting down OTEL tracing provider may hang for quite some time, see,
for example:
- https://github.com/open-telemetry/opentelemetry-rust/issues/868
- and our problems with staging
https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636

Yet, we want computes to shut down fast enough, as we may need a new one
for the same timeline ASAP. So wait no longer than 2s for the shutdown
to complete, then just error out and exit the main thread.

Related to neondatabase/cloud#3707
2023-04-11 15:05:35 +02:00
Stas Kelvich
de99ee2c0d Add more proxy cnames 2023-04-11 14:54:09 +03:00
Alexander Bayandin
c79d5a947c Nightly Benchmarks: run third-party benchmarks once a week (#3987) 2023-04-11 10:58:04 +01:00
Arseny Sher
7ad5a5e847 Enable timeout on reading from socket in safekeeper WAL service.
TCP_KEEPALIVE is not enabled by default, so this prevents hanged up connections
in case of abrupt client termination. Add 'closed' flag to PostgresBackendReader
and pass it during handles join to prevent attempts to read from socket if we
errored out previously -- now with timeouts this is a common situation.

It looks like
2023-04-10T18:08:37.493448Z  INFO {cid=68}:WAL
receiver{ttid=59f91ad4e821ab374f9ccdf918da3a85/16438f99d61572c72f0c7b0ed772785d}:
terminated: timed out

Presumably fixes https://github.com/neondatabase/neon/issues/3971
2023-04-11 11:45:43 +04:00
Stas Kelvich
22c890b71c Add more cnames to proxies 2023-04-11 01:55:25 +03:00
Stas Kelvich
83549a8d40 Revert "Support aarch64 in walredo seccomp code"
This reverts commit 98df7db094.
2023-04-11 00:08:01 +03:00
Stas Kelvich
98df7db094 Support aarch64 in walredo seccomp code
Aarch64 doesn't implement some old syscalls like open and select. Use
openat instead of open to check if seccomp is supported. Leave both
select and pselect6 in the allowlist since we don't call select syscall
directly and may hope that libc will call pselect6 on aarch64.

To check whether some syscall is supported it is possible to use
`scmp_sys_resolver` from seccopm package:

```
> apt install seccopm
> scmp_sys_resolver -a x86_64 select
23
> scmp_sys_resolver -a aarch64 select
-10101
> scmp_sys_resolver -a aarch64 pselect6
72
```

Negative value means that syscall is not supported.

Another cross-check is to look up for the actuall syscall table
in `unistd.h`. To resolve all the macroses one can use `gcc -E` as
it is done in `dump_sys_aarch64()` function in
libseccomp/src/arch-syscall-validate.
2023-04-10 23:54:16 +03:00
Heikki Linnakangas
f0b2e076d9 Move compute_ctl structs used in HTTP API and spec file to separate crate.
This is in preparation of using compute_ctl to launch postgres nodes
in the neon_local control plane. And seems like a good idea to
separate the public interfaces anyway.

One non-mechanical change here is that the 'metrics' field is moved
under the Mutex, instead of using atomics. We were not using atomics
for performance but for convenience here, and it seems more clear to
not use atomics in the model for the HTTP response type.
2023-04-09 21:52:28 +03:00
Alexander Bayandin
818e341af0 Nightly Benchmarks: replace neon-captest-prefetch with -new/-reuse (#3970)
We have enabled prefetch by default, let's use this in Nightly
Benchmarks:
- effective_io_concurrency=100 by default (instead of 32)
- maintenance_io_concurrency=100 by default (instead of 32)

Rename `neon-captest-prefetch` to `neon-captest-new` (for pgbench with
initialisation) and `neon-captest-reuse` (for OLAP scenarios)
2023-04-09 12:52:49 +01:00
Kirill Bulatov
dec58092e8 Replace Box<dyn> with impl in RemoteStorage upload (#3984)
Replaces `Box<(dyn io::AsyncRead + Unpin + Send + Sync + 'static)>` with
`impl io::AsyncRead + Unpin + Send + Sync + 'static` usages in the
`RemoteStorage` interface, to make it closer to
[`#![feature(async_fn_in_trait)]`](https://blog.rust-lang.org/inside-rust/2022/11/17/async-fn-in-trait-nightly.html)

For `GenericRemoteStorage`, replaces `type Target = dyn RemoteStorage`
with another impl with `RemoteStorage` methods inside it.
We can reuse the trait, that would require importing the trait in every
file where it's used and makes us farther from the unstable feature.
After this PR, I've manged to create a patch with the changes:

https://github.com/neondatabase/neon/compare/kb/less-dyn-storage...kb/nightly-async-trait?expand=1

Current rust implementation does not like recursive async trait calls,
so `UnreliableWrapper` was removed: it contained a
`GenericRemoteStorage` that implemented the `RemoteStorage` trait, and
itself implemented the trait, which nightly rustc did not like and
proposed to box the future.
Similarly, `GenericRemoteStorage` cannot implement `RemoteStorage` for
nightly rustc to work, since calls various remote storages' methods from
inside.

I've compiled current `main` and the nightly branch both with `time env
RUSTC_WRAPPER="" cargo +nightly build --all --timings` command, and got
```
    Finished dev [optimized + debuginfo] target(s) in 2m 04s
env RUSTC_WRAPPER="" cargo +nightly build --all --timings  1283.19s user 50.40s system 1074% cpu 2:04.15 total

for the new feature tried and

    Finished dev [optimized + debuginfo] target(s) in 2m 40s
env RUSTC_WRAPPER="" cargo +nightly build --all --timings  1288.59s user 52.06s system 834% cpu 2:40.71 total

for the old async_trait approach.
```

On my machine, the `remote_storage` lib compilation takes ~10 less time
with the nightly feature (left) than the regular main (right).


![image](https://user-images.githubusercontent.com/2690773/230620797-163d8b89-dac8-4366-bcf6-cd1cdddcd22c.png)

Full cargo reports are available at
[timings.zip](https://github.com/neondatabase/neon/files/11179369/timings.zip)
2023-04-07 21:39:49 +03:00
Stas Kelvich
0bf70e113f Add extra cnames to staging proxy 2023-04-07 19:18:19 +03:00
Vadim Kharitonov
31f2cdeb1e Update Dockerfile.compute-node
Co-authored-by: MMeent <matthias@neon.tech>
2023-04-07 15:26:22 +02:00
Vadim Kharitonov
979fa8b1ba Compile timescaledb 2023-04-07 15:26:22 +02:00
Konstantin Knizhnik
bfee412701 Trigger tests for index scan implementation (#3968)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-04-07 14:26:21 +03:00
Dmitry Rodionov
bfeb428d1b tests: make neon_fixtures a bit thinner by splitting out some pageserver related helpers (#3977)
neon_fixture is quite big and messy, lets clean it up a bit.
2023-04-07 13:47:28 +03:00
Stas Kelvich
b1c2a6384a Set non-wildcard common names in link auth proxy
Old coding here ignored non-wildcard common names and passed None instead. With my recent changes
I started throwing an error in that case. Old logic doesn't seem to be a great choice, so instead
of passing None I actually set non-wildcard common names too. That way it is possible to avoid handling
cases with None in downstream code.
2023-04-07 01:24:27 +03:00
Anastasia Lubennikova
6d01d835a8 [proxy] Report error if proxy_io_bytes_per_client metric has decreased 2023-04-06 23:14:07 +03:00
Alexey Kondratov
e42982fb1e [compute_ctl] Empty computes and /configure API (#3963)
This commit adds an option to start compute without spec and then pass
it a valid spec via `POST /configure` API endpoint. This is a main
prerequisite for maintaining the pool of compute nodes in the
control-plane.

For example:

1. Start compute with
   ```shell
   cargo run --bin compute_ctl -- -i no-compute \
    -p http://localhost:9095 \
    -D compute_pgdata \
    -C "postgresql://cloud_admin@127.0.0.1:5434/postgres" \
    -b ./pg_install/v15/bin/postgres
   ```

2. Configure it with
   ```shell
   curl -d "{\"spec\": $(cat ./compute-spec.json)}" http://localhost:3080/configure
   ```

Internally, it's implemented using a `Condvar` + `Mutex`. Compute spec
is moved under Mutex, as it's now could be updated in the http handler.
Also `RwLock` was replaced with `Mutex` because the latter works well
with `Condvar`.

First part of the neondatabase/cloud#4433
2023-04-06 21:21:58 +02:00
Dmitry Rodionov
b45c92e533 tests: exclude compatibility tests by default (#3975)
This allows to skip compatibility tests based on `CHECK_ONDISK_DATA_COMPATIBILITY` environment variable. When the variable is missing (default) compatibility tests wont be run.
2023-04-06 21:21:39 +03:00
Arthur Petukhovsky
ba4a96fdb1 Eagerly update wal_backup_lsn after each segment offload (#3976)
Otherwise it can lag a lot, preventing WAL segments cleanup. Also max
wal_backup_lsn on update, pulling it down is pointless.

Should help with https://github.com/neondatabase/neon/issues/3957, but
will not fix it completely.
2023-04-06 20:57:06 +03:00
Alexander Bayandin
4d64edf8a5 Nightly Benchmarks: Add free tier sized compute (#3969)
- Add support for VMs and CU
- Add free tier limited benchmark (0.25 CU)
- Ensure we use 1 CU by default for pgbench workload
2023-04-06 19:18:24 +03:00
Kirill Bulatov
102746bc8f Apply clippy rule exclusion locally instead of a global approach (#3974) 2023-04-06 18:57:48 +03:00
Alexander Bayandin
887cee64e2 test_runner: add links to grafana for remote tests (#3961)
Add Grafana links to allure reports to make it easier to debug perf
test failures
2023-04-06 13:52:41 +01:00
Vadim Kharitonov
2ce973c72f Allow installation of pg_stat_statements 2023-04-06 13:26:40 +02:00
Gleb Novikov
9db70f6232 Added disk_size and instance_type to payload (#3918)
## Describe your changes

In https://github.com/neondatabase/cloud/issues/4354 we are making
scheduling of projects based on available disk space and overcommit, so
we need to know disk size and just in case instance type of the
pageserver

## Issue ticket number and link

https://github.com/neondatabase/cloud/issues/4354

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] ~If it is a core feature, I have added thorough tests.~
- [ ] ~Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?~
- [ ] ~If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.~
2023-04-06 14:02:56 +04:00
Joonas Koivunen
b17c24fa38 fix: settle down to configured percent (#3947)
in real env testing we noted that the disk-usage based eviction sails 1
percentage point above the configured value, which might be a source of
confusion, so it might be better to get rid of that confusion now.

confusion: "I configured 85% but pageserver sails at 86%".

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-04-06 12:47:21 +03:00
Alexander Bayandin
9310949b44 GitHub Autocomment: Retry on server errors (#3958)
Retry posting/updating a comment in case of 5XX errors from GitHub API
2023-04-05 22:08:06 +03:00
Stas Kelvich
d8df5237fa Aligne extra certificate name with default cert-manager names 2023-04-05 21:29:21 +03:00
Stas Kelvich
c3ca48c62b Support extra domain names for proxy.
Make it possible to specify directory where proxy will look up for
extra certificates. Proxy will iterate through subdirs of that directory
and load `key.pem` and `cert.pem` files from each subdir. Certs directory
structure may look like that:

  certs
  |--example.com
  |  |--key.pem
  |  |--cert.pem
  |--foo.bar
     |--key.pem
     |--cert.pem

Actual domain names are taken from certs and key, subdir names are
ignored.
2023-04-05 20:06:48 +03:00
Alexander Bayandin
957acb51b5 GitHub Autocomment: Fix the link to the latest commit (#3952) 2023-04-04 19:06:10 +03:00
Alexander Bayandin
1d23b5d1de Comment PR with test results (#3907)
This PR adds posting a comment with test results. Each workflow run
updates the comment with new results.
The layout and the information that we post can be changed to our needs,
right now, it contains failed tests and test which changes status after
rerun (i.e. flaky tests)
2023-04-04 12:22:47 +01:00
Alexander Bayandin
105b8bb9d3 test_runner: automatically rerun flaky tests (#3880)
This PR adds a plugin that automatically reruns (up to 3 times) flaky
tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and
`pytest-rerunfailures` plugin.

As the first approximation we consider the test flaky if it has failed on 
the main branch in the last 10 days.

Flaky tests are fetched by `scripts/flaky_tests.py` script (it's
possible to use it in a standalone mode to learn which tests are flaky),
stored to a JSON file, and then the file is passed to the pytest plugin.
2023-04-04 12:21:54 +01:00
Kirill Bulatov
846532112c Remove unused S3 list operation (#3936)
In S3, pageserver only lists tenants (prefixes) on S3, no other keys.
Remove the list operation from the API, since S3 impl does not seem to
work normally and not used anyway,
2023-04-03 23:44:38 +03:00
Dmitry Ivanov
f85a61ceac [proxy] Fix regression in logging
For some reason, `tracing::instrument` proc_macro doesn't always print
elements specified via `fields()` or even show that it's impossible
(e.g. there's no Display impl).

Work around this using the `?foo` notation.

Before:
2023-04-03T14:48:06.017504Z  INFO handle_client🤝 received SslRequest

After:
2023-04-03T14:51:24.424176Z  INFO handle_client{session_id=7bd07be8-3462-404e-8ccc-0a5332bf3ace}🤝 received SslRequest
2023-04-03 18:49:30 +03:00
Christian Schwarz
45bf76eb05 enable layer eviction by default in prod (#3933)
Leave disk_usage_based_eviction above the current max usage in prod
(82%ish), so that deploying this commit won't trigger
disk_usage_based_eviction.

As indicated in the TODO, we'll decrease the value to 80% later.

Also update the staging YAMLs to use the anchor syntax for
`evictions_low_residence_duration_metric_threshold` like we do in the
prod YAMLs as of this patch.
2023-04-03 14:57:36 +02:00
Joonas Koivunen
a415670bc3 feat: log evictions (#3930)
this will help log analysis with the counterpart of already logging all
remote download needs and downloads. ended up with a easily regexable
output in the final round.
2023-04-03 14:15:41 +03:00
Joonas Koivunen
cf5cfe6d71 fix: metric used for alerting threshold on staging (#3932)
This should remove the too eager alerts from staging.
2023-04-03 13:26:45 +03:00
Arseny Sher
d733bc54b8 Rename ReplicationFeedback and its fields.
This is the the feedback originating from pageserver, so change previous
confusing names to
s/ReplicationFeedback/PageserverFeedback
s/ps_writelsn/last_receive_lsn
s/ps_flushlsn/disk_consistent_lsn
s/ps_apply_lsn/remote_consistent_lsn

I haven't changed on the wire format to keep compatibility. However,
understanding of new field names is added to compute, so once all computes
receive this patch we can change the wire names as well. Safekeepers/pageservers
are deployed roughly at the same time and it is ok to live without feedbacks
during the short period, so this is not a problem there.
2023-04-03 01:52:41 +04:00
Arthur Petukhovsky
814abd9f84 Switch to safekeeper in the same AZ (#3883)
Add a condition to switch walreceiver connection to safekeeper that is
located in the same availability zone. Switch happens when commit_lsn of
a candidate is not less than commit_lsn from the active connection. This
condition is expected not to trigger instantly, because commit_lsn of a
current connection is usually greater than commit_lsn of updates from
the broker. That means that if WAL is written continuously, switch can
take a lot of time, but it should happen eventually.

Now protoc 3.15+ is required for building neon.

Fixes https://github.com/neondatabase/neon/issues/3200
2023-04-02 11:32:27 +03:00
Alexander Bayandin
75ffe34b17 check-macos-build: fix cache key (#3926)
We don't have `${{ matrix.build_type }}` there, so it gets resolved to
an empty substring and looks like this

[`v1-macOS--pg-f8a650e49b06d39ad131b860117504044b01f312-dcccd010ff851b9f72bb451f28243fa3a341f07028034bbb46ea802413b36d80`](https://github.com/neondatabase/neon/actions/runs/4575422427/jobs/8078231907#step:26:2)
2023-03-31 21:45:59 +03:00
Christian Schwarz
d2aa31f0ce fix pageserver_evictions_with_low_residence_duration metric (#3925)
It was doing the comparison in the wrong way.
2023-03-31 19:25:53 +03:00
Dmitry Rodionov
22f9ea5fe2 Remind people to clean up merge commit message in PR template (#3920) 2023-03-31 16:11:34 +03:00
Joonas Koivunen
d0711d0896 build: fix git perms for deploy job (#3921)
copy pasted from `build-neon` job. it is interesting that this is only
needed by `build-neon` and `deploy`.

Fixes:
https://github.com/neondatabase/neon/actions/runs/4568077915/jobs/8070960178
which seems to have been going for a while.
2023-03-31 16:05:15 +03:00
Arseny Sher
271f6a6e99 Always sync-safekeepers in neon_local on compute start.
Instead of checking neon.safekeepers GUC value in existing pg node data dir,
just always run sync-safekeepers when safekeepers are configured. Without this
change, creation of new compute didn't run it. That's ok for new
timeline/branch (it doesn't return anything useful anyway, and LSN is known by
pageserver), but restart of compute for existing timeline bore the risk of
getting basebackup not on the latest LSN, i.e. basically broken -- it might not
have prev_lsn, and even if it had, walproposer would complain anyway.

fixes https://github.com/neondatabase/neon/issues/2963
2023-03-31 16:15:06 +04:00
Christian Schwarz
a64dd3ecb5 disk-usage-based layer eviction (#3809)
This patch adds a pageserver-global background loop that evicts layers
in response to a shortage of available bytes in the $repo/tenants
directory's filesystem.

The loop runs periodically at a configurable `period`.

Each loop iteration uses `statvfs` to determine filesystem-level space
usage. It compares the returned usage data against two different types
of thresholds. The iteration tries to evict layers until app-internal
accounting says we should be below the thresholds. We cross-check this
internal accounting with the real world by making another `statvfs` at
the end of the iteration. We're good if that second statvfs shows that
we're _actually_ below the configured thresholds. If we're still above
one or more thresholds, we emit a warning log message, leaving it to the
operator to investigate further.

There are two thresholds:
- `max_usage_pct` is the relative available space, expressed in percent
of the total filesystem space. If the actual usage is higher, the
threshold is exceeded.
- `min_avail_bytes` is the absolute available space in bytes. If the
actual usage is lower, the threshold is exceeded.

The iteration evicts layers in LRU fashion with a reservation of up to
`tenant_min_resident_size` bytes of the most recent layers per tenant.
The layers not part of the per-tenant reservation are evicted
least-recently-used first until we're below all thresholds. The
`tenant_min_resident_size` can be overridden per tenant as
`min_resident_size_override` (bytes).

In addition to the loop, there is also an HTTP endpoint to perform one
loop iteration synchronous to the request. The endpoint takes an
absolute number of bytes that the iteration needs to evict before
pressure is relieved. The tests use this endpoint, which is a great
simplification over setting up loopback-mounts in the tests, which would
be required to test the statvfs part of the implementation. We will rely
on manual testing in staging to test the statvfs parts.

The HTTP endpoint is also handy in emergencies where an operator wants
the pageserver to evict a given amount of space _now. Hence, it's
arguments documented in openapi_spec.yml. The response type isn't
documented though because we don't consider it stable. The endpoint
should _not_ be used by Console but it could be used by on-call.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-31 14:47:57 +03:00
Konstantin Knizhnik
bf46237fc2 Fix prefetch for parallel bitmap scan (#3875)
## Describe your changes

Fix prefetch for parallel bitmap scan

## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-30 22:07:19 +03:00
Lassi Pölönen
41d364a8f1 Add more detailed logging to compute_ctl's shutdown (#3915)
Currently we don't see from the logs, if shutting down tracing takes
long time or not. We do see that shutting down computes gets delayed for
some reason and hits thhe grace period limit. Moving the shutdown
message to slightly later, when we don't have anything else than just
exit left.
## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-30 22:02:39 +03:00
Christian Schwarz
fa54a57ca2 random_init_delay: remove the minimum of 10 seconds (#3914)
Before this patch, the range from which the random delay is picked is at
minimum 10 seconds.
With this patch, they delay is bounded to whatever the given `period`
is, and zero, if period id Duration::ZERO.

Motivation for this: the disk usage eviction tests that we'll add in
https://github.com/neondatabase/neon/pull/3905 need to wait for the disk
usage eviction background loop to do its job.
They set a period of 1s.
It seems wasteful to wait 10 seconds in the tests.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-30 18:38:45 +02:00
Lassi Pölönen
1c1bb904ed Rename zenith_* labels to neon_* (#3911)
## Describe your changes
Get rid of the legacy labeling. Aslo `neon_region_slug` with the same
value as `neon_region` doesn't make much sense, so just drop it. This
allows us to drop the relabeling from zenith to neon in the log
collector.
2023-03-30 16:24:47 +03:00
Gleb Novikov
b26c837ed6 Fixed pageserver openapi spec properties reference (#3904)
## Describe your changes

In [this linter
run](https://github.com/neondatabase/cloud/actions/runs/4553032319/jobs/8029101300?pr=4391)
accidentally found out that spec is invalid. Reference other schemas in
properties should be done the way I changed.

Could not find documentation specifically for schemas embedding in
`components.schemas`, but it seems like the approach is inherited from
json schema:
https://json-schema.org/understanding-json-schema/structuring.html#ref

## Issue ticket number and link
-

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] ~If it is a core feature, I have added thorough tests.~
- [ ] ~Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?~
- [ ] ~If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.~
2023-03-29 19:18:44 +04:00
Kirill Bulatov
ac9c7e8c4a Replace pin! from tokio to the std one (#3903)
With fresh rustc brought by
https://github.com/neondatabase/neon/pull/3902, we can use
`std::pin::pin!` macro instead of the tokio one.
One place did not need the macro at all, other places were adjusted.
2023-03-29 14:14:56 +03:00
Vadim Kharitonov
f1b174dc6a Update rust version to 1.68.2 2023-03-29 12:50:04 +04:00
Kirill Bulatov
9d714a8413 Split $CARGO_FLAGS and $CARGO_FEATURES to make e2e tests work 2023-03-29 00:08:30 +03:00
Kirill Bulatov
6c84cbbb58 Run new Rust IT test in CI 2023-03-29 00:08:30 +03:00
Kirill Bulatov
1300dc9239 Replace Python IT test with the Rust one 2023-03-29 00:08:30 +03:00
Kirill Bulatov
018c8b0e2b Use proper tokens and delimeters when listing S3 2023-03-29 00:08:30 +03:00
Arseny Sher
b52389f228 Cleanly exit on any shutdown signal in storage_broker.
neon_local sends SIGQUIT, which otherwise dumps core by default. Also, remove
obsolete install_shutdown_handlers; in all binaries it was overridden by
ShutdownSignals::handle later.

ref https://github.com/neondatabase/neon/issues/3847
2023-03-28 22:29:42 +04:00
Heikki Linnakangas
5a123b56e5 Remove obsolete hack to rename neon-specific GUCs.
I checked the console database, we don't have any of these left in
production.
2023-03-28 17:57:22 +03:00
Arthur Petukhovsky
7456e5b71c Add script to collect state from safekeepers (#3835)
Add an ansible script to collect
https://github.com/neondatabase/neon/pull/3710 state JSON from all
safekeeper nodes and upload them to a postgres table.
2023-03-28 17:04:02 +03:00
Konstantin Knizhnik
9798737ec6 Update pgxn/neon/file_cache.c
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-28 14:43:34 +04:00
Konstantin Knizhnik
35ecb139dc Use stavfs instead inof statfs to fix MacOS build 2023-03-28 14:43:34 +04:00
Arseny Sher
278d0f117d Rename neon_local sk logs s/safekeeper 1.log/safekeeper-1.log.
I don't like spaces in file names.
2023-03-28 14:28:56 +04:00
Arseny Sher
c30b9e6eb1 Show full path to pg_ctl invokation when it fails. 2023-03-28 12:06:06 +04:00
Konstantin Knizhnik
82a4777046 Add local free space monitor (#3832)
## Describe your changes

Monitor free spae in local file system and shrink local file cache size
if it is under watermark.
Neon is using local storage for temp files (temp table + intermediate
results), unlogged relations
and local file cache.

Ideally all space not used for temporary files should be used for local
file cache.
Temporary files and even unlogged relation are intended to have small
life time (because
them can be lost at  any moment in case of compute restart).

So the policy is to overcommit local cache size and shrink it if there
is not enough free space.
As far as temporary files are expected to be needed for a short time,
there i no need
to permanently shrink local file cache size. Instead of it, we just
throw away least recently accessed elements
from local file cache, releasing some space on the local disk.

## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: sharnoff <sharnoff@neon.tech>
2023-03-28 08:27:50 +03:00
Dmitry Rodionov
6efea43449 Use precondition failed code in delete_timeline when tenant is missing (#3884)
This allows client to differentiate between missing tenant and missing
timeline cases
2023-03-27 21:01:46 +03:00
Joonas Koivunen
f14895b48e eviction: avoid post-restart download by synthetic_size (#3871)
As of #3867, we do artificial layer accesses to layers that will be
needed after the next restart, but not until then because of caches.

With this patch, we also do that for the accesses that the synthetic
size calculation worker does if consumption metrics are enabled.

The actual size calculation is not of importance, but we need to
calculate all of the sizes, so we only call tenant::size::gather_inputs.

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-03-27 19:20:23 +02:00
Christian Schwarz
fe15624570 eviction_task: only refresh layer accesses once per p.threshold (#3877)
Without this, we run it every p.period, which can be quite low. For
example, the running experiment with 3000 tenants in prod uses a period
of 1 minute.

Doing it once per p.threshold is enough to prevent eviction.
2023-03-27 14:33:40 +03:00
Christian Schwarz
ff51e96fbd fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn (#3874)
fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn

Assume a single-timeline project.
If the gc_horizon covers all WAL (last_record_lsn < gc_horizon)
but we have written more data than just initdb, the synthetic
size calculation worker needs to calculate the logical size
at LSN initdb_lsn (Segment BranchStart).

Before this patch, that calculation would incorrectly return
the initial logical size calculation result that we cache in
the Timeline::initial_logical_size. Presumably, because there
was confusion around initdb_lsn vs. initial size calculation.

The fix is to only hand out the initialized_size() only if
the LSN matches.

The distinction in the metrics between "init logical size" and "logical
size" was also incorrect because of the above. So, remove it.

There was a special case for `size != 0`. This was to cover the case of
LogicalSize::empty_initial(), but `initial_part_end` is `None` in that
case, so the new `LogicalSize::initialized_size()` will return None
in that case as well.

Lastly, to prevent confusion like this in the future, rename all
occurrences of `init_lsn` to either just `lsn` or a more specific name.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-27 12:45:10 +02:00
Vadim Kharitonov
e3cbcc2ea7 Revert "Add neondatabase/release team as a default reviewers for storage"
This reverts commit daeaa767c4.
2023-03-27 14:10:18 +04:00
Heikki Linnakangas
8d78329991 Remove some dead code.
whoami() was never called, 'is_test' was never set.

'restart()' might be useful, but it wasn't hooked up the CLI so it was
dead code. It's not clear what kind of a restart it should perform,
anyway: just restart Postgres, or re-initialize the data directory
from a fresh basebackup like "stop"+"start" does.
2023-03-27 12:24:35 +03:00
Dmitry Rodionov
4d8c765485 remove redundant dyn (#3878)
remove redundant dyn
2023-03-27 12:04:48 +03:00
dependabot[bot]
4071ff8c7b Bump openssl from 0.10.45 to 0.10.48 in /test_runner/pg_clients/rust/tokio-postgres (#3879) 2023-03-25 12:33:39 +00:00
Dmitry Rodionov
870ba43a1f return proper http codes in timeline delete endpoint (#3876)
return proper http codes in timeline delete endpoint
+ fix openapi spec for detach to include 404 responses
2023-03-24 19:25:39 +02:00
Joonas Koivunen
f5ca897292 fix: less logging at shutdown (#3866)
Log less during shutdown; don't log anything for quickly (less than 1s)
exiting tasks.
2023-03-23 12:00:52 +02:00
Kirill Bulatov
8bd565e09e Ensure branches with no layers have their remote storage counterpart created eventually (#3857)
Discovered during writing a test for
https://github.com/neondatabase/neon/pull/3843
2023-03-22 17:42:31 +02:00
Joonas Koivunen
6033dfdf4a Re-access layers before threshold eviction (#3867)
To avoid re-downloading evicted files on restart, re-compute logical
size and partitioning before each threshold based eviction run.

Cc: #3802

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-03-22 16:26:27 +02:00
mikecaat
14a40c9ca6 Fix minor things for the docker-compose file (#3862)
* Add the REPOSITORY env to build args to avoid the following error when
executing without the credentials for the repository.

```
ERROR: Service 'compute' failed to build: Head
"https://369495373322.dkr.ecr.eu-central-1.amazonaws.com/v2/compute-node-v15/manifests/2221":
no basic auth credentials
```

* update the tag version in the documentation to support storage broker
2023-03-22 08:10:53 +00:00
Shany Pozin
0f7de84785 Allow calling detach on ignored tenant (#3834)
## Describe your changes
Added a query param to detach API
Allow to remove local state of a tenant even if its not in the memory
(following ignore API)
## Issue ticket number and link
#3828
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2023-03-22 07:17:00 +00:00
Kirill Bulatov
dd22c87100 Remove older layer metadata format support code (#3854)
The PR enforces current newest `index_part.json` format in the type
system (version `1`), not allowing any previous forms of it, that were
used in the past.
Similarly, the code to mitigate the
https://github.com/neondatabase/neon/issues/3024 issue is now also
removed.

Current code does not produce old formats and extra files in the
index_part.json, in the future we will be able to use
https://github.com/neondatabase/aversion or other approach to make
version transitions more explicit.

See https://neondb.slack.com/archives/C033RQ5SPDH/p1679134185248119 for
the justification on the breaking changes.
2023-03-21 23:33:28 +02:00
Heikki Linnakangas
6fdd9c10d1 Read storage auth token from spec file.
We read the pageserver connection string from the spec file, so let's
read the auth token from the same place.

We've been talking about pre-launching compute nodes that are not
associated with any particular tenant at startup, so that the spec
file is delivered to the compute node later. We cannot change the env
variables after the process has been launched.

We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN
env variable, but compute_ctl is now responsible for setting it.
2023-03-21 20:12:09 +02:00
Dmitry Rodionov
4158e24e60 rfc: delete pageserver data from s3 (#3792)
[Rendered](https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-21 20:03:27 +02:00
Shany Pozin
809acb5fa9 Move neon-image-depot to a larger runner (#3860)
## Describe your changes
https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059
## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-21 19:32:36 +02:00
Heikki Linnakangas
299db9d028 Simplify and clean up the $NEON_AUTH_TOKEN stuff in compute
- Remove the neon.safekeeper_token_env GUC. It was used to set the
  name of an environment variable, which was then used in pageserver
  and safekeeper connection strings to in place of the
  password. Instead, always look up the environment variable called
  NEON_AUTH_TOKEN. That's what neon.safekeeper_token_env was always
  set to in practice, and I don't see the need for the extra level of
  indirection or configurability.

- Instead of substituting $NEON_AUTH_TOKEN in the connection strings,
  pass $NEON_AUTH_TOKEN "out-of-band" as the password, when we connect
  to the pageserver or safekeepers. That's simpler.

- Also use the password from $NEON_AUTH_TOKEN in compute_ctl, when it
  connects to the pageserver to get the "base backup".
2023-03-21 00:15:04 +02:00
Heikki Linnakangas
5a786fab4f Remove duplicated global variables in neon extension.
Walproposer used to live in the backend, while pagestore_smgr was an
extension. But now that both are part of the neon extension,
walproposer can access the same 'neon_tenant' and 'neon_timeline'
variables as the pageserver_smgr code.
2023-03-21 00:15:04 +02:00
Arseny Sher
699f200811 Send error context chain to the client when Copy stream errors. 2023-03-21 01:22:02 +04:00
Christian Schwarz
881356c417 add metrics to detect eviction-induced thrashing (#3837)
This patch adds two metrics that will enable us to detect *thrashing* of
layers, i.e., repetitions of `eviction, on-demand-download, eviction,
... ` for a given layer.

The first metric counts all layer evictions per timeline. It requires no
further explanation. The second metric counts the layer evictions where
the layer was resident for less than a given threshold.

We can alert on increments to the second metric. The first metric will
serve as a baseline, and further, it's generally interesting, outside of
thrashing.

The second metric's threshold is configurable in PageServerConf and
defaults to 24h. The threshold value is reproduced as a label in the
metric because the counter's value is semantically tied to that
threshold. Since changes to the config and hence the label value are
infrequent, this will have low storage overhead in the metrics storage.

The data source to determine the time that the layer was resident is the
file's `mtime`. Using `mtime` is more of a crutch. It would be better if
Pageserver did its own persistent bookkeeping of residence change events
instead of relying on the filesystem. We had some discussion about this:
https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900

My position is that `mtime` is good enough for now. It can theoretically
jump forward if someone copies files without resetting `mtime`. But that
shouldn't happen in practice. Note that moving files back and forth
doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`,
which is typically used for filesystem-level backup / restore, correctly
syncs `mtime`.

I've added a label that identifies the data source to keep options open
for a future, better data source than `mtime`. Since this value will
stay the same for the time being, it's not a problem for metrics
storage.

refs https://github.com/neondatabase/neon/issues/3728
2023-03-20 16:11:36 +01:00
Heikki Linnakangas
fea4b5f551 Switch to EdDSA algorithm for the storage JWT authentication tokens.
The control plane currently only supports EdDSA. We need to either teach
the storage to use EdDSA, or the control plane to use RSA. EdDSA is more
modern, so let's use that.

We could support both, but it would require a little more code and tests,
and we don't really need the flexibility since we control both sides.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
77107607f3 Allow JWT key generation to fail if authentication is not enabled.
This allows you to run without the 'openssl' binary as long as you
don't enable authentication. This becomes more important with the next
commit, which switches the JWT algorithm to EdDSA. LibreSSL does not
support EdDSA, and LibreSSL comes with macOS, so the next commit makes
it much more likely for the key generation to fail for macOS users.

To allow running without a keypair, don't generate the authentication
token in the 'neon_local init' step. Instead, generate a new token on
every request that needs one, using the private key.
2023-03-20 16:28:01 +02:00
Heikki Linnakangas
1da963b2f9 Remove some unused code in control plane. 2023-03-20 16:28:01 +02:00
Heikki Linnakangas
1ddb9249aa Reduce the # of histogram buckets in metrics. (#3850)
Shrinks the total number of metrics collected for each timeline by
about 50%.

See https://github.com/neondatabase/neon/issues/2848. This doesn't fully
solve the problem, we still collect a lot of metrics even with this, but
this gives us a lot of headroom.
2023-03-20 15:49:16 +02:00
Joonas Koivunen
0c1228c37a feat: store initial timeline in env fixture (#3839)
minor change, but will allow more use in future for the default
tenants.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-20 11:57:27 +02:00
Christian Schwarz
3c15874c48 allow specifying eviction_policy in TenantCreateRequest
This was on oversight from 175a577ad4.

Nothing uses this AFAIK, but, let's fix it anyways.

Noticed while working on https://github.com/neondatabase/neon/issues/3728
2023-03-20 10:43:53 +01:00
Shany Pozin
93f3f4ab5f Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818)
## Describe your changes
Add Error enum for tenant state response to allow better error handling
in mgmt api
## Issue ticket number and link
#2238
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-19 10:44:42 +02:00
sharnoff
f6e2e0042d Fix + re-enable VM cgroup creation + running (#3820)
Re-enable cgroup shenanigans in VMs, with some special care taken to
make sure that our version of cgroup-tools supports cgroup v2 (debian
bullseye does not, and probably won't because it requires a breaking
change in libcgroup).

This involves manually building libcgroup  / cgroup-tools from source,
then copying the output into the final build stage.

We originally considered pulling the package from debian's testing repo
(which is up-to-date), but decided against it. Refer to the PR for more
details.

Prior work, for reference:

* 2153d2e0 - Run compute_ctl in a cgroup in VMs
* 1360361f - Fix missing VM cgconfig.conf
* 8dae8799 - Disable VM cgroup shenanigans
2023-03-16 17:09:45 -07:00
Christian Schwarz
b917270c67 remove unused TenantConfig::update function 2023-03-16 16:25:35 +01:00
Arthur Petukhovsky
b067378d0d Measure cross-AZ traffic in safekeepers (#3806)
Create `safekeeper_pg_io_bytes_total` metric to track total amount of
bytes written/read in a postgres connections to safekeepers. This metric
has the following labels:
- `client_az` – availability zone of the connection initiator, or
`"unknown"`
- `sk_az` – availability zone of the safekeeper, or `"unknown"`
- `app_name` – `application_name` of the postgres client
- `dir` – data direction, either `"read"` or `"write"`
- `same_az` – `"true"`, `"false"` or `"unknown"`. Can be derived from
`client_az` and `sk_az`, exists purely for convenience.

This is implemented by passing availability zone in the connection
string, like this: `-c tenant_id=AAA timeline_id=BBB
availability-zone=AZ-1`.

Update ansible deployment scripts to add availability_zone argument
to safekeeper and pageserver in systemd service files.
2023-03-16 17:24:01 +03:00
Joonas Koivunen
768c8d9972 test: allow gc to get unlucky (#3826)
this failure case was probably introduced by b220ba6, because earlier
the gc would always have run fast enough for restart every 1s. however,
test got added later, so we have just been lucky.

fixes #3824 by allowing this error to happen.
2023-03-16 11:03:21 +02:00
Rahul Patil
f1d960d2c2 Add new pageserver to eu-central-1 (#3829)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-15 16:58:28 +01:00
Arthur Petukhovsky
cd17802b1f Add pg_clients test for serverless driver (#3827)
Fixes #3819
2023-03-15 16:18:38 +03:00
Heikki Linnakangas
10a5d36af8 Separate mgmt and libpq authentication configs in pageserver. (#3773)
This makes it possible to enable authentication only for the mgmt HTTP
API or the compute API. The HTTP API doesn't need to be directly
accessible from compute nodes, and it can be secured through network
policies. This also allows rolling out authentication in a piecemeal
fashion.
2023-03-15 13:52:29 +02:00
Arseny Sher
a7ab53c80c Forward framed read buf contents to compute before proxy pass.
Otherwise they get lost. Normally buffer is empty before proxy pass, but this is
not the case with pipeline mode of out npm driver; fixes connection hangup
introduced by b80fe41af3 for it.

fixes https://github.com/neondatabase/neon/issues/3822
2023-03-15 14:32:41 +03:00
Heikki Linnakangas
2672fd09d8 Make test independent of the order of config lines. 2023-03-14 20:10:34 +02:00
Heikki Linnakangas
4a92799f24 Fix check for trailing garbage in basebackup import.
There was a warning for trailing garbage after end-of-tar archive, but
it didn't always work. The reason is that we created a StreamReader
over the original copyin-stream, but performed the check for garbage
on the copyin-stream. There could be some garbage bytes buffered in
the StreamReader, which were not caught by the warning.

I considered turning the the warning into a fatal error, aborting the
import, but I wasn't sure if we handle aborting the import properly.
Do we clean up the timeline directory on error? If we don't, we should
make that more robust, but that's a different story.

Also, normally a valid tar archive ends with two 512-byte blocks of zeros.
The tokio_tar crate stops at the first all-zeros block. Read and check
the second all-zeros block, and error out if it's not there, or contains
something unexpected.
2023-03-14 19:45:33 +02:00
Konstantin Knizhnik
5396273541 Avoid holes between generated image layers (#3771)
## Describe your changes

When we perform partitioning of the whole key space, we take in account
actual ranges of relation present in the database. So if we have
relation with relid=1 and size 100 and relation with relid=2 with size
200 then result of KeySpace::partition may contain partitions
<100000000..100000099> and <200000000..200000199>. Generated image
layers will contain the same boundaries.
But when GC is checking image coverage to find out of old layer is fully
covered by newer image layer and so can be deleted, it takes in account
only full key range. I.e. if there is delta layer <100000000..300000000>
then it never be garbage collected because image layers
<100000000..100000099> and <200000000..200000199> are not completely
covering it.This is how it looks in practice:


000000067F000032AC00000A300000000000-000000067F000032AC00000A330000000000__000000000F761828

000000067F000032AC00000A31000000001F-000000067F000032AC00000A620000000005__0000000001696070-000000000442A551

000000067F000032AC00000A3300FFFFFFFF-000000067F000032AC00000A650100000000__000000000F761828

So there are two image layers covering delta layer but ... there is a
hole: A330000000000...A3300FFFFFFFF and as a result delta layer is not
collected.

## Issue ticket number and link

This PR is deeply related with #3673 because it is addressing the same
problem: old layers are not utilized by GC.
The test test_gc_old_layers.py in #3673 can be used to see effect of
this patch.

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-14 16:15:07 +02:00
Joonas Koivunen
c23c8946a3 chore: clippies introduced with rust 1.68 (#3781)
- handle automatically fixable future clippies
- tune run-clippy.sh to remove macos specifics which we no longer have

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2023-03-14 15:29:02 +02:00
Joonas Koivunen
15b692ccc9 test: more strict finding of WARN, ERROR lines (#3798)
this prevents flakyness when `WARN|ERROR` appears in some other part of
the line, for example in a random filename.
2023-03-14 15:27:52 +02:00
Alexander Bayandin
3d869cbcde Replace flake8 and isort with ruff (#3810)
- Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort
- Update mypy and black
2023-03-14 13:25:44 +00:00
Lassi Pölönen
68ae020b37 Use RollingUpdate strategy also for legacy proxy (#3814)
## Describe your changes
We have previously changed the neon-proxy to use RollingUpdate. This
should be enabled in legacy proxy too in order to avoid breaking
connections for the clients and allow for example backups to run even
during deployment. (https://github.com/neondatabase/neon/pull/3683)

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-03-14 13:23:46 +00:00
Joonas Koivunen
d6bb8caad4 refactor: correct return value for not found L0's on LayerMap::replace (#3805)
in prev implementation, an `ok_or_else(...)?` is used to cause a
"precondition error" on LayerMap::replace, however we only see this
particular error if an L0 for which replace fails is not in the layermap
because it is not in `l0_delta_layers`. changes or fixes this to be
Replacement::NotFound instead, making it more clear that an error would
only be raised for actual preconditions, like trying to replace layer
with completly unrelated layer.
2023-03-14 13:18:26 +02:00
Alexander Bayandin
319402fc74 postgres_ffi: restore POSTGRES_INSTALL_DIR support (#3811)
Fix path construction to `pg_config`: `pg_install_dir_versioned` 
already includes `pg_version`
2023-03-14 10:25:48 +00:00
Dmitry Ivanov
2e4bf7cee4 [proxy] Immediately log all compute node connection errors. 2023-03-14 01:45:57 +03:00
Dmitry Ivanov
15ed6af5f2 Add descriptions to proxy's python tests. 2023-03-14 01:32:37 +03:00
Dmitry Rodionov
50476a7cc7 test: update to match current interfaces 2023-03-13 17:50:10 +02:00
andres
d7ab69f303 add test for getting branchpoints from an inactive timeline 2023-03-13 17:50:10 +02:00
sharnoff
582620274a Enable file cache handling by vm-informant (#3794)
Enables the VM informant's file cache integration.

See also: https://github.com/neondatabase/autoscaling/pull/47
2023-03-13 07:16:39 -07:00
Vadim Kharitonov
daeaa767c4 Add neondatabase/release team as a default reviewers for storage
releases
2023-03-13 13:40:15 +01:00
Konstantin Knizhnik
f0573f5991 Remove block cursor cache (#3740)
## Describe your changes

Do not pin current block in BlockCursor

## Issue ticket number and link

See #3712 

There are places (see get_reconstruct_data) in our code when thread is
holding read layers lock and then try to read file and so lock page
cache slot. So we have edge in dependency graph layers->page cache slot.
At the same time (as Christian noticed) we can lock page cache slot in
BlockCursor and then try obtain shard lock on layers.
So there is backward edge in dependency graph page cache slot>layers
which forms loop and may cause deadlock.

There are three possible fixes of the problem:
1. Perform compaction under `layers` shared lock. See PR #3732. It fixes
the problem but make it not possible to append any data to pageserver
until compaction is completed.
2. Do not hold `layers` lock while accessing layers (not sure if it is
possible to do because it definitely introduce some new race
conditions).
3. Do not pin current pages in BockCursor (this PR).

My experiments shows that this cache in BlockCursor is not so useful:
the number of hits/misses for cursor cache on pgbench workload (-i -s
10/-c 10 -T 100/-c 10 -S -T 100):
```
hits: 163011
misses: 1023602
```
So number of cache misses is 10x times  larger.
And results for read-only pgbench are mostly the same:
```
with cache: 14581
w/out cache: 14429
```
 
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-13 14:07:35 +02:00
Nikita Kalyanov
07dcf679de set content type explicitly (#3799)
I moved management API v2 to ogen and the generated code seems to be
more strict about content type. Let's set it properly as it is json
after all

## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-13 14:00:01 +02:00
Heikki Linnakangas
e0ee138a8b Add a test for tokio-postgres client to the driver test suite.
It is fully supported. To enable TLS, though, it requires some extra
glue code, and a dependency to a TLS library.
2023-03-13 12:14:37 +02:00
Arthur Petukhovsky
d9a1329834 Make postgres_backend use generic IO type (#3789)
- Support measuring inbound and outbound traffic in MeasuredStream
- Start using MeasuredStream in safekeepers code
2023-03-13 12:18:10 +03:00
Joonas Koivunen
8699342249 Ondemand rx bytes and layer count (#3777)
Adds two new *global* metrics:
- pageserver_remote_ondemand_downloaded_layers_total
- pageserver_remote_ondemand_downloaded_bytes_total

An existing test is repurposed once more to check that we do get some
reasonable counts. These are to replace guessing from the nic RX bytes
metric how much was on-demand downloaded.

First part of #3745: This does not add the "(un)?avoidable" metric,
which I plan to add as a new metric, which will be a subset of the
counts of the metrics added here.
2023-03-13 09:26:49 +02:00
Joonas Koivunen
ce8fbbd910 Fix allowed error again (#3790)
Fixes #3360 again, this time checking all other "Error processing HTTP
request" messages and aligning the regex with the two others.
2023-03-10 17:44:12 +00:00
Vadim Kharitonov
1401021b21 Be able to get number of CPUs (#3774)
After enabling autoscaling, we faced the issue that customers are not
able to get the number of CPUs they use at this moment. Therefore I've
added these two options:

1. Postgresql function to allow customers to call it whenever they want
2. `compute_ctl` endpoint to show these number in console
2023-03-10 19:00:20 +02:00
Stas Kelvich
252b3685a2 Use unsafe-postgres feature to build pgx extension
Recently added `unsafe-postgres` feature allows to build pgx extensions
against postgres forks that decided to change their ABI name (like us).
With that we can build extensions without forking them and using stock
pgx. As this feature is new few manual version bumps were required.
2023-03-10 17:40:45 +02:00
Heikki Linnakangas
34d3385b2e Add unit tests for JWT encoding and decoding. 2023-03-10 16:09:32 +02:00
Heikki Linnakangas
b00530df2a Add section in internal docs on the JWT payload.
Just copied from the code comments. Could be improved, but this is a
start.
2023-03-10 16:09:32 +02:00
Heikki Linnakangas
bebf76c461 Accept RS384 and RS512 JWT tokens.
Previously, we only accepted RS256. Seems like a pointless limitation,
and when I was testing it with RS512 tokens, it took me a while to
understand why it wasn't working.
2023-03-10 16:09:32 +02:00
Vadim Kharitonov
2ceef91da1 Compile pg_tiktoken extension 2023-03-10 14:59:26 +01:00
Anastasia Lubennikova
b7fddfa70d Add branch_id field to proxy_io_bytes_per_client metric.
Since we allow switching endpoints between different branches, it is important to use composite key.
Otherwise, we may try to calculate delta between metric values for two different branches.
2023-03-10 15:00:48 +02:00
Heikki Linnakangas
d1537a49fa Fix escaping in postgresql.conf that we generate at compute startup
If there are any config options that contain single quotes or backslashes,
they need to be escaped
2023-03-10 14:59:21 +02:00
Heikki Linnakangas
856d01ff68 Add newline at end of postgresql.conf 2023-03-10 14:59:21 +02:00
Heikki Linnakangas
42ec79fb0d Make expected test output nicer to read.
By using Rust raw string literal.
2023-03-10 14:59:21 +02:00
Rory de Zoete
3c4f5af1b9 Try depot.dev for image building (#3768)
To see if it is faster. Run side-by-side for a while so we can gather
enough data.
2023-03-10 11:11:39 +01:00
Arseny Sher
290884ea3b Fix too many arguments in read_network clippy complain. 2023-03-10 10:50:03 +03:00
Arseny Sher
965837df53 Log connection ids in safekeeper instead of thread ids.
Fixes build on macOS (which doesn't have nix gettid) after 0d8ced8534.
2023-03-10 10:50:03 +03:00
Arseny Sher
d1a0f2f0eb Fix example why manual-range-contains is disabled. 2023-03-10 00:03:17 +03:00
Konstantin Knizhnik
a34e78d084 Retry attempt to connect to pageserver in order to make pageserver restart transparent for clients (#3700)
…start transparent for clients

## Describe your changes

Try to reestablish connection with pageserver if send is failed to be
able to make pageserver restart transparent for client

## Issue ticket number and link

https://github.com/neondatabase/neon/issues/1138

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-03-09 22:15:46 +02:00
Arseny Sher
b80fe41af3 Refactor postgres protocol parsing.
1) Remove allocation and data copy during each message read. Instead, parsing
functions now accept BytesMut from which data they form messages, with
pointers (e.g. in CopyData) pointing directly into BytesMut buffer. Accordingly,
move ConnectionError containing IO error subtype into framed.rs providing this
and leave in pq_proto only ProtocolError.

2) Remove anyhow from pq_proto.

3) Move FeStartupPacket out of FeMessage. Now FeStartupPacket::parse returns it
directly, eliminating dead code where user wants startup packet but has to match
for others.

proxy stream.rs is adapted to framed.rs with minimal changes. It also benefits
from framed.rs improvements described above.
2023-03-09 20:45:56 +03:00
Arseny Sher
0d8ced8534 Remove sync postgres_backend, tidy up its split usage.
- Add support for splitting async postgres_backend into read and write halfes.
  Safekeeper needs this for bidirectional streams. To this end, encapsulate
  reading-writing postgres messages to framed.rs with split support without any
  additional changes (relying on BufRead for reading and BytesMut out buffer for
  writing).
- Use async postgres_backend throughout safekeeper (and in proxy auth link
  part).
- In both safekeeper COPY streams, do read-write from the same thread/task with
  select! for easier error handling.
- Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL
  -- join split parts back catching errors from them before returning.

Initially I hoped to do that read-write without split at all, through polling
IO:
https://github.com/neondatabase/neon/pull/3522
However that turned out to be more complicated than I initially expected
due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>>
which is Send construct just to satisfy the checker; 2) can be workaround with
transmute. But this is so messy that I decided to leave split.
2023-03-09 20:45:56 +03:00
Arseny Sher
7627d85345 Move async postgres_backend to its own crate.
To untie cyclic dependency between sync and async versions of postgres_backend,
copy QueryError and some logging/error routines to postgres_backend.rs. This is
temporal glue to make commits smaller, sync version will be dropped by the
upcoming commit completely.
2023-03-09 20:45:56 +03:00
Arseny Sher
3f11a647c0 Rename write_message to write_message_noflush in postgres_backend_async.rs
To make it unifrom across the project; proxy stream.rs and older
postgres_backend uses write_message_noflush.
2023-03-09 20:45:56 +03:00
Alexey Kondratov
e43c413a3f [compute_tools] Add /insights endpoint to compute_ctl (#3704)
This commit adds a basic HTTP API endpoint that allows scraping the
`pg_stat_statements` data and getting a list of slow queries. New
insights like cache hit rate and so on could be added later.

Extension `pg_stat_statements` is checked / created only if compute
tries to load the corresponding shared library. The latter is configured
by control-plane and currently covered with feature flag.

Co-authored by Eduard Dyckman (bird.duskpoet@gmail.com)
2023-03-09 14:21:10 +01:00
Heikki Linnakangas
8459e0265e Add performance test for compaction and image layer creation 2023-03-09 14:30:12 +02:00
Kirill Bulatov
03a2ce9d13 Add tracing spans with request_id into pageserver management API handlers (#3755)
Adds a newtype that creates a span with request_id from
https://github.com/neondatabase/neon/pull/3708 for every HTTP request
served.

Moves request logging and error handlers under the new wrapper, so every request-related event now is logged under the request span.
For compatibility reasons, error handler is left on the general router, since not every service uses the new handler wrappers yet.
2023-03-09 09:24:01 +02:00
Heikki Linnakangas
ccf92df4da Remove deprecated support to handle ZENITH_AUTH_TOKEN.
It's not used anywhere anymore.
2023-03-09 00:53:13 +02:00
Vadim Kharitonov
37bc6d9be4 Compile plpgsql_check extension 2023-03-08 23:20:24 +01:00
Vadim Kharitonov
177f986795 Compile hll extension 2023-03-08 12:03:09 +01:00
Heikki Linnakangas
fb1581d0b9 Fix setting "image_creation_threshold" setting in tenant config. (#3762)
We have a few tests that try to set image_creation_threshold, but it
didn't actually have any effect because we were missing some critical
code to load the setting from config file into memory.

The two modified tests in `test_remote_storage.py perform
compaction and GC, and assert that GC removes some layers. That
only happens if new image layers are created by the
compaction. The tests explicitly disabled image layer creation by
setting image_creation_threshold to a high value, but it didn't
take effect because reading image_creation_threshold from config
file was broken, which is why the test worked. Fix the test to
set image_creation_threshold low, instead, so that GC has work to
do.

Change 'test_tenant_conf.py' so that it exercises the added code.

This might explain why we're apparently missing test coverage for GC
(issue #3415), although I didn't try to address that here, nor did I
check if this improves the it.
2023-03-08 11:39:30 +02:00
Sasha Krassovsky
02b8e0e5af Add OpenAPI spec for do_gc (#3756)
## Describe your changes
Adds a field to the OpenAPI spec for the page server which describes the
`do_gc` command.
## Issue ticket number and link
#3669
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-07 09:08:46 -08:00
Vadim Kharitonov
1b16de0d0f Compile prefix extension 2023-03-07 17:42:53 +01:00
Stas Kelvich
069b5b0a06 Make postgres --wal-redo more embeddable.
* Stop allocating and maintaining 128MB hash table for last written

  LSN cache as it is not needed in wal-redo.

* Do not require access to the initialized data directory. That
  saves few dozens megabytes of empty but initialized data directory.
  Currently such directories do occupy about 10% of the disk space
  on the pageservers as most of tenants are empty.

* Move shmem-initialization code to the extension instead of postgres
2023-03-07 15:01:14 +02:00
Joonas Koivunen
b05e94e4ff fix: allow ERROR log to appear per allowed failure (#3696)
The test already allows the background thread trying to checkpoint to
fail, however the resulting log message is currently not allowed thus
causing flakyness.
2023-03-07 12:44:04 +00:00
Arseny Sher
0acf9ace9a Return 404 if timeline is not found in safekeeper HTTP API. 2023-03-07 16:34:20 +04:00
Arseny Sher
ca85646df4 Max peer_horizon_lsn before adopting it.
Before this patch, persistent peer_horizon_lsn was always sent to walproposer,
making it initially calculate it equal to max of persistent values and in turn
pulling back the in memory value. Send instead in memory value and take max when
safekeeper sets it.

closes https://github.com/neondatabase/neon/issues/3752
2023-03-07 10:16:54 +04:00
Shany Pozin
7b9057ad01 Add timeout to download copy (#3675)
## Describe your changes
Adding a timeout handling for the remote download of layers of 120
seconds for each operation
Note that these downloads are being retried for N times 

## Issue ticket number and link
Fixes: #3672

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-03-06 18:52:59 +02:00
Konstantin Knizhnik
96f65fad68 Handle crash of walredo process and retry applying wal records (#3739)
## Describe your changes

Restart walredo process an d retry applying walredo records i case of
abnormal walredo process termination

## Issue ticket number and link

See #1700


## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-06 10:10:58 +02:00
Christian Schwarz
9cada8b59d fix benchmarks, broken by PR #3737
Benchmarks only run on `main` branch, so, the pre-commit tests didn't
catch these.
2023-03-03 18:47:57 +01:00
Christian Schwarz
66a5159511 fix: compaction: no index upload scheduled if no on-demand downloads
Commit

    0cf7fd0fb8
    Compaction with on-demand download (#3598)

introduced a subtle bug: if we don't have to do on-demand downloads,
we only take one ROUND in fn compact() and exit early.
Thereby, we miss scheduling the index part upload for any layers
created by fn compact_inner().

Before that commit, we didn't have this problem.
So, this patch fixes it.

Since no regression test caught this, I went ahead and extended the
timeline size tests to assert that, if remote storage is configured,
1. pageserver_remote_physical_size matches the other physical sizes
2. file sizes reported by the layer map info endpoint match the other
   physical size metrics

Without the pageserver code fix, the regression test would
fail at the physical size assertion, complaining that
any of the resident physical size != remote physical size metric
50790400.0 != 18399232.0
I figured out what the problem is by comparing the remote storage
and local directories like so, and noticed that the image layer
in the local directory wasn't present on the remote side.
It's size was exactly the difference
    50790400.0 - 18399232.0  =32391168.0

fixes https://github.com/neondatabase/neon/issues/3738
2023-03-03 16:11:54 +01:00
Christian Schwarz
d1a0a907ff tests: use parse_metrics everywhere (#3737)
- use parse_metrics() in all places where we parse Prometheus metrics
- query_all: make `filter` argument optional
- encourage using properly parsed, typed metrics by changing get_metrics()
  to return already-parsed metrics. The new get_metric_str() method,
  like in the Safekeeper type, returns the raw text response.
2023-03-03 14:53:27 +01:00
Christian Schwarz
1b780fa752 timeline_checkpoint_handler: add span with tenant and timeline id
Before this patch, the logs written by freeze_and_flush() and compact()
didn't have any span, which made the test logs annoying to read.
2023-03-03 12:10:24 +01:00
Christian Schwarz
38022ff11c gc: only decrement resident size if GC'd layer is resident
Before this patch, GC would call PersistentLayer::delete()
on every GC'ed layer.
RemoteLayer::delete() returned Ok(()) unconditionally.
GC would then proceed by decrementing the resident size metric,
even though the layer is a RemoteLayer.

This patch makes the following changes:
- Rename PersistentLayer::delete() to delete_resident_layer_file().
  That name is unambiguous.
- Make RemoteLayer::delete_resident_layer_file return an Err().
  We would have uncovered this bug if we had done that from the start.
- Change GC / Timeline::delete_historic_layer check whether
  the layer is remote or not, and only call delete_resident_layer_file()
  if it's not remote. This brings us in line with how eviction does it.
- Add a regression test.

fixes https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
1b9b9d60d4 eviction: add comment explaining resident size decrement on error
https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
68141a924d eviction: remove needless if-let around resident size decrement
The branch was always taken at runtime, so, this should not
constitute a behavioral change.

refs https://github.com/neondatabase/neon/issues/3722
2023-03-03 12:10:24 +01:00
Christian Schwarz
764d27f696 fix checkpoint_timeout serialization in TenantConf
Without this change, when actually setting this conf opt, the tenant
would become Broken next time we load it.
Why?
The serde_toml representation that persist_tenant_conf would write out
would be a TOML inline table of `secs` and `nsecs`.
But our hand-rolled TenantConf parser expects a TOML string.

I checked that all other `Duration` values in TenantConfOpt
use the humantime serialization.

Issues like this would likely be systematically prevent by
https://github.com/neondatabase/neon/issues/3682
2023-03-03 12:10:24 +01:00
Arthur Petukhovsky
b23742e09c Create /v1/debug_dump safekeepers endpoint (#3710)
Add HTTP endpoint to get full safekeeper state of all existing timelines
(all in-memory values and info about all files stored on disk).

Example:
https://gist.github.com/petuhovskiy/3cbb8f870401e9f486731d145161c286
2023-03-03 14:01:05 +03:00
Vadim Kharitonov
5e514b8465 Compile pgTAP extension 2023-03-03 10:14:38 +01:00
Vadim Kharitonov
a60f687ce2 Compile rum extension 2023-03-02 12:28:20 +01:00
sharnoff
8dae879994 Disable VM cgroup shenanigans (#3730)
As discussed - temporary, so it can unblock releasing autoscaling.
Cleaner to fully remove, then add back rather than commenting it out.
2023-03-01 23:58:43 -08:00
Shany Pozin
d19c5248c9 Add UUID header to mgmt API (#3708)
## Describe your changes

## Issue ticket number and link
#3479
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-03-01 18:09:08 +02:00
sharnoff
1360361f60 Fix missing VM cgconfig.conf (#3718)
It was being added to the wrong stage in the dockerfile. This should fix
it, and resolves an ongoing issue on staging.
2023-02-28 21:11:00 -08:00
Alexander Bayandin
000eb1b069 Bump tempfile from 3.3.0 to 3.4.0 (#3709)
Update `tempfile` crate to get rid of `remove_dir_all` dependency
Ref https://github.com/neondatabase/neon/security/dependabot/15
2023-02-27 12:44:08 +00:00
Heikki Linnakangas
f51b48fa49 Fix UNLOGGED tables.
Instead of trying to create missing files on the way, send init fork contents as
main fork from pageserver during basebackup. Add test for that. Call
put_rel_drop for init forks; previously they weren't removed. Bump
vendor/postgres to revert previous approach on Postgres side.

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>

ref https://github.com/neondatabase/postgres/pull/264
ref https://github.com/neondatabase/postgres/pull/259
ref https://github.com/neondatabase/neon/issues/1222
2023-02-24 23:30:02 +04:00
Sergey Melnikov
9f906ff236 Add pageserver-2.us-east-2.aws.neon.tech (#3701) 2023-02-23 19:56:21 +01:00
Sam Kleinman
c79dd8d458 compute_ctl: support for fetching spec from control plane (#3610) 2023-02-23 13:19:39 -05:00
Vadim Kharitonov
ec4ecdd543 Enable postgres SPI extensions 2023-02-23 16:49:37 +01:00
MMeent
20a4d817ce Update vendored PostgreSQL versions to 14.7 and 15.2 (#3581)
## Describe your changes
Rebase vendored PostgreSQL onto 14.7 and 15.2

## Issue ticket number and link

#3579

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
    ```
The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL
14 and 15.2 for PostgreSQL 15.
    ```
2023-02-23 16:10:22 +02:00
Vadim Kharitonov
5ebf7e5619 Fix pg_jsonschema and pg_graphql 2023-02-23 10:43:46 +01:00
Arseny Sher
0692fffbf3 Bump vendor/postgres to include hotfix for unlogged tables with indexes.
https://github.com/neondatabase/postgres/pull/259
https://github.com/neondatabase/postgres/pull/262
2023-02-23 01:34:59 +04:00
Vadim Kharitonov
093570af20 Compile pg_hashids extension 2023-02-22 21:00:25 +01:00
Dmitry Rodionov
eb403da814 Use debug level for successful GET http requests (#3681)
We started rather frequently scrap some apis for metadata. This includes
layer eviction tester, I believe console does that too.

It should eliminate these logs:
https://neonprod.grafana.net/goto/rr_ace1Vz?orgId=1 (Note the rate
around 2k messages per minute)
2023-02-22 22:19:05 +03:00
Vadim Kharitonov
f3ad635911 Compile pgrouting extension 2023-02-22 20:16:11 +01:00
Vadim Kharitonov
a8d7360881 Compile hypopg extension 2023-02-22 20:14:30 +01:00
Lassi Pölönen
b0311cfdeb Change the production neon-proxy-scram update strategy to RollingUpdate (#3683)
## Describe your changes
The same change in production as was done in staging by
https://github.com/neondatabase/neon/pull/3678

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-02-22 20:15:37 +02:00
Konstantin Knizhnik
412e0aa985 Skip largest N holes during compaction (#3597)
## Describe your changes

This is yet another attempt to address problem with storage size
ballooning #2948
Previous PR #3348 tries to address this problem by maintaining list of
holes for each layer.
The problem with this approach is that we have to load all layer on
pageserver start.
Lazy loading of layers is not possible any more.

This PR tries to collect information of N largest holes on compaction
time and exclude this holes from produced layers.
It can cause generation of larger number of layers (up to 2 times) and
producing small layers.
But it requires minimal changes in code and doesn't affect storage
format.

For graphical explanation please see thread:
https://github.com/neondatabase/neon/pull/3597#discussion_r1112704451

## Issue ticket number and link

#2948
#3348

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-22 18:28:01 +02:00
Lassi Pölönen
965b4f4ae2 Change the staging neon-proxy-scram update strategy to RollingUpdate (#3678)
## Describe your changes
When we deploy the proxy with the default Recreate strategy, there's
always some downtime and existing connections will be shut down. Change
the strategy to RollingUpdate and delay the kill signal by one week. AWS
Network Loadbalancer keeps the existing connections alive for as long as
the pods are alive, but will direct new connections to new pods.


## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-02-22 16:50:07 +02:00
Arthur Petukhovsky
95018672fa Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671)
We migrated all timelines to
`safekeeper-3.ap-southeast-1.aws.neon.tech`, now old instance can be
removed.
2023-02-22 11:55:41 +02:00
Sergey Melnikov
2caece2077 Add -v to ansible invocations (#3670)
To get more debug output on failures
2023-02-21 23:11:52 +03:00
Joonas Koivunen
b8b8c19fb4 fix: hold permit until GetObject eof (#3663)
previously we applied the ratelimiting only up to receiving the headers
from s3, or somewhere near it. the commit adds an adapter which carries
the permit until the AsyncRead has been disposed.

fixes #3662.
2023-02-21 21:14:08 +02:00
Joonas Koivunen
225add041f calculate_logical_size: no longer use spawn_blocking (#3664)
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. Use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file which lead to deadlock.

Fixes: #3657
2023-02-21 21:09:31 +02:00
Joonas Koivunen
5d001b1e5a chore: ignore all compaction inactive tenant errors (#3665)
these are happening in tests because of #3655 but they sure took some
time to appear.

makes the `Compaction failed, retrying in 2s: Cannot run compaction
iteration on inactive tenant` into a globally allowed error, because it
has been seen failing on different test cases.
2023-02-21 20:20:13 +02:00
Joonas Koivunen
fe462de85b fix: log download failed error (#3661)
Fixes #3659
2023-02-21 19:31:53 +02:00
Vadim Kharitonov
c0de7f5cd8 Build pg_jsonschema and pg_graphql extensions (#3535)
## Describe your changes
Layer for building pg extensions written on Rust

It required forking:
* `cargo-pgx` (in order not to catch an ABI mismatch error (`cargo-pgx`
hardcoded ABI tcdi/pgx#1032)
* `pg_jsonschema` (to use forked `cargo-pgx` version)
* `pgx-contrib-spiext` (to use forked `cargo-pgx`)
* `pg_graphql` (to use forked `cargo-pgx` and `pgx-contrib-spiext`
version)

Before the patch:

```
postgres=# create extension pg_jsonschema;
2023-02-02 17:45:23.120 UTC [35] ERROR:  incompatible library "/usr/local/lib/pg_jsonschema.so": ABI mismatch
2023-02-02 17:45:23.120 UTC [35] DETAIL:  Server has ABI "Neon Postgres", library has "PostgreSQL".
2023-02-02 17:45:23.120 UTC [35] STATEMENT:  create extension pg_jsonschema;
ERROR:  incompatible library "/usr/local/lib/pg_jsonschema.so": ABI mismatch
DETAIL:  Server has ABI "Neon Postgres", library has "PostgreSQL".
```

After 

```
postgres=# create extension pg_jsonschema;
CREATE EXTENSION
postgres=# select json_matches_schema('{"type": "object"}', '{}');
 json_matches_schema
---------------------
 t
postgres=# create extension pg_graphql;
CREATE EXTENSION
postgres=# create table book(id int primary key, title text);
CREATE TABLE
postgres=# insert into book(id, title) values (1, 'book 1');
INSERT 0 1
postgres=# select graphql.resolve($$
query {
  bookCollection {
    edges {
      node {
        id
      }
    }
  }
}
$$);
                            resolve
----------------------------------------------------------------
 {"data": {"bookCollection": {"edges": [{"node": {"id": 1}}]}}}
(1 row)
```

## Issue ticket number and link
Closes #3429, #3096

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

`pg_jsonschema` extension will be available for our customers
2023-02-21 17:31:23 +01:00
Joonas Koivunen
b220ba6cd1 add random init delay for background tasks (#3655)
Fixes #3649.
2023-02-21 12:42:11 +01:00
Joonas Koivunen
7de373210d Warn when background tasks exceed their configured period (#3654)
Fixes #3648.
2023-02-21 13:02:19 +02:00
Vadim Kharitonov
5c5b03ce08 Compile xml2 extension 2023-02-21 10:34:45 +01:00
Joonas Koivunen
d7d3f451f0 Use tracing panic hook in all binaries (#3634)
Enables tracing panic hook in addition to pageserver introduced in
#3475:

- proxy
- safekeeper
- storage_broker

For proxy, a drop guard which resets the original std panic hook was
added on the first commit. Other binaries don't need it so they never
reset anything by `disarm`ing the drop guard.

The aim of the change is to make sure all panics a) have span
information b) are logged similar to other messages, not interleaved
with other messages as happens right now. Interleaving happens right now
because std prints panics to stderr, and other logging happens in
stdout. If this was handled gracefully by some utility, the log message
splitter would treat panics as belonging to the previous message because
it expects a message to start with a timestamp.

Cc: #3468
2023-02-21 10:03:55 +02:00
Keanu Ashwell
bc7d3c6476 docs: add dependency requirements for arch based systems (#3588)
This pull request adds information on building neon on Arch based system
such as Artix, Manjaro, Antergos, etc.
2023-02-20 22:51:54 +03:00
Sergey Melnikov
e3d75879c0 Use fqdn to access console management API on production (#3651)
console-release.local is legacy manual CNAME to
neon-internal-api.aws.neon.tech in r53
We could use neon-internal-api.aws.neon.tech name directly

This already was deployed to staging in
https://github.com/neondatabase/neon/pull/3642
2023-02-20 18:11:06 +01:00
Christian Schwarz
485b269674 eviction: tone down logs to debug!() level if there were no evictions
fixes #3647
2023-02-20 18:01:59 +01:00
Christian Schwarz
ee1eda9921 eviction: remove EvictionStats::not_considered_due_to_clock_skew
Rationale: see the block comment added in this patch.

fixes #3641
2023-02-20 18:01:59 +01:00
Christian Schwarz
e363911c85 timeline: propagate span to download_remote_layer (#3644)
fixes #3643
refs #3604
2023-02-20 17:18:13 +02:00
Sergey Melnikov
d5d690c044 Use fqdn for staging console management API (#3642)
`console-staging.local` is legacy manual CNAME to
`neon-internal-api.aws.neon.build` in r53
We could use `neon-internal-api.aws.neon.build` name directly
2023-02-20 16:05:21 +01:00
Arthur Petukhovsky
8f557477c6 Add new safekeeper to ap-southeast-1 prod (#3645) 2023-02-20 17:51:27 +03:00
Shany Pozin
af210c8b42 Allow running do_gc in non testing env (#3639)
## Describe your changes
Since the current default gc period is set to 1 hour, whenever there is
an immediate need to reduce PITR and run gc, the user has to wait 1 hour
for PITR change to take effect
By enabling this API the user can configure PITR and immediately call
the do_gc API to trigger gc
## Issue ticket number and link
#3590
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-20 13:23:13 +02:00
sharnoff
2153d2e00a Run compute_ctl in a cgroup in VMs (#3577) 2023-02-17 14:14:41 -08:00
Alexander Bayandin
564fa11244 Update Postgres extensions (#3615)
- Update postgis from 3.3.1 from 3.3.2
- Update plv8 from 3.1.4 to 3.1.5
- Update h3-pg from 4.0.1 to 4.1.2 (and underlying h3 from 4.0.1 to 4.1.0)
2023-02-17 18:18:23 +00:00
Christian Schwarz
8d28a24b26 staging: enable automatic layer eviction at 20m threshold + period (#3636)
What it says on the tin.

Part of #2476
2023-02-17 18:32:01 +02:00
Anastasia Lubennikova
53128d56d9 Fix make clean:
Use correct paths in neon-pg-ext-clean
2023-02-17 17:57:45 +02:00
Anastasia Lubennikova
40799d8ae7 Add debug messages to catch abnormal consumption metric values 2023-02-17 17:57:45 +02:00
Konstantin Knizhnik
b242b0ad67 Fix flaky tests (#3616)
## Describe your changes

test_on_demand_download is flaky because not waiting until created image
layer is transferred to S3.
test_tenants_with_remote_storage just leaves garbage at the end of
overwritten file.

Right solution for test_on_demand_download is to add some API call to
wait completion of synchronization with S3 (not just based on last
record LSN). But right now it is solved using sleep.

## Issue ticket number and link

#3209 

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-17 15:56:56 +02:00
Dmitry Ivanov
d90cd36bcc [proxy] Improve tracing spans here and there. 2023-02-17 15:32:14 +03:00
Dmitry Ivanov
956b6f17ca [proxy] Handle some unix signals.
On the surface, this doesn't add much, but there are some benefits:

* We can do graceful shutdowns and thus record more code coverage data.

* We now have a foundation for the more interesting behaviors, e.g. "stop
  accepting new connections after SIGTERM but keep serving the existing ones".

* We give the otel machinery a chance to flush trace events before
  finally shutting down.
2023-02-17 15:32:14 +03:00
Heikki Linnakangas
6f9af0aa8c [proxy] Enable OpenTelemetry tracing.
This commit sets up OpenTelemetry tracing and exporter, so that they
can be exported as OpenTelemetry traces as well.

All outgoing HTTP requests will be traced. A separate (child)
span is created for each outgoing HTTP request, and the tracing
context is also propagated to the server in the HTTP headers.

If tracing is enabled in the control plane and compute node too, you
can now get an end-to-end distributed trace of what happens when a new
connection is established, starting from the handshake with the
client, creating the 'start_compute' operation in the control plane,
starting the compute node, all the way to down to fetching the base
backup and the availability checks in compute_ctl.

Co-authored-by: Dmitry Ivanov <dima@neon.tech>
2023-02-17 15:32:14 +03:00
Joonas Koivunen
8e6b27bf7c fix: avoid busy loop on replacement failure (#3613)
Add an AtomicBool per RemoteLayer, use it to mark together with closed
semaphore that remotelayer is unusable until restart or ignore+load.

https://github.com/neondatabase/neon/issues/3533#issuecomment-1431481554
2023-02-17 14:15:29 +02:00
Joonas Koivunen
ae3eff1ad2 Tracing panic hook (#3475)
Fixes #3468.

This does change how the panics look, and most importantly, make sure
they are not interleaved with other messages. Adds a `GET /v1/panic`
endpoint for panic testing (useful for sentry dedup and this hook
testing).

The panics are now logged within a new error level span called `panic`
which separates it from other error level events. The panic info is
unpacked into span fields:
- thread=mgmt request worker
- location="pageserver/src/http/routes.rs:898:9"

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-17 13:56:00 +02:00
Joonas Koivunen
501702b27c fix: flaky test_compaction_downloads_on_demand_with_image_creation (#3629)
fix is to stop postgres before the final checkpoint to ensure no
inmemory layer gets created.

Fixes #3627.
2023-02-17 13:34:26 +02:00
Alexander Bayandin
526f8b76aa Bump werkzeug from 2.1.2 to 2.2.3 (#3631)
## Describe your changes

```
$ poetry add werkzeug@latest "moto[server]@latest"
Using version ^2.2.3 for werkzeug
Using version ^4.1.2 for moto

Updating dependencies
Resolving dependencies... (1.6s)

Writing lock file

Package operations: 0 installs, 2 updates, 1 removal

  • Removing pytz (2022.1)
  • Updating werkzeug (2.1.2 -> 2.2.3)
  • Updating moto (3.1.18 -> 4.1.2)
```

Resolves:
- https://github.com/neondatabase/neon/security/dependabot/14
- https://github.com/neondatabase/neon/security/dependabot/13

`@dependabot` failed to create a PR for some reason (I guess because it
also needed to handle `moto` dependency)

## Issue ticket number and link
N/A

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-17 08:29:52 +00:00
Sergey Melnikov
a1b062123b Do not deploy storage to old account (#3630)
It's gone
2023-02-16 20:28:53 +00:00
Dmitry Ivanov
a4d5c8085b Move hacks to a dedicated module. 2023-02-16 22:10:56 +03:00
Dmitry Ivanov
edffe0dd9d Extract password hack & cleartext hack 2023-02-16 22:10:56 +03:00
Heikki Linnakangas
d9c518b2cc Refactor use_cleartext_password_flow.
It's not a property of the credentials that we receive from the
client, so remove it from ClientCredentials. Instead, pass it as an
argument directly to 'authenticate' function, where it's actually
used. All the rest of the changes is just plumbing to pass it through
the call stack to 'authenticate'
2023-02-16 22:10:56 +03:00
Anastasia Lubennikova
0d3aefb274 Only use active timelines in synthetic_size calculation 2023-02-16 17:58:53 +02:00
Anastasia Lubennikova
6139e8e426 Revert "Add debug messages around sending cached metrics"
This reverts commit a839860c2e.
2023-02-16 17:46:15 +02:00
Anastasia Lubennikova
d9ba3c5f5e Revert "Add debug messages around timeline.get_current_logical_size"
This reverts commit a5ce2b5330.
2023-02-16 17:46:15 +02:00
Joonas Koivunen
0cf7fd0fb8 Compaction with on-demand download (#3598)
Repeatedly (twice) try to download the compaction targeted layers before actual
compaction. Adds tests for both L0 compaction downloading layers and image
creation downloading layers. Image creation support existed already.

Fixes #3591

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-16 15:36:13 +02:00
Kirill Bulatov
f0b41e7750 Propose less verbose way to build neon (#3624)
Closes https://github.com/neondatabase/neon/issues/3518 and might help
https://github.com/neondatabase/neon/issues/3611 and the future build
attempts.

Propose `-s` flag in the Readme when building via `make` command, to
help people to spot build errors easier.
2023-02-16 14:25:35 +02:00
Vadim Kharitonov
5082d84f5b Compile pgjwt extension 2023-02-16 10:54:22 +01:00
Anastasia Lubennikova
7991bd3b69 Fix periodic metric sending: don't reset timer on every iteration (#3617)
Previously timer was reset on every collect_metrics_iteration and sending of cached metrics was never triggered.

This is a follow-up for a69da4a7.
2023-02-16 10:56:42 +02:00
Heikki Linnakangas
ddbdcdddd7 Tenant size calculation: refactor, rewrite, and add SVG (#2817)
Refactor the tenant_size_model code. Segment now contains just the
minimum amount of information needed to calculate the size. Other
information that is useful for building up the segment tree, and for
display purposes, is now kept elsewhere. The code in 'main.rs' has a new
ScenarioBuilder struct for that.

Calculating which Segments are "needed" is now the responsibility of the
caller of tenant_size_mode, not part of the calculation itself. So it's
up to the caller to make all the decisions with retention periods for
each branch.

The output of the sizing calculation is now a Vec of SizeResults, rather
than a tree. It uses a tree representation internally, when doing the
calculation, but it's not exposed to the caller anymore.

Refactor the way the recursive calculation is performed.

Rewrite the code in size.rs that builds the Segment model. Get rid of
the intermediate representation with Update structs. Build the Segments
directly, with some local HashMaps and Vecs to track branch points to
help with that.

retention_period is now an input to gather_inputs(), rather than an
output.

Update pageserver http API: rename /size endpoint to /synthetic_size
with following parameters:
    - /synthetic_size?inputs_only to get debug info;
- /synthetic_size?retention_period=0 to override cutoff that is used to
calculate the size;
pass header -H "Accept: text/html" to get HTML output, otherwise JSON is
returned

Update python tests and openapi spec.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-02-16 10:53:46 +02:00
Shany Pozin
7b182e2605 Update settings.md with latest PITR and gc period values (#3618)
## Describe your changes
Updates PITR and GC_PERIOD default value doc
## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-16 10:33:04 +02:00
Dmitry Ivanov
1d9d7c02db [proxy] Don't forward empty options to compute nodes
Clients may specify endpoint/project name via `options=project=...`,
so we should not only remove `project=` from `options` but also
drop `options` entirely, because connection pools don't support it.

Discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1676464382670119
2023-02-15 22:05:03 +03:00
Anna Stepanyan
a974602f9f fix the logical size term definition (#3609)
a size of a *database* cannot be a sum of the sizes of *all databases*
indicating that a logical size is calculated for a branch

## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [x] i checked the suggested changes
- [x] this is not a core feature
- [x] this is just a docs update, does not require analytics
- [x] this PR does not require a public announcement
2023-02-15 15:11:06 +01:00
Anastasia Lubennikova
a839860c2e Add debug messages around sending cached metrics 2023-02-15 16:02:02 +02:00
Anastasia Lubennikova
a5ce2b5330 Add debug messages around timeline.get_current_logical_size 2023-02-15 16:02:02 +02:00
Dmitry Ivanov
3569c1bacd [proxy] Fix: don't cache user & dbname in node info cache
Upstream proxy erroneously stores user & dbname in compute node info
cache entries, thus causing "funny" connection problems if such an entry
is reused while connecting to e.g. a different DB on the same compute node.

This PR fixes the problem but doesn't eliminate the root cause just yet.
I'll revisit this code and make it more type-safe in the upcoming PR.
2023-02-14 17:54:01 +03:00
Vadim Kharitonov
86681b92aa Enable plls and plcoffee extensions 2023-02-14 13:33:27 +01:00
Sergey Melnikov
eb21d9969d Add pageserver-3.us-west-2.aws.neon.tech (#3603) 2023-02-14 12:56:03 +01:00
Joonas Koivunen
e6618f1cc0 Update current logical size gauge (#3592)
Alternative to #3586.

Introduces usage of current_logical_size.current_size as a boundary after which we start to update the metric gauge on ingested wal. Previously any incremented value (ingested wal) would had updated the gauge, but this would had left the metric at zero for timelines which never receive any wal even if size had been calculated. Now the gauge is updated right away as the calculation completes, not requiring any wal to be received.
2023-02-14 13:17:34 +02:00
Dmitry Ivanov
eaff14da5f [proxy] Restore INFO as the default tracing level
Also move tracing init to its own function.
2023-02-13 17:09:43 +03:00
Arthur Petukhovsky
f383b4d540 Enable TCP_NODELAY for wss connections 2023-02-10 21:40:28 +03:00
Dmitry Ivanov
694150ce40 [proxy] Respect the magic RUST_LOG env variable
Usage: `RUST_LOG=trace proxy ...`
2023-02-10 18:49:32 +03:00
Vadim Kharitonov
f4359b688c Backport cargo fmt diff from release branch into main 2023-02-10 14:20:55 +01:00
Vadim Kharitonov
948f047f0a Compile pgvector extension 2023-02-10 13:37:52 +01:00
Vadim Kharitonov
4175cfbdac Create folder for file_cache 2023-02-10 13:03:12 +01:00
Dmitry Ivanov
9657459d80 [proxy] Fix possible unsoundness in the websocket machinery (#3569)
This PR replaces the ill-advised `unsafe Sync` impl with a de-facto
standard way to solve the underlying problem.

TLDR:
- tokio::task::spawn requires future to be Send
- ∀t. (t : Sync) <=> (&t : Send)
- ∀t. (t : Send + !Sync) => (&t : !Send)
2023-02-10 12:45:38 +03:00
Christian Schwarz
a4256b3250 allow on-demand downloads in walreceiver connection handler
Without this patch, basebackup fails if we evict all layers before that.

This slipped in as part of

   commit 01b4b0c2f3
   Author: Christian Schwarz <christian@neon.tech>
   Date:   Fri Jan 13 17:02:22 2023 +0100

       Introduce RequestContext
2023-02-09 13:39:04 +01:00
Christian Schwarz
175a577ad4 automatic layer eviction
This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.

Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold

The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated periodically at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.

This feature has no explicit killswitch since it's off by default.
2023-02-09 13:33:55 +01:00
Joonas Koivunen
1fdf01e3bc fix: readable Debug for Layers (#3575)
#3536 added the custom Debug implementations but it using derived Debug
on Key lead to too verbose output. Instead of making `Key`'s `Debug`
unconditionally or conditionally do the `Display` variant (for table
space'd keys), opted to build a newtype to provide `Debug` for
`Range<Key>` via `Display` which seemed to work unconditionally.

Also orders Key to have: 1. comment, 2. derive, 3. `struct Key`.
2023-02-09 13:55:37 +02:00
Christian Schwarz
446a39e969 make LayerAccesStatFullDetails Copy
Method to_api_model renamed to as_api_model because of Clippy complaint:
https://rust-lang.github.io/rust-clippy/master/index.html#wrong_self_convention
2023-02-09 12:35:45 +01:00
Arthur Petukhovsky
7ed9eb4a56 Add script for safekeeper tenants cleanup (#3452)
This script can be used to remove tenant directories on safekeepers for
projects which do not longer exist (deleted in the console).

To run this script you need to upload it to safekeeper (i.e. with SSH),
and run it with python3. Ansible can be used to run this script on
multiple safekeepers.

Fixes https://github.com/neondatabase/cloud/issues/3356
2023-02-09 13:28:20 +02:00
Joonas Koivunen
f07d6433b6 fix: one leftover Arc::ptr_eq (#3573)
@knizhnik noticed that one instance of `Arc::<dyn
PersistentLayer>::ptr_eq` was missed in #3558.

Now all `ptr_eq` which remain are in comments.
2023-02-09 13:02:07 +02:00
Heikki Linnakangas
2040db98ef Add docs for synthetic size calculation (#3328)
---------
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-02-09 11:20:10 +02:00
dependabot[bot]
371493ae32 Bump cryptography from 38.0.3 to 39.0.1 (#3565) 2023-02-08 16:08:01 +00:00
Rory de Zoete
1b9e5e84aa Add new storage hosts for placement group test (#3561)
To test the placement group setup
2023-02-08 16:48:29 +01:00
Christian Schwarz
7ed93fff06 refactor: allow for eviction of layers in a batch
The auto-eviction PR (#3552) operates in two phaes:

  1. find candidate layers
  2. evict them.

For (2), a batch API like the one added in this commit is useful.

Note that this PR requires #3558 to be merged first.
Otherwise, the tests won't pass.
2023-02-08 14:40:47 +01:00
Joonas Koivunen
a6dffb6ef9 fix: stop using Arc::ptr_eq with dyn Trait (#3558)
This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's
`LayerMap` not to use `Arc::ptr_eq` which has been witnessed in
development of #3557 to yield wrong results. It gives wrong results
because it compares fat pointers, which are `(object, vtable)` tuples
for `dyn Trait` and there are no guarantees that the `vtable`s are
unique. As in there were multiple vtables for `RemoteLayer` which is why
the comparison failed in #3557.

This is a known issue in rust, clippy warns against it and rust std
might be moving to the solution which has been reproduced on this PR:
compare only object pointers by "casting out" the vtable pointer.
2023-02-08 12:25:25 +00:00
Sergey Melnikov
c5c14368e3 Fix deploy-prod.yml syntax (#3556) 2023-02-07 15:27:31 +01:00
Sergey Melnikov
1254dc7ee2 Fix production deploy: run as root to access docker (#3555) 2023-02-07 15:21:15 +01:00
Joonas Koivunen
fcb905f519 Use LayerMap::replace in eviction (#3544)
Follow-up to #3536, to actually use the new `Debug` in replacing the
layers, and use replacement with manual eviction endpoint.

Turns out the two paths share a lot of handling of `Replacement` but
didn't unify the two (need 3). There are also upcoming refactorings
from other PRs to this.
2023-02-07 11:08:55 +02:00
Christian Schwarz
58fa4f0eb7 maintain access stats for historic layers
This patch adds basic access statistics for historic layers
and exposes them in the management API's `LayerMapInfo`.

We record the accesses in the `{Delta,Image}Layer::load()` function
because it's the common path of
* page_service (`Timline::get_reconstruct_data()`)
* Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`)

The stats survive residence status changes, and record these as well.

When scraping the layer map endpoint to record its evolution over time,
one must account for stat resets because they are in-memory only and
will reset on pageserver restart.
Use the launch timestamp header added by (#3527) to identify pageserver restarts.

This is PR https://github.com/neondatabase/neon/pull/3496
2023-02-06 17:01:38 +01:00
Anastasia Lubennikova
877a2d70e3 Periodically send cached consumption metrics (#3520)
Add new pageserver config setting `cached_metric_collection_interval`
with default `1 hour`.
This setting controls how often unchanged cached consumption metrics are sent to
the HTTP endpoint.

This is a workaround for billing service limitations.
fixes #3485
2023-02-06 17:53:10 +02:00
Sergey Melnikov
959f5c6f40 Do not deploy legacy scram proxy (*.cloud.neon.tech) to the old account (#3546)
We have migrated to the new proxy, which was setup in
https://github.com/neondatabase/neon/pull/3461
2023-02-06 15:51:20 +01:00
Joonas Koivunen
678fe0684f std::fmt::Debug for Layer implementations (#3536)
Follow-up to #3513.

This removes the old blanket `std::fmt::Debug` impl on `dyn Layer` which
did not seem to be used from anywhere (no compilation errors after
removing).

Adds `std::fmt::Debug` requirement and implementations for `trait Layer`
implementors:
- LayerDescriptor (derived)
- RemoteLayer (manual)
- DeltaLayer (manual)
- ImageLayer (manual)

Manual implementations are used to skip PageserverConf, tenant and
timeline ids, large collections.

Adds and adjusts some doc comments to be more rustdoc alike.
2023-02-06 14:21:51 +02:00
Shany Pozin
c9821f13e0 Expose the tenant calculated synthetic size as a Prometheus metric (#3541)
## Describe your changes
Expose the currently calculated synthetic size as a Prometheus metric
## Issue ticket number and link
#3509

## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-06 09:25:15 +02:00
Heikki Linnakangas
121d535068 Add a test for the PostgresNIO Swift client library.
It supports SNI. See discussion at
https://community.neon.tech/t/postgresnio-swift-sni-support/419/4
2023-02-04 15:56:11 +01:00
Kirill Bulatov
ec3a3aed37 Dump current tenant config (#3534)
The PR adds an endpoint to show tenant's current config: `GET
/v1/tenant/:tenant_id/config`

Tenant's config consists of two parts: tenant overrides (could be
changed via other management API requests) and the default part,
substituting all missing overrides (constant, hardcoded in pageserver).
The API returns the custom overrides and the final tenant config, after
applying all the defaults.

Along the way, it had to fix two things in the config:

* allow to shorten the json version and omit all `null`'s (same as toml
serializer behaves by default), and to understand such shortened format
when deserialized. A unit test is added
* fix a bug, when `PUT /v1/tenant/config` endpoint rewritten the local
file with what had came in the request, but updating (not rewriting the
old values) the in-memory state instead.
That got uncovered during adjusting the e2e test and fixed to do the
replacement everywhere, otherwise there's no way to revert existing
overrides. Fixes #3471 (commit
dc688affe8)
* fixes https://github.com/neondatabase/neon/issues/3472 by reordering
the config saving operations
2023-02-04 01:32:29 +02:00
Christian Schwarz
87cd2bae77 introduce LaunchTimestamp to identify process restarts
This patch adds a LaunchTimestamp type to the `metrics` crate,
along with a `libmetric_` Prometheus metric.

The initial user is pageserver.
In addition to exposing the Prometheus metric, it also reproduces
the launch timestamp as a header in the API responses.

The motivation for this is that we plan to scrape the pageserver's
/v1/tenant/:tenant_id/timeline/:timeline_id/layer
HTTP endpoint over time. It will soon expose access metrics (#3496)
which reset upon process restart. We will use the pageserver's launch
ID to identify a restart between two scrape points.

However, there are other potential uses. For example, we could use
the Prometheus metric to annotate Grafana plots whenever the launch
timestamp changes.
2023-02-03 18:12:17 +01:00
bojanserafimov
be81db21b9 Revert accidental change (#3538) 2023-02-03 17:54:12 +02:00
Joonas Koivunen
f2d89761c2 feat: LayerMap::replace (#3513)
Cc: #3486

Adds a method to replace a particular layer from the LayerMap for the
purposes of remote layer download and layer eviction. In those use cases
read lock on layer map needs to be released after initial search, but
other operations could modify layermap before replacing thread gets to
run.

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-02-03 15:33:46 +02:00
Sergey Melnikov
a0372158a0 Add build_info metric to storage broker (#3525)
## Describe your changes
Add libmetrics_build_info metrics with commit sha to storage_broken
/metrics, to match behaviour of proxy, pageserver and safekeeper.
2023-02-03 12:23:27 +01:00
Anastasia Lubennikova
83048a4adc Handle errors during metric collection. (#3521)
Don't exit the loop if one of the tenants failed to scrape its metrics.
fixes #3490
2023-02-03 12:37:34 +02:00
Konstantin Knizhnik
f71b1b174d Check correctness of file_cache_size_limit (#3530)
## Describe your changes

## Issue ticket number and link

## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-03 08:01:26 +02:00
Dmitry Ivanov
96e78394f5 [proxy] Fix project (aka endpoint) init in the password hack handler (#3529)
The project/endpoint should be set in the original (non-as_ref'd) creds,
because we call `wake_compute` not only in `try_password_hack` but also
later in the connection retry logic.

This PR also removes the obsolete `as_ref` method and makes the code
simpler because we no longer need this complication after a recent
refactoring.

Further action points: finally introduce typestate in creds (planned).
2023-02-02 22:56:15 +02:00
bojanserafimov
ada933eb42 Pageserver read trace utils (#2795)
List, dump, and analyze read traces.
2023-02-02 15:33:40 -05:00
Kirill Bulatov
f6a10f4693 Use regular layer names instead of a special test ones (#3524)
We do not need special enum variant for testing the file names, neither
its special handling across the code.
Current tests are able to create regular layers with normal layer names,
as the PR shows.
2023-02-02 14:52:17 +02:00
Vadim Kharitonov
d25307dced Compile postgres with ICU support 2023-02-02 13:33:38 +01:00
Kirill Bulatov
2759f1a22e Evict layers on demand (#3486)
Closes https://github.com/neondatabase/neon/issues/3439

Adds a set of commands to manipulate the layer map:
* dump the layer map contents
* evict the layer form the layer map (remove the local file, put the
remote layer instead in the layer map)
* download the layer (operation, reversing the eviction)

The commands will change later, when the statistics is added on top, so
the swagger schema is not adjusted.

The commands might have issues with big amount of layers: no pagination
is done for the dump command, eviction and download commands look for
the layer to evict/download by iterating all layers sequentially and
comparing the layer names.
For now, that seems to be tolerable ("big" number of layers is ~2_000)
and further experiments are needed.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-02 12:14:44 +02:00
Kirill Bulatov
f474495ba0 Publish builds stats that are easy to browse (#3514)
Adds two new tags, `run-extra-build-macos` and `run-extra-build-stats`
to trigger corresponding build jobs on any PR.

On every build for `main` or PR with `run-extra-build-stats` tag, publish a GitHub commit status with the link to the `cargo build --all --release --timings` report.
2023-02-02 11:18:42 +02:00
Shany Pozin
bf1c36a30c Moving the template file location (#3523)
see
https://github.com/appsmithorg/appsmith/issus/826#issuecomment-703093426
for details
2023-02-02 11:02:47 +02:00
Alexander Bayandin
567b71c1d2 Require poetry 1.3; regenerate poetry.lock (#3508)
Ref https://python-poetry.org/blog/announcing-poetry-1.3.0/#new-lock-file-format
2023-02-01 18:11:00 +00:00
Sergey Melnikov
f3dadfb3d0 Confirm that there is an emergency before manual execution of prod deploy workflow (#3507)
![image](https://user-images.githubusercontent.com/7127190/215840037-69eda3ee-920e-4b90-bf7d-aa58f0bdfb50.png)
2023-02-01 16:01:27 +01:00
Dmitry Ivanov
ea0278cf27 [proxy] Implement compute node info cache (#3331)
This patch adds a timed LRU cache implementation and a compute node info cache on top of that.
Cache entries might expire on their own (default ttl=5mins) or become invalid due to real-world events,
e.g. compute node scale-to-zero event, so we add a connection retry loop with a wake-up call.

Solved problems:
- [x] Find a decent LRU implementation.
- [x] Implement timed LRU on top of that.
- [x] Cache results of `proxy_wake_compute` API call.
- [x] Don't invalidate newer cache entries for the same key.
- [x] Add cmdline configuration knobs (requires some refactoring).
- [x] Add failed connection estab metric.
- [x] Refactor auth backends to make things simpler (retries, cache
placement, etc).
- [x] Address review comments (add code comments + cleanup).
- [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute
(e.g. stalled cache entry).
- [x] Add high-level description for `TimedLru`.

TODOs (will be addressed later):
- [ ] Add cache metrics (hit, spurious hit, miss).
- [ ] Synchronize http requests across concurrent per-client tasks
(https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069).
- [ ] Cache results of `proxy_get_role_secret` API call.
2023-02-01 17:11:41 +03:00
Christian Schwarz
f1aece1ba0 add RequestContext plumbing for layer access stats
In preparation for #3496  plumb through RequestContext to the data
access methods of `PersistentLayer`.

This is PR https://github.com/neondatabase/neon/pull/3504
2023-02-01 15:29:01 +02:00
Christian Schwarz
590695e845 improve query param parsing
- add parse_query_param()
- use Cow<> where possible
- move param parsing code to utils::http::request

This was originally PR https://github.com/neondatabase/neon/pull/3502
which targeted a different branch.

closes  #3510
2023-02-01 14:11:12 +01:00
Joonas Koivunen
9bb6a6c77c pysync: override PYTHON_KEYRING_BACKEND (#3480)
This bothers me everytime I have to call `pysync`.
2023-02-01 14:07:23 +02:00
Vadim Kharitonov
2309dd5646 Install postgis_sfcgal 2023-02-01 10:45:41 +01:00
Sergey Melnikov
847fc566fd Use the same runners/container for old prod deployments as for new prod 2023-01-31 17:40:24 +01:00
dependabot[bot]
a058bc6de8 Bump aiohttp from 3.7.0 to 3.7.4 (#3445)
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.7.0 to
3.7.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/aio-libs/aiohttp/releases">aiohttp's
releases</a>.</em></p>
<blockquote>
<h2>aiohttp 3.7.3 release</h2>
<h2>Features</h2>
<ul>
<li>Use Brotli instead of brotlipy
<code>[#3803](https://github.com/aio-libs/aiohttp/issues/3803)
&lt;https://github.com/aio-libs/aiohttp/issues/3803&gt;</code>_</li>
<li>Made exceptions pickleable. Also changed the repr of some
exceptions.
<code>[#4077](https://github.com/aio-libs/aiohttp/issues/4077)
&lt;https://github.com/aio-libs/aiohttp/issues/4077&gt;</code>_</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Raise a ClientResponseError instead of an AssertionError for a blank
HTTP Reason Phrase.
<code>[#3532](https://github.com/aio-libs/aiohttp/issues/3532)
&lt;https://github.com/aio-libs/aiohttp/issues/3532&gt;</code>_</li>
<li>Fix <code>web_middlewares.normalize_path_middleware</code> behavior
for patch without slash.
<code>[#3669](https://github.com/aio-libs/aiohttp/issues/3669)
&lt;https://github.com/aio-libs/aiohttp/issues/3669&gt;</code>_</li>
<li>Fix overshadowing of overlapped sub-applications prefixes.
<code>[#3701](https://github.com/aio-libs/aiohttp/issues/3701)
&lt;https://github.com/aio-libs/aiohttp/issues/3701&gt;</code>_</li>
<li>Make <code>BaseConnector.close()</code> a coroutine and wait until
the client closes all connections. Drop deprecated &quot;with
Connector():&quot; syntax.
<code>[#3736](https://github.com/aio-libs/aiohttp/issues/3736)
&lt;https://github.com/aio-libs/aiohttp/issues/3736&gt;</code>_</li>
<li>Reset the <code>sock_read</code> timeout each time data is received
for a <code>aiohttp.client</code> response.
<code>[#3808](https://github.com/aio-libs/aiohttp/issues/3808)
&lt;https://github.com/aio-libs/aiohttp/issues/3808&gt;</code>_</li>
<li>Fixed type annotation for add_view method of UrlDispatcher to accept
any subclass of View
<code>[#3880](https://github.com/aio-libs/aiohttp/issues/3880)
&lt;https://github.com/aio-libs/aiohttp/issues/3880&gt;</code>_</li>
<li>Fixed querying the address families from DNS that the current host
supports.
<code>[#5156](https://github.com/aio-libs/aiohttp/issues/5156)
&lt;https://github.com/aio-libs/aiohttp/issues/5156&gt;</code>_</li>
<li>Change return type of MultipartReader.<strong>aiter</strong>() and
BodyPartReader.<strong>aiter</strong>() to AsyncIterator.
<code>[#5163](https://github.com/aio-libs/aiohttp/issues/5163)
&lt;https://github.com/aio-libs/aiohttp/issues/5163&gt;</code>_</li>
<li>Provide x86 Windows wheels.
<code>[#5230](https://github.com/aio-libs/aiohttp/issues/5230)
&lt;https://github.com/aio-libs/aiohttp/issues/5230&gt;</code>_</li>
</ul>
<h2>Improved Documentation</h2>
<ul>
<li>Add documentation for <code>aiohttp.web.FileResponse</code>.
<code>[#3958](https://github.com/aio-libs/aiohttp/issues/3958)
&lt;https://github.com/aio-libs/aiohttp/issues/3958&gt;</code>_</li>
<li>Removed deprecation warning in tracing example docs
<code>[#3964](https://github.com/aio-libs/aiohttp/issues/3964)
&lt;https://github.com/aio-libs/aiohttp/issues/3964&gt;</code>_</li>
<li>Fixed wrong &quot;Usage&quot; docstring of
<code>aiohttp.client.request</code>.
<code>[#4603](https://github.com/aio-libs/aiohttp/issues/4603)
&lt;https://github.com/aio-libs/aiohttp/issues/4603&gt;</code>_</li>
<li>Add aiohttp-pydantic to third party libraries
<code>[#5228](https://github.com/aio-libs/aiohttp/issues/5228)
&lt;https://github.com/aio-libs/aiohttp/issues/5228&gt;</code>_</li>
</ul>
<h2>Misc</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst">aiohttp's
changelog</a>.</em></p>
<blockquote>
<h1>3.7.4 (2021-02-25)</h1>
<h2>Bugfixes</h2>
<ul>
<li>
<p><strong>(SECURITY BUG)</strong> Started preventing open redirects in
the
<code>aiohttp.web.normalize_path_middleware</code> middleware. For
more details, see
<a
href="https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg">https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg</a>.</p>
<p>Thanks to <code>Beast Glatisant
&lt;https://github.com/g147&gt;</code>__ for
finding the first instance of this issue and <code>Jelmer Vernooij
&lt;https://jelmer.uk/&gt;</code>__ for reporting and tracking it down
in aiohttp.
<code>[#5497](https://github.com/aio-libs/aiohttp/issues/5497)
&lt;https://github.com/aio-libs/aiohttp/issues/5497&gt;</code>_</p>
</li>
<li>
<p>Fix interpretation difference of the pure-Python and the Cython-based
HTTP parsers construct a <code>yarl.URL</code> object for HTTP
request-target.</p>
<p>Before this fix, the Python parser would turn the URI's absolute-path
for <code>//some-path</code> into <code>/</code> while the Cython code
preserved it as
<code>//some-path</code>. Now, both do the latter.
<code>[#5498](https://github.com/aio-libs/aiohttp/issues/5498)
&lt;https://github.com/aio-libs/aiohttp/issues/5498&gt;</code>_</p>
</li>
</ul>
<hr />
<h1>3.7.3 (2020-11-18)</h1>
<h2>Features</h2>
<ul>
<li>Use Brotli instead of brotlipy
<code>[#3803](https://github.com/aio-libs/aiohttp/issues/3803)
&lt;https://github.com/aio-libs/aiohttp/issues/3803&gt;</code>_</li>
<li>Made exceptions pickleable. Also changed the repr of some
exceptions.
<code>[#4077](https://github.com/aio-libs/aiohttp/issues/4077)
&lt;https://github.com/aio-libs/aiohttp/issues/4077&gt;</code>_</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Raise a ClientResponseError instead of an AssertionError for a blank
HTTP Reason Phrase.
<code>[#3532](https://github.com/aio-libs/aiohttp/issues/3532)
&lt;https://github.com/aio-libs/aiohttp/issues/3532&gt;</code>_</li>
<li>Fix <code>web_middlewares.normalize_path_middleware</code> behavior
for patch without slash.
<code>[#3669](https://github.com/aio-libs/aiohttp/issues/3669)
&lt;https://github.com/aio-libs/aiohttp/issues/3669&gt;</code>_</li>
<li>Fix overshadowing of overlapped sub-applications prefixes.
<code>[#3701](https://github.com/aio-libs/aiohttp/issues/3701)
&lt;https://github.com/aio-libs/aiohttp/issues/3701&gt;</code>_</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0a26acc1de"><code>0a26acc</code></a>
Bump aiohttp to v3.7.4 for a security release</li>
<li><a
href="021c416c18"><code>021c416</code></a>
Merge branch 'ghsa-v6wp-4m6f-gcjg' into master</li>
<li><a
href="4ed7c25b53"><code>4ed7c25</code></a>
Bump chardet from 3.0.4 to 4.0.0 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5333">#5333</a>)</li>
<li><a
href="b61f0fdffc"><code>b61f0fd</code></a>
Fix how pure-Python HTTP parser interprets <code>//</code></li>
<li><a
href="5c1efbc32c"><code>5c1efbc</code></a>
Bump pre-commit from 2.9.2 to 2.9.3 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5322">#5322</a>)</li>
<li><a
href="0075075801"><code>0075075</code></a>
Bump pygments from 2.7.2 to 2.7.3 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5318">#5318</a>)</li>
<li><a
href="5085173d94"><code>5085173</code></a>
Bump multidict from 5.0.2 to 5.1.0 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5308">#5308</a>)</li>
<li><a
href="5d1a75e68d"><code>5d1a75e</code></a>
Bump pre-commit from 2.9.0 to 2.9.2 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5290">#5290</a>)</li>
<li><a
href="6724d0e7a9"><code>6724d0e</code></a>
Bump pre-commit from 2.8.2 to 2.9.0 (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5273">#5273</a>)</li>
<li><a
href="c688451ce3"><code>c688451</code></a>
Removed duplicate timeout parameter in ClientSession reference docs. (<a
href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5262">#5262</a>)
...</li>
<li>Additional commits viewable in <a
href="https://github.com/aio-libs/aiohttp/compare/v3.7.0...v3.7.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aiohttp&package-manager=pip&previous-version=3.7.0&new-version=3.7.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the
default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as
the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as
the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the
default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/neondatabase/neon/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Vadim Kharitonov <vadim@neon.tech>
2023-01-31 17:30:45 +02:00
Konstantin Knizhnik
895f929bce Add layer_map_analyzer tool (#3451)
See #3348
2023-01-31 15:50:52 +02:00
Vadim Kharitonov
a7d8bfa631 Fix create release PR 2023-01-31 14:36:04 +01:00
Sergey Melnikov
0806a46c0c Fix production deploy (#3498)
`get_binaries.sh` no longer use `RELEASE` environmental variable, it
just use `DOCKER_TAG`
2023-01-31 13:36:25 +01:00
Sergey Melnikov
5e08b35f53 Fix new deploy workflow (#3492)
Add 'branch' input to specify commit for deploy scripts/configs. Commit
can't be passed to workflow as ref, and we need to pin configs to
specific commit for main/release deploys
Update deploy input descriptions to match GH interface
2023-01-30 22:08:00 +01:00
Sergey Melnikov
82cbcb36ab Extract neon deploy jobs into separate workflows (#3424)
Extract deploy jobs from build_and_test.yml to deploy-dev and
deploy-prod workflows.
Add trigger to run this workflows after Neon is build and tested on main and
release branches.

This will allow us to redeploy/rollback/patch config without full
rebuild.
2023-01-30 20:10:54 +01:00
Vadim Kharitonov
ec0e641578 Create Release PR: review fixes 2023-01-30 16:15:22 +01:00
Lassi Pölönen
20b38acff0 Replace per timeline pageserver_storage_operations_seconds with a global one (#3409)
Related to: https://github.com/neondatabase/neon/issues/2848

`pageserver_storage_operations_seconds` is the most expensive metric we
have, as there are a lot of tenants/timelines and the histogram had 42
buckets. These are quite sparse too, so instead of having a histogram
per timeline, create a new histogram
`pageserver_storage_operations_seconds_global` without tenant and
timeline dimensions and replace `pageserver_storage_operations_seconds`
with sum and counter.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-01-30 17:10:29 +02:00
Kirill Bulatov
c61bc25ef9 Clean up NeedsDownload error (#3464) 2023-01-30 16:08:23 +02:00
Rory de Zoete
7bb13569b3 Switch more jobs to small runner (#3483)
As these jobs don't benefit from additional cores

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2023-01-30 14:00:44 +01:00
Vadim Kharitonov
5fc233964a Create release PR 2023-01-30 12:44:48 +01:00
Heikki Linnakangas
5ee77c0b1f Fix holding tracing span guard over query execution.
I added these spans to trace how long the queries take, but I didn't realize
that there's a difference between:

    let _ = span.entered();

and

    let _guard = span.entered();

The former drops the guard immediately, while the latter holds it
until the end of the scope. As a result, the span was ended
immediately, and the query was executed outside the span.
2023-01-30 12:10:51 +02:00
Shany Pozin
ddb9c2fe94 Add metrics for tenants state (#3448)
## Describe your changes
Added a metric that allow to monitor tenants state 
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3161

## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [X] I have added an e2e test for it.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-01-29 14:04:06 +02:00
Shany Pozin
67d418e91c Set the last_record_gauge to the value which was persisted metadata (#3460)
## Describe your changes
Whenever a tenant is detached or the pageserver is restarted the
pageserver_last_record_lsn metric is dropped
This fix resurrects the value from the metadata whenever the tenant is
attached again
## Issue ticket number and link
[3571](https://github.com/neondatabase/cloud/issues/3571)
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-01-29 12:40:50 +02:00
Rory de Zoete
4d291d0e90 Prevent assume error (#3476)
To fix `Error: The requested DurationSeconds exceeds the
MaxSessionDuration set for this role.`

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2023-01-27 19:27:23 +01:00
Rory de Zoete
4718c67c17 Update deploy steps (#3470)
First one isn't optimal, but as it was requested to run the runner as
nonroot ->
https://github.com/neondatabase/runner/pull/1#discussion_r1069909593
this job will need more significant refactoring. This should unblock the
deployment process.

---------

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2023-01-27 18:05:49 +01:00
Konstantin Knizhnik
c5ca7d0c68 Implement asynchronous pipe for communication with walredo process (#3368)
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-01-27 18:36:24 +02:00
Joonas Koivunen
0ec84e2f1f Allow creating config for attached tenant (#3446)
Currently `attach` doesn't write a tenant config, because we don't back
it up in the first place. The current implementation of
`Tenant::persist_tenant_config` does not allow changing tenant's
configuration through the http api which will fail because the file
wasn't created on attach and
`OpenOptions::truncate(true).write(true).create_new(false)` is used.

I think this patch allows for least controversial middle ground which
*enables* changing tenant configuration even for attached tenants (not
just created tenants).
2023-01-27 15:34:59 +02:00
Rory de Zoete
8342e9ea6f Update helm job (#3467)
As followup from https://github.com/neondatabase/build/pull/47

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2023-01-27 13:28:26 +01:00
Christian Schwarz
99399c112a move walreceiver module under timeline
Walreceiver is a per-timeline abstraction. Move it there to reflect
the hierarchy of abstractions and task_mgr tasks.
The code that sets up the global storage_broker client
is not timeline-scoped. So, break it out into a separate module.

The motivation for this change is to prepare the code base for replacing
the task_mgr global task registry with a more ownership-oriented
approach to manage task lifetimes.

I removed TaskStateUpdate::Init because, after doing the changes,
rustc warned that it was never constructed.
A quick search through the commit history shows that this
has always been true since

    commit fb68d01449
    Author: Dmitry Rodionov <dmitry@neon.tech>
    Date:   Mon Sep 26 23:57:02 2022 +0300

        Preserve task result in TaskHandle by keeping join handle around (#2521)

So, the warning is not an indication of some accidental code removal.

This is PR: https://github.com/neondatabase/neon/pull/3456
2023-01-27 12:23:17 +01:00
Rory de Zoete
2388981311 Add cleanup tasks for ansible and helm (#3465)
To fix:

https://github.com/neondatabase/neon/actions/runs/4023027504/jobs/6913421070

https://github.com/neondatabase/neon/actions/runs/4023027504/jobs/6913421268

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2023-01-27 11:20:51 +01:00
Sergey Melnikov
fb721cdfa5 Setup legacy scram proxy in the new account (#3461)
This setup proxies with *.cloud.neon.tech certificate in the us-west-2
region of the new account, we are not switching to them here yet
2023-01-27 11:05:05 +01:00
Heikki Linnakangas
bf63f129ae Make 'branch_timeline' function more clear.
Change the signature so that it takes an Arc<Timeline> reference to the
source timeline, instead of just the ID. All the callers have an Arc
reference at hand, so this is more convenient for everyone.

Reorder the code a bit and improve the comments, to make it more clear
what it does and why.
2023-01-27 02:12:07 +02:00
Sergey Melnikov
2ecd0e1f00 Decommission link proxy from old account (#3454) 2023-01-26 16:18:57 +01:00
Rory de Zoete
b858d70f19 Update promote job (#3455)
To fix errors such as:
`An error occurred (ImageAlreadyExistsException) when calling the
PutImage operation: Image with digest
'sha256:da6d8ad97d84e3aec4e6a240c3a35868b626692ee5d199cdd3fe45d29a8e54df'
and tag 'latest' already exists in the repository with name
'compute-node-v14' in registry with id '369495373322'`

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2023-01-26 14:26:23 +01:00
Heikki Linnakangas
0c0e15b81d compute_ctl: Extract tracing context from incoming HTTP requests.
This allows tracing the handling of HTTP requests as part of the caller's
trace.
2023-01-26 15:20:03 +02:00
Heikki Linnakangas
3e94fd5af3 Inherit OpenTelemetry context for compute startup from cloud console.
This allows fine-grained distributed tracing of the 'start_compute'
operation from the cloud console. The startup actions performed by
'compute_ctl' are now performed in a child of the 'start_compute'
context, so you can trace through the whole compute start operation.

This needs a corresponding change in the cloud console to fill in the
'startup_tracing_context' field in the json spec. If it's missing, the
startup operations are simply traced as a separate trace, without
a parent.
2023-01-26 15:20:03 +02:00
Heikki Linnakangas
006ee5f94a Configure 'compute_ctl' to use OpenTelemetry exporter.
This allows tracing the startup actions e.g. with Jaeger
(https://www.jaegertracing.io/). We use the "tracing-opentelemetry"
crate, which turns tracing spans into OpenTelemetry spans, so you can
use the usual "#[instrument]" directives to add tracing.

I put the tracing initialization code to a separate crate,
`tracing-utils`, so that we can reuse it in other programs. We
probably want to set up tracing in the same way in all our programs.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-01-26 15:20:03 +02:00
Rory de Zoete
4bcbb7793d Revert docker hub job (#3453)
Regression fix as permissions aren't configured properly on gen3 for
this job.

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2023-01-26 11:30:53 +01:00
Christian Schwarz
dc64962ffc tenant::mgr: explicit tracking of initializing & shutting-down states
This patch wrap the tenants hashmap into an enum that represents the
tenant manager's three major states:
- Initializing
- Open for business
- Shutting down.
See the enum doc comments for details.

In response, all the users of `TENANTS` are now forced to distinguish
those states.
The only major change is in `run_if_no_tenant_in_memory`, which,
before this patch, was used by the /attach and /load endpoints.
This patch rewrites that method under the name `tenant_map_insert`,
replacing the anyhow::Result with a std Result and a dedicated error
type.
Introducing this error types allows using `tenant_map_insert` in
`tenant_create`, thereby unifying all code paths that create tenants
objects to use `tenant_map_insert`.

This is beneficial because we can now systematically prevent tenants
from being created, attached, or `/load`ed during pageserver shutdown.
The management API remains available, but the endpoints that create
new tenants will fail with an error.
More work would need to be done to properly distinguish these errors
through HTTP status codes such as 503.
2023-01-26 11:24:48 +01:00
Rory de Zoete
cd5732d9d8 Gen3 runners (#3220)
https://github.com/neondatabase/cloud/issues/2738

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2023-01-26 10:46:06 +01:00
bojanserafimov
0a09589403 Increase gc period to 1h (#3432) 2023-01-25 15:18:41 -05:00
Vadim Kharitonov
e3efb0d854 Fix bug while creating unit extension (#3447)
after executing

```sql
CREATE EXTENSION unit;
``` 

I saw such error

```
ERROR: could not open file "/usr/local/pgsql/share/extension/unit_prefixes.data" for reading: No such file or directory (SQLSTATE 58P01)
```

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-01-25 17:29:06 +00:00
Sergey Melnikov
4b8dbea5c1 Add production link proxy to new account (#3444)
This PR setup link proxy in us-east-2 region, but do not redirect
pg.neon.tech DNS name to it
Will keep old link proxy for the time of migration
2023-01-25 17:15:56 +01:00
Kirill Bulatov
0c7276ae13 Expect timeline being stopped during the detach smoke test (#3442)
Found this error recently:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/release/4005062867/index.html#categories/d2116d7e3b88302f27b3d646396b385b/18590f7063e91b53/?attachment=69e899c74f1cbfc5

I could not reproduce it locally, since always received `gc target
timeline does not exist` instead, so that test is quite lucky.

Still, the error is pretty valid to appear in this context, so do not fail the test if it's found in the logs.
2023-01-25 16:23:30 +02:00
Vadim Kharitonov
00f1f54b7a Leave one Dockerfile 2023-01-25 15:10:45 +01:00
Christian Schwarz
8963d830fb add script to download all remote layers (#3294)
For use in production in case on-demand download turns out to be
problematic during tenant_attach, or when we eventually introduce layer
eviction.

Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
2023-01-25 16:55:25 +03:00
Christian Schwarz
01b4b0c2f3 Introduce RequestContext
Motivation
==========

Layer Eviction Needs Context
----------------------------

Before we start implementing layer eviction, we need to collect some
access statistics per layer file or maybe even page.
Part of these statistics should be the initiator of a page read request
to answer the question of whether it was page_service vs. one of the
background loops, and if the latter, which of them?

Further, it would be nice to learn more about what activity in the pageserver
initiated an on-demand download of a layer file.
We will use this information to test out layer eviction policies.

Read more about the current plan for layer eviction here:
https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104

task_mgr problems + cancellation + tenant/timeline lifecycle
------------------------------------------------------------

Apart from layer eviction, we have long-standing problems with task_mgr,
task cancellation, and various races around tenant / timeline lifecycle
transitions.
One approach to solve these is to abandon task_mgr in favor of a
mechanism similar to Golang's context.Context, albeit extended to
support waiting for completion, and specialized to the needs in the
pageserver.

Heikki solves all of the above at once in PR
https://github.com/neondatabase/neon/pull/3228 , which is not yet
merged at the time of writing.

What Is This Patch About
========================

This patch addresses the immediate needs of layer eviction by
introducing a `RequestContext` structure that is plumbed through the
pageserver - all the way from the various entrypoints (page_service,
management API, tenant background loops) down to
Timeline::{get,get_reconstruct_data}.

The struct carries a description of the kind of activity that initiated
the call. We re-use task_mgr::TaskKind for this.

Also, it carries the desired on-demand download behavior of the entrypoint.
Timeline::get_reconstruct_data can then log the TaskKind that initiated
the on-demand download.

I developed this patch by git-checking-out Heikki's big RequestContext
PR https://github.com/neondatabase/neon/pull/3228 , then deleting all
the functionality that we do not need to address the needs for layer
eviction.

After that, I added a few things on top:

1. The concept of attached_child and detached_child in preparation for
   cancellation signalling through RequestContext, which will be added in
   a future patch.
2. A kill switch to turn DownloadBehavior::Error into a warning.
3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and
   added an additional TaskKind WalReceiverConnectionHandler.These were
   necessary to create proper detached_child-type RequestContexts for the
   various tasks that walreceiver starts.

How To Review This Patch
========================

Start your review with the module-level comment in context.rs.
It explains the idea of RequestContext, what parts of it are implemented
in this patch, and the future plans for RequestContext.

Then review the various `task_mgr::spawn` call sites. At each of them,
we should be creating a new detached_child RequestContext.

Then review the (few) RequestContext::attached_child call sites and
ensure that the spawned tasks do not outlive the task that spawns them.
If they do, these call sites should use detached_child() instead.

Then review the todo_child() call sites and judge whether it's worth the
trouble of plumbing through a parent context from the caller(s).

Lastly, go through the bulk of mechanical changes that simply forwards
the &ctx.
2023-01-25 14:53:30 +01:00
Sergey Melnikov
dee71404a2 Use TLS for staging link proxy (#3443)
Fixes #3416 on staging

Adding domain parameter result in:
* Issuing TLS cert for that domain
* Passing that cert to proxy with `--tls-key`/`--tls-cert`
2023-01-25 14:39:55 +01:00
Kirill Bulatov
572332ab50 Tone down page_service timeouts (#3426)
Closes https://github.com/neondatabase/neon/issues/3341
2023-01-25 13:40:08 +02:00
Vadim Kharitonov
5223b62a19 Compile unit extension 2023-01-25 12:08:45 +01:00
Vadim Kharitonov
bc4f594ed6 Fix Sentry Version 2023-01-25 12:07:38 +01:00
Kirill Bulatov
ea6f41324a Tone down postgres client io errors (#3435)
Closes https://github.com/neondatabase/neon/issues/3343
2023-01-25 10:50:33 +00:00
Kirill Bulatov
3d5faa0295 Unify common image args in compute-node Dockerfiles (#3437)
Part of https://github.com/neondatabase/neon/issues/3436
2023-01-25 12:39:53 +02:00
Kirill Bulatov
9fbef1159f Tone down http error printing (#3434)
Only print backtraces for internal server error variants of the API
error.
2023-01-25 10:36:30 +00:00
501 changed files with 52006 additions and 21638 deletions

View File

@@ -14,3 +14,4 @@ opt-level = 1
[alias]
build_testing = ["build", "--features", "testing"]
neon = ["run", "--bin", "neon_local"]

View File

@@ -4,7 +4,7 @@
hakari-package = "workspace_hack"
# Format for `workspace-hack = ...` lines in other Cargo.tomls. Requires cargo-hakari 0.9.8 or above.
dep-format-version = "3"
dep-format-version = "4"
# Setting workspace.resolver = "2" in the root Cargo.toml is HIGHLY recommended.
# Hakari works much better with the new feature resolver.

View File

@@ -15,8 +15,11 @@
!proxy/
!safekeeper/
!storage_broker/
!trace/
!vendor/postgres-v14/
!vendor/postgres-v15/
!workspace_hack/
!neon_local/
!scripts/ninstall.sh
!scripts/combine_control_files.py
!vm-cgconfig.conf

View File

@@ -10,6 +10,7 @@
<!-- List everything that should be done **before** release, any issues / setting changes / etc -->
### Checklist after release
- [ ] Make sure instructions from PRs included in this release and labeled `manual_release_instructions` are executed (either by you or by people who wrote them).
- [ ] Based on the merged commits write release notes and open a PR into `website` repo ([example](https://github.com/neondatabase/website/pull/219/files))
- [ ] Check [#dev-production-stream](https://neondb.slack.com/archives/C03F5SM1N02) Slack channel
- [ ] Check [stuck projects page](https://console.neon.tech/admin/projects?sort=last_active&order=desc&stuck=true)

View File

@@ -0,0 +1,186 @@
name: 'Create Allure report'
description: 'Generate Allure report from uploaded by actions/allure-report-store tests results'
outputs:
report-url:
description: 'Allure report URL'
value: ${{ steps.generate-report.outputs.report-url }}
report-json-url:
description: 'Allure report JSON URL'
value: ${{ steps.generate-report.outputs.report-json-url }}
runs:
using: "composite"
steps:
# We're using some of env variables quite offen, so let's set them once.
#
# It would be nice to have them set in common runs.env[0] section, but it doesn't work[1]
#
# - [0] https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#runsenv
# - [1] https://github.com/neondatabase/neon/pull/3907#discussion_r1154703456
#
- name: Set variables
shell: bash -euxo pipefail {0}
run: |
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
BRANCH_OR_PR=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")
fi
LOCK_FILE=reports/${BRANCH_OR_PR}/lock.txt
WORKDIR=/tmp/${BRANCH_OR_PR}-$(date +%s)
mkdir -p ${WORKDIR}
echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV
echo "LOCK_FILE=${LOCK_FILE}" >> $GITHUB_ENV
echo "WORKDIR=${WORKDIR}" >> $GITHUB_ENV
echo "BUCKET=${BUCKET}" >> $GITHUB_ENV
env:
BUCKET: neon-github-public-dev
# TODO: We can replace with a special docker image with Java and Allure pre-installed
- uses: actions/setup-java@v3
with:
distribution: 'temurin'
java-version: '17'
- name: Install Allure
shell: bash -euxo pipefail {0}
run: |
if ! which allure; then
ALLURE_ZIP=allure-${ALLURE_VERSION}.zip
wget -q https://github.com/allure-framework/allure2/releases/download/${ALLURE_VERSION}/${ALLURE_ZIP}
echo "${ALLURE_ZIP_SHA256} ${ALLURE_ZIP}" | sha256sum --check
unzip -q ${ALLURE_ZIP}
echo "$(pwd)/allure-${ALLURE_VERSION}/bin" >> $GITHUB_PATH
rm -f ${ALLURE_ZIP}
fi
env:
ALLURE_VERSION: 2.22.1
ALLURE_ZIP_SHA256: fdc7a62d94b14c5e0bf25198ae1feded6b005fdbed864b4d3cb4e5e901720b0b
# Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this
- name: Acquire lock
shell: bash -euxo pipefail {0}
run: |
LOCK_TIMEOUT=300 # seconds
LOCK_CONTENT="${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
echo ${LOCK_CONTENT} > ${WORKDIR}/lock.txt
# Do it up to 5 times to avoid race condition
for _ in $(seq 1 5); do
for i in $(seq 1 ${LOCK_TIMEOUT}); do
LOCK_ACQUIRED=$(aws s3api head-object --bucket neon-github-public-dev --key ${LOCK_FILE} | jq --raw-output '.LastModified' || true)
# `date --date="..."` is supported only by gnu date (i.e. it doesn't work on BSD/macOS)
if [ -z "${LOCK_ACQUIRED}" ] || [ "$(( $(date +%s) - $(date --date="${LOCK_ACQUIRED}" +%s) ))" -gt "${LOCK_TIMEOUT}" ]; then
break
fi
sleep 1
done
aws s3 mv --only-show-errors ${WORKDIR}/lock.txt "s3://${BUCKET}/${LOCK_FILE}"
# Double-check that exactly THIS run has acquired the lock
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt
if [ "$(cat lock.txt)" = "${LOCK_CONTENT}" ]; then
break
fi
done
- name: Generate and publish final Allure report
id: generate-report
shell: bash -euxo pipefail {0}
run: |
REPORT_PREFIX=reports/${BRANCH_OR_PR}
RAW_PREFIX=reports-raw/${BRANCH_OR_PR}/${GITHUB_RUN_ID}
# Get previously uploaded data for this run
ZSTD_NBTHREADS=0
S3_FILEPATHS=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${RAW_PREFIX}/ | jq --raw-output '.Contents[].Key')
if [ -z "$S3_FILEPATHS" ]; then
# There's no previously uploaded data for this $GITHUB_RUN_ID
exit 0
fi
for S3_FILEPATH in ${S3_FILEPATHS}; do
time aws s3 cp --only-show-errors "s3://${BUCKET}/${S3_FILEPATH}" "${WORKDIR}"
archive=${WORKDIR}/$(basename $S3_FILEPATH)
mkdir -p ${archive%.tar.zst}
time tar -xf ${archive} -C ${archive%.tar.zst}
rm -f ${archive}
done
# Get history trend
time aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${REPORT_PREFIX}/latest/history" "${WORKDIR}/latest/history" || true
# Generate report
time allure generate --clean --output ${WORKDIR}/report ${WORKDIR}/*
# Replace a logo link with a redirect to the latest version of the report
sed -i 's|<a href="." class=|<a href="https://'${BUCKET}'.s3.amazonaws.com/'${REPORT_PREFIX}'/latest/index.html?nocache='"'+Date.now()+'"'" class=|g' ${WORKDIR}/report/app.js
# Upload a history and the final report (in this particular order to not to have duplicated history in 2 places)
time aws s3 mv --recursive --only-show-errors "${WORKDIR}/report/history" "s3://${BUCKET}/${REPORT_PREFIX}/latest/history"
time aws s3 mv --recursive --only-show-errors "${WORKDIR}/report" "s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}"
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html
# Generate redirect
cat <<EOF > ${WORKDIR}/index.html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Redirecting to ${REPORT_URL}</title>
<meta http-equiv="refresh" content="0; URL=${REPORT_URL}">
EOF
time aws s3 cp --only-show-errors ${WORKDIR}/index.html "s3://${BUCKET}/${REPORT_PREFIX}/latest/index.html"
echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT
echo "report-json-url=${REPORT_URL%/index.html}/data/suites.json" >> $GITHUB_OUTPUT
echo "[Allure Report](${REPORT_URL})" >> ${GITHUB_STEP_SUMMARY}
- name: Release lock
if: always()
shell: bash -euxo pipefail {0}
run: |
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt || exit 0
if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}" ]; then
aws s3 rm "s3://${BUCKET}/${LOCK_FILE}"
fi
- name: Cleanup
if: always()
shell: bash -euxo pipefail {0}
run: |
if [ -d "${WORKDIR}" ]; then
rm -rf ${WORKDIR}
fi
- uses: actions/github-script@v6
if: always()
env:
REPORT_URL: ${{ steps.generate-report.outputs.report-url }}
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
script: |
const { REPORT_URL, COMMIT_SHA } = process.env
await github.rest.repos.createCommitStatus({
owner: context.repo.owner,
repo: context.repo.repo,
sha: `${COMMIT_SHA}`,
state: 'success',
target_url: `${REPORT_URL}`,
context: 'Allure report',
})

View File

@@ -0,0 +1,72 @@
name: 'Store Allure results'
description: 'Upload test results to be used by actions/allure-report-generate'
inputs:
report-dir:
description: 'directory with test results generated by tests'
required: true
unique-key:
description: 'string to distinguish different results in the same run'
required: true
runs:
using: "composite"
steps:
- name: Set variables
shell: bash -euxo pipefail {0}
run: |
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
BRANCH_OR_PR=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")
fi
echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV
echo "REPORT_DIR=${REPORT_DIR}" >> $GITHUB_ENV
env:
REPORT_DIR: ${{ inputs.report-dir }}
- name: Upload test results
shell: bash -euxo pipefail {0}
run: |
REPORT_PREFIX=reports/${BRANCH_OR_PR}
RAW_PREFIX=reports-raw/${BRANCH_OR_PR}/${GITHUB_RUN_ID}
# Add metadata
cat <<EOF > ${REPORT_DIR}/executor.json
{
"name": "GitHub Actions",
"type": "github",
"url": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/latest/index.html",
"buildOrder": ${GITHUB_RUN_ID},
"buildName": "GitHub Actions Run #${GITHUB_RUN_NUMBER}/${GITHUB_RUN_ATTEMPT}",
"buildUrl": "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT}",
"reportUrl": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html",
"reportName": "Allure Report"
}
EOF
cat <<EOF > ${REPORT_DIR}/environment.properties
COMMIT_SHA=${COMMIT_SHA}
EOF
ARCHIVE="${UNIQUE_KEY}-${GITHUB_RUN_ATTEMPT}-$(date +%s).tar.zst"
ZSTD_NBTHREADS=0
time tar -C ${REPORT_DIR} -cf ${ARCHIVE} --zstd .
time aws s3 mv --only-show-errors ${ARCHIVE} "s3://${BUCKET}/${RAW_PREFIX}/${ARCHIVE}"
env:
UNIQUE_KEY: ${{ inputs.unique-key }}
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
BUCKET: neon-github-public-dev
- name: Cleanup
if: always()
shell: bash -euxo pipefail {0}
run: |
rm -rf ${REPORT_DIR}

View File

@@ -1,232 +0,0 @@
name: 'Create Allure report'
description: 'Create and publish Allure report'
inputs:
action:
desctiption: 'generate or store'
required: true
build_type:
description: '`build_type` from run-python-test-set action'
required: true
test_selection:
description: '`test_selector` from run-python-test-set action'
required: false
outputs:
report-url:
description: 'Allure report URL'
value: ${{ steps.generate-report.outputs.report-url }}
runs:
using: "composite"
steps:
- name: Validate input parameters
shell: bash -euxo pipefail {0}
run: |
if [ "${{ inputs.action }}" != "store" ] && [ "${{ inputs.action }}" != "generate" ]; then
echo 2>&1 "Unknown inputs.action type '${{ inputs.action }}'; allowed 'generate' or 'store' only"
exit 1
fi
if [ -z "${{ inputs.test_selection }}" ] && [ "${{ inputs.action }}" == "store" ]; then
echo 2>&1 "inputs.test_selection must be set for 'store' action"
exit 2
fi
- name: Calculate variables
id: calculate-vars
shell: bash -euxo pipefail {0}
run: |
# TODO: for manually triggered workflows (via workflow_dispatch) we need to have a separate key
pr_number=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${pr_number}" != "null" ]; then
key=pr-${pr_number}
elif [ "${GITHUB_REF_NAME}" = "main" ]; then
# Shortcut for a special branch
key=main
elif [ "${GITHUB_REF_NAME}" = "release" ]; then
# Shortcut for a special branch
key=release
else
key=branch-$(printf "${GITHUB_REF_NAME}" | tr -c "[:alnum:]._-" "-")
fi
echo "KEY=${key}" >> $GITHUB_OUTPUT
# Sanitize test selection to remove `/` and any other special characters
# Use printf instead of echo to avoid having `\n` at the end of the string
test_selection=$(printf "${{ inputs.test_selection }}" | tr -c "[:alnum:]._-" "-" )
echo "TEST_SELECTION=${test_selection}" >> $GITHUB_OUTPUT
- uses: actions/setup-java@v3
if: ${{ inputs.action == 'generate' }}
with:
distribution: 'temurin'
java-version: '17'
- name: Install Allure
if: ${{ inputs.action == 'generate' }}
shell: bash -euxo pipefail {0}
run: |
if ! which allure; then
ALLURE_ZIP=allure-${ALLURE_VERSION}.zip
wget -q https://github.com/allure-framework/allure2/releases/download/${ALLURE_VERSION}/${ALLURE_ZIP}
echo "${ALLURE_ZIP_MD5} ${ALLURE_ZIP}" | md5sum -c
unzip -q ${ALLURE_ZIP}
echo "$(pwd)/allure-${ALLURE_VERSION}/bin" >> $GITHUB_PATH
rm -f ${ALLURE_ZIP}
fi
env:
ALLURE_VERSION: 2.19.0
ALLURE_ZIP_MD5: ced21401a1a8b9dfb68cee9e4c210464
- name: Upload Allure results
if: ${{ inputs.action == 'store' }}
env:
REPORT_PREFIX: reports/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
RAW_PREFIX: reports-raw/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
TEST_OUTPUT: /tmp/test_output
BUCKET: neon-github-public-dev
TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}
shell: bash -euxo pipefail {0}
run: |
# Add metadata
cat <<EOF > $TEST_OUTPUT/allure/results/executor.json
{
"name": "GitHub Actions",
"type": "github",
"url": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/latest/index.html",
"buildOrder": ${GITHUB_RUN_ID},
"buildName": "GitHub Actions Run #${{ github.run_number }}/${GITHUB_RUN_ATTEMPT}",
"buildUrl": "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT}",
"reportUrl": "https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html",
"reportName": "Allure Report"
}
EOF
cat <<EOF > $TEST_OUTPUT/allure/results/environment.properties
TEST_SELECTION=${{ inputs.test_selection }}
BUILD_TYPE=${{ inputs.build_type }}
EOF
ARCHIVE="${GITHUB_RUN_ID}-${TEST_SELECTION}-${GITHUB_RUN_ATTEMPT}-$(date +%s).tar.zst"
ZSTD_NBTHREADS=0
tar -C ${TEST_OUTPUT}/allure/results -cf ${ARCHIVE} --zstd .
aws s3 mv --only-show-errors ${ARCHIVE} "s3://${BUCKET}/${RAW_PREFIX}/${ARCHIVE}"
# Potentially we could have several running build for the same key (for example for the main branch), so we use improvised lock for this
- name: Acquire Allure lock
if: ${{ inputs.action == 'generate' }}
shell: bash -euxo pipefail {0}
env:
LOCK_FILE: reports/${{ steps.calculate-vars.outputs.KEY }}/lock.txt
BUCKET: neon-github-public-dev
TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}
run: |
LOCK_TIMEOUT=300 # seconds
for _ in $(seq 1 5); do
for i in $(seq 1 ${LOCK_TIMEOUT}); do
LOCK_ADDED=$(aws s3api head-object --bucket neon-github-public-dev --key ${LOCK_FILE} | jq --raw-output '.LastModified' || true)
# `date --date="..."` is supported only by gnu date (i.e. it doesn't work on BSD/macOS)
if [ -z "${LOCK_ADDED}" ] || [ "$(( $(date +%s) - $(date --date="${LOCK_ADDED}" +%s) ))" -gt "${LOCK_TIMEOUT}" ]; then
break
fi
sleep 1
done
echo "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" > lock.txt
aws s3 mv --only-show-errors lock.txt "s3://${BUCKET}/${LOCK_FILE}"
# A double-check that exactly WE have acquired the lock
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt
if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" ]; then
break
fi
done
- name: Generate and publish final Allure report
if: ${{ inputs.action == 'generate' }}
id: generate-report
env:
REPORT_PREFIX: reports/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
RAW_PREFIX: reports-raw/${{ steps.calculate-vars.outputs.KEY }}/${{ inputs.build_type }}
TEST_OUTPUT: /tmp/test_output
BUCKET: neon-github-public-dev
shell: bash -euxo pipefail {0}
run: |
# Get previously uploaded data for this run
ZSTD_NBTHREADS=0
s3_filepaths=$(aws s3api list-objects-v2 --bucket ${BUCKET} --prefix ${RAW_PREFIX}/${GITHUB_RUN_ID}- | jq --raw-output '.Contents[].Key')
if [ -z "$s3_filepaths" ]; then
# There's no previously uploaded data for this run
exit 0
fi
for s3_filepath in ${s3_filepaths}; do
aws s3 cp --only-show-errors "s3://${BUCKET}/${s3_filepath}" "${TEST_OUTPUT}/allure/"
archive=${TEST_OUTPUT}/allure/$(basename $s3_filepath)
mkdir -p ${archive%.tar.zst}
tar -xf ${archive} -C ${archive%.tar.zst}
rm -f ${archive}
done
# Get history trend
aws s3 cp --recursive --only-show-errors "s3://${BUCKET}/${REPORT_PREFIX}/latest/history" "${TEST_OUTPUT}/allure/latest/history" || true
# Generate report
allure generate --clean --output $TEST_OUTPUT/allure/report $TEST_OUTPUT/allure/*
# Replace a logo link with a redirect to the latest version of the report
sed -i 's|<a href="." class=|<a href="https://'${BUCKET}'.s3.amazonaws.com/'${REPORT_PREFIX}'/latest/index.html" class=|g' $TEST_OUTPUT/allure/report/app.js
# Upload a history and the final report (in this particular order to not to have duplicated history in 2 places)
aws s3 mv --recursive --only-show-errors "${TEST_OUTPUT}/allure/report/history" "s3://${BUCKET}/${REPORT_PREFIX}/latest/history"
aws s3 mv --recursive --only-show-errors "${TEST_OUTPUT}/allure/report" "s3://${BUCKET}/${REPORT_PREFIX}/${GITHUB_RUN_ID}"
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/${REPORT_PREFIX}/${GITHUB_RUN_ID}/index.html
# Generate redirect
cat <<EOF > ./index.html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Redirecting to ${REPORT_URL}</title>
<meta http-equiv="refresh" content="0; URL=${REPORT_URL}">
EOF
aws s3 cp --only-show-errors ./index.html "s3://${BUCKET}/${REPORT_PREFIX}/latest/index.html"
echo "[Allure Report](${REPORT_URL})" >> ${GITHUB_STEP_SUMMARY}
echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT
- name: Release Allure lock
if: ${{ inputs.action == 'generate' && always() }}
shell: bash -euxo pipefail {0}
env:
LOCK_FILE: reports/${{ steps.calculate-vars.outputs.KEY }}/lock.txt
BUCKET: neon-github-public-dev
TEST_SELECTION: ${{ steps.calculate-vars.outputs.TEST_SELECTION }}
run: |
aws s3 cp --only-show-errors "s3://${BUCKET}/${LOCK_FILE}" ./lock.txt || exit 0
if [ "$(cat lock.txt)" = "${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}-${TEST_SELECTION}" ]; then
aws s3 rm "s3://${BUCKET}/${LOCK_FILE}"
fi
- uses: actions/github-script@v6
if: ${{ inputs.action == 'generate' && always() }}
env:
REPORT_URL: ${{ steps.generate-report.outputs.report-url }}
BUILD_TYPE: ${{ inputs.build_type }}
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
script: |
const { REPORT_URL, BUILD_TYPE, SHA } = process.env
await github.rest.repos.createCommitStatus({
owner: context.repo.owner,
repo: context.repo.repo,
sha: `${SHA}`,
state: 'success',
target_url: `${REPORT_URL}`,
context: `Allure report / ${BUILD_TYPE}`,
})

View File

@@ -37,7 +37,7 @@ runs:
echo 'SKIPPED=true' >> $GITHUB_OUTPUT
exit 0
else
echo 2>&1 "Neither s3://${BUCKET}/${PREFIX}/${FILENAME} nor its version from previous attempts exist"
echo >&2 "Neither s3://${BUCKET}/${PREFIX}/${FILENAME} nor its version from previous attempts exist"
exit 1
fi
fi

View File

@@ -58,7 +58,7 @@ runs:
done
if [ -z "${branch_id}" ] || [ "${branch_id}" == "null" ]; then
echo 2>&1 "Failed to create branch after 10 attempts, the latest response was: ${branch}"
echo >&2 "Failed to create branch after 10 attempts, the latest response was: ${branch}"
exit 1
fi
@@ -122,7 +122,7 @@ runs:
done
if [ -z "${password}" ] || [ "${password}" == "null" ]; then
echo 2>&1 "Failed to reset password after 10 attempts, the latest response was: ${reset_password}"
echo >&2 "Failed to reset password after 10 attempts, the latest response was: ${reset_password}"
exit 1
fi

View File

@@ -48,7 +48,7 @@ runs:
done
if [ -z "${branch_id}" ] || [ "${branch_id}" == "null" ]; then
echo 2>&1 "Failed to delete branch after 10 attempts, the latest response was: ${deleted_branch}"
echo >&2 "Failed to delete branch after 10 attempts, the latest response was: ${deleted_branch}"
exit 1
fi
env:

View File

@@ -14,6 +14,12 @@ inputs:
api_host:
desctiption: 'Neon API host'
default: console.stage.neon.tech
provisioner:
desctiption: 'k8s-pod or k8s-neonvm'
default: 'k8s-pod'
compute_units:
desctiption: '[Min, Max] compute units; Min and Max are used for k8s-neonvm with autoscaling, for k8s-pod values Min and Max should be equal'
default: '[1, 1]'
outputs:
dsn:
@@ -31,6 +37,10 @@ runs:
# A shell without `set -x` to not to expose password/dsn in logs
shell: bash -euo pipefail {0}
run: |
if [ "${PROVISIONER}" == "k8s-pod" ] && [ "${MIN_CU}" != "${MAX_CU}" ]; then
echo >&2 "For k8s-pod provisioner MIN_CU should be equal to MAX_CU"
fi
project=$(curl \
"https://${API_HOST}/api/v2/projects" \
--fail \
@@ -42,6 +52,9 @@ runs:
\"name\": \"Created by actions/neon-project-create; GITHUB_RUN_ID=${GITHUB_RUN_ID}\",
\"pg_version\": ${POSTGRES_VERSION},
\"region_id\": \"${REGION_ID}\",
\"provisioner\": \"${PROVISIONER}\",
\"autoscaling_limit_min_cu\": ${MIN_CU},
\"autoscaling_limit_max_cu\": ${MAX_CU},
\"settings\": { }
}
}")
@@ -62,3 +75,6 @@ runs:
API_KEY: ${{ inputs.api_key }}
REGION_ID: ${{ inputs.region_id }}
POSTGRES_VERSION: ${{ inputs.postgres_version }}
PROVISIONER: ${{ inputs.provisioner }}
MIN_CU: ${{ fromJSON(inputs.compute_units)[0] }}
MAX_CU: ${{ fromJSON(inputs.compute_units)[1] }}

View File

@@ -36,14 +36,14 @@ inputs:
description: 'Region name for real s3 tests'
required: false
default: ''
real_s3_access_key_id:
description: 'Access key id'
rerun_flaky:
description: 'Whether to rerun flaky tests'
required: false
default: ''
real_s3_secret_access_key:
description: 'Secret access key'
default: 'false'
pg_version:
description: 'Postgres version to use for tests'
required: false
default: ''
default: 'v14'
runs:
using: "composite"
@@ -63,12 +63,12 @@ runs:
path: /tmp/neon-previous
prefix: latest
- name: Download compatibility snapshot for Postgres 14
- name: Download compatibility snapshot
if: inputs.build_type != 'remote'
uses: ./.github/actions/download
with:
name: compatibility-snapshot-${{ inputs.build_type }}-pg14
path: /tmp/compatibility_snapshot_pg14
name: compatibility-snapshot-${{ inputs.build_type }}-pg${{ inputs.pg_version }}
path: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}
prefix: latest
- name: Checkout
@@ -96,18 +96,18 @@ runs:
COMPATIBILITY_POSTGRES_DISTRIB_DIR: /tmp/neon-previous/pg_install
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: ${{ inputs.build_type }}
AWS_ACCESS_KEY_ID: ${{ inputs.real_s3_access_key_id }}
AWS_SECRET_ACCESS_KEY: ${{ inputs.real_s3_secret_access_key }}
COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg14
COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}
ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'backward compatibility breakage')
ALLOW_FORWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'forward compatibility breakage')
RERUN_FLAKY: ${{ inputs.rerun_flaky }}
PG_VERSION: ${{ inputs.pg_version }}
shell: bash -euxo pipefail {0}
run: |
# PLATFORM will be embedded in the perf test report
# and it is needed to distinguish different environments
export PLATFORM=${PLATFORM:-github-actions-selfhosted}
export POSTGRES_DISTRIB_DIR=${POSTGRES_DISTRIB_DIR:-/tmp/neon/pg_install}
export DEFAULT_PG_VERSION=${DEFAULT_PG_VERSION:-14}
export DEFAULT_PG_VERSION=${PG_VERSION#v}
if [ "${BUILD_TYPE}" = "remote" ]; then
export REMOTE_ENV=1
@@ -123,8 +123,8 @@ runs:
exit 1
fi
if [[ "${{ inputs.run_in_parallel }}" == "true" ]]; then
# -n4 uses four processes to run tests via pytest-xdist
EXTRA_PARAMS="-n4 $EXTRA_PARAMS"
# -n16 uses sixteen processes to run tests via pytest-xdist
EXTRA_PARAMS="-n16 $EXTRA_PARAMS"
# --dist=loadgroup points tests marked with @pytest.mark.xdist_group
# to the same worker to make @pytest.mark.order work with xdist
@@ -143,6 +143,13 @@ runs:
EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
fi
if [ "${RERUN_FLAKY}" == "true" ]; then
mkdir -p $TEST_OUTPUT
poetry run ./scripts/flaky_tests.py "${TEST_RESULT_CONNSTR}" --days 10 --output "$TEST_OUTPUT/flaky.json"
EXTRA_PARAMS="--flaky-tests-json $TEST_OUTPUT/flaky.json $EXTRA_PARAMS"
fi
if [[ "${{ inputs.build_type }}" == "debug" ]]; then
cov_prefix=(scripts/coverage "--profraw-prefix=$GITHUB_JOB" --dir=/tmp/coverage run)
elif [[ "${{ inputs.build_type }}" == "release" ]]; then
@@ -180,19 +187,18 @@ runs:
scripts/generate_and_push_perf_report.sh
fi
- name: Upload compatibility snapshot for Postgres 14
- name: Upload compatibility snapshot
if: github.ref_name == 'release'
uses: ./.github/actions/upload
with:
name: compatibility-snapshot-${{ inputs.build_type }}-pg14-${{ github.run_id }}
# The path includes a test name (test_create_snapshot) and directory that the test creates (compatibility_snapshot_pg14), keep the path in sync with the test
path: /tmp/test_output/test_create_snapshot/compatibility_snapshot_pg14/
name: compatibility-snapshot-${{ inputs.build_type }}-pg${{ inputs.pg_version }}-${{ github.run_id }}
# Directory is created by test_compatibility.py::test_create_snapshot, keep the path in sync with the test
path: /tmp/test_output/compatibility_snapshot_pg${{ inputs.pg_version }}/
prefix: latest
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
- name: Upload test results
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-store
with:
action: store
build_type: ${{ inputs.build_type }}
test_selection: ${{ inputs.test_selection }}
report-dir: /tmp/test_output/allure/results
unique-key: ${{ inputs.build_type }}

View File

@@ -23,7 +23,7 @@ runs:
mkdir -p $(dirname $ARCHIVE)
if [ -f ${ARCHIVE} ]; then
echo 2>&1 "File ${ARCHIVE} already exist. Something went wrong before"
echo >&2 "File ${ARCHIVE} already exist. Something went wrong before"
exit 1
fi
@@ -33,10 +33,10 @@ runs:
elif [ -f ${SOURCE} ]; then
time tar -cf ${ARCHIVE} --zstd ${SOURCE}
elif ! ls ${SOURCE} > /dev/null 2>&1; then
echo 2>&1 "${SOURCE} does not exist"
echo >&2 "${SOURCE} does not exist"
exit 2
else
echo 2>&1 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
exit 3
fi

View File

@@ -1,5 +0,0 @@
neon_install.tar.gz
.neon_current_version
collections/*
!collections/.keep

View File

@@ -1,12 +0,0 @@
[defaults]
localhost_warning = False
host_key_checking = False
timeout = 30
[ssh_connection]
ssh_args = -F ./ansible.ssh.cfg
# teleport doesn't support sftp yet https://github.com/gravitational/teleport/issues/7127
# and scp neither worked for me
transfer_method = piped
pipelining = True

View File

@@ -1,15 +0,0 @@
# Remove this once https://github.com/gravitational/teleport/issues/10918 is fixed
# (use pre 8.5 option name to cope with old ssh in CI)
PubkeyAcceptedKeyTypes +ssh-rsa-cert-v01@openssh.com
Host tele.zenith.tech
User admin
Port 3023
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
Host * !tele.zenith.tech
User admin
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
ProxyJump tele.zenith.tech

View File

@@ -1,193 +0,0 @@
- name: Upload Neon binaries
hosts: storage
gather_facts: False
remote_user: "{{ remote_user }}"
tasks:
- name: get latest version of Neon binaries
register: current_version_file
set_fact:
current_version: "{{ lookup('file', '.neon_current_version') | trim }}"
tags:
- pageserver
- safekeeper
- name: inform about versions
debug:
msg: "Version to deploy - {{ current_version }}"
tags:
- pageserver
- safekeeper
- name: upload and extract Neon binaries to /usr/local
ansible.builtin.unarchive:
owner: root
group: root
src: neon_install.tar.gz
dest: /usr/local
become: true
tags:
- pageserver
- safekeeper
- binaries
- putbinaries
- name: Deploy pageserver
hosts: pageservers
gather_facts: False
remote_user: "{{ remote_user }}"
tasks:
- name: upload init script
when: console_mgmt_base_url is defined
ansible.builtin.template:
src: scripts/init_pageserver.sh
dest: /tmp/init_pageserver.sh
owner: root
group: root
mode: '0755'
become: true
tags:
- pageserver
- name: init pageserver
shell:
cmd: /tmp/init_pageserver.sh
args:
creates: "/storage/pageserver/data/tenants"
environment:
NEON_REPO_DIR: "/storage/pageserver/data"
LD_LIBRARY_PATH: "/usr/local/v14/lib"
become: true
tags:
- pageserver
- name: read the existing remote pageserver config
ansible.builtin.slurp:
src: /storage/pageserver/data/pageserver.toml
register: _remote_ps_config
tags:
- pageserver
- name: parse the existing pageserver configuration
ansible.builtin.set_fact:
_existing_ps_config: "{{ _remote_ps_config['content'] | b64decode | sivel.toiletwater.from_toml }}"
tags:
- pageserver
- name: construct the final pageserver configuration dict
ansible.builtin.set_fact:
pageserver_config: "{{ pageserver_config_stub | combine({'id': _existing_ps_config.id }) }}"
tags:
- pageserver
- name: template the pageserver config
template:
src: templates/pageserver.toml.j2
dest: /storage/pageserver/data/pageserver.toml
become: true
tags:
- pageserver
- name: upload systemd service definition
ansible.builtin.template:
src: systemd/pageserver.service
dest: /etc/systemd/system/pageserver.service
owner: root
group: root
mode: '0644'
become: true
tags:
- pageserver
- name: start systemd service
ansible.builtin.systemd:
daemon_reload: yes
name: pageserver
enabled: yes
state: restarted
become: true
tags:
- pageserver
- name: post version to console
when: console_mgmt_base_url is defined
shell:
cmd: |
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/pageservers
tags:
- pageserver
- name: Deploy safekeeper
hosts: safekeepers
gather_facts: False
remote_user: "{{ remote_user }}"
tasks:
- name: upload init script
when: console_mgmt_base_url is defined
ansible.builtin.template:
src: scripts/init_safekeeper.sh
dest: /tmp/init_safekeeper.sh
owner: root
group: root
mode: '0755'
become: true
tags:
- safekeeper
- name: init safekeeper
shell:
cmd: /tmp/init_safekeeper.sh
args:
creates: "/storage/safekeeper/data/safekeeper.id"
environment:
NEON_REPO_DIR: "/storage/safekeeper/data"
LD_LIBRARY_PATH: "/usr/local/v14/lib"
become: true
tags:
- safekeeper
# in the future safekeepers should discover pageservers byself
# but currently use first pageserver that was discovered
- name: set first pageserver var for safekeepers
set_fact:
first_pageserver: "{{ hostvars[groups['pageservers'][0]]['inventory_hostname'] }}"
tags:
- safekeeper
- name: upload systemd service definition
ansible.builtin.template:
src: systemd/safekeeper.service
dest: /etc/systemd/system/safekeeper.service
owner: root
group: root
mode: '0644'
become: true
tags:
- safekeeper
- name: start systemd service
ansible.builtin.systemd:
daemon_reload: yes
name: safekeeper
enabled: yes
state: restarted
become: true
tags:
- safekeeper
- name: post version to console
when: console_mgmt_base_url is defined
shell:
cmd: |
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/$INSTANCE_ID | jq '.version = {{ current_version }}' > /tmp/new_version
curl -sfS -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" -X POST -d@/tmp/new_version {{ console_mgmt_base_url }}/management/api/v2/safekeepers
tags:
- safekeeper

View File

@@ -1,42 +0,0 @@
#!/bin/bash
set -e
if [ -n "${DOCKER_TAG}" ]; then
# Verson is DOCKER_TAG but without prefix
VERSION=$(echo $DOCKER_TAG | sed 's/^.*-//g')
else
echo "Please set DOCKER_TAG environment variable"
exit 1
fi
# do initial cleanup
rm -rf neon_install postgres_install.tar.gz neon_install.tar.gz .neon_current_version
mkdir neon_install
# retrieve binaries from docker image
echo "getting binaries from docker image"
docker pull --quiet neondatabase/neon:${DOCKER_TAG}
ID=$(docker create neondatabase/neon:${DOCKER_TAG})
docker cp ${ID}:/data/postgres_install.tar.gz .
tar -xzf postgres_install.tar.gz -C neon_install
mkdir neon_install/bin/
docker cp ${ID}:/usr/local/bin/pageserver neon_install/bin/
docker cp ${ID}:/usr/local/bin/pageserver_binutils neon_install/bin/
docker cp ${ID}:/usr/local/bin/safekeeper neon_install/bin/
docker cp ${ID}:/usr/local/bin/storage_broker neon_install/bin/
docker cp ${ID}:/usr/local/bin/proxy neon_install/bin/
docker cp ${ID}:/usr/local/v14/bin/ neon_install/v14/bin/
docker cp ${ID}:/usr/local/v15/bin/ neon_install/v15/bin/
docker cp ${ID}:/usr/local/v14/lib/ neon_install/v14/lib/
docker cp ${ID}:/usr/local/v15/lib/ neon_install/v15/lib/
docker rm -vf ${ID}
# store version to file (for ansible playbooks) and create binaries tarball
echo ${VERSION} > neon_install/.neon_current_version
echo ${VERSION} > .neon_current_version
tar -czf neon_install.tar.gz -C neon_install .
# do final cleaup
rm -rf neon_install postgres_install.tar.gz

View File

@@ -1,38 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-ap-southeast-1
bucket_region: ap-southeast-1
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.epsilon.ap-southeast-1.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: ap-southeast-1
ansible_aws_ssm_bucket_name: neon-prod-storage-ap-southeast-1
console_region_id: aws-ap-southeast-1
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.ap-southeast-1.aws.neon.tech:
ansible_host: i-064de8ea28bdb495b
pageserver-1.ap-southeast-1.aws.neon.tech:
ansible_host: i-0b180defcaeeb6b93
safekeepers:
hosts:
safekeeper-0.ap-southeast-1.aws.neon.tech:
ansible_host: i-0d6f1dc5161eef894
safekeeper-1.ap-southeast-1.aws.neon.tech:
ansible_host: i-0e338adda8eb2d19f
safekeeper-2.ap-southeast-1.aws.neon.tech:
ansible_host: i-04fb63634e4679eb9

View File

@@ -1,38 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-eu-central-1
bucket_region: eu-central-1
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.gamma.eu-central-1.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: eu-central-1
ansible_aws_ssm_bucket_name: neon-prod-storage-eu-central-1
console_region_id: aws-eu-central-1
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.eu-central-1.aws.neon.tech:
ansible_host: i-0cd8d316ecbb715be
pageserver-1.eu-central-1.aws.neon.tech:
ansible_host: i-090044ed3d383fef0
safekeepers:
hosts:
safekeeper-0.eu-central-1.aws.neon.tech:
ansible_host: i-0b238612d2318a050
safekeeper-1.eu-central-1.aws.neon.tech:
ansible_host: i-07b9c45e5c2637cd4
safekeeper-2.eu-central-1.aws.neon.tech:
ansible_host: i-020257302c3c93d88

View File

@@ -1,39 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-us-east-2
bucket_region: us-east-2
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.delta.us-east-2.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: us-east-2
ansible_aws_ssm_bucket_name: neon-prod-storage-us-east-2
console_region_id: aws-us-east-2
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.us-east-2.aws.neon.tech:
ansible_host: i-062227ba7f119eb8c
pageserver-1.us-east-2.aws.neon.tech:
ansible_host: i-0b3ec0afab5968938
safekeepers:
hosts:
safekeeper-0.us-east-2.aws.neon.tech:
ansible_host: i-0e94224750c57d346
safekeeper-1.us-east-2.aws.neon.tech:
ansible_host: i-06d113fb73bfddeb0
safekeeper-2.us-east-2.aws.neon.tech:
ansible_host: i-09f66c8e04afff2e8

View File

@@ -1,41 +0,0 @@
storage:
vars:
bucket_name: neon-prod-storage-us-west-2
bucket_region: us-west-2
console_mgmt_base_url: http://console-release.local
broker_endpoint: http://storage-broker-lb.eta.us-west-2.internal.aws.neon.tech:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: us-west-2
ansible_aws_ssm_bucket_name: neon-prod-storage-us-west-2
console_region_id: aws-us-west-2-new
sentry_environment: production
children:
pageservers:
hosts:
pageserver-0.us-west-2.aws.neon.tech:
ansible_host: i-0d9f6dfae0e1c780d
pageserver-1.us-west-2.aws.neon.tech:
ansible_host: i-0c834be1dddba8b3f
pageserver-2.us-west-2.aws.neon.tech:
ansible_host: i-051642d372c0a4f32
safekeepers:
hosts:
safekeeper-0.us-west-2.aws.neon.tech:
ansible_host: i-00719d8a74986fda6
safekeeper-1.us-west-2.aws.neon.tech:
ansible_host: i-074682f9d3c712e7c
safekeeper-2.us-west-2.aws.neon.tech:
ansible_host: i-042b7efb1729d7966

View File

@@ -1,40 +0,0 @@
---
storage:
vars:
console_mgmt_base_url: http://console-release.local
bucket_name: zenith-storage-oregon
bucket_region: us-west-2
broker_endpoint: http://storage-broker.prod.local:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-release.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "{{ inventory_hostname }}"
safekeeper_s3_prefix: prod-1/wal
hostname_suffix: ".local"
remote_user: admin
sentry_environment: production
children:
pageservers:
hosts:
zenith-1-ps-2:
console_region_id: aws-us-west-2
zenith-1-ps-3:
console_region_id: aws-us-west-2
zenith-1-ps-4:
console_region_id: aws-us-west-2
zenith-1-ps-5:
console_region_id: aws-us-west-2
safekeepers:
hosts:
zenith-1-sk-1:
console_region_id: aws-us-west-2
zenith-1-sk-2:
console_region_id: aws-us-west-2
zenith-1-sk-4:
console_region_id: aws-us-west-2

View File

@@ -1,33 +0,0 @@
#!/bin/sh
# fetch params from meta-data service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# store fqdn hostname in var
HOST=$(hostname -f)
cat <<EOF | tee /tmp/payload
{
"version": 1,
"host": "${HOST}",
"port": 6400,
"region_id": "{{ console_region_id }}",
"instance_id": "${INSTANCE_ID}",
"http_host": "${HOST}",
"http_port": 9898,
"active": false,
"availability_zone_id": "${AZ_ID}"
}
EOF
# check if pageserver already registered or not
if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers/${INSTANCE_ID} -o /dev/null; then
# not registered, so register it now
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/pageservers -d@/tmp/payload | jq -r '.id')
# init pageserver
sudo -u pageserver /usr/local/bin/pageserver -c "id=${ID}" -c "pg_distrib_dir='/usr/local'" --init -D /storage/pageserver/data
fi

View File

@@ -1,31 +0,0 @@
#!/bin/sh
# fetch params from meta-data service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ_ID=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
# store fqdn hostname in var
HOST=$(hostname -f)
cat <<EOF | tee /tmp/payload
{
"version": 1,
"host": "${HOST}",
"port": 6500,
"http_port": 7676,
"region_id": "{{ console_region_id }}",
"instance_id": "${INSTANCE_ID}",
"availability_zone_id": "${AZ_ID}",
"active": false
}
EOF
# check if safekeeper already registered or not
if ! curl -sf -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers/${INSTANCE_ID} -o /dev/null; then
# not registered, so register it now
ID=$(curl -sf -X POST -H "Authorization: Bearer {{ CONSOLE_API_TOKEN }}" {{ console_mgmt_base_url }}/management/api/v2/safekeepers -d@/tmp/payload | jq -r '.id')
# init safekeeper
sudo -u safekeeper /usr/local/bin/safekeeper --id ${ID} --init -D /storage/safekeeper/data
fi

View File

@@ -1,2 +0,0 @@
ansible_connection: aws_ssm
ansible_python_interpreter: /usr/bin/python3

View File

@@ -1,36 +0,0 @@
storage:
vars:
bucket_name: neon-dev-storage-eu-west-1
bucket_region: eu-west-1
console_mgmt_base_url: http://console-staging.local
broker_endpoint: http://storage-broker-lb.zeta.eu-west-1.internal.aws.neon.build:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-staging.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: eu-west-1
ansible_aws_ssm_bucket_name: neon-dev-storage-eu-west-1
console_region_id: aws-eu-west-1
sentry_environment: staging
children:
pageservers:
hosts:
pageserver-0.eu-west-1.aws.neon.build:
ansible_host: i-01d496c5041c7f34c
safekeepers:
hosts:
safekeeper-0.eu-west-1.aws.neon.build:
ansible_host: i-05226ef85722831bf
safekeeper-1.eu-west-1.aws.neon.build:
ansible_host: i-06969ee1bf2958bfc
safekeeper-2.eu-west-1.aws.neon.build:
ansible_host: i-087892e9625984a0b

View File

@@ -1,42 +0,0 @@
storage:
vars:
bucket_name: neon-staging-storage-us-east-2
bucket_region: us-east-2
console_mgmt_base_url: http://console-staging.local
broker_endpoint: http://storage-broker-lb.beta.us-east-2.internal.aws.neon.build:50051
pageserver_config_stub:
pg_distrib_dir: /usr/local
metric_collection_endpoint: http://console-staging.local/billing/api/v1/usage_events
metric_collection_interval: 10min
remote_storage:
bucket_name: "{{ bucket_name }}"
bucket_region: "{{ bucket_region }}"
prefix_in_bucket: "pageserver/v1"
safekeeper_s3_prefix: safekeeper/v1/wal
hostname_suffix: ""
remote_user: ssm-user
ansible_aws_ssm_region: us-east-2
ansible_aws_ssm_bucket_name: neon-staging-storage-us-east-2
console_region_id: aws-us-east-2
sentry_environment: staging
children:
pageservers:
hosts:
pageserver-0.us-east-2.aws.neon.build:
ansible_host: i-0c3e70929edb5d691
pageserver-1.us-east-2.aws.neon.build:
ansible_host: i-0565a8b4008aa3f40
pageserver-2.us-east-2.aws.neon.build:
ansible_host: i-01e31cdf7e970586a
pageserver-3.us-east-2.aws.neon.build:
ansible_host: i-0602a0291365ef7cc
safekeepers:
hosts:
safekeeper-0.us-east-2.aws.neon.build:
ansible_host: i-027662bd552bf5db0
safekeeper-1.us-east-2.aws.neon.build:
ansible_host: i-0171efc3604a7b907
safekeeper-2.us-east-2.aws.neon.build:
ansible_host: i-0de0b03a51676a6ce

View File

@@ -1,18 +0,0 @@
[Unit]
Description=Neon pageserver
After=network.target auditd.service
[Service]
Type=simple
User=pageserver
Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/pageserver LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_PAGESERVER }} SENTRY_ENVIRONMENT={{ sentry_environment }}
ExecStart=/usr/local/bin/pageserver -c "pg_distrib_dir='/usr/local'" -c "listen_pg_addr='0.0.0.0:6400'" -c "listen_http_addr='0.0.0.0:9898'" -c "broker_endpoint='{{ broker_endpoint }}'" -D /storage/pageserver/data
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
Restart=on-failure
TimeoutSec=10
LimitNOFILE=30000000
[Install]
WantedBy=multi-user.target

View File

@@ -1,18 +0,0 @@
[Unit]
Description=Neon safekeeper
After=network.target auditd.service
[Service]
Type=simple
User=safekeeper
Environment=RUST_BACKTRACE=1 NEON_REPO_DIR=/storage/safekeeper/data LD_LIBRARY_PATH=/usr/local/v14/lib SENTRY_DSN={{ SENTRY_URL_SAFEKEEPER }} SENTRY_ENVIRONMENT={{ sentry_environment }}
ExecStart=/usr/local/bin/safekeeper -l {{ inventory_hostname }}{{ hostname_suffix }}:6500 --listen-http {{ inventory_hostname }}{{ hostname_suffix }}:7676 -D /storage/safekeeper/data --broker-endpoint={{ broker_endpoint }} --remote-storage='{bucket_name="{{bucket_name}}", bucket_region="{{bucket_region}}", prefix_in_bucket="{{ safekeeper_s3_prefix }}"}'
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
Restart=on-failure
TimeoutSec=10
LimitNOFILE=30000000
[Install]
WantedBy=multi-user.target

View File

@@ -1 +0,0 @@
{{ pageserver_config | sivel.toiletwater.to_toml }}

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-staging.local/management/api/v2"
domain: "*.eu-west-1.aws.neon.build"
sentryEnvironment: "staging"
wssPort: 8443
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: dev
zenith_region: eu-west-1
zenith_region_slug: eu-west-1
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: eu-west-1.aws.neon.build
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: staging
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.zeta.eu-west-1.internal.aws.neon.build
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "staging"

View File

@@ -1,67 +0,0 @@
# Helm chart values for neon-proxy-link.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "link"
authEndpoint: "https://console.stage.neon.tech/authenticate_proxy_request/"
uri: "https://console.stage.neon.tech/psql_session/"
sentryEnvironment: "staging"
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy-link pods
podLabels:
zenith_service: proxy
zenith_env: dev
zenith_region: us-east-2
zenith_region_slug: us-east-2
service:
type: LoadBalancer
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
external-dns.alpha.kubernetes.io/hostname: neon-proxy-link-mgmt.beta.us-east-2.aws.neon.build
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: neon-proxy-link.beta.us-east-2.aws.neon.build
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-staging.local/management/api/v2"
domain: "*.cloud.stage.neon.tech"
sentryEnvironment: "staging"
wssPort: 8443
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram-legacy
zenith_env: dev
zenith_region: us-east-2
zenith_region_slug: us-east-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: neon-proxy-scram-legacy.beta.us-east-2.aws.neon.build
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-staging.local/management/api/v2"
domain: "*.us-east-2.aws.neon.build"
sentryEnvironment: "staging"
wssPort: 8443
metricCollectionEndpoint: "http://console-staging.local/billing/api/v1/usage_events"
metricCollectionInterval: "1min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: dev
zenith_region: us-east-2
zenith_region_slug: us-east-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: us-east-2.aws.neon.build
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: staging
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.beta.us-east-2.internal.aws.neon.build
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "staging"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.ap-southeast-1.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: ap-southeast-1
zenith_region_slug: ap-southeast-1
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: ap-southeast-1.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.epsilon.ap-southeast-1.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.eu-central-1.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: eu-central-1
zenith_region_slug: eu-central-1
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: eu-central-1.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.gamma.eu-central-1.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.us-east-2.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: us-east-2
zenith_region_slug: us-east-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: us-east-2.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.delta.us-east-2.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,61 +0,0 @@
# Helm chart values for neon-proxy-scram.
# This is a YAML-formatted file.
image:
repository: neondatabase/neon
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.us-west-2.aws.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
# -- Additional labels for neon-proxy pods
podLabels:
zenith_service: proxy-scram
zenith_env: prod
zenith_region: us-west-2
zenith_region_slug: us-west-2
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: us-west-2.aws.neon.tech
httpsPort: 443
#metrics:
# enabled: true
# serviceMonitor:
# enabled: true
# selector:
# release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,52 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker-lb.eta.us-west-2.internal.aws.neon.tech
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: false
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,56 +0,0 @@
# Helm chart values for neon-storage-broker
podLabels:
neon_env: production
neon_service: storage-broker
# Use L4 LB
service:
# service.annotations -- Annotations to add to the service
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external # use newer AWS Load Balancer Controller
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal # deploy LB to private subnet
# assign service to this name at external-dns
external-dns.alpha.kubernetes.io/hostname: storage-broker.prod.local
# service.type -- Service type
type: LoadBalancer
# service.port -- broker listen port
port: 50051
ingress:
enabled: false
metrics:
enabled: true
serviceMonitor:
enabled: true
selector:
release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-storage-broker.fullname\" . }}"
labels:
helm.sh/chart: neon-storage-broker-{{ .Chart.Version }}
app.kubernetes.io/name: neon-storage-broker
app.kubernetes.io/instance: neon-storage-broker
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-storage-broker"
endpoints:
- port: broker
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"
settings:
sentryEnvironment: "production"

View File

@@ -1,54 +0,0 @@
settings:
authBackend: "console"
authEndpoint: "http://console-release.local/management/api/v2"
domain: "*.cloud.neon.tech"
sentryEnvironment: "production"
wssPort: 8443
metricCollectionEndpoint: "http://console-release.local/billing/api/v1/usage_events"
metricCollectionInterval: "10min"
podLabels:
zenith_service: proxy-scram
zenith_env: production
zenith_region: us-west-2
zenith_region_slug: oregon
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: '*.cloud.neon.tech'
httpsPort: 443
metrics:
enabled: true
serviceMonitor:
enabled: true
selector:
release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,59 +0,0 @@
settings:
authBackend: "link"
authEndpoint: "https://console.neon.tech/authenticate_proxy_request/"
uri: "https://console.neon.tech/psql_session/"
sentryEnvironment: "production"
# -- Additional labels for zenith-proxy pods
podLabels:
zenith_service: proxy
zenith_env: production
zenith_region: us-west-2
zenith_region_slug: oregon
service:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internal
external-dns.alpha.kubernetes.io/hostname: proxy-release.local
type: LoadBalancer
exposedService:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
external-dns.alpha.kubernetes.io/hostname: connect.neon.tech,pg.neon.tech
metrics:
enabled: true
serviceMonitor:
enabled: true
selector:
release: kube-prometheus-stack
extraManifests:
- apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
name: "{{ include \"neon-proxy.fullname\" . }}"
labels:
helm.sh/chart: neon-proxy-{{ .Chart.Version }}
app.kubernetes.io/name: neon-proxy
app.kubernetes.io/instance: "{{ include \"neon-proxy.fullname\" . }}"
app.kubernetes.io/version: "{{ .Chart.AppVersion }}"
app.kubernetes.io/managed-by: Helm
namespace: "{{ .Release.Namespace }}"
spec:
selector:
matchLabels:
app.kubernetes.io/name: "neon-proxy"
endpoints:
- port: http
path: /metrics
interval: 10s
scrapeTimeout: 10s
namespaceSelector:
matchNames:
- "{{ .Release.Namespace }}"

View File

@@ -1,10 +1,14 @@
## Describe your changes
## Problem
## Issue ticket number and link
## Summary of changes
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above checklist

View File

@@ -16,12 +16,12 @@ on:
workflow_dispatch: # adds ability to run this manually
inputs:
region_id:
description: 'Use a particular region. If not set the default region will be used'
description: 'Project region id. If not set, the default region will be used'
required: false
default: 'aws-us-east-2'
save_perf_report:
type: boolean
description: 'Publish perf report or not. If not set, the report is published only for the main branch'
description: 'Publish perf report. If not set, the report will be published only for the main branch'
required: false
defaults:
@@ -30,7 +30,7 @@ defaults:
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
jobs:
@@ -42,7 +42,7 @@ jobs:
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: "neon-staging"
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -92,11 +92,8 @@ jobs:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -107,25 +104,66 @@ jobs:
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
generate-matrices:
# Create matrices for the benchmarking jobs, so we run benchmarks on rds only once a week (on Saturday)
#
# Available platforms:
# - neon-captest-new: Freshly created project (1 CU)
# - neon-captest-freetier: Use freetier-sized compute (0.25 CU)
# - neon-captest-reuse: Reusing existing project
# - rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# - rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
runs-on: ubuntu-latest
outputs:
pgbench-compare-matrix: ${{ steps.pgbench-compare-matrix.outputs.matrix }}
olap-compare-matrix: ${{ steps.olap-compare-matrix.outputs.matrix }}
steps:
- name: Generate matrix for pgbench benchmark
id: pgbench-compare-matrix
run: |
matrix='{
"platform": [
"neon-captest-new",
"neon-captest-reuse",
"neonvm-captest-new"
],
"db_size": [ "10gb" ],
"include": [{ "platform": "neon-captest-freetier", "db_size": "3gb" },
{ "platform": "neon-captest-new", "db_size": "50gb" },
{ "platform": "neonvm-captest-freetier", "db_size": "3gb" },
{ "platform": "neonvm-captest-new", "db_size": "50gb" }]
}'
if [ "$(date +%A)" = "Saturday" ]; then
matrix=$(echo $matrix | jq '.include += [{ "platform": "rds-postgres", "db_size": "10gb"},
{ "platform": "rds-aurora", "db_size": "50gb"}]')
fi
echo "matrix=$(echo $matrix | jq --compact-output '.')" >> $GITHUB_OUTPUT
- name: Generate matrix for OLAP benchmarks
id: olap-compare-matrix
run: |
matrix='{
"platform": [
"neon-captest-reuse"
]
}'
if [ "$(date +%A)" = "Saturday" ]; then
matrix=$(echo $matrix | jq '.include += [{ "platform": "rds-postgres" },
{ "platform": "rds-aurora" }]')
fi
echo "matrix=$(echo $matrix | jq --compact-output '.')" >> $GITHUB_OUTPUT
pgbench-compare:
needs: [ generate-matrices ]
strategy:
fail-fast: false
matrix:
# neon-captest-new: Run pgbench in a freshly created project
# neon-captest-reuse: Same, but reusing existing project
# neon-captest-prefetch: Same, with prefetching enabled (new project)
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-reuse, neon-captest-prefetch, rds-postgres ]
db_size: [ 10gb ]
runner: [ us-east-2 ]
include:
- platform: neon-captest-prefetch
db_size: 50gb
runner: us-east-2
- platform: rds-aurora
db_size: 50gb
runner: us-east-2
matrix: ${{fromJson(needs.generate-matrices.outputs.pgbench-compare-matrix)}}
env:
TEST_PG_BENCH_DURATIONS_MATRIX: "60m"
@@ -134,15 +172,16 @@ jobs:
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, "${{ matrix.runner }}", x64 ]
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
# Increase timeout to 8h, default timeout is 6h
timeout-minutes: 480
steps:
- uses: actions/checkout@v3
@@ -160,13 +199,15 @@ jobs:
echo "${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/bin" >> $GITHUB_PATH
- name: Create Neon Project
if: contains(fromJson('["neon-captest-new", "neon-captest-prefetch"]'), matrix.platform)
if: contains(fromJson('["neon-captest-new", "neon-captest-freetier", "neonvm-captest-new", "neonvm-captest-freetier"]'), matrix.platform)
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
region_id: ${{ github.event.inputs.region_id || 'aws-us-east-2' }}
postgres_version: ${{ env.DEFAULT_PG_VERSION }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
compute_units: ${{ (matrix.platform == 'neon-captest-freetier' && '[0.25, 0.25]') || '[1, 1]' }}
provisioner: ${{ (contains(matrix.platform, 'neonvm-') && 'k8s-neonvm') || 'k8s-pod' }}
- name: Set up Connection String
id: set-up-connstr
@@ -175,7 +216,7 @@ jobs:
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}
;;
neon-captest-new | neon-captest-prefetch)
neon-captest-new | neon-captest-freetier | neonvm-captest-new | neonvm-captest-freetier)
CONNSTR=${{ steps.create-neon-project.outputs.dsn }}
;;
rds-aurora)
@@ -185,7 +226,7 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'neon-captest-new', 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}"
exit 1
;;
esac
@@ -194,17 +235,6 @@ jobs:
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Benchmark init
uses: ./.github/actions/run-python-test-set
with:
@@ -252,11 +282,8 @@ jobs:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -275,23 +302,19 @@ jobs:
#
# *_CLICKBENCH_CONNSTR: Genuine ClickBench DB with ~100M rows
# *_CLICKBENCH_10M_CONNSTR: DB with the first 10M rows of ClickBench DB
if: success() || failure()
needs: [ pgbench-compare ]
if: ${{ !cancelled() }}
needs: [ generate-matrices, pgbench-compare ]
strategy:
fail-fast: false
matrix:
# neon-captest-prefetch: We have pre-created projects with prefetch enabled
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]
matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -299,8 +322,6 @@ jobs:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
steps:
- uses: actions/checkout@v3
@@ -320,7 +341,7 @@ jobs:
id: set-up-connstr
run: |
case "${PLATFORM}" in
neon-captest-prefetch)
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CLICKBENCH_10M_CONNSTR }}
;;
rds-aurora)
@@ -330,7 +351,7 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CLICKBENCH_10M_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"
exit 1
;;
esac
@@ -339,17 +360,6 @@ jobs:
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: ClickBench benchmark
uses: ./.github/actions/run-python-test-set
with:
@@ -364,11 +374,8 @@ jobs:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -386,23 +393,19 @@ jobs:
# We might change it after https://github.com/neondatabase/neon/issues/2900.
#
# *_TPCH_S10_CONNSTR: DB generated with scale factor 10 (~10 GB)
if: success() || failure()
needs: [ clickbench-compare ]
if: ${{ !cancelled() }}
needs: [ generate-matrices, clickbench-compare ]
strategy:
fail-fast: false
matrix:
# neon-captest-prefetch: We have pre-created projects with prefetch enabled
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]
matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -410,8 +413,6 @@ jobs:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
steps:
- uses: actions/checkout@v3
@@ -431,7 +432,7 @@ jobs:
id: set-up-connstr
run: |
case "${PLATFORM}" in
neon-captest-prefetch)
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_TPCH_S10_CONNSTR }}
;;
rds-aurora)
@@ -441,7 +442,7 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_TPCH_S10_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"
exit 1
;;
esac
@@ -450,17 +451,6 @@ jobs:
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Run TPC-H benchmark
uses: ./.github/actions/run-python-test-set
with:
@@ -475,11 +465,8 @@ jobs:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
@@ -491,23 +478,19 @@ jobs:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
user-examples-compare:
if: success() || failure()
needs: [ tpch-compare ]
if: ${{ !cancelled() }}
needs: [ generate-matrices, tpch-compare ]
strategy:
fail-fast: false
matrix:
# neon-captest-prefetch: We have pre-created projects with prefetch enabled
# rds-aurora: Aurora Postgres Serverless v2 with autoscaling from 0.5 to 2 ACUs
# rds-postgres: RDS Postgres db.m5.large instance (2 vCPU, 8 GiB) with gp3 EBS storage
platform: [ neon-captest-prefetch, rds-postgres, rds-aurora ]
matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
DEFAULT_PG_VERSION: 14
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref == 'refs/heads/main' ) }}
SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}
PLATFORM: ${{ matrix.platform }}
runs-on: [ self-hosted, us-east-2, x64 ]
@@ -515,8 +498,6 @@ jobs:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
timeout-minutes: 360 # 6h
steps:
- uses: actions/checkout@v3
@@ -536,7 +517,7 @@ jobs:
id: set-up-connstr
run: |
case "${PLATFORM}" in
neon-captest-prefetch)
neon-captest-reuse)
CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_CAPTEST_CONNSTR }}
;;
rds-aurora)
@@ -546,7 +527,7 @@ jobs:
CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_RDS_POSTGRES_CONNSTR }}
;;
*)
echo 2>&1 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-prefetch', 'rds-aurora', or 'rds-postgres'"
echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neon-captest-reuse', 'rds-aurora', or 'rds-postgres'"
exit 1
;;
esac
@@ -555,17 +536,6 @@ jobs:
psql ${CONNSTR} -c "SELECT version();"
- name: Set database options
if: matrix.platform == 'neon-captest-prefetch'
run: |
DB_NAME=$(psql ${BENCHMARK_CONNSTR} --no-align --quiet -t -c "SELECT current_database()")
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET enable_seqscan_prefetch=on"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET effective_io_concurrency=32"
psql ${BENCHMARK_CONNSTR} -c "ALTER DATABASE ${DB_NAME} SET maintenance_io_concurrency=32"
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Run user examples
uses: ./.github/actions/run-python-test-set
with:
@@ -580,17 +550,14 @@ jobs:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
- name: Create Allure report
if: success() || failure()
uses: ./.github/actions/allure-report
with:
action: generate
build_type: ${{ env.BUILD_TYPE }}
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
with:
channel-id: "C033QLM5P7D" # dev-staging-stream
slack-message: "Periodic TPC-H perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
slack-message: "Periodic User example perf testing ${{ matrix.platform }}: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

File diff suppressed because it is too large Load Diff

View File

@@ -4,6 +4,7 @@ on:
push:
branches:
- main
pull_request:
defaults:
run:
@@ -11,7 +12,7 @@ defaults:
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
env:
@@ -20,6 +21,7 @@ env:
jobs:
check-macos-build:
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')
timeout-minutes: 90
runs-on: macos-latest
@@ -51,14 +53,14 @@ jobs:
uses: actions/cache@v3
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
key: v1-${{ runner.os }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v3
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ matrix.build_type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
key: v1-${{ runner.os }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Set extra env for macOS
run: |
@@ -93,11 +95,16 @@ jobs:
run: ./run_clippy.sh
gather-rust-build-stats:
timeout-minutes: 90
runs-on: ubuntu-latest
if: github.ref_name == 'main' || contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats')
runs-on: [ self-hosted, gen3, large ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:pinned
options: --init
env:
BUILD_TYPE: release
# remove the cachepot wrapper and build without crate caches
RUSTC_WRAPPER: ""
# build with incremental compilation produce partial results
# so do not attempt to cache this build, also disable the incremental compilation
CARGO_INCREMENTAL: 0
@@ -109,11 +116,6 @@ jobs:
submodules: true
fetch-depth: 1
- name: Install Ubuntu postgres dependencies
run: |
sudo apt update
sudo apt install build-essential libreadline-dev zlib1g-dev flex bison libseccomp-dev libssl-dev protobuf-compiler
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
@@ -122,7 +124,31 @@ jobs:
run: cargo build --all --release --timings
- name: Upload the build stats
uses: actions/upload-artifact@v3
id: upload-stats
env:
BUCKET: neon-github-public-dev
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
run: |
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/build-stats/${SHA}/${GITHUB_RUN_ID}/cargo-timing.html
aws s3 cp --only-show-errors ./target/cargo-timings/cargo-timing.html "s3://${BUCKET}/build-stats/${SHA}/${GITHUB_RUN_ID}/"
echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT
- name: Publish build stats report
uses: actions/github-script@v6
env:
REPORT_URL: ${{ steps.upload-stats.outputs.report-url }}
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
name: neon-${{ runner.os }}-release-build-stats
path: ./target/cargo-timings/
script: |
const { REPORT_URL, SHA } = process.env
await github.rest.repos.createCommitStatus({
owner: context.repo.owner,
repo: context.repo.repo,
sha: `${SHA}`,
state: 'success',
target_url: `${REPORT_URL}`,
context: `Build stats (release)`,
})

View File

@@ -14,7 +14,7 @@ on:
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.ref == 'refs/heads/main' && github.sha || 'anysha' }}
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
jobs:

34
.github/workflows/release.yml vendored Normal file
View File

@@ -0,0 +1,34 @@
name: Create Release Branch
on:
schedule:
- cron: '0 10 * * 2'
workflow_dispatch:
jobs:
create_release_branch:
runs-on: [ubuntu-latest]
steps:
- name: Check out code
uses: actions/checkout@v3
with:
ref: main
- name: Get current date
id: date
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Create release branch
run: git checkout -b releases/${{ steps.date.outputs.date }}
- name: Push new branch
run: git push origin releases/${{ steps.date.outputs.date }}
- name: Create pull request into release
uses: thomaseizinger/create-pull-request@e3972219c86a56550fb70708d96800d8e24ba862 # 1.3.0
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
head: releases/${{ steps.date.outputs.date }}
base: release
title: Release ${{ steps.date.outputs.date }}

4
.neon_clippy_args Normal file
View File

@@ -0,0 +1,4 @@
# * `-A unknown_lints` do not warn about unknown lint suppressions
# that people with newer toolchains might use
# * `-D warnings` - fail on any warnings (`cargo` returns non-zero exit status)
export CLIPPY_COMMON_ARGS="--locked --workspace --all-targets -- -A unknown_lints -D warnings"

View File

@@ -2,7 +2,7 @@
Howdy! Usual good software engineering practices apply. Write
tests. Write comments. Follow standard Rust coding practices where
possible. Use 'cargo fmt' and 'clippy' to tidy up formatting.
possible. Use `cargo fmt` and `cargo clippy` to tidy up formatting.
There are soft spots in the code, which could use cleanup,
refactoring, additional comments, and so forth. Let's try to raise the

2258
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -3,11 +3,26 @@ members = [
"compute_tools",
"control_plane",
"pageserver",
"pageserver/ctl",
"proxy",
"safekeeper",
"storage_broker",
"workspace_hack",
"libs/*",
"trace",
"libs/compute_api",
"libs/pageserver_api",
"libs/postgres_ffi",
"libs/safekeeper_api",
"libs/utils",
"libs/consumption_metrics",
"libs/postgres_backend",
"libs/pq_proto",
"libs/tenant_size_model",
"libs/metrics",
"libs/postgres_connection",
"libs/remote_storage",
"libs/tracing-utils",
"libs/postgres_ffi/wal_craft",
]
[workspace.package]
@@ -19,24 +34,27 @@ license = "Apache-2.0"
anyhow = { version = "1.0", features = ["backtrace"] }
async-stream = "0.3"
async-trait = "0.1"
atty = "0.2.14"
aws-config = { version = "0.51.0", default-features = false, features=["rustls"] }
aws-sdk-s3 = "0.21.0"
aws-smithy-http = "0.51.0"
aws-types = "0.51.0"
aws-config = { version = "0.55", default-features = false, features=["rustls"] }
aws-sdk-s3 = "0.27"
aws-smithy-http = "0.55"
aws-credential-types = "0.55"
aws-types = "0.55"
base64 = "0.13.0"
bincode = "1.3"
bindgen = "0.61"
bindgen = "0.65"
bstr = "1.0"
byteorder = "1.4"
bytes = "1.0"
chrono = { version = "0.4", default-features = false, features = ["clock"] }
clap = "4.0"
clap = { version = "4.0", features = ["derive"] }
close_fds = "0.3.2"
comfy-table = "6.1"
const_format = "0.2"
crc32c = "0.6"
crossbeam-utils = "0.8.5"
either = "1.8"
enum-map = "2.4.2"
enumset = "1.0.12"
fail = "0.5.0"
fs2 = "0.4.3"
futures = "0.3"
@@ -44,8 +62,9 @@ futures-core = "0.3"
futures-util = "0.3"
git-version = "0.3"
hashbrown = "0.13"
hashlink = "0.8.1"
hex = "0.4"
hex-literal = "0.3"
hex-literal = "0.4"
hmac = "0.12.1"
hostname = "0.3.1"
humantime = "2.1"
@@ -57,67 +76,87 @@ jsonwebtoken = "8"
libc = "0.2"
md5 = "0.7.0"
memoffset = "0.8"
native-tls = "0.2"
nix = "0.26"
notify = "5.0.0"
num_cpus = "1.15"
num-traits = "0.2.15"
once_cell = "1.13"
opentelemetry = "0.18.0"
opentelemetry-otlp = { version = "0.11.0", default_features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.10.0"
parking_lot = "0.12"
pbkdf2 = "0.12.1"
pin-project-lite = "0.2"
prometheus = {version = "0.13", default_features=false, features = ["process"]} # removes protobuf dependency
prost = "0.11"
rand = "0.8"
regex = "1.4"
reqwest = { version = "0.11", default-features = false, features = ["rustls-tls"] }
reqwest-tracing = { version = "0.4.0", features = ["opentelemetry_0_18"] }
reqwest-middleware = "0.2.0"
reqwest-retry = "0.2.2"
routerify = "3"
rpds = "0.12.0"
rpds = "0.13"
rustls = "0.20"
rustls-pemfile = "1"
rustls-split = "0.3"
scopeguard = "1.1"
sentry = { version = "0.29", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
sentry = { version = "0.30", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"
serde_with = "2.0"
sha2 = "0.10.2"
signal-hook = "0.3"
socket2 = "0.4.4"
socket2 = "0.5"
strum = "0.24"
strum_macros = "0.24"
svg_fmt = "0.4.1"
sync_wrapper = "0.1.2"
tar = "0.4"
test-context = "0.1"
thiserror = "1.0"
tls-listener = { version = "0.6", features = ["rustls", "hyper-h1"] }
tokio = { version = "1.17", features = ["macros"] }
tokio-io-timeout = "1.2.0"
tokio-postgres-rustls = "0.9.0"
tokio-rustls = "0.23"
tokio-stream = "0.1"
tokio-util = { version = "0.7", features = ["io"] }
toml = "0.5"
toml_edit = { version = "0.17", features = ["easy"] }
tonic = {version = "0.8", features = ["tls", "tls-roots"]}
toml = "0.7"
toml_edit = "0.19"
tonic = {version = "0.9", features = ["tls", "tls-roots"]}
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tracing-error = "0.2.0"
tracing-opentelemetry = "0.18.0"
tracing-subscriber = { version = "0.3", default_features = false, features = ["smallvec", "fmt", "tracing-log", "std", "env-filter"] }
url = "2.2"
uuid = { version = "1.2", features = ["v4", "serde"] }
walkdir = "2.3.2"
webpki-roots = "0.22.5"
x509-parser = "0.14"
webpki-roots = "0.23"
x509-parser = "0.15"
## TODO replace this with tracing
env_logger = "0.10"
log = "0.4"
## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="1aaedab101b23f7612042850d8f2036810fa7c7f" }
postgres-native-tls = { git = "https://github.com/neondatabase/rust-postgres.git", rev="1aaedab101b23f7612042850d8f2036810fa7c7f" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev="1aaedab101b23f7612042850d8f2036810fa7c7f" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev="1aaedab101b23f7612042850d8f2036810fa7c7f" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="1aaedab101b23f7612042850d8f2036810fa7c7f" }
tokio-tar = { git = "https://github.com/neondatabase/tokio-tar.git", rev="404df61437de0feef49ba2ccdbdd94eb8ad6e142" }
## Other git libraries
heapless = { default-features=false, features=[], git = "https://github.com/japaric/heapless.git", rev = "644653bf3b831c6bb4963be2de24804acf5e5001" } # upstream release pending
## Local libraries
compute_api = { version = "0.1", path = "./libs/compute_api/" }
consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }
metrics = { version = "0.1", path = "./libs/metrics/" }
pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }
postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }
postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }
postgres_ffi = { version = "0.1", path = "./libs/postgres_ffi/" }
pq_proto = { version = "0.1", path = "./libs/pq_proto/" }
@@ -125,22 +164,29 @@ remote_storage = { version = "0.1", path = "./libs/remote_storage/" }
safekeeper_api = { version = "0.1", path = "./libs/safekeeper_api" }
storage_broker = { version = "0.1", path = "./storage_broker/" } # Note: main broker code is inside the binary crate, so linking with the library shouldn't be heavy.
tenant_size_model = { version = "0.1", path = "./libs/tenant_size_model/" }
tracing-utils = { version = "0.1", path = "./libs/tracing-utils/" }
utils = { version = "0.1", path = "./libs/utils/" }
## Common library dependency
workspace_hack = { version = "0.1", path = "./workspace_hack/" }
## Build dependencies
criterion = "0.4"
criterion = "0.5.1"
rcgen = "0.10"
rstest = "0.16"
tempfile = "3.2"
tonic-build = "0.8"
rstest = "0.17"
tempfile = "3.4"
tonic-build = "0.9"
[patch.crates-io]
# This is only needed for proxy's tests.
# TODO: we should probably fork `tokio-postgres-rustls` instead.
[patch.crates-io]
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="43e6db254a97fdecbce33d8bc0890accfd74495e" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev="1aaedab101b23f7612042850d8f2036810fa7c7f" }
# Changes the MAX_THREADS limit from 4096 to 32768.
# This is a temporary workaround for using tracing from many threads in safekeepers code,
# until async safekeepers patch is merged to the main.
sharded-slab = { git = "https://github.com/neondatabase/sharded-slab.git", rev="98d16753ab01c61f0a028de44167307a00efea00" }
################# Binary contents sections

View File

@@ -2,7 +2,7 @@
### The image itself is mainly used as a container for the binaries and for starting e2e tests with custom parameters.
### By default, the binaries inside the image have some mock parameters and can start, but are not intended to be used
### inside this image in the real deployments.
ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com
ARG REPOSITORY=neondatabase
ARG IMAGE=rust
ARG TAG=pinned
@@ -39,12 +39,19 @@ ARG CACHEPOT_BUCKET=neon-github-dev
COPY --from=pg-build /home/nonroot/pg_install/v14/include/postgresql/server pg_install/v14/include/postgresql/server
COPY --from=pg-build /home/nonroot/pg_install/v15/include/postgresql/server pg_install/v15/include/postgresql/server
COPY . .
COPY --chown=nonroot . .
# Show build caching stats to check if it was used in the end.
# Has to be the part of the same RUN since cachepot daemon is killed in the end of this RUN, losing the compilation stats.
RUN set -e \
&& mold -run cargo build --bin pageserver --bin pageserver_binutils --bin draw_timeline_dir --bin safekeeper --bin storage_broker --bin proxy --locked --release \
&& mold -run cargo build \
--bin pg_sni_router \
--bin pageserver \
--bin pagectl \
--bin safekeeper \
--bin storage_broker \
--bin proxy \
--locked --release \
&& cachepot -s
# Build final image
@@ -63,9 +70,9 @@ RUN set -e \
&& useradd -d /data neon \
&& chown -R neon:neon /data
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pg_sni_router /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pageserver_binutils /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/draw_timeline_dir /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/pagectl /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/safekeeper /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_broker /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/proxy /usr/local/bin

835
Dockerfile.compute-node Normal file
View File

@@ -0,0 +1,835 @@
ARG PG_VERSION
ARG REPOSITORY=neondatabase
ARG IMAGE=rust
ARG TAG=pinned
ARG BUILD_TAG
#########################################################################################
#
# Layer "build-deps"
#
#########################################################################################
FROM debian:bullseye-slim AS build-deps
RUN apt update && \
apt install -y git autoconf automake libtool build-essential bison flex libreadline-dev \
zlib1g-dev libxml2-dev libcurl4-openssl-dev libossp-uuid-dev wget pkg-config libssl-dev \
libicu-dev libxslt1-dev liblz4-dev libzstd-dev
#########################################################################################
#
# Layer "pg-build"
# Build Postgres from the neon postgres repository.
#
#########################################################################################
FROM build-deps AS pg-build
ARG PG_VERSION
COPY vendor/postgres-${PG_VERSION} postgres
RUN cd postgres && \
export CONFIGURE_CMD="./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp \
--with-icu --with-libxml --with-libxslt --with-lz4" && \
if [ "${PG_VERSION}" != "v14" ]; then \
# zstd is available only from PG15
export CONFIGURE_CMD="${CONFIGURE_CMD} --with-zstd"; \
fi && \
eval $CONFIGURE_CMD && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C contrib/ install && \
# Install headers
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/include install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/interfaces/libpq install && \
# Enable some of contrib extensions
echo 'trusted = true' >> /usr/local/pgsql/share/extension/autoinc.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/bloom.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/earthdistance.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/insert_username.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/intagg.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/moddatetime.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_stat_statements.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrowlocks.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgstattuple.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/refint.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/xml2.control
#########################################################################################
#
# Layer "postgis-build"
# Build PostGIS from the upstream PostGIS mirror.
#
#########################################################################################
FROM build-deps AS postgis-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y cmake gdal-bin libboost-dev libboost-thread-dev libboost-filesystem-dev \
libboost-system-dev libboost-iostreams-dev libboost-program-options-dev libboost-timer-dev \
libcgal-dev libgdal-dev libgmp-dev libmpfr-dev libopenscenegraph-dev libprotobuf-c-dev \
protobuf-c-compiler xsltproc
# SFCGAL > 1.3 requires CGAL > 5.2, Bullseye's libcgal-dev is 5.2
RUN wget https://gitlab.com/Oslandia/SFCGAL/-/archive/v1.3.10/SFCGAL-v1.3.10.tar.gz -O SFCGAL.tar.gz && \
echo "4e39b3b2adada6254a7bdba6d297bb28e1a9835a9f879b74f37e2dab70203232 SFCGAL.tar.gz" | sha256sum --check && \
mkdir sfcgal-src && cd sfcgal-src && tar xvzf ../SFCGAL.tar.gz --strip-components=1 -C . && \
cmake -DCMAKE_BUILD_TYPE=Release . && make -j $(getconf _NPROCESSORS_ONLN) && \
DESTDIR=/sfcgal make install -j $(getconf _NPROCESSORS_ONLN) && \
make clean && cp -R /sfcgal/* /
ENV PATH "/usr/local/pgsql/bin:$PATH"
RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.2.tar.gz -O postgis.tar.gz && \
echo "9a2a219da005a1730a39d1959a1c7cec619b1efb009b65be80ffc25bad299068 postgis.tar.gz" | sha256sum --check && \
mkdir postgis-src && cd postgis-src && tar xvzf ../postgis.tar.gz --strip-components=1 -C . && \
./autogen.sh && \
./configure --with-sfcgal=/usr/local/bin/sfcgal-config && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
cd extensions/postgis && \
make clean && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_raster.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_sfcgal.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_topology.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control
RUN wget https://github.com/pgRouting/pgrouting/archive/v3.4.2.tar.gz -O pgrouting.tar.gz && \
echo "cac297c07d34460887c4f3b522b35c470138760fe358e351ad1db4edb6ee306e pgrouting.tar.gz" | sha256sum --check && \
mkdir pgrouting-src && cd pgrouting-src && tar xvzf ../pgrouting.tar.gz --strip-components=1 -C . && \
mkdir build && \
cd build && \
cmake -DCMAKE_BUILD_TYPE=Release .. && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrouting.control
#########################################################################################
#
# Layer "plv8-build"
# Build plv8
#
#########################################################################################
FROM build-deps AS plv8-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y ninja-build python3-dev libncurses5 binutils clang
RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.5.tar.gz -O plv8.tar.gz && \
echo "1e108d5df639e4c189e1c5bdfa2432a521c126ca89e7e5a969d46899ca7bf106 plv8.tar.gz" | sha256sum --check && \
mkdir plv8-src && cd plv8-src && tar xvzf ../plv8.tar.gz --strip-components=1 -C . && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
make DOCKER=1 -j $(getconf _NPROCESSORS_ONLN) install && \
rm -rf /plv8-* && \
find /usr/local/pgsql/ -name "plv8-*.so" | xargs strip && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plcoffee.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plls.control
#########################################################################################
#
# Layer "h3-pg-build"
# Build h3_pg
#
#########################################################################################
FROM build-deps AS h3-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# packaged cmake is too old
RUN wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-x86_64.sh \
-q -O /tmp/cmake-install.sh \
&& echo "739d372726cb23129d57a539ce1432453448816e345e1545f6127296926b6754 /tmp/cmake-install.sh" | sha256sum --check \
&& chmod u+x /tmp/cmake-install.sh \
&& /tmp/cmake-install.sh --skip-license --prefix=/usr/local/ \
&& rm /tmp/cmake-install.sh
RUN wget https://github.com/uber/h3/archive/refs/tags/v4.1.0.tar.gz -O h3.tar.gz && \
echo "ec99f1f5974846bde64f4513cf8d2ea1b8d172d2218ab41803bf6a63532272bc h3.tar.gz" | sha256sum --check && \
mkdir h3-src && cd h3-src && tar xvzf ../h3.tar.gz --strip-components=1 -C . && \
mkdir build && cd build && \
cmake .. -DCMAKE_BUILD_TYPE=Release && \
make -j $(getconf _NPROCESSORS_ONLN) && \
DESTDIR=/h3 make install && \
cp -R /h3/usr / && \
rm -rf build
RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.1.2.tar.gz -O h3-pg.tar.gz && \
echo "c135aa45999b2ad1326d2537c1cadef96d52660838e4ca371706c08fdea1a956 h3-pg.tar.gz" | sha256sum --check && \
mkdir h3-pg-src && cd h3-pg-src && tar xvzf ../h3-pg.tar.gz --strip-components=1 -C . && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/h3.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/h3_postgis.control
#########################################################################################
#
# Layer "unit-pg-build"
# compile unit extension
#
#########################################################################################
FROM build-deps AS unit-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/df7cb/postgresql-unit/archive/refs/tags/7.7.tar.gz -O postgresql-unit.tar.gz && \
echo "411d05beeb97e5a4abf17572bfcfbb5a68d98d1018918feff995f6ee3bb03e79 postgresql-unit.tar.gz" | sha256sum --check && \
mkdir postgresql-unit-src && cd postgresql-unit-src && tar xvzf ../postgresql-unit.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
# unit extension's "create extension" script relies on absolute install path to fill some reference tables.
# We move the extension from '/usr/local/pgsql/' to '/usr/local/' after it is build. So we need to adjust the path.
# This one-liner removes pgsql/ part of the path.
# NOTE: Other extensions that rely on MODULEDIR variable after building phase will need the same fix.
find /usr/local/pgsql/share/extension/ -name "unit*.sql" -print0 | xargs -0 sed -i "s|pgsql/||g" && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/unit.control
#########################################################################################
#
# Layer "vector-pg-build"
# compile pgvector extension
#
#########################################################################################
FROM build-deps AS vector-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.4.4.tar.gz -O pgvector.tar.gz && \
echo "1cb70a63f8928e396474796c22a20be9f7285a8a013009deb8152445b61b72e6 pgvector.tar.gz" | sha256sum --check && \
mkdir pgvector-src && cd pgvector-src && tar xvzf ../pgvector.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/vector.control
#########################################################################################
#
# Layer "pgjwt-pg-build"
# compile pgjwt extension
#
#########################################################################################
FROM build-deps AS pgjwt-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# 9742dab1b2f297ad3811120db7b21451bca2d3c9 made on 13/11/2021
RUN wget https://github.com/michelp/pgjwt/archive/9742dab1b2f297ad3811120db7b21451bca2d3c9.tar.gz -O pgjwt.tar.gz && \
echo "cfdefb15007286f67d3d45510f04a6a7a495004be5b3aecb12cda667e774203f pgjwt.tar.gz" | sha256sum --check && \
mkdir pgjwt-src && cd pgjwt-src && tar xvzf ../pgjwt.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgjwt.control
#########################################################################################
#
# Layer "hypopg-pg-build"
# compile hypopg extension
#
#########################################################################################
FROM build-deps AS hypopg-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/HypoPG/hypopg/archive/refs/tags/1.3.1.tar.gz -O hypopg.tar.gz && \
echo "e7f01ee0259dc1713f318a108f987663d60f3041948c2ada57a94b469565ca8e hypopg.tar.gz" | sha256sum --check && \
mkdir hypopg-src && cd hypopg-src && tar xvzf ../hypopg.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/hypopg.control
#########################################################################################
#
# Layer "pg-hashids-pg-build"
# compile pg_hashids extension
#
#########################################################################################
FROM build-deps AS pg-hashids-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/iCyberon/pg_hashids/archive/refs/tags/v1.2.1.tar.gz -O pg_hashids.tar.gz && \
echo "74576b992d9277c92196dd8d816baa2cc2d8046fe102f3dcd7f3c3febed6822a pg_hashids.tar.gz" | sha256sum --check && \
mkdir pg_hashids-src && cd pg_hashids-src && tar xvzf ../pg_hashids.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_hashids.control
#########################################################################################
#
# Layer "rum-pg-build"
# compile rum extension
#
#########################################################################################
FROM build-deps AS rum-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/postgrespro/rum/archive/refs/tags/1.3.13.tar.gz -O rum.tar.gz && \
echo "6ab370532c965568df6210bd844ac6ba649f53055e48243525b0b7e5c4d69a7d rum.tar.gz" | sha256sum --check && \
mkdir rum-src && cd rum-src && tar xvzf ../rum.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/rum.control
#########################################################################################
#
# Layer "pgtap-pg-build"
# compile pgTAP extension
#
#########################################################################################
FROM build-deps AS pgtap-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/theory/pgtap/archive/refs/tags/v1.2.0.tar.gz -O pgtap.tar.gz && \
echo "9c7c3de67ea41638e14f06da5da57bac6f5bd03fea05c165a0ec862205a5c052 pgtap.tar.gz" | sha256sum --check && \
mkdir pgtap-src && cd pgtap-src && tar xvzf ../pgtap.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgtap.control
#########################################################################################
#
# Layer "ip4r-pg-build"
# compile ip4r extension
#
#########################################################################################
FROM build-deps AS ip4r-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/RhodiumToad/ip4r/archive/refs/tags/2.4.1.tar.gz -O ip4r.tar.gz && \
echo "78b9f0c1ae45c22182768fe892a32d533c82281035e10914111400bf6301c726 ip4r.tar.gz" | sha256sum --check && \
mkdir ip4r-src && cd ip4r-src && tar xvzf ../ip4r.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/ip4r.control
#########################################################################################
#
# Layer "prefix-pg-build"
# compile Prefix extension
#
#########################################################################################
FROM build-deps AS prefix-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/dimitri/prefix/archive/refs/tags/v1.2.9.tar.gz -O prefix.tar.gz && \
echo "38d30a08d0241a8bbb8e1eb8f0152b385051665a8e621c8899e7c5068f8b511e prefix.tar.gz" | sha256sum --check && \
mkdir prefix-src && cd prefix-src && tar xvzf ../prefix.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/prefix.control
#########################################################################################
#
# Layer "hll-pg-build"
# compile hll extension
#
#########################################################################################
FROM build-deps AS hll-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/citusdata/postgresql-hll/archive/refs/tags/v2.17.tar.gz -O hll.tar.gz && \
echo "9a18288e884f197196b0d29b9f178ba595b0dfc21fbf7a8699380e77fa04c1e9 hll.tar.gz" | sha256sum --check && \
mkdir hll-src && cd hll-src && tar xvzf ../hll.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/hll.control
#########################################################################################
#
# Layer "plpgsql-check-pg-build"
# compile plpgsql_check extension
#
#########################################################################################
FROM build-deps AS plpgsql-check-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN wget https://github.com/okbob/plpgsql_check/archive/refs/tags/v2.3.2.tar.gz -O plpgsql_check.tar.gz && \
echo "9d81167c4bbeb74eebf7d60147b21961506161addc2aee537f95ad8efeae427b plpgsql_check.tar.gz" | sha256sum --check && \
mkdir plpgsql_check-src && cd plpgsql_check-src && tar xvzf ../plpgsql_check.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plpgsql_check.control
#########################################################################################
#
# Layer "timescaledb-pg-build"
# compile timescaledb extension
#
#########################################################################################
FROM build-deps AS timescaledb-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin:$PATH"
RUN apt-get update && \
apt-get install -y cmake && \
wget https://github.com/timescale/timescaledb/archive/refs/tags/2.10.1.tar.gz -O timescaledb.tar.gz && \
echo "6fca72a6ed0f6d32d2b3523951ede73dc5f9b0077b38450a029a5f411fdb8c73 timescaledb.tar.gz" | sha256sum --check && \
mkdir timescaledb-src && cd timescaledb-src && tar xvzf ../timescaledb.tar.gz --strip-components=1 -C . && \
./bootstrap -DSEND_TELEMETRY_DEFAULT:BOOL=OFF -DUSE_TELEMETRY:BOOL=OFF -DAPACHE_ONLY:BOOL=ON -DCMAKE_BUILD_TYPE=Release && \
cd build && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make install -j $(getconf _NPROCESSORS_ONLN) && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/timescaledb.control
#########################################################################################
#
# Layer "pg-hint-plan-pg-build"
# compile pg_hint_plan extension
#
#########################################################################################
FROM build-deps AS pg-hint-plan-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ARG PG_VERSION
ENV PATH "/usr/local/pgsql/bin:$PATH"
RUN case "${PG_VERSION}" in \
"v14") \
export PG_HINT_PLAN_VERSION=14_1_4_1 \
export PG_HINT_PLAN_CHECKSUM=c3501becf70ead27f70626bce80ea401ceac6a77e2083ee5f3ff1f1444ec1ad1 \
;; \
"v15") \
export PG_HINT_PLAN_VERSION=15_1_5_0 \
export PG_HINT_PLAN_CHECKSUM=564cbbf4820973ffece63fbf76e3c0af62c4ab23543142c7caaa682bc48918be \
;; \
*) \
echo "Export the valid PG_HINT_PLAN_VERSION variable" && exit 1 \
;; \
esac && \
wget https://github.com/ossc-db/pg_hint_plan/archive/refs/tags/REL${PG_HINT_PLAN_VERSION}.tar.gz -O pg_hint_plan.tar.gz && \
echo "${PG_HINT_PLAN_CHECKSUM} pg_hint_plan.tar.gz" | sha256sum --check && \
mkdir pg_hint_plan-src && cd pg_hint_plan-src && tar xvzf ../pg_hint_plan.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make install -j $(getconf _NPROCESSORS_ONLN) && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_hint_plan.control
#########################################################################################
#
# Layer "kq-imcx-pg-build"
# compile kq_imcx extension
#
#########################################################################################
FROM build-deps AS kq-imcx-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN apt-get update && \
apt-get install -y git libgtk2.0-dev libpq-dev libpam-dev libxslt-dev libkrb5-dev cmake && \
wget https://github.com/ketteq-neon/postgres-exts/archive/e0bd1a9d9313d7120c1b9c7bb15c48c0dede4c4e.tar.gz -O kq_imcx.tar.gz && \
echo "dc93a97ff32d152d32737ba7e196d9687041cda15e58ab31344c2f2de8855336 kq_imcx.tar.gz" | sha256sum --check && \
mkdir kq_imcx-src && cd kq_imcx-src && tar xvzf ../kq_imcx.tar.gz --strip-components=1 -C . && \
mkdir build && \
cd build && \
cmake -DCMAKE_BUILD_TYPE=Release .. && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/kq_imcx.control
#########################################################################################
#
# Layer "pg-cron-pg-build"
# compile pg_cron extension
#
#########################################################################################
FROM build-deps AS pg-cron-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/citusdata/pg_cron/archive/refs/tags/v1.5.2.tar.gz -O pg_cron.tar.gz && \
echo "6f7f0980c03f1e2a6a747060e67bf4a303ca2a50e941e2c19daeed2b44dec744 pg_cron.tar.gz" | sha256sum --check && \
mkdir pg_cron-src && cd pg_cron-src && tar xvzf ../pg_cron.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_cron.control
#########################################################################################
#
# Layer "rdkit-pg-build"
# compile rdkit extension
#
#########################################################################################
FROM build-deps AS rdkit-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt-get update && \
apt-get install -y \
cmake \
libboost-iostreams1.74-dev \
libboost-regex1.74-dev \
libboost-serialization1.74-dev \
libboost-system1.74-dev \
libeigen3-dev \
libfreetype6-dev
ENV PATH "/usr/local/pgsql/bin/:/usr/local/pgsql/:$PATH"
RUN wget https://github.com/rdkit/rdkit/archive/refs/tags/Release_2023_03_1.tar.gz -O rdkit.tar.gz && \
echo "db346afbd0ba52c843926a2a62f8a38c7b774ffab37eaf382d789a824f21996c rdkit.tar.gz" | sha256sum --check && \
mkdir rdkit-src && cd rdkit-src && tar xvzf ../rdkit.tar.gz --strip-components=1 -C . && \
cmake \
-D RDK_BUILD_CAIRO_SUPPORT=OFF \
-D RDK_BUILD_INCHI_SUPPORT=ON \
-D RDK_BUILD_AVALON_SUPPORT=ON \
-D RDK_BUILD_PYTHON_WRAPPERS=OFF \
-D RDK_BUILD_DESCRIPTORS3D=OFF \
-D RDK_BUILD_FREESASA_SUPPORT=OFF \
-D RDK_BUILD_COORDGEN_SUPPORT=ON \
-D RDK_BUILD_MOLINTERCHANGE_SUPPORT=OFF \
-D RDK_BUILD_YAEHMOP_SUPPORT=OFF \
-D RDK_BUILD_STRUCTCHECKER_SUPPORT=OFF \
-D RDK_USE_URF=OFF \
-D RDK_BUILD_PGSQL=ON \
-D RDK_PGSQL_STATIC=ON \
-D PostgreSQL_CONFIG=pg_config \
-D PostgreSQL_INCLUDE_DIR=`pg_config --includedir` \
-D PostgreSQL_TYPE_INCLUDE_DIR=`pg_config --includedir-server` \
-D PostgreSQL_LIBRARY_DIR=`pg_config --libdir` \
-D RDK_INSTALL_INTREE=OFF \
-D CMAKE_BUILD_TYPE=Release \
. && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/rdkit.control
#########################################################################################
#
# Layer "pg-uuidv7-pg-build"
# compile pg_uuidv7 extension
#
#########################################################################################
FROM build-deps AS pg-uuidv7-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/fboulnois/pg_uuidv7/archive/refs/tags/v1.0.1.tar.gz -O pg_uuidv7.tar.gz && \
echo "0d0759ab01b7fb23851ecffb0bce27822e1868a4a5819bfd276101c716637a7a pg_uuidv7.tar.gz" | sha256sum --check && \
mkdir pg_uuidv7-src && cd pg_uuidv7-src && tar xvzf ../pg_uuidv7.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pg_uuidv7.control
#########################################################################################
#
# Layer "pg-roaringbitmap-pg-build"
# compile pg_roaringbitmap extension
#
#########################################################################################
FROM build-deps AS pg-roaringbitmap-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://github.com/ChenHuajun/pg_roaringbitmap/archive/refs/tags/v0.5.4.tar.gz -O pg_roaringbitmap.tar.gz && \
echo "b75201efcb1c2d1b014ec4ae6a22769cc7a224e6e406a587f5784a37b6b5a2aa pg_roaringbitmap.tar.gz" | sha256sum --check && \
mkdir pg_roaringbitmap-src && cd pg_roaringbitmap-src && tar xvzf ../pg_roaringbitmap.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/roaringbitmap.control
#########################################################################################
#
# Layer "pg-embedding-pg-build"
# compile pg_embedding extension
#
#########################################################################################
FROM build-deps AS pg-embedding-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ENV PATH "/usr/local/pgsql/bin/:$PATH"
# 2465f831ea1f8d49c1d74f8959adb7fc277d70cd made on 05/07/2023
# There is no release tag yet
RUN wget https://github.com/neondatabase/pg_embedding/archive/2465f831ea1f8d49c1d74f8959adb7fc277d70cd.tar.gz -O pg_embedding.tar.gz && \
echo "047af2b1f664a1e6e37867bd4eeaf5934fa27d6ba3d6c4461efa388ddf7cd1d5 pg_embedding.tar.gz" | sha256sum --check && \
mkdir pg_embedding-src && cd pg_embedding-src && tar xvzf ../pg_embedding.tar.gz --strip-components=1 -C . && \
find /usr/local/pgsql -type f | sort > /before_embedding.txt && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/embedding.control &&\
find /usr/local/pgsql -type f | sort > /after_embedding.txt &&\
/bin/bash -c 'for from in $(comm -13 /before_embedding.txt /after_embedding.txt); do to=/extensions/embedding/${from:17} && mkdir -p $(dirname ${to}) && cp -a ${from} ${to}; done' && \
tar -zcvf /extensions/embedding.tar.gz /extensions/embedding && \
mkdir -p /control_files &&\
cp /usr/local/pgsql/share/extension/embedding.control /control_files/embedding.control
#########################################################################################
#
# Layer "pg-anon-pg-build"
# compile anon extension
#
#########################################################################################
FROM build-deps AS pg-anon-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# Kaniko doesn't allow to do `${from#/usr/local/pgsql/}`, so we use `${from:17}` instead
ENV PATH "/usr/local/pgsql/bin/:$PATH"
RUN wget https://gitlab.com/dalibo/postgresql_anonymizer/-/archive/1.1.0/postgresql_anonymizer-1.1.0.tar.gz -O pg_anon.tar.gz && \
echo "08b09d2ff9b962f96c60db7e6f8e79cf7253eb8772516998fc35ece08633d3ad pg_anon.tar.gz" | sha256sum --check && \
mkdir pg_anon-src && cd pg_anon-src && tar xvzf ../pg_anon.tar.gz --strip-components=1 -C . && \
find /usr/local/pgsql -type f | sort > /before.txt && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/anon.control && \
find /usr/local/pgsql -type f | sort > /after.txt && \
/bin/bash -c 'for from in $(comm -13 /before.txt /after.txt); do to=/extensions/anon/${from:17} && mkdir -p $(dirname ${to}) && cp -a ${from} ${to}; done' && \
mkdir /control_files &&\
tar -zcvf /extensions/anon.tar.gz /extensions/anon && \
mkdir -p /control_files &&\
cp /usr/local/pgsql/share/extension/anon.control /control_files/anon.control
# # TODO: Delete leftovers from the extension build step
# rm -rf ./extensions/lib/pgxs
# rm -rf ./extensions/lib/pkgconfig
#########################################################################################
#
# Layer "rust extensions"
# This layer is used to build `pgx` deps
#
#########################################################################################
FROM build-deps AS rust-extensions-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt-get update && \
apt-get install -y curl libclang-dev cmake && \
useradd -ms /bin/bash nonroot -b /home
ENV HOME=/home/nonroot
ENV PATH="/home/nonroot/.cargo/bin:/usr/local/pgsql/bin/:$PATH"
USER nonroot
WORKDIR /home/nonroot
ARG PG_VERSION
RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && \
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain stable && \
rm rustup-init && \
cargo install --locked --version 0.7.3 cargo-pgx && \
/bin/bash -c 'cargo pgx init --pg${PG_VERSION:1}=/usr/local/pgsql/bin/pg_config'
USER root
#########################################################################################
#
# Layer "pg-jsonschema-pg-build"
# Compile "pg_jsonschema" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-jsonschema-pg-build
# caeab60d70b2fd3ae421ec66466a3abbb37b7ee6 made on 06/03/2023
# there is no release tag yet, but we need it due to the superuser fix in the control file, switch to git tag after release >= 0.1.5
RUN wget https://github.com/supabase/pg_jsonschema/archive/caeab60d70b2fd3ae421ec66466a3abbb37b7ee6.tar.gz -O pg_jsonschema.tar.gz && \
echo "54129ce2e7ee7a585648dbb4cef6d73f795d94fe72f248ac01119992518469a4 pg_jsonschema.tar.gz" | sha256sum --check && \
mkdir pg_jsonschema-src && cd pg_jsonschema-src && tar xvzf ../pg_jsonschema.tar.gz --strip-components=1 -C . && \
sed -i 's/pgx = "0.7.1"/pgx = { version = "0.7.3", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_jsonschema.control
#########################################################################################
#
# Layer "pg-graphql-pg-build"
# Compile "pg_graphql" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-graphql-pg-build
# b4988843647450a153439be367168ed09971af85 made on 22/02/2023 (from remove-pgx-contrib-spiext branch)
# Currently pgx version bump to >= 0.7.2 causes "call to unsafe function" compliation errors in
# pgx-contrib-spiext. There is a branch that removes that dependency, so use it. It is on the
# same 1.1 version we've used before.
RUN wget https://github.com/yrashk/pg_graphql/archive/b4988843647450a153439be367168ed09971af85.tar.gz -O pg_graphql.tar.gz && \
echo "0c7b0e746441b2ec24187d0e03555faf935c2159e2839bddd14df6dafbc8c9bd pg_graphql.tar.gz" | sha256sum --check && \
mkdir pg_graphql-src && cd pg_graphql-src && tar xvzf ../pg_graphql.tar.gz --strip-components=1 -C . && \
sed -i 's/pgx = "~0.7.1"/pgx = { version = "0.7.3", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
sed -i 's/pgx-tests = "~0.7.1"/pgx-tests = "0.7.3"/g' Cargo.toml && \
cargo pgx install --release && \
# it's needed to enable extension because it uses untrusted C language
sed -i 's/superuser = false/superuser = true/g' /usr/local/pgsql/share/extension/pg_graphql.control && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_graphql.control
#########################################################################################
#
# Layer "pg-tiktoken-build"
# Compile "pg_tiktoken" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-tiktoken-pg-build
# 801f84f08c6881c8aa30f405fafbf00eec386a72 made on 10/03/2023
RUN wget https://github.com/kelvich/pg_tiktoken/archive/801f84f08c6881c8aa30f405fafbf00eec386a72.tar.gz -O pg_tiktoken.tar.gz && \
echo "52f60ac800993a49aa8c609961842b611b6b1949717b69ce2ec9117117e16e4a pg_tiktoken.tar.gz" | sha256sum --check && \
mkdir pg_tiktoken-src && cd pg_tiktoken-src && tar xvzf ../pg_tiktoken.tar.gz --strip-components=1 -C . && \
cargo pgx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/pg_tiktoken.control
#########################################################################################
#
# Layer "pg-pgx-ulid-build"
# Compile "pgx_ulid" extension
#
#########################################################################################
FROM rust-extensions-build AS pg-pgx-ulid-build
RUN wget https://github.com/pksunkara/pgx_ulid/archive/refs/tags/v0.1.0.tar.gz -O pgx_ulid.tar.gz && \
echo "908b7358e6f846e87db508ae5349fb56a88ee6305519074b12f3d5b0ff09f791 pgx_ulid.tar.gz" | sha256sum --check && \
mkdir pgx_ulid-src && cd pgx_ulid-src && tar xvzf ../pgx_ulid.tar.gz --strip-components=1 -C . && \
sed -i 's/pgx = "=0.7.3"/pgx = { version = "0.7.3", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgx install --release && \
echo "trusted = true" >> /usr/local/pgsql/share/extension/ulid.control
#########################################################################################
#
# Layer "neon-pg-ext-build"
# compile neon extensions
#
#########################################################################################
FROM build-deps AS neon-pg-ext-build
# Public extensions
COPY --from=postgis-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=postgis-build /sfcgal/* /
COPY --from=plv8-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /h3/usr /
COPY --from=unit-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=vector-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pgjwt-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-jsonschema-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-graphql-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-tiktoken-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=hypopg-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-hashids-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=rum-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pgtap-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=ip4r-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=prefix-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=hll-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=plpgsql-check-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=timescaledb-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-hint-plan-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=kq-imcx-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-cron-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-pgx-ulid-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=rdkit-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-uuidv7-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-roaringbitmap-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=pg-embedding-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY pgxn/ pgxn/
RUN make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon \
-s install && \
make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon_utils \
-s install && \
make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/hnsw \
-s install
#########################################################################################
#
# Compile and run the Neon-specific `compute_ctl` binary
#
#########################################################################################
FROM $REPOSITORY/$IMAGE:$TAG AS compute-tools
ARG BUILD_TAG
ENV BUILD_TAG=$BUILD_TAG
USER nonroot
# Copy entire project to get Cargo.* files with proper dependencies for the whole project
COPY --chown=nonroot . .
RUN cd compute_tools && cargo build --locked --profile release-line-debug-size-lto
#########################################################################################
#
# Clean up postgres folder before inclusion
#
#########################################################################################
FROM neon-pg-ext-build AS postgres-cleanup-layer
COPY --from=neon-pg-ext-build /usr/local/pgsql /usr/local/pgsql
# Remove binaries from /bin/ that we won't use (or would manually copy & install otherwise)
RUN cd /usr/local/pgsql/bin && rm ecpg raster2pgsql shp2pgsql pgtopo_export pgtopo_import pgsql2shp
# Remove headers that we won't need anymore - we've completed installation of all extensions
RUN rm -r /usr/local/pgsql/include
# Remove static postgresql libraries - all compilation is finished, so we
# can now remove these files - they must be included in other binaries by now
# if they were to be used by other libraries.
RUN rm /usr/local/pgsql/lib/lib*.a
#########################################################################################
#
# Extension only
#
#########################################################################################
FROM scratch AS postgres-extensions
# After the transition this layer will include all extensitons.
# As for now, it's only for new custom ones
#
# extensions
COPY --from=pg-anon-pg-build /extensions/anon.tar.gz /extensions/anon.tar.gz
COPY --from=pg-anon-pg-build /control_files/anon.control /control_files/anon.control
COPY --from=pg-embedding-pg-build /extensions/embedding.tar.gz /extensions/embedding.tar.gz
COPY --from=pg-embedding-pg-build /control_files/embedding.control /control_files/embedding.control
FROM python:3.11-slim-bookworm AS alekpython
COPY scripts/combine_control_files.py combine_control_files.py
COPY --from=postgres-extensions /control_files /control_files
CMD [ "python3", "./combine_control_files.py"]
#########################################################################################
#
# Final layer
# Put it all together into the final image
#
#########################################################################################
FROM debian:bullseye-slim
# Add user postgres
RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
echo "postgres:test_console_pass" | chpasswd && \
mkdir /var/db/postgres/compute && mkdir /var/db/postgres/specs && \
chown -R postgres:postgres /var/db/postgres && \
chmod 0750 /var/db/postgres/compute && \
echo '/usr/local/lib' >> /etc/ld.so.conf && /sbin/ldconfig && \
# create folder for file cache
mkdir -p -m 777 /neon/cache
COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
# Install:
# libreadline8 for psql
# libicu67, locales for collations (including ICU and plpgsql_check)
# liblz4-1 for lz4
# libossp-uuid16 for extension ossp-uuid
# libgeos, libgdal, libsfcgal1, libproj and libprotobuf-c1 for PostGIS
# libxml2, libxslt1.1 for xml2
# libzstd1 for zstd
# libboost*, libfreetype6, and zlib1g for rdkit
RUN apt update && \
apt install --no-install-recommends -y \
gdb \
libicu67 \
liblz4-1 \
libreadline8 \
libboost-iostreams1.74.0 \
libboost-regex1.74.0 \
libboost-serialization1.74.0 \
libboost-system1.74.0 \
libossp-uuid16 \
libfreetype6 \
libgeos-c1v5 \
libgdal28 \
libproj19 \
libprotobuf-c1 \
libsfcgal1 \
libxml2 \
libxslt1.1 \
libzstd1 \
libcurl4-openssl-dev \
locales \
procps \
zlib1g && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG en_US.utf8
USER postgres
ENTRYPOINT ["/usr/local/bin/compute_ctl"]

View File

@@ -1,220 +0,0 @@
#
# This file is identical to the Dockerfile.compute-node-v15 file
# except for the version of Postgres that is built.
#
ARG TAG=pinned
#########################################################################################
#
# Layer "build-deps"
#
#########################################################################################
FROM debian:bullseye-slim AS build-deps
RUN apt update && \
apt install -y git autoconf automake libtool build-essential bison flex libreadline-dev \
zlib1g-dev libxml2-dev libcurl4-openssl-dev libossp-uuid-dev wget pkg-config libssl-dev
#########################################################################################
#
# Layer "pg-build"
# Build Postgres from the neon postgres repository.
#
#########################################################################################
FROM build-deps AS pg-build
COPY vendor/postgres-v14 postgres
RUN cd postgres && \
./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C contrib/ install && \
# Install headers
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/include install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/interfaces/libpq install && \
# Enable some of contrib extensions
echo 'trusted = true' >> /usr/local/pgsql/share/extension/bloom.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrowlocks.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/intagg.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgstattuple.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/earthdistance.control
#########################################################################################
#
# Layer "postgis-build"
# Build PostGIS from the upstream PostGIS mirror.
#
#########################################################################################
FROM build-deps AS postgis-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y gdal-bin libgdal-dev libprotobuf-c-dev protobuf-c-compiler xsltproc
RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.1.tar.gz && \
tar xvzf postgis-3.3.1.tar.gz && \
cd postgis-3.3.1 && \
./autogen.sh && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
./configure && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
cd extensions/postgis && \
make clean && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_raster.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_topology.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control
#########################################################################################
#
# Layer "plv8-build"
# Build plv8
#
#########################################################################################
FROM build-deps AS plv8-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y ninja-build python3-dev libc++-dev libc++abi-dev libncurses5 binutils
# https://github.com/plv8/plv8/issues/475:
# v8 uses gold for linking and sets `--thread-count=4` which breaks
# gold version <= 1.35 (https://sourceware.org/bugzilla/show_bug.cgi?id=23607)
# Install newer gold version manually as debian-testing binutils version updates
# libc version, which in turn breaks other extension built against non-testing libc.
RUN wget https://ftp.gnu.org/gnu/binutils/binutils-2.38.tar.gz && \
tar xvzf binutils-2.38.tar.gz && \
cd binutils-2.38 && \
cd libiberty && ./configure && make -j $(getconf _NPROCESSORS_ONLN) && \
cd ../bfd && ./configure && make bfdver.h && \
cd ../gold && ./configure && make -j $(getconf _NPROCESSORS_ONLN) && make install && \
cp /usr/local/bin/ld.gold /usr/bin/gold
# Sed is used to patch for https://github.com/plv8/plv8/issues/503
RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.4.tar.gz && \
tar xvzf v3.1.4.tar.gz && \
cd plv8-3.1.4 && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
sed -i 's/MemoryContextAlloc(/MemoryContextAllocZero(/' plv8.cc && \
make DOCKER=1 -j $(getconf _NPROCESSORS_ONLN) install && \
rm -rf /plv8-* && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control
#########################################################################################
#
# Layer "h3-pg-build"
# Build h3_pg
#
#########################################################################################
FROM build-deps AS h3-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# packaged cmake is too old
RUN wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-x86_64.sh \
-q -O /tmp/cmake-install.sh \
&& chmod u+x /tmp/cmake-install.sh \
&& /tmp/cmake-install.sh --skip-license --prefix=/usr/local/ \
&& rm /tmp/cmake-install.sh
RUN wget https://github.com/uber/h3/archive/refs/tags/v4.0.1.tar.gz -O h3.tgz && \
tar xvzf h3.tgz && \
cd h3-4.0.1 && \
mkdir build && \
cd build && \
cmake .. -DCMAKE_BUILD_TYPE=Release && \
make -j $(getconf _NPROCESSORS_ONLN) && \
DESTDIR=/h3 make install && \
cp -R /h3/usr / && \
rm -rf build
RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.0.1.tar.gz -O h3-pg.tgz && \
tar xvzf h3-pg.tgz && \
cd h3-pg-4.0.1 && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/h3.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/h3_postgis.control
#########################################################################################
#
# Layer "neon-pg-ext-build"
# compile neon extensions
#
#########################################################################################
FROM build-deps AS neon-pg-ext-build
COPY --from=postgis-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=plv8-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /h3/usr /
COPY pgxn/ pgxn/
RUN make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon \
-s install
#########################################################################################
#
# Compile and run the Neon-specific `compute_ctl` binary
#
#########################################################################################
FROM 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:$TAG AS compute-tools
USER nonroot
# Copy entire project to get Cargo.* files with proper dependencies for the whole project
COPY --chown=nonroot . .
RUN cd compute_tools && cargo build --locked --profile release-line-debug-size-lto
#########################################################################################
#
# Clean up postgres folder before inclusion
#
#########################################################################################
FROM neon-pg-ext-build AS postgres-cleanup-layer
COPY --from=neon-pg-ext-build /usr/local/pgsql /usr/local/pgsql
# Remove binaries from /bin/ that we won't use (or would manually copy & install otherwise)
RUN cd /usr/local/pgsql/bin && rm ecpg raster2pgsql shp2pgsql pgtopo_export pgtopo_import pgsql2shp
# Remove headers that we won't need anymore - we've completed installation of all extensions
RUN rm -r /usr/local/pgsql/include
# Remove static postgresql libraries - all compilation is finished, so we
# can now remove these files - they must be included in other binaries by now
# if they were to be used by other libraries.
RUN rm /usr/local/pgsql/lib/lib*.a
#########################################################################################
#
# Final layer
# Put it all together into the final image
#
#########################################################################################
FROM debian:bullseye-slim
# Add user postgres
RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
echo "postgres:test_console_pass" | chpasswd && \
mkdir /var/db/postgres/compute && mkdir /var/db/postgres/specs && \
chown -R postgres:postgres /var/db/postgres && \
chmod 0750 /var/db/postgres/compute && \
echo '/usr/local/lib' >> /etc/ld.so.conf && /sbin/ldconfig
COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
# Install:
# libreadline8 for psql
# libossp-uuid16 for extension ossp-uuid
# libgeos, libgdal, libproj and libprotobuf-c1 for PostGIS
RUN apt update && \
apt install --no-install-recommends -y \
libreadline8 \
libossp-uuid16 \
libgeos-c1v5 \
libgdal28 \
libproj19 \
libprotobuf-c1 \
gdb && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
USER postgres
ENTRYPOINT ["/usr/local/bin/compute_ctl"]

View File

@@ -1,220 +0,0 @@
#
# This file is identical to the Dockerfile.compute-node-v14 file
# except for the version of Postgres that is built.
#
ARG TAG=pinned
#########################################################################################
#
# Layer "build-deps"
#
#########################################################################################
FROM debian:bullseye-slim AS build-deps
RUN apt update && \
apt install -y git autoconf automake libtool build-essential bison flex libreadline-dev \
zlib1g-dev libxml2-dev libcurl4-openssl-dev libossp-uuid-dev wget pkg-config libssl-dev
#########################################################################################
#
# Layer "pg-build"
# Build Postgres from the neon postgres repository.
#
#########################################################################################
FROM build-deps AS pg-build
COPY vendor/postgres-v15 postgres
RUN cd postgres && \
./configure CFLAGS='-O2 -g3' --enable-debug --with-openssl --with-uuid=ossp && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C contrib/ install && \
# Install headers
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/include install && \
make MAKELEVEL=0 -j $(getconf _NPROCESSORS_ONLN) -s -C src/interfaces/libpq install && \
# Enable some of contrib extensions
echo 'trusted = true' >> /usr/local/pgsql/share/extension/bloom.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgrowlocks.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/intagg.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/pgstattuple.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/earthdistance.control
#########################################################################################
#
# Layer "postgis-build"
# Build PostGIS from the upstream PostGIS mirror.
#
#########################################################################################
FROM build-deps AS postgis-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y gdal-bin libgdal-dev libprotobuf-c-dev protobuf-c-compiler xsltproc
RUN wget https://download.osgeo.org/postgis/source/postgis-3.3.1.tar.gz && \
tar xvzf postgis-3.3.1.tar.gz && \
cd postgis-3.3.1 && \
./autogen.sh && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
./configure && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
cd extensions/postgis && \
make clean && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_raster.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_tiger_geocoder.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/postgis_topology.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/address_standardizer_data_us.control
#########################################################################################
#
# Layer "plv8-build"
# Build plv8
#
#########################################################################################
FROM build-deps AS plv8-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN apt update && \
apt install -y ninja-build python3-dev libc++-dev libc++abi-dev libncurses5 binutils
# https://github.com/plv8/plv8/issues/475:
# v8 uses gold for linking and sets `--thread-count=4` which breaks
# gold version <= 1.35 (https://sourceware.org/bugzilla/show_bug.cgi?id=23607)
# Install newer gold version manually as debian-testing binutils version updates
# libc version, which in turn breaks other extension built against non-testing libc.
RUN wget https://ftp.gnu.org/gnu/binutils/binutils-2.38.tar.gz && \
tar xvzf binutils-2.38.tar.gz && \
cd binutils-2.38 && \
cd libiberty && ./configure && make -j $(getconf _NPROCESSORS_ONLN) && \
cd ../bfd && ./configure && make bfdver.h && \
cd ../gold && ./configure && make -j $(getconf _NPROCESSORS_ONLN) && make install && \
cp /usr/local/bin/ld.gold /usr/bin/gold
# Sed is used to patch for https://github.com/plv8/plv8/issues/503
RUN wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.4.tar.gz && \
tar xvzf v3.1.4.tar.gz && \
cd plv8-3.1.4 && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
sed -i 's/MemoryContextAlloc(/MemoryContextAllocZero(/' plv8.cc && \
make DOCKER=1 -j $(getconf _NPROCESSORS_ONLN) install && \
rm -rf /plv8-* && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control
#########################################################################################
#
# Layer "h3-pg-build"
# Build h3_pg
#
#########################################################################################
FROM build-deps AS h3-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# packaged cmake is too old
RUN wget https://github.com/Kitware/CMake/releases/download/v3.24.2/cmake-3.24.2-linux-x86_64.sh \
-q -O /tmp/cmake-install.sh \
&& chmod u+x /tmp/cmake-install.sh \
&& /tmp/cmake-install.sh --skip-license --prefix=/usr/local/ \
&& rm /tmp/cmake-install.sh
RUN wget https://github.com/uber/h3/archive/refs/tags/v4.0.1.tar.gz -O h3.tgz && \
tar xvzf h3.tgz && \
cd h3-4.0.1 && \
mkdir build && \
cd build && \
cmake .. -DCMAKE_BUILD_TYPE=Release && \
make -j $(getconf _NPROCESSORS_ONLN) && \
DESTDIR=/h3 make install && \
cp -R /h3/usr / && \
rm -rf build
RUN wget https://github.com/zachasme/h3-pg/archive/refs/tags/v4.0.1.tar.gz -O h3-pg.tgz && \
tar xvzf h3-pg.tgz && \
cd h3-pg-4.0.1 && \
export PATH="/usr/local/pgsql/bin:$PATH" && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/h3.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/h3_postgis.control
#########################################################################################
#
# Layer "neon-pg-ext-build"
# compile neon extensions
#
#########################################################################################
FROM build-deps AS neon-pg-ext-build
COPY --from=postgis-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=plv8-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY --from=h3-pg-build /h3/usr /
COPY pgxn/ pgxn/
RUN make -j $(getconf _NPROCESSORS_ONLN) \
PG_CONFIG=/usr/local/pgsql/bin/pg_config \
-C pgxn/neon \
-s install
#########################################################################################
#
# Compile and run the Neon-specific `compute_ctl` binary
#
#########################################################################################
FROM 369495373322.dkr.ecr.eu-central-1.amazonaws.com/rust:$TAG AS compute-tools
USER nonroot
# Copy entire project to get Cargo.* files with proper dependencies for the whole project
COPY --chown=nonroot . .
RUN cd compute_tools && cargo build --locked --profile release-line-debug-size-lto
#########################################################################################
#
# Clean up postgres folder before inclusion
#
#########################################################################################
FROM neon-pg-ext-build AS postgres-cleanup-layer
COPY --from=neon-pg-ext-build /usr/local/pgsql /usr/local/pgsql
# Remove binaries from /bin/ that we won't use (or would manually copy & install otherwise)
RUN cd /usr/local/pgsql/bin && rm ecpg raster2pgsql shp2pgsql pgtopo_export pgtopo_import pgsql2shp
# Remove headers that we won't need anymore - we've completed installation of all extensions
RUN rm -r /usr/local/pgsql/include
# Remove static postgresql libraries - all compilation is finished, so we
# can now remove these files - they must be included in other binaries by now
# if they were to be used by other libraries.
RUN rm /usr/local/pgsql/lib/lib*.a
#########################################################################################
#
# Final layer
# Put it all together into the final image
#
#########################################################################################
FROM debian:bullseye-slim
# Add user postgres
RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
echo "postgres:test_console_pass" | chpasswd && \
mkdir /var/db/postgres/compute && mkdir /var/db/postgres/specs && \
chown -R postgres:postgres /var/db/postgres && \
chmod 0750 /var/db/postgres/compute && \
echo '/usr/local/lib' >> /etc/ld.so.conf && /sbin/ldconfig
COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
# Install:
# libreadline8 for psql
# libossp-uuid16 for extension ossp-uuid
# libgeos, libgdal, libproj and libprotobuf-c1 for PostGIS
RUN apt update && \
apt install --no-install-recommends -y \
libreadline8 \
libossp-uuid16 \
libgeos-c1v5 \
libgdal28 \
libproj19 \
libprotobuf-c1 \
gdb && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
USER postgres
ENTRYPOINT ["/usr/local/bin/compute_ctl"]

View File

@@ -1,8 +1,9 @@
# First transient image to build compute_tools binaries
# NB: keep in sync with rust image version in .github/workflows/build_and_test.yml
ARG REPOSITORY=369495373322.dkr.ecr.eu-central-1.amazonaws.com
ARG REPOSITORY=neondatabase
ARG IMAGE=rust
ARG TAG=pinned
ARG BUILD_TAG
FROM $REPOSITORY/$IMAGE:$TAG AS rust-build
WORKDIR /home/nonroot
@@ -16,6 +17,8 @@ ENV CACHEPOT_S3_KEY_PREFIX=cachepot
ARG CACHEPOT_BUCKET=neon-github-dev
#ARG AWS_ACCESS_KEY_ID
#ARG AWS_SECRET_ACCESS_KEY
ARG BUILD_TAG
ENV BUILD_TAG=$BUILD_TAG
COPY . .

View File

@@ -133,12 +133,34 @@ neon-pg-ext-%: postgres-%
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile install
+@echo "Compiling neon_utils $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-utils-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile install
+@echo "Compiling hnsw $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/hnsw-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/hnsw-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/hnsw/Makefile install
.PHONY: neon-pg-ext-clean-%
neon-pg-ext-clean-%:
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/pgxn/neon-$* -f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/pgxn/neon_walredo-$* -f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/pgxn/neon_test_utils-$* -f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/hnsw-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/hnsw/Makefile clean
.PHONY: neon-pg-ext
neon-pg-ext: \

View File

@@ -1,3 +1,5 @@
[![Neon](https://user-images.githubusercontent.com/13738772/236813940-dcfdcb5b-69d3-449b-a686-013febe834d4.png)](https://neon.tech)
# Neon
Neon is a serverless open-source alternative to AWS Aurora Postgres. It separates storage and compute and substitutes the PostgreSQL storage layer by redistributing data across a cluster of nodes.
@@ -15,7 +17,7 @@ The Neon storage engine consists of two major components:
- Pageserver. Scalable storage backend for the compute nodes.
- Safekeepers. The safekeepers form a redundant WAL service that received WAL from the compute node, and stores it durably until it has been processed by the pageserver and uploaded to cloud storage.
See developer documentation in [/docs/SUMMARY.md](/docs/SUMMARY.md) for more information.
See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more information.
## Running local installation
@@ -26,14 +28,22 @@ See developer documentation in [/docs/SUMMARY.md](/docs/SUMMARY.md) for more inf
* On Ubuntu or Debian, this set of packages should be sufficient to build the code:
```bash
apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \
libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler
libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler \
libcurl4-openssl-dev
```
* On Fedora, these packages are needed:
```bash
dnf install flex bison readline-devel zlib-devel openssl-devel \
libseccomp-devel perl clang cmake postgresql postgresql-contrib protobuf-compiler \
protobuf-devel
protobuf-devel libcurl-devel
```
* On Arch based systems, these packages are needed:
```bash
pacman -S base-devel readline zlib libseccomp openssl clang \
postgresql-libs cmake postgresql protobuf curl
```
Building Neon requires 3.15+ version of `protoc` (protobuf-compiler). If your distribution provides an older version, you can install a newer version from [here](https://github.com/protocolbuffers/protobuf/releases).
2. [Install Rust](https://www.rust-lang.org/tools/install)
```
@@ -41,11 +51,14 @@ dnf install flex bison readline-devel zlib-devel openssl-devel \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
#### Installing dependencies on OSX (12.3.1)
#### Installing dependencies on macOS (12.3.1)
1. Install XCode and dependencies
```
xcode-select --install
brew install protobuf openssl flex bison
# add openssl to PATH, required for ed25519 keys generation in neon_local
echo 'export PATH="$(brew --prefix openssl)/bin:$PATH"' >> ~/.zshrc
```
2. [Install Rust](https://www.rust-lang.org/tools/install)
@@ -83,9 +96,10 @@ cd neon
# The preferred and default is to make a debug build. This will create a
# demonstrably slower build than a release build. For a release build,
# use "BUILD_TYPE=release make -j`nproc`"
# use "BUILD_TYPE=release make -j`nproc` -s"
# Remove -s for the verbose build log
make -j`nproc`
make -j`nproc` -s
```
#### Building on OSX
@@ -99,16 +113,17 @@ cd neon
# The preferred and default is to make a debug build. This will create a
# demonstrably slower build than a release build. For a release build,
# use "BUILD_TYPE=release make -j`sysctl -n hw.logicalcpu`"
# use "BUILD_TYPE=release make -j`sysctl -n hw.logicalcpu` -s"
# Remove -s for the verbose build log
make -j`sysctl -n hw.logicalcpu`
make -j`sysctl -n hw.logicalcpu` -s
```
#### Dependency installation notes
To run the `psql` client, install the `postgresql-client` package or modify `PATH` and `LD_LIBRARY_PATH` to include `pg_install/bin` and `pg_install/lib`, respectively.
To run the integration tests or Python scripts (not required to use the code), install
Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (requires [poetry](https://python-poetry.org/)) in the project directory.
Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (requires [poetry>=1.3](https://python-poetry.org/)) in the project directory.
#### Running neon database
@@ -116,34 +131,33 @@ Python (3.9 or higher), and install python3 packages using `./scripts/pysync` (r
```sh
# Create repository in .neon with proper paths to binaries and data
# Later that would be responsibility of a package install script
> ./target/debug/neon_local init
Starting pageserver at '127.0.0.1:64000' in '.neon'.
> cargo neon init
Initializing pageserver node 1 at '127.0.0.1:64000' in ".neon"
# start pageserver, safekeeper, and broker for their intercommunication
> ./target/debug/neon_local start
Starting neon broker at 127.0.0.1:50051
> cargo neon start
Starting neon broker at 127.0.0.1:50051.
storage_broker started, pid: 2918372
Starting pageserver at '127.0.0.1:64000' in '.neon'.
Starting pageserver node 1 at '127.0.0.1:64000' in ".neon".
pageserver started, pid: 2918386
Starting safekeeper at '127.0.0.1:5454' in '.neon/safekeepers/sk1'.
safekeeper 1 started, pid: 2918437
# create initial tenant and use it as a default for every future neon_local invocation
> ./target/debug/neon_local tenant create --set-default
> cargo neon tenant create --set-default
tenant 9ef87a5bf0d92544f6fafeeb3239695c successfully created on the pageserver
Created an initial timeline 'de200bd42b49cc1814412c7e592dd6e9' at Lsn 0/16B5A50 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c
Setting tenant 9ef87a5bf0d92544f6fafeeb3239695c as a default one
# start postgres compute node
> ./target/debug/neon_local pg start main
Starting new postgres (v14) main on timeline de200bd42b49cc1814412c7e592dd6e9 ...
Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/main port=55432
Starting postgres node at 'host=127.0.0.1 port=55432 user=cloud_admin dbname=postgres'
> cargo neon endpoint start main
Starting new endpoint main (PostgreSQL v14) on timeline de200bd42b49cc1814412c7e592dd6e9 ...
Starting postgres at 'postgresql://cloud_admin@127.0.0.1:55432/postgres'
# check list of running postgres instances
> ./target/debug/neon_local pg list
NODE ADDRESS TIMELINE BRANCH NAME LSN STATUS
main 127.0.0.1:55432 de200bd42b49cc1814412c7e592dd6e9 main 0/16B5BA8 running
> cargo neon endpoint list
ENDPOINT ADDRESS TIMELINE BRANCH NAME LSN STATUS
main 127.0.0.1:55432 de200bd42b49cc1814412c7e592dd6e9 main 0/16B5BA8 running
```
2. Now, it is possible to connect to postgres and run some queries:
@@ -163,29 +177,28 @@ postgres=# select * from t;
3. And create branches and run postgres on them:
```sh
# create branch named migration_check
> ./target/debug/neon_local timeline branch --branch-name migration_check
> cargo neon timeline branch --branch-name migration_check
Created timeline 'b3b863fa45fa9e57e615f9f2d944e601' at Lsn 0/16F9A00 for tenant: 9ef87a5bf0d92544f6fafeeb3239695c. Ancestor timeline: 'main'
# check branches tree
> ./target/debug/neon_local timeline list
> cargo neon timeline list
(L) main [de200bd42b49cc1814412c7e592dd6e9]
(L) ┗━ @0/16F9A00: migration_check [b3b863fa45fa9e57e615f9f2d944e601]
# start postgres on that branch
> ./target/debug/neon_local pg start migration_check --branch-name migration_check
Starting new postgres migration_check on timeline b3b863fa45fa9e57e615f9f2d944e601 ...
Extracting base backup to create postgres instance: path=.neon/pgdatadirs/tenants/9ef87a5bf0d92544f6fafeeb3239695c/migration_check port=55433
Starting postgres node at 'host=127.0.0.1 port=55433 user=cloud_admin dbname=postgres'
> cargo neon endpoint start migration_check --branch-name migration_check
Starting new endpoint migration_check (PostgreSQL v14) on timeline b3b863fa45fa9e57e615f9f2d944e601 ...
Starting postgres at 'postgresql://cloud_admin@127.0.0.1:55434/postgres'
# check the new list of running postgres instances
> ./target/debug/neon_local pg list
NODE ADDRESS TIMELINE BRANCH NAME LSN STATUS
> cargo neon endpoint list
ENDPOINT ADDRESS TIMELINE BRANCH NAME LSN STATUS
main 127.0.0.1:55432 de200bd42b49cc1814412c7e592dd6e9 main 0/16F9A38 running
migration_check 127.0.0.1:55433 b3b863fa45fa9e57e615f9f2d944e601 migration_check 0/16F9A70 running
migration_check 127.0.0.1:55434 b3b863fa45fa9e57e615f9f2d944e601 migration_check 0/16F9A70 running
# this new postgres instance will have all the data from 'main' postgres,
# but all modifications would not affect data in original postgres
> psql -p55433 -h 127.0.0.1 -U cloud_admin postgres
> psql -p55434 -h 127.0.0.1 -U cloud_admin postgres
postgres=# select * from t;
key | value
-----+-------
@@ -207,7 +220,7 @@ postgres=# select * from t;
4. If you want to run tests afterward (see below), you must stop all the running of the pageserver, safekeeper, and postgres instances
you have just started. You can terminate them all with one command:
```sh
> ./target/debug/neon_local stop
> cargo neon stop
```
## Running tests
@@ -224,9 +237,9 @@ CARGO_BUILD_FLAGS="--features=testing" make
## Documentation
[/docs/](/docs/) Contains a top-level overview of all available markdown documentation.
[docs](/docs) Contains a top-level overview of all available markdown documentation.
- [/docs/sourcetree.md](/docs/sourcetree.md) contains overview of source tree layout.
- [sourcetree.md](/docs/sourcetree.md) contains overview of source tree layout.
To view your `rustdoc` documentation in a browser, try running `cargo doc --no-deps --open`
@@ -251,6 +264,6 @@ To get more familiar with this aspect, refer to:
## Join the development
- Read `CONTRIBUTING.md` to learn about project code style and practices.
- To get familiar with a source tree layout, use [/docs/sourcetree.md](/docs/sourcetree.md).
- Read [CONTRIBUTING.md](/CONTRIBUTING.md) to learn about project code style and practices.
- To get familiar with a source tree layout, use [sourcetree.md](/docs/sourcetree.md).
- To learn more about PostgreSQL internals, check http://www.interdb.jp/pg/index.html

View File

@@ -11,15 +11,22 @@ clap.workspace = true
futures.workspace = true
hyper = { workspace = true, features = ["full"] }
notify.workspace = true
num_cpus.workspace = true
opentelemetry.workspace = true
postgres.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
tar.workspace = true
reqwest = { workspace = true, features = ["json"] }
tokio = { workspace = true, features = ["rt", "rt-multi-thread"] }
tokio-postgres.workspace = true
tracing.workspace = true
tracing-opentelemetry.workspace = true
tracing-subscriber.workspace = true
tracing-utils.workspace = true
url.workspace = true
compute_api.workspace = true
utils.workspace = true
workspace_hack.workspace = true

View File

@@ -30,96 +30,202 @@
//! -b /usr/local/bin/postgres
//! ```
//!
use std::collections::HashMap;
use std::fs::File;
use std::panic;
use std::path::Path;
use std::process::exit;
use std::sync::{Arc, RwLock};
use std::sync::{mpsc, Arc, Condvar, Mutex};
use std::{thread, time::Duration};
use anyhow::{Context, Result};
use chrono::Utc;
use clap::Arg;
use tracing::{error, info};
use url::Url;
use compute_tools::compute::{ComputeMetrics, ComputeNode, ComputeState, ComputeStatus};
use compute_api::responses::ComputeStatus;
use compute_tools::compute::{ComputeNode, ComputeState, ParsedSpec};
use compute_tools::configurator::launch_configurator;
use compute_tools::http::api::launch_http_server;
use compute_tools::informant::spawn_vm_informant_if_present;
use compute_tools::logger::*;
use compute_tools::monitor::launch_monitor;
use compute_tools::params::*;
use compute_tools::pg_helpers::*;
use compute_tools::spec::*;
use url::Url;
const BUILD_TAG_DEFAULT: &str = "local";
fn main() -> Result<()> {
init_logger(DEFAULT_LOG_LEVEL)?;
init_tracing_and_logging(DEFAULT_LOG_LEVEL)?;
let build_tag = option_env!("BUILD_TAG").unwrap_or(BUILD_TAG_DEFAULT);
info!("build_tag: {build_tag}");
let matches = cli().get_matches();
let http_port = *matches
.get_one::<u16>("http-port")
.expect("http-port is required");
let pgdata = matches
.get_one::<String>("pgdata")
.expect("PGDATA path is required");
let connstr = matches
.get_one::<String>("connstr")
.expect("Postgres connection string is required");
let spec = matches.get_one::<String>("spec");
let spec_json = matches.get_one::<String>("spec");
let spec_path = matches.get_one::<String>("spec-path");
// Extract OpenTelemetry context for the startup actions from the
// TRACEPARENT and TRACESTATE env variables, and attach it to the current
// tracing context.
//
// This is used to propagate the context for the 'start_compute' operation
// from the neon control plane. This allows linking together the wider
// 'start_compute' operation that creates the compute container, with the
// startup actions here within the container.
//
// There is no standard for passing context in env variables, but a lot of
// tools use TRACEPARENT/TRACESTATE, so we use that convention too. See
// https://github.com/open-telemetry/opentelemetry-specification/issues/740
//
// Switch to the startup context here, and exit it once the startup has
// completed and Postgres is up and running.
//
// If this pod is pre-created without binding it to any particular endpoint
// yet, this isn't the right place to enter the startup context. In that
// case, the control plane should pass the tracing context as part of the
// /configure API call.
//
// NOTE: This is supposed to only cover the *startup* actions. Once
// postgres is configured and up-and-running, we exit this span. Any other
// actions that are performed on incoming HTTP requests, for example, are
// performed in separate spans.
//
// XXX: If the pod is restarted, we perform the startup actions in the same
// context as the original startup actions, which probably doesn't make
// sense.
let mut startup_tracing_carrier: HashMap<String, String> = HashMap::new();
if let Ok(val) = std::env::var("TRACEPARENT") {
startup_tracing_carrier.insert("traceparent".to_string(), val);
}
if let Ok(val) = std::env::var("TRACESTATE") {
startup_tracing_carrier.insert("tracestate".to_string(), val);
}
let startup_context_guard = if !startup_tracing_carrier.is_empty() {
use opentelemetry::propagation::TextMapPropagator;
use opentelemetry::sdk::propagation::TraceContextPropagator;
let guard = TraceContextPropagator::new()
.extract(&startup_tracing_carrier)
.attach();
info!("startup tracing context attached");
Some(guard)
} else {
None
};
let compute_id = matches.get_one::<String>("compute-id");
let control_plane_uri = matches.get_one::<String>("control-plane-uri");
// Try to use just 'postgres' if no path is provided
let pgbin = matches.get_one::<String>("pgbin").unwrap();
let spec: ComputeSpec = match spec {
let spec;
let mut live_config_allowed = false;
match spec_json {
// First, try to get cluster spec from the cli argument
Some(json) => serde_json::from_str(json)?,
Some(json) => {
spec = Some(serde_json::from_str(json)?);
}
None => {
// Second, try to read it from the file if path is provided
if let Some(sp) = spec_path {
let path = Path::new(sp);
let file = File::open(path)?;
serde_json::from_reader(file)?
spec = Some(serde_json::from_reader(file)?);
} else if let Some(id) = compute_id {
if let Some(cp_base) = control_plane_uri {
live_config_allowed = true;
spec = match get_spec_from_control_plane(cp_base, id) {
Ok(s) => s,
Err(e) => {
error!("cannot get response from control plane: {}", e);
panic!("neither spec nor confirmation that compute is in the Empty state was received");
}
};
} else {
panic!("must specify both --control-plane-uri and --compute-id or none");
}
} else {
panic!("cluster spec should be provided via --spec or --spec-path argument");
panic!(
"compute spec should be provided by one of the following ways: \
--spec OR --spec-path OR --control-plane-uri and --compute-id"
);
}
}
};
let pageserver_connstr = spec
.cluster
.settings
.find("neon.pageserver_connstring")
.expect("pageserver connstr should be provided");
let tenant = spec
.cluster
.settings
.find("neon.tenant_id")
.expect("tenant id should be provided");
let timeline = spec
.cluster
.settings
.find("neon.timeline_id")
.expect("tenant id should be provided");
let compute_state = ComputeNode {
start_time: Utc::now(),
let mut new_state = ComputeState::new();
let spec_set;
if let Some(spec) = spec {
let pspec = ParsedSpec::try_from(spec).map_err(|msg| anyhow::anyhow!(msg))?;
new_state.pspec = Some(pspec);
spec_set = true;
} else {
spec_set = false;
}
let compute_node = ComputeNode {
connstr: Url::parse(connstr).context("cannot parse connstr as a URL")?,
pgdata: pgdata.to_string(),
pgbin: pgbin.to_string(),
spec,
tenant,
timeline,
pageserver_connstr,
metrics: ComputeMetrics::default(),
state: RwLock::new(ComputeState::new()),
live_config_allowed,
state: Mutex::new(new_state),
state_changed: Condvar::new(),
};
let compute = Arc::new(compute_state);
let compute = Arc::new(compute_node);
// Launch service threads first, so we were able to serve availability
// Launch http service first, so we were able to serve control-plane
// requests, while configuration is still in progress.
let _http_handle = launch_http_server(&compute).expect("cannot launch http endpoint thread");
let _http_handle =
launch_http_server(http_port, &compute).expect("cannot launch http endpoint thread");
if !spec_set {
// No spec provided, hang waiting for it.
info!("no compute spec provided, waiting");
let mut state = compute.state.lock().unwrap();
while state.status != ComputeStatus::ConfigurationPending {
state = compute.state_changed.wait(state).unwrap();
if state.status == ComputeStatus::ConfigurationPending {
info!("got spec, continue configuration");
// Spec is already set by the http server handler.
break;
}
}
}
// We got all we need, update the state.
let mut state = compute.state.lock().unwrap();
// Record for how long we slept waiting for the spec.
state.metrics.wait_for_spec_ms = Utc::now()
.signed_duration_since(state.start_time)
.to_std()
.unwrap()
.as_millis() as u64;
// Reset start time to the actual start of the configuration, so that
// total startup time was properly measured at the end.
state.start_time = Utc::now();
state.status = ComputeStatus::Init;
compute.state_changed.notify_all();
drop(state);
// Launch remaining service threads
let _monitor_handle = launch_monitor(&compute).expect("cannot launch compute monitor thread");
// Also spawn the thread responsible for handling the VM informant -- if it's present
let _vm_informant_handle = spawn_vm_informant_if_present().expect("cannot launch VM informant");
let _configurator_handle =
launch_configurator(&compute).expect("cannot launch configurator thread");
// Start Postgres
let mut delay_exit = false;
@@ -128,7 +234,7 @@ fn main() -> Result<()> {
Ok(pg) => Some(pg),
Err(err) => {
error!("could not start the compute node: {:?}", err);
let mut state = compute.state.write().unwrap();
let mut state = compute.state.lock().unwrap();
state.error = Some(format!("{:?}", err));
state.status = ComputeStatus::Failed;
drop(state);
@@ -140,6 +246,9 @@ fn main() -> Result<()> {
// Wait for the child Postgres process forever. In this state Ctrl+C will
// propagate to Postgres and it will be shut down as well.
if let Some(mut pg) = pg {
// Startup is finished, exit the startup tracing span
drop(startup_context_guard);
let ecode = pg
.wait()
.expect("failed to start waiting on Postgres process");
@@ -147,6 +256,16 @@ fn main() -> Result<()> {
exit_code = ecode.code()
}
// Maybe sync safekeepers again, to speed up next startup
let compute_state = compute.state.lock().unwrap().clone();
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
if matches!(pspec.spec.mode, compute_api::spec::ComputeMode::Primary) {
info!("syncing safekeepers on shutdown");
let storage_auth_token = pspec.storage_auth_token.clone();
let lsn = compute.sync_safekeepers(storage_auth_token)?;
info!("synced safekeepers at lsn {lsn}");
}
if let Err(err) = compute.check_for_core_dumps() {
error!("error while checking for core dumps: {err:?}");
}
@@ -156,9 +275,29 @@ fn main() -> Result<()> {
if delay_exit {
info!("giving control plane 30s to collect the error before shutdown");
thread::sleep(Duration::from_secs(30));
info!("shutting down");
}
// Shutdown trace pipeline gracefully, so that it has a chance to send any
// pending traces before we exit. Shutting down OTEL tracing provider may
// hang for quite some time, see, for example:
// - https://github.com/open-telemetry/opentelemetry-rust/issues/868
// - and our problems with staging https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636
//
// Yet, we want computes to shut down fast enough, as we may need a new one
// for the same timeline ASAP. So wait no longer than 2s for the shutdown to
// complete, then just error out and exit the main thread.
info!("shutting down tracing");
let (sender, receiver) = mpsc::channel();
let _ = thread::spawn(move || {
tracing_utils::shutdown_tracing();
sender.send(()).ok()
});
let shutdown_res = receiver.recv_timeout(Duration::from_millis(2000));
if shutdown_res.is_err() {
error!("timed out while shutting down tracing, exiting anyway");
}
info!("shutting down");
exit(exit_code.unwrap_or(1))
}
@@ -167,6 +306,14 @@ fn cli() -> clap::Command {
let version = option_env!("CARGO_PKG_VERSION").unwrap_or("unknown");
clap::Command::new("compute_ctl")
.version(version)
.arg(
Arg::new("http-port")
.long("http-port")
.value_name("HTTP_PORT")
.default_value("3080")
.value_parser(clap::value_parser!(u16))
.required(false),
)
.arg(
Arg::new("connstr")
.short('C')
@@ -200,6 +347,18 @@ fn cli() -> clap::Command {
.long("spec-path")
.value_name("SPEC_PATH"),
)
.arg(
Arg::new("compute-id")
.short('i')
.long("compute-id")
.value_name("COMPUTE_ID"),
)
.arg(
Arg::new("control-plane-uri")
.short('p')
.long("control-plane-uri")
.value_name("CONTROL_PLANE_API_BASE_URI"),
)
}
#[test]

View File

@@ -1,12 +1,28 @@
use anyhow::{anyhow, Result};
use postgres::Client;
use tokio_postgres::NoTls;
use tracing::{error, instrument};
use crate::compute::ComputeNode;
/// Update timestamp in a row in a special service table to check
/// that we can actually write some data in this particular timeline.
/// Create table if it's missing.
#[instrument(skip_all)]
pub fn create_writability_check_data(client: &mut Client) -> Result<()> {
pub async fn check_writability(compute: &ComputeNode) -> Result<()> {
// Connect to the database.
let (client, connection) = tokio_postgres::connect(compute.connstr.as_str(), NoTls).await?;
if client.is_closed() {
return Err(anyhow!("connection to postgres closed"));
}
// The connection object performs the actual communication with the database,
// so spawn it off to run on its own.
tokio::spawn(async move {
if let Err(e) = connection.await {
error!("connection error: {}", e);
}
});
let query = "
CREATE TABLE IF NOT EXISTS health_check (
id serial primary key,
@@ -15,31 +31,15 @@ pub fn create_writability_check_data(client: &mut Client) -> Result<()> {
INSERT INTO health_check VALUES (1, now())
ON CONFLICT (id) DO UPDATE
SET updated_at = now();";
let result = client.simple_query(query)?;
if result.len() < 2 {
return Err(anyhow::format_err!("executed {} queries", result.len()));
}
Ok(())
}
#[instrument(skip_all)]
pub async fn check_writability(compute: &ComputeNode) -> Result<()> {
let (client, connection) = tokio_postgres::connect(compute.connstr.as_str(), NoTls).await?;
if client.is_closed() {
return Err(anyhow!("connection to postgres closed"));
}
tokio::spawn(async move {
if let Err(e) = connection.await {
error!("connection error: {}", e);
}
});
let result = client
.simple_query("UPDATE health_check SET updated_at = now() WHERE id = 1;")
.await?;
if result.len() != 1 {
return Err(anyhow!("statement can't be executed"));
let result = client.simple_query(query).await?;
if result.len() != 2 {
return Err(anyhow::format_err!(
"expected 2 query results, but got {}",
result.len()
));
}
Ok(())
}

View File

@@ -1,78 +1,73 @@
//
// XXX: This starts to be scarry similar to the `PostgresNode` from `control_plane`,
// but there are several things that makes `PostgresNode` usage inconvenient in the
// cloud:
// - it inherits from `LocalEnv`, which contains **all-all** the information about
// a complete service running
// - it uses `PageServerNode` with information about http endpoint, which we do not
// need in the cloud again
// - many tiny pieces like, for example, we do not use `pg_ctl` in the cloud
//
// Thus, to use `PostgresNode` in the cloud, we need to 'mock' a bunch of required
// attributes (not required for the cloud). Yet, it is still tempting to unify these
// `PostgresNode` and `ComputeNode` and use one in both places.
//
// TODO: stabilize `ComputeNode` and think about using it in the `control_plane`.
//
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::process::{Command, Stdio};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;
use std::str::FromStr;
use std::sync::{Condvar, Mutex};
use anyhow::{Context, Result};
use chrono::{DateTime, Utc};
use postgres::{Client, NoTls};
use serde::{Serialize, Serializer};
use tokio_postgres;
use tracing::{info, instrument, warn};
use utils::id::{TenantId, TimelineId};
use utils::lsn::Lsn;
use compute_api::responses::{ComputeMetrics, ComputeStatus};
use compute_api::spec::{ComputeMode, ComputeSpec};
use crate::checker::create_writability_check_data;
use crate::config;
use crate::pg_helpers::*;
use crate::spec::*;
/// Compute node info shared across several `compute_ctl` threads.
pub struct ComputeNode {
pub start_time: DateTime<Utc>,
// Url type maintains proper escaping
pub connstr: url::Url,
pub pgdata: String,
pub pgbin: String,
pub spec: ComputeSpec,
pub tenant: String,
pub timeline: String,
pub pageserver_connstr: String,
pub metrics: ComputeMetrics,
/// Volatile part of the `ComputeNode` so should be used under `RwLock`
/// to allow HTTP API server to serve status requests, while configuration
/// is in progress.
pub state: RwLock<ComputeState>,
/// We should only allow live re- / configuration of the compute node if
/// it uses 'pull model', i.e. it can go to control-plane and fetch
/// the latest configuration. Otherwise, there could be a case:
/// - we start compute with some spec provided as argument
/// - we push new spec and it does reconfiguration
/// - but then something happens and compute pod / VM is destroyed,
/// so k8s controller starts it again with the **old** spec
/// and the same for empty computes:
/// - we started compute without any spec
/// - we push spec and it does configuration
/// - but then it is restarted without any spec again
pub live_config_allowed: bool,
/// Volatile part of the `ComputeNode`, which should be used under `Mutex`.
/// To allow HTTP API server to serving status requests, while configuration
/// is in progress, lock should be held only for short periods of time to do
/// read/write, not the whole configuration process.
pub state: Mutex<ComputeState>,
/// `Condvar` to allow notifying waiters about state changes.
pub state_changed: Condvar,
}
fn rfc3339_serialize<S>(x: &DateTime<Utc>, s: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
x.to_rfc3339().serialize(s)
}
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
#[derive(Clone, Debug)]
pub struct ComputeState {
pub start_time: DateTime<Utc>,
pub status: ComputeStatus,
/// Timestamp of the last Postgres activity
#[serde(serialize_with = "rfc3339_serialize")]
pub last_active: DateTime<Utc>,
/// Timestamp of the last Postgres activity. It could be `None` if
/// compute wasn't used since start.
pub last_active: Option<DateTime<Utc>>,
pub error: Option<String>,
pub pspec: Option<ParsedSpec>,
pub metrics: ComputeMetrics,
}
impl ComputeState {
pub fn new() -> Self {
Self {
status: ComputeStatus::Init,
last_active: Utc::now(),
start_time: Utc::now(),
status: ComputeStatus::Empty,
last_active: None,
error: None,
pspec: None,
metrics: ComputeMetrics::default(),
}
}
}
@@ -83,29 +78,148 @@ impl Default for ComputeState {
}
}
#[derive(Serialize, Clone, Copy, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum ComputeStatus {
Init,
Running,
Failed,
#[derive(Clone, Debug)]
pub struct ParsedSpec {
pub spec: ComputeSpec,
pub tenant_id: TenantId,
pub timeline_id: TimelineId,
pub pageserver_connstr: String,
pub storage_auth_token: Option<String>,
}
#[derive(Default, Serialize)]
pub struct ComputeMetrics {
pub sync_safekeepers_ms: AtomicU64,
pub basebackup_ms: AtomicU64,
pub config_ms: AtomicU64,
pub total_startup_ms: AtomicU64,
impl TryFrom<ComputeSpec> for ParsedSpec {
type Error = String;
fn try_from(spec: ComputeSpec) -> Result<Self, String> {
// Extract the options from the spec file that are needed to connect to
// the storage system.
//
// For backwards-compatibility, the top-level fields in the spec file
// may be empty. In that case, we need to dig them from the GUCs in the
// cluster.settings field.
let pageserver_connstr = spec
.pageserver_connstring
.clone()
.or_else(|| spec.cluster.settings.find("neon.pageserver_connstring"))
.ok_or("pageserver connstr should be provided")?;
let storage_auth_token = spec.storage_auth_token.clone();
let tenant_id: TenantId = if let Some(tenant_id) = spec.tenant_id {
tenant_id
} else {
spec.cluster
.settings
.find("neon.tenant_id")
.ok_or("tenant id should be provided")
.map(|s| TenantId::from_str(&s))?
.or(Err("invalid tenant id"))?
};
let timeline_id: TimelineId = if let Some(timeline_id) = spec.timeline_id {
timeline_id
} else {
spec.cluster
.settings
.find("neon.timeline_id")
.ok_or("timeline id should be provided")
.map(|s| TimelineId::from_str(&s))?
.or(Err("invalid timeline id"))?
};
Ok(ParsedSpec {
spec,
pageserver_connstr,
storage_auth_token,
tenant_id,
timeline_id,
})
}
}
/// Create special neon_superuser role, that's a slightly nerfed version of a real superuser
/// that we give to customers
fn create_neon_superuser(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
let roles = spec
.cluster
.roles
.iter()
.map(|r| format!("'{}'", escape_literal(&r.name)))
.collect::<Vec<_>>();
let dbs = spec
.cluster
.databases
.iter()
.map(|db| format!("'{}'", escape_literal(&db.name)))
.collect::<Vec<_>>();
let roles_decl = if roles.is_empty() {
String::from("roles text[] := NULL;")
} else {
format!(
r#"
roles text[] := ARRAY(SELECT rolname
FROM pg_catalog.pg_roles
WHERE rolname IN ({}));"#,
roles.join(", ")
)
};
let database_decl = if dbs.is_empty() {
String::from("dbs text[] := NULL;")
} else {
format!(
r#"
dbs text[] := ARRAY(SELECT datname
FROM pg_catalog.pg_database
WHERE datname IN ({}));"#,
dbs.join(", ")
)
};
// ALL PRIVILEGES grants CREATE, CONNECT, and TEMPORARY on all databases
// (see https://www.postgresql.org/docs/current/ddl-priv.html)
let query = format!(
r#"
DO $$
DECLARE
r text;
{}
{}
BEGIN
IF NOT EXISTS (
SELECT FROM pg_catalog.pg_roles WHERE rolname = 'neon_superuser')
THEN
CREATE ROLE neon_superuser CREATEDB CREATEROLE NOLOGIN IN ROLE pg_read_all_data, pg_write_all_data;
IF array_length(roles, 1) IS NOT NULL THEN
EXECUTE format('GRANT neon_superuser TO %s',
array_to_string(ARRAY(SELECT quote_ident(x) FROM unnest(roles) as x), ', '));
FOREACH r IN ARRAY roles LOOP
EXECUTE format('ALTER ROLE %s CREATEROLE CREATEDB', quote_ident(r));
END LOOP;
END IF;
IF array_length(dbs, 1) IS NOT NULL THEN
EXECUTE format('GRANT ALL PRIVILEGES ON DATABASE %s TO neon_superuser',
array_to_string(ARRAY(SELECT quote_ident(x) FROM unnest(dbs) as x), ', '));
END IF;
END IF;
END
$$;"#,
roles_decl, database_decl,
);
info!("Neon superuser created:\n{}", &query);
client
.simple_query(&query)
.map_err(|e| anyhow::anyhow!(e).context(query))?;
Ok(())
}
impl ComputeNode {
pub fn set_status(&self, status: ComputeStatus) {
self.state.write().unwrap().status = status;
let mut state = self.state.lock().unwrap();
state.status = status;
self.state_changed.notify_all();
}
pub fn get_status(&self) -> ComputeStatus {
self.state.read().unwrap().status
self.state.lock().unwrap().status
}
// Remove `pgdata` directory and create it again with right permissions.
@@ -121,14 +235,26 @@ impl ComputeNode {
// Get basebackup from the libpq connection to pageserver using `connstr` and
// unarchive it to `pgdata` directory overriding all its previous content.
#[instrument(skip(self))]
fn get_basebackup(&self, lsn: &str) -> Result<()> {
#[instrument(skip_all, fields(%lsn))]
fn get_basebackup(&self, compute_state: &ComputeState, lsn: Lsn) -> Result<()> {
let spec = compute_state.pspec.as_ref().expect("spec must be set");
let start_time = Utc::now();
let mut client = Client::connect(&self.pageserver_connstr, NoTls)?;
let mut config = postgres::Config::from_str(&spec.pageserver_connstr)?;
// Use the storage auth token from the config file, if given.
// Note: this overrides any password set in the connection string.
if let Some(storage_auth_token) = &spec.storage_auth_token {
info!("Got storage auth token from spec file");
config.password(storage_auth_token);
} else {
info!("Storage auth token not set");
}
let mut client = config.connect(NoTls)?;
let basebackup_cmd = match lsn {
"0/0" => format!("basebackup {} {}", &self.tenant, &self.timeline), // First start of the compute
_ => format!("basebackup {} {} {}", &self.tenant, &self.timeline, lsn),
Lsn(0) => format!("basebackup {} {}", spec.tenant_id, spec.timeline_id), // First start of the compute
_ => format!("basebackup {} {} {}", spec.tenant_id, spec.timeline_id, lsn),
};
let copyreader = client.copy_out(basebackup_cmd.as_str())?;
@@ -141,27 +267,28 @@ impl ComputeNode {
ar.set_ignore_zeros(true);
ar.unpack(&self.pgdata)?;
self.metrics.basebackup_ms.store(
Utc::now()
.signed_duration_since(start_time)
.to_std()
.unwrap()
.as_millis() as u64,
Ordering::Relaxed,
);
self.state.lock().unwrap().metrics.basebackup_ms = Utc::now()
.signed_duration_since(start_time)
.to_std()
.unwrap()
.as_millis() as u64;
Ok(())
}
// Run `postgres` in a special mode with `--sync-safekeepers` argument
// and return the reported LSN back to the caller.
#[instrument(skip(self))]
fn sync_safekeepers(&self) -> Result<String> {
#[instrument(skip_all)]
pub fn sync_safekeepers(&self, storage_auth_token: Option<String>) -> Result<Lsn> {
let start_time = Utc::now();
let sync_handle = Command::new(&self.pgbin)
.args(["--sync-safekeepers"])
.env("PGDATA", &self.pgdata) // we cannot use -D in this mode
.envs(if let Some(storage_auth_token) = &storage_auth_token {
vec![("NEON_AUTH_TOKEN", storage_auth_token)]
} else {
vec![]
})
.stdout(Stdio::piped())
.spawn()
.expect("postgres --sync-safekeepers failed to start");
@@ -182,63 +309,92 @@ impl ComputeNode {
);
}
self.metrics.sync_safekeepers_ms.store(
Utc::now()
.signed_duration_since(start_time)
.to_std()
.unwrap()
.as_millis() as u64,
Ordering::Relaxed,
);
self.state.lock().unwrap().metrics.sync_safekeepers_ms = Utc::now()
.signed_duration_since(start_time)
.to_std()
.unwrap()
.as_millis() as u64;
let lsn = String::from(String::from_utf8(sync_output.stdout)?.trim());
let lsn = Lsn::from_str(String::from_utf8(sync_output.stdout)?.trim())?;
Ok(lsn)
}
/// Do all the preparations like PGDATA directory creation, configuration,
/// safekeepers sync, basebackup, etc.
#[instrument(skip(self))]
pub fn prepare_pgdata(&self) -> Result<()> {
let spec = &self.spec;
#[instrument(skip_all)]
pub fn prepare_pgdata(&self, compute_state: &ComputeState) -> Result<()> {
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
let spec = &pspec.spec;
let pgdata_path = Path::new(&self.pgdata);
// Remove/create an empty pgdata directory and put configuration there.
self.create_pgdata()?;
config::write_postgres_conf(&pgdata_path.join("postgresql.conf"), spec)?;
config::write_postgres_conf(&pgdata_path.join("postgresql.conf"), &pspec.spec)?;
info!("starting safekeepers syncing");
let lsn = self
.sync_safekeepers()
.with_context(|| "failed to sync safekeepers")?;
info!("safekeepers synced at LSN {}", lsn);
// Syncing safekeepers is only safe with primary nodes: if a primary
// is already connected it will be kicked out, so a secondary (standby)
// cannot sync safekeepers.
let lsn = match spec.mode {
ComputeMode::Primary => {
info!("starting safekeepers syncing");
let lsn = self
.sync_safekeepers(pspec.storage_auth_token.clone())
.with_context(|| "failed to sync safekeepers")?;
info!("safekeepers synced at LSN {}", lsn);
lsn
}
ComputeMode::Static(lsn) => {
info!("Starting read-only node at static LSN {}", lsn);
lsn
}
ComputeMode::Replica => {
info!("Initializing standby from latest Pageserver LSN");
Lsn(0)
}
};
info!(
"getting basebackup@{} from pageserver {}",
lsn, &self.pageserver_connstr
lsn, &pspec.pageserver_connstr
);
self.get_basebackup(&lsn).with_context(|| {
self.get_basebackup(compute_state, lsn).with_context(|| {
format!(
"failed to get basebackup@{} from pageserver {}",
lsn, &self.pageserver_connstr
lsn, &pspec.pageserver_connstr
)
})?;
// Update pg_hba.conf received with basebackup.
update_pg_hba(pgdata_path)?;
match spec.mode {
ComputeMode::Primary => {}
ComputeMode::Replica | ComputeMode::Static(..) => {
add_standby_signal(pgdata_path)?;
}
}
Ok(())
}
/// Start Postgres as a child process and manage DBs/roles.
/// After that this will hang waiting on the postmaster process to exit.
#[instrument(skip(self))]
pub fn start_postgres(&self) -> Result<std::process::Child> {
#[instrument(skip_all)]
pub fn start_postgres(
&self,
storage_auth_token: Option<String>,
) -> Result<std::process::Child> {
let pgdata_path = Path::new(&self.pgdata);
// Run postgres as a child process.
let mut pg = Command::new(&self.pgbin)
.args(["-D", &self.pgdata])
.envs(if let Some(storage_auth_token) = &storage_auth_token {
vec![("NEON_AUTH_TOKEN", storage_auth_token)]
} else {
vec![]
})
.spawn()
.expect("cannot start postgres process");
@@ -247,8 +403,9 @@ impl ComputeNode {
Ok(pg)
}
#[instrument(skip(self))]
pub fn apply_config(&self) -> Result<()> {
/// Do initial configuration of the already started Postgres.
#[instrument(skip_all)]
pub fn apply_config(&self, compute_state: &ComputeState) -> Result<()> {
// If connection fails,
// it may be the old node with `zenith_admin` superuser.
//
@@ -268,6 +425,8 @@ impl ComputeNode {
.map_err(|_| anyhow::anyhow!("invalid connstr"))?;
let mut client = Client::connect(zenith_admin_connstr.as_str(), NoTls)?;
// Disable forwarding so that users don't get a cloud_admin role
client.simple_query("SET neon.forward_ddl = false")?;
client.simple_query("CREATE USER cloud_admin WITH SUPERUSER")?;
client.simple_query("GRANT zenith_admin TO cloud_admin")?;
drop(client);
@@ -278,62 +437,118 @@ impl ComputeNode {
Ok(client) => client,
};
// Disable DDL forwarding because control plane already knows about these roles/databases.
client.simple_query("SET neon.forward_ddl = false")?;
// Proceed with post-startup configuration. Note, that order of operations is important.
handle_roles(&self.spec, &mut client)?;
handle_databases(&self.spec, &mut client)?;
handle_role_deletions(self, &mut client)?;
handle_grants(self, &mut client)?;
create_writability_check_data(&mut client)?;
let spec = &compute_state.pspec.as_ref().expect("spec must be set").spec;
create_neon_superuser(spec, &mut client)?;
handle_roles(spec, &mut client)?;
handle_databases(spec, &mut client)?;
handle_role_deletions(spec, self.connstr.as_str(), &mut client)?;
handle_grants(spec, self.connstr.as_str())?;
handle_extensions(spec, &mut client)?;
// 'Close' connection
drop(client);
Ok(())
}
// We could've wrapped this around `pg_ctl reload`, but right now we don't use
// `pg_ctl` for start / stop, so this just seems much easier to do as we already
// have opened connection to Postgres and superuser access.
#[instrument(skip_all)]
fn pg_reload_conf(&self, client: &mut Client) -> Result<()> {
client.simple_query("SELECT pg_reload_conf()")?;
Ok(())
}
/// Similar to `apply_config()`, but does a bit different sequence of operations,
/// as it's used to reconfigure a previously started and configured Postgres node.
#[instrument(skip_all)]
pub fn reconfigure(&self) -> Result<()> {
let spec = self.state.lock().unwrap().pspec.clone().unwrap().spec;
// Write new config
let pgdata_path = Path::new(&self.pgdata);
config::write_postgres_conf(&pgdata_path.join("postgresql.conf"), &spec)?;
let mut client = Client::connect(self.connstr.as_str(), NoTls)?;
self.pg_reload_conf(&mut client)?;
// Proceed with post-startup configuration. Note, that order of operations is important.
// Disable DDL forwarding because control plane already knows about these roles/databases.
if spec.mode == ComputeMode::Primary {
client.simple_query("SET neon.forward_ddl = false")?;
handle_roles(&spec, &mut client)?;
handle_databases(&spec, &mut client)?;
handle_role_deletions(&spec, self.connstr.as_str(), &mut client)?;
handle_grants(&spec, self.connstr.as_str())?;
handle_extensions(&spec, &mut client)?;
}
// 'Close' connection
drop(client);
let unknown_op = "unknown".to_string();
let op_id = spec.operation_uuid.as_ref().unwrap_or(&unknown_op);
info!(
"finished configuration of compute for project {}",
self.spec.cluster.cluster_id
"finished reconfiguration of compute node for operation {}",
op_id
);
Ok(())
}
#[instrument(skip(self))]
#[instrument(skip_all)]
pub fn start_compute(&self) -> Result<std::process::Child> {
let compute_state = self.state.lock().unwrap().clone();
let pspec = compute_state.pspec.as_ref().expect("spec must be set");
info!(
"starting compute for project {}, operation {}, tenant {}, timeline {}",
self.spec.cluster.cluster_id,
self.spec.operation_uuid.as_ref().unwrap(),
self.tenant,
self.timeline,
pspec.spec.cluster.cluster_id.as_deref().unwrap_or("None"),
pspec.spec.operation_uuid.as_deref().unwrap_or("None"),
pspec.tenant_id,
pspec.timeline_id,
);
self.prepare_pgdata()?;
self.prepare_pgdata(&compute_state)?;
let start_time = Utc::now();
let pg = self.start_postgres(pspec.storage_auth_token.clone())?;
let pg = self.start_postgres()?;
self.apply_config()?;
let config_time = Utc::now();
if pspec.spec.mode == ComputeMode::Primary && !pspec.spec.skip_pg_catalog_updates {
self.apply_config(&compute_state)?;
}
let startup_end_time = Utc::now();
self.metrics.config_ms.store(
startup_end_time
{
let mut state = self.state.lock().unwrap();
state.metrics.start_postgres_ms = config_time
.signed_duration_since(start_time)
.to_std()
.unwrap()
.as_millis() as u64,
Ordering::Relaxed,
);
self.metrics.total_startup_ms.store(
startup_end_time
.signed_duration_since(self.start_time)
.as_millis() as u64;
state.metrics.config_ms = startup_end_time
.signed_duration_since(config_time)
.to_std()
.unwrap()
.as_millis() as u64,
Ordering::Relaxed,
);
.as_millis() as u64;
state.metrics.total_startup_ms = startup_end_time
.signed_duration_since(compute_state.start_time)
.to_std()
.unwrap()
.as_millis() as u64;
}
self.set_status(ComputeStatus::Running);
info!(
"finished configuration of compute for project {}",
pspec.spec.cluster.cluster_id.as_deref().unwrap_or("None")
);
Ok(pg)
}
@@ -400,4 +615,43 @@ impl ComputeNode {
Ok(())
}
/// Select `pg_stat_statements` data and return it as a stringified JSON
pub async fn collect_insights(&self) -> String {
let mut result_rows: Vec<String> = Vec::new();
let connect_result = tokio_postgres::connect(self.connstr.as_str(), NoTls).await;
let (client, connection) = connect_result.unwrap();
tokio::spawn(async move {
if let Err(e) = connection.await {
eprintln!("connection error: {}", e);
}
});
let result = client
.simple_query(
"SELECT
row_to_json(pg_stat_statements)
FROM
pg_stat_statements
WHERE
userid != 'cloud_admin'::regrole::oid
ORDER BY
(mean_exec_time + mean_plan_time) DESC
LIMIT 100",
)
.await;
if let Ok(raw_rows) = result {
for message in raw_rows.iter() {
if let postgres::SimpleQueryMessage::Row(row) = message {
if let Some(json) = row.get(0) {
result_rows.push(json.to_string());
}
}
}
format!("{{\"pg_stat_statements\": [{}]}}", result_rows.join(","))
} else {
"{{\"pg_stat_statements\": []}}".to_string()
}
}
}

View File

@@ -5,8 +5,9 @@ use std::path::Path;
use anyhow::Result;
use crate::pg_helpers::escape_conf_value;
use crate::pg_helpers::PgOptionsSerialize;
use crate::spec::ComputeSpec;
use compute_api::spec::{ComputeMode, ComputeSpec};
/// Check that `line` is inside a text file and put it there if it is not.
/// Create file if it doesn't exist.
@@ -34,18 +35,65 @@ pub fn line_in_file(path: &Path, line: &str) -> Result<bool> {
/// Create or completely rewrite configuration file specified by `path`
pub fn write_postgres_conf(path: &Path, spec: &ComputeSpec) -> Result<()> {
// File::create() destroys the file content if it exists.
let mut postgres_conf = File::create(path)?;
let mut file = File::create(path)?;
write_auto_managed_block(&mut postgres_conf, &spec.cluster.settings.as_pg_settings())?;
Ok(())
}
// Write Postgres config block wrapped with generated comment section
fn write_auto_managed_block(file: &mut File, buf: &str) -> Result<()> {
writeln!(file, "# Managed by compute_ctl: begin")?;
writeln!(file, "{}", buf)?;
writeln!(file, "# Managed by compute_ctl: end")?;
// Write the postgresql.conf content from the spec file as is.
if let Some(conf) = &spec.cluster.postgresql_conf {
writeln!(file, "{}", conf)?;
}
write!(file, "{}", &spec.cluster.settings.as_pg_settings())?;
// Add options for connecting to storage
writeln!(file, "# Neon storage settings")?;
if let Some(s) = &spec.pageserver_connstring {
writeln!(
file,
"neon.pageserver_connstring='{}'",
escape_conf_value(s)
)?;
}
if !spec.safekeeper_connstrings.is_empty() {
writeln!(
file,
"neon.safekeepers='{}'",
escape_conf_value(&spec.safekeeper_connstrings.join(","))
)?;
}
if let Some(s) = &spec.tenant_id {
writeln!(
file,
"neon.tenant_id='{}'",
escape_conf_value(&s.to_string())
)?;
}
if let Some(s) = &spec.timeline_id {
writeln!(
file,
"neon.timeline_id='{}'",
escape_conf_value(&s.to_string())
)?;
}
match spec.mode {
ComputeMode::Primary => {}
ComputeMode::Static(lsn) => {
// hot_standby is 'on' by default, but let's be explicit
writeln!(file, "hot_standby=on")?;
writeln!(file, "recovery_target_lsn='{lsn}'")?;
}
ComputeMode::Replica => {
// hot_standby is 'on' by default, but let's be explicit
writeln!(file, "hot_standby=on")?;
}
}
// If there are any extra options in the 'settings' field, append those
if spec.cluster.settings.is_some() {
writeln!(file, "# Managed by compute_ctl: begin")?;
write!(file, "{}", spec.cluster.settings.as_pg_settings())?;
writeln!(file, "# Managed by compute_ctl: end")?;
}
Ok(())
}

View File

@@ -0,0 +1,54 @@
use std::sync::Arc;
use std::thread;
use anyhow::Result;
use tracing::{error, info, instrument};
use compute_api::responses::ComputeStatus;
use crate::compute::ComputeNode;
#[instrument(skip_all)]
fn configurator_main_loop(compute: &Arc<ComputeNode>) {
info!("waiting for reconfiguration requests");
loop {
let state = compute.state.lock().unwrap();
let mut state = compute.state_changed.wait(state).unwrap();
if state.status == ComputeStatus::ConfigurationPending {
info!("got configuration request");
state.status = ComputeStatus::Configuration;
compute.state_changed.notify_all();
drop(state);
let mut new_status = ComputeStatus::Failed;
if let Err(e) = compute.reconfigure() {
error!("could not configure compute node: {}", e);
} else {
new_status = ComputeStatus::Running;
info!("compute node configured");
}
// XXX: used to test that API is blocking
// std::thread::sleep(std::time::Duration::from_millis(10000));
compute.set_status(new_status);
} else if state.status == ComputeStatus::Failed {
info!("compute node is now in Failed state, exiting");
break;
} else {
info!("woken up for compute status: {:?}, sleeping", state.status);
}
}
}
pub fn launch_configurator(compute: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
let compute = Arc::clone(compute);
Ok(thread::Builder::new()
.name("compute-configurator".into())
.spawn(move || {
configurator_main_loop(&compute);
info!("configurator thread is exited");
})?)
}

View File

@@ -3,37 +3,121 @@ use std::net::SocketAddr;
use std::sync::Arc;
use std::thread;
use crate::compute::{ComputeNode, ComputeState, ParsedSpec};
use compute_api::requests::ConfigurationRequest;
use compute_api::responses::{ComputeStatus, ComputeStatusResponse, GenericAPIError};
use anyhow::Result;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use num_cpus;
use serde_json;
use tokio::task;
use tracing::{error, info};
use tracing_utils::http::OtelName;
use crate::compute::ComputeNode;
fn status_response_from_state(state: &ComputeState) -> ComputeStatusResponse {
ComputeStatusResponse {
start_time: state.start_time,
tenant: state
.pspec
.as_ref()
.map(|pspec| pspec.tenant_id.to_string()),
timeline: state
.pspec
.as_ref()
.map(|pspec| pspec.timeline_id.to_string()),
status: state.status,
last_active: state.last_active,
error: state.error.clone(),
}
}
// Service function to handle all available routes.
async fn routes(req: Request<Body>, compute: Arc<ComputeNode>) -> Response<Body> {
async fn routes(req: Request<Body>, compute: &Arc<ComputeNode>) -> Response<Body> {
//
// NOTE: The URI path is currently included in traces. That's OK because
// it doesn't contain any variable parts or sensitive information. But
// please keep that in mind if you change the routing here.
//
match (req.method(), req.uri().path()) {
// Serialized compute state.
(&Method::GET, "/status") => {
info!("serving /status GET request");
let state = compute.state.read().unwrap();
Response::new(Body::from(serde_json::to_string(&*state).unwrap()))
let state = compute.state.lock().unwrap();
let status_response = status_response_from_state(&state);
Response::new(Body::from(serde_json::to_string(&status_response).unwrap()))
}
// Startup metrics in JSON format. Keep /metrics reserved for a possible
// future use for Prometheus metrics format.
(&Method::GET, "/metrics.json") => {
info!("serving /metrics.json GET request");
Response::new(Body::from(serde_json::to_string(&compute.metrics).unwrap()))
let metrics = compute.state.lock().unwrap().metrics.clone();
Response::new(Body::from(serde_json::to_string(&metrics).unwrap()))
}
// Collect Postgres current usage insights
(&Method::GET, "/insights") => {
info!("serving /insights GET request");
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!("compute is not running, current status: {:?}", status);
error!(msg);
return Response::new(Body::from(msg));
}
let insights = compute.collect_insights().await;
Response::new(Body::from(insights))
}
(&Method::POST, "/check_writability") => {
info!("serving /check_writability POST request");
let res = crate::checker::check_writability(&compute).await;
let status = compute.get_status();
if status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for check_writability request: {:?}",
status
);
error!(msg);
return Response::new(Body::from(msg));
}
let res = crate::checker::check_writability(compute).await;
match res {
Ok(_) => Response::new(Body::from("true")),
Err(e) => Response::new(Body::from(e.to_string())),
Err(e) => {
error!("check_writability failed: {}", e);
Response::new(Body::from(e.to_string()))
}
}
}
(&Method::GET, "/info") => {
let num_cpus = num_cpus::get_physical();
info!("serving /info GET request. num_cpus: {}", num_cpus);
Response::new(Body::from(
serde_json::json!({
"num_cpus": num_cpus,
})
.to_string(),
))
}
// Accept spec in JSON format and request compute configuration. If
// anything goes wrong after we set the compute status to `ConfigurationPending`
// and update compute state with new spec, we basically leave compute
// in the potentially wrong state. That said, it's control-plane's
// responsibility to watch compute state after reconfiguration request
// and to clean restart in case of errors.
(&Method::POST, "/configure") => {
info!("serving /configure POST request");
match handle_configure_request(req, compute).await {
Ok(msg) => Response::new(Body::from(msg)),
Err((msg, code)) => {
error!("error handling /configure request: {msg}");
render_json_error(&msg, code)
}
}
}
@@ -46,17 +130,117 @@ async fn routes(req: Request<Body>, compute: Arc<ComputeNode>) -> Response<Body>
}
}
async fn handle_configure_request(
req: Request<Body>,
compute: &Arc<ComputeNode>,
) -> Result<String, (String, StatusCode)> {
if !compute.live_config_allowed {
return Err((
"live configuration is not allowed for this compute node".to_string(),
StatusCode::PRECONDITION_FAILED,
));
}
let body_bytes = hyper::body::to_bytes(req.into_body()).await.unwrap();
let spec_raw = String::from_utf8(body_bytes.to_vec()).unwrap();
if let Ok(request) = serde_json::from_str::<ConfigurationRequest>(&spec_raw) {
let spec = request.spec;
let parsed_spec = match ParsedSpec::try_from(spec) {
Ok(ps) => ps,
Err(msg) => return Err((msg, StatusCode::PRECONDITION_FAILED)),
};
// XXX: wrap state update under lock in code blocks. Otherwise,
// we will try to `Send` `mut state` into the spawned thread
// bellow, which will cause error:
// ```
// error: future cannot be sent between threads safely
// ```
{
let mut state = compute.state.lock().unwrap();
if state.status != ComputeStatus::Empty && state.status != ComputeStatus::Running {
let msg = format!(
"invalid compute status for configuration request: {:?}",
state.status.clone()
);
return Err((msg, StatusCode::PRECONDITION_FAILED));
}
state.pspec = Some(parsed_spec);
state.status = ComputeStatus::ConfigurationPending;
compute.state_changed.notify_all();
drop(state);
info!("set new spec and notified waiters");
}
// Spawn a blocking thread to wait for compute to become Running.
// This is needed to do not block the main pool of workers and
// be able to serve other requests while some particular request
// is waiting for compute to finish configuration.
let c = compute.clone();
task::spawn_blocking(move || {
let mut state = c.state.lock().unwrap();
while state.status != ComputeStatus::Running {
state = c.state_changed.wait(state).unwrap();
info!(
"waiting for compute to become Running, current status: {:?}",
state.status
);
if state.status == ComputeStatus::Failed {
let err = state.error.as_ref().map_or("unknown error", |x| x);
let msg = format!("compute configuration failed: {:?}", err);
return Err((msg, StatusCode::INTERNAL_SERVER_ERROR));
}
}
Ok(())
})
.await
.unwrap()?;
// Return current compute state if everything went well.
let state = compute.state.lock().unwrap().clone();
let status_response = status_response_from_state(&state);
Ok(serde_json::to_string(&status_response).unwrap())
} else {
Err(("invalid spec".to_string(), StatusCode::BAD_REQUEST))
}
}
fn render_json_error(e: &str, status: StatusCode) -> Response<Body> {
let error = GenericAPIError {
error: e.to_string(),
};
Response::builder()
.status(status)
.body(Body::from(serde_json::to_string(&error).unwrap()))
.unwrap()
}
// Main Hyper HTTP server function that runs it and blocks waiting on it forever.
#[tokio::main]
async fn serve(state: Arc<ComputeNode>) {
let addr = SocketAddr::from(([0, 0, 0, 0], 3080));
async fn serve(port: u16, state: Arc<ComputeNode>) {
let addr = SocketAddr::from(([0, 0, 0, 0], port));
let make_service = make_service_fn(move |_conn| {
let state = state.clone();
async move {
Ok::<_, Infallible>(service_fn(move |req: Request<Body>| {
let state = state.clone();
async move { Ok::<_, Infallible>(routes(req, state).await) }
async move {
Ok::<_, Infallible>(
// NOTE: We include the URI path in the string. It
// doesn't contain any variable parts or sensitive
// information in this API.
tracing_utils::http::tracing_handler(
req,
|req| routes(req, &state),
OtelName::UriPath,
)
.await,
)
}
}))
}
});
@@ -72,10 +256,10 @@ async fn serve(state: Arc<ComputeNode>) {
}
/// Launch a separate Hyper HTTP API server thread and return its `JoinHandle`.
pub fn launch_http_server(state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
pub fn launch_http_server(port: u16, state: &Arc<ComputeNode>) -> Result<thread::JoinHandle<()>> {
let state = Arc::clone(state);
Ok(thread::Builder::new()
.name("http-endpoint".into())
.spawn(move || serve(state))?)
.spawn(move || serve(port, state))?)
}

View File

@@ -10,12 +10,12 @@ paths:
/status:
get:
tags:
- "info"
summary: Get compute node internal status
- Info
summary: Get compute node internal status.
description: ""
operationId: getComputeStatus
responses:
"200":
200:
description: ComputeState
content:
application/json:
@@ -25,35 +25,121 @@ paths:
/metrics.json:
get:
tags:
- "info"
summary: Get compute node startup metrics in JSON format
- Info
summary: Get compute node startup metrics in JSON format.
description: ""
operationId: getComputeMetricsJSON
responses:
"200":
200:
description: ComputeMetrics
content:
application/json:
schema:
$ref: "#/components/schemas/ComputeMetrics"
/insights:
get:
tags:
- Info
summary: Get current compute insights in JSON format.
description: |
Note, that this doesn't include any historical data.
operationId: getComputeInsights
responses:
200:
description: Compute insights
content:
application/json:
schema:
$ref: "#/components/schemas/ComputeInsights"
/info:
get:
tags:
- Info
summary: Get info about the compute pod / VM.
description: ""
operationId: getInfo
responses:
200:
description: Info
content:
application/json:
schema:
$ref: "#/components/schemas/Info"
/check_writability:
post:
tags:
- "check"
summary: Check that we can write new data on this compute
- Check
summary: Check that we can write new data on this compute.
description: ""
operationId: checkComputeWritability
responses:
"200":
200:
description: Check result
content:
text/plain:
schema:
type: string
description: Error text or 'true' if check passed
description: Error text or 'true' if check passed.
example: "true"
/configure:
post:
tags:
- Configure
summary: Perform compute node configuration.
description: |
This is a blocking API endpoint, i.e. it blocks waiting until
compute is finished configuration and is in `Running` state.
Optional non-blocking mode could be added later.
operationId: configureCompute
requestBody:
description: Configuration request.
required: true
content:
application/json:
schema:
type: object
required:
- spec
properties:
spec:
# XXX: I don't want to explain current spec in the OpenAPI format,
# as it could be changed really soon. Consider doing it later.
type: object
responses:
200:
description: Compute configuration finished.
content:
application/json:
schema:
$ref: "#/components/schemas/ComputeState"
400:
description: Provided spec is invalid.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
412:
description: |
It's not possible to do live-configuration of the compute.
It's either in the wrong state, or compute doesn't use pull
mode of configuration.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
500:
description: |
Compute configuration request was processed, but error
occurred. Compute will likely shutdown soon.
content:
application/json:
schema:
$ref: "#/components/schemas/GenericError"
components:
securitySchemes:
JWT:
@@ -64,13 +150,16 @@ components:
schemas:
ComputeMetrics:
type: object
description: Compute startup metrics
description: Compute startup metrics.
required:
- wait_for_spec_ms
- sync_safekeepers_ms
- basebackup_ms
- config_ms
- total_startup_ms
properties:
wait_for_spec_ms:
type: integer
sync_safekeepers_ms:
type: integer
basebackup_ms:
@@ -80,28 +169,80 @@ components:
total_startup_ms:
type: integer
Info:
type: object
description: Information about VM/Pod.
required:
- num_cpus
properties:
num_cpus:
type: integer
ComputeState:
type: object
required:
- start_time
- status
- last_active
properties:
start_time:
type: string
description: |
Time when compute was started. If initially compute was started in the `empty`
state and then provided with valid spec, `start_time` will be reset to the
moment, when spec was received.
example: "2022-10-12T07:20:50.52Z"
status:
$ref: '#/components/schemas/ComputeStatus'
last_active:
type: string
description: The last detected compute activity timestamp in UTC and RFC3339 format
description: |
The last detected compute activity timestamp in UTC and RFC3339 format.
It could be empty if compute was never used by user since start.
example: "2022-10-12T07:20:50.52Z"
error:
type: string
description: Text of the error during compute startup, if any
description: Text of the error during compute startup or reconfiguration, if any.
example: ""
tenant:
type: string
description: Identifier of the current tenant served by compute node, if any.
example: c9269c359e9a199fad1ea0981246a78f
timeline:
type: string
description: Identifier of the current timeline served by compute node, if any.
example: ece7de74d4b8cbe5433a68ce4d1b97b4
ComputeInsights:
type: object
properties:
pg_stat_statements:
description: Contains raw output from pg_stat_statements in JSON format.
type: array
items:
type: object
ComputeStatus:
type: string
enum:
- empty
- init
- failed
- running
- configuration_pending
- configuration
example: running
#
# Errors
#
GenericError:
type: object
required:
- error
properties:
error:
type: string
security:
- JWT: []

View File

@@ -1,50 +0,0 @@
use std::path::Path;
use std::process;
use std::thread;
use std::time::Duration;
use tracing::{info, warn};
use anyhow::{Context, Result};
const VM_INFORMANT_PATH: &str = "/bin/vm-informant";
const RESTART_INFORMANT_AFTER_MILLIS: u64 = 5000;
/// Launch a thread to start the VM informant if it's present (and restart, on failure)
pub fn spawn_vm_informant_if_present() -> Result<Option<thread::JoinHandle<()>>> {
let exists = Path::new(VM_INFORMANT_PATH)
.try_exists()
.context("could not check if path exists")?;
if !exists {
return Ok(None);
}
Ok(Some(
thread::Builder::new()
.name("run-vm-informant".into())
.spawn(move || run_informant())?,
))
}
fn run_informant() -> ! {
let restart_wait = Duration::from_millis(RESTART_INFORMANT_AFTER_MILLIS);
info!("starting VM informant");
loop {
let mut cmd = process::Command::new(VM_INFORMANT_PATH);
// Block on subprocess:
let result = cmd.status();
match result {
Err(e) => warn!("failed to run VM informant at {VM_INFORMANT_PATH:?}: {e}"),
Ok(status) if !status.success() => {
warn!("{VM_INFORMANT_PATH} exited with code {status:?}, retrying")
}
Ok(_) => info!("{VM_INFORMANT_PATH} ended gracefully (unexpectedly). Retrying"),
}
// Wait before retrying
thread::sleep(restart_wait);
}
}

View File

@@ -4,11 +4,11 @@
//!
pub mod checker;
pub mod config;
pub mod configurator;
pub mod http;
#[macro_use]
pub mod logger;
pub mod compute;
pub mod informant;
pub mod monitor;
pub mod params;
pub mod pg_helpers;

View File

@@ -1,21 +1,40 @@
use anyhow::Result;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::prelude::*;
/// Initialize `env_logger` using either `default_level` or
/// Initialize logging to stderr, and OpenTelemetry tracing and exporter.
///
/// Logging is configured using either `default_log_level` or
/// `RUST_LOG` environment variable as default log level.
pub fn init_logger(default_level: &str) -> Result<()> {
///
/// OpenTelemetry is configured with OTLP/HTTP exporter. It picks up
/// configuration from environment variables. For example, to change the destination,
/// set `OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318`. See
/// `tracing-utils` package description.
///
pub fn init_tracing_and_logging(default_log_level: &str) -> anyhow::Result<()> {
// Initialize Logging
let env_filter = tracing_subscriber::EnvFilter::try_from_default_env()
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new(default_level));
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new(default_log_level));
let fmt_layer = tracing_subscriber::fmt::layer()
.with_ansi(false)
.with_target(false)
.with_writer(std::io::stderr);
// Initialize OpenTelemetry
let otlp_layer =
tracing_utils::init_tracing_without_runtime("compute_ctl").map(OpenTelemetryLayer::new);
// Put it all together
tracing_subscriber::registry()
.with(env_filter)
.with(otlp_layer)
.with(fmt_layer)
.init();
tracing::info!("logging and tracing started");
utils::logging::replace_panic_hook_with_tracing_panic_hook().forget();
Ok(())
}

View File

@@ -46,7 +46,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
AND usename != 'cloud_admin';", // XXX: find a better way to filter other monitors?
&[],
);
let mut last_active = compute.state.read().unwrap().last_active;
let mut last_active = compute.state.lock().unwrap().last_active;
if let Ok(backs) = backends {
let mut idle_backs: Vec<DateTime<Utc>> = vec![];
@@ -74,7 +74,7 @@ fn watch_compute_activity(compute: &ComputeNode) {
// Found non-idle backend, so the last activity is NOW.
// Save it and exit the for loop. Also clear the idle backend
// `state_change` timestamps array as it doesn't matter now.
last_active = Utc::now();
last_active = Some(Utc::now());
idle_backs.clear();
break;
}
@@ -82,15 +82,16 @@ fn watch_compute_activity(compute: &ComputeNode) {
// Get idle backend `state_change` with the max timestamp.
if let Some(last) = idle_backs.iter().max() {
last_active = *last;
last_active = Some(*last);
}
}
// Update the last activity in the shared state if we got a more recent one.
let mut state = compute.state.write().unwrap();
let mut state = compute.state.lock().unwrap();
// NB: `Some(<DateTime>)` is always greater than `None`.
if last_active > state.last_active {
state.last_active = last_active;
debug!("set the last compute activity time to: {}", last_active);
debug!("set the last compute activity time to: {:?}", last_active);
}
}
Err(e) => {

View File

@@ -10,49 +10,34 @@ use std::time::{Duration, Instant};
use anyhow::{bail, Result};
use notify::{RecursiveMode, Watcher};
use postgres::{Client, Transaction};
use serde::Deserialize;
use tracing::{debug, instrument};
use compute_api::spec::{Database, GenericOption, GenericOptions, PgIdent, Role};
const POSTGRES_WAIT_TIMEOUT: Duration = Duration::from_millis(60 * 1000); // milliseconds
/// Rust representation of Postgres role info with only those fields
/// that matter for us.
#[derive(Clone, Deserialize)]
pub struct Role {
pub name: PgIdent,
pub encrypted_password: Option<String>,
pub options: GenericOptions,
/// Escape a string for including it in a SQL literal
pub fn escape_literal(s: &str) -> String {
s.replace('\'', "''").replace('\\', "\\\\")
}
/// Rust representation of Postgres database info with only those fields
/// that matter for us.
#[derive(Clone, Deserialize)]
pub struct Database {
pub name: PgIdent,
pub owner: PgIdent,
pub options: GenericOptions,
/// Escape a string so that it can be used in postgresql.conf.
/// Same as escape_literal, currently.
pub fn escape_conf_value(s: &str) -> String {
s.replace('\'', "''").replace('\\', "\\\\")
}
/// Common type representing both SQL statement params with or without value,
/// like `LOGIN` or `OWNER username` in the `CREATE/ALTER ROLE`, and config
/// options like `wal_level = logical`.
#[derive(Clone, Deserialize)]
pub struct GenericOption {
pub name: String,
pub value: Option<String>,
pub vartype: String,
trait GenericOptionExt {
fn to_pg_option(&self) -> String;
fn to_pg_setting(&self) -> String;
}
/// Optional collection of `GenericOption`'s. Type alias allows us to
/// declare a `trait` on it.
pub type GenericOptions = Option<Vec<GenericOption>>;
impl GenericOption {
impl GenericOptionExt for GenericOption {
/// Represent `GenericOption` as SQL statement parameter.
pub fn to_pg_option(&self) -> String {
fn to_pg_option(&self) -> String {
if let Some(val) = &self.value {
match self.vartype.as_ref() {
"string" => format!("{} '{}'", self.name, val),
"string" => format!("{} '{}'", self.name, escape_literal(val)),
_ => format!("{} {}", self.name, val),
}
} else {
@@ -61,18 +46,11 @@ impl GenericOption {
}
/// Represent `GenericOption` as configuration option.
pub fn to_pg_setting(&self) -> String {
fn to_pg_setting(&self) -> String {
if let Some(val) = &self.value {
let name = match self.name.as_str() {
"safekeepers" => "neon.safekeepers",
"wal_acceptor_reconnect" => "neon.safekeeper_reconnect_timeout",
"wal_acceptor_connection_timeout" => "neon.safekeeper_connection_timeout",
it => it,
};
match self.vartype.as_ref() {
"string" => format!("{} = '{}'", name, val),
_ => format!("{} = {}", name, val),
"string" => format!("{} = '{}'", self.name, escape_conf_value(val)),
_ => format!("{} = {}", self.name, val),
}
} else {
self.name.to_owned()
@@ -107,6 +85,7 @@ impl PgOptionsSerialize for GenericOptions {
.map(|op| op.to_pg_setting())
.collect::<Vec<String>>()
.join("\n")
+ "\n" // newline after last setting
} else {
"".to_string()
}
@@ -115,6 +94,7 @@ impl PgOptionsSerialize for GenericOptions {
pub trait GenericOptionsSearch {
fn find(&self, name: &str) -> Option<String>;
fn find_ref(&self, name: &str) -> Option<&GenericOption>;
}
impl GenericOptionsSearch for GenericOptions {
@@ -124,16 +104,25 @@ impl GenericOptionsSearch for GenericOptions {
let op = ops.iter().find(|s| s.name == name)?;
op.value.clone()
}
/// Lookup option by name, returning ref
fn find_ref(&self, name: &str) -> Option<&GenericOption> {
let ops = self.as_ref()?;
ops.iter().find(|s| s.name == name)
}
}
impl Role {
pub trait RoleExt {
fn to_pg_options(&self) -> String;
}
impl RoleExt for Role {
/// Serialize a list of role parameters into a Postgres-acceptable
/// string of arguments.
pub fn to_pg_options(&self) -> String {
fn to_pg_options(&self) -> String {
// XXX: consider putting LOGIN as a default option somewhere higher, e.g. in control-plane.
// For now, we do not use generic `options` for roles. Once used, add
// `self.options.as_pg_options()` somewhere here.
let mut params: String = "LOGIN".to_string();
let mut params: String = self.options.as_pg_options();
params.push_str(" LOGIN");
if let Some(pass) = &self.encrypted_password {
// Some time ago we supported only md5 and treated all encrypted_password as md5.
@@ -154,21 +143,17 @@ impl Role {
}
}
impl Database {
pub fn new(name: PgIdent, owner: PgIdent) -> Self {
Self {
name,
owner,
options: None,
}
}
pub trait DatabaseExt {
fn to_pg_options(&self) -> String;
}
impl DatabaseExt for Database {
/// Serialize a list of database parameters into a Postgres-acceptable
/// string of arguments.
/// NB: `TEMPLATE` is actually also an identifier, but so far we only need
/// to use `template0` and `template1`, so it is not a problem. Yet in the future
/// it may require a proper quoting too.
pub fn to_pg_options(&self) -> String {
fn to_pg_options(&self) -> String {
let mut params: String = self.options.as_pg_options();
write!(params, " OWNER {}", &self.owner.pg_quote())
.expect("String is documented to not to error during write operations");
@@ -177,10 +162,6 @@ impl Database {
}
}
/// String type alias representing Postgres identifier and
/// intended to be used for DB / role names.
pub type PgIdent = String;
/// Generic trait used to provide quoting / encoding for strings used in the
/// Postgres SQL queries and DATABASE_URL.
pub trait Escaping {
@@ -221,7 +202,11 @@ pub fn get_existing_dbs(client: &mut Client) -> Result<Vec<Database>> {
&[],
)?
.iter()
.map(|row| Database::new(row.get("datname"), row.get("owner")))
.map(|row| Database {
name: row.get("datname"),
owner: row.get("owner"),
options: None,
})
.collect();
Ok(postgres_dbs)
@@ -230,7 +215,7 @@ pub fn get_existing_dbs(client: &mut Client) -> Result<Vec<Database>> {
/// Wait for Postgres to become ready to accept connections. It's ready to
/// accept connections when the state-field in `pgdata/postmaster.pid` says
/// 'ready'.
#[instrument(skip(pg))]
#[instrument(skip_all, fields(pgdata = %pgdata.display()))]
pub fn wait_for_postgres(pg: &mut Child, pgdata: &Path) -> Result<()> {
let pid_path = pgdata.join("postmaster.pid");

View File

@@ -1,52 +1,121 @@
use std::fs::File;
use std::path::Path;
use std::str::FromStr;
use anyhow::Result;
use anyhow::{anyhow, bail, Result};
use postgres::config::Config;
use postgres::{Client, NoTls};
use serde::Deserialize;
use tracing::{info, info_span, instrument, span_enabled, warn, Level};
use reqwest::StatusCode;
use tracing::{error, info, info_span, instrument, span_enabled, warn, Level};
use crate::compute::ComputeNode;
use crate::config;
use crate::params::PG_HBA_ALL_MD5;
use crate::pg_helpers::*;
/// Cluster spec or configuration represented as an optional number of
/// delta operations + final cluster state description.
#[derive(Clone, Deserialize)]
pub struct ComputeSpec {
pub format_version: f32,
pub timestamp: String,
pub operation_uuid: Option<String>,
/// Expected cluster state at the end of transition process.
pub cluster: Cluster,
pub delta_operations: Option<Vec<DeltaOp>>,
use compute_api::responses::{ControlPlaneComputeStatus, ControlPlaneSpecResponse};
use compute_api::spec::{ComputeSpec, Database, PgIdent, Role};
// Do control plane request and return response if any. In case of error it
// returns a bool flag indicating whether it makes sense to retry the request
// and a string with error message.
fn do_control_plane_request(
uri: &str,
jwt: &str,
) -> Result<ControlPlaneSpecResponse, (bool, String)> {
let resp = reqwest::blocking::Client::new()
.get(uri)
.header("Authorization", jwt)
.send()
.map_err(|e| {
(
true,
format!("could not perform spec request to control plane: {}", e),
)
})?;
match resp.status() {
StatusCode::OK => match resp.json::<ControlPlaneSpecResponse>() {
Ok(spec_resp) => Ok(spec_resp),
Err(e) => Err((
true,
format!("could not deserialize control plane response: {}", e),
)),
},
StatusCode::SERVICE_UNAVAILABLE => {
Err((true, "control plane is temporarily unavailable".to_string()))
}
StatusCode::BAD_GATEWAY => {
// We have a problem with intermittent 502 errors now
// https://github.com/neondatabase/cloud/issues/2353
// It's fine to retry GET request in this case.
Err((true, "control plane request failed with 502".to_string()))
}
// Another code, likely 500 or 404, means that compute is unknown to the control plane
// or some internal failure happened. Doesn't make much sense to retry in this case.
_ => Err((
false,
format!(
"unexpected control plane response status code: {}",
resp.status()
),
)),
}
}
/// Cluster state seen from the perspective of the external tools
/// like Rails web console.
#[derive(Clone, Deserialize)]
pub struct Cluster {
pub cluster_id: String,
pub name: String,
pub state: Option<String>,
pub roles: Vec<Role>,
pub databases: Vec<Database>,
pub settings: GenericOptions,
}
/// Request spec from the control-plane by compute_id. If `NEON_CONTROL_PLANE_TOKEN`
/// env variable is set, it will be used for authorization.
pub fn get_spec_from_control_plane(
base_uri: &str,
compute_id: &str,
) -> Result<Option<ComputeSpec>> {
let cp_uri = format!("{base_uri}/management/api/v2/computes/{compute_id}/spec");
let jwt: String = match std::env::var("NEON_CONTROL_PLANE_TOKEN") {
Ok(v) => v,
Err(_) => "".to_string(),
};
let mut attempt = 1;
let mut spec: Result<Option<ComputeSpec>> = Ok(None);
/// Single cluster state changing operation that could not be represented as
/// a static `Cluster` structure. For example:
/// - DROP DATABASE
/// - DROP ROLE
/// - ALTER ROLE name RENAME TO new_name
/// - ALTER DATABASE name RENAME TO new_name
#[derive(Clone, Deserialize)]
pub struct DeltaOp {
pub action: String,
pub name: PgIdent,
pub new_name: Option<PgIdent>,
info!("getting spec from control plane: {}", cp_uri);
// Do 3 attempts to get spec from the control plane using the following logic:
// - network error -> then retry
// - compute id is unknown or any other error -> bail out
// - no spec for compute yet (Empty state) -> return Ok(None)
// - got spec -> return Ok(Some(spec))
while attempt < 4 {
spec = match do_control_plane_request(&cp_uri, &jwt) {
Ok(spec_resp) => match spec_resp.status {
ControlPlaneComputeStatus::Empty => Ok(None),
ControlPlaneComputeStatus::Attached => {
if let Some(spec) = spec_resp.spec {
Ok(Some(spec))
} else {
bail!("compute is attached, but spec is empty")
}
}
},
Err((retry, msg)) => {
if retry {
Err(anyhow!(msg))
} else {
bail!(msg);
}
}
};
if let Err(e) = &spec {
error!("attempt {} to get spec failed with: {}", attempt, e);
} else {
return spec;
}
attempt += 1;
std::thread::sleep(std::time::Duration::from_millis(100));
}
// All attempts failed, return error.
spec
}
/// It takes cluster specification and does the following:
@@ -77,6 +146,21 @@ pub fn update_pg_hba(pgdata_path: &Path) -> Result<()> {
Ok(())
}
/// Create a standby.signal file
pub fn add_standby_signal(pgdata_path: &Path) -> Result<()> {
// XXX: consider making it a part of spec.json
info!("adding standby.signal");
let signalfile = pgdata_path.join("standby.signal");
if !signalfile.exists() {
info!("created standby.signal");
File::create(signalfile)?;
} else {
info!("reused pre-existing standby.signal");
}
Ok(())
}
/// Given a cluster spec json and open transaction it handles roles creation,
/// deletion and update.
#[instrument(skip_all)]
@@ -185,17 +269,13 @@ pub fn handle_roles(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
xact.execute(query.as_str(), &[])?;
}
RoleAction::Create => {
let mut query: String = format!("CREATE ROLE {} ", name.pg_quote());
let mut query: String = format!(
"CREATE ROLE {} CREATEROLE CREATEDB IN ROLE neon_superuser",
name.pg_quote()
);
info!("role create query: '{}'", &query);
query.push_str(&role.to_pg_options());
xact.execute(query.as_str(), &[])?;
let grant_query = format!(
"GRANT pg_read_all_data, pg_write_all_data TO {}",
name.pg_quote()
);
xact.execute(grant_query.as_str(), &[])?;
info!("role grant query: '{}'", &grant_query);
}
}
@@ -221,8 +301,8 @@ pub fn handle_roles(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
/// Reassign all dependent objects and delete requested roles.
#[instrument(skip_all)]
pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<()> {
if let Some(ops) = &node.spec.delta_operations {
pub fn handle_role_deletions(spec: &ComputeSpec, connstr: &str, client: &mut Client) -> Result<()> {
if let Some(ops) = &spec.delta_operations {
// First, reassign all dependent objects to db owners.
info!("reassigning dependent objects of to-be-deleted roles");
@@ -239,7 +319,7 @@ pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<
// Check that role is still present in Postgres, as this could be a
// restart with the same spec after role deletion.
if op.action == "delete_role" && existing_roles.iter().any(|r| r.name == op.name) {
reassign_owned_objects(node, &op.name)?;
reassign_owned_objects(spec, connstr, &op.name)?;
}
}
@@ -263,10 +343,10 @@ pub fn handle_role_deletions(node: &ComputeNode, client: &mut Client) -> Result<
}
// Reassign all owned objects in all databases to the owner of the database.
fn reassign_owned_objects(node: &ComputeNode, role_name: &PgIdent) -> Result<()> {
for db in &node.spec.cluster.databases {
fn reassign_owned_objects(spec: &ComputeSpec, connstr: &str, role_name: &PgIdent) -> Result<()> {
for db in &spec.cluster.databases {
if db.owner != *role_name {
let mut conf = Config::from_str(node.connstr.as_str())?;
let mut conf = Config::from_str(connstr)?;
conf.dbname(&db.name);
let mut client = conf.connect(NoTls)?;
@@ -384,14 +464,19 @@ pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
name.pg_quote(),
db.owner.pg_quote()
);
let _ = info_span!("executing", query).entered();
let _guard = info_span!("executing", query).entered();
client.execute(query.as_str(), &[])?;
}
DatabaseAction::Create => {
let mut query: String = format!("CREATE DATABASE {} ", name.pg_quote());
query.push_str(&db.to_pg_options());
let _ = info_span!("executing", query).entered();
let _guard = info_span!("executing", query).entered();
client.execute(query.as_str(), &[])?;
let grant_query: String = format!(
"GRANT ALL PRIVILEGES ON DATABASE {} TO neon_superuser",
name.pg_quote()
);
client.execute(grant_query.as_str(), &[])?;
}
};
@@ -411,42 +496,14 @@ pub fn handle_databases(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
/// Grant CREATE ON DATABASE to the database owner and do some other alters and grants
/// to allow users creating trusted extensions and re-creating `public` schema, for example.
#[instrument(skip_all)]
pub fn handle_grants(node: &ComputeNode, client: &mut Client) -> Result<()> {
let spec = &node.spec;
pub fn handle_grants(spec: &ComputeSpec, connstr: &str) -> Result<()> {
info!("cluster spec grants:");
// We now have a separate `web_access` role to connect to the database
// via the web interface and proxy link auth. And also we grant a
// read / write all data privilege to every role. So also grant
// create to everyone.
// XXX: later we should stop messing with Postgres ACL in such horrible
// ways.
let roles = spec
.cluster
.roles
.iter()
.map(|r| r.name.pg_quote())
.collect::<Vec<_>>();
for db in &spec.cluster.databases {
let dbname = &db.name;
let query: String = format!(
"GRANT CREATE ON DATABASE {} TO {}",
dbname.pg_quote(),
roles.join(", ")
);
info!("grant query {}", &query);
client.execute(query.as_str(), &[])?;
}
// Do some per-database access adjustments. We'd better do this at db creation time,
// but CREATE DATABASE isn't transactional. So we cannot create db + do some grants
// atomically.
for db in &node.spec.cluster.databases {
let mut conf = Config::from_str(node.connstr.as_str())?;
for db in &spec.cluster.databases {
let mut conf = Config::from_str(connstr)?;
conf.dbname(&db.name);
let mut db_client = conf.connect(NoTls)?;
@@ -512,3 +569,18 @@ pub fn handle_grants(node: &ComputeNode, client: &mut Client) -> Result<()> {
Ok(())
}
/// Create required system extensions
#[instrument(skip_all)]
pub fn handle_extensions(spec: &ComputeSpec, client: &mut Client) -> Result<()> {
if let Some(libs) = spec.cluster.settings.find("shared_preload_libraries") {
if libs.contains("pg_stat_statements") {
// Create extension only if this compute really needs it
let query = "CREATE EXTENSION IF NOT EXISTS pg_stat_statements";
info!("creating system extensions with query: {}", query);
client.simple_query(query)?;
}
}
Ok(())
}

View File

@@ -1,14 +1,13 @@
#[cfg(test)]
mod pg_helpers_tests {
use std::fs::File;
use compute_api::spec::{ComputeSpec, GenericOption, GenericOptions, PgIdent};
use compute_tools::pg_helpers::*;
use compute_tools::spec::ComputeSpec;
#[test]
fn params_serialize() {
let file = File::open("tests/cluster_spec.json").unwrap();
let file = File::open("../libs/compute_api/tests/cluster_spec.json").unwrap();
let spec: ComputeSpec = serde_json::from_reader(file).unwrap();
assert_eq!(
@@ -17,18 +16,41 @@ mod pg_helpers_tests {
);
assert_eq!(
spec.cluster.roles.first().unwrap().to_pg_options(),
"LOGIN PASSWORD 'md56b1d16b78004bbd51fa06af9eda75972'"
" LOGIN PASSWORD 'md56b1d16b78004bbd51fa06af9eda75972'"
);
}
#[test]
fn settings_serialize() {
let file = File::open("tests/cluster_spec.json").unwrap();
let file = File::open("../libs/compute_api/tests/cluster_spec.json").unwrap();
let spec: ComputeSpec = serde_json::from_reader(file).unwrap();
assert_eq!(
spec.cluster.settings.as_pg_settings(),
"fsync = off\nwal_level = replica\nhot_standby = on\nneon.safekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'\nwal_log_hints = on\nlog_connections = on\nshared_buffers = 32768\nport = 55432\nmax_connections = 100\nmax_wal_senders = 10\nlisten_addresses = '0.0.0.0'\nwal_sender_timeout = 0\npassword_encryption = md5\nmaintenance_work_mem = 65536\nmax_parallel_workers = 8\nmax_worker_processes = 8\nneon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'\nmax_replication_slots = 10\nneon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'\nshared_preload_libraries = 'neon'\nsynchronous_standby_names = 'walproposer'\nneon.pageserver_connstring = 'host=127.0.0.1 port=6400'"
r#"fsync = off
wal_level = replica
hot_standby = on
neon.safekeepers = '127.0.0.1:6502,127.0.0.1:6503,127.0.0.1:6501'
wal_log_hints = on
log_connections = on
shared_buffers = 32768
port = 55432
max_connections = 100
max_wal_senders = 10
listen_addresses = '0.0.0.0'
wal_sender_timeout = 0
password_encryption = md5
maintenance_work_mem = 65536
max_parallel_workers = 8
max_worker_processes = 8
neon.tenant_id = 'b0554b632bd4d547a63b86c3630317e8'
max_replication_slots = 10
neon.timeline_id = '2414a61ffc94e428f14b5758fe308e13'
shared_preload_libraries = 'neon'
synchronous_standby_names = 'walproposer'
neon.pageserver_connstring = 'host=127.0.0.1 port=6400'
test.escaping = 'here''s a backslash \\ and a quote '' and a double-quote " hooray'
"#
);
}

View File

@@ -15,6 +15,7 @@ postgres.workspace = true
regex.workspace = true
reqwest = { workspace = true, features = ["blocking", "json"] }
serde.workspace = true
serde_json.workspace = true
serde_with.workspace = true
tar.workspace = true
thiserror.workspace = true
@@ -23,9 +24,11 @@ url.workspace = true
# Note: Do not directly depend on pageserver or safekeeper; use pageserver_api or safekeeper_api
# instead, so that recompile times are better.
pageserver_api.workspace = true
postgres_backend.workspace = true
safekeeper_api.workspace = true
postgres_connection.workspace = true
storage_broker.workspace = true
utils.workspace = true
compute_api.workspace = true
workspace_hack.workspace = true

View File

@@ -2,7 +2,8 @@
[pageserver]
listen_pg_addr = '127.0.0.1:64000'
listen_http_addr = '127.0.0.1:9898'
auth_type = 'Trust'
pg_auth_type = 'Trust'
http_auth_type = 'Trust'
[[safekeepers]]
id = 1

View File

@@ -3,7 +3,8 @@
[pageserver]
listen_pg_addr = '127.0.0.1:64000'
listen_http_addr = '127.0.0.1:9898'
auth_type = 'Trust'
pg_auth_type = 'Trust'
http_auth_type = 'Trust'
[[safekeepers]]
id = 1

View File

@@ -180,6 +180,11 @@ pub fn stop_process(immediate: bool, process_name: &str, pid_file: &Path) -> any
}
// Wait until process is gone
wait_until_stopped(process_name, pid)?;
Ok(())
}
pub fn wait_until_stopped(process_name: &str, pid: Pid) -> anyhow::Result<()> {
for retries in 0..RETRIES {
match process_has_stopped(pid) {
Ok(true) => {

View File

@@ -7,7 +7,8 @@
//!
use anyhow::{anyhow, bail, Context, Result};
use clap::{value_parser, Arg, ArgAction, ArgMatches, Command};
use control_plane::compute::ComputeControlPlane;
use compute_api::spec::ComputeMode;
use control_plane::endpoint::ComputeControlPlane;
use control_plane::local_env::LocalEnv;
use control_plane::pageserver::PageServerNode;
use control_plane::safekeeper::SafekeeperNode;
@@ -17,6 +18,7 @@ use pageserver_api::{
DEFAULT_HTTP_LISTEN_ADDR as DEFAULT_PAGESERVER_HTTP_ADDR,
DEFAULT_PG_LISTEN_ADDR as DEFAULT_PAGESERVER_PG_ADDR,
};
use postgres_backend::AuthType;
use safekeeper_api::{
DEFAULT_HTTP_LISTEN_PORT as DEFAULT_SAFEKEEPER_HTTP_PORT,
DEFAULT_PG_LISTEN_PORT as DEFAULT_SAFEKEEPER_PG_PORT,
@@ -30,7 +32,6 @@ use utils::{
auth::{Claims, Scope},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
project_git_version,
};
@@ -40,7 +41,7 @@ const DEFAULT_PAGESERVER_ID: NodeId = NodeId(1);
const DEFAULT_BRANCH_NAME: &str = "main";
project_git_version!(GIT_VERSION);
const DEFAULT_PG_VERSION: &str = "14";
const DEFAULT_PG_VERSION: &str = "15";
fn default_conf() -> String {
format!(
@@ -53,14 +54,15 @@ listen_addr = '{DEFAULT_BROKER_ADDR}'
id = {DEFAULT_PAGESERVER_ID}
listen_pg_addr = '{DEFAULT_PAGESERVER_PG_ADDR}'
listen_http_addr = '{DEFAULT_PAGESERVER_HTTP_ADDR}'
auth_type = '{pageserver_auth_type}'
pg_auth_type = '{trust_auth}'
http_auth_type = '{trust_auth}'
[[safekeepers]]
id = {DEFAULT_SAFEKEEPER_ID}
pg_port = {DEFAULT_SAFEKEEPER_PG_PORT}
http_port = {DEFAULT_SAFEKEEPER_HTTP_PORT}
"#,
pageserver_auth_type = AuthType::Trust,
trust_auth = AuthType::Trust,
)
}
@@ -105,8 +107,9 @@ fn main() -> Result<()> {
"start" => handle_start_all(sub_args, &env),
"stop" => handle_stop_all(sub_args, &env),
"pageserver" => handle_pageserver(sub_args, &env),
"pg" => handle_pg(sub_args, &env),
"safekeeper" => handle_safekeeper(sub_args, &env),
"endpoint" => handle_endpoint(sub_args, &env),
"pg" => bail!("'pg' subcommand has been renamed to 'endpoint'"),
_ => bail!("unexpected subcommand {sub_name}"),
};
@@ -305,7 +308,8 @@ fn handle_init(init_match: &ArgMatches) -> anyhow::Result<LocalEnv> {
let mut env =
LocalEnv::parse_config(&toml_file).context("Failed to create neon configuration")?;
env.init(pg_version)
let force = init_match.get_flag("force");
env.init(pg_version, force)
.context("Failed to initialize neon repository")?;
// Initialize pageserver, create initial tenant and timeline.
@@ -469,10 +473,18 @@ fn handle_timeline(timeline_match: &ArgMatches, env: &mut local_env::LocalEnv) -
let mut cplane = ComputeControlPlane::load(env.clone())?;
println!("Importing timeline into pageserver ...");
pageserver.timeline_import(tenant_id, timeline_id, base, pg_wal, pg_version)?;
println!("Creating node for imported timeline ...");
env.register_branch_mapping(name.to_string(), tenant_id, timeline_id)?;
cplane.new_node(tenant_id, name, timeline_id, None, None, pg_version)?;
println!("Creating endpoint for imported timeline ...");
cplane.new_endpoint(
name,
tenant_id,
timeline_id,
None,
None,
pg_version,
ComputeMode::Primary,
)?;
println!("Done");
}
Some(("branch", branch_match)) => {
@@ -520,10 +532,10 @@ fn handle_timeline(timeline_match: &ArgMatches, env: &mut local_env::LocalEnv) -
Ok(())
}
fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
let (sub_name, sub_args) = match pg_match.subcommand() {
Some(pg_subcommand_data) => pg_subcommand_data,
None => bail!("no pg subcommand provided"),
fn handle_endpoint(ep_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
let (sub_name, sub_args) = match ep_match.subcommand() {
Some(ep_subcommand_data) => ep_subcommand_data,
None => bail!("no endpoint subcommand provided"),
};
let mut cplane = ComputeControlPlane::load(env.clone())?;
@@ -545,7 +557,7 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
table.load_preset(comfy_table::presets::NOTHING);
table.set_header([
"NODE",
"ENDPOINT",
"ADDRESS",
"TIMELINE",
"BRANCH NAME",
@@ -553,39 +565,39 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
"STATUS",
]);
for ((_, node_name), node) in cplane
.nodes
for (endpoint_id, endpoint) in cplane
.endpoints
.iter()
.filter(|((node_tenant_id, _), _)| node_tenant_id == &tenant_id)
.filter(|(_, endpoint)| endpoint.tenant_id == tenant_id)
{
let lsn_str = match node.lsn {
None => {
// -> primary node
// Use the LSN at the end of the timeline.
timeline_infos
.get(&node.timeline_id)
.map(|bi| bi.last_record_lsn.to_string())
.unwrap_or_else(|| "?".to_string())
}
Some(lsn) => {
// -> read-only node
let lsn_str = match endpoint.mode {
ComputeMode::Static(lsn) => {
// -> read-only endpoint
// Use the node's LSN.
lsn.to_string()
}
_ => {
// -> primary endpoint or hot replica
// Use the LSN at the end of the timeline.
timeline_infos
.get(&endpoint.timeline_id)
.map(|bi| bi.last_record_lsn.to_string())
.unwrap_or_else(|| "?".to_string())
}
};
let branch_name = timeline_name_mappings
.get(&TenantTimelineId::new(tenant_id, node.timeline_id))
.get(&TenantTimelineId::new(tenant_id, endpoint.timeline_id))
.map(|name| name.as_str())
.unwrap_or("?");
table.add_row([
node_name.as_str(),
&node.address.to_string(),
&node.timeline_id.to_string(),
endpoint_id.as_str(),
&endpoint.pg_address.to_string(),
&endpoint.timeline_id.to_string(),
branch_name,
lsn_str.as_str(),
node.status(),
endpoint.status(),
]);
}
@@ -596,10 +608,10 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
.get_one::<String>("branch-name")
.map(|s| s.as_str())
.unwrap_or(DEFAULT_BRANCH_NAME);
let node_name = sub_args
.get_one::<String>("node")
.map(|node_name| node_name.to_string())
.unwrap_or_else(|| format!("{branch_name}_node"));
let endpoint_id = sub_args
.get_one::<String>("endpoint_id")
.map(String::to_string)
.unwrap_or_else(|| format!("ep-{branch_name}"));
let lsn = sub_args
.get_one::<String>("lsn")
@@ -610,24 +622,60 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
.get_branch_timeline_id(branch_name, tenant_id)
.ok_or_else(|| anyhow!("Found no timeline id for branch name '{branch_name}'"))?;
let port: Option<u16> = sub_args.get_one::<u16>("port").copied();
let pg_port: Option<u16> = sub_args.get_one::<u16>("pg-port").copied();
let http_port: Option<u16> = sub_args.get_one::<u16>("http-port").copied();
let pg_version = sub_args
.get_one::<u32>("pg-version")
.copied()
.context("Failed to parse postgres version from the argument string")?;
cplane.new_node(tenant_id, &node_name, timeline_id, lsn, port, pg_version)?;
let hot_standby = sub_args
.get_one::<bool>("hot-standby")
.copied()
.unwrap_or(false);
let mode = match (lsn, hot_standby) {
(Some(lsn), false) => ComputeMode::Static(lsn),
(None, true) => ComputeMode::Replica,
(None, false) => ComputeMode::Primary,
(Some(_), true) => anyhow::bail!("cannot specify both lsn and hot-standby"),
};
cplane.new_endpoint(
&endpoint_id,
tenant_id,
timeline_id,
pg_port,
http_port,
pg_version,
mode,
)?;
}
"start" => {
let port: Option<u16> = sub_args.get_one::<u16>("port").copied();
let node_name = sub_args
.get_one::<String>("node")
.ok_or_else(|| anyhow!("No node name was provided to start"))?;
let pg_port: Option<u16> = sub_args.get_one::<u16>("pg-port").copied();
let http_port: Option<u16> = sub_args.get_one::<u16>("http-port").copied();
let endpoint_id = sub_args
.get_one::<String>("endpoint_id")
.ok_or_else(|| anyhow!("No endpoint ID was provided to start"))?;
let node = cplane.nodes.get(&(tenant_id, node_name.to_string()));
// If --safekeepers argument is given, use only the listed safekeeper nodes.
let safekeepers =
if let Some(safekeepers_str) = sub_args.get_one::<String>("safekeepers") {
let mut safekeepers: Vec<NodeId> = Vec::new();
for sk_id in safekeepers_str.split(',').map(str::trim) {
let sk_id = NodeId(u64::from_str(sk_id).map_err(|_| {
anyhow!("invalid node ID \"{sk_id}\" in --safekeepers list")
})?);
safekeepers.push(sk_id);
}
safekeepers
} else {
env.safekeepers.iter().map(|sk| sk.id).collect()
};
let auth_token = if matches!(env.pageserver.auth_type, AuthType::NeonJWT) {
let endpoint = cplane.endpoints.get(endpoint_id.as_str());
let auth_token = if matches!(env.pageserver.pg_auth_type, AuthType::NeonJWT) {
let claims = Claims::new(Some(tenant_id), Scope::Tenant);
Some(env.generate_auth_token(&claims)?)
@@ -635,9 +683,23 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
None
};
if let Some(node) = node {
println!("Starting existing postgres {node_name}...");
node.start(&auth_token)?;
let hot_standby = sub_args
.get_one::<bool>("hot-standby")
.copied()
.unwrap_or(false);
if let Some(endpoint) = endpoint {
match (&endpoint.mode, hot_standby) {
(ComputeMode::Static(_), true) => {
bail!("Cannot start a node in hot standby mode when it is already configured as a static replica")
}
(ComputeMode::Primary, true) => {
bail!("Cannot start a node as a hot standby replica, it is already configured as primary node")
}
_ => {}
}
println!("Starting existing endpoint {endpoint_id}...");
endpoint.start(&auth_token, safekeepers)?;
} else {
let branch_name = sub_args
.get_one::<String>("branch-name")
@@ -657,32 +719,47 @@ fn handle_pg(pg_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<()> {
.get_one::<u32>("pg-version")
.copied()
.context("Failed to `pg-version` from the argument string")?;
let mode = match (lsn, hot_standby) {
(Some(lsn), false) => ComputeMode::Static(lsn),
(None, true) => ComputeMode::Replica,
(None, false) => ComputeMode::Primary,
(Some(_), true) => anyhow::bail!("cannot specify both lsn and hot-standby"),
};
// when used with custom port this results in non obvious behaviour
// port is remembered from first start command, i e
// start --port X
// stop
// start <-- will also use port X even without explicit port argument
println!("Starting new postgres (v{pg_version}) {node_name} on timeline {timeline_id} ...");
println!("Starting new endpoint {endpoint_id} (PostgreSQL v{pg_version}) on timeline {timeline_id} ...");
let node =
cplane.new_node(tenant_id, node_name, timeline_id, lsn, port, pg_version)?;
node.start(&auth_token)?;
let ep = cplane.new_endpoint(
endpoint_id,
tenant_id,
timeline_id,
pg_port,
http_port,
pg_version,
mode,
)?;
ep.start(&auth_token, safekeepers)?;
}
}
"stop" => {
let node_name = sub_args
.get_one::<String>("node")
.ok_or_else(|| anyhow!("No node name was provided to stop"))?;
let endpoint_id = sub_args
.get_one::<String>("endpoint_id")
.ok_or_else(|| anyhow!("No endpoint ID was provided to stop"))?;
let destroy = sub_args.get_flag("destroy");
let node = cplane
.nodes
.get(&(tenant_id, node_name.to_string()))
.with_context(|| format!("postgres {node_name} is not found"))?;
node.stop(destroy)?;
let endpoint = cplane
.endpoints
.get(endpoint_id.as_str())
.with_context(|| format!("postgres endpoint {endpoint_id} is not found"))?;
endpoint.stop(destroy)?;
}
_ => bail!("Unexpected pg subcommand '{sub_name}'"),
_ => bail!("Unexpected endpoint subcommand '{sub_name}'"),
}
Ok(())
@@ -801,7 +878,7 @@ fn handle_safekeeper(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> Resul
}
fn handle_start_all(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> anyhow::Result<()> {
// Postgres nodes are not started automatically
// Endpoints are not started automatically
broker::start_broker_process(env)?;
@@ -835,10 +912,10 @@ fn handle_stop_all(sub_match: &ArgMatches, env: &local_env::LocalEnv) -> Result<
fn try_stop_all(env: &local_env::LocalEnv, immediate: bool) {
let pageserver = PageServerNode::from_env(env);
// Stop all compute nodes
// Stop all endpoints
match ComputeControlPlane::load(env.clone()) {
Ok(cplane) => {
for (_k, node) in cplane.nodes {
for (_k, node) in cplane.endpoints {
if let Err(e) = node.stop(false) {
eprintln!("postgres stop failed: {e:#}");
}
@@ -871,7 +948,9 @@ fn cli() -> Command {
.help("Name of the branch to be created or used as an alias for other services")
.required(false);
let pg_node_arg = Arg::new("node").help("Postgres node name").required(false);
let endpoint_id_arg = Arg::new("endpoint_id")
.help("Postgres endpoint id")
.required(false);
let safekeeper_id_arg = Arg::new("id").help("safekeeper id").required(false);
@@ -892,11 +971,22 @@ fn cli() -> Command {
.value_parser(value_parser!(u32))
.default_value(DEFAULT_PG_VERSION);
let port_arg = Arg::new("port")
.long("port")
let pg_port_arg = Arg::new("pg-port")
.long("pg-port")
.required(false)
.value_parser(value_parser!(u16))
.value_name("port");
.value_name("pg-port");
let http_port_arg = Arg::new("http-port")
.long("http-port")
.required(false)
.value_parser(value_parser!(u16))
.value_name("http-port");
let safekeepers_arg = Arg::new("safekeepers")
.long("safekeepers")
.required(false)
.value_name("safekeepers");
let stop_mode_arg = Arg::new("stop-mode")
.short('m')
@@ -918,6 +1008,19 @@ fn cli() -> Command {
.help("Specify Lsn on the timeline to start from. By default, end of the timeline would be used.")
.required(false);
let hot_standby_arg = Arg::new("hot-standby")
.value_parser(value_parser!(bool))
.long("hot-standby")
.help("If set, the node will be a hot replica on the specified timeline")
.required(false);
let force_arg = Arg::new("force")
.value_parser(value_parser!(bool))
.long("force")
.action(ArgAction::SetTrue)
.help("Force initialization even if the repository is not empty")
.required(false);
Command::new("Neon CLI")
.arg_required_else_help(true)
.version(GIT_VERSION)
@@ -933,6 +1036,7 @@ fn cli() -> Command {
.value_name("config"),
)
.arg(pg_version_arg.clone())
.arg(force_arg)
)
.subcommand(
Command::new("timeline")
@@ -1025,37 +1129,42 @@ fn cli() -> Command {
)
)
.subcommand(
Command::new("pg")
Command::new("endpoint")
.arg_required_else_help(true)
.about("Manage postgres instances")
.subcommand(Command::new("list").arg(tenant_id_arg.clone()))
.subcommand(Command::new("create")
.about("Create a postgres compute node")
.arg(pg_node_arg.clone())
.about("Create a compute endpoint")
.arg(endpoint_id_arg.clone())
.arg(branch_name_arg.clone())
.arg(tenant_id_arg.clone())
.arg(lsn_arg.clone())
.arg(port_arg.clone())
.arg(pg_port_arg.clone())
.arg(http_port_arg.clone())
.arg(
Arg::new("config-only")
.help("Don't do basebackup, create compute node with only config files")
.help("Don't do basebackup, create endpoint directory with only config files")
.long("config-only")
.required(false))
.arg(pg_version_arg.clone())
.arg(hot_standby_arg.clone())
)
.subcommand(Command::new("start")
.about("Start a postgres compute node.\n This command actually creates new node from scratch, but preserves existing config files")
.arg(pg_node_arg.clone())
.about("Start postgres.\n If the endpoint doesn't exist yet, it is created.")
.arg(endpoint_id_arg.clone())
.arg(tenant_id_arg.clone())
.arg(branch_name_arg)
.arg(timeline_id_arg)
.arg(lsn_arg)
.arg(port_arg)
.arg(pg_port_arg)
.arg(http_port_arg)
.arg(pg_version_arg)
.arg(hot_standby_arg)
.arg(safekeepers_arg)
)
.subcommand(
Command::new("stop")
.arg(pg_node_arg)
.arg(endpoint_id_arg)
.arg(tenant_id_arg)
.arg(
Arg::new("destroy")
@@ -1067,6 +1176,13 @@ fn cli() -> Command {
)
)
// Obsolete old name for 'endpoint'. We now just print an error if it's used.
.subcommand(
Command::new("pg")
.hide(true)
.arg(Arg::new("ignore-rest").allow_hyphen_values(true).num_args(0..).required(false))
.trailing_var_arg(true)
)
.subcommand(
Command::new("start")
.about("Start page server and safekeepers")

View File

@@ -1,3 +1,9 @@
//! Code to manage the storage broker
//!
//! In the local test environment, the data for each safekeeper is stored in
//!
//! .neon/safekeepers/<safekeeper id>
//!
use anyhow::Context;
use std::path::PathBuf;

View File

@@ -1,555 +0,0 @@
use std::collections::BTreeMap;
use std::fs::{self, File};
use std::io::Write;
use std::net::SocketAddr;
use std::net::TcpStream;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;
use std::process::{Command, Stdio};
use std::str::FromStr;
use std::sync::Arc;
use std::time::Duration;
use anyhow::{Context, Result};
use utils::{
id::{TenantId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
};
use crate::local_env::{LocalEnv, DEFAULT_PG_VERSION};
use crate::pageserver::PageServerNode;
use crate::postgresql_conf::PostgresConf;
//
// ComputeControlPlane
//
pub struct ComputeControlPlane {
base_port: u16,
pageserver: Arc<PageServerNode>,
pub nodes: BTreeMap<(TenantId, String), Arc<PostgresNode>>,
env: LocalEnv,
}
impl ComputeControlPlane {
// Load current nodes with ports from data directories on disk
// Directory structure has the following layout:
// pgdatadirs
// |- tenants
// | |- <tenant_id>
// | | |- <node name>
pub fn load(env: LocalEnv) -> Result<ComputeControlPlane> {
let pageserver = Arc::new(PageServerNode::from_env(&env));
let mut nodes = BTreeMap::default();
let pgdatadirspath = &env.pg_data_dirs_path();
for tenant_dir in fs::read_dir(pgdatadirspath)
.with_context(|| format!("failed to list {}", pgdatadirspath.display()))?
{
let tenant_dir = tenant_dir?;
for timeline_dir in fs::read_dir(tenant_dir.path())
.with_context(|| format!("failed to list {}", tenant_dir.path().display()))?
{
let node = PostgresNode::from_dir_entry(timeline_dir?, &env, &pageserver)?;
nodes.insert((node.tenant_id, node.name.clone()), Arc::new(node));
}
}
Ok(ComputeControlPlane {
base_port: 55431,
pageserver,
nodes,
env,
})
}
fn get_port(&mut self) -> u16 {
1 + self
.nodes
.values()
.map(|node| node.address.port())
.max()
.unwrap_or(self.base_port)
}
pub fn new_node(
&mut self,
tenant_id: TenantId,
name: &str,
timeline_id: TimelineId,
lsn: Option<Lsn>,
port: Option<u16>,
pg_version: u32,
) -> Result<Arc<PostgresNode>> {
let port = port.unwrap_or_else(|| self.get_port());
let node = Arc::new(PostgresNode {
name: name.to_owned(),
address: SocketAddr::new("127.0.0.1".parse().unwrap(), port),
env: self.env.clone(),
pageserver: Arc::clone(&self.pageserver),
is_test: false,
timeline_id,
lsn,
tenant_id,
uses_wal_proposer: false,
pg_version,
});
node.create_pgdata()?;
node.setup_pg_conf(self.env.pageserver.auth_type)?;
self.nodes
.insert((tenant_id, node.name.clone()), Arc::clone(&node));
Ok(node)
}
}
///////////////////////////////////////////////////////////////////////////////
#[derive(Debug)]
pub struct PostgresNode {
pub address: SocketAddr,
name: String,
pub env: LocalEnv,
pageserver: Arc<PageServerNode>,
is_test: bool,
pub timeline_id: TimelineId,
pub lsn: Option<Lsn>, // if it's a read-only node. None for primary
pub tenant_id: TenantId,
uses_wal_proposer: bool,
pg_version: u32,
}
impl PostgresNode {
fn from_dir_entry(
entry: std::fs::DirEntry,
env: &LocalEnv,
pageserver: &Arc<PageServerNode>,
) -> Result<PostgresNode> {
if !entry.file_type()?.is_dir() {
anyhow::bail!(
"PostgresNode::from_dir_entry failed: '{}' is not a directory",
entry.path().display()
);
}
// parse data directory name
let fname = entry.file_name();
let name = fname.to_str().unwrap().to_string();
// Read config file into memory
let cfg_path = entry.path().join("postgresql.conf");
let cfg_path_str = cfg_path.to_string_lossy();
let mut conf_file = File::open(&cfg_path)
.with_context(|| format!("failed to open config file in {}", cfg_path_str))?;
let conf = PostgresConf::read(&mut conf_file)
.with_context(|| format!("failed to read config file in {}", cfg_path_str))?;
// Read a few options from the config file
let context = format!("in config file {}", cfg_path_str);
let port: u16 = conf.parse_field("port", &context)?;
let timeline_id: TimelineId = conf.parse_field("neon.timeline_id", &context)?;
let tenant_id: TenantId = conf.parse_field("neon.tenant_id", &context)?;
let uses_wal_proposer = conf.get("neon.safekeepers").is_some();
// Read postgres version from PG_VERSION file to determine which postgres version binary to use.
// If it doesn't exist, assume broken data directory and use default pg version.
let pg_version_path = entry.path().join("PG_VERSION");
let pg_version_str =
fs::read_to_string(pg_version_path).unwrap_or_else(|_| DEFAULT_PG_VERSION.to_string());
let pg_version = u32::from_str(&pg_version_str)?;
// parse recovery_target_lsn, if any
let recovery_target_lsn: Option<Lsn> =
conf.parse_field_optional("recovery_target_lsn", &context)?;
// ok now
Ok(PostgresNode {
address: SocketAddr::new("127.0.0.1".parse().unwrap(), port),
name,
env: env.clone(),
pageserver: Arc::clone(pageserver),
is_test: false,
timeline_id,
lsn: recovery_target_lsn,
tenant_id,
uses_wal_proposer,
pg_version,
})
}
fn sync_safekeepers(&self, auth_token: &Option<String>, pg_version: u32) -> Result<Lsn> {
let pg_path = self.env.pg_bin_dir(pg_version)?.join("postgres");
let mut cmd = Command::new(pg_path);
cmd.arg("--sync-safekeepers")
.env_clear()
.env(
"LD_LIBRARY_PATH",
self.env.pg_lib_dir(pg_version)?.to_str().unwrap(),
)
.env(
"DYLD_LIBRARY_PATH",
self.env.pg_lib_dir(pg_version)?.to_str().unwrap(),
)
.env("PGDATA", self.pgdata().to_str().unwrap())
.stdout(Stdio::piped())
// Comment this to avoid capturing stderr (useful if command hangs)
.stderr(Stdio::piped());
if let Some(token) = auth_token {
cmd.env("NEON_AUTH_TOKEN", token);
}
let sync_handle = cmd
.spawn()
.expect("postgres --sync-safekeepers failed to start");
let sync_output = sync_handle
.wait_with_output()
.expect("postgres --sync-safekeepers failed");
if !sync_output.status.success() {
anyhow::bail!(
"sync-safekeepers failed: '{}'",
String::from_utf8_lossy(&sync_output.stderr)
);
}
let lsn = Lsn::from_str(std::str::from_utf8(&sync_output.stdout)?.trim())?;
println!("Safekeepers synced on {}", lsn);
Ok(lsn)
}
/// Get basebackup from the pageserver as a tar archive and extract it
/// to the `self.pgdata()` directory.
fn do_basebackup(&self, lsn: Option<Lsn>) -> Result<()> {
println!(
"Extracting base backup to create postgres instance: path={} port={}",
self.pgdata().display(),
self.address.port()
);
let sql = if let Some(lsn) = lsn {
format!("basebackup {} {} {}", self.tenant_id, self.timeline_id, lsn)
} else {
format!("basebackup {} {}", self.tenant_id, self.timeline_id)
};
let mut client = self
.pageserver
.page_server_psql_client()
.context("connecting to page server failed")?;
let copyreader = client
.copy_out(sql.as_str())
.context("page server 'basebackup' command failed")?;
// Read the archive directly from the `CopyOutReader`
//
// Set `ignore_zeros` so that unpack() reads all the Copy data and
// doesn't stop at the end-of-archive marker. Otherwise, if the server
// sends an Error after finishing the tarball, we will not notice it.
let mut ar = tar::Archive::new(copyreader);
ar.set_ignore_zeros(true);
ar.unpack(&self.pgdata())
.context("extracting base backup failed")?;
Ok(())
}
fn create_pgdata(&self) -> Result<()> {
fs::create_dir_all(self.pgdata()).with_context(|| {
format!(
"could not create data directory {}",
self.pgdata().display()
)
})?;
fs::set_permissions(self.pgdata().as_path(), fs::Permissions::from_mode(0o700))
.with_context(|| {
format!(
"could not set permissions in data directory {}",
self.pgdata().display()
)
})
}
// Write postgresql.conf with default configuration
// and PG_VERSION file to the data directory of a new node.
fn setup_pg_conf(&self, auth_type: AuthType) -> Result<()> {
let mut conf = PostgresConf::new();
conf.append("max_wal_senders", "10");
conf.append("wal_log_hints", "off");
conf.append("max_replication_slots", "10");
conf.append("hot_standby", "on");
conf.append("shared_buffers", "1MB");
conf.append("fsync", "off");
conf.append("max_connections", "100");
conf.append("wal_level", "replica");
// wal_sender_timeout is the maximum time to wait for WAL replication.
// It also defines how often the walreciever will send a feedback message to the wal sender.
conf.append("wal_sender_timeout", "5s");
conf.append("listen_addresses", &self.address.ip().to_string());
conf.append("port", &self.address.port().to_string());
conf.append("wal_keep_size", "0");
// walproposer panics when basebackup is invalid, it is pointless to restart in this case.
conf.append("restart_after_crash", "off");
// Configure the node to fetch pages from pageserver
let pageserver_connstr = {
let config = &self.pageserver.pg_connection_config;
let (host, port) = (config.host(), config.port());
// Set up authentication
//
// $NEON_AUTH_TOKEN will be replaced with value from environment
// variable during compute pg startup. It is done this way because
// otherwise user will be able to retrieve the value using SHOW
// command or pg_settings
let password = if let AuthType::NeonJWT = auth_type {
"$NEON_AUTH_TOKEN"
} else {
""
};
// NOTE avoiding spaces in connection string, because it is less error prone if we forward it somewhere.
// Also note that not all parameters are supported here. Because in compute we substitute $NEON_AUTH_TOKEN
// We parse this string and build it back with token from env var, and for simplicity rebuild
// uses only needed variables namely host, port, user, password.
format!("postgresql://no_user:{password}@{host}:{port}")
};
conf.append("shared_preload_libraries", "neon");
conf.append_line("");
conf.append("neon.pageserver_connstring", &pageserver_connstr);
if let AuthType::NeonJWT = auth_type {
conf.append("neon.safekeeper_token_env", "$NEON_AUTH_TOKEN");
}
conf.append("neon.tenant_id", &self.tenant_id.to_string());
conf.append("neon.timeline_id", &self.timeline_id.to_string());
if let Some(lsn) = self.lsn {
conf.append("recovery_target_lsn", &lsn.to_string());
}
conf.append_line("");
// Configure backpressure
// - Replication write lag depends on how fast the walreceiver can process incoming WAL.
// This lag determines latency of get_page_at_lsn. Speed of applying WAL is about 10MB/sec,
// so to avoid expiration of 1 minute timeout, this lag should not be larger than 600MB.
// Actually latency should be much smaller (better if < 1sec). But we assume that recently
// updates pages are not requested from pageserver.
// - Replication flush lag depends on speed of persisting data by checkpointer (creation of
// delta/image layers) and advancing disk_consistent_lsn. Safekeepers are able to
// remove/archive WAL only beyond disk_consistent_lsn. Too large a lag can cause long
// recovery time (in case of pageserver crash) and disk space overflow at safekeepers.
// - Replication apply lag depends on speed of uploading changes to S3 by uploader thread.
// To be able to restore database in case of pageserver node crash, safekeeper should not
// remove WAL beyond this point. Too large lag can cause space exhaustion in safekeepers
// (if they are not able to upload WAL to S3).
conf.append("max_replication_write_lag", "15MB");
conf.append("max_replication_flush_lag", "10GB");
if !self.env.safekeepers.is_empty() {
// Configure the node to connect to the safekeepers
conf.append("synchronous_standby_names", "walproposer");
let safekeepers = self
.env
.safekeepers
.iter()
.map(|sk| format!("localhost:{}", sk.pg_port))
.collect::<Vec<String>>()
.join(",");
conf.append("neon.safekeepers", &safekeepers);
} else {
// We only use setup without safekeepers for tests,
// and don't care about data durability on pageserver,
// so set more relaxed synchronous_commit.
conf.append("synchronous_commit", "remote_write");
// Configure the node to stream WAL directly to the pageserver
// This isn't really a supported configuration, but can be useful for
// testing.
conf.append("synchronous_standby_names", "pageserver");
}
let mut file = File::create(self.pgdata().join("postgresql.conf"))?;
file.write_all(conf.to_string().as_bytes())?;
let mut file = File::create(self.pgdata().join("PG_VERSION"))?;
file.write_all(self.pg_version.to_string().as_bytes())?;
Ok(())
}
fn load_basebackup(&self, auth_token: &Option<String>) -> Result<()> {
let backup_lsn = if let Some(lsn) = self.lsn {
Some(lsn)
} else if self.uses_wal_proposer {
// LSN 0 means that it is bootstrap and we need to download just
// latest data from the pageserver. That is a bit clumsy but whole bootstrap
// procedure evolves quite actively right now, so let's think about it again
// when things would be more stable (TODO).
let lsn = self.sync_safekeepers(auth_token, self.pg_version)?;
if lsn == Lsn(0) {
None
} else {
Some(lsn)
}
} else {
None
};
self.do_basebackup(backup_lsn)?;
Ok(())
}
pub fn pgdata(&self) -> PathBuf {
self.env.pg_data_dir(&self.tenant_id, &self.name)
}
pub fn status(&self) -> &str {
let timeout = Duration::from_millis(300);
let has_pidfile = self.pgdata().join("postmaster.pid").exists();
let can_connect = TcpStream::connect_timeout(&self.address, timeout).is_ok();
match (has_pidfile, can_connect) {
(true, true) => "running",
(false, false) => "stopped",
(true, false) => "crashed",
(false, true) => "running, no pidfile",
}
}
fn pg_ctl(&self, args: &[&str], auth_token: &Option<String>) -> Result<()> {
let pg_ctl_path = self.env.pg_bin_dir(self.pg_version)?.join("pg_ctl");
let mut cmd = Command::new(pg_ctl_path);
cmd.args(
[
&[
"-D",
self.pgdata().to_str().unwrap(),
"-l",
self.pgdata().join("pg.log").to_str().unwrap(),
"-w", //wait till pg_ctl actually does what was asked
],
args,
]
.concat(),
)
.env_clear()
.env(
"LD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version)?.to_str().unwrap(),
)
.env(
"DYLD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version)?.to_str().unwrap(),
);
if let Some(token) = auth_token {
cmd.env("NEON_AUTH_TOKEN", token);
}
let pg_ctl = cmd.output().context("pg_ctl failed")?;
if !pg_ctl.status.success() {
anyhow::bail!(
"pg_ctl failed, exit code: {}, stdout: {}, stderr: {}",
pg_ctl.status,
String::from_utf8_lossy(&pg_ctl.stdout),
String::from_utf8_lossy(&pg_ctl.stderr),
);
}
Ok(())
}
pub fn start(&self, auth_token: &Option<String>) -> Result<()> {
// Bail if the node already running.
if self.status() == "running" {
anyhow::bail!("The node is already running");
}
// 1. We always start compute node from scratch, so
// if old dir exists, preserve 'postgresql.conf' and drop the directory
let postgresql_conf_path = self.pgdata().join("postgresql.conf");
let postgresql_conf = fs::read(&postgresql_conf_path).with_context(|| {
format!(
"failed to read config file in {}",
postgresql_conf_path.to_str().unwrap()
)
})?;
fs::remove_dir_all(self.pgdata())?;
self.create_pgdata()?;
// 2. Bring back config files
fs::write(&postgresql_conf_path, postgresql_conf)?;
// 3. Load basebackup
self.load_basebackup(auth_token)?;
if self.lsn.is_some() {
File::create(self.pgdata().join("standby.signal"))?;
}
// 4. Finally start the compute node postgres
println!("Starting postgres node at '{}'", self.connstr());
self.pg_ctl(&["start"], auth_token)
}
pub fn restart(&self, auth_token: &Option<String>) -> Result<()> {
self.pg_ctl(&["restart"], auth_token)
}
pub fn stop(&self, destroy: bool) -> Result<()> {
// If we are going to destroy data directory,
// use immediate shutdown mode, otherwise,
// shutdown gracefully to leave the data directory sane.
//
// Compute node always starts from scratch, so stop
// without destroy only used for testing and debugging.
//
if destroy {
self.pg_ctl(&["-m", "immediate", "stop"], &None)?;
println!(
"Destroying postgres data directory '{}'",
self.pgdata().to_str().unwrap()
);
fs::remove_dir_all(self.pgdata())?;
} else {
self.pg_ctl(&["stop"], &None)?;
}
Ok(())
}
pub fn connstr(&self) -> String {
format!(
"host={} port={} user={} dbname={}",
self.address.ip(),
self.address.port(),
"cloud_admin",
"postgres"
)
}
// XXX: cache that in control plane
pub fn whoami(&self) -> String {
let output = Command::new("whoami")
.output()
.expect("failed to execute whoami");
assert!(output.status.success(), "whoami failed");
String::from_utf8(output.stdout).unwrap().trim().to_string()
}
}
impl Drop for PostgresNode {
// destructor to clean up state after test is done
// XXX: we may detect failed test by setting some flag in catch_unwind()
// and checking it here. But let just clean datadirs on start.
fn drop(&mut self) {
if self.is_test {
let _ = self.stop(true);
}
}
}

View File

@@ -0,0 +1,637 @@
//! Code to manage compute endpoints
//!
//! In the local test environment, the data for each endpoint is stored in
//!
//! .neon/endpoints/<endpoint id>
//!
//! Some basic information about the endpoint, like the tenant and timeline IDs,
//! are stored in the `endpoint.json` file. The `endpoint.json` file is created
//! when the endpoint is created, and doesn't change afterwards.
//!
//! The endpoint is managed by the `compute_ctl` binary. When an endpoint is
//! started, we launch `compute_ctl` It synchronizes the safekeepers, downloads
//! the basebackup from the pageserver to initialize the the data directory, and
//! finally launches the PostgreSQL process. It watches the PostgreSQL process
//! until it exits.
//!
//! When an endpoint is created, a `postgresql.conf` file is also created in
//! the endpoint's directory. The file can be modified before starting PostgreSQL.
//! However, the `postgresql.conf` file in the endpoint directory is not used directly
//! by PostgreSQL. It is passed to `compute_ctl`, and `compute_ctl` writes another
//! copy of it in the data directory.
//!
//! Directory contents:
//!
//! ```ignore
//! .neon/endpoints/main/
//! compute.log - log output of `compute_ctl` and `postgres`
//! endpoint.json - serialized `EndpointConf` struct
//! postgresql.conf - postgresql settings
//! spec.json - passed to `compute_ctl`
//! pgdata/
//! postgresql.conf - copy of postgresql.conf created by `compute_ctl`
//! zenith.signal
//! <other PostgreSQL files>
//! ```
//!
use std::collections::BTreeMap;
use std::net::SocketAddr;
use std::net::TcpStream;
use std::path::PathBuf;
use std::process::Command;
use std::sync::Arc;
use std::time::Duration;
use anyhow::{anyhow, bail, Context, Result};
use serde::{Deserialize, Serialize};
use serde_with::{serde_as, DisplayFromStr};
use utils::id::{NodeId, TenantId, TimelineId};
use crate::local_env::LocalEnv;
use crate::pageserver::PageServerNode;
use crate::postgresql_conf::PostgresConf;
use compute_api::responses::{ComputeState, ComputeStatus};
use compute_api::spec::{Cluster, ComputeMode, ComputeSpec};
// contents of a endpoint.json file
#[serde_as]
#[derive(Serialize, Deserialize, PartialEq, Eq, Clone, Debug)]
pub struct EndpointConf {
endpoint_id: String,
#[serde_as(as = "DisplayFromStr")]
tenant_id: TenantId,
#[serde_as(as = "DisplayFromStr")]
timeline_id: TimelineId,
mode: ComputeMode,
pg_port: u16,
http_port: u16,
pg_version: u32,
skip_pg_catalog_updates: bool,
}
//
// ComputeControlPlane
//
pub struct ComputeControlPlane {
base_port: u16,
// endpoint ID is the key
pub endpoints: BTreeMap<String, Arc<Endpoint>>,
env: LocalEnv,
pageserver: Arc<PageServerNode>,
}
impl ComputeControlPlane {
// Load current endpoints from the endpoints/ subdirectories
pub fn load(env: LocalEnv) -> Result<ComputeControlPlane> {
let pageserver = Arc::new(PageServerNode::from_env(&env));
let mut endpoints = BTreeMap::default();
for endpoint_dir in std::fs::read_dir(env.endpoints_path())
.with_context(|| format!("failed to list {}", env.endpoints_path().display()))?
{
let ep = Endpoint::from_dir_entry(endpoint_dir?, &env, &pageserver)?;
endpoints.insert(ep.endpoint_id.clone(), Arc::new(ep));
}
Ok(ComputeControlPlane {
base_port: 55431,
endpoints,
env,
pageserver,
})
}
fn get_port(&mut self) -> u16 {
1 + self
.endpoints
.values()
.map(|ep| std::cmp::max(ep.pg_address.port(), ep.http_address.port()))
.max()
.unwrap_or(self.base_port)
}
#[allow(clippy::too_many_arguments)]
pub fn new_endpoint(
&mut self,
endpoint_id: &str,
tenant_id: TenantId,
timeline_id: TimelineId,
pg_port: Option<u16>,
http_port: Option<u16>,
pg_version: u32,
mode: ComputeMode,
) -> Result<Arc<Endpoint>> {
let pg_port = pg_port.unwrap_or_else(|| self.get_port());
let http_port = http_port.unwrap_or_else(|| self.get_port() + 1);
let ep = Arc::new(Endpoint {
endpoint_id: endpoint_id.to_owned(),
pg_address: SocketAddr::new("127.0.0.1".parse().unwrap(), pg_port),
http_address: SocketAddr::new("127.0.0.1".parse().unwrap(), http_port),
env: self.env.clone(),
pageserver: Arc::clone(&self.pageserver),
timeline_id,
mode,
tenant_id,
pg_version,
skip_pg_catalog_updates: false,
});
ep.create_endpoint_dir()?;
std::fs::write(
ep.endpoint_path().join("endpoint.json"),
serde_json::to_string_pretty(&EndpointConf {
endpoint_id: endpoint_id.to_string(),
tenant_id,
timeline_id,
mode,
http_port,
pg_port,
pg_version,
skip_pg_catalog_updates: false,
})?,
)?;
std::fs::write(
ep.endpoint_path().join("postgresql.conf"),
ep.setup_pg_conf()?.to_string(),
)?;
self.endpoints
.insert(ep.endpoint_id.clone(), Arc::clone(&ep));
Ok(ep)
}
}
///////////////////////////////////////////////////////////////////////////////
#[derive(Debug)]
pub struct Endpoint {
/// used as the directory name
endpoint_id: String,
pub tenant_id: TenantId,
pub timeline_id: TimelineId,
pub mode: ComputeMode,
// port and address of the Postgres server and `compute_ctl`'s HTTP API
pub pg_address: SocketAddr,
pub http_address: SocketAddr,
// postgres major version in the format: 14, 15, etc.
pg_version: u32,
// These are not part of the endpoint as such, but the environment
// the endpoint runs in.
pub env: LocalEnv,
pageserver: Arc<PageServerNode>,
// Optimizations
skip_pg_catalog_updates: bool,
}
impl Endpoint {
fn from_dir_entry(
entry: std::fs::DirEntry,
env: &LocalEnv,
pageserver: &Arc<PageServerNode>,
) -> Result<Endpoint> {
if !entry.file_type()?.is_dir() {
anyhow::bail!(
"Endpoint::from_dir_entry failed: '{}' is not a directory",
entry.path().display()
);
}
// parse data directory name
let fname = entry.file_name();
let endpoint_id = fname.to_str().unwrap().to_string();
// Read the endpoint.json file
let conf: EndpointConf =
serde_json::from_slice(&std::fs::read(entry.path().join("endpoint.json"))?)?;
Ok(Endpoint {
pg_address: SocketAddr::new("127.0.0.1".parse().unwrap(), conf.pg_port),
http_address: SocketAddr::new("127.0.0.1".parse().unwrap(), conf.http_port),
endpoint_id,
env: env.clone(),
pageserver: Arc::clone(pageserver),
timeline_id: conf.timeline_id,
mode: conf.mode,
tenant_id: conf.tenant_id,
pg_version: conf.pg_version,
skip_pg_catalog_updates: conf.skip_pg_catalog_updates,
})
}
fn create_endpoint_dir(&self) -> Result<()> {
std::fs::create_dir_all(self.endpoint_path()).with_context(|| {
format!(
"could not create endpoint directory {}",
self.endpoint_path().display()
)
})
}
// Generate postgresql.conf with default configuration
fn setup_pg_conf(&self) -> Result<PostgresConf> {
let mut conf = PostgresConf::new();
conf.append("max_wal_senders", "10");
conf.append("wal_log_hints", "off");
conf.append("max_replication_slots", "10");
conf.append("hot_standby", "on");
conf.append("shared_buffers", "1MB");
conf.append("fsync", "off");
conf.append("max_connections", "100");
conf.append("wal_level", "replica");
// wal_sender_timeout is the maximum time to wait for WAL replication.
// It also defines how often the walreciever will send a feedback message to the wal sender.
conf.append("wal_sender_timeout", "5s");
conf.append("listen_addresses", &self.pg_address.ip().to_string());
conf.append("port", &self.pg_address.port().to_string());
conf.append("wal_keep_size", "0");
// walproposer panics when basebackup is invalid, it is pointless to restart in this case.
conf.append("restart_after_crash", "off");
// Load the 'neon' extension
conf.append("shared_preload_libraries", "neon");
conf.append_line("");
// Replication-related configurations, such as WAL sending
match &self.mode {
ComputeMode::Primary => {
// Configure backpressure
// - Replication write lag depends on how fast the walreceiver can process incoming WAL.
// This lag determines latency of get_page_at_lsn. Speed of applying WAL is about 10MB/sec,
// so to avoid expiration of 1 minute timeout, this lag should not be larger than 600MB.
// Actually latency should be much smaller (better if < 1sec). But we assume that recently
// updates pages are not requested from pageserver.
// - Replication flush lag depends on speed of persisting data by checkpointer (creation of
// delta/image layers) and advancing disk_consistent_lsn. Safekeepers are able to
// remove/archive WAL only beyond disk_consistent_lsn. Too large a lag can cause long
// recovery time (in case of pageserver crash) and disk space overflow at safekeepers.
// - Replication apply lag depends on speed of uploading changes to S3 by uploader thread.
// To be able to restore database in case of pageserver node crash, safekeeper should not
// remove WAL beyond this point. Too large lag can cause space exhaustion in safekeepers
// (if they are not able to upload WAL to S3).
conf.append("max_replication_write_lag", "15MB");
conf.append("max_replication_flush_lag", "10GB");
if !self.env.safekeepers.is_empty() {
// Configure Postgres to connect to the safekeepers
conf.append("synchronous_standby_names", "walproposer");
let safekeepers = self
.env
.safekeepers
.iter()
.map(|sk| format!("localhost:{}", sk.pg_port))
.collect::<Vec<String>>()
.join(",");
conf.append("neon.safekeepers", &safekeepers);
} else {
// We only use setup without safekeepers for tests,
// and don't care about data durability on pageserver,
// so set more relaxed synchronous_commit.
conf.append("synchronous_commit", "remote_write");
// Configure the node to stream WAL directly to the pageserver
// This isn't really a supported configuration, but can be useful for
// testing.
conf.append("synchronous_standby_names", "pageserver");
}
}
ComputeMode::Static(lsn) => {
conf.append("recovery_target_lsn", &lsn.to_string());
}
ComputeMode::Replica => {
assert!(!self.env.safekeepers.is_empty());
// TODO: use future host field from safekeeper spec
// Pass the list of safekeepers to the replica so that it can connect to any of them,
// whichever is availiable.
let sk_ports = self
.env
.safekeepers
.iter()
.map(|x| x.pg_port.to_string())
.collect::<Vec<_>>()
.join(",");
let sk_hosts = vec!["localhost"; self.env.safekeepers.len()].join(",");
let connstr = format!(
"host={} port={} options='-c timeline_id={} tenant_id={}' application_name=replica replication=true",
sk_hosts,
sk_ports,
&self.timeline_id.to_string(),
&self.tenant_id.to_string(),
);
let slot_name = format!("repl_{}_", self.timeline_id);
conf.append("primary_conninfo", connstr.as_str());
conf.append("primary_slot_name", slot_name.as_str());
conf.append("hot_standby", "on");
// prefetching of blocks referenced in WAL doesn't make sense for us
// Neon hot standby ignores pages that are not in the shared_buffers
if self.pg_version >= 15 {
conf.append("recovery_prefetch", "off");
}
}
}
Ok(conf)
}
pub fn endpoint_path(&self) -> PathBuf {
self.env.endpoints_path().join(&self.endpoint_id)
}
pub fn pgdata(&self) -> PathBuf {
self.endpoint_path().join("pgdata")
}
pub fn status(&self) -> &str {
let timeout = Duration::from_millis(300);
let has_pidfile = self.pgdata().join("postmaster.pid").exists();
let can_connect = TcpStream::connect_timeout(&self.pg_address, timeout).is_ok();
match (has_pidfile, can_connect) {
(true, true) => "running",
(false, false) => "stopped",
(true, false) => "crashed",
(false, true) => "running, no pidfile",
}
}
fn pg_ctl(&self, args: &[&str], auth_token: &Option<String>) -> Result<()> {
let pg_ctl_path = self.env.pg_bin_dir(self.pg_version)?.join("pg_ctl");
let mut cmd = Command::new(&pg_ctl_path);
cmd.args(
[
&[
"-D",
self.pgdata().to_str().unwrap(),
"-w", //wait till pg_ctl actually does what was asked
],
args,
]
.concat(),
)
.env_clear()
.env(
"LD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version)?.to_str().unwrap(),
)
.env(
"DYLD_LIBRARY_PATH",
self.env.pg_lib_dir(self.pg_version)?.to_str().unwrap(),
);
// Pass authentication token used for the connections to pageserver and safekeepers
if let Some(token) = auth_token {
cmd.env("NEON_AUTH_TOKEN", token);
}
let pg_ctl = cmd
.output()
.context(format!("{} failed", pg_ctl_path.display()))?;
if !pg_ctl.status.success() {
anyhow::bail!(
"pg_ctl failed, exit code: {}, stdout: {}, stderr: {}",
pg_ctl.status,
String::from_utf8_lossy(&pg_ctl.stdout),
String::from_utf8_lossy(&pg_ctl.stderr),
);
}
// Also wait for the compute_ctl process to die. It might have some cleanup
// work to do after postgres stops, like syncing safekeepers, etc.
//
// TODO use background_process::stop_process instead
let pidfile_path = self.endpoint_path().join("compute_ctl.pid");
let pid: u32 = std::fs::read_to_string(pidfile_path)?.parse()?;
let pid = nix::unistd::Pid::from_raw(pid as i32);
crate::background_process::wait_until_stopped("compute_ctl", pid)?;
Ok(())
}
pub fn start(&self, auth_token: &Option<String>, safekeepers: Vec<NodeId>) -> Result<()> {
if self.status() == "running" {
anyhow::bail!("The endpoint is already running");
}
// Slurp the endpoints/<endpoint id>/postgresql.conf file into
// memory. We will include it in the spec file that we pass to
// `compute_ctl`, and `compute_ctl` will write it to the postgresql.conf
// in the data directory.
let postgresql_conf_path = self.endpoint_path().join("postgresql.conf");
let postgresql_conf = match std::fs::read(&postgresql_conf_path) {
Ok(content) => String::from_utf8(content)?,
Err(e) if e.kind() == std::io::ErrorKind::NotFound => "".to_string(),
Err(e) => {
return Err(anyhow::Error::new(e).context(format!(
"failed to read config file in {}",
postgresql_conf_path.to_str().unwrap()
)))
}
};
// We always start the compute node from scratch, so if the Postgres
// data dir exists from a previous launch, remove it first.
if self.pgdata().exists() {
std::fs::remove_dir_all(self.pgdata())?;
}
let pageserver_connstring = {
let config = &self.pageserver.pg_connection_config;
let (host, port) = (config.host(), config.port());
// NOTE: avoid spaces in connection string, because it is less error prone if we forward it somewhere.
format!("postgresql://no_user@{host}:{port}")
};
let mut safekeeper_connstrings = Vec::new();
if self.mode == ComputeMode::Primary {
for sk_id in safekeepers {
let sk = self
.env
.safekeepers
.iter()
.find(|node| node.id == sk_id)
.ok_or_else(|| anyhow!("safekeeper {sk_id} does not exist"))?;
safekeeper_connstrings.push(format!("127.0.0.1:{}", sk.pg_port));
}
}
// Create spec file
let spec = ComputeSpec {
skip_pg_catalog_updates: self.skip_pg_catalog_updates,
format_version: 1.0,
operation_uuid: None,
cluster: Cluster {
cluster_id: None, // project ID: not used
name: None, // project name: not used
state: None,
roles: vec![],
databases: vec![],
settings: None,
postgresql_conf: Some(postgresql_conf),
},
delta_operations: None,
tenant_id: Some(self.tenant_id),
timeline_id: Some(self.timeline_id),
mode: self.mode,
pageserver_connstring: Some(pageserver_connstring),
safekeeper_connstrings,
storage_auth_token: auth_token.clone(),
};
let spec_path = self.endpoint_path().join("spec.json");
std::fs::write(spec_path, serde_json::to_string_pretty(&spec)?)?;
// Open log file. We'll redirect the stdout and stderr of `compute_ctl` to it.
let logfile = std::fs::OpenOptions::new()
.create(true)
.append(true)
.open(self.endpoint_path().join("compute.log"))?;
// Launch compute_ctl
println!("Starting postgres node at '{}'", self.connstr());
let mut cmd = Command::new(self.env.neon_distrib_dir.join("compute_ctl"));
cmd.args(["--http-port", &self.http_address.port().to_string()])
.args(["--pgdata", self.pgdata().to_str().unwrap()])
.args(["--connstr", &self.connstr()])
.args([
"--spec-path",
self.endpoint_path().join("spec.json").to_str().unwrap(),
])
.args([
"--pgbin",
self.env
.pg_bin_dir(self.pg_version)?
.join("postgres")
.to_str()
.unwrap(),
])
.stdin(std::process::Stdio::null())
.stderr(logfile.try_clone()?)
.stdout(logfile);
let child = cmd.spawn()?;
// Write down the pid so we can wait for it when we want to stop
// TODO use background_process::start_process instead
let pid = child.id();
let pidfile_path = self.endpoint_path().join("compute_ctl.pid");
std::fs::write(pidfile_path, pid.to_string())?;
// Wait for it to start
let mut attempt = 0;
const ATTEMPT_INTERVAL: Duration = Duration::from_millis(100);
const MAX_ATTEMPTS: u32 = 10 * 30; // Wait up to 30 s
loop {
attempt += 1;
match self.get_status() {
Ok(state) => {
match state.status {
ComputeStatus::Init => {
if attempt == MAX_ATTEMPTS {
bail!("compute startup timed out; still in Init state");
}
// keep retrying
}
ComputeStatus::Running => {
// All good!
break;
}
ComputeStatus::Failed => {
bail!(
"compute startup failed: {}",
state
.error
.as_deref()
.unwrap_or("<no error from compute_ctl>")
);
}
ComputeStatus::Empty
| ComputeStatus::ConfigurationPending
| ComputeStatus::Configuration => {
bail!("unexpected compute status: {:?}", state.status)
}
}
}
Err(e) => {
if attempt == MAX_ATTEMPTS {
return Err(e).context(
"timed out waiting to connect to compute_ctl HTTP; last error: {e}",
);
}
}
}
std::thread::sleep(ATTEMPT_INTERVAL);
}
Ok(())
}
// Call the /status HTTP API
pub fn get_status(&self) -> Result<ComputeState> {
let client = reqwest::blocking::Client::new();
let response = client
.request(
reqwest::Method::GET,
format!(
"http://{}:{}/status",
self.http_address.ip(),
self.http_address.port()
),
)
.send()?;
// Interpret the response
let status = response.status();
if !(status.is_client_error() || status.is_server_error()) {
Ok(response.json()?)
} else {
// reqwest does not export its error construction utility functions, so let's craft the message ourselves
let url = response.url().to_owned();
let msg = match response.text() {
Ok(err_body) => format!("Error: {}", err_body),
Err(_) => format!("Http error ({}) at {}.", status.as_u16(), url),
};
Err(anyhow::anyhow!(msg))
}
}
pub fn stop(&self, destroy: bool) -> Result<()> {
// If we are going to destroy data directory,
// use immediate shutdown mode, otherwise,
// shutdown gracefully to leave the data directory sane.
//
// Postgres is always started from scratch, so stop
// without destroy only used for testing and debugging.
//
if destroy {
self.pg_ctl(&["-m", "immediate", "stop"], &None)?;
println!(
"Destroying postgres data directory '{}'",
self.pgdata().to_str().unwrap()
);
std::fs::remove_dir_all(self.endpoint_path())?;
} else {
self.pg_ctl(&["stop"], &None)?;
}
Ok(())
}
pub fn connstr(&self) -> String {
format!(
"postgresql://{}@{}:{}/{}",
"cloud_admin",
self.pg_address.ip(),
self.pg_address.port(),
"postgres"
)
}
}

View File

@@ -9,7 +9,7 @@
mod background_process;
pub mod broker;
pub mod compute;
pub mod endpoint;
pub mod local_env;
pub mod pageserver;
pub mod postgresql_conf;

View File

@@ -5,6 +5,7 @@
use anyhow::{bail, ensure, Context};
use postgres_backend::AuthType;
use reqwest::Url;
use serde::{Deserialize, Serialize};
use serde_with::{serde_as, DisplayFromStr};
@@ -17,14 +18,13 @@ use std::net::SocketAddr;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};
use utils::{
auth::{encode_from_key_file, Claims, Scope},
auth::{encode_from_key_file, Claims},
id::{NodeId, TenantId, TenantTimelineId, TimelineId},
postgres_backend::AuthType,
};
use crate::safekeeper::SafekeeperNode;
pub const DEFAULT_PG_VERSION: u32 = 14;
pub const DEFAULT_PG_VERSION: u32 = 15;
//
// This data structures represents neon_local CLI config
@@ -37,7 +37,7 @@ pub const DEFAULT_PG_VERSION: u32 = 14;
#[derive(Serialize, Deserialize, PartialEq, Eq, Clone, Debug)]
pub struct LocalEnv {
// Base directory for all the nodes (the pageserver, safekeepers and
// compute nodes).
// compute endpoints).
//
// This is not stored in the config file. Rather, this is the path where the
// config file itself is. It is read from the NEON_REPO_DIR env variable or
@@ -110,15 +110,14 @@ impl NeonBroker {
pub struct PageServerConf {
// node id
pub id: NodeId,
// Pageserver connection settings
pub listen_pg_addr: String,
pub listen_http_addr: String,
// used to determine which auth type is used
pub auth_type: AuthType,
// jwt auth token used for communication with pageserver
pub auth_token: String,
// auth type used for the PG and HTTP ports
pub pg_auth_type: AuthType,
pub http_auth_type: AuthType,
}
impl Default for PageServerConf {
@@ -127,8 +126,8 @@ impl Default for PageServerConf {
id: NodeId(0),
listen_pg_addr: String::new(),
listen_http_addr: String::new(),
auth_type: AuthType::Trust,
auth_token: String::new(),
pg_auth_type: AuthType::Trust,
http_auth_type: AuthType::Trust,
}
}
}
@@ -201,14 +200,8 @@ impl LocalEnv {
self.neon_distrib_dir.join("storage_broker")
}
pub fn pg_data_dirs_path(&self) -> PathBuf {
self.base_data_dir.join("pgdatadirs").join("tenants")
}
pub fn pg_data_dir(&self, tenant_id: &TenantId, branch_name: &str) -> PathBuf {
self.pg_data_dirs_path()
.join(tenant_id.to_string())
.join(branch_name)
pub fn endpoints_path(&self) -> PathBuf {
self.base_data_dir.join("endpoints")
}
// TODO: move pageserver files into ./pageserver
@@ -371,7 +364,7 @@ impl LocalEnv {
//
// Initialize a new Neon repository
//
pub fn init(&mut self, pg_version: u32) -> anyhow::Result<()> {
pub fn init(&mut self, pg_version: u32, force: bool) -> anyhow::Result<()> {
// check if config already exists
let base_path = &self.base_data_dir;
ensure!(
@@ -379,11 +372,29 @@ impl LocalEnv {
"repository base path is missing"
);
ensure!(
!base_path.exists(),
"directory '{}' already exists. Perhaps already initialized?",
base_path.display()
);
if base_path.exists() {
if force {
println!("removing all contents of '{}'", base_path.display());
// instead of directly calling `remove_dir_all`, we keep the original dir but removing
// all contents inside. This helps if the developer symbol links another directory (i.e.,
// S3 local SSD) to the `.neon` base directory.
for entry in std::fs::read_dir(base_path)? {
let entry = entry?;
let path = entry.path();
if path.is_dir() {
fs::remove_dir_all(&path)?;
} else {
fs::remove_file(&path)?;
}
}
} else {
bail!(
"directory '{}' already exists. Perhaps already initialized? (Hint: use --force to remove all contents)",
base_path.display()
);
}
}
if !self.pg_bin_dir(pg_version)?.join("postgres").exists() {
bail!(
"Can't find postgres binary at {}",
@@ -399,51 +410,38 @@ impl LocalEnv {
}
}
fs::create_dir(base_path)?;
if !base_path.exists() {
fs::create_dir(base_path)?;
}
// generate keys for jwt
// openssl genrsa -out private_key.pem 2048
let private_key_path;
// Generate keypair for JWT.
//
// The keypair is only needed if authentication is enabled in any of the
// components. For convenience, we generate the keypair even if authentication
// is not enabled, so that you can easily enable it after the initialization
// step. However, if the key generation fails, we treat it as non-fatal if
// authentication was not enabled.
if self.private_key_path == PathBuf::new() {
private_key_path = base_path.join("auth_private_key.pem");
let keygen_output = Command::new("openssl")
.arg("genrsa")
.args(["-out", private_key_path.to_str().unwrap()])
.arg("2048")
.stdout(Stdio::null())
.output()
.context("failed to generate auth private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
}
self.private_key_path = PathBuf::from("auth_private_key.pem");
let public_key_path = base_path.join("auth_public_key.pem");
// openssl rsa -in private_key.pem -pubout -outform PEM -out public_key.pem
let keygen_output = Command::new("openssl")
.arg("rsa")
.args(["-in", private_key_path.to_str().unwrap()])
.arg("-pubout")
.args(["-outform", "PEM"])
.args(["-out", public_key_path.to_str().unwrap()])
.stdout(Stdio::null())
.output()
.context("failed to generate auth private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
match generate_auth_keys(
base_path.join("auth_private_key.pem").as_path(),
base_path.join("auth_public_key.pem").as_path(),
) {
Ok(()) => {
self.private_key_path = PathBuf::from("auth_private_key.pem");
}
Err(e) => {
if !self.auth_keys_needed() {
eprintln!("Could not generate keypair for JWT authentication: {e}");
eprintln!("Continuing anyway because authentication was not enabled");
self.private_key_path = PathBuf::from("auth_private_key.pem");
} else {
return Err(e);
}
}
}
}
self.pageserver.auth_token =
self.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
fs::create_dir_all(self.pg_data_dirs_path())?;
fs::create_dir_all(self.endpoints_path())?;
for safekeeper in &self.safekeepers {
fs::create_dir_all(SafekeeperNode::datadir_path_by_id(self, safekeeper.id))?;
@@ -451,6 +449,12 @@ impl LocalEnv {
self.persist_config(base_path)
}
fn auth_keys_needed(&self) -> bool {
self.pageserver.pg_auth_type == AuthType::NeonJWT
|| self.pageserver.http_auth_type == AuthType::NeonJWT
|| self.safekeepers.iter().any(|sk| sk.auth_enabled)
}
}
fn base_path() -> PathBuf {
@@ -460,6 +464,43 @@ fn base_path() -> PathBuf {
}
}
/// Generate a public/private key pair for JWT authentication
fn generate_auth_keys(private_key_path: &Path, public_key_path: &Path) -> anyhow::Result<()> {
// Generate the key pair
//
// openssl genpkey -algorithm ed25519 -out auth_private_key.pem
let keygen_output = Command::new("openssl")
.arg("genpkey")
.args(["-algorithm", "ed25519"])
.args(["-out", private_key_path.to_str().unwrap()])
.stdout(Stdio::null())
.output()
.context("failed to generate auth private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
}
// Extract the public key from the private key file
//
// openssl pkey -in auth_private_key.pem -pubout -out auth_public_key.pem
let keygen_output = Command::new("openssl")
.arg("pkey")
.args(["-in", private_key_path.to_str().unwrap()])
.arg("-pubout")
.args(["-out", public_key_path.to_str().unwrap()])
.output()
.context("failed to extract public key from private key")?;
if !keygen_output.status.success() {
bail!(
"openssl failed: '{}'",
String::from_utf8_lossy(&keygen_output.stderr)
);
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;

View File

@@ -1,3 +1,9 @@
//! Code to manage pageservers
//!
//! In the local test environment, the pageserver stores its data directly in
//!
//! .neon/
//!
use std::borrow::Cow;
use std::collections::HashMap;
use std::fs::File;
@@ -8,9 +14,8 @@ use std::process::{Child, Command};
use std::{io, result};
use anyhow::{bail, Context};
use pageserver_api::models::{
TenantConfigRequest, TenantCreateRequest, TenantInfo, TimelineCreateRequest, TimelineInfo,
};
use pageserver_api::models::{self, TenantInfo, TimelineInfo};
use postgres_backend::AuthType;
use postgres_connection::{parse_host_port, PgConnectionConfig};
use reqwest::blocking::{Client, RequestBuilder, Response};
use reqwest::{IntoUrl, Method};
@@ -20,7 +25,6 @@ use utils::{
http::error::HttpErrorBody,
id::{TenantId, TimelineId},
lsn::Lsn,
postgres_backend::AuthType,
};
use crate::{background_process, local_env::LocalEnv};
@@ -82,15 +86,8 @@ impl PageServerNode {
let (host, port) = parse_host_port(&env.pageserver.listen_pg_addr)
.expect("Unable to parse listen_pg_addr");
let port = port.unwrap_or(5432);
let password = if env.pageserver.auth_type == AuthType::NeonJWT {
Some(env.pageserver.auth_token.clone())
} else {
None
};
Self {
pg_connection_config: PgConnectionConfig::new_host_port(host, port)
.set_password(password),
pg_connection_config: PgConnectionConfig::new_host_port(host, port),
env: env.clone(),
http_client: Client::new(),
http_base_url: format!("http://{}/v1", env.pageserver.listen_http_addr),
@@ -106,25 +103,32 @@ impl PageServerNode {
self.env.pg_distrib_dir_raw().display()
);
let authg_type_param = format!("auth_type='{}'", self.env.pageserver.auth_type);
let http_auth_type_param =
format!("http_auth_type='{}'", self.env.pageserver.http_auth_type);
let listen_http_addr_param = format!(
"listen_http_addr='{}'",
self.env.pageserver.listen_http_addr
);
let pg_auth_type_param = format!("pg_auth_type='{}'", self.env.pageserver.pg_auth_type);
let listen_pg_addr_param =
format!("listen_pg_addr='{}'", self.env.pageserver.listen_pg_addr);
let broker_endpoint_param = format!("broker_endpoint='{}'", self.env.broker.client_url());
let mut overrides = vec![
id,
pg_distrib_dir_param,
authg_type_param,
http_auth_type_param,
pg_auth_type_param,
listen_http_addr_param,
listen_pg_addr_param,
broker_endpoint_param,
];
if self.env.pageserver.auth_type != AuthType::Trust {
if self.env.pageserver.http_auth_type != AuthType::Trust
|| self.env.pageserver.pg_auth_type != AuthType::Trust
{
overrides.push("auth_validation_public_key_path='auth_public_key.pem'".to_owned());
}
overrides
@@ -247,7 +251,10 @@ impl PageServerNode {
}
fn pageserver_env_variables(&self) -> anyhow::Result<Vec<(String, String)>> {
Ok(if self.env.pageserver.auth_type != AuthType::Trust {
// FIXME: why is this tied to pageserver's auth type? Whether or not the safekeeper
// needs a token, and how to generate that token, seems independent to whether
// the pageserver requires a token in incoming requests.
Ok(if self.env.pageserver.http_auth_type != AuthType::Trust {
// Generate a token to connect from the pageserver to a safekeeper
let token = self
.env
@@ -270,27 +277,30 @@ impl PageServerNode {
background_process::stop_process(immediate, "pageserver", &self.pid_file())
}
pub fn page_server_psql(&self, sql: &str) -> Vec<postgres::SimpleQueryMessage> {
let mut client = self.pg_connection_config.connect_no_tls().unwrap();
println!("Pageserver query: '{sql}'");
client.simple_query(sql).unwrap()
}
pub fn page_server_psql_client(&self) -> result::Result<postgres::Client, postgres::Error> {
self.pg_connection_config.connect_no_tls()
}
fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> RequestBuilder {
let mut builder = self.http_client.request(method, url);
if self.env.pageserver.auth_type == AuthType::NeonJWT {
builder = builder.bearer_auth(&self.env.pageserver.auth_token)
pub fn page_server_psql_client(&self) -> anyhow::Result<postgres::Client> {
let mut config = self.pg_connection_config.clone();
if self.env.pageserver.pg_auth_type == AuthType::NeonJWT {
let token = self
.env
.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
config = config.set_password(Some(token));
}
builder
Ok(config.connect_no_tls()?)
}
fn http_request<U: IntoUrl>(&self, method: Method, url: U) -> anyhow::Result<RequestBuilder> {
let mut builder = self.http_client.request(method, url);
if self.env.pageserver.http_auth_type == AuthType::NeonJWT {
let token = self
.env
.generate_auth_token(&Claims::new(None, Scope::PageServerApi))?;
builder = builder.bearer_auth(token)
}
Ok(builder)
}
pub fn check_status(&self) -> Result<()> {
self.http_request(Method::GET, format!("{}/status", self.http_base_url))
self.http_request(Method::GET, format!("{}/status", self.http_base_url))?
.send()?
.error_from_body()?;
Ok(())
@@ -298,7 +308,7 @@ impl PageServerNode {
pub fn tenant_list(&self) -> Result<Vec<TenantInfo>> {
Ok(self
.http_request(Method::GET, format!("{}/tenant", self.http_base_url))
.http_request(Method::GET, format!("{}/tenant", self.http_base_url))?
.send()?
.error_from_body()?
.json()?)
@@ -310,8 +320,8 @@ impl PageServerNode {
settings: HashMap<&str, &str>,
) -> anyhow::Result<TenantId> {
let mut settings = settings.clone();
let request = TenantCreateRequest {
new_tenant_id,
let config = models::TenantConfig {
checkpoint_distance: settings
.remove("checkpoint_distance")
.map(|x| x.parse::<u64>())
@@ -352,11 +362,37 @@ impl PageServerNode {
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'trace_read_requests' as bool")?,
eviction_policy: settings
.remove("eviction_policy")
.map(serde_json::from_str)
.transpose()
.context("Failed to parse 'eviction_policy' json")?,
min_resident_size_override: settings
.remove("min_resident_size_override")
.map(|x| x.parse::<u64>())
.transpose()
.context("Failed to parse 'min_resident_size_override' as integer")?,
evictions_low_residence_duration_metric_threshold: settings
.remove("evictions_low_residence_duration_metric_threshold")
.map(|x| x.to_string()),
gc_feedback: settings
.remove("gc_feedback")
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'gc_feedback' as bool")?,
};
// If tenant ID was not specified, generate one
let new_tenant_id = new_tenant_id.unwrap_or(TenantId::generate());
let request = models::TenantCreateRequest {
new_tenant_id,
config,
};
if !settings.is_empty() {
bail!("Unrecognized tenant settings: {settings:?}")
}
self.http_request(Method::POST, format!("{}/tenant", self.http_base_url))
self.http_request(Method::POST, format!("{}/tenant", self.http_base_url))?
.json(&request)
.send()?
.error_from_body()?
@@ -372,54 +408,86 @@ impl PageServerNode {
})
}
pub fn tenant_config(&self, tenant_id: TenantId, settings: HashMap<&str, &str>) -> Result<()> {
self.http_request(Method::PUT, format!("{}/tenant/config", self.http_base_url))
.json(&TenantConfigRequest {
tenant_id,
pub fn tenant_config(
&self,
tenant_id: TenantId,
mut settings: HashMap<&str, &str>,
) -> anyhow::Result<()> {
let config = {
// Braces to make the diff easier to read
models::TenantConfig {
checkpoint_distance: settings
.get("checkpoint_distance")
.remove("checkpoint_distance")
.map(|x| x.parse::<u64>())
.transpose()
.context("Failed to parse 'checkpoint_distance' as an integer")?,
checkpoint_timeout: settings.get("checkpoint_timeout").map(|x| x.to_string()),
checkpoint_timeout: settings.remove("checkpoint_timeout").map(|x| x.to_string()),
compaction_target_size: settings
.get("compaction_target_size")
.remove("compaction_target_size")
.map(|x| x.parse::<u64>())
.transpose()
.context("Failed to parse 'compaction_target_size' as an integer")?,
compaction_period: settings.get("compaction_period").map(|x| x.to_string()),
compaction_period: settings.remove("compaction_period").map(|x| x.to_string()),
compaction_threshold: settings
.get("compaction_threshold")
.remove("compaction_threshold")
.map(|x| x.parse::<usize>())
.transpose()
.context("Failed to parse 'compaction_threshold' as an integer")?,
gc_horizon: settings
.get("gc_horizon")
.remove("gc_horizon")
.map(|x| x.parse::<u64>())
.transpose()
.context("Failed to parse 'gc_horizon' as an integer")?,
gc_period: settings.get("gc_period").map(|x| x.to_string()),
gc_period: settings.remove("gc_period").map(|x| x.to_string()),
image_creation_threshold: settings
.get("image_creation_threshold")
.remove("image_creation_threshold")
.map(|x| x.parse::<usize>())
.transpose()
.context("Failed to parse 'image_creation_threshold' as non zero integer")?,
pitr_interval: settings.get("pitr_interval").map(|x| x.to_string()),
pitr_interval: settings.remove("pitr_interval").map(|x| x.to_string()),
walreceiver_connect_timeout: settings
.get("walreceiver_connect_timeout")
.remove("walreceiver_connect_timeout")
.map(|x| x.to_string()),
lagging_wal_timeout: settings
.remove("lagging_wal_timeout")
.map(|x| x.to_string()),
lagging_wal_timeout: settings.get("lagging_wal_timeout").map(|x| x.to_string()),
max_lsn_wal_lag: settings
.get("max_lsn_wal_lag")
.remove("max_lsn_wal_lag")
.map(|x| x.parse::<NonZeroU64>())
.transpose()
.context("Failed to parse 'max_lsn_wal_lag' as non zero integer")?,
trace_read_requests: settings
.get("trace_read_requests")
.remove("trace_read_requests")
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'trace_read_requests' as bool")?,
})
eviction_policy: settings
.remove("eviction_policy")
.map(serde_json::from_str)
.transpose()
.context("Failed to parse 'eviction_policy' json")?,
min_resident_size_override: settings
.remove("min_resident_size_override")
.map(|x| x.parse::<u64>())
.transpose()
.context("Failed to parse 'min_resident_size_override' as an integer")?,
evictions_low_residence_duration_metric_threshold: settings
.remove("evictions_low_residence_duration_metric_threshold")
.map(|x| x.to_string()),
gc_feedback: settings
.remove("gc_feedback")
.map(|x| x.parse::<bool>())
.transpose()
.context("Failed to parse 'gc_feedback' as bool")?,
}
};
if !settings.is_empty() {
bail!("Unrecognized tenant settings: {settings:?}")
}
self.http_request(Method::PUT, format!("{}/tenant/config", self.http_base_url))?
.json(&models::TenantConfigRequest { tenant_id, config })
.send()?
.error_from_body()?;
@@ -431,7 +499,7 @@ impl PageServerNode {
.http_request(
Method::GET,
format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
)
)?
.send()?
.error_from_body()?
.json()?;
@@ -447,11 +515,14 @@ impl PageServerNode {
ancestor_timeline_id: Option<TimelineId>,
pg_version: Option<u32>,
) -> anyhow::Result<TimelineInfo> {
// If timeline ID was not specified, generate one
let new_timeline_id = new_timeline_id.unwrap_or(TimelineId::generate());
self.http_request(
Method::POST,
format!("{}/tenant/{}/timeline", self.http_base_url, tenant_id),
)
.json(&TimelineCreateRequest {
)?
.json(&models::TimelineCreateRequest {
new_timeline_id,
ancestor_start_lsn,
ancestor_timeline_id,
@@ -487,7 +558,7 @@ impl PageServerNode {
pg_wal: Option<(Lsn, PathBuf)>,
pg_version: u32,
) -> anyhow::Result<()> {
let mut client = self.pg_connection_config.connect_no_tls().unwrap();
let mut client = self.page_server_psql_client()?;
// Init base reader
let (start_lsn, base_tarfile_path) = base;

View File

@@ -13,7 +13,7 @@ use std::io::BufRead;
use std::str::FromStr;
/// In-memory representation of a postgresql.conf file
#[derive(Default)]
#[derive(Default, Debug)]
pub struct PostgresConf {
lines: Vec<String>,
hash: HashMap<String, String>,

View File

@@ -1,7 +1,12 @@
//! Code to manage safekeepers
//!
//! In the local test environment, the data for each safekeeper is stored in
//!
//! .neon/safekeepers/<safekeeper id>
//!
use std::io::Write;
use std::path::PathBuf;
use std::process::Child;
use std::sync::Arc;
use std::{io, result};
use anyhow::Context;
@@ -11,7 +16,6 @@ use reqwest::{IntoUrl, Method};
use thiserror::Error;
use utils::{http::error::HttpErrorBody, id::NodeId};
use crate::pageserver::PageServerNode;
use crate::{
background_process,
local_env::{LocalEnv, SafekeeperConf},
@@ -65,14 +69,10 @@ pub struct SafekeeperNode {
pub env: LocalEnv,
pub http_client: Client,
pub http_base_url: String,
pub pageserver: Arc<PageServerNode>,
}
impl SafekeeperNode {
pub fn from_env(env: &LocalEnv, conf: &SafekeeperConf) -> SafekeeperNode {
let pageserver = Arc::new(PageServerNode::from_env(env));
SafekeeperNode {
id: conf.id,
conf: conf.clone(),
@@ -80,7 +80,6 @@ impl SafekeeperNode {
env: env.clone(),
http_client: Client::new(),
http_base_url: format!("http://127.0.0.1:{}/v1", conf.http_port),
pageserver,
}
}
@@ -115,6 +114,10 @@ impl SafekeeperNode {
let datadir = self.datadir_path();
let id_string = id.to_string();
// TODO: add availability_zone to the config.
// Right now we just specify any value here and use it to check metrics in tests.
let availability_zone = format!("sk-{}", id_string);
let mut args = vec![
"-D",
datadir.to_str().with_context(|| {
@@ -126,6 +129,8 @@ impl SafekeeperNode {
&listen_pg,
"--listen-http",
&listen_http,
"--availability-zone",
&availability_zone,
];
if !self.conf.sync {
args.push("--no-sync");
@@ -157,7 +162,7 @@ impl SafekeeperNode {
}
background_process::start_process(
&format!("safekeeper {id}"),
&format!("safekeeper-{id}"),
&datadir,
&self.env.safekeeper_bin(),
&args,

View File

@@ -1,6 +1,14 @@
#!/bin/bash
set -eux
# Generate a random tenant or timeline ID
#
# Takes a variable name as argument. The result is stored in that variable.
generate_id() {
local -n resvar=$1
printf -v resvar '%08x%08x%08x%08x' $SRANDOM $SRANDOM $SRANDOM $SRANDOM
}
PG_VERSION=${PG_VERSION:-14}
SPEC_FILE_ORG=/var/db/postgres/specs/spec.json
@@ -13,29 +21,29 @@ done
echo "Page server is ready."
echo "Create a tenant and timeline"
generate_id tenant_id
PARAMS=(
-sb
-X POST
-H "Content-Type: application/json"
-d "{}"
-d "{\"new_tenant_id\": \"${tenant_id}\"}"
http://pageserver:9898/v1/tenant/
)
tenant_id=$(curl "${PARAMS[@]}" | sed 's/"//g')
result=$(curl "${PARAMS[@]}")
echo $result | jq .
generate_id timeline_id
PARAMS=(
-sb
-X POST
-H "Content-Type: application/json"
-d "{\"tenant_id\":\"${tenant_id}\", \"pg_version\": ${PG_VERSION}}"
-d "{\"new_timeline_id\": \"${timeline_id}\", \"pg_version\": ${PG_VERSION}}"
"http://pageserver:9898/v1/tenant/${tenant_id}/timeline/"
)
result=$(curl "${PARAMS[@]}")
echo $result | jq .
echo "Overwrite tenant id and timeline id in spec file"
tenant_id=$(echo ${result} | jq -r .tenant_id)
timeline_id=$(echo ${result} | jq -r .timeline_id)
sed "s/TENANT_ID/${tenant_id}/" ${SPEC_FILE_ORG} > ${SPEC_FILE}
sed -i "s/TIMELINE_ID/${timeline_id}/" ${SPEC_FILE}

View File

@@ -28,11 +28,6 @@
"value": "replica",
"vartype": "enum"
},
{
"name": "hot_standby",
"value": "on",
"vartype": "bool"
},
{
"name": "wal_log_hints",
"value": "on",

View File

@@ -160,6 +160,7 @@ services:
build:
context: ./compute_wrapper/
args:
- REPOSITORY=${REPOSITORY:-neondatabase}
- COMPUTE_IMAGE=compute-node-v${PG_VERSION:-14}
- TAG=${TAG:-latest}
- http_proxy=$http_proxy
@@ -188,7 +189,7 @@ services:
- "/bin/bash"
- "-c"
command:
- "until pg_isready -h compute -p 55433 ; do
- "until pg_isready -h compute -p 55433 -U cloud_admin ; do
echo 'Waiting to start compute...' && sleep 1;
done"
depends_on:

View File

@@ -29,12 +29,54 @@ These components should not have access to the private key and may only get toke
The key pair is generated once for an installation of compute/pageserver/safekeeper, e.g. by `neon_local init`.
There is currently no way to rotate the key without bringing down all components.
### Best practices
See [RFC 8725: JSON Web Token Best Current Practices](https://www.rfc-editor.org/rfc/rfc8725)
### Token format
The JWT tokens in Neon use "EdDSA" as the algorithm (defined in [RFC8037](https://www.rfc-editor.org/rfc/rfc8037)).
Example:
Header:
```
{
"alg": "EdDSA",
"typ": "JWT"
}
```
Payload:
```
{
"scope": "tenant", # "tenant", "pageserverapi", or "safekeeperdata"
"tenant_id": "5204921ff44f09de8094a1390a6a50f6",
}
```
Meanings of scope:
"tenant": Provides access to all data for a specific tenant
"pageserverapi": Provides blanket access to all tenants on the pageserver plus pageserver-wide APIs.
Should only be used e.g. for status check/tenant creation/list.
"safekeeperdata": Provides blanket access to all data on the safekeeper plus safekeeper-wide APIs.
Should only be used e.g. for status check.
Currently also used for connection from any pageserver to any safekeeper.
### CLI
CLI generates a key pair during call to `neon_local init` with the following commands:
```bash
openssl genrsa -out auth_private_key.pem 2048
openssl rsa -in auth_private_key.pem -pubout -outform PEM -out auth_public_key.pem
openssl genpkey -algorithm ed25519 -out auth_private_key.pem
openssl pkey -in auth_private_key.pem -pubout -out auth_public_key.pem
```
Configuration files for all components point to `public_key.pem` for JWT validation.
@@ -64,20 +106,22 @@ Their authentication is just plain PostgreSQL authentication and out of scope fo
There is no administrative API except those provided by PostgreSQL.
#### Outgoing connections
Compute connects to Pageserver for getting pages.
The connection string is configured by the `neon.pageserver_connstring` PostgreSQL GUC, e.g. `postgresql://no_user:$NEON_AUTH_TOKEN@localhost:15028`.
The environment variable inside the connection string is substituted with
the JWT token.
Compute connects to Pageserver for getting pages. The connection string is
configured by the `neon.pageserver_connstring` PostgreSQL GUC,
e.g. `postgresql://no_user@localhost:15028`. If the `$NEON_AUTH_TOKEN`
environment variable is set, it is used as the password for the connection. (The
pageserver uses JWT tokens for authentication, so the password is really a
token.)
Compute connects to Safekeepers to write and commit data.
The token is the same for all safekeepers.
It's stored in an environment variable, whose name is configured
by the `neon.safekeeper_token_env` PostgreSQL GUC.
If the GUC is unset, no token is passed.
Compute connects to Safekeepers to write and commit data. The list of safekeeper
addresses is given in the `neon.safekeepers` GUC. The connections to the
safekeepers take the password from the `$NEON_AUTH_TOKEN` environment
variable, if set.
Note that both tokens can be (and typically are) the same;
the scope is the tenant and the token is usually passed through the
`$NEON_AUTH_TOKEN` environment variable.
The `compute_ctl` binary that runs before the PostgreSQL server, and launches
PostgreSQL, also makes a connection to the pageserver. It uses it to fetch the
initial "base backup" dump, to initialize the PostgreSQL data directory. It also
uses `$NEON_AUTH_TOKEN` as the password for the connection.
### Pageserver
#### Overview
@@ -102,10 +146,12 @@ Each compute should present a token valid for the timeline's tenant.
Pageserver also has HTTP API: some parts are per-tenant,
some parts are server-wide, these are different scopes.
The `auth_type` configuration variable in Pageserver's config may have
either of three values:
Authentication can be enabled separately for the HTTP mgmt API, and
for the libpq connections from compute. The `http_auth_type` and
`pg_auth_type` configuration variables in Pageserver's config may
have one of these values:
* `Trust` removes all authentication. The outdated `MD5` value does likewise
* `Trust` removes all authentication.
* `NeonJWT` enables JWT validation.
Tokens are validated using the public key which lies in a PEM file
specified in the `auth_validation_public_key_path` config.

View File

@@ -37,9 +37,9 @@ You can specify version of neon cluster using following environment values.
- PG_VERSION: postgres version for compute (default is 14)
- TAG: the tag version of [docker image](https://registry.hub.docker.com/r/neondatabase/neon/tags) (default is latest), which is tagged in [CI test](/.github/workflows/build_and_test.yml)
```
$ cd docker-compose/docker-compose.yml
$ cd docker-compose/
$ docker-compose down # remove the conainers if exists
$ PG_VERSION=15 TAG=2221 docker-compose up --build -d # You can specify the postgres and image version
$ PG_VERSION=15 TAG=2937 docker-compose up --build -d # You can specify the postgres and image version
Creating network "dockercompose_default" with the default driver
Creating docker-compose_storage_broker_1 ... done
(...omit...)
@@ -48,6 +48,7 @@ Creating docker-compose_storage_broker_1 ... done
2. connect compute node
```
$ echo "localhost:55433:postgres:cloud_admin:cloud_admin" >> ~/.pgpass
$ chmod 600 ~/.pgpass
$ psql -h localhost -p 55433 -U cloud_admin
postgres=# CREATE TABLE t(key int primary key, value text);
CREATE TABLE

View File

@@ -4,6 +4,11 @@ The pageserver uses Tokio for handling concurrency. Everything runs in
Tokio tasks, although some parts are written in blocking style and use
spawn_blocking().
We currently use std blocking functions for disk I/O, however. The
current model is that we consider disk I/Os to be short enough that we
perform them while running in a Tokio task. Changing all the disk I/O
calls to async is a TODO.
Each Tokio task is tracked by the `task_mgr` module. It maintains a
registry of tasks, and which tenant or timeline they are operating
on.
@@ -21,19 +26,84 @@ also a `shudown_watcher()` Future that can be used with `tokio::select!`
or similar, to wake up on shutdown.
### Sync vs async
### Async cancellation safety
We use async to wait for incoming data on network connections, and to
perform other long-running operations. For example, each WAL receiver
connection is handled by a tokio Task. Once a piece of WAL has been
received from the network, the task calls the blocking functions in
the Repository to process the WAL.
In async Rust, futures can be "cancelled" at any await point, by
dropping the Future. For example, `tokio::select!` returns as soon as
one of the Futures returns, and drops the others. `tokio::timeout!` is
another example. In the Rust ecosystem, some functions are
cancellation-safe, meaning they can be safely dropped without
side-effects, while others are not. See documentation of
`tokio::select!` for examples.
The core storage code in `layered_repository/` is synchronous, with
blocking locks and I/O calls. The current model is that we consider
disk I/Os to be short enough that we perform them while running in a
Tokio task. If that becomes a problem, we should use `spawn_blocking`
before entering the synchronous parts of the code, or switch to using
tokio I/O functions.
In the pageserver and safekeeper, async code is *not*
cancellation-safe by default. Unless otherwise marked, any async
function that you call cannot be assumed to be async
cancellation-safe, and must be polled to completion.
Be very careful when mixing sync and async code!
The downside of non-cancellation safe code is that you have to be very
careful when using `tokio::select!`, `tokio::timeout!`, and other such
functions that can cause a Future to be dropped. They can only be used
with functions that are explicitly documented to be cancellation-safe,
or you need to spawn a separate task to shield from the cancellation.
At the entry points to the code, we also take care to poll futures to
completion, or shield the rest of the code from surprise cancellations
by spawning a separate task. The code that handles incoming HTTP
requests, for example, spawns a separate task for each request,
because Hyper will drop the request-handling Future if the HTTP
connection is lost.
#### How to cancel, then?
If our code is not cancellation-safe, how do you cancel long-running
tasks? Use CancellationTokens.
TODO: More details on that. And we have an ongoing discussion on what
to do if cancellations might come from multiple sources.
#### Exceptions
Some library functions are cancellation-safe, and are explicitly marked
as such. For example, `utils::seqwait`.
#### Rationale
The alternative would be to make all async code cancellation-safe,
unless otherwise marked. That way, you could use `tokio::select!` more
liberally. The reasons we didn't choose that are explained in this
section.
Writing code in a cancellation-safe manner is tedious, as you need to
scrutinize every `.await` and ensure that if the `.await` call never
returns, the system is in a safe, consistent state. In some ways, you
need to do that with `?` and early `returns`, too, but `.await`s are
easier to miss. It is also easier to perform cleanup tasks when a
function returns an `Err` than when an `.await` simply never
returns. You can use `scopeguard` and Drop guards to perform cleanup
tasks, but it is more tedious. An `.await` that never returns is more
similar to a panic.
Note that even if you only use building blocks that themselves are
cancellation-safe, it doesn't mean that the code as whole is
cancellation-safe. For example, consider the following code:
```
while let Some(i) = work_inbox.recv().await {
if let Err(_) = results_outbox.send(i).await {
println!("receiver dropped");
return;
}
}
}
```
It reads messages from one channel, sends them to another channel. If
this code is cancelled at the `results_outbox.send(i).await`, the
message read from the receiver is lost. That may or may not be OK,
depending on the context.
Another reason to not require cancellation-safety is historical: we
already had a lot of async code that was not scrutinized for
cancellation-safety when this issue was raised. Scrutinizing all
existing code is no fun.

View File

@@ -0,0 +1,269 @@
# Deleting pageserver part of tenants data from s3
Created on 08.03.23
## Motivation
Currently we dont delete pageserver part of the data from s3 when project is deleted. (The same is true for safekeepers, but this outside of the scope of this RFC).
This RFC aims to spin a discussion to come to a robust deletion solution that wont put us in into a corner for features like postponed deletion (when we keep data for user to be able to restore a project if it was deleted by accident)
## Summary
TLDR; There are two options, one based on control plane issuing actual delete requests to s3 and the other one that keeps s3 stuff bound to pageserver. Each one has its pros and cons.
The decision is to stick with pageserver centric approach. For motivation see [Decision](#decision).
## Components
pageserver, control-plane
## Requirements
Deletion should successfully finish (eventually) without leaving dangling files in presense of:
- component restarts
- component outage
- pageserver loss
## Proposed implementation
Before the options are discussed, note that deletion can be quite long process. For deletion from s3 the obvious choice is [DeleteObjects](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) API call. It allows to batch deletion of up to 1k objects in one API call. So deletion operation linearly depends on number of layer files.
Another design limitation is that there is no cheap `mv` operation available for s3. `mv` from `aws s3 mv` uses `copy(src, dst) + delete(src)`. So `mv`-like operation is not feasible as a building block because it actually amplifies the problem with both duration and resulting cost of the operation.
The case when there are multiple pageservers handling the same tenants is largely out of scope of the RFC. We still consider case with migration from one PS to another, but do not consider case when tenant exists on multiple pageservers for extended period of time. The case with multiple pageservers can be reduced to case with one pageservers by calling detach on all pageservers except the last one, for it actual delete needs to be called.
For simplicity lets look into deleting tenants. Differences in deletion process between tenants and timelines are mentioned in paragraph ["Differences between tenants and timelines"](#differences-between-tenants-and-timelines)
### 1. Pageserver owns deletion machinery
#### The sequence
TLDR; With this approach control plane needs to call delete on a tenant and poll for progress. As much as possible is handled on pageserver. Lets see the sequence.
Happy path:
```mermaid
sequenceDiagram
autonumber
participant CP as Control Plane
participant PS as Pageserver
participant S3
CP->>PS: Delete tenant
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
PS->>PS: Create deleted mark file locally
PS->>CP: Accepted
PS->>PS: delete local files other than deleted mark
loop Delete layers for each timeline
PS->>S3: delete(..)
CP->>PS: Finished?
PS->>CP: False
end
PS->>S3: Delete mark file
PS->>PS: Delete local mark file
loop Poll for status
CP->>PS: Finished?
PS->>CP: True or False
end
```
Why two mark files?
Remote one is needed for cases when pageserver is lost during deletion so other pageserver can learn the deletion from s3 during attach.
Why local mark file is needed?
If we dont have one, we have two choices, delete local data before deleting the remote part or do that after.
If we delete local data before remote then during restart pageserver wont pick up remote tenant at all because nothing is available locally (pageserver looks for remote conuterparts of locally available tenants).
If we delete local data after remote then at the end of the sequence when remote mark file is deleted if pageserver restart happens then the state is the same to situation when pageserver just missing data on remote without knowing the fact that this data is intended to be deleted. In this case the current behavior is upload everything local-only to remote.
Thus we need local record of tenant being deleted as well.
##### Handle pageserver crashes
Lets explore sequences with various crash points.
Pageserver crashes before `deleted` mark file is persisted in s3:
```mermaid
sequenceDiagram
autonumber
participant CP as Control Plane
participant PS as Pageserver
participant S3
CP->>PS: Delete tenant
note over PS: Crash point 1.
CP->>PS: Retry delete request
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
PS->>PS: Create deleted mark file locally
PS->>CP: Accepted
PS->>PS: delete local files other than deleted mark
loop Delete layers for each timeline
PS->>S3: delete(..)
CP->>PS: Finished?
PS->>CP: False
end
PS->>S3: Delete mark file
PS->>PS: Delete local mark file
CP->>PS: Finished?
PS->>CP: True
```
Pageserver crashed when deleted mark was about to be persisted in s3, before Control Plane gets a response:
```mermaid
sequenceDiagram
autonumber
participant CP as Control Plane
participant PS as Pageserver
participant S3
CP->>PS: Delete tenant
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
note over PS: Crash point 2.
note over PS: During startup we reconcile <br> with remote and see <br> whether the remote mark exists
alt Remote mark exists
PS->>PS: create local mark if its missing
PS->>PS: delete local files other than deleted mark
loop Delete layers for each timeline
PS->>S3: delete(..)
end
note over CP: Eventually console should <br> retry delete request
CP->>PS: Retry delete tenant
PS->>CP: Not modified
else Mark is missing
note over PS: Continue to operate the tenant as if deletion didnt happen
note over CP: Eventually console should <br> retry delete request
CP->>PS: Retry delete tenant
PS->>S3: Create deleted mark file at <br> /tenant/meta/deleted
PS->>CP: Delete tenant
end
PS->>PS: Continue with layer file deletions
loop Delete layers for each timeline
PS->>S3: delete(..)
CP->>PS: Finished?
PS->>CP: False
end
PS->>S3: Delete mark file
PS->>PS: Delete local mark file
CP->>PS: Finished?
PS->>CP: True
```
Similar sequence applies when both local and remote marks were persisted but Control Plane still didnt receive a response.
If pageserver crashes after both mark files were deleted then it will reply to control plane status poll request with 404 which should be treated by control plane as success.
The same applies if pageserver crashes in the end, when remote mark is deleted but before local one gets deleted. In this case on restart pageserver moves forward with deletion of local mark and Control Plane will receive 404.
##### Differences between tenants and timelines
For timeline the sequence is the same with the following differences:
- remote delete mark file can be replaced with a boolean "deleted" flag in index_part.json
- local deletion mark is not needed, because whole tenant is kept locally so situation described in motivation for local mark is impossible
##### Handle pageserver loss
If pageseserver is lost then the deleted tenant should be attached to different pageserver and delete request needs to be retried against new pageserver. Then attach logic is shared with one described for pageserver restarts (local deletion mark wont be available so needs to be created).
##### Restrictions for tenant that is in progress of being deleted
I propose to add another state to tenant/timeline - PendingDelete. This state shouldnt allow executing any operations aside from polling the deletion status.
#### Summary
Pros:
- Storage is not dependent on control plane. Storage can be restarted even if control plane is not working.
- Allows for easier dogfooding, console can use Neon backed database as primary operational data store. If storage depends on control plane and control plane depends on storage we're stuck.
- No need to share inner s3 workings with control plane. Pageserver presents api contract and S3 paths are not part of this contract.
- No need to pass list of alive timelines to attach call. This will be solved by pageserver observing deleted flag. See
Cons:
- Logic is a tricky, needs good testing
- Anything else?
### 2. Control plane owns deletion machinery
In this case the only action performed on pageserver is removal of local files.
Everything else is done by control plane. The steps are as follows:
1. Control plane marks tenant as "delete pending" in its database
2. It lists the s3 for all the files and repeatedly calls delete until nothing is left behind
3. When no files are left marks deletion as completed
In case of restart it selects all tenants marked as "delete pending" and continues the deletion.
For tenants it is simple. For timelines there are caveats.
Assume that the same workflow is used for timelines.
If a tenant gets relocated during timeline deletion the attach call with its current logic will pick up deleted timeline in its half deleted state.
Available options:
- require list of alive timelines to be passed to attach call
- use the same schema with flag in index_part.json (again part of the caveats around pageserver restart applies). In this case nothing stops pageserver from implementing deletion inside if we already have these deletion marks.
With first option the following problem becomes apparent:
Who is the source of truth regarding timeline liveness?
Imagine:
PS1 fails.
PS2 gets assigned the tenant.
New branch gets created
PS1 starts up (is it possible or we just recycle it?)
PS1 is unaware of the new branch. It can either fall back to s3 ls, or ask control plane.
So here comes the dependency of storage on control plane. During restart storage needs to know which timelines are valid for operation. If there is nothing on s3 that can answer that question storage neeeds to ask control plane.
### Summary
Cons:
- Potential thundering herd-like problem during storage restart (requests to control plane)
- Potential increase in storage startup time (additional request to control plane)
- Storage startup starts to depend on console
- Erroneous attach call can attach tenant in half deleted state
Pros:
- Easier to reason about if you dont have to account for pageserver restarts
### Extra notes
There was a concern that having deletion code in pageserver is a littlebit scary, but we need to have this code somewhere. So to me it is equally scary to have that in whatever place it ends up at.
Delayed deletion can be done with both approaches. As discussed with Anna (@stepashka) this is only relevant for tenants (projects) not for timelines. For first approach detach can be called immediately and deletion can be done later with attach + delete. With second approach control plane needs to start the deletion whenever necessary.
## Decision
After discussion in comments I see that we settled on two options (though a bit different from ones described in rfc). First one is the same - pageserver owns as much as possible. The second option is that pageserver owns markers thing, but actual deletion happens in control plane by repeatedly calling ls + delete.
To my mind the only benefit of the latter approach is possible code reuse between safekeepers and pageservers. Otherwise poking around integrating s3 library into control plane, configuring shared knowledge abouth paths in s3 - are the downsides. Another downside of relying on control plane is the testing process. Control plane resides in different repository so it is quite hard to test pageserver related changes there. e2e test suite there doesnt support shutting down pageservers, which are separate docker containers there instead of just processes.
With pageserver owning everything we still give the retry logic to control plane but its easier to duplicate if needed compared to sharing inner s3 workings. We will have needed tests for retry logic in neon repo.
So the decision is to proceed with pageserver centric approach.

Some files were not shown because too many files have changed in this diff Show More