Commit Graph

4248 Commits

Author SHA1 Message Date
Christian Schwarz
dc5f651600 pagebench: WIP: command to trigger initial logical size calculation 2023-12-15 17:52:39 +00:00
Christian Schwarz
d2d1432a65 include timeline ids in tenant details response 2023-12-15 17:52:38 +00:00
Christian Schwarz
fae3e01522 WIP: performance test that uses the getpage benchmark 2023-12-15 17:49:36 +00:00
Christian Schwarz
746b5d6323 Revert "expose RemotePath in layer map dump endpoint"
This reverts commit 587b58a90b.
2023-12-15 17:49:03 +00:00
Christian Schwarz
587b58a90b expose RemotePath in layer map dump endpoint 2023-12-15 17:48:46 +00:00
Christian Schwarz
feb64cf67c fixup 2023-12-15 17:48:41 +00:00
Christian Schwarz
eb77341bf8 find a way to duplicate a tenant in local_fs
Use the script like so, against the tenant to duplicate:

    poetry run python3 ./test_runner/duplicate_tenant.py 7ea51af32d42bfe7fb93bf5f28114d09  200 8

backup of pageserver.toml

    d =1
    pg_distrib_dir ='/home/admin/neon-main/pg_install'
    http_auth_type ='Trust'
    pg_auth_type ='Trust'
    listen_http_addr ='127.0.0.1:9898'
    listen_pg_addr ='127.0.0.1:64000'
    broker_endpoint ='http://127.0.0.1:50051/'
    #control_plane_api ='http://127.0.0.1:1234/'

    # Initial configuration file created by 'pageserver --init'
    #listen_pg_addr = '127.0.0.1:64000'
    #listen_http_addr = '127.0.0.1:9898'

    #wait_lsn_timeout = '60 s'
    #wal_redo_timeout = '60 s'

    #max_file_descriptors = 10000
    #page_cache_size = 160000

    # initial superuser role name to use when creating a new tenant
    #initial_superuser_name = 'cloud_admin'

    #broker_endpoint = 'http://127.0.0.1:50051'

    #log_format = 'plain'

    #concurrent_tenant_size_logical_size_queries = '1'

    #metric_collection_interval = '10 min'
    #cached_metric_collection_interval = '0s'
    #synthetic_size_calculation_interval = '10 min'

    #disk_usage_based_eviction = { max_usage_pct = .., min_avail_bytes = .., period = "10s"}

    #background_task_maximum_delay = '10s'

    [tenant_config]
    #checkpoint_distance = 268435456 # in bytes
    #checkpoint_timeout = 10 m
    #compaction_target_size = 134217728 # in bytes
    #compaction_period = '20 s'
    #compaction_threshold = 10

    #gc_period = '1 hr'
    #gc_horizon = 67108864
    #image_creation_threshold = 3
    #pitr_interval = '7 days'

    #min_resident_size_override = .. # in bytes
    #evictions_low_residence_duration_metric_threshold = '24 hour'
    #gc_feedback = false

    # make it determinsitic
    gc_period = '0s'
    checkpoint_timeout = '3650 day'
    compaction_period = '20 s'
    compaction_threshold = 10
    compaction_target_size = 134217728
    checkpoint_distance = 268435456
    image_creation_threshold = 3

    [remote_storage]
    local_path = '/home/admin/neon-main/bench_repo_dir/repo/remote_storage_local_fs'

remove http handler

switch to generalized rewrite_summary  & impl page_ctl subcommand to use it

WIP: change duplicate_tenant.py script to use the pagectl command

The script works but at restart, we detach the created tenants because
they're not known to the attachment service:

  Detaching tenant, control plane omitted it in re-attach response tenant_id=1e399d390e3aee6b11c701cbc716bb6c

=> figure out how to further integrate this
2023-12-15 17:48:33 +00:00
Christian Schwarz
edc2fa88b8 pagebench: add a 'getpage@lsn' benchmark 2023-12-15 17:44:27 +00:00
Christian Schwarz
7a27b811a1 WIP 2023-12-15 17:16:03 +00:00
Christian Schwarz
27a35331c0 pagebench: add a 'basebackup' benchmark 2023-12-15 17:09:23 +00:00
Christian Schwarz
0b9f0e72ac pagebench: scaffold 2023-12-15 17:09:21 +00:00
Christian Schwarz
2c631d3dc9 clippy 2023-12-15 17:07:59 +00:00
Christian Schwarz
300d6c38ad Merge remote-tracking branch 'origin/problame/benchmarking/pr/keyspace-in-mgmt-api' into problame/benchmarking/pr/page_service_api_client 2023-12-15 17:06:41 +00:00
Christian Schwarz
8985331533 Merge branch 'problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/keyspace-in-mgmt-api 2023-12-15 15:50:15 +00:00
Christian Schwarz
a6abcbe454 hakari manage-deps 2023-12-15 15:49:08 +00:00
Christian Schwarz
888a7311f4 preseed rng for display_fromstr_bijection test case 2023-12-15 15:44:09 +00:00
Christian Schwarz
28479529ae fixup merge: move keyspace and models::partitioning into pageserver_api 2023-12-15 15:41:00 +00:00
Christian Schwarz
feaee19d4a Merge branch 'problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/keyspace-in-mgmt-api 2023-12-15 15:32:02 +00:00
Christian Schwarz
9e238a34b4 make cargo deny happy 2023-12-15 15:31:29 +00:00
Christian Schwarz
46889d768e move client to separate crate 2023-12-15 15:21:05 +00:00
Christian Schwarz
1a71b72c39 move serialization roundtrip to rust unit test 2023-12-15 15:09:39 +00:00
Christian Schwarz
e4509d151d Merge branch 'problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/keyspace-in-mgmt-api 2023-12-15 14:05:47 +00:00
Christian Schwarz
f91625a552 fixup 2023-12-15 14:04:09 +00:00
Christian Schwarz
795fe55332 Merge branch 'problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/keyspace-in-mgmt-api 2023-12-15 13:52:12 +00:00
Christian Schwarz
cab12c02a3 clippy 2023-12-15 13:43:40 +00:00
Christian Schwarz
7ac9ef8291 remove unused dep 2023-12-15 13:38:34 +00:00
Christian Schwarz
d7a8e0b1ae eliminate one workaround, convert sk stuff to async as well 2023-12-15 13:34:32 +00:00
Christian Schwarz
2664e9b834 fix 2023-12-15 12:29:02 +00:00
Christian Schwarz
83bdebb4af WIP 2023-12-15 12:22:33 +00:00
Christian Schwarz
b2508a689b fill in the todo!()s 2023-12-15 11:16:07 +00:00
Christian Schwarz
672a97993d Merge branch 'problame/benchmarking/pr/keyspace-in-mgmt-api' into problame/benchmarking/pr/page_service_api_client 2023-12-15 10:06:13 +00:00
Christian Schwarz
7c63902741 Merge remote-tracking branch 'origin/problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/keyspace-in-mgmt-api 2023-12-15 10:05:22 +00:00
Christian Schwarz
a5214b203d Merge remote-tracking branch 'origin/problame/benchmarking/pr/mgmt-api-client' into problame/benchmarking/pr/page_service_api_client 2023-12-15 09:59:32 +00:00
Christian Schwarz
9e70c213f7 Merge remote-tracking branch 'origin/main' into problame/benchmarking/pr/mgmt-api-client 2023-12-15 09:40:55 +00:00
Christian Schwarz
e8cd645a82 fixup 2023-12-15 09:35:42 +00:00
John Spray
f1cd1a2122 pageserver: improved handling of concurrent timeline creations on the same ID (#6139)
## Problem

Historically, the pageserver used an "uninit mark" file on disk for two
purposes:
- Track which timeline dirs are incomplete for handling on restart
- Avoid trying to create the same timeline twice at the same time.

The original purpose of handling restarts is now defunct, as we use
remote storage as the source of truth and clean up any trash timeline
dirs on startup. Using the file to mutually exclude creation operations
is error prone compared with just doing it in memory, and the existing
checks happened some way into the creation operation, and could expose
errors as 500s (anyhow::Errors) rather than something clean.

## Summary of changes

- Creations are now mutually excluded in memory (using
`Tenant::timelines_creating`), rather than relying on a file on disk for
coordination.
- Acquiring unique access to the timeline ID now happens earlier in the
request.
- Creating the same timeline which already exists is now a 201: this
simplifies retry handling for clients.
- 409 is still returned if a timeline with the same ID is still being
created: if this happens it is probably because the client timed out an
earlier request and has retried.
- Colliding timeline creation requests should no longer return 500
errors

This paves the way to entirely removing uninit markers in a subsequent
change.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-12-15 08:51:23 +00:00
Christian Schwarz
2e0737ce1a pageserver: keyspace in mgmt api client
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771
2023-12-14 19:48:59 +00:00
Christian Schwarz
4c5b7cff49 add a Rust client for pageserver mgmt api
Part of getpage@lsn benchmark epic:
https://github.com/neondatabase/neon/issues/5771

This PR moves the control plane's spread-all-over-the-place client for
the pageserver management API into a separate module within the
pageserver crate.

It also switches to the async version of reqwest, which I think is
generally the right direction, and I need an async client API in the
benchmark epic.
2023-12-14 19:47:26 +00:00
Joonas Koivunen
f010479107 feat(layer): pageserver_layer_redownloaded_after histogram (#6132)
this is aimed at replacing the current mtime only based trashing
alerting later.

Cc: #5331
2023-12-14 21:32:54 +02:00
Conrad Ludgate
cc633585dc gauge guards (#6138)
## Problem

The websockets gauge for active db connections seems to be growing more
than the gauge for client connections over websockets, which does not
make sense.

## Summary of changes

refactor how our counter-pair gauges are represented. not sure if this
will improve the problem, but it should be harder to mess-up the
counters. The API is much nicer though now and doesn't require
scopeguard::defer hacks
2023-12-14 17:21:39 +00:00
Christian Schwarz
aa5581d14f utils::logging: TracingEventCountLayer: don't use with_label_values() on hot path (#6129)
fixes #6126
2023-12-14 16:31:41 +01:00
John Spray
c4e0ef507f pageserver: heatmap uploads (#6050)
Dependency (commits inline):
https://github.com/neondatabase/neon/pull/5842

## Problem

Secondary mode tenants need a manifest of what to download. Ultimately
this will be some kind of heat-scored set of layers, but as a robust
first step we will simply use the set of resident layers: secondary
tenant locations will aim to match the on-disk content of the attached
location.

## Summary of changes

- Add heatmap types representing the remote structure
- Add hooks to Tenant/Timeline for generating these heatmaps
- Create a new `HeatmapUploader` type that is external to `Tenant`, and
responsible for walking the list of attached tenants and scheduling
heatmap uploads.

Notes to reviewers:
- Putting the logic for uploads (and later, secondary mode downloads)
outside of `Tenant` is an opinionated choice, motivated by:
- Enable future smarter scheduling of operations, e.g. uploading the
stalest tenant first, rather than having all tenants compete for a fair
semaphore on a first-come-first-served basis. Similarly for downloads,
we may wish to schedule the tenants with the hottest un-downloaded
layers first.
- Enable accessing upload-related state without synchronization (it
belongs to HeatmapUploader, rather than being some Mutex<>'d part of
Tenant)
- Avoid further expanding the scope of Tenant/Timeline types, which are
already among the largest in the codebase
- You might reasonably wonder how much of the uploader code could be a
generic job manager thing. Probably some of it: but let's defer pulling
that out until we have at least two users (perhaps secondary downloads
will be the second one) to highlight which bits are really generic.

Compromises:
- Later, instead of using digests of heatmaps to decide whether anything
changed, I would prefer to avoid walking the layers in tenants that
don't have changes: tracking that will be a bit invasive, as it needs
input from both remote_timeline_client and Layer.
2023-12-14 13:09:24 +00:00
Conrad Ludgate
6987b5c44e proxy: add more rates to endpoint limiter (#6130)
## Problem

Single rate bucket is limited in usefulness

## Summary of changes

Introduce a secondary bucket allowing an average of 200 requests per
second over 1 minute, and a tertiary bucket allowing an average of 100
requests per second over 10 minutes.

Configured by using a format like

```sh
proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 100@10s --endpoint-rps-limit 50@1m
```

If the bucket limits are inconsistent, an error is returned on startup

```
$ proxy --endpoint-rps-limit 300@1s --endpoint-rps-limit 10@10s
Error: invalid endpoint RPS limits. 10@10s allows fewer requests per bucket than 300@1s (100 vs 300)
```
2023-12-13 21:43:49 +00:00
Alexander Bayandin
0cd49cac84 test_compatibility: make it use initdb.tar.zst 2023-12-13 15:04:25 -06:00
Alexander Bayandin
904dff58b5 test_wal_restore_http: cleanup test 2023-12-13 15:04:25 -06:00
Arthur Petukhovsky
f401a21cf6 Fix test_simple_sync_safekeepers
There is a postgres 16 version encoded in a binary message.
2023-12-13 15:04:25 -06:00
Tristan Partin
158adf602e Update Postgres 16 series to 16.1 2023-12-13 15:04:25 -06:00
Tristan Partin
c94db6adbb Update Postgres 15 series to 15.5 2023-12-13 15:04:25 -06:00
Tristan Partin
85720616b1 Update Postgres 14 series to 14.10 2023-12-13 15:04:25 -06:00
George MacKerron
d6fcc18eb2 Add Neon-Batch- headers to OPTIONS response for SQL-over-HTTP requests (#6116)
This is needed to allow use of batch queries from browsers.

## Problem

SQL-over-HTTP batch queries fail from web browsers because the relevant
headers, `Neon-Batch-isolation-Level` and `Neon-Batch-Read-Only`, are
not included in the server's OPTIONS response. I think we simply forgot
to add them when implementing the batch query feature.

## Summary of changes

Added `Neon-Batch-Isolation-Level` and `Neon-Batch-Read-Only` to the
OPTIONS response.
2023-12-13 17:18:20 +00:00