Commit Graph

6653 Commits

Author SHA1 Message Date
Christian Schwarz
d6e5a46015 eliminate the word batch and stale doc comments 2024-11-22 12:46:52 +01:00
Christian Schwarz
a28c54dac1 cosmetics 2024-11-22 12:44:31 +01:00
Christian Schwarz
ef502f8311 remove async-timer heritage 2024-11-22 12:43:55 +01:00
Christian Schwarz
39e45f9e51 improve tests 2024-11-22 12:27:38 +01:00
Christian Schwarz
c1e8347160 make configurable whether pipelining should use concurrent futures or tasks 2024-11-22 11:27:23 +01:00
Christian Schwarz
093674b2fb implement the serial mode 2024-11-22 09:53:08 +01:00
Christian Schwarz
0fa8ae3c0a WIP refactor to allow truly serial mode 2024-11-22 09:47:49 +01:00
Christian Schwarz
c1040bc25d task-based mode 2024-11-22 09:36:45 +01:00
Christian Schwarz
a3d1cf636b config changes to express pipelining config (not respected yet) 2024-11-22 08:36:17 +01:00
Christian Schwarz
89d9d16130 cherry-pick from problame/batching-benchmark while it's waiting for merge 2024-11-22 08:17:30 +01:00
Christian Schwarz
88fd8aed52 watch-based approach 2024-11-21 23:03:21 +01:00
Christian Schwarz
db9093f938 revert back to 'span fixes' commit 2024-11-21 22:07:05 +01:00
Christian Schwarz
240e48df59 improvements 2024-11-21 21:57:53 +01:00
Christian Schwarz
7680aa12a8 draft 2024-11-21 21:34:58 +01:00
Christian Schwarz
56de07154e fruitless debugging 2024-11-21 20:46:56 +01:00
Christian Schwarz
73046fdf5b span fixes 2024-11-21 20:21:55 +01:00
Christian Schwarz
408bc8fc71 cleanups 2024-11-21 19:42:43 +01:00
Christian Schwarz
345f8b6c3b fix ready_for_next_batch order 2024-11-21 19:11:57 +01:00
Christian Schwarz
aa1032aeff no need for cancel & ctx in pagestream_do_batch 2024-11-21 18:40:22 +01:00
Christian Schwarz
a1bb2e7bb0 WIP: pipelined batching 2024-11-21 18:33:34 +01:00
Christian Schwarz
09e7485004 Merge branch 'problame/merge-getpage-test' into problame/batching-timer 2024-11-21 11:28:12 +01:00
Christian Schwarz
058b35f884 Merge branch 'problame/batching-benchmark' into problame/merge-getpage-test 2024-11-21 11:27:16 +01:00
Christian Schwarz
ff0aa152f1 Merge remote-tracking branch 'origin/main' into problame/batching-benchmark 2024-11-21 11:25:23 +01:00
Christian Schwarz
3375f28990 pytest.approx; https://github.com/neondatabase/neon/pull/9820#discussion_r1850679974 2024-11-21 11:21:50 +01:00
Christian Schwarz
e82deb2ccc high-resolution CPU usage 2024-11-21 11:16:00 +01:00
Christian Schwarz
fa7ce2ca07 the final choice: async-timer 1.0beta15 with features=["tokio1"] 2024-11-21 11:15:02 +01:00
John Spray
42bda5d632 pageserver: revise metrics lifetime for SecondaryTenant (#9818)
## Problem

We saw a scale test failure when one shard went
secondary->attached->secondary in a short period of time -- the metrics
for the shard failed a validation assertion that is meant to ensure the
size metric matches the sum of layer sizes in the SecondaryDetail
struct.

This appears to be due to two SecondaryTenants being alive at the same
time -- the first one was shut down but still had its contributions to
the metrics.

Closes: https://github.com/neondatabase/neon/issues/9628

## Summary of changes

- Refactor code for validating metrics and call it in shutdown as well
as during downloads
- Move code for dropping per-tenant secondary metrics from drop() into
shutdown(), so that once shutdown() completes it is definitely safe to
instantiate another SecondaryTenant for the same tenant.
2024-11-21 08:31:24 +00:00
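The lifetime change described in 42bda5d632 can be pictured with a toy model. This is a minimal Python sketch, not the pageserver's Rust code: `SizeMetric` and `SecondaryTenantModel` are hypothetical names, and the only behaviour taken from the commit message is that metric contributions are removed in `shutdown()` and that the size metric is validated against the sum of layer sizes.

```python
class SizeMetric:
    """Stand-in for a shared per-tenant size gauge (hypothetical)."""

    def __init__(self):
        self.value = 0


class SecondaryTenantModel:
    """Toy model of a secondary tenant that contributes to a shared metric."""

    def __init__(self, metric):
        self.metric = metric
        self.layer_sizes = []
        self.shut_down = False

    def add_layer(self, size):
        # Each downloaded layer adds its size to the shared metric.
        self.layer_sizes.append(size)
        self.metric.value += size

    def validate(self):
        # The metric must equal the sum of the layer sizes we track.
        return self.metric.value == sum(self.layer_sizes)

    def shutdown(self):
        # Remove our contribution here rather than at object destruction,
        # so a successor can be created as soon as shutdown() returns.
        if not self.shut_down:
            self.metric.value -= sum(self.layer_sizes)
            self.layer_sizes.clear()
            self.shut_down = True
```

In this model, a second `SecondaryTenantModel` created after the first one's `shutdown()` starts from a metric of zero, so its own validation cannot be skewed by leftovers from the first.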
Arpad Müller
59c2c3f8ad compute_ctl: print OpenTelemetry errors via tracing, not stdout (#9830)
Before, `OpenTelemetry` errors were printed to stdout/stderr directly,
causing one of the few log lines without a timestamp, like:

```
OpenTelemetry trace error occurred. error sending request for url (http://localhost:4318/v1/traces)
```

Now, we print:

```
2024-11-21T02:24:20.511160Z  INFO OpenTelemetry error: error sending request for url (http://localhost:4318/v1/traces)
```

I found this while investigating #9731.
2024-11-21 04:46:01 +00:00
Ivan Efremov
2d6bf176a0 proxy: Refactor http conn pool (#9785)
- Use the same ConnPoolEntry for http connection pool.
- Rename EndpointConnPool to the HttpConnPool.
- Narrow clone bound for client

Fixes #9284
2024-11-20 19:36:29 +00:00
Vadim Kharitonov
313ebfdb88 [proxy] chore: allow bypassing empty params to /sql endpoint (#9827)
## Problem

```
curl -H "Neon-Connection-String: postgresql://neondb_owner:PASSWORD@ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/neondb?sslmode=require" https://ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/sql -d '{"query":"SELECT 1","params":[]}'
```

For such a query, I also need to send `params`. Do I really need it?

## Summary of changes
I've marked `params` as optional
2024-11-20 19:36:23 +00:00
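As a rough illustration of the change in 313ebfdb88, a request parser can treat a missing `params` field as an empty list. The sketch below is Python, not the proxy's actual code; `parse_sql_request` is a hypothetical helper, and only the `query` and `params` field names come from the commit message.

```python
import json


def parse_sql_request(body: str) -> tuple[str, list]:
    """Parse a /sql-style JSON body, treating `params` as optional.

    Hypothetical helper for illustration; not the proxy's real parser.
    """
    payload = json.loads(body)
    query = payload["query"]  # still required
    params = payload.get("params", [])  # optional: defaults to no parameters
    return query, params
```

With this shape, a body of `{"query":"SELECT 1"}` is accepted without an explicit `"params":[]`.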
Arpad Müller
811fab136f scrubber: allow restricting find_garbage to a partial tenant id prefix (#9814)
Adds support to the `find_garbage` command to restrict itself to a
partial tenant ID prefix, say `a`, and then it only traverses tenants
with IDs starting with `a`. One can now pass the `--tenant-id-prefix`
parameter.

That way, one can shard the `find_garbage` command and make it run in
parallel.

The PR also changes how `remote_storage` handles trailing `/`s: it used
to remove them first, only to then add them back in the listing function.
It turns out that this isn't necessary and it prevents the prefix
functionality from working. S3 doesn't do this either.
2024-11-20 19:31:02 +00:00
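The sharding idea in 811fab136f can be sketched as follows. This is an illustrative Python model, not the scrubber's implementation: both helper names are hypothetical, and the assumption that tenant IDs are lowercase hex strings (so that 16 one-character prefixes cover all of them) is mine, not stated in the commit.

```python
def tenants_with_prefix(tenant_ids, prefix):
    """Return only the tenant IDs that start with `prefix` (hypothetical helper)."""
    return [t for t in tenant_ids if t.startswith(prefix)]


def shard_by_first_hex_digit(tenant_ids):
    """Split tenant IDs into 16 disjoint groups, one per leading hex digit.

    Assumes IDs are lowercase hex strings; each group could then be
    handled by a separate run restricted to that prefix.
    """
    return {p: tenants_with_prefix(tenant_ids, p) for p in "0123456789abcdef"}
```

Because the prefixes are disjoint and cover every possible leading digit, each tenant lands in exactly one group, which is what makes parallel runs safe to combine.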
Christian Schwarz
89b6cb8eba Revert "vanilla tokio based timer impl based on tokio::time::Sleep"
This reverts commit 517dda849f.
2024-11-20 20:17:49 +01:00
Christian Schwarz
c68661dfb3 Revert "undo local modifications to benchmark"
This reverts commit 7be13bc5a6.
2024-11-20 19:53:06 +01:00
Christian Schwarz
517dda849f vanilla tokio based timer impl based on tokio::time::Sleep 2024-11-20 19:52:47 +01:00
Christian Schwarz
f22ad868cf Revert "tokio_timerfd::Delay based impl"
This reverts commit fcda7a72c6.
2024-11-20 19:45:37 +01:00
Christian Schwarz
fcda7a72c6 tokio_timerfd::Delay based impl
Performs just as well as the async-timer::Timer features=tokio1 impl.
That makes sense because the same thing is happening under the hood.

https://www.notion.so/neondatabase/benchmarking-notes-143f189e004780c4a630cb5f426e39ba?pvs=4#144f189e004780ea9decc82281f6b8d1
2024-11-20 19:42:00 +01:00
Christian Schwarz
469ce810fc Revert "async-timer based approach (again, with data)"
This reverts commit 689788cbba.
2024-11-20 19:40:24 +01:00
Christian Schwarz
21866faa8a Revert "try async-timer 1.0.0-beta15 (still signal-based timers)"
This reverts commit c73e9e40e9.
2024-11-20 19:37:51 +01:00
Christian Schwarz
cbb5817997 Revert "async-timer 1.0.0-beta15 with features=tokio1"
This reverts commit 68550f0f50.
2024-11-20 19:37:44 +01:00
Vlad Lazar
ee26f09e45 pageserver: remove shard split hard link assertion (#9829)
## Problem

We were hitting this assertion in debug mode tests sometimes.

This case was being hit when the parent shard has no resident layers.
For instance, this is the case on split retry where the previous attempt
shut down the parent and deleted local state for it. If the logical size
calculation does not download some layers before we get to the
hardlinking, then the assertion is hit.

## Summary of Changes

Remove the assertion. It's fine for the ancestor to not have any
resident layers at the time of the split.

Closes https://github.com/neondatabase/neon/issues/9412
2024-11-20 18:33:05 +00:00
Christian Schwarz
5f3e6f398c Revert "try interval-based impl to cross-check"
This reverts commit 721643beed.
2024-11-20 18:52:55 +01:00
Christian Schwarz
721643beed try interval-based impl to cross-check
=> zero batching

https://www.notion.so/neondatabase/benchmarking-notes-143f189e004780c4a630cb5f426e39ba?pvs=4#144f189e00478065a9b3e51726082885
2024-11-20 18:50:48 +01:00
Conrad Ludgate
f36f0068b8 chore(proxy): demote more logs during successful connection attempts (#9828)
Follow-up to #9803.

See https://github.com/neondatabase/cloud/issues/14378

In collaboration with @cloneable and @awarus, we sifted through logs and
simply demoted some logs to debug. This is not at all finished and there
are more logs to review, but we ran out of time in the session we
organised. In any slightly more nuanced cases, we didn't touch the log,
instead leaving a TODO comment.

I've also slightly refactored the sql-over-http body read/length reject
code. I can split that into a separate PR. It just felt natural after I
switched to `read_body_with_limit` as we discussed during the meet.
2024-11-20 17:50:39 +00:00
Christian Schwarz
68550f0f50 async-timer 1.0.0-beta15 with features=tokio1
Best batching factor so far with no worse degradation of
un-batchable workloads than the other candidates.

https://www.notion.so/neondatabase/benchmarking-notes-143f189e004780c4a630cb5f426e39ba?pvs=4#144f189e004780c0921fe99e1da0e8c9
2024-11-20 18:41:31 +01:00
Christian Schwarz
c73e9e40e9 try async-timer 1.0.0-beta15 (still signal-based timers)
Results unchanged from 0.7.4.

https://www.notion.so/neondatabase/benchmarking-notes-143f189e004780c4a630cb5f426e39ba?pvs=4#144f189e004780e18416cc0faf2aca65
2024-11-20 18:32:53 +01:00
John Spray
5ff2f1ee7d pageserver: enable compaction to proceed while live-migrating (#5397)
## Problem

Long ago, in #5299, the tenant states for migration were added, but they
are respected only in a coarse-grained way: when hinted not to do deletions,
tenants will just avoid doing all GC or compaction.

Skipping compaction is not necessary for AttachedMulti, as we will soon
become the primary attached location, and it is not a waste of resources
to proceed with compaction. Instead, per the RFC
(https://github.com/neondatabase/neon/pull/5029/files), deletions should
be queued up in this state, and executed later when we switch to
AttachedSingle.

Avoiding compaction in AttachedMulti can have an operational impact if a
tenant is under significant write load, as a long-running migration can
result in a large accumulation of delta layers with commensurate impact
on read latency.

Closes: https://github.com/neondatabase/neon/issues/5396

## Summary of changes

- Add a 'config' part to RemoteTimelineClient so that it can be aware of
the mode of the tenant it belongs to, and wire this through for
construction + updates
- Add a special buffer for delayed deletions, and when in AttachedMulti
route deletions here instead of into the main remote client queue. This
is drained when transitioning to AttachedSingle. If the tenant is
detached or our process dies before then, then these objects are leaked.
- As a quality of life improvement, also use the remote timeline
client's knowledge of the tenant state to avoid submitting remote
consistent LSN updates for validation when in AttachedStale (as we know
these will fail)

2024-11-20 17:31:55 +00:00
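The delayed-deletion routing described in 5ff2f1ee7d can be modelled in a few lines. This is a simplified Python sketch under stated assumptions, not the RemoteTimelineClient code: `DeletionRouter`, `schedule_deletion` and `set_mode` are hypothetical names, and only the AttachedMulti/AttachedSingle behaviour is taken from the commit message.

```python
from enum import Enum


class Mode(Enum):
    ATTACHED_SINGLE = "AttachedSingle"
    ATTACHED_MULTI = "AttachedMulti"


class DeletionRouter:
    """Toy model: buffer deletions in AttachedMulti, drain on AttachedSingle."""

    def __init__(self, mode: Mode):
        self.mode = mode
        self.deferred = []  # deletions held back while in AttachedMulti
        self.queue = []     # stand-in for the main remote client queue

    def schedule_deletion(self, key: str) -> None:
        if self.mode is Mode.ATTACHED_MULTI:
            self.deferred.append(key)
        else:
            self.queue.append(key)

    def set_mode(self, mode: Mode) -> None:
        self.mode = mode
        if mode is Mode.ATTACHED_SINGLE:
            # Drain the buffer, preserving the order deletions were requested.
            self.queue.extend(self.deferred)
            self.deferred.clear()
```

The buffer in this model lives only in memory, which mirrors the caveat in the commit message: if the process dies before the transition to AttachedSingle, the buffered objects are never deleted.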
John Spray
67f5f83edc pageserver: avoid reading SLRU blocks for GC on shards >0 (#9423)
## Problem

SLRU blocks, which can add up to several gigabytes, are currently
ingested by all shards, multiplying their capacity cost by the shard
count and slowing down ingest. We do this because all shards need the
SLRU pages to do timestamp->LSN lookup for GC.

Related: https://github.com/neondatabase/neon/issues/7512

## Summary of changes

- On non-zero shards, learn the GC offset from shard 0's index instead
of calculating it.
- Add a test `test_sharding_gc` that exercises this
- Do GC in test_pg_regress as a general smoke test that GC functions run
(e.g. this would fail if we were using SLRUs we didn't have)

In this PR we are still ingesting SLRUs everywhere, but not using them
any more. Part 2 PR (https://github.com/neondatabase/neon/pull/9786)
makes the change to not store them at all.

2024-11-20 15:56:14 +00:00
Christian Schwarz
7be13bc5a6 undo local modifications to benchmark 2024-11-20 16:00:19 +01:00
John Spray
593e35027a tests: use fewer pageservers in test_sharding_split_smoke (#9804)
## Problem

This test uses a gratuitous number of pageservers (16). This works fine
when there are plenty of system resources, but causes issues on test
runners that have limited resources and run many tests concurrently.

Related: https://github.com/neondatabase/neon/issues/9802

## Summary of changes

- Split from 2 shards to 4, instead of 4 to 8
- Don't give every shard a separate pageserver, let two locations share
each pageserver.

Net result is 4 pageservers instead of 16
2024-11-20 14:57:59 +00:00
Christian Schwarz
689788cbba async-timer based approach (again, with data)
Yep, it's clearly the best one with best batching factor at lowest CPU
usage.

https://www.notion.so/neondatabase/benchmarking-notes-143f189e004780c4a630cb5f426e39ba?pvs=4#144f189e004780d0a205e081458b46db
2024-11-20 15:36:10 +01:00