Commit Graph

4188 Commits

Author SHA1 Message Date
Christian Schwarz
fd969dc352 fix real code 2023-12-11 17:59:46 +00:00
Christian Schwarz
4c874178b0 make the repro work by removing the lifetime param from FileGuard 2023-12-11 17:57:49 +00:00
Christian Schwarz
fae1be34a9 WIP 2023-12-11 17:56:43 +00:00
Christian Schwarz
2e96a6be28 minimize 2023-12-11 17:32:08 +00:00
Christian Schwarz
ea107e9687 no tokio spawn required 2023-12-11 17:30:37 +00:00
Christian Schwarz
242a72017d actually require it to be a Send future 2023-12-11 17:29:18 +00:00
Christian Schwarz
8d604a1390 drag page_cache into repro 2023-12-11 17:28:45 +00:00
Christian Schwarz
41547582ab Revert "got meaningful error:"
This reverts commit c4ddc6abaf.
2023-12-11 17:21:29 +00:00
Christian Schwarz
0bcdd0ea7e the fix 2023-12-11 17:19:17 +00:00
Christian Schwarz
c4ddc6abaf got meaningful error:
error[E0521]: borrowed data escapes outside of method
     --> repro-problem/src/virtual_file.rs:702:51
      |
  692 | impl<'a> FileGuard<'a> {
      |      -- lifetime `'a` defined here
  ...
  695 |     fn with_std_file<F, R>(&mut self, with: F) -> R
      |                            --------- `self` is a reference that is only valid in the method body
  ...
  702 |         let mut file = unsafe { File::from_raw_fd(self.as_ref().as_raw_fd()) };
      |                                                   ^^^^^^^^^^^^^
      |                                                   |
      |                                                   `self` escapes the method body here
      |                                                   argument requires that `'a` must outlive `'static`
      |
      = note: requirement occurs because of a mutable reference to `FileGuard<'_>`
      = note: mutable references are invariant over their type parameter
      = help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance
2023-12-11 17:18:40 +00:00
Christian Schwarz
b5eb1bb1ae WIP 2023-12-11 17:16:26 +00:00
Christian Schwarz
2f3693af14 WIP: repro problem 2023-12-11 17:12:20 +00:00
Christian Schwarz
e4ba18d27b make read_exact_at cancel-safe
Pass `FileGuard` to the tokio-epoll-uring read() operatio.
and avoids the nasty scopeguard usage.

This makes the `read_exact_at` cancel-safe.

Depends on https://github.com/neondatabase/tokio-epoll-uring/pull/27
2023-12-11 16:27:32 +00:00
Christian Schwarz
0ca94f5b53 re-add duration metrics 2023-12-11 14:42:21 +00:00
Christian Schwarz
65d0acb081 use tokio_epoll_uring for read path & VirtualFile::open
This makes Delta/Image ::load fns fully tokio-epoll-uring
2023-12-11 14:30:21 +00:00
Christian Schwarz
7d90ef62ae Revert "revert recent VirtualFile asyncification changes (#5291)"
This reverts commit ab1f37e908.

fixes #5479
2023-12-11 14:24:06 +00:00
Joonas Koivunen
f0d15cee6f build: update azure-* to 0.17 (#6081)
this is a drive-by upgrade while we refresh the access tokens at the
same time.
2023-12-11 12:21:02 +01:00
Sasha Krassovsky
0ba4cae491 Fix RLS/REPLICATION granting (#6083)
## Problem

## Summary of changes

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2023-12-08 12:55:44 -08:00
Andrew Rudenko
df1f8e13c4 proxy: pass neon options in deep object format (#6068)
---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2023-12-08 19:58:36 +01:00
John Spray
e640bc7dba tests: allow-lists for occasional failures (#6074)
test_creating_tenant_conf_after...
- Test detaches a tenant and then re-attaches immediatel: this causes a
race between pending remote LSN update and the generation bump in the
attachment.

test_gc_cutoff:
- Test rapidly restarts a pageserver before one generation has had the
chance to process deletions from the previous generation
2023-12-08 17:32:16 +00:00
Christian Schwarz
cf024de202 virtual_file metrics: expose max size of the fd cache (#6078)
And also leave a comment on how to determine current size.

Kind of follow-up to #6066

refs https://github.com/neondatabase/cloud/issues/8351
refs https://github.com/neondatabase/neon/issues/5479
2023-12-08 17:23:50 +00:00
Conrad Ludgate
e1a564ace2 proxy simplify cancellation (#5916)
## Problem

The cancellation code was confusing and error prone (as seen before in
our memory leaks).

## Summary of changes

* Use the new `TaskTracker` primitve instead of JoinSet to gracefully
wait for tasks to shutdown.
* Updated libs/utils/completion to use `TaskTracker`
* Remove `tokio::select` in favour of `futures::future::select` in a
specialised `run_until_cancelled()` helper function
2023-12-08 16:21:17 +00:00
Christian Schwarz
f5b9af6ac7 page cache: improve eviction-related metrics (#6077)
These changes help with identifying thrashing.

The existing `pageserver_page_cache_find_victim_iters_total` is already
useful, but, it doesn't tell us how many individual find_victim() calls
are happening, only how many clock-LRU steps happened in the entire
system,
without info about whether we needed to actually evict other data vs
just scan for a long time, e.g., because the cache is large.

The changes in this PR allows us to
1. count each possible outcome separately, esp evictions
2. compute mean iterations/outcome

I don't think anyone except me was paying close attention to
`pageserver_page_cache_find_victim_iters_total` before, so,
I think the slight behavior change of also counting iterations
for the 'iters exceeded' case is fine.

refs https://github.com/neondatabase/cloud/issues/8351
refs https://github.com/neondatabase/neon/issues/5479
2023-12-08 15:27:21 +00:00
John Spray
5e98855d80 tests: update tests that used local_fs&mock_s3 to use one or the other (#6015)
## Problem

This was wasting resources: if we run a test with mock s3 we don't then
need to run it again with local fs. When we're running in CI, we don't
need to run with the mock/local storage as well as real S3. There is
some value in having CI notice/spot issues that might otherwise only
happen when running locally, but that doesn't justify the cost of
running the tests so many more times on every PR.

## Summary of changes

- For tests that used available_remote_storages or
available_s3_storages, update them to either specify no remote storage
(therefore inherit the default, which is currently local fs), or to
specify s3_storage() for the tests that actually want an S3 API.
2023-12-08 14:52:37 +00:00
Conrad Ludgate
699049b8f3 proxy: make auth more type safe (#5689)
## Problem

a5292f7e67/proxy/src/auth/backend.rs (L146-L148)

a5292f7e67/proxy/src/console/provider/neon.rs (L90)

a5292f7e67/proxy/src/console/provider/neon.rs (L154)

## Summary of changes

1. Test backend is only enabled on `cfg(test)`.
2. Postgres mock backend + MD5 auth keys are only enabled on
`cfg(feature = testing)`
3. Password hack and cleartext flow will have their passwords validated
before proceeding.
4. Distinguish between ClientCredentials with endpoint and without,
removing many panics in the process
2023-12-08 11:48:37 +00:00
John Spray
2c544343e0 pageserver: filtered WAL ingest for sharding (#6024)
## Problem

Currently, if one creates many shards they will all ingest all the data:
not much use! We want them to ingest a proportional share of the data
each.

Closes: #6025

## Summary of changes

- WalIngest object gets a copy of the ShardIdentity for the Tenant it
was created by.
- While iterating the `blocks` part of a decoded record, blocks that do
not match the current shard are ignored, apart from on shard zero where
they are used to update relation sizes in `observe_decoded_block` (but
not stored).
- Before committing a `DataDirModificiation` from a WAL record, we check
if it's empty, and drop the record if so. This check is necessary
(rather than just looking at the `blocks` part) because certain record
types may modify blocks in non-obvious ways (e.g.
`ingest_heapam_record`).
- Add WAL ingest metrics to record the total received, total committed,
and total filtered out
- Behaviour for unsharded tenants is unchanged: they will continue to
ingest all blocks, and will take the fast path through `is_key_local`
that doesn't bother calculating any hashes.

After this change, shards store a subset of the tenant's total data, and
accurate relation sizes are only maintained on shard zero.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-12-08 10:12:37 +00:00
Arseny Sher
193e60e2b8 Fix/edit pgindent confusing places in neon. 2023-12-08 14:03:13 +04:00
Arseny Sher
1bbd6cae24 pgindent pgxn/neon 2023-12-08 14:03:13 +04:00
Arseny Sher
65f48c7002 Make targets to run pgindent on core and neon extension. 2023-12-08 14:03:13 +04:00
Alexander Bayandin
d9d8e9afc7 test_tenant_reattach: fix reattach mode names (#6070)
## Problem

Ref
https://neondb.slack.com/archives/C033QLM5P7D/p1701987609146109?thread_ts=1701976393.757279&cid=C033QLM5P7D

## Summary of changes
- Make reattach mode names unique for `test_tenant_reattach`
2023-12-08 08:39:45 +00:00
Arpad Müller
7914eaf1e6 Buffer initdb.tar.zst to a temporary file before upload (#5944)
In https://github.com/neondatabase/neon/pull/5912#pullrequestreview-1749982732 , Christian liked the idea of using files instead of buffering the
archive to RAM for the *download* path. This is for the upload path,
which is a very similar situation.
2023-12-08 03:33:44 +01:00
Joonas Koivunen
37fdbc3aaa fix: use larger buffers for remote storage (#6069)
Currently using 8kB buffers, raise that to 32kB to hopefully 1/4 of
`spawn_blocking` usage. Also a drive-by fixing of last `tokio::io::copy`
to `tokio::io::copy_buf`.
2023-12-07 19:36:44 +00:00
Tristan Partin
7aa1e58301 Add support for Python 3.12 2023-12-07 12:30:42 -06:00
Christian Schwarz
f2892d3798 virtual_file metrics: distinguish first and subsequent open() syscalls (#6066)
This helps with identifying thrashing.

I don't love the name, but, there is already "close-by-replace".

While reading the code, I also found a case where we waste
work in a cache pressure situation:
https://github.com/neondatabase/neon/issues/6065

refs https://github.com/neondatabase/cloud/issues/8351
2023-12-07 16:17:33 +00:00
Joonas Koivunen
b492cedf51 fix(remote_storage): buffering, by using streams for upload and download (#5446)
There is double buffering in remote_storage and in pageserver for 8KiB
in using `tokio::io::copy` to read `BufReader<ReaderStream<_>>`.

Switches downloads and uploads to use `Stream<Item =
std::io::Result<Bytes>>`. Caller and only caller now handles setting up
buffering. For reading, `Stream<Item = ...>` is also a `AsyncBufRead`,
so when writing to a file, we now have `tokio::io::copy_buf` reading
full buffers and writing them to `tokio::io::BufWriter` which handles
the buffering before dispatching over to `tokio::fs::File`.

Additionally implements streaming uploads for azure. With azure
downloads are a bit nicer than before, but not much; instead of one huge
vec they just hold on to N allocations we got over the wire.

This PR will also make it trivial to switch reading and writing to
io-uring based methods.

Cc: #5563.
2023-12-07 15:52:22 +00:00
John Spray
880663f6bc tests: use tenant_create() helper in test_bulk_insert (#6064)
## Problem

Since #5449 we enable generations in tests by default. Running
benchmarks was missed while merging that PR, and there was one that
needed updating.

## Summary of changes

Make test_bulk_insert use the proper generation-aware helper for tenant
creation.
2023-12-07 14:52:16 +00:00
John Spray
e89e41f8ba tests: update for tenant generations (#5449)
## Problem

Some existing tests are written in a way that's incompatible with tenant
generations.

## Summary of changes

Update all the tests that need updating: this is things like calling
through the NeonPageserver.tenant_attach helper to get a generation
number, instead of calling directly into the pageserver API. There are
various more subtle cases.
2023-12-07 12:27:16 +00:00
Conrad Ludgate
f9401fdd31 proxy: fix channel binding error messages (#6054)
## Problem

For channel binding failed messages we were still saying "channel
binding not supported" in the errors.

## Summary of changes

Fix error messages
2023-12-07 11:47:16 +00:00
Joonas Koivunen
b7ffe24426 build: update tokio to 1.34.0, tokio-utils 0.7.10 (#6061)
We should still remember to bump minimum crates for libraries beginning
to use task tracker.
2023-12-07 11:31:38 +00:00
Joonas Koivunen
52718bb8ff fix(layer): metric splitting, span rename (#5902)
Per [feedback], split the Layer metrics, also finally account for lost
and [re-submitted feedback] on `layer_gc` by renaming it to
`layer_delete`, `Layer::garbage_collect_on_drop` renamed to
`Layer::delete_on_drop`. References to "gc" dropped from metric names
and elsewhere.

Also fixes how the cancellations were tracked: there was one rare
counter. Now there is a top level metric for cancelled inits, and the
rare "download failed but failed to communicate" counter is kept.

Fixes: #6027

[feedback]: https://github.com/neondatabase/neon/pull/5809#pullrequestreview-1720043251
[re-submitted feedback]: https://github.com/neondatabase/neon/pull/5108#discussion_r1401867311
2023-12-07 11:39:40 +02:00
Joonas Koivunen
10c77cb410 temp: increase the wait tenant activation timeout (#6058)
5s is causing way too much noise; this is of course a temporary fix, we
should prioritize tenants for which there are pagestream openings the
highest, second highest the basebackups.

Deployment thread for context:
https://neondb.slack.com/archives/C03H1K0PGKH/p1701935048144479?thread_ts=1701765158.926659&cid=C03H1K0PGKH
2023-12-07 09:01:08 +00:00
Heikki Linnakangas
31be301ef3 Make simple_rcu::RcuWaitList::wait() async (#6046)
The gc_timeline() function is async, but it calls the synchronous wait()
function. In the worst case, that could lead to a deadlock by using up
all tokio executor threads.

In the passing, fix a few typos in comments.

Fixes issue #6045.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-12-07 10:20:40 +02:00
Joonas Koivunen
a3c7d400b4 fix: avoid allocations with logging a slug (#6047)
to_string forces allocating a less than pointer sized string (costing on
stack 4 usize), using a Display formattable slug saves that. the
difference seems small, but at the same time, we log these a lot.
2023-12-07 07:25:22 +00:00
Vadim Kharitonov
7501ca6efb Revert timescaledb for pg14 and pg15 (#6056)
```
could not start the compute node: compute is in state "failed": db error: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory Caused by: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory
```
2023-12-06 15:12:36 +00:00
Christian Schwarz
987c9aaea0 virtual_file: fix the metric for close() calls done by VirtualFile::drop (#6051)
Before this PR we would inc() the counter for `Close` even though the
slot's FD had already been closed.

Especially visible when subtracting `open` from `close+close-by-replace`
on a system that does a lot of attach and detach.

refs https://github.com/neondatabase/cloud/issues/8440
refs https://github.com/neondatabase/cloud/issues/8351
2023-12-06 12:05:28 +00:00
Konstantin Knizhnik
7fab731f65 Track size of FSM fork while applying records at replica (#5901)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1700560921471619

## Summary of changes

Update relation size cache for FSM fork in WAL records filter

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-12-05 18:49:24 +02:00
John Spray
483caa22c6 pageserver: logging tweaks (#6039)
- The `Attaching tenant` log message omitted some useful information
like the generation and mode
- info-level messages about writing configuration files were
unnecessarily verbose
- During process shutdown, we don't emit logs about the various phases:
this is very cheap to log since we do it once per process lifetime, and
is helpful when figuring out where something got stuck during a hang.
2023-12-05 16:11:15 +00:00
John Spray
da5e03b0d8 pageserver: add a /reset API for tenants (#6014)
## Problem

Traditionally we would detach/attach directly with curl if we wanted to
"reboot" a single tenant. That's kind of inconvenient these days,
because one needs to know a generation number to issue an attach
request.

Closes: https://github.com/neondatabase/neon/issues/6011

## Summary of changes

- Introduce a new `/reset` API, which remembers the LocationConf from
the current attachment so that callers do not have to work out the
correct configuration/generation to use.
- As an additional support tool, allow an optional `drop_cache` query
parameter, for situations where we are concerned that some on-disk state
might be bad and want to clear that as well as the in-memory state.

One might wonder why I didn't call this "reattach" -- it's because
there's already a PS->CP API of that name and it could get confusing.
2023-12-05 15:38:27 +00:00
John Spray
be885370f6 pageserver: remove redundant unsafe_create_dir_all (#6040)
This non-fsyncing analog to our safe directory creation function was
just duplicating what tokio's fs::create_dir_all does.
2023-12-05 15:03:07 +00:00
Alexey Kondratov
bc1020f965 compute_ctl: Notify waiters when Postgres failed to start (#6034)
In case of configuring the empty compute, API handler is waiting on
condvar for compute state change. Yet, previously if Postgres failed to
start we were just setting compute status to `Failed` without notifying.
It causes a timeout on control plane side, although we can return a
proper error from compute earlier.

With this commit API handler should be properly notified.
2023-12-05 13:38:45 +01:00