## Problem
The initdb cancellation added in #5921 is not sufficient to reliably
abort the entire initdb process. Initdb also spawns children. The tests
added by #6310 (#6385) and #6436 now do initdb cancellations on a more
regular basis.
In #6385, I attempted to issue `killpg` (after giving it a new process
group ID) to kill not just the initdb but all its spawned subprocesses,
but this didn't work. Initdb doesn't take *that* long in the end either,
so we just wait until it concludes.
## Summary of changes
* revert initdb cancellation support added in #5921
* still return `Err(Cancelled)` upon cancellation, but this is just to
not have to remove the cancellation infrastructure
* fixes to the `test_tenant_delete_races_timeline_creation` test to make
it reliably pass
Fixes#6385
## Problem
The API for detaching things wasn't implement yet, but one could hit
this case indirectly from tests when using attach-hook, and find tenants
unexpectedly attached again because their policy remained Single.
## Summary of changes
Add PlacementPolicy::Detached, and:
- add the behavior for it in schedule()
- in tenant_migrate, refuse if the policy is detached
- automatically set this policy in attach-hook if the caller has
specified pageserver=null.
The pagebench integration PR (#6214) issues attachment requests in
parallel.
We observed corrupted attachments.json from time to time, especially in
the test cases with high tenant counts.
The atomic overwrite added in #6444 exposed the root cause cleanly:
the `.commit()` calls of two request handlers could interleave or
be reordered.
See also:
https://github.com/neondatabase/neon/pull/6444#issuecomment-1906392259
This PR makes changes to the `persistence` module to fix above race:
- mpsc queue for PendingWrites
- one writer task performs the writes in mpsc queue order
- request handlers that need to do writes do it using the
new `mutating_transaction` function.
`mutating_transaction`, while holding the lock, does the modifications,
serializes the post-modification state, and pushes that as a
`PendingWrite` into the mpsc queue.
It then release the lock and `await`s the completion of the write.
The writer tasks executes the `PendingWrites` in queue order.
Once the write has been executed, it wakes the writing tokio task.
The pagebench integration PR (#6214) is the first to SIGQUIT & then
restart attachment_service.
With many tenants (100), we have found frequent failures on restart in
the CI[^1].
[^1]:
[Allure](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6214/7615750160/index.html#suites/e26265675583c610f99af77084ae58f1/851ff709578c4452/)
```
2024-01-22T19:07:57.932021Z INFO request{method=POST path=/attach-hook request_id=2697503c-7b3e-4529-b8c1-d12ef912d3eb}: Request handled, status: 200 OK
2024-01-22T19:07:58.898213Z INFO Got SIGQUIT. Terminating
2024-01-22T19:08:02.176588Z INFO version: git-env:d56f31639356ed8e8ce832097f132f27ee19ac8a, launch_timestamp: 2024-01-22 19:08:02.174634554 UTC, build_tag build_tag-env:7615750160, state at /tmp/test_output/test_pageserver_max_throughput_getpage_at_latest_lsn[10-13-30]/repo/attachments.json, listening on 127.0.0.1:15048
thread 'main' panicked at /__w/neon/neon/control_plane/attachment_service/src/persistence.rs:95:17:
Failed to load state from '/tmp/test_output/test_pageserver_max_throughput_getpage_at_latest_lsn[10-13-30]/repo/attachments.json': trailing characters at line 1 column 8957 (maybe your .neon/ dir was written by an older version?)
stack backtrace:
0: rust_begin_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
2: attachment_service::persistence::PersistentState::load_or_new::{{closure}}
at ./control_plane/attachment_service/src/persistence.rs:95:17
3: attachment_service::persistence::Persistence:🆕:{{closure}}
at ./control_plane/attachment_service/src/persistence.rs:103:56
4: attachment_service::main::{{closure}}
at ./control_plane/attachment_service/src/main.rs:69:61
5: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/park.rs:282:63
6: tokio::runtime::coop::with_budget
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/coop.rs:107:5
7: tokio::runtime::coop::budget
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/coop.rs:73:5
8: tokio::runtime::park::CachedParkThread::block_on
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/park.rs:282:31
9: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/context/blocking.rs:66:9
10: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
11: tokio::runtime::context::runtime::enter_runtime
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/context/runtime.rs:65:16
12: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
13: tokio::runtime::runtime::Runtime::block_on
at ./.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.34.0/src/runtime/runtime.rs:350:50
14: attachment_service::main
at ./control_plane/attachment_service/src/main.rs:99:5
15: core::ops::function::FnOnce::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
The attachment_service handles SIGQUIT by just exiting the process.
In theory, the SIGQUIT could come in while we're writing out the
`attachments.json`.
Now, in above log output, there's a 1 second gap between the last
request completing
and the SIGQUIT coming in. So, there must be some other issue.
But, let's have this change anyways, maybe it helps uncover the real
cause for the test failure.
1. Introduce a naive `Timeline::get_vectored` implementation
The return type is intended to be flexible enough for various types of
callers. We return the pages in a map keyed by `Key` such that the
caller doesn't have to map back to the key if it needs to know it. Some
callers can ignore errors
for specific pages, so we return a separate `Result<Bytes,
PageReconstructError>` for each page and an overarching
`GetVectoredError` for API misuse. The overhead of the mapping will be
small and bounded since we enforce a maximum key count for the
operation.
2. Use the `get_vectored` API for SLRU segment reconstruction and image
layer creation.
## Problem
Parsing the IP address at check time is a little wasteful.
## Summary of changes
Parse the IP when we get it from cplane. Adding a `None` variant to
still allow malformed patterns
## Problem
There are a lot of responses with 404 role not found error, which are
not getting cached in proxy.
## Summary of changes
If there was returned an empty secret but with the project_id, store it
in cache.
## Problem
There is "neon.pageserver_connstring" GUC with PGC_SIGHUP option,
allowing to change it using
pg_reload_conf(). It is used by control plane to update pageserver
connection string if page server is crashed,
relocated or new shards are added.
It is copied to shared memory because config can not be loaded during
query execution and we need to
reestablish connection to page server.
## Summary of changes
Copying connection string to shared memory is done by postmaster. And
other backends
should check update counter to determine of connection URL is changed
and connection needs to be reestablished.
We can not use standard Postgres LW-locks, because postmaster has proc
entry and so can not wait
on this primitive. This is why lockless access algorithm is implemented
using two atomic counters to enforce
consistent reading of connection string value from shared memory.
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
The idea is to achieve separation between keyspace layout definition
and operating on said keyspace. I've inlined all these function since
they're small and we don't use LTO in the storage release builds
at the moment.
Closes https://github.com/neondatabase/neon/issues/6347
Before this patch, we would update the `tenant_state.intent` in memory
but not persist the detachment to disk.
I noticed this in https://github.com/neondatabase/neon/pull/6214 where
we stop, then restart, the attachment service.
## Problem
When a tenant is in Attaching state, and waiting for the
`concurrent_tenant_warmup` semaphore, it also listens for the tenant
cancellation token. When that token fires, Tenant::attach drops out.
Meanwhile, Tenant::set_stopping waits forever for the tenant to exit
Attaching state.
Fixes: https://github.com/neondatabase/neon/issues/6423
## Summary of changes
- In the absence of a valid state for the tenant, it is set to Broken in
this path. A more elegant solution will require more refactoring, beyond
this minimal fix.
## Problem
If you build the compute-node dockerfile with the PG_VERSION argument
passed in (e.g. `docker build -f Dockerfile.compute-node --build-arg
PG_VERSION=v15 .`, it fails, as some of stages doesn't have the
PG_VERSION arg defined.
## Summary of changes
Added the PG_VERSION arg to the plv8-build, neon-pg-ext-build, and
pg-embedding-pg-build stages of Dockerfile.compute-node
## Problem
See https://neondb.slack.com/archives/C06F5UJH601/p1705731304237889
Adding 1 to xid in `update_next_xid` can cause overflow in debug mode.
0xffffffff is valid transaction ID.
## Summary of changes
Use `wrapping_add`
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
In https://github.com/neondatabase/neon/pull/6283 I did a couple changes
that weren't directly related to the goal of extracting the state
machine, so I'm putting them here
## Summary of changes
- move postgres vs console provider into another enum
- reduce error cases for link auth
- slightly refactor link flow
## Problem
Currently we store in cache even if the project is undefined. That makes
invalidation impossible.
## Summary of changes
Do not store if project id is empty.
update_next_xid() doesn't have any special treatment for the invalid or
other special XIDs, so it will treat InvalidTransactionId (0) as a
regular XID. If old nextXid is smaller than 2^31, 0 will look like a
very old XID, and nothing happens. But if nextXid is greater than 2^31 0
will look like a very new XID, and update_next_xid() will incorrectly
bump up nextXID.
configures nextest to kill tests after 1 minute. slow period is set to
20s which is how long our tests currently take in total, there will be 2
warnings and then the test will be killed and it's output logged.
Cc: #6361
Cc: #6368 -- likely this will be enough for longer time, but it will be
counter productive when we want to attach and debug; the added line
would have to be commented out.