## Problem
Fix for https://github.com/neondatabase/neon/pull/12324
## Summary of changes
Need `serde(default)` to allow this field not present in the config,
otherwise there will be a config deserialization error.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
gRPC base backups use gRPC compression. However, this has two problems:
* Base backup caching will cache compressed base backups (making gRPC
compression pointless).
* Tonic does not support varying the compression level, and zstd default
level is 10% slower than gzip fastest level.
Touches https://github.com/neondatabase/neon/issues/11728.
Touches https://github.com/neondatabase/cloud/issues/29353.
## Summary of changes
This patch adds a gRPC parameter `BaseBackupRequest::compression`
specifying the compression algorithm. It also moves compression into
`send_basebackup_tarball` to reduce code duplication.
A follow-up PR will integrate the base backup cache with gRPC.
## Summary
A design for a storage system that allows storage of files required to
make
Neon's Endpoints have a better experience at or after a reboot.
## Motivation
Several systems inside PostgreSQL (and Neon) need some persistent
storage for
optimal workings across reboots and restarts, but still work without.
Examples are the cumulative statistics file in `pg_stat/global.stat`,
`pg_stat_statements`' `pg_stat/pg_stat_statements.stat`, and
`pg_prewarm`'s
`autoprewarm.blocks`. We need a storage system that can store and manage
these files for each Endpoint.
[GH rendered
file](https://github.com/neondatabase/neon/blob/MMeent/rfc-unlogged-file/docs/rfcs/040-Endpoint-Persistent-Unlogged-Files-Storage.md)
Part of https://github.com/neondatabase/cloud/issues/24225
## Problem
part of https://github.com/neondatabase/neon/issues/11813
## Summary of changes
It costs $$$ to directly retrieve the feature flags from the pageserver.
Therefore, this patch adds new APIs to retrieve the spec from the
storcon and updates it via pageserver.
* Storcon retrieves the feature flag and send it to the pageservers.
* If the feature flag gets updated outside of the normal refresh loop of
the pageserver, pageserver won't fetch the flags on its own as long as
the last updated time <= refresh_period.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
https://github.com/neondatabase/cloud/issues/30539
If the current leader cancels the `call` function, then it has removed
the jobs from the queue, but will never finish sending the responses.
Because of this, it is not cancellation safe.
## Summary of changes
Document these functions as not cancellation safe. Move cancellation of
the queued jobs into the queue itself.
## Alternatives considered
1. We could spawn the task that runs the batch, since that won't get
cancelled.
* This requires `fn call(self: Arc<Self>)` or `fn call(&'static self)`.
2. We could add another scopeguard and return the requests back to the
queue.
* This requires that requests are always retry safe, and also requires
requests to be `Clone`.
## Problem
While working more on TLS to compute, I realised that Console Redirect
-> pg-sni-router -> compute would break if channel binding was set to
prefer. This is because the channel binding data would differ between
Console Redirect -> pg-sni-router vs pg-sni-router -> compute.
I also noticed that I actually disabled channel binding in #12145, since
`connect_raw` would think that the connection didn't support TLS.
## Summary of changes
Make sure we specify the channel binding.
Make sure that `connect_raw` can see if we have TLS support.
In `test_layer_download_cancelled_by_config_location`, we simulate hung
downloads via the `before-downloading-layer-stream-pausable` failpoint.
Then, we cancel a timeline via the `location_config` endpoint.
With the new default as of
https://github.com/neondatabase/neon/pull/11712, we would be creating
the timeline on safekeepers regardless if there have been writes or not,
and it turns out the test relied on the timeline not existing on
safekeepers, due to a cancellation bug:
* as established before, the test makes the read path hang
* the timeline cancellation function first cancels the walreceiver, and
only then cancels the timeline's token
* `WalIngest::new` is requesting a checkpoint, which hits the read path
* at cancellation time, we'd be hanging inside the read, not seeing the
cancellation of the walreceiver
* the test would time out due to the hang
This is probably also reproducible in the wild when there is S3
unavailabilies or bottlenecks. So we thought that it's worthwhile to fix
the hang issue. The approach chosen in the end involves the
`tokio::select` macro.
In PR 11712, we originally punted on the test due to the hang and opted
it out from the new default, but now we can use the new default.
Part of https://github.com/neondatabase/neon/issues/12299
pgaudit can spam logs due to all the monitoring that we do. Logs from
these connections are not necessary for HIPPA compliance, so we can stop
logging from those connections.
Part-of: https://github.com/neondatabase/cloud/issues/29574
Signed-off-by: Tristan Partin <tristan@neon.tech>
This makes it possible for the compiler to validate that a match block
matched all PostgreSQL versions we support.
## Problem
We did not have a complete picture about which places we had to test
against PG versions, and what format these versions were: The full PG
version ID format (Major/minor/bugfix `MMmmbb`) as transfered in
protocol messages, or only the Major release version (`MM`). This meant
type confusion was rampant.
With this change, it becomes easier to develop new version-dependent
features, by making type and niche confusion impossible.
## Summary of changes
Every use of `pg_version` is now typed as either `PgVersionId` (u32,
valued in decimal `MMmmbb`) or PgMajorVersion (an enum, with a value for
every major version we support, serialized and stored like a u32 with
the value of that major version)
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
The billing team wants to change the billing events pipeline and use a
common events format in S3 buckets across different event producers.
## Summary of changes
Change the events storage format for billing events from JSON to NDJSON.
Resolves: https://github.com/neondatabase/cloud/issues/29994
## Problem
PGLB will do the connect_to_compute logic, neonkeeper will do the
session establishment logic. We should split it.
## Summary of changes
Moves postgres authentication to compute to a separate routine that
happens after connect_to_compute.
The 1.88.0 stable release is near (this Thursday). We'd like to fix most
warnings beforehand so that the compiler upgrade doesn't require
approval from too many teams.
This is therefore a preparation PR (like similar PRs before it).
There is a lot of changes for this release, mostly because the
`uninlined_format_args` lint has been added to the `style` lint group.
One can read more about the lint
[here](https://rust-lang.github.io/rust-clippy/master/#/uninlined_format_args).
The PR is the result of `cargo +beta clippy --fix` and `cargo fmt`. One
remaining warning is left for the proxy team.
---------
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
## Problem
`Attached(0)` tenant migrations can get stuck if the heatmap file has
not been uploaded.
## Summary of Changes
- Added a test to reproduce the issue.
- Introduced a `kick_secondary_downloads` config flag:
- Enabled in testing environments.
- Disabled in production (and in the new test).
- Updated `Attached(0)` locations to consider the number of secondaries
in their intent when deciding whether to download the heatmap.
## Problem
Part of #11813
## Summary of changes
Add a test API to make it easier to manipulate the feature flags within
tests.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The scheduler uses total shards per AZ to select the AZ for newly
created or attached tenants.
This makes bad decisions when we have different node counts per AZ -- we
might have 2 very busy pageservers in one AZ, and 4 more lightly loaded
pageservers in other AZs, and the scheduler picks the busy pageservers
because the total shard count in their AZ is lower.
## Summary of changes
- Divide the shard count by the number of nodes in the AZ when scoring
in `get_az_for_new_tenant`
---------
Co-authored-by: John Spray <john.spray@databricks.com>
The `--gzip-probability` parameter was removed in #12250. However,
`test_basebackup_with_high_slru_count` still uses it, and keeps failing.
This patch removes the use of the parameter (gzip is enabled by
default).
## Problem
Part of #11813
## Summary of changes
The current interval is 30s and it costs a lot of $$$. This patch
reduced it to 600s refresh interval (which means that it takes 10min for
feature flags to propagate from UI to the pageserver). In the future we
can let storcon retrieve the feature flags and push it to pageservers.
We can consider creating a new release or we can postpone this to the
week after the next week.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
The 'make all' step must run always. PR #12311 accidentally left the
condition in there to skip it if there were no changes in postgres v14
sources. That condition belonged to a whole different step that was
removed altogether in PR#12311, and the condition should've been removed
too.
Per CI failure:
https://github.com/neondatabase/neon/actions/runs/15820148967/job/44587394469
## Problem
Previously, we couldn't read from an in-memory layer while a batch was
being written to it. Vice-versa, we couldn't write to it while there
was an on-going read.
## Summary of Changes
The goal of this change is to improve concurrency. Writes happened
through a &mut self method so the enforcement was at the type system
level.
We attempt to improve by:
1. Adding interior mutability to EphemeralLayer. This involves wrapping
the buffered writer in a read-write lock.
2. Minimise the time that the read lock is held for. Only hold the read
lock while reading from the buffers (recently flushed or pending
flush). If we need to read from the file, drop the lock and allow IO
to be concurrent.
The new benchmark variants with concurrent reads improve between 70 to
200 percent (against main).
Benchmark results are in this
[commit](891f094ce6).
## Future Changes
We can push the interior mutability into the buffered writer. The
mutable tail goes under a read lock, the flushed part goes into an
ArcSwap and then we can read from anything that is flushed _without_ any
locking.
gRPC base backups send a stream of fixed-size 64KB chunks.
pagebench basebackup with compression enabled shows this to reduce
throughput:
* 64 KB: 55 RPS
* 128 KB: 69 RPS
* 256 KB: 73 RPS
* 1024 KB: 73 RPS
This patch sets the base backup chunk size to 256 KB.
The build-tools image contains various build tools and dependencies,
mostly Rust-related. The compute image build used it to build
compute_ctl and a few other little rust binaries that are included in
the compute image. However, for extensions built in Rust (pgrx), the
build used a different layer which installed the rust toolchain using
rustup.
Switch to using the same rust toolchain for both pgrx-based extensions
and compute_ctl et al. Since we don't need anything else from the
build-tools image, I switched to using the toolchain installed with
rustup, and eliminated the dependency to build-tools altogether. The
compute image build no longer depends on build-tools.
Note: We no longer use 'mold' for linking compute_ctl et al, since mold
is not included in the build-deps-with-cargo layer. We could add it
there, but it doesn't seem worth it. I proposed stopping using mold
altogether in https://github.com/neondatabase/neon/pull/10735, but that
was rejected because 'mold' is faster for incremental builds. That
doesn't matter much for docker builds however, since they're not
incremental, and the compute binaries are not as large as the storage
server binaries anyway.
To avoid duplicating the build logic. `make all` covers the separate
`postgres-*` and `neon-pg-ext` steps, and also does `cargo build`.
That's how you would typically do a full local build anyway.
Every time you run `make`, it runs `make install` on all the PostgreSQL
sources, which copies the header files. That in turn triggers a rebuild
of the `postgres_ffi` crate, and everything that depends on it. We had
worked around this earlier (see #2458), by passing a custom INSTALL
script to the Postgres makefiles, which refrains from updating the
modification timestamp on headers when they have not been changed, but
the v14 makefile didn't obey INSTALL for the header files. Backporting
c0a1d7621b to v14 fixes that.
This backports upstream PostgreSQL commit c0a1d7621b to v14.
Corresponding PR in the 'postgres' repo:
https://github.com/neondatabase/postgres/pull/660
## Problem
In #12217, we began passing the stripe size in reattach responses, and
persisting it in the on-disk state. This is necessary to ensure the
storage controller and Pageserver have a consistent view of the intended
stripe size of unsharded tenants, which will be used for splits that do
not specify a stripe size. However, for backwards compatibility, these
stripe sizes were optional.
## Summary of changes
Make the stripe sizes required for reattach responses and on-disk
location configs. These will always be provided by the previous
(current) release.
The previous behavior was for the compute to override control plane
options if there was a conflict. We want to change the behavior so that
the control plane has the absolute power on what is right. In the event
that we need a new option passed to the compute as soon as possible, we
can initially roll it out in the control plane, and then migrate the
option to EXTRA_OPTIONS within the compute later, for instance.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We need to specify the number of safekeepers for neon_local without
`testing` feature.
Also we need this option for testing different configurations of
safekeeper migration code.
We cannot set it in `neon_fixtures.py` and in the default config of
`neon_local` yet, because it will fail compatibility tests. I'll make a
separate PR with removing `cfg!("testing")` completely and specifying
this option in the config when this option reaches the release branch.
- Part of https://github.com/neondatabase/neon/issues/12298
## Summary of changes
- Add `timeline_safekeeper_count` config option to storcon and
neon_local
Enable it across tests and set it as default. Marks the first milestone
of https://github.com/neondatabase/neon/issues/9114. We already enabled
it in all AWS regions and planning to enable it in all Azure regions
next week.
will merge after we roll out in all regions.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Timeline GC is very aggressive with regards to layer map locking.
We've seen timelines with loads of layers in production that hold the
write lock for the layer map for 30 minutes at a time.
This blocks reads and the write path to some extent.
## Summary of changes
Determining the set of layers to GC is done under the read lock.
Applying the updates is done under the write lock.
Previously, everything was done under write lock.
See #11942
Idea:
* if connections are short lived, they can get enqueued and then also
remove themselves later if they never made it to redis. This reduces the
load on the queue.
* short lived connections (<10m, most?) will only issue 1 command, we
remove the delete command and rely on ttl.
* we can enqueue as many commands as we want, as we can always cancel
the enqueue, thanks to the ~~intrusive linked lists~~ `BTreeMap`.
This improves `pagebench getpage-latest-lsn` gRPC support by:
* Using `page_api::Client`.
* Removing `--protocol`, and using the `page-server-connstring` scheme
instead.
* Adding `--compression` to enable zstd compression.
## Problem
Pagebench does not support gRPC for `basebackup` benchmarks.
Requires #12243.
Touches #11728.
## Summary of changes
Add gRPC support via gRPC connstrings, e.g. `pagebench basebackup
--page-service-connstring grpc://localhost:51051`.
Also change `--gzip-probability` to `--no-compression`, since this must
be specified per-client for gRPC.
Introduce a separate `postgres_ffi_types` crate which contains a few
types and functions that were used in the API. `postgres_ffi_types` is a
much small crate than `postgres_ffi`, and it doesn't depend on bindgen
or the Postgres C headers.
Move NeonWalRecord and Value types to wal_decoder crate. They are only
used in the pageserver-safekeeper "ingest" API. The rest of the ingest
API types are defined in wal_decoder, so move these there as well.
We have been bad at keeping them up-to-date, several contrib modules and
neon extensions were missing from the clean rules. Give up trying, and
remove the targets altogether. In practice, it's straightforward to just
do `rm -rf pg_install/build`, so the clean-targets are hardly worth the
maintenance effort.
I kept `make distclean` though. The rule for that is simple enough.
Since commit 87ad50c925, storage_controller has used diesel_async, which
in turn uses tokio-postgres as the Postgres client, which doesn't
require libpq. Thus we no longer need libpq in the storage image.
## Problem
The gRPC page service should support compression.
Requires #12111.
Touches #11728.
Touches https://github.com/neondatabase/cloud/issues/25679.
## Summary of changes
Add support for gzip and zstd compression in the server, and a client
parameter to enable compression.
This will need further benchmarking under realistic network conditions.
## Problem
GC info is an input to updating layer visibility.
Currently, gc info is updated on timeline activation and visibility is
computed on tenant attach, so we ignore branch points and compute
visibility by taking all layers into account.
Side note: gc info is also updated when timelines are created and
dropped. That doesn't help because we create the timelines in
topological order from the root. Hence the root timeline goes first,
without context of where the branch points are.
The impact of this in prod is that shards need to rehydrate layers after
live migration since the non-visible ones were excluded from the
heatmap.
## Summary of Changes
Move the visibility calculation into tenant attachment instead of
activation.
## Problem
`test_runner` integration tests should support gRPC.
Touches #11926.
## Summary of changes
* Enable gRPC for Pageservers, with dynamic port allocations.
* Add a `grpc` parameter for endpoint creation, plumbed through to
`neon_local endpoint create`.
No tests actually use gRPC yet, since computes don't support it yet.