Compare commits

...

14 Commits

Author SHA1 Message Date
Tristan Partin
e7ec82cc08 Move control plane spec response code to proper location
Per the comment, this code is indeed not related to the compute_ctl HTTP
API.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-10 16:35:43 -06:00
Tristan Partin
3d143ad799 Unbrick the forward compatibility test failures (#10747)
Since the merge of https://github.com/neondatabase/neon/pull/10523,
forward compatibility tests have been broken everywhere.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-10 22:22:10 +00:00
Alex Chi Z.
b0c7ee0175 feat(pageserver): better gc_compaction_split heuristics (#10727)
## Problem

close https://github.com/neondatabase/neon/issues/10213

`range_search` only returns the top-most layers that may satisfy the
search, so it doesn't include all layers that might be accessed (the
user needs to recursively call this function). We need to retrieve the
full layer map and find overlaps in order to have a correct heuristics
of the job split.

## Summary of changes

Retrieve all layers and find overlaps instead of doing `range_search`.
The patch also reduces the time holding the layer map read guard.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-10 19:33:34 +00:00
Erik Grinaker
8c4e94107d pageserver: notify compaction loop at threshold (#10740)
## Problem

The compaction loop currently runs periodically, which can cause it to
wait for up to 20 seconds before starting L0 compaction by default.

Also, when we later separate the semaphores for L0 compaction and image
compaction, we want to give up waiting for the image compaction
semaphore if L0 compaction is needed on any timeline.

Touches #10694.

## Summary of changes

Notify the compaction loop when an L0 flush (on any timeline) exceeds
`compaction_threshold`.

Also do some opportunistic cleanups in the area.
2025-02-10 17:48:09 +00:00
Heikki Linnakangas
c368b0fe14 Use a cache mount to speed up rebuilding compute node image (#10737)
Building the compute rust binaries from scratch is pretty slow, it takes
between 4-15 minutes on my laptop, depending on which compiler flags and
other tricks I use. A cache mount allows caching the dependencies and
incremental builds, which speeds up rebuilding significantly when you
only makes a small change in a source file.
2025-02-10 16:58:29 +00:00
Heikki Linnakangas
aba61a3712 Download awscli in separate layer in Dockerfile, to allow caching (#10733)
The awscli was downloaded at the last stages of the overall compute
image build, which meant that if you modified any part of the build, it
would trigger a re-download of the awscli. That's a bit annoying when
developing locally and rebuilding the compute image repeatedly. Move it
to a separate layer, to cache separately and to avoid the spurious
rebuilds.
2025-02-10 16:48:28 +00:00
Tristan Partin
946da3f7e2 Require --compute-id when running compute_ctl (#10523)
The compute_id will be used when verifying claims sent by the control
plane.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-10 16:46:20 +00:00
Ivan Efremov
73633e27ed fix(proxy): Log errors from the local proxy in auth-broker (#10659)
Handle errors from local proxy by parsing HTTP response in auth broker
code

Closes [#19476](https://github.com/neondatabase/cloud/issues/19476)
2025-02-10 16:06:13 +00:00
Konstantin Knizhnik
0cf0119751 Add --save_records option to pg_waldump (#10626)
## Problem

Make it possible to dump WAL records in format recognised by walredo
process.
Intended usage:

```
pg_waldump -R 1663/5/16396  -B 771727 000000010000000100000034 --save-records=/tmp/walredo.records
postgres --wal-redo < /tmp/walredo.records > /tmp/page.img
```

## Summary of changes

Related Postgres PRs:
https://github.com/neondatabase/postgres/pull/575
https://github.com/neondatabase/postgres/pull/572

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-10 15:48:03 +00:00
Alex Chi Z.
b37f52fdf1 feat(pageserver): dump read path on missing key error (#10528)
## Problem

helps investigate https://github.com/neondatabase/neon/issues/10482

## Summary of changes

In debug mode and testing mode, we will record all files visited by a
read operation, and print it out when it errors.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-10 14:25:56 +00:00
Alex Chi Z.
443c8d0b4b feat(pageserver): repartition on L0-L1 boundary (#10548)
## Problem

Reduce the read amplification when doing `repartition`.

## Summary of changes

Compute the L0-L1 boundary LSN and do repartition here.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-10 14:25:48 +00:00
Alexander Bayandin
2f36bdb218 CI(build-neon): fix duplicated builds (#10731)
## Problem

Parameterising `build-neon` job with `test-cfg` makes it to build
exactly the same thing several times.

See
-
874accd6ed/.github/workflows/_build-and-test-locally.yml (L51-L52)
-
https://github.com/neondatabase/neon/actions/runs/13215068271/job/36893373038

## Summary of changes
- Extract `sanitizers` to a separate input from `test-cfg` and set it
separately
- Don't parametrise `build-neon` with `test-cfg`
2025-02-10 12:29:39 +00:00
Ivan Efremov
e7118213ab impr(proxy): Set TTL for Redis cancellation map keys (#10671)
Use expire() op to set TTL for Redis cancellation key
2025-02-10 10:51:53 +00:00
a-masterov
d204d51faf Fix the upgrade test for pg_jwt by adding the database name (#10738)
## Problem
The upgrade test for pg_jwt does not work correctly.
## Summary of changes
The script for the upgrade test is modified to use the database
`contrib_regression`.
2025-02-10 09:56:46 +00:00
26 changed files with 528 additions and 257 deletions

View File

@@ -20,9 +20,14 @@ on:
required: true
type: string
test-cfg:
description: 'a json object of postgres versions and lfc/sanitizers states to build and run regression tests on'
description: 'a json object of postgres versions and lfc states to run regression tests on'
required: true
type: string
sanitizers:
description: 'enabled or disabled'
required: false
default: 'disabled'
type: string
defaults:
run:
@@ -48,8 +53,6 @@ jobs:
# io_uring will account the memory of the CQ and SQ as locked.
# More details: https://github.com/neondatabase/neon/issues/6373#issuecomment-1905814391
options: --init --shm-size=512mb --ulimit memlock=67108864:67108864
strategy:
matrix: ${{ fromJSON(format('{{"include":{0}}}', inputs.test-cfg)) }}
env:
BUILD_TYPE: ${{ inputs.build-type }}
GIT_VERSION: ${{ github.event.pull_request.head.sha || github.sha }}
@@ -89,7 +92,7 @@ jobs:
- name: Set env variables
env:
ARCH: ${{ inputs.arch }}
SANITIZERS: ${{ matrix.sanitizers }}
SANITIZERS: ${{ inputs.sanitizers }}
run: |
CARGO_FEATURES="--features testing"
if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' ]]; then
@@ -167,7 +170,7 @@ jobs:
- name: Run cargo build
env:
WITH_TESTS: ${{ matrix.sanitizers != 'enabled' && '--tests' || '' }}
WITH_TESTS: ${{ inputs.sanitizers != 'enabled' && '--tests' || '' }}
run: |
export ASAN_OPTIONS=detect_leaks=0
${cov_prefix} mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins ${WITH_TESTS}
@@ -177,7 +180,7 @@ jobs:
- name: Install rust binaries
env:
ARCH: ${{ inputs.arch }}
SANITIZERS: ${{ matrix.sanitizers }}
SANITIZERS: ${{ inputs.sanitizers }}
run: |
# Install target binaries
mkdir -p /tmp/neon/bin/
@@ -225,7 +228,7 @@ jobs:
role-duration-seconds: 18000 # 5 hours
- name: Run rust tests
if: ${{ matrix.sanitizers != 'enabled' }}
if: ${{ inputs.sanitizers != 'enabled' }}
env:
NEXTEST_RETRIES: 3
run: |
@@ -334,7 +337,7 @@ jobs:
- name: Pytest regression tests
continue-on-error: ${{ matrix.lfc_state == 'with-lfc' && inputs.build-type == 'debug' }}
uses: ./.github/actions/run-python-test-set
timeout-minutes: ${{ matrix.sanitizers != 'enabled' && 60 || 180 }}
timeout-minutes: ${{ inputs.sanitizers != 'enabled' && 60 || 180 }}
with:
build_type: ${{ inputs.build-type }}
test_selection: regress
@@ -352,7 +355,7 @@ jobs:
PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring
PAGESERVER_GET_VECTORED_CONCURRENT_IO: sidecar-task
USE_LFC: ${{ matrix.lfc_state == 'with-lfc' && 'true' || 'false' }}
SANITIZERS: ${{ matrix.sanitizers }}
SANITIZERS: ${{ inputs.sanitizers }}
# Temporary disable this step until we figure out why it's so flaky
# Ref https://github.com/neondatabase/neon/issues/4540

View File

@@ -74,7 +74,8 @@ jobs:
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
build-tag: ${{ needs.tag.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
test-cfg: '[{"pg_version":"v17", "sanitizers": "enabled"}]'
test-cfg: '[{"pg_version":"v17"}]'
sanitizers: enabled
secrets: inherit

View File

@@ -1578,7 +1578,15 @@ ENV BUILD_TAG=$BUILD_TAG
USER nonroot
# Copy entire project to get Cargo.* files with proper dependencies for the whole project
COPY --chown=nonroot . .
RUN mold -run cargo build --locked --profile release-line-debug-size-lto --bin compute_ctl --bin fast_import --bin local_proxy
RUN --mount=type=cache,uid=1000,target=/home/nonroot/.cargo/registry \
--mount=type=cache,uid=1000,target=/home/nonroot/.cargo/git \
--mount=type=cache,uid=1000,target=/home/nonroot/target \
mold -run cargo build --locked --profile release-line-debug-size-lto --bin compute_ctl --bin fast_import --bin local_proxy && \
mkdir target-bin && \
cp target/release-line-debug-size-lto/compute_ctl \
target/release-line-debug-size-lto/fast_import \
target/release-line-debug-size-lto/local_proxy \
target-bin
#########################################################################################
#
@@ -1641,6 +1649,29 @@ RUN echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 30
&& echo "${pgbouncer_exporter_sha256} pgbouncer_exporter" | sha256sum -c -\
&& echo "${sql_exporter_sha256} sql_exporter" | sha256sum -c -
#########################################################################################
#
# Layer "awscli"
#
#########################################################################################
FROM alpine/curl:${ALPINE_CURL_VERSION} AS awscli
ARG TARGETARCH
RUN set -ex; \
if [ "${TARGETARCH}" = "amd64" ]; then \
TARGETARCH_ALT="x86_64"; \
CHECKSUM="c9a9df3770a3ff9259cb469b6179e02829687a464e0824d5c32d378820b53a00"; \
elif [ "${TARGETARCH}" = "arm64" ]; then \
TARGETARCH_ALT="aarch64"; \
CHECKSUM="8181730be7891582b38b028112e81b4899ca817e8c616aad807c9e9d1289223a"; \
else \
echo "Unsupported architecture: ${TARGETARCH}"; exit 1; \
fi; \
curl --retry 5 -L "https://awscli.amazonaws.com/awscli-exe-linux-${TARGETARCH_ALT}-2.17.5.zip" -o /tmp/awscliv2.zip; \
echo "${CHECKSUM} /tmp/awscliv2.zip" | sha256sum -c -; \
unzip /tmp/awscliv2.zip -d /tmp/awscliv2; \
/tmp/awscliv2/aws/install; \
rm -rf /tmp/awscliv2.zip /tmp/awscliv2
#########################################################################################
#
# Clean up postgres folder before inclusion
@@ -1754,16 +1785,19 @@ RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
# create folder for file cache
mkdir -p -m 777 /neon/cache
# aws cli is used by fast_import
COPY --from=awscli /usr/local/aws-cli /usr/local/aws-cli
COPY --from=postgres-cleanup-layer --chown=postgres /usr/local/pgsql /usr/local
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/fast_import /usr/local/bin/fast_import
COPY --from=compute-tools --chown=postgres /home/nonroot/target-bin/compute_ctl /usr/local/bin/compute_ctl
COPY --from=compute-tools --chown=postgres /home/nonroot/target-bin/fast_import /usr/local/bin/fast_import
# pgbouncer and its config
COPY --from=pgbouncer /usr/local/pgbouncer/bin/pgbouncer /usr/local/bin/pgbouncer
COPY --chmod=0666 --chown=postgres compute/etc/pgbouncer.ini /etc/pgbouncer.ini
# local_proxy and its config
COPY --from=compute-tools --chown=postgres /home/nonroot/target/release-line-debug-size-lto/local_proxy /usr/local/bin/local_proxy
COPY --from=compute-tools --chown=postgres /home/nonroot/target-bin/local_proxy /usr/local/bin/local_proxy
RUN mkdir -p /etc/local_proxy && chown postgres:postgres /etc/local_proxy
# Metrics exporter binaries and configuration files
@@ -1831,31 +1865,10 @@ RUN apt update && \
locales \
procps \
ca-certificates \
curl \
unzip \
$VERSION_INSTALLS && \
apt clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
# aws cli is used by fast_import (curl and unzip above are at this time only used for this installation step)
ARG TARGETARCH
RUN set -ex; \
if [ "${TARGETARCH}" = "amd64" ]; then \
TARGETARCH_ALT="x86_64"; \
CHECKSUM="c9a9df3770a3ff9259cb469b6179e02829687a464e0824d5c32d378820b53a00"; \
elif [ "${TARGETARCH}" = "arm64" ]; then \
TARGETARCH_ALT="aarch64"; \
CHECKSUM="8181730be7891582b38b028112e81b4899ca817e8c616aad807c9e9d1289223a"; \
else \
echo "Unsupported architecture: ${TARGETARCH}"; exit 1; \
fi; \
curl --retry 5 -L "https://awscli.amazonaws.com/awscli-exe-linux-${TARGETARCH_ALT}-2.17.5.zip" -o /tmp/awscliv2.zip; \
echo "${CHECKSUM} /tmp/awscliv2.zip" | sha256sum -c -; \
unzip /tmp/awscliv2.zip -d /tmp/awscliv2; \
/tmp/awscliv2/aws/install; \
rm -rf /tmp/awscliv2.zip /tmp/awscliv2; \
true
ENV LANG=en_US.utf8
USER postgres
ENTRYPOINT ["/usr/local/bin/compute_ctl"]

View File

@@ -41,6 +41,7 @@ use std::process::exit;
use std::str::FromStr;
use std::sync::atomic::Ordering;
use std::sync::{mpsc, Arc, Condvar, Mutex, RwLock};
use std::time::SystemTime;
use std::{thread, time::Duration};
use anyhow::{Context, Result};
@@ -85,6 +86,19 @@ fn parse_remote_ext_config(arg: &str) -> Result<String> {
}
}
/// Generate a compute ID if one is not supplied. This exists to keep forward
/// compatibility tests working, but will be removed in a future iteration.
fn generate_compute_id() -> String {
let now = SystemTime::now();
format!(
"compute-{}",
now.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs()
)
}
#[derive(Parser)]
#[command(rename_all = "kebab-case")]
struct Cli {
@@ -130,10 +144,10 @@ struct Cli {
#[arg(short = 'S', long, group = "spec-path")]
pub spec_path: Option<OsString>,
#[arg(short = 'i', long, group = "compute-id", conflicts_with_all = ["spec", "spec-path"])]
pub compute_id: Option<String>,
#[arg(short = 'i', long, group = "compute-id", default_value = generate_compute_id())]
pub compute_id: String,
#[arg(short = 'p', long, conflicts_with_all = ["spec", "spec-path"], requires = "compute-id", value_name = "CONTROL_PLANE_API_BASE_URL")]
#[arg(short = 'p', long, conflicts_with_all = ["spec", "spec-path"], value_name = "CONTROL_PLANE_API_BASE_URL")]
pub control_plane_uri: Option<String>,
}
@@ -259,20 +273,11 @@ fn try_spec_from_cli(cli: &Cli) -> Result<CliSpecParams> {
});
}
if cli.compute_id.is_none() {
panic!(
"compute spec should be provided by one of the following ways: \
--spec OR --spec-path OR --control-plane-uri and --compute-id"
);
};
if cli.control_plane_uri.is_none() {
panic!("must specify both --control-plane-uri and --compute-id or none");
panic!("must specify --control-plane-uri");
};
match get_spec_from_control_plane(
cli.control_plane_uri.as_ref().unwrap(),
cli.compute_id.as_ref().unwrap(),
) {
match get_spec_from_control_plane(cli.control_plane_uri.as_ref().unwrap(), &cli.compute_id) {
Ok(spec) => Ok(CliSpecParams {
spec,
live_config_allowed: true,
@@ -319,6 +324,7 @@ fn wait_spec(
let tokio_conn_conf = tokio_postgres::config::Config::from_str(connstr.as_str())
.context("cannot build tokio postgres config from connstr")?;
let compute_node = ComputeNode {
compute_id: cli.compute_id.clone(),
connstr,
conn_conf,
tokio_conn_conf,

View File

@@ -59,6 +59,8 @@ pub static PG_PID: AtomicU32 = AtomicU32::new(0);
/// Compute node info shared across several `compute_ctl` threads.
pub struct ComputeNode {
/// The ID of the compute
pub compute_id: String,
// Url type maintains proper escaping
pub connstr: url::Url,
// We connect to Postgres from many different places, so build configs once

View File

@@ -1,6 +1,7 @@
use anyhow::{anyhow, bail, Result};
use postgres::Client;
use reqwest::StatusCode;
use serde::Deserialize;
use std::fs::File;
use std::path::Path;
use tracing::{error, info, instrument, warn};
@@ -11,9 +12,26 @@ use crate::migration::MigrationRunner;
use crate::params::PG_HBA_ALL_MD5;
use crate::pg_helpers::*;
use compute_api::responses::{ControlPlaneComputeStatus, ControlPlaneSpecResponse};
use compute_api::spec::ComputeSpec;
/// Response of the `/computes/{compute_id}/spec` control-plane API.
#[derive(Deserialize, Debug)]
pub struct ControlPlaneSpecResponse {
pub spec: Option<ComputeSpec>,
pub status: ControlPlaneComputeStatus,
}
#[derive(Deserialize, Clone, Copy, Debug, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum ControlPlaneComputeStatus {
// Compute is known to control-plane, but it's not
// yet attached to any timeline / endpoint.
Empty,
// Compute is attached to some timeline / endpoint and
// should be able to start with provided spec.
Attached,
}
// Do control plane request and return response if any. In case of error it
// returns a bool flag indicating whether it makes sense to retry the request
// and a string with error message.

View File

@@ -665,6 +665,22 @@ impl Endpoint {
.to_str()
.unwrap(),
])
// TODO: It would be nice if we generated compute IDs with the same
// algorithm as the real control plane.
//
// TODO: Add this back when
// https://github.com/neondatabase/neon/pull/10747 is merged.
//
//.args([
// "--compute-id",
// &format!(
// "compute-{}",
// SystemTime::now()
// .duration_since(UNIX_EPOCH)
// .unwrap()
// .as_secs()
// ),
//])
.stdin(std::process::Stdio::null())
.stderr(logfile.try_clone()?)
.stdout(logfile);

View File

@@ -2,4 +2,4 @@
set -ex
cd "$(dirname ${0})"
patch -p1 <test-upgrade.patch
pg_prove test.sql
pg_prove -d contrib_regression test.sql

View File

@@ -7,7 +7,7 @@ use serde::{Deserialize, Serialize, Serializer};
use crate::{
privilege::Privilege,
spec::{ComputeSpec, Database, ExtVersion, PgIdent, Role},
spec::{Database, ExtVersion, PgIdent, Role},
};
#[derive(Serialize, Debug, Deserialize)]
@@ -135,26 +135,6 @@ pub struct CatalogObjects {
pub databases: Vec<Database>,
}
/// Response of the `/computes/{compute_id}/spec` control-plane API.
/// This is not actually a compute API response, so consider moving
/// to a different place.
#[derive(Deserialize, Debug)]
pub struct ControlPlaneSpecResponse {
pub spec: Option<ComputeSpec>,
pub status: ControlPlaneComputeStatus,
}
#[derive(Deserialize, Clone, Copy, Debug, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum ControlPlaneComputeStatus {
// Compute is known to control-plane, but it's not
// yet attached to any timeline / endpoint.
Empty,
// Compute is attached to some timeline / endpoint and
// should be able to start with provided spec.
Attached,
}
#[derive(Clone, Debug, Default, Serialize)]
pub struct InstalledExtension {
pub extname: String,

View File

@@ -122,6 +122,7 @@ pub struct ConfigToml {
pub wal_receiver_protocol: PostgresClientProtocol,
pub page_service_pipelining: PageServicePipeliningConfig,
pub get_vectored_concurrent_io: GetVectoredConcurrentIo,
pub enable_read_path_debugging: Option<bool>,
}
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
@@ -512,6 +513,11 @@ impl Default for ConfigToml {
} else {
GetVectoredConcurrentIo::SidecarTask
},
enable_read_path_debugging: if cfg!(test) || cfg!(feature = "testing") {
Some(true)
} else {
None
},
}
}
}

View File

@@ -1,4 +1,5 @@
use std::fmt::{Debug, Display};
use std::time::Duration;
use futures::Future;
use tokio_util::sync::CancellationToken;
@@ -29,6 +30,11 @@ pub async fn exponential_backoff(
}
}
pub fn exponential_backoff_duration(n: u32, base_increment: f64, max_seconds: f64) -> Duration {
let seconds = exponential_backoff_duration_seconds(n, base_increment, max_seconds);
Duration::from_secs_f64(seconds)
}
pub fn exponential_backoff_duration_seconds(n: u32, base_increment: f64, max_seconds: f64) -> f64 {
if n == 0 {
0.0

View File

@@ -197,6 +197,10 @@ pub struct PageServerConf {
pub page_service_pipelining: pageserver_api::config::PageServicePipeliningConfig,
pub get_vectored_concurrent_io: pageserver_api::config::GetVectoredConcurrentIo,
/// Enable read path debugging. If enabled, read key errors will print a backtrace of the layer
/// files read.
pub enable_read_path_debugging: bool,
}
/// Token for authentication to safekeepers
@@ -360,6 +364,7 @@ impl PageServerConf {
wal_receiver_protocol,
page_service_pipelining,
get_vectored_concurrent_io,
enable_read_path_debugging,
} = config_toml;
let mut conf = PageServerConf {
@@ -446,6 +451,7 @@ impl PageServerConf {
.unwrap_or_default(),
virtual_file_io_mode: virtual_file_io_mode.unwrap_or(virtual_file::IoMode::preferred()),
no_sync: no_sync.unwrap_or(false),
enable_read_path_debugging: enable_read_path_debugging.unwrap_or(false),
};
// ------------------------------------------------------------

View File

@@ -56,6 +56,7 @@ use timeline::CompactOptions;
use timeline::ShutdownMode;
use tokio::io::BufReader;
use tokio::sync::watch;
use tokio::sync::Notify;
use tokio::task::JoinSet;
use tokio_util::sync::CancellationToken;
use tracing::*;
@@ -350,6 +351,9 @@ pub struct Tenant {
/// Overhead of mutex is acceptable because compaction is done with a multi-second period.
compaction_circuit_breaker: std::sync::Mutex<CircuitBreaker>,
/// Signals the tenant compaction loop that there is L0 compaction work to be done.
pub(crate) l0_compaction_trigger: Arc<Notify>,
/// Scheduled gc-compaction tasks.
scheduled_compaction_tasks: std::sync::Mutex<HashMap<TimelineId, Arc<GcCompactionQueue>>>,
@@ -1691,12 +1695,7 @@ impl Tenant {
timeline_id,
index_part,
remote_metadata,
TimelineResources {
remote_client,
pagestream_throttle: self.pagestream_throttle.clone(),
pagestream_throttle_metrics: self.pagestream_throttle_metrics.clone(),
l0_flush_global_state: self.l0_flush_global_state.clone(),
},
self.get_timeline_resources_for(remote_client),
LoadTimelineCause::Attach,
ctx,
)
@@ -4112,6 +4111,7 @@ impl Tenant {
// use an extremely long backoff.
Some(Duration::from_secs(3600 * 24)),
)),
l0_compaction_trigger: Arc::new(Notify::new()),
scheduled_compaction_tasks: Mutex::new(Default::default()),
activate_now_sem: tokio::sync::Semaphore::new(0),
attach_wal_lag_cooldown: Arc::new(std::sync::OnceLock::new()),
@@ -5020,12 +5020,19 @@ impl Tenant {
)
}
/// Call this before constructing a timeline, to build its required structures
/// Builds required resources for a new timeline.
fn build_timeline_resources(&self, timeline_id: TimelineId) -> TimelineResources {
let remote_client = self.build_timeline_remote_client(timeline_id);
self.get_timeline_resources_for(remote_client)
}
/// Builds timeline resources for the given remote client.
fn get_timeline_resources_for(&self, remote_client: RemoteTimelineClient) -> TimelineResources {
TimelineResources {
remote_client: self.build_timeline_remote_client(timeline_id),
remote_client,
pagestream_throttle: self.pagestream_throttle.clone(),
pagestream_throttle_metrics: self.pagestream_throttle_metrics.clone(),
l0_compaction_trigger: self.l0_compaction_trigger.clone(),
l0_flush_global_state: self.l0_flush_global_state.clone(),
}
}
@@ -7698,6 +7705,18 @@ mod tests {
}
tline.freeze_and_flush().await?;
// Force layers to L1
tline
.compact(
&cancel,
{
let mut flags = EnumSet::new();
flags.insert(CompactFlags::ForceL0Compaction);
flags
},
&ctx,
)
.await?;
if iter % 5 == 0 {
let (_, before_delta_file_accessed) =
@@ -7710,6 +7729,7 @@ mod tests {
let mut flags = EnumSet::new();
flags.insert(CompactFlags::ForceImageLayerCreation);
flags.insert(CompactFlags::ForceRepartition);
flags.insert(CompactFlags::ForceL0Compaction);
flags
},
&ctx,
@@ -8156,6 +8176,8 @@ mod tests {
let cancel = CancellationToken::new();
// Image layer creation happens on the disk_consistent_lsn so we need to force set it now.
tline.force_set_disk_consistent_lsn(Lsn(0x40));
tline
.compact(
&cancel,
@@ -8169,8 +8191,7 @@ mod tests {
)
.await
.unwrap();
// Image layers are created at last_record_lsn
// Image layers are created at repartition LSN
let images = tline
.inspect_image_layers(Lsn(0x40), &ctx, io_concurrency.clone())
.await

View File

@@ -44,7 +44,7 @@ pub(crate) use layer::{EvictionError, Layer, ResidentLayer};
use self::inmemory_layer::InMemoryLayerFileId;
use super::timeline::GetVectoredError;
use super::timeline::{GetVectoredError, ReadPath};
use super::PageReconstructError;
pub fn range_overlaps<T>(a: &Range<T>, b: &Range<T>) -> bool
@@ -262,6 +262,8 @@ pub(crate) struct ValuesReconstructState {
pub(crate) io_concurrency: IoConcurrency,
num_active_ios: Arc<AtomicUsize>,
pub(crate) read_path: Option<ReadPath>,
}
/// The level of IO concurrency to be used on the read path
@@ -609,6 +611,7 @@ impl ValuesReconstructState {
delta_layers_visited: 0,
io_concurrency,
num_active_ios: Arc::new(AtomicUsize::new(0)),
read_path: None,
}
}

View File

@@ -22,9 +22,10 @@ use crate::tenant::timeline::compaction::CompactionOutcome;
use crate::tenant::timeline::CompactionError;
use crate::tenant::{Tenant, TenantState};
use pageserver_api::config::tenant_conf_defaults::DEFAULT_COMPACTION_PERIOD;
use utils::backoff::exponential_backoff_duration;
use utils::completion::Barrier;
use utils::pausable_failpoint;
use utils::rate_limit::RateLimit;
use utils::{backoff, pausable_failpoint};
/// Semaphore limiting concurrent background tasks (across all tenants).
///
@@ -211,89 +212,93 @@ pub fn start_background_loops(tenant: &Arc<Tenant>, can_start: Option<&Barrier>)
/// Compaction task's main loop.
async fn compaction_loop(tenant: Arc<Tenant>, cancel: CancellationToken) {
const BASE_BACKOFF_SECS: f64 = 1.0;
const MAX_BACKOFF_SECS: f64 = 300.0;
const RECHECK_CONFIG_INTERVAL: Duration = Duration::from_secs(10);
let ctx = RequestContext::todo_child(TaskKind::Compaction, DownloadBehavior::Download);
let mut first = true;
let mut period = tenant.get_compaction_period();
let mut error_run = 0; // consecutive errors
// Stagger the compaction loop across tenants.
if wait_for_active_tenant(&tenant, &cancel).await.is_break() {
return;
}
if sleep_random(period, &cancel).await.is_err() {
return;
}
loop {
// Recheck that we're still active.
if wait_for_active_tenant(&tenant, &cancel).await.is_break() {
return;
}
let period = tenant.get_compaction_period();
// TODO: we shouldn't need to await to find tenant and this could be moved outside of
// loop, #3501. There are also additional "allowed_errors" in tests.
if first {
first = false;
if sleep_random(period, &cancel).await.is_err() {
break;
}
}
let sleep_duration;
// Refresh the period. If compaction is disabled, check again in a bit.
period = tenant.get_compaction_period();
if period == Duration::ZERO {
#[cfg(not(feature = "testing"))]
info!("automatic compaction is disabled");
// check again in 10 seconds, in case it's been enabled again.
sleep_duration = Duration::from_secs(10)
} else {
let iteration = Iteration {
started_at: Instant::now(),
period,
kind: BackgroundLoopKind::Compaction,
};
tokio::select! {
_ = tokio::time::sleep(RECHECK_CONFIG_INTERVAL) => {},
_ = cancel.cancelled() => return,
}
continue;
}
// Run compaction
let IterationResult { output, elapsed } = iteration
.run(tenant.compaction_iteration(&cancel, &ctx))
.await;
match output {
Ok(outcome) => {
error_run = 0;
// schedule the next compaction immediately in case there is a pending compaction task
sleep_duration = if let CompactionOutcome::Pending = outcome {
Duration::from_secs(1)
} else {
period
};
}
Err(err) => {
let wait_duration = backoff::exponential_backoff_duration_seconds(
error_run + 1,
1.0,
MAX_BACKOFF_SECS,
);
error_run += 1;
let wait_duration = Duration::from_secs_f64(wait_duration);
log_compaction_error(&err, error_run, &wait_duration, cancel.is_cancelled());
sleep_duration = wait_duration;
// Wait for the next compaction run.
let backoff = exponential_backoff_duration(error_run, BASE_BACKOFF_SECS, MAX_BACKOFF_SECS);
tokio::select! {
_ = tokio::time::sleep(backoff), if error_run > 0 => {},
_ = tokio::time::sleep(period), if error_run == 0 => {},
_ = tenant.l0_compaction_trigger.notified(), if error_run == 0 => {},
_ = cancel.cancelled() => return,
}
// Run compaction.
let iteration = Iteration {
started_at: Instant::now(),
period,
kind: BackgroundLoopKind::Compaction,
};
let IterationResult { output, elapsed } = iteration
.run(tenant.compaction_iteration(&cancel, &ctx))
.await;
match output {
Ok(outcome) => {
error_run = 0;
// If there's more compaction work pending, reschedule immediately. This isn't
// necessarily L0 compaction, but that's fine for now.
//
// TODO: differentiate between L0 compaction and other compaction. The former needs
// to be responsive, the latter doesn't.
if outcome == CompactionOutcome::Pending {
tenant.l0_compaction_trigger.notify_one();
}
}
// the duration is recorded by performance tests by enabling debug in this function
debug!(
elapsed_ms = elapsed.as_millis(),
"compaction iteration complete"
);
};
// Sleep
if tokio::time::timeout(sleep_duration, cancel.cancelled())
.await
.is_ok()
{
break;
Err(err) => {
error_run += 1;
let backoff =
exponential_backoff_duration(error_run, BASE_BACKOFF_SECS, MAX_BACKOFF_SECS);
log_compaction_error(&err, error_run, backoff, cancel.is_cancelled());
continue;
}
}
// NB: this log entry is recorded by performance tests.
debug!(
elapsed_ms = elapsed.as_millis(),
"compaction iteration complete"
);
}
}
fn log_compaction_error(
err: &CompactionError,
error_count: u32,
sleep_duration: &Duration,
sleep_duration: Duration,
task_cancelled: bool,
) {
use crate::tenant::upload_queue::NotInitialized;
@@ -390,13 +395,9 @@ async fn gc_loop(tenant: Arc<Tenant>, cancel: CancellationToken) {
return;
}
Err(e) => {
let wait_duration = backoff::exponential_backoff_duration_seconds(
error_run + 1,
1.0,
MAX_BACKOFF_SECS,
);
error_run += 1;
let wait_duration = Duration::from_secs_f64(wait_duration);
let wait_duration =
exponential_backoff_duration(error_run, 1.0, MAX_BACKOFF_SECS);
if matches!(e, crate::tenant::GcError::TimelineCancelled) {
// Timeline was cancelled during gc. We might either be in an event

View File

@@ -45,11 +45,9 @@ use rand::Rng;
use remote_storage::DownloadError;
use serde_with::serde_as;
use storage_broker::BrokerClientChannel;
use tokio::runtime::Handle;
use tokio::sync::mpsc::Sender;
use tokio::{
runtime::Handle,
sync::{oneshot, watch},
};
use tokio::sync::{oneshot, watch, Notify};
use tokio_util::sync::CancellationToken;
use tracing::*;
use utils::critical;
@@ -227,6 +225,7 @@ pub struct TimelineResources {
pub remote_client: RemoteTimelineClient,
pub pagestream_throttle: Arc<crate::tenant::throttle::Throttle>,
pub pagestream_throttle_metrics: Arc<crate::metrics::tenant_throttling::Pagestream>,
pub l0_compaction_trigger: Arc<Notify>,
pub l0_flush_global_state: l0_flush::L0FlushGlobalState,
}
@@ -426,6 +425,9 @@ pub struct Timeline {
/// If true, the last compaction failed.
compaction_failed: AtomicBool,
/// Notifies the tenant compaction loop that there is pending L0 compaction work.
l0_compaction_trigger: Arc<Notify>,
/// Make sure we only have one running gc at a time.
///
/// Must only be taken in two places:
@@ -626,6 +628,71 @@ impl From<layer_manager::Shutdown> for GetVectoredError {
}
}
/// A layer identifier when used in the [`ReadPath`] structure. This enum is for observability purposes
/// only and not used by the "real read path".
pub enum ReadPathLayerId {
PersistentLayer(PersistentLayerKey),
InMemoryLayer(Range<Lsn>),
}
impl std::fmt::Display for ReadPathLayerId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
ReadPathLayerId::PersistentLayer(key) => write!(f, "{}", key),
ReadPathLayerId::InMemoryLayer(range) => {
write!(f, "in-mem {}..{}", range.start, range.end)
}
}
}
}
pub struct ReadPath {
keyspace: KeySpace,
lsn: Lsn,
path: Vec<(ReadPathLayerId, KeySpace, Range<Lsn>)>,
}
impl ReadPath {
pub fn new(keyspace: KeySpace, lsn: Lsn) -> Self {
Self {
keyspace,
lsn,
path: Vec::new(),
}
}
pub fn record_layer_visit(
&mut self,
layer_to_read: &ReadableLayer,
keyspace_to_read: &KeySpace,
lsn_range: &Range<Lsn>,
) {
let id = match layer_to_read {
ReadableLayer::PersistentLayer(layer) => {
ReadPathLayerId::PersistentLayer(layer.layer_desc().key())
}
ReadableLayer::InMemoryLayer(layer) => {
ReadPathLayerId::InMemoryLayer(layer.get_lsn_range())
}
};
self.path
.push((id, keyspace_to_read.clone(), lsn_range.clone()));
}
}
impl std::fmt::Display for ReadPath {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
writeln!(f, "Read path for {} at lsn {}:", self.keyspace, self.lsn)?;
for (idx, (layer_id, keyspace, lsn_range)) in self.path.iter().enumerate() {
writeln!(
f,
"{}: {} {}..{} {}",
idx, layer_id, lsn_range.start, lsn_range.end, keyspace
)?;
}
Ok(())
}
}
#[derive(thiserror::Error)]
pub struct MissingKeyError {
key: Key,
@@ -633,6 +700,8 @@ pub struct MissingKeyError {
cont_lsn: Lsn,
request_lsn: Lsn,
ancestor_lsn: Option<Lsn>,
/// Debug information about the read path if there's an error
read_path: Option<ReadPath>,
backtrace: Option<std::backtrace::Backtrace>,
}
@@ -649,10 +718,15 @@ impl std::fmt::Display for MissingKeyError {
"could not find data for key {} (shard {:?}) at LSN {}, request LSN {}",
self.key, self.shard, self.cont_lsn, self.request_lsn
)?;
if let Some(ref ancestor_lsn) = self.ancestor_lsn {
write!(f, ", ancestor {}", ancestor_lsn)?;
}
if let Some(ref read_path) = self.read_path {
write!(f, "\n{}", read_path)?;
}
if let Some(ref backtrace) = self.backtrace {
write!(f, "\n{}", backtrace)?;
}
@@ -1069,6 +1143,7 @@ impl Timeline {
request_lsn: lsn,
ancestor_lsn: None,
backtrace: None,
read_path: None,
})),
}
}
@@ -1195,6 +1270,13 @@ impl Timeline {
reconstruct_state: &mut ValuesReconstructState,
ctx: &RequestContext,
) -> Result<BTreeMap<Key, Result<Bytes, PageReconstructError>>, GetVectoredError> {
let read_path = if self.conf.enable_read_path_debugging {
Some(ReadPath::new(keyspace.clone(), lsn))
} else {
None
};
reconstruct_state.read_path = read_path;
let traversal_res: Result<(), _> = self
.get_vectored_reconstruct_data(keyspace.clone(), lsn, reconstruct_state, ctx)
.await;
@@ -2584,6 +2666,7 @@ impl Timeline {
compaction_lock: tokio::sync::Mutex::default(),
compaction_failed: AtomicBool::default(),
l0_compaction_trigger: resources.l0_compaction_trigger,
gc_lock: tokio::sync::Mutex::default(),
standby_horizon: AtomicLsn::new(0),
@@ -3504,6 +3587,7 @@ impl Timeline {
request_lsn,
ancestor_lsn: Some(timeline.ancestor_lsn),
backtrace: None,
read_path: std::mem::take(&mut reconstruct_state.read_path),
}));
}
@@ -3622,6 +3706,9 @@ impl Timeline {
}
if let Some((layer_to_read, keyspace_to_read, lsn_range)) = fringe.next_layer() {
if let Some(ref mut read_path) = reconstruct_state.read_path {
read_path.record_layer_visit(&layer_to_read, &keyspace_to_read, &lsn_range);
}
let next_cont_lsn = lsn_range.start;
layer_to_read
.get_values_reconstruct_data(
@@ -3922,6 +4009,12 @@ impl Timeline {
}
let flush_duration = flush_timer.stop_and_record();
// Notify the tenant compaction loop if L0 compaction is needed.
let l0_count = *watch_l0.borrow();
if l0_count >= self.get_compaction_threshold() {
self.l0_compaction_trigger.notify_one();
}
// Delay the next flush to backpressure if compaction can't keep up. We delay by the
// flush duration such that the flush takes 2x as long. This is propagated up to WAL
// ingestion by having ephemeral layer rolls wait for flushes.

View File

@@ -687,6 +687,20 @@ impl Timeline {
// Define partitioning schema if needed
let l0_l1_boundary_lsn = {
// We do the repartition on the L0-L1 boundary. All data below the boundary
// are compacted by L0 with low read amplification, thus making the `repartition`
// function run fast.
let guard = self.layers.read().await;
let l0_min_lsn = guard
.layer_map()?
.level0_deltas()
.iter()
.map(|l| l.get_lsn_range().start)
.min()
.unwrap_or(self.get_disk_consistent_lsn());
l0_min_lsn.max(self.get_ancestor_lsn())
};
// 1. L0 Compact
let l0_compaction_outcome = {
let timer = self.metrics.compact_time_histo.start_timer();
@@ -709,80 +723,87 @@ impl Timeline {
return Ok(CompactionOutcome::Pending);
}
// 2. Repartition and create image layers if necessary
let partition_count = match self
.repartition(
self.get_last_record_lsn(), // TODO: use L0-L1 boundary
self.get_compaction_target_size(),
options.flags,
ctx,
)
.await
{
Ok(((dense_partitioning, sparse_partitioning), lsn)) => {
// Disables access_stats updates, so that the files we read remain candidates for eviction after we're done with them
let image_ctx = RequestContextBuilder::extend(ctx)
.access_stats_behavior(AccessStatsBehavior::Skip)
.build();
if l0_l1_boundary_lsn < self.partitioning.read().1 {
// We never go backwards when repartition and create image layers.
info!("skipping image layer generation because repartition LSN is greater than L0-L1 boundary LSN.");
} else {
// 2. Repartition and create image layers if necessary
match self
.repartition(
l0_l1_boundary_lsn,
self.get_compaction_target_size(),
options.flags,
ctx,
)
.await
{
Ok(((dense_partitioning, sparse_partitioning), lsn)) => {
// Disables access_stats updates, so that the files we read remain candidates for eviction after we're done with them
let image_ctx = RequestContextBuilder::extend(ctx)
.access_stats_behavior(AccessStatsBehavior::Skip)
.build();
let mut partitioning = dense_partitioning;
partitioning
.parts
.extend(sparse_partitioning.into_dense().parts);
let mut partitioning = dense_partitioning;
partitioning
.parts
.extend(sparse_partitioning.into_dense().parts);
// 3. Create new image layers for partitions that have been modified "enough".
let (image_layers, outcome) = self
.create_image_layers(
&partitioning,
lsn,
if options
.flags
.contains(CompactFlags::ForceImageLayerCreation)
{
ImageLayerCreationMode::Force
} else {
ImageLayerCreationMode::Try
},
&image_ctx,
self.last_image_layer_creation_status
.load()
.as_ref()
.clone(),
)
.await
.inspect_err(|err| {
if let CreateImageLayersError::GetVectoredError(
GetVectoredError::MissingKey(_),
) = err
{
critical!("missing key during compaction: {err:?}");
}
})?;
// 3. Create new image layers for partitions that have been modified "enough".
let (image_layers, outcome) = self
.create_image_layers(
&partitioning,
lsn,
if options
.flags
.contains(CompactFlags::ForceImageLayerCreation)
{
ImageLayerCreationMode::Force
} else {
ImageLayerCreationMode::Try
},
&image_ctx,
self.last_image_layer_creation_status
.load()
.as_ref()
.clone(),
)
.await
.inspect_err(|err| {
if let CreateImageLayersError::GetVectoredError(
GetVectoredError::MissingKey(_),
) = err
{
critical!("missing key during compaction: {err:?}");
}
})?;
self.last_image_layer_creation_status
.store(Arc::new(outcome.clone()));
self.last_image_layer_creation_status
.store(Arc::new(outcome.clone()));
self.upload_new_image_layers(image_layers)?;
if let LastImageLayerCreationStatus::Incomplete { .. } = outcome {
// Yield and do not do any other kind of compaction.
info!("skipping shard ancestor compaction due to pending image layer generation tasks (preempted by L0 compaction).");
return Ok(CompactionOutcome::Pending);
self.upload_new_image_layers(image_layers)?;
if let LastImageLayerCreationStatus::Incomplete { .. } = outcome {
// Yield and do not do any other kind of compaction.
info!("skipping shard ancestor compaction due to pending image layer generation tasks (preempted by L0 compaction).");
return Ok(CompactionOutcome::Pending);
}
}
partitioning.parts.len()
}
Err(err) => {
// no partitioning? This is normal, if the timeline was just created
// as an empty timeline. Also in unit tests, when we use the timeline
// as a simple key-value store, ignoring the datadir layout. Log the
// error but continue.
//
// Suppress error when it's due to cancellation
if !self.cancel.is_cancelled() && !err.is_cancelled() {
tracing::error!("could not compact, repartitioning keyspace failed: {err:?}");
Err(err) => {
// no partitioning? This is normal, if the timeline was just created
// as an empty timeline. Also in unit tests, when we use the timeline
// as a simple key-value store, ignoring the datadir layout. Log the
// error but continue.
//
// Suppress error when it's due to cancellation
if !self.cancel.is_cancelled() && !err.is_cancelled() {
tracing::error!(
"could not compact, repartitioning keyspace failed: {err:?}"
);
}
}
1
}
};
};
}
let partition_count = self.partitioning.read().0 .0.parts.len();
// 4. Shard ancestor compaction
@@ -2238,8 +2259,11 @@ impl Timeline {
split_key_ranges.push((start, end));
}
split_key_ranges.sort();
let guard = self.layers.read().await;
let layer_map = guard.layer_map()?;
let all_layers = {
let guard = self.layers.read().await;
let layer_map = guard.layer_map()?;
layer_map.iter_historic_layers().collect_vec()
};
let mut current_start = None;
let ranges_num = split_key_ranges.len();
for (idx, (start, end)) in split_key_ranges.into_iter().enumerate() {
@@ -2251,14 +2275,23 @@ impl Timeline {
// We have already processed this partition.
continue;
}
let res = layer_map.range_search(start..end, compact_below_lsn);
let total_size = res.found.keys().map(|x| x.layer.file_size()).sum::<u64>();
let overlapping_layers = {
let mut desc = Vec::new();
for layer in all_layers.iter() {
if overlaps_with(&layer.get_key_range(), &(start..end))
&& layer.get_lsn_range().start <= compact_below_lsn
{
desc.push(layer.clone());
}
}
desc
};
let total_size = overlapping_layers.iter().map(|x| x.file_size).sum::<u64>();
if total_size > sub_compaction_max_job_size_mb * 1024 * 1024 || ranges_num == idx + 1 {
// Try to extend the compaction range so that we include at least one full layer file.
let extended_end = res
.found
.keys()
.map(|layer| layer.layer.key_range.end)
let extended_end = overlapping_layers
.iter()
.map(|layer| layer.key_range.end)
.min();
// It is possible that the search range does not contain any layer files when we reach the end of the loop.
// In this case, we simply use the specified key range end.
@@ -2285,7 +2318,6 @@ impl Timeline {
current_start = Some(end);
}
}
drop(guard);
Ok(compact_jobs)
}

View File

@@ -17,13 +17,11 @@ use crate::{
metadata::TimelineMetadata,
remote_timeline_client::{PersistIndexPartWithDeletedFlagError, RemoteTimelineClient},
CreateTimelineCause, DeleteTimelineError, MaybeDeletedIndexPart, Tenant,
TenantManifestError, TimelineOrOffloaded,
TenantManifestError, Timeline, TimelineOrOffloaded,
},
virtual_file::MaybeFatalIo,
};
use super::{Timeline, TimelineResources};
/// Mark timeline as deleted in S3 so we won't pick it up next time
/// during attach or pageserver restart.
/// See comment in persist_index_part_with_deleted_flag.
@@ -296,12 +294,7 @@ impl DeleteTimelineFlow {
timeline_id,
local_metadata,
None, // Ancestor is not needed for deletion.
TimelineResources {
remote_client,
pagestream_throttle: tenant.pagestream_throttle.clone(),
pagestream_throttle_metrics: tenant.pagestream_throttle_metrics.clone(),
l0_flush_global_state: tenant.l0_flush_global_state.clone(),
},
tenant.get_timeline_resources_for(remote_client),
// Important. We dont pass ancestor above because it can be missing.
// Thus we need to skip the validation here.
CreateTimelineCause::Delete,

View File

@@ -108,6 +108,10 @@ impl<T> Backend<'_, T> {
Self::Local(_) => panic!("Local backend has no API"),
}
}
pub(crate) fn is_local_proxy(&self) -> bool {
matches!(self, Self::Local(_))
}
}
impl<'a, T> Backend<'a, T> {

View File

@@ -69,17 +69,35 @@ pub async fn handle_cancel_messages(
value,
resp_tx,
_guard,
expire: _,
expire,
} => {
let res = client.hset(&key, field, value).await;
if let Some(resp_tx) = resp_tx {
resp_tx
.send(client.hset(key, field, value).await)
.inspect_err(|e| {
tracing::debug!("failed to send StoreCancelKey response: {:?}", e);
})
.ok();
if res.is_ok() {
resp_tx
.send(client.expire(key, expire).await)
.inspect_err(|e| {
tracing::debug!(
"failed to send StoreCancelKey response: {:?}",
e
);
})
.ok();
} else {
resp_tx
.send(res)
.inspect_err(|e| {
tracing::debug!(
"failed to send StoreCancelKey response: {:?}",
e
);
})
.ok();
}
} else if res.is_ok() {
drop(client.expire(key, expire).await);
} else {
drop(client.hset(key, field, value).await);
tracing::warn!("failed to store cancel key: {:?}", res);
}
}
CancelKeyOp::GetCancelData {
@@ -436,7 +454,7 @@ impl Session {
&self.key
}
// Send the store key op to the cancellation handler
// Send the store key op to the cancellation handler and set TTL for the key
pub(crate) async fn write_cancel_key(
&self,
cancel_closure: CancelClosure,

View File

@@ -400,9 +400,9 @@ fn create_random_jwk() -> (SigningKey, jose_jwk::Key) {
pub(crate) enum HttpConnError {
#[error("pooled connection closed at inconsistent state")]
ConnectionClosedAbruptly(#[from] tokio::sync::watch::error::SendError<uuid::Uuid>),
#[error("could not connection to postgres in compute")]
#[error("could not connect to postgres in compute")]
PostgresConnectionError(#[from] postgres_client::Error),
#[error("could not connection to local-proxy in compute")]
#[error("could not connect to local-proxy in compute")]
LocalProxyConnectionError(#[from] LocalProxyConnError),
#[error("could not parse JWT payload")]
JwtPayloadError(serde_json::Error),

View File

@@ -11,10 +11,12 @@ use http_body_util::{BodyExt, Full};
use hyper::body::Incoming;
use hyper::http::{HeaderName, HeaderValue};
use hyper::{header, HeaderMap, Request, Response, StatusCode};
use indexmap::IndexMap;
use postgres_client::error::{DbError, ErrorPosition, SqlState};
use postgres_client::{GenericClient, IsolationLevel, NoTls, ReadyForQueryStatus, Transaction};
use pq_proto::StartupMessageParamsBuilder;
use serde::Serialize;
use serde_json::value::RawValue;
use serde_json::Value;
use tokio::time::{self, Instant};
use tokio_util::sync::CancellationToken;
@@ -249,6 +251,50 @@ pub(crate) async fn handle(
let mut response = match result {
Ok(r) => {
ctx.set_success();
// Handling the error response from local proxy here
if config.authentication_config.is_auth_broker && r.status().is_server_error() {
let status = r.status();
let body_bytes = r
.collect()
.await
.map_err(|e| {
ApiError::InternalServerError(anyhow::Error::msg(format!(
"could not collect http body: {e}"
)))
})?
.to_bytes();
if let Ok(mut json_map) =
serde_json::from_slice::<IndexMap<&str, &RawValue>>(&body_bytes)
{
let message = json_map.get("message");
if let Some(message) = message {
let msg: String = match serde_json::from_str(message.get()) {
Ok(msg) => msg,
Err(_) => {
"Unable to parse the response message from server".to_string()
}
};
error!("Error response from local_proxy: {status} {msg}");
json_map.retain(|key, _| !key.starts_with("neon:")); // remove all the neon-related keys
let resp_json = serde_json::to_string(&json_map)
.unwrap_or("failed to serialize the response message".to_string());
return json_response(status, resp_json);
}
}
error!("Unable to parse the response message from local_proxy");
return json_response(
status,
json!({ "message": "Unable to parse the response message from server".to_string() }),
);
}
r
}
Err(e @ SqlOverHttpError::Cancelled(_)) => {
@@ -618,8 +664,6 @@ async fn handle_db_inner(
let authenticate_and_connect = Box::pin(
async {
let is_local_proxy = matches!(backend.auth_backend, crate::auth::Backend::Local(_));
let keys = match auth {
AuthData::Password(pw) => {
backend
@@ -634,7 +678,9 @@ async fn handle_db_inner(
};
let client = match keys.keys {
ComputeCredentialKeys::JwtPayload(payload) if is_local_proxy => {
ComputeCredentialKeys::JwtPayload(payload)
if backend.auth_backend.is_local_proxy() =>
{
let mut client = backend.connect_to_local_postgres(ctx, conn_info).await?;
let (cli_inner, _dsc) = client.client_inner();
cli_inner.set_jwt_session(&payload).await?;

View File

@@ -20,6 +20,9 @@ from fixtures.remote_storage import LocalFsStorage, RemoteStorageKind
from fixtures.utils import query_scalar, wait_until
@pytest.mark.skip(
reason="We won't create future layers any more after https://github.com/neondatabase/neon/pull/10548"
)
@pytest.mark.parametrize(
"attach_mode",
["default_generation", "same_generation"],

View File

@@ -1,11 +1,11 @@
{
"v17": [
"17.2",
"8dfd5a7030d3e8a98b60265ebe045788892ac7f3"
"4c45d78ad587e4bcb4a5a7ef6931b88c6a3d575d"
],
"v16": [
"16.6",
"86d9ea96ebb9088eac62f57f1f5ace68e70e0d1c"
"13cf5d06c98a8e9b0590ce6cdfd193a08d0a7792"
],
"v15": [
"15.10",