mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-05 20:42:54 +00:00
pageserver config: ignore+warn about unknown fields (instead of deny_unknown_fields) (#11275)
# Refs - refs https://github.com/neondatabase/neon/issues/8915 - discussion thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1742406381132599 - stacked atop https://github.com/neondatabase/neon/pull/11298 - corresponding internal docs update that illustrates how this PR removes friction: https://github.com/neondatabase/docs/pull/404 # Problem Rejecting `pageserver.toml`s with unknown fields adds friction, especially when using `pageserver.toml` fields as feature flags that need to be decommissioned. See the added paragraphs on `pageserver_api::models::ConfigToml` for details on what kind of friction it causes. Also read the corresponding internal docs update linked above to see a more imperative guide for using `pageserver.toml` flags as feature flags. # Solution ## Ignoring unknown fields Ignoring is the serde default behavior. So, just remove `serde(deny_unknown_fields)` from all structs in `pageserver_api::config::ConfigToml` `pageserver_api::config::TenantConfigToml`. I went through all the child fields and verified they don't use `deny_unknown_fields` either, including those shared with `pageserver_api::models`. ## Warning about unknown fields We still want to warn about unknown fields to - be informed about typos in the config template - be reminded about feature-flag style configs that have been cleaned up in code but not yet in config templates We tried `serde_ignore` (cf draft #11319) but it doesn't work with `serde(flatten)`. The solution we arrived at is to compare the on-disk TOML with the TOML that we produce if we serialize the `ConfigToml` again. Any key specified in the on-disk TOML but not present in the serialized TOML is flagged as an ignored key. The mechanism to do it is a tiny recursive decent visitor on the `toml_edit::DocumentMut`. # Future Work Invalid config _values_ in known fields will continue to fail pageserver startup. See - https://github.com/neondatabase/cloud/issues/24349 for current worst case impact to deployments & ideas to improve.
This commit is contained in:
committed by
GitHub
parent
6ee84d985a
commit
4f94751b75
@@ -51,9 +51,54 @@ pub struct NodeMetadata {
|
||||
/// If there cannot be a static default value because we need to make runtime
|
||||
/// checks to determine the default, make it an `Option` (which defaults to None).
|
||||
/// The runtime check should be done in the consuming crate, i.e., `pageserver`.
|
||||
///
|
||||
/// Unknown fields are silently ignored during deserialization.
|
||||
/// The alternative, which we used in the past, was to set `deny_unknown_fields`,
|
||||
/// which fails deserialization, and hence pageserver startup, if there is an unknown field.
|
||||
/// The reason we don't do that anymore is that it complicates
|
||||
/// usage of config fields for feature flagging, which we commonly do for
|
||||
/// region-by-region rollouts.
|
||||
/// The complications mainly arise because the `pageserver.toml` contents on a
|
||||
/// prod server have a separate lifecycle from the pageserver binary.
|
||||
/// For instance, `pageserver.toml` contents today are defined in the internal
|
||||
/// infra repo, and thus introducing a new config field to pageserver and
|
||||
/// rolling it out to prod servers are separate commits in separate repos
|
||||
/// that can't be made or rolled back atomically.
|
||||
/// Rollbacks in particular pose a risk with deny_unknown_fields because
|
||||
/// the old pageserver binary may reject a new config field, resulting in
|
||||
/// an outage unless the person doing the pageserver rollback remembers
|
||||
/// to also revert the commit that added the config field in to the
|
||||
/// `pageserver.toml` templates in the internal infra repo.
|
||||
/// (A pre-deploy config check would eliminate this risk during rollbacks,
|
||||
/// cf [here](https://github.com/neondatabase/cloud/issues/24349).)
|
||||
/// In addition to this compatibility problem during emergency rollbacks,
|
||||
/// deny_unknown_fields adds further complications when decomissioning a feature
|
||||
/// flag: with deny_unknown_fields, we can't remove a flag from the [`ConfigToml`]
|
||||
/// until all prod servers' `pageserver.toml` files have been updated to a version
|
||||
/// that doesn't specify the flag. Otherwise new software would fail to start up.
|
||||
/// This adds the requirement for an intermediate step where the new config field
|
||||
/// is accepted but ignored, prolonging the decomissioning process by an entire
|
||||
/// release cycle.
|
||||
/// By contrast with unknown fields silently ignored, decomissioning a feature
|
||||
/// flag is a one-step process: we can skip the intermediate step and straight
|
||||
/// remove the field from the [`ConfigToml`]. We leave the field in the
|
||||
/// `pageserver.toml` files on prod servers until we reach certainty that we
|
||||
/// will not roll back to old software whose behavior was dependent on config.
|
||||
/// Then we can remove the field from the templates in the internal infra repo.
|
||||
/// This process is [documented internally](
|
||||
/// https://docs.neon.build/storage/pageserver_configuration.html).
|
||||
///
|
||||
/// Note that above relaxed compatbility for the config format does NOT APPLY
|
||||
/// TO THE STORAGE FORMAT. As general guidance, when introducing storage format
|
||||
/// changes, ensure that the potential rollback target version will be compatible
|
||||
/// with the new format. This must hold regardless of what flags are set in in the `pageserver.toml`:
|
||||
/// any format version that exists in an environment must be compatible with the software that runs there.
|
||||
/// Use a pageserver.toml flag only to gate whether software _writes_ the new format.
|
||||
/// For more compatibility considerations, refer to [internal docs](
|
||||
/// https://docs.neon.build/storage/compat.html?highlight=compat#format-versions--compatibility)
|
||||
#[serde_as]
|
||||
#[derive(Clone, Debug, serde::Deserialize, serde::Serialize)]
|
||||
#[serde(default, deny_unknown_fields)]
|
||||
#[serde(default)]
|
||||
pub struct ConfigToml {
|
||||
// types mapped 1:1 into the runtime PageServerConfig type
|
||||
pub listen_pg_addr: String,
|
||||
@@ -138,7 +183,6 @@ pub struct ConfigToml {
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
|
||||
#[serde(deny_unknown_fields)]
|
||||
pub struct DiskUsageEvictionTaskConfig {
|
||||
pub max_usage_pct: utils::serde_percent::Percent,
|
||||
pub min_avail_bytes: u64,
|
||||
@@ -153,13 +197,11 @@ pub struct DiskUsageEvictionTaskConfig {
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
|
||||
#[serde(tag = "mode", rename_all = "kebab-case")]
|
||||
#[serde(deny_unknown_fields)]
|
||||
pub enum PageServicePipeliningConfig {
|
||||
Serial,
|
||||
Pipelined(PageServicePipeliningConfigPipelined),
|
||||
}
|
||||
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
|
||||
#[serde(deny_unknown_fields)]
|
||||
pub struct PageServicePipeliningConfigPipelined {
|
||||
/// Causes runtime errors if larger than max get_vectored batch size.
|
||||
pub max_batch_size: NonZeroUsize,
|
||||
@@ -175,7 +217,6 @@ pub enum PageServiceProtocolPipelinedExecutionStrategy {
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
|
||||
#[serde(tag = "mode", rename_all = "kebab-case")]
|
||||
#[serde(deny_unknown_fields)]
|
||||
pub enum GetVectoredConcurrentIo {
|
||||
/// The read path is fully sequential: layers are visited
|
||||
/// one after the other and IOs are issued and waited upon
|
||||
@@ -294,7 +335,7 @@ pub struct MaxVectoredReadBytes(pub NonZeroUsize);
|
||||
|
||||
/// Tenant-level configuration values, used for various purposes.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
|
||||
#[serde(deny_unknown_fields, default)]
|
||||
#[serde(default)]
|
||||
pub struct TenantConfigToml {
|
||||
// Flush out an inmemory layer, if it's holding WAL older than this
|
||||
// This puts a backstop on how much WAL needs to be re-digested if the
|
||||
|
||||
@@ -1104,7 +1104,7 @@ pub struct CompactionAlgorithmSettings {
|
||||
}
|
||||
|
||||
#[derive(Debug, PartialEq, Eq, Clone, Deserialize, Serialize)]
|
||||
#[serde(tag = "mode", rename_all = "kebab-case", deny_unknown_fields)]
|
||||
#[serde(tag = "mode", rename_all = "kebab-case")]
|
||||
pub enum L0FlushConfig {
|
||||
#[serde(rename_all = "snake_case")]
|
||||
Direct { max_concurrency: NonZeroUsize },
|
||||
|
||||
@@ -16,7 +16,7 @@ use http_utils::tls_certs::ReloadingCertificateResolver;
|
||||
use metrics::launch_timestamp::{LaunchTimestamp, set_launch_timestamp_metric};
|
||||
use metrics::set_build_info_metric;
|
||||
use nix::sys::socket::{setsockopt, sockopt};
|
||||
use pageserver::config::{PageServerConf, PageserverIdentity};
|
||||
use pageserver::config::{PageServerConf, PageserverIdentity, ignored_fields};
|
||||
use pageserver::controller_upcall_client::StorageControllerUpcallClient;
|
||||
use pageserver::deletion_queue::DeletionQueue;
|
||||
use pageserver::disk_usage_eviction_task::{self, launch_disk_usage_global_eviction_task};
|
||||
@@ -98,7 +98,7 @@ fn main() -> anyhow::Result<()> {
|
||||
env::set_current_dir(&workdir)
|
||||
.with_context(|| format!("Failed to set application's current dir to '{workdir}'"))?;
|
||||
|
||||
let conf = initialize_config(&identity_file_path, &cfg_file_path, &workdir)?;
|
||||
let (conf, ignored) = initialize_config(&identity_file_path, &cfg_file_path, &workdir)?;
|
||||
|
||||
// Initialize logging.
|
||||
//
|
||||
@@ -144,7 +144,17 @@ fn main() -> anyhow::Result<()> {
|
||||
&[("node_id", &conf.id.to_string())],
|
||||
);
|
||||
|
||||
// after setting up logging, log the effective IO engine choice and read path implementations
|
||||
// Warn about ignored config items; see pageserver_api::config::ConfigToml
|
||||
// doc comment for rationale why we prefer this over serde(deny_unknown_fields).
|
||||
{
|
||||
let ignored_fields::Paths { paths } = &ignored;
|
||||
for path in paths {
|
||||
warn!(?path, "ignoring unknown configuration item");
|
||||
}
|
||||
}
|
||||
|
||||
// Log configuration items for feature-flag-like config
|
||||
// (maybe we should automate this with a visitor?).
|
||||
info!(?conf.virtual_file_io_engine, "starting with virtual_file IO engine");
|
||||
info!(?conf.virtual_file_io_mode, "starting with virtual_file IO mode");
|
||||
info!(?conf.wal_receiver_protocol, "starting with WAL receiver protocol");
|
||||
@@ -207,7 +217,7 @@ fn main() -> anyhow::Result<()> {
|
||||
tracing::info!("Initializing page_cache...");
|
||||
page_cache::init(conf.page_cache_size);
|
||||
|
||||
start_pageserver(launch_ts, conf, otel_guard).context("Failed to start pageserver")?;
|
||||
start_pageserver(launch_ts, conf, ignored, otel_guard).context("Failed to start pageserver")?;
|
||||
|
||||
scenario.teardown();
|
||||
Ok(())
|
||||
@@ -217,7 +227,7 @@ fn initialize_config(
|
||||
identity_file_path: &Utf8Path,
|
||||
cfg_file_path: &Utf8Path,
|
||||
workdir: &Utf8Path,
|
||||
) -> anyhow::Result<&'static PageServerConf> {
|
||||
) -> anyhow::Result<(&'static PageServerConf, ignored_fields::Paths)> {
|
||||
// The deployment orchestrator writes out an indentity file containing the node id
|
||||
// for all pageservers. This file is the source of truth for the node id. In order
|
||||
// to allow for rolling back pageserver releases, the node id is also included in
|
||||
@@ -246,16 +256,36 @@ fn initialize_config(
|
||||
|
||||
let config_file_contents =
|
||||
std::fs::read_to_string(cfg_file_path).context("read config file from filesystem")?;
|
||||
let config_toml = serde_path_to_error::deserialize(
|
||||
toml_edit::de::Deserializer::from_str(&config_file_contents)
|
||||
.context("build toml deserializer")?,
|
||||
)
|
||||
.context("deserialize config toml")?;
|
||||
|
||||
// Deserialize the config file contents into a ConfigToml.
|
||||
let config_toml: pageserver_api::config::ConfigToml = {
|
||||
let deserializer = toml_edit::de::Deserializer::from_str(&config_file_contents)
|
||||
.context("build toml deserializer")?;
|
||||
let mut path_to_error_track = serde_path_to_error::Track::new();
|
||||
let deserializer =
|
||||
serde_path_to_error::Deserializer::new(deserializer, &mut path_to_error_track);
|
||||
serde::Deserialize::deserialize(deserializer).context("deserialize config toml")?
|
||||
};
|
||||
|
||||
// Find unknown fields by re-serializing the parsed ConfigToml and comparing it to the on-disk file.
|
||||
// Any fields that are only in the on-disk version are unknown.
|
||||
// (The assumption here is that the ConfigToml doesn't to skip_serializing_if.)
|
||||
// (Make sure to read the ConfigToml doc comment on why we only want to warn about, but not fail startup, on unknown fields).
|
||||
let ignored = {
|
||||
let ondisk_toml = config_file_contents
|
||||
.parse::<toml_edit::DocumentMut>()
|
||||
.context("parse original config as toml document")?;
|
||||
let parsed_toml = toml_edit::ser::to_document(&config_toml)
|
||||
.context("re-serialize config to toml document")?;
|
||||
pageserver::config::ignored_fields::find(ondisk_toml, parsed_toml)
|
||||
};
|
||||
|
||||
// Construct the runtime god object (it's called PageServerConf but actually is just global shared state).
|
||||
let conf = PageServerConf::parse_and_validate(identity.id, config_toml, workdir)
|
||||
.context("runtime-validation of config toml")?;
|
||||
let conf = Box::leak(Box::new(conf));
|
||||
|
||||
Ok(Box::leak(Box::new(conf)))
|
||||
Ok((conf, ignored))
|
||||
}
|
||||
|
||||
struct WaitForPhaseResult<F: std::future::Future + Unpin> {
|
||||
@@ -306,6 +336,7 @@ fn startup_checkpoint(started_at: Instant, phase: &str, human_phase: &str) {
|
||||
fn start_pageserver(
|
||||
launch_ts: &'static LaunchTimestamp,
|
||||
conf: &'static PageServerConf,
|
||||
ignored: ignored_fields::Paths,
|
||||
otel_guard: Option<OtelGuard>,
|
||||
) -> anyhow::Result<()> {
|
||||
// Monotonic time for later calculating startup duration
|
||||
@@ -329,7 +360,7 @@ fn start_pageserver(
|
||||
pageserver::metrics::tokio_epoll_uring::Collector::new(),
|
||||
))
|
||||
.unwrap();
|
||||
pageserver::preinitialize_metrics(conf);
|
||||
pageserver::preinitialize_metrics(conf, ignored);
|
||||
|
||||
// If any failpoints were set from FAILPOINTS environment variable,
|
||||
// print them to the log for debugging purposes
|
||||
|
||||
@@ -4,6 +4,8 @@
|
||||
//! file, or on the command line.
|
||||
//! See also `settings.md` for better description on every parameter.
|
||||
|
||||
pub mod ignored_fields;
|
||||
|
||||
use std::env;
|
||||
use std::num::NonZeroUsize;
|
||||
use std::sync::Arc;
|
||||
@@ -560,7 +562,6 @@ impl PageServerConf {
|
||||
}
|
||||
|
||||
#[derive(serde::Deserialize, serde::Serialize)]
|
||||
#[serde(deny_unknown_fields)]
|
||||
pub struct PageserverIdentity {
|
||||
pub id: NodeId,
|
||||
}
|
||||
@@ -632,82 +633,4 @@ mod tests {
|
||||
PageServerConf::parse_and_validate(NodeId(0), config_toml, &workdir)
|
||||
.expect("parse_and_validate");
|
||||
}
|
||||
|
||||
/// If there's a typo in the pageserver config, we'd rather catch that typo
|
||||
/// and fail pageserver startup than silently ignoring the typo, leaving whoever
|
||||
/// made it in the believe that their config change is effective.
|
||||
///
|
||||
/// The default in serde is to allow unknown fields, so, we rely
|
||||
/// on developer+review discipline to add `deny_unknown_fields` when adding
|
||||
/// new structs to the config, and these tests here as a regression test.
|
||||
///
|
||||
/// The alternative to all of this would be to allow unknown fields in the config.
|
||||
/// To catch them, we could have a config check tool or mgmt API endpoint that
|
||||
/// compares the effective config with the TOML on disk and makes sure that
|
||||
/// the on-disk TOML is a strict subset of the effective config.
|
||||
mod unknown_fields_handling {
|
||||
macro_rules! test {
|
||||
($short_name:ident, $input:expr) => {
|
||||
#[test]
|
||||
fn $short_name() {
|
||||
let input = $input;
|
||||
let err = toml_edit::de::from_str::<pageserver_api::config::ConfigToml>(&input)
|
||||
.expect_err("some_invalid_field is an invalid field");
|
||||
dbg!(&err);
|
||||
assert!(err.to_string().contains("some_invalid_field"));
|
||||
}
|
||||
};
|
||||
}
|
||||
use indoc::indoc;
|
||||
|
||||
test!(
|
||||
toplevel,
|
||||
indoc! {r#"
|
||||
some_invalid_field = 23
|
||||
"#}
|
||||
);
|
||||
|
||||
test!(
|
||||
toplevel_nested,
|
||||
indoc! {r#"
|
||||
[some_invalid_field]
|
||||
foo = 23
|
||||
"#}
|
||||
);
|
||||
|
||||
test!(
|
||||
disk_usage_based_eviction,
|
||||
indoc! {r#"
|
||||
[disk_usage_based_eviction]
|
||||
some_invalid_field = 23
|
||||
"#}
|
||||
);
|
||||
|
||||
test!(
|
||||
tenant_config,
|
||||
indoc! {r#"
|
||||
[tenant_config]
|
||||
some_invalid_field = 23
|
||||
"#}
|
||||
);
|
||||
|
||||
test!(
|
||||
l0_flush,
|
||||
indoc! {r#"
|
||||
[l0_flush]
|
||||
mode = "direct"
|
||||
some_invalid_field = 23
|
||||
"#}
|
||||
);
|
||||
|
||||
// TODO: fix this => https://github.com/neondatabase/neon/issues/8915
|
||||
// test!(
|
||||
// remote_storage_config,
|
||||
// indoc! {r#"
|
||||
// [remote_storage_config]
|
||||
// local_path = "/nonexistent"
|
||||
// some_invalid_field = 23
|
||||
// "#}
|
||||
// );
|
||||
}
|
||||
}
|
||||
|
||||
179
pageserver/src/config/ignored_fields.rs
Normal file
179
pageserver/src/config/ignored_fields.rs
Normal file
@@ -0,0 +1,179 @@
|
||||
//! Check for fields in the on-disk config file that were ignored when
|
||||
//! deserializing [`pageserver_api::config::ConfigToml`].
|
||||
//!
|
||||
//! This could have been part of the [`pageserver_api::config`] module,
|
||||
//! but the way we identify unused fields in this module
|
||||
//! is specific to the format (TOML) and the implementation of the
|
||||
//! deserialization for that format ([`toml_edit`]).
|
||||
|
||||
use std::collections::HashSet;
|
||||
|
||||
use itertools::Itertools;
|
||||
|
||||
/// Pass in the user-specified config and the re-serialized [`pageserver_api::config::ConfigToml`].
|
||||
/// The returned [`Paths`] contains the paths to the fields that were ignored by deserialization
|
||||
/// of the [`pageserver_api::config::ConfigToml`].
|
||||
pub fn find(user_specified: toml_edit::DocumentMut, reserialized: toml_edit::DocumentMut) -> Paths {
|
||||
let user_specified = paths(user_specified);
|
||||
let reserialized = paths(reserialized);
|
||||
fn paths(doc: toml_edit::DocumentMut) -> HashSet<String> {
|
||||
let mut out = Vec::new();
|
||||
let mut visitor = PathsVisitor::new(&mut out);
|
||||
visitor.visit_table_like(doc.as_table());
|
||||
HashSet::from_iter(out)
|
||||
}
|
||||
|
||||
let mut ignored = HashSet::new();
|
||||
|
||||
// O(n) because of HashSet
|
||||
for path in user_specified {
|
||||
if !reserialized.contains(&path) {
|
||||
ignored.insert(path);
|
||||
}
|
||||
}
|
||||
|
||||
Paths {
|
||||
paths: ignored
|
||||
.into_iter()
|
||||
// sort lexicographically for deterministic output
|
||||
.sorted()
|
||||
.collect(),
|
||||
}
|
||||
}
|
||||
|
||||
pub struct Paths {
|
||||
pub paths: Vec<String>,
|
||||
}
|
||||
|
||||
struct PathsVisitor<'a> {
|
||||
stack: Vec<String>,
|
||||
out: &'a mut Vec<String>,
|
||||
}
|
||||
|
||||
impl<'a> PathsVisitor<'a> {
|
||||
fn new(out: &'a mut Vec<String>) -> Self {
|
||||
Self {
|
||||
stack: Vec::new(),
|
||||
out,
|
||||
}
|
||||
}
|
||||
|
||||
fn visit_table_like(&mut self, table_like: &dyn toml_edit::TableLike) {
|
||||
for (entry, item) in table_like.iter() {
|
||||
self.stack.push(entry.to_string());
|
||||
self.visit_item(item);
|
||||
self.stack.pop();
|
||||
}
|
||||
}
|
||||
|
||||
fn visit_item(&mut self, item: &toml_edit::Item) {
|
||||
match item {
|
||||
toml_edit::Item::None => (),
|
||||
toml_edit::Item::Value(value) => self.visit_value(value),
|
||||
toml_edit::Item::Table(table) => {
|
||||
self.visit_table_like(table);
|
||||
}
|
||||
toml_edit::Item::ArrayOfTables(array_of_tables) => {
|
||||
for (i, table) in array_of_tables.iter().enumerate() {
|
||||
self.stack.push(format!("[{i}]"));
|
||||
self.visit_table_like(table);
|
||||
self.stack.pop();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn visit_value(&mut self, value: &toml_edit::Value) {
|
||||
match value {
|
||||
toml_edit::Value::String(_)
|
||||
| toml_edit::Value::Integer(_)
|
||||
| toml_edit::Value::Float(_)
|
||||
| toml_edit::Value::Boolean(_)
|
||||
| toml_edit::Value::Datetime(_) => self.out.push(self.stack.join(".")),
|
||||
toml_edit::Value::Array(array) => {
|
||||
for (i, value) in array.iter().enumerate() {
|
||||
self.stack.push(format!("[{i}]"));
|
||||
self.visit_value(value);
|
||||
self.stack.pop();
|
||||
}
|
||||
}
|
||||
toml_edit::Value::InlineTable(inline_table) => self.visit_table_like(inline_table),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
pub(crate) mod tests {
|
||||
|
||||
fn test_impl(original: &str, parsed: &str, expect: [&str; 1]) {
|
||||
let original: toml_edit::DocumentMut = original.parse().expect("parse original config");
|
||||
let parsed: toml_edit::DocumentMut = parsed.parse().expect("parse re-serialized config");
|
||||
|
||||
let super::Paths { paths: actual } = super::find(original, parsed);
|
||||
assert_eq!(actual, &expect);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn top_level() {
|
||||
test_impl(
|
||||
r#"
|
||||
[a]
|
||||
b = 1
|
||||
c = 2
|
||||
d = 3
|
||||
"#,
|
||||
r#"
|
||||
[a]
|
||||
b = 1
|
||||
c = 2
|
||||
"#,
|
||||
["a.d"],
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn nested() {
|
||||
test_impl(
|
||||
r#"
|
||||
[a.b.c]
|
||||
d = 23
|
||||
"#,
|
||||
r#"
|
||||
[a]
|
||||
e = 42
|
||||
"#,
|
||||
["a.b.c.d"],
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn array_of_tables() {
|
||||
test_impl(
|
||||
r#"
|
||||
[[a]]
|
||||
b = 1
|
||||
c = 2
|
||||
d = 3
|
||||
"#,
|
||||
r#"
|
||||
[[a]]
|
||||
b = 1
|
||||
c = 2
|
||||
"#,
|
||||
["a.[0].d"],
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn array() {
|
||||
test_impl(
|
||||
r#"
|
||||
foo = [ {bar = 23} ]
|
||||
"#,
|
||||
r#"
|
||||
foo = [ { blup = 42 }]
|
||||
"#,
|
||||
["foo.[0].bar"],
|
||||
);
|
||||
}
|
||||
}
|
||||
@@ -30,6 +30,7 @@ use strum::{EnumCount, IntoEnumIterator as _, VariantNames};
|
||||
use strum_macros::{IntoStaticStr, VariantNames};
|
||||
use utils::id::TimelineId;
|
||||
|
||||
use crate::config;
|
||||
use crate::config::PageServerConf;
|
||||
use crate::context::{PageContentKind, RequestContext};
|
||||
use crate::pgdatadir_mapping::DatadirModificationStats;
|
||||
@@ -4107,9 +4108,33 @@ pub(crate) fn set_tokio_runtime_setup(setup: &str, num_threads: NonZeroUsize) {
|
||||
.set(u64::try_from(num_threads.get()).unwrap());
|
||||
}
|
||||
|
||||
pub fn preinitialize_metrics(conf: &'static PageServerConf) {
|
||||
static PAGESERVER_CONFIG_IGNORED_ITEMS: Lazy<UIntGaugeVec> = Lazy::new(|| {
|
||||
register_uint_gauge_vec!(
|
||||
"pageserver_config_ignored_items",
|
||||
"TOML items present in the on-disk configuration file but ignored by the pageserver config parser.\
|
||||
The `item` label is the dot-separated path of the ignored item in the on-disk configuration file.\
|
||||
The value for an unknown config item is always 1.\
|
||||
There is a special label value \"\", which is 0, so that there is always a metric exposed (simplifies dashboards).",
|
||||
&["item"]
|
||||
)
|
||||
.unwrap()
|
||||
});
|
||||
|
||||
pub fn preinitialize_metrics(
|
||||
conf: &'static PageServerConf,
|
||||
ignored: config::ignored_fields::Paths,
|
||||
) {
|
||||
set_page_service_config_max_batch_size(&conf.page_service_pipelining);
|
||||
|
||||
PAGESERVER_CONFIG_IGNORED_ITEMS
|
||||
.with_label_values(&[""])
|
||||
.set(0);
|
||||
for path in &ignored.paths {
|
||||
PAGESERVER_CONFIG_IGNORED_ITEMS
|
||||
.with_label_values(&[path])
|
||||
.set(1);
|
||||
}
|
||||
|
||||
// Python tests need these and on some we do alerting.
|
||||
//
|
||||
// FIXME(4813): make it so that we have no top level metrics as this fn will easily fall out of
|
||||
|
||||
@@ -1297,9 +1297,20 @@ class NeonEnv:
|
||||
ps_cfg[key] = value
|
||||
|
||||
# Create a corresponding NeonPageserver object
|
||||
self.pageservers.append(
|
||||
NeonPageserver(self, ps_id, port=pageserver_port, az_id=ps_cfg["availability_zone"])
|
||||
ps = NeonPageserver(
|
||||
self, ps_id, port=pageserver_port, az_id=ps_cfg["availability_zone"]
|
||||
)
|
||||
|
||||
if config.test_may_use_compatibility_snapshot_binaries:
|
||||
# New features gated by pageserver config usually get rolled out in the
|
||||
# test suite first, by enabling it in the `ps_cfg` abve.
|
||||
# Compatibility tests run with old binaries that predate feature code & config.
|
||||
# So, old binaries will warn about the flag's presence.
|
||||
# Silence those warnings categorically.
|
||||
log.info("test may use old binaries, ignoring warnings about unknown config items")
|
||||
ps.allowed_errors.append(".*ignoring unknown configuration item.*")
|
||||
|
||||
self.pageservers.append(ps)
|
||||
cfg["pageservers"].append(ps_cfg)
|
||||
|
||||
# Create config and a Safekeeper object for each safekeeper
|
||||
|
||||
@@ -101,7 +101,7 @@ if TYPE_CHECKING:
|
||||
# export CHECK_ONDISK_DATA_COMPATIBILITY=true
|
||||
# export COMPATIBILITY_NEON_BIN=neon_previous/target/${BUILD_TYPE}
|
||||
# export COMPATIBILITY_POSTGRES_DISTRIB_DIR=neon_previous/pg_install
|
||||
# export NEON_BIN=target/release
|
||||
# export NEON_BIN=target/${BUILD_TYPE}
|
||||
# export POSTGRES_DISTRIB_DIR=pg_install
|
||||
#
|
||||
# # Build previous version of binaries and store them somewhere:
|
||||
|
||||
56
test_runner/regress/test_pageserver_config.py
Normal file
56
test_runner/regress/test_pageserver_config.py
Normal file
@@ -0,0 +1,56 @@
|
||||
import re
|
||||
|
||||
import pytest
|
||||
from fixtures.neon_fixtures import NeonEnv
|
||||
from fixtures.utils import run_only_on_default_postgres
|
||||
|
||||
|
||||
@pytest.mark.parametrize("what", ["default", "top_level", "nested"])
|
||||
@run_only_on_default_postgres(reason="does not use postgres")
|
||||
def test_unknown_config_items_handling(neon_simple_env: NeonEnv, what: str):
|
||||
"""
|
||||
Ensure we log unknown config fields and expose a metric for alerting.
|
||||
There are more unit tests in the Rust code for other TOML items.
|
||||
"""
|
||||
env = neon_simple_env
|
||||
|
||||
def edit_fn(config) -> str | None:
|
||||
if what == "default":
|
||||
return None
|
||||
elif what == "top_level":
|
||||
config["unknown_top_level_config_item"] = 23
|
||||
return r"unknown_top_level_config_item"
|
||||
elif what == "nested":
|
||||
config["remote_storage"]["unknown_config_item"] = 23
|
||||
return r"remote_storage.unknown_config_item"
|
||||
else:
|
||||
raise ValueError(f"Unknown what: {what}")
|
||||
|
||||
def get_metric():
|
||||
metrics = env.pageserver.http_client().get_metrics()
|
||||
samples = metrics.query_all("pageserver_config_ignored_items")
|
||||
by_item = {sample.labels["item"]: sample.value for sample in samples}
|
||||
assert by_item[""] == 0, "must always contain the empty item with value 0"
|
||||
del by_item[""]
|
||||
return by_item
|
||||
|
||||
expected_ignored_item = env.pageserver.edit_config_toml(edit_fn)
|
||||
|
||||
if expected_ignored_item is not None:
|
||||
expected_ignored_item_log_line_re = r".*ignoring unknown configuration item.*" + re.escape(
|
||||
expected_ignored_item
|
||||
)
|
||||
env.pageserver.allowed_errors.append(expected_ignored_item_log_line_re)
|
||||
|
||||
if expected_ignored_item is not None:
|
||||
assert not env.pageserver.log_contains(expected_ignored_item_log_line_re)
|
||||
assert get_metric() == {}
|
||||
|
||||
# in any way, unknown config items should not fail pageserver to start
|
||||
# TODO: extend this test with the config validator mode once we introduce it
|
||||
# https://github.com/neondatabase/cloud/issues/24349
|
||||
env.pageserver.restart()
|
||||
|
||||
if expected_ignored_item is not None:
|
||||
assert env.pageserver.log_contains(expected_ignored_item_log_line_re)
|
||||
assert get_metric() == {expected_ignored_item: 1}
|
||||
@@ -195,3 +195,7 @@ def test_throttle_fair_config_is_settable_but_ignored_in_config_toml(
|
||||
ps_http = env.pageserver.http_client()
|
||||
conf = ps_http.tenant_config(env.initial_tenant)
|
||||
assert_throttle_config_with_field_fair_set(conf.effective_config["timeline_get_throttle"])
|
||||
|
||||
env.pageserver.allowed_errors.append(
|
||||
r'.*ignoring unknown configuration item path="tenant_config\.timeline_get_throttle\.fair"*'
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user