rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-03 20:20:38 +00:00

Author	SHA1	Message	Date
Dmitry Rodionov	d53f9ab3eb	delete timelines from s3 (#4384 ) Delete data from s3 when timeline deletion is requested ## Summary of changes UploadQueue is altered to support scheduling of delete operations in stopped state. This looks weird, and I'm thinking whether there are better options/refactorings for upload client to make it look better. Probably can be part of https://github.com/neondatabase/neon/issues/4378 Deletion is implemented directly in existing endpoint because changes are not that significant. If we want more safety we can separate those or create feature flag for new behavior. resolves [#4193](https://github.com/neondatabase/neon/issues/4193) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-06-08 15:01:22 +03:00
Dmitry Rodionov	1a1019990a	map TenantState::Broken to TenantAttachmentStatus::Failed (#4371 ) ## Problem Attach failures are not reported in public part of the api (in `attachment_status` field of TenantInfo). ## Summary of changes Expose TenantState::Broken as TenantAttachmentStatus::Failed In the way its written Failed status will be reported even if no attachment happened. (I e if tenant become broken on startup). This is in line with other members. I e Active will be resolved to Attached even if no actual attach took place. This can be tweaked if needed. At the current stage it would be overengineering without clear motivation resolves #4344	2023-06-07 18:25:30 +03:00
Heikki Linnakangas	df3bae2ce3	Use `compute_ctl` to manage Postgres in tests. (#3886 ) This adds test coverage for 'compute_ctl', as it is now used by all the python tests. There are a few differences in how 'compute_ctl' is called in the tests, compared to the real web console: - In the tests, the postgresql.conf file is included as one large string in the spec file, and it is written out as it is to the data directory. I added a new field for that to the spec file. The real web console, however, sets all the necessary settings in the 'settings' field, and 'compute_ctl' creates the postgresql.conf from those settings. - In the tests, the information needed to connect to the storage, i.e. tenant_id, timeline_id, connection strings to pageserver and safekeepers, are now passed as new fields in the spec file. The real web console includes them as the GUCs in the 'settings' field. (Both of these are different from what the test control plane used to do: It used to write the GUCs directly in the postgresql.conf file). The plan is to change the control plane to use the new method, and remove the old method, but for now, support both. Some tests that were sensitive to the amount of WAL generated needed small changes, to accommodate that compute_ctl runs the background health monitor which makes a few small updates. Also some tests shut down the pageserver, and now that the background health check can run some queries while the pageserver is down, that can produce a few extra errors in the logs, which needed to be allowlisted. Other changes: - remove obsolete comments about PostgresNode; - create standby.signal file for Static compute node; - log output of `compute_ctl` and `postgres` is merged into `endpoints/compute.log`. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-06-06 14:59:36 +01:00
Joonas Koivunen	0cef7e977d	refactor: just one way to shutdown a tenant (#4407 ) We have 2 ways of tenant shutdown, we should have just one. Changes are mostly mechanical simple refactorings. Added `warn!` on the "shutdown all remaining tasks" should trigger test failures in the between time of not having solved the "tenant/timeline owns all spawned tasks" issue. Cc: #4327.	2023-06-06 15:30:55 +03:00
Joonas Koivunen	18a9d47f8e	test: restore NotConnected being allowed globally (#4426 ) Flakyness introduced by #4402 evidence [^1]. I had assumed the NotConnected would had been an expected io error, but it's not. Restore the global `allowed_error`. [^1]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4407/5185897757/index.html#suites/82004ab4e3720b47bf78f312dabe7c55/14f636d0ecd3939d/	2023-06-06 13:51:39 +03:00
Joonas Koivunen	77598f5d0a	Better walreceiver logging (#4402 ) walreceiver logs are a bit hard to understand because of partial span usage, extra messages, ignored errors popping up as huge stacktraces. Fixes #3330 (by spans, also demote info -> debug). - arrange walreceivers spans into a hiearchy: - `wal_connection_manager{tenant_id, timeline_id}` -> `connection{node_id}` -> `poller` - unifies the error reporting inside `wal_receiver`: - All ok errors are now `walreceiver connection handling ended: {e:#}` - All unknown errors are still stacktraceful task_mgr reported errors with context `walreceiver connection handling failure` - Remove `connect` special casing, was: `DB connection stream finished` for ok errors - Remove `done replicating` special casing, was `Replication stream finished` for ok errors - lowered log levels for (non-exhaustive list): - `WAL receiver manager started, connecting to broker` (at startup) - `WAL receiver shutdown requested, shutting down` (at shutdown) - `Connection manager loop ended, shutting down` (at shutdown) - `sender is dropped while join handle is still alive` (at lucky shutdown, see #2885) - `timeline entered terminal state {:?}, stopping wal connection manager loop` (at shutdown) - `connected!` (at startup) - `Walreceiver db connection closed` (at disconnects?, was without span) - `Connection cancelled` (at shutdown, was without span) - `observed timeline state change, new state is {new_state:?}` (never after Timeline::activate was made infallible) - changed: - `Timeline dropped state updates sender, stopping wal connection manager loop` - was out of date; sender is not dropped but `Broken \| Stopping` state transition - also made `debug!` - `Timeline dropped state updates sender before becoming active, stopping wal connection manager loop` - was out of date: sender is again not dropped but `Broken \| Stopping` state transition - also made `debug!` - log fixes: - stop double reporting panics via JoinError	2023-06-05 17:35:23 +03:00
Joonas Koivunen	8142edda01	test: Less flaky gc (#4416 ) Solves a flaky test error in the wild[^1] by: - Make the gc shutdown signal reading an `allowed_error` - Note the gc shutdown signal readings as being in `allowed_error`s - Allow passing tenant conf to init_start to avoid unncessary tenants [^1]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4399/5176432780/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2cd76021ea011f93	2023-06-05 15:43:52 +03:00
Alex Chi Z	82484e8241	pgserver: add more metrics for better observability (#4323 ) ## Problem This PR includes doc changes to the current metrics as well as adding new metrics. With the new set of metrics, we can quantitatively analyze the read amp., write amp. and space amp. in the system, when used together with https://github.com/neondatabase/neonbench close https://github.com/neondatabase/neon/issues/4312 ref https://github.com/neondatabase/neon/issues/4347 compaction metrics TBD, a novel idea is to print L0 file number and number of layers in the system, and we can do this in the future when we start working on compaction. ## Summary of changes * Add `READ_NUM_FS_LAYERS` for computing read amp. * Add `MATERIALIZED_PAGE_CACHE_HIT_UPON_REQUEST`. * Add `GET_RECONSTRUCT_DATA_TIME`. GET_RECONSTRUCT_DATA_TIME + RECONSTRUCT_TIME + WAIT_LSN_TIME should be approximately total time of reads. * Add `5.0` and `10.0` to `STORAGE_IO_TIME_BUCKETS` given some fsync runs slow (i.e., > 1s) in some cases. * Some `WAL_REDO` metrics are only used when Postgres is involved in the redo process. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-06-01 21:46:04 +03:00
Dmitry Rodionov	b190c3e6c3	reduce flakiness by allowing Compaction failed, retrying in X queue is in state Stopped. (#4379 ) resolves https://github.com/neondatabase/neon/issues/4374 by adding the error to allowed_errors	2023-05-30 20:11:44 +03:00
Heikki Linnakangas	a560b28829	Make new tenant/timeline IDs mandatory in create APIs. (#4304 ) We used to generate the ID, if the caller didn't specify it. That's bad practice, however, because network is never fully reliable, so it's possible we create a new tenant but the caller doesn't know about it, and because it doesn't know the tenant ID, it has no way of retrying or checking if it succeeded. To discourage that, make it mandatory. The web control plane has not relied on the auto-generation for a long time.	2023-05-26 16:19:36 +03:00
Joonas Koivunen	85e76090ea	test: fix ancestor is stopping flakyness (#4234 ) Flakyness most likely introduced in #4170, detected in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4232/4980691289/index.html#suites/542b1248464b42cc5a4560f408115965/18e623585e47af33. Opted to allow it globally because it can happen in other tests as well, basically whenever compaction is enabled and we stop pageserver gracefully.	2023-05-25 16:22:58 +00:00
Sasha Krassovsky	6052ecee07	Add connector extension to send Role/Database updates to console (#3891 ) ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-05-25 12:36:57 +03:00
Christian Schwarz	e11ba24ec5	tenant loops: operate on the Arc<Tenant> directly (#4298 ) (Instead of going through mgr every iteration.) The `wait_for_active_tenant` function's `wait` argument could be removed because it was only used for the loop that waits for the tenant to show up in the tenants map. Since we're passing the tenant in, we now longer need to get it from the tenants map. NB that there's no guarantee that the tenant object is in the tenants map at the time the background loop function starts running. But the tenant mgr guarantees that it will be quite soon. See `tenant_map_insert` way upwards in the call hierarchy for details. This is prep work to eliminate `subscribe_for_state_updates` (PR #4299 ) Fixes: #3501	2023-05-25 10:49:09 +02:00
Christian Schwarz	df52587bef	attach-time tenant config (#4255 ) This PR adds support for supplying the tenant config upon /attach. Before this change, when relocating a tenant using `/detach` and `/attach`, the tenant config after `/attach` would be the default config from `pageserver.toml`. That is undesirable for settings such as the PITR-interval: if the tenant's config on the source was `30 days` and the default config on the attach-side is `7 days`, then the first GC run would eradicate 23 days worth of PITR capability. The API change is backwards-compatible: if the body is empty, we continue to use the default config. We'll remove that capability as soon as the cloud.git code is updated to use attach-time tenant config (https://github.com/neondatabase/neon/issues/4282 keeps track of this). unblocks https://github.com/neondatabase/cloud/issues/5092 fixes https://github.com/neondatabase/neon/issues/1555 part of https://github.com/neondatabase/neon/issues/886 (Tenant Relocation) Implementation ============== The preliminary PRs for this work were (most-recent to least-recent) * https://github.com/neondatabase/neon/pull/4279 * https://github.com/neondatabase/neon/pull/4267 * https://github.com/neondatabase/neon/pull/4252 * https://github.com/neondatabase/neon/pull/4235	2023-05-24 17:46:30 +03:00
Alexander Bayandin	2a3f54002c	test_runner: update dependencies (#4328 ) ## Problem `pytest` 6 truncates error messages and this is not configured. It's fixed in `pytest` 7, it prints the whole message (truncating limit is higher) if `--verbose` is set (it's set on CI). ## Summary of changes - `pytest` and `pytest` plugins are updated to their latest versions - linters (`black` and `ruff`) are updated to their latest versions - `mypy` and types are updated to their latest versions, new warnings are fixed - while we're here, allure updated its latest version as well	2023-05-24 12:47:01 +01:00
Stas Kelvich	dad3519351	Add SQL-over-HTTP endpoint to Proxy This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON response structure resembling that of the node-postgres driver. This method, using HTTP POST, achieves smaller amortized latencies in edge setups due to fewer round trips and an enhanced open connection reuse by the v8 engine. This update involves several intricacies: 1. SQL injection protection: We employed the extended query protocol, modifying the rust-postgres driver to send queries in one roundtrip using a text protocol rather than binary, bypassing potential issues like those identified in https://github.com/sfackler/rust-postgres/issues/1030. 2. Postgres type compatibility: As not all postgres types have binary representations (e.g., acl's in pg_class), we adjusted rust-postgres to respond with text protocol, simplifying serialization and fixing queries with text-only types in response. 3. Data type conversion: Considering JSON supports fewer data types than Postgres, we perform conversions where possible, passing all other types as strings. Key conversions include: - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain text) - postgres bool, null, text -> json bool, null, string - postgres array -> json array - postgres json and jsonb -> json object 4. Alignment with node-postgres: To facilitate integration with js libraries, we've matched the response structure of node-postgres, returning command tags and column oids. Command tag capturing was added to the rust-postgres functionality as part of this change.	2023-05-23 20:01:40 +03:00
Christian Schwarz	b391c94440	tenant create / update-config: reject unknown fields (#4267 ) This PR enforces that the tenant create / update-config APIs reject requests with unknown fields. This is a desirable property because some tenant config settings control the lifetime of user data (e.g., GC horizon or PITR interval). Suppose we inadvertently rename the `pitr_interval` field in the Rust code. Then, right now, a client that still uses the old name will send a tenant config request to configure a new PITR interval. Before this PR, we would accept such a request, ignore the old name field, and use the pageserver.toml default value for what the new PITR interval is. With this PR, we will instead reject such a request. One might argue that the client could simply check whether the config it sent has been applied, using the `/v1/tenant/.../config` endpoint. That is correct for tenant create and update-config. But, attach will soon [^1] grow the ability to have attach-time config as well. If we ignore unknown fields and fall back to global defaults in that case, we risk data loss. Example: 1. Default PITR in pageservers is 7 days. 2. Create a tenant and set its PITR to 30 days. 3. For 30 days, fill the tenant continuously with data. 4. Detach the tenant. 5. Attach tenant. Attach must use the 30-day PITR setting in this scenario. If it were to fall back to the 7-day default value, we would lose 23 days of PITR capability for the tenant. So, the PR that adds attach-time tenant config will build on the (clunky) infrastructure added in this PR [^1]: https://github.com/neondatabase/neon/pull/4255 Implementation Notes ==================== This could have been a simple `#[serde(deny_unknown_fields)]` but sadly, that is documented- but silent-at-compile-time-incompatible with `#[serde(flatten)]`. But we are still using this by adding on outer struct and use unit tests to ensure it is correct. `neon_local tenant config` now uses the `.remove()` pattern + bail if there are leftover config args. That's in line with what `neon_local tenant create` does. We should dedupe that logic in a future PR. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: Alex Chi <iskyzh@gmail.com>	2023-05-18 21:16:09 -04:00
Alexander Bayandin	1b2ece3715	Re-enable compatibility tests on Postgres 15 (#4274 ) - Enable compatibility tests for Postgres 15 - Also add `PgVersion::v_prefixed` property to return the version number with, _guess what,_ v-prefix!	2023-05-18 19:56:09 +01:00
Joonas Koivunen	4a76f2b8d6	upload new timeline index part json before 201 or on retry (#4204 ) Await for upload to complete before returning 201 Created on `branch_timeline` or when `bootstrap_timeline` happens. Should either of those waits fail, then on the retried request await for uploads again. This should work as expected assuming control-plane does not start to use timeline creation as a wait_for_upload mechanism. Fixes #3865, started from https://github.com/neondatabase/neon/pull/3857/files#r1144468177 Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-15 14:16:43 +03:00
Alexander Bayandin	bb06d281ea	Run regressions tests on both Postgres 14 and 15 (#4192 ) This PR adds tests runs on Postgres 15 and created unified Allure report with results for all tests. - Split `.github/actions/allure-report` into `.github/actions/allure-report-store` and `.github/actions/allure-report-generate` - Add debug or release pytest parameter for all tests (depending on `BUILD_TYPE` env variable) - Add Postgres version as a pytest parameter for all tests (depending on `DEFAULT_PG_VERSION` env variable) - Fix `test_wal_restore` and `restore_from_wal.sh` to support path with `[`/`]` in it (fixed by applying spellcheck to the script and fixing all warnings), `restore_from_wal_archive.sh` is deleted as unused. - All known failures on Postgres 15 marked with xfail	2023-05-12 15:28:51 +01:00
Alexander Bayandin	1d490b2311	Make benchmark_fixture less noisy	2023-05-10 16:59:03 +01:00
Dmitry Rodionov	eb3a8be933	keep track of timeline deletion status in IndexPart to prevent timeline resurrection (#3919 ) Before this patch, the following sequence would lead to the resurrection of a deleted timeline: - create timeline - wait for its index part to reach s3 - delete timeline - wait an arbitrary amount of time, including 0 seconds - detach tenant - attach tenant - the timeline is there and Active again This happens because we only kept track of the deletion in the tenant dir (by deleting the timeline dir) but not in S3. The solution is to turn the deleted timeline's IndexPart into a tombstone. The deletion status of the timeline is expressed in the `deleted_at: Option<NativeDateTime>` field of IndexPart. It's `None` while the timeline is alive and `Some(deletion time stamp)` if it is deleted. We change the timeline deletion handler to upload this tombstoned IndexPart. The handler does not return success if the upload fails. Coincidentally, this fixes the long-stanging TODO about the `std::fs::remove_dir_all` being not atomic. It need not be atomic anymore because we set the `deleted_at=Some()` before starting the `remove_dir_all`. The tombstone is in the IndexPart only, not in the `metadata`. So, we only have the tombstone and the `remove_dir_all` benefits mentioned above if remote storage is configured. This was a conscious trade-off because there's no good format evolution story for the current metadata file format. The introduction of this additional step into `delete_timeline` was painful because delete_timeline needs to be 1. cancel-safe 2. idempotent 3. safe to call concurrently These are mostly self-inflicted limitations that can be avoided by using request-coalescing. PR https://github.com/neondatabase/neon/pull/4159 will do that. fixes https://github.com/neondatabase/neon/issues/3560 refs https://github.com/neondatabase/neon/issues/3889 (part of tenant relocation) Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-05-10 10:27:12 +02:00
Alexander Bayandin	653e633c59	test_runner: add --pg-version pytest argument (#4037 ) - allows setting Postgres version for testing using --pg-version argument - fixes tests for the non-default Postgres version.	2023-05-05 02:57:47 +03:00
Christian Schwarz	7dd9553bbb	eviction: regression test + distinguish layer write from map insert (#4005 ) This patch adds a regression test for the threshold-based layer eviction. The test asserts the basic invariant that, if left alone, the residence statuses will stabilize, with some layers resident and some layers evicted. Thereby, we cover both the aspect of last-access-time-threshold-based eviction, and the "imitate access" hacks that we put in recently. The aggressive `period` and `threshold` values revealed a subtle bug which is also fixed in this patch. The symptom was that, without the Rust changes of this patch, there would be occasional test failures due to `WARN... unexpectedly downloading` log messages. These log messages were caused by the "imitate access" calls of the eviction task. But, the whole point of the "imitate access" hack was to prevent eviction of the layers that we access there. After some digging, I found the root cause, which is the following race condition: 1. Compact: Write out an L1 layer from several L0 layers. This records residence event `LayerCreate` with the current timestamp. 2. Eviction: imitate access logical size calculation. This accesses the L0 layers because the L1 layer is not yet in the layer map. 3. Compact: Grab layer map lock, add the new L1 to layer map and remove the L0s, release layer map lock. 4. Eviction: observes the new L1 layer whose only activity timestamp is the `LayerCreate` event. The L1 layer had no chance of being accessed until after (3). So, if enough time passes between (1) and (3), then (4) will observe a layer with `now-last_activity > threshold` and evict it The fix is to require the first `record_residence_event` to happen while we already hold the layer map lock. The API requires a ref to a `BatchedUpdates` as a witness that we are inside a layer map lock. That is not fool-proof, e.g., new call sites for `insert_historic` could just completely forget to record the residence event. It would be nice to prevent this at the type level. In the meantime, we have a rate-limited log messages to warn us, if such an implementation error sneaks in in the future. fixes https://github.com/neondatabase/neon/issues/3593 fixes https://github.com/neondatabase/neon/issues/3942 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-04 16:16:48 +02:00
Arthur Petukhovsky	8543485e92	Pull clone timeline from peer safekeepers (#4089 ) Add HTTP endpoint to initialize safekeeper timeline from peer safekeepers. This is useful for initializing new safekeeper to replace failed safekeeper. Not fully "correct" in all cases, but should work in most. This code is not suitable for production workloads but can be tested on staging to get started. New endpoint is separated from usual cases and should not affect anything if no one explicitly uses a new endpoint. We can rollback this commit in case of issues.	2023-04-28 14:20:46 +00:00
Stas Kelvich	5bb971d64e	fix more python tests	2023-04-28 17:15:43 +03:00
Stas Kelvich	0364f77b9a	fix python styling	2023-04-28 17:15:43 +03:00
Stas Kelvich	9486d76b2a	Add tests for link auth to compute connection	2023-04-28 17:15:43 +03:00
Arseny Sher	b2a3981ead	Move tracking of walsenders out of Timeline. Refactors walsenders out of timeline.rs to makes it less convoluted into separate WalSenders with its own lock, but otherwise having the same structure. Tracking of in-memory remote_consistent_lsn is also moved there as it is mainly received from pageserver. State of walsender (feedback) is also restructured to be cleaner; now it is either PageserverFeedback or StandbyFeedback(StandbyReply, HotStandbyFeedback), but not both.	2023-04-28 06:22:13 +04:00
MMeent	e6ec2400fc	Enable hot standby PostgreSQL replicas. Notes: - This still needs UI support from the Console - I've not tuned any GUCs for PostgreSQL to make this work better - Safekeeper has gotten a tweak in which WAL is sent and how: It now sends zero-ed WAL data from the start of the timeline's first segment up to the first byte of the timeline to be compatible with normal PostgreSQL WAL streaming. - This includes the commits of #3714 Fixes one part of https://github.com/neondatabase/neon/issues/769 Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-04-27 15:26:44 +02:00
Christian Schwarz	6861259be7	add global metric for unexpected on-demand downloads (#4069 ) Until we have toned down the prod logs to zero WARN and ERROR, we want a dedicated metric for which we can have a dedicated alert. fixes https://github.com/neondatabase/neon/issues/3924	2023-04-26 15:18:26 +02:00
Arseny Sher	31a3910fd9	Remove wait_for_sk_commit_lsn_to_reach_remote_storage. It had a couple of inherent races: 1) Even if compute is killed before the call, some more data might still arrive to safekeepers after commit_lsn on them is polled, advancing it. Then checkpoint on pageserver might not include this tail, and so upload of expected LSN won't happen until one more checkpoint. 2) commit_lsn is updated asynchronously -- compute can commit transaction before communicating commit_lsn to even single safekeeper (sync-safekeepers can be used to forces the advancement). This makes semantics of wait_for_sk_commit_lsn_to_reach_remote_storage quite complicated. Replace it with last_flush_lsn_upload which 1) Learns last flush LSN on compute; 2) Waits for it to arrive to pageserver; 3) Checkpoints it; 4) Waits for the upload. In some tests this keeps compute alive longer than before, but this doesn't seem to be important. There is a chance this fixes https://github.com/neondatabase/neon/issues/3209	2023-04-26 13:46:33 +04:00
Christian Schwarz	fa20e37574	add gauge for in-flight layer uploads (#3951 ) For the "worst-case /storage usage panel", we need to compute ``` remote size + local-only size ``` We currently don't have a metric for local-only layers. The number of in-flight layers in the upload queue is just that, so, let Prometheus scrape it. The metric is two counters (started and finished). The delta is the amount of in-flight uploads in the queue. The metrics are incremented in the respective `call_unfinished_metric_*` functions. These track ongoing operations by file_kind and op_kind. We only need this metric for layer uploads, so, there's the new RemoteTimelineClientMetricsCallTrackSize type that forces all call sites to decide whether they want the size tracked or not. If we find that other file_kinds or op_kinds are interesting (metadata uploads, layer downloads, layer deletes) are interesting, we can just enable them, and they'll be just another label combination within the metrics that this PR adds. fixes https://github.com/neondatabase/neon/issues/3922	2023-04-25 14:22:48 +02:00
Christian Schwarz	e83684b868	add libmetric metric for each logged log message (#4055 ) This patch extends the libmetrics logging setup functionality with a `tracing` layer that increments a Prometheus counter each time we log a log message. We have the counter per tracing event level. This allows for monitoring WARN and ERR log volume without parsing the log. Also, it would allow cross-checking whether logs got dropped on the way into Loki. It would be nicer if we could hook deeper into the tracing logging layer, to avoid evaluating the filter twice. But I don't know how to do it.	2023-04-25 14:10:18 +02:00
Alexander Bayandin	0c82ff3d98	test_runner: add Timeline Inspector to Grafana links (#4021 )	2023-04-14 11:46:47 +01:00
Christian Schwarz	8895f28dae	make evictions_low_residence_duration_metric_threshold per-tenant (#3949 ) Before this patch, if a tenant would override its eviction_policy setting to use a lower LayerAccessThreshold::threshold than the `evictions_low_residence_duration_metric_threshold`, the evictions done for that tenant would count towards the `evictions_with_low_residence_duration` metric. That metric is used to identify pre-mature evictions, commonly triggered by disk-usage-based eviction under disk pressure. We don't want that to happen for the legitimate evictions of the tenant that overrides its eviction_policy. So, this patch - moves the setting into TenantConf - adds test coverage - updates the staging & prod yamls Forward Compatibility: Software before this patch will ignore the new tenant conf field and use the global one instead. So we can roll back safely. Backward Compatibility: Parsing old configs with software as of this patch will fail in `PageServerConf::parse_and_validate` with error `unrecognized pageserver option 'evictions_low_residence_duration_metric_threshold'` if the option is still present in the global section. We deal with this by updating the configs in Ansible. fixes https://github.com/neondatabase/neon/issues/3940	2023-04-14 13:25:45 +03:00
Sasha Krassovsky	fd31fafeee	Make proxy shutdown when all connections are closed (#3764 ) ## Describe your changes Makes Proxy start draining connections on SIGTERM. ## Issue ticket number and link #3333	2023-04-13 19:31:30 +03:00
Heikki Linnakangas	53f438a8a8	Rename "Postgres nodes" in control_plane to endpoints. We use the term "endpoint" in for compute Postgres nodes in the web UI and user-facing documentation now. Adjust the nomenclature in the code. This changes the name of the "neon_local pg" command to "neon_local endpoint". Also adjust names of classes, variables etc. in the python tests accordingly. This also changes the directory structure so that endpoints are now stored in: .neon/endpoints/<endpoint id> instead of: .neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name> The tenant ID is no longer part of the path. That means that you cannot have two endpoints with the same name/ID in two different tenants anymore. That's consistent with how we treat endpoints in the real control plane and proxy: the endpoint ID must be globally unique.	2023-04-13 14:34:29 +03:00
Dmitry Rodionov	15d1f85552	Add reason to TenantState::Broken (#3954 ) Reason and backtrace are added to the Broken state. Backtrace is automatically collected when tenant entered the broken state. The format for API, CLI and metrics is changed and unified to return tenant state name in camel case. Previously snake case was used for metrics and camel case was used for everything else. Now tenant state field in TenantInfo swagger spec is changed to contain state name in "slug" field and other fields (currently only reason and backtrace for Broken variant in "data" field). To allow for this breaking change state was removed from TenantInfo swagger spec because it was not used anywhere. Please note that the tenant's broken reason is not persisted on disk so the reason is lost when pageserver is restarted. Requires changes to grafana dashboard that monitors tenant states. Closes #3001 --------- Co-authored-by: theirix <theirix@gmail.com>	2023-04-13 12:11:43 +03:00
Dmitry Rodionov	bfeb428d1b	tests: make neon_fixtures a bit thinner by splitting out some pageserver related helpers (#3977 ) neon_fixture is quite big and messy, lets clean it up a bit.	2023-04-07 13:47:28 +03:00
Alexander Bayandin	887cee64e2	test_runner: add links to grafana for remote tests (#3961 ) Add Grafana links to allure reports to make it easier to debug perf test failures	2023-04-06 13:52:41 +01:00
Alexander Bayandin	105b8bb9d3	test_runner: automatically rerun flaky tests (#3880 ) This PR adds a plugin that automatically reruns (up to 3 times) flaky tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and `pytest-rerunfailures` plugin. As the first approximation we consider the test flaky if it has failed on the main branch in the last 10 days. Flaky tests are fetched by `scripts/flaky_tests.py` script (it's possible to use it in a standalone mode to learn which tests are flaky), stored to a JSON file, and then the file is passed to the pytest plugin.	2023-04-04 12:21:54 +01:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Shany Pozin	0f7de84785	Allow calling detach on ignored tenant (#3834 ) ## Describe your changes Added a query param to detach API Allow to remove local state of a tenant even if its not in the memory (following ignore API) ## Issue ticket number and link #3828 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2023-03-22 07:17:00 +00:00
Christian Schwarz	881356c417	add metrics to detect eviction-induced thrashing (#3837 ) This patch adds two metrics that will enable us to detect thrashing of layers, i.e., repetitions of `eviction, on-demand-download, eviction, ... ` for a given layer. The first metric counts all layer evictions per timeline. It requires no further explanation. The second metric counts the layer evictions where the layer was resident for less than a given threshold. We can alert on increments to the second metric. The first metric will serve as a baseline, and further, it's generally interesting, outside of thrashing. The second metric's threshold is configurable in PageServerConf and defaults to 24h. The threshold value is reproduced as a label in the metric because the counter's value is semantically tied to that threshold. Since changes to the config and hence the label value are infrequent, this will have low storage overhead in the metrics storage. The data source to determine the time that the layer was resident is the file's `mtime`. Using `mtime` is more of a crutch. It would be better if Pageserver did its own persistent bookkeeping of residence change events instead of relying on the filesystem. We had some discussion about this: https://github.com/neondatabase/neon/pull/3809#issuecomment-1470448900 My position is that `mtime` is good enough for now. It can theoretically jump forward if someone copies files without resetting `mtime`. But that shouldn't happen in practice. Note that moving files back and forth doesn't change `mtime`, nor does `chown` or `chmod`. Lastly, `rsync -a`, which is typically used for filesystem-level backup / restore, correctly syncs `mtime`. I've added a label that identifies the data source to keep options open for a future, better data source than `mtime`. Since this value will stay the same for the time being, it's not a problem for metrics storage. refs https://github.com/neondatabase/neon/issues/3728	2023-03-20 16:11:36 +01:00
Heikki Linnakangas	fea4b5f551	Switch to EdDSA algorithm for the storage JWT authentication tokens. The control plane currently only supports EdDSA. We need to either teach the storage to use EdDSA, or the control plane to use RSA. EdDSA is more modern, so let's use that. We could support both, but it would require a little more code and tests, and we don't really need the flexibility since we control both sides.	2023-03-20 16:28:01 +02:00
Joonas Koivunen	0c1228c37a	feat: store initial timeline in env fixture (#3839 ) minor change, but will allow more use in future for the default tenants. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-20 11:57:27 +02:00
Shany Pozin	93f3f4ab5f	Return NotFound in mgmt API requests when tenant is not present in the pageserver (#3818 ) ## Describe your changes Add Error enum for tenant state response to allow better error handling in mgmt api ## Issue ticket number and link #2238 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-19 10:44:42 +02:00
Heikki Linnakangas	10a5d36af8	Separate mgmt and libpq authentication configs in pageserver. (#3773 ) This makes it possible to enable authentication only for the mgmt HTTP API or the compute API. The HTTP API doesn't need to be directly accessible from compute nodes, and it can be secured through network policies. This also allows rolling out authentication in a piecemeal fashion.	2023-03-15 13:52:29 +02:00
Joonas Koivunen	15b692ccc9	test: more strict finding of WARN, ERROR lines (#3798 ) this prevents flakyness when `WARN\|ERROR` appears in some other part of the line, for example in a random filename.	2023-03-14 15:27:52 +02:00

1 2 3 4 5 ...

351 Commits