* Test that we emit the build info metric for pageserver, safekeeper and proxy with a non-zero-length revision label
* Emit libmetrics_build_info on startup of pageserver, safekeeper and
proxy, with a "revision" label that carries the git revision.
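As a rough sketch of what emitting such a metric can look like (assuming the prometheus crate, which the shared metrics code wraps; the metric and label names come from the commit, the function name is illustrative):

```rust
use prometheus::{opts, register_int_gauge_vec};

// Illustrative only: register a build-info style gauge whose "revision"
// label carries the git revision, with a constant value of 1.
pub fn set_build_info_metric(revision: &str) {
    let build_info = register_int_gauge_vec!(
        opts!("libmetrics_build_info", "Build/version information"),
        &["revision"]
    )
    .expect("failed to register libmetrics_build_info");
    build_info.with_label_values(&[revision]).set(1);
}
```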
The previous default of 1 s caused excessive CPU usage when there were
a lot of projects. Polling every timeline once a second was too aggressive,
so let's reduce it.
Fixes https://github.com/neondatabase/neon/issues/2542, but we
probably also want to do something so that we don't poll timelines
that have received no new WAL or layers since the last check.
* Add test for branching on page boundary
* Normalize start recovery point
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
Creates new `pageserver_api` and `safekeeper_api` crates to serve as the
shared dependencies. Should reduce both recompile times and cold compile
times.
Decreases the size of the optimized `neon_local` binary: 380M -> 179M.
No significant changes for anything else (mostly as expected).
* Preserve task result in TaskHandle by keeping the join handle around
The solution is not great, but it should help to debug the staging issue.
I tried to do it in the least destructive way. TaskHandle is used in only
one place, so it is OK to use something less generic unless we want
to extend its usage across the codebase. In its current form, for its
single usage site, it looks too abstract.
Some problems around this code:
1. A task can drop the event sender and continue running.
2. A task cannot be joined several times (probably not needed,
but still, it can be surprising).
3. Had to split the task event into two types because anyhow::Error
does not implement Clone. So TaskContinueEvent derives Clone,
but the usual task event does not. The Clone requirement appears
because we clone the current value in next_task_event;
taking it by reference is complicated.
4. The split between Init and Started is artificial and comes from
the watch::channel requirement to have some initial value.
To summarize points 3 and 4: it may be a better idea to use an
RwLock or a bounded channel instead.
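For reference, a minimal sketch of the pattern in question (event names are illustrative, not the real types). The artificial `Init` state exists because `watch::channel` requires an initial value (point 4), and `Clone` is needed because receivers hand out copies of the current value (point 3):

```rust
use tokio::sync::watch;

// Illustrative event type; the real code splits this in two because
// anyhow::Error is not Clone.
#[derive(Clone, Debug)]
enum TaskEvent {
    Init, // artificial initial state required by watch::channel
    Started,
    Progress(String),
}

fn spawn_task() -> (watch::Receiver<TaskEvent>, tokio::task::JoinHandle<()>) {
    let (tx, rx) = watch::channel(TaskEvent::Init);
    let handle = tokio::spawn(async move {
        // The receiver may already be gone; the task keeps running (point 1).
        tx.send(TaskEvent::Started).ok();
        // ... do the actual work, publishing progress along the way ...
        tx.send(TaskEvent::Progress("step done".into())).ok();
    });
    (rx, handle)
}
```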
Changes are:
* Correct typo "firts" -> "first"
* Change empty panics that had an explanatory comment into panics
whose message is taken from the comment
* Fix weird indentation that rustfmt was failing to handle
* Use existing `anyhow::{anyhow,bail}!` as `{anyhow,bail}!` if it's
already in scope
* Spell `Result<T, anyhow::Error>` as `anyhow::Result<T>`
* In general, closer to matching the rest of the codebase
* Change usages of `hash_map::Entry` to `Entry` when it's already in
scope
* A quick search shows our style on this one varies across the files
it's used in
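Taken together, the conventions above look something like this hypothetical snippet (function and names invented for illustration):

```rust
use anyhow::bail;
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// `bail!` and `Entry` are used unqualified because they are in scope,
// and the return type is spelled `anyhow::Result<T>`.
fn increment(counts: &mut HashMap<String, u64>, key: &str) -> anyhow::Result<u64> {
    match counts.entry(key.to_string()) {
        Entry::Occupied(mut e) => {
            *e.get_mut() += 1;
            Ok(*e.get())
        }
        Entry::Vacant(_) => bail!("key {key:?} was never initialized"),
    }
}
```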
* Fix extreme metrics bloat in storage sync
From 78 metrics per (timeline, tenant) pair down to (max) 10 metrics per
(timeline, tenant) pair, plus another 117 metrics in a global histogram that
replaces the previous per-timeline histogram.
* Drop image sync operation metric series when dropping TimelineMetrics.
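The shape of the fix, sketched with a hypothetical metric (assuming the prometheus crate; name and labels invented): the histogram loses its per-(tenant, timeline) labels, so its cardinality no longer grows with the number of timelines:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// Hypothetical name and labels: one global histogram instead of one
// histogram per (tenant, timeline) pair.
static IMAGE_SYNC_TIME: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "pageserver_storage_sync_seconds",
        "Time spent in storage sync operations",
        &["operation", "status"] // note: no tenant_id/timeline_id labels
    )
    .expect("failed to register histogram")
});
```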
- Split postgres_ffi into two version specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding.
- Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture.
To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
Currently not all tests pass, because the Rust code relies on the default PostgreSQL version in a few places.
Replace the layer array and linear search with R-tree
So far, the in-memory layer map that tracks which layer files exist
has used a simple Vec, in no particular order, to hold information
about all the layers. That obviously doesn't scale very well; with
thousands of layer files the linear search was consuming a lot of
CPU. Replace it with a two-dimensional R-tree, with Key and LSN
ranges as the dimensions.
For the R-tree, use the 'rstar' crate. To be able to use it, we
convert the Keys and LSNs into 256-bit integers. 64 bits would be
enough to represent LSNs, and 128 bits would be enough to represent
Keys. However, rstar internally performs multiplication to calculate
the area of rectangles, and the product of two 128-bit integers
doesn't necessarily fit in 128 bits, causing integer overflow and,
if overflow-checks are enabled, a panic. To avoid that, we use
256-bit integers.
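A simplified sketch of the approach, using plain i64 coordinates rather than the 256-bit integers the real code needs, and invented field names:

```rust
use rstar::{RTree, RTreeObject, AABB};

// One entry per layer file, keyed by its (Key, LSN) rectangle.
struct LayerEntry {
    key_range: (i64, i64), // simplified: real Keys need 128 bits
    lsn_range: (i64, i64), // simplified: real code widens to 256 bits
}

impl RTreeObject for LayerEntry {
    type Envelope = AABB<[i64; 2]>;

    fn envelope(&self) -> Self::Envelope {
        AABB::from_corners(
            [self.key_range.0, self.lsn_range.0],
            [self.key_range.1, self.lsn_range.1],
        )
    }
}

fn layers_intersecting(
    tree: &RTree<LayerEntry>,
    key: i64,
    lsn: i64,
) -> impl Iterator<Item = &LayerEntry> + '_ {
    // Spatial lookup replaces the old linear scan over a Vec.
    tree.locate_in_envelope_intersecting(&AABB::from_point([key, lsn]))
}
```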
Add a performance test that creates a lot of layer files, to
demonstrate the benefit.
Part of the general work on improving pageserver logs.
Brief summary of changes:
* Remove `ApiError::from_err`
* Remove `impl From<anyhow::Error> for ApiError`
* Convert `ApiError::{BadRequest, NotFound}` to use `anyhow::Error`
* Note: `NotFound` has more verbose formatting because it's more
likely to have useful information for the receiving "user"
* Explicitly convert from `tokio::task::JoinError`s into
`InternalServerError`s where appropriate
Also note: many of the places where errors were implicitly converted to
500s have now been updated to return a more appropriate error. Some
places where it's not yet possible to distinguish the error types have
been left as 500s.
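Sketched as hypothetical code (only the variant names come from the list above; the real definitions differ):

```rust
use anyhow::anyhow;

// The error variants now carry anyhow::Error directly.
enum ApiError {
    BadRequest(anyhow::Error),
    NotFound(anyhow::Error),
    InternalServerError(anyhow::Error),
}

// Instead of a blanket `From<anyhow::Error>`, JoinErrors are converted
// explicitly at the await site.
async fn join_task(
    handle: tokio::task::JoinHandle<anyhow::Result<()>>,
) -> Result<(), ApiError> {
    handle
        .await
        .map_err(|e| ApiError::InternalServerError(anyhow!("task join failed: {e}")))?
        .map_err(ApiError::InternalServerError)
}
```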
Follow-up to PR #2433 (b8eb908a). There are still a few unresolved
locations that have been left as-is for the same compatibility reasons
as in the original PR.
Instead of spawning helper threads, we now use Tokio tasks. There
are multiple Tokio runtimes, for different kinds of tasks: one for
serving libpq client connections, another for background operations
like GC and compaction, and so on. That's not strictly required; we
could use just one runtime, but this way you can still get an
overview of what's happening with "top -H".
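A sketch of the multiple-runtime setup (names are illustrative):

```rust
use once_cell::sync::Lazy;
use tokio::runtime::Runtime;

// One runtime per task class; the thread names make the classes
// visible in `top -H`. A single shared runtime would work too.
static BACKGROUND_RUNTIME: Lazy<Runtime> = Lazy::new(|| {
    tokio::runtime::Builder::new_multi_thread()
        .thread_name("background op worker")
        .enable_all()
        .build()
        .expect("failed to create background op runtime")
});

fn spawn_gc_loop() {
    BACKGROUND_RUNTIME.spawn(async {
        // ... GC / compaction loop ...
    });
}
```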
There's one subtle behavioral change in how TenantState is updated. Before this
patch, if you deleted all timelines from a tenant, its GC and
compaction loops were stopped, and the tenant went back to Idle
state. We no longer do that. The empty tenant stays Active. The
changes to test_tenant_tasks.py are related to that.
There's still plenty of synchronous code and blocking. For example, we
still use blocking std::io functions for all file I/O, and the
communication with WAL redo processes still uses low-level Unix
poll(). We might want to rewrite those later, but this will do for
now. The model is that local file I/O is considered fast enough
that blocking - and preventing other tasks from running in the same
thread - is acceptable.
We had a pattern like this:

```rust
match remote_storage {
    GenericRemoteStorage::Local(storage) => {
        let source = storage.remote_object_id(&file_path)?;
        // ...
        storage
            .function(&source, /* ... */)
            .await
    }
    GenericRemoteStorage::S3(storage) => {
        // ... exact same code as for the Local case ...
    }
}
```
This removes the code duplication, by allowing you to call the functions
directly on GenericRemoteStorage.
Also change RemoteObjectId to be just a type alias for String. Now that
the callers of GenericRemoteStorage functions don't know whether they're
dealing with the LocalFs or S3 implementation, RemoteObjectId must be the
same type for both.
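A self-contained sketch of the resulting shape (stub backend types here; the real signatures and methods differ):

```rust
use std::path::Path;

// Both backends share one id type now that it is a plain alias.
type RemoteObjectId = String;

struct LocalFs;
struct S3Bucket;

impl LocalFs {
    fn remote_object_id(&self, path: &Path) -> anyhow::Result<RemoteObjectId> {
        Ok(path.display().to_string())
    }
}

impl S3Bucket {
    fn remote_object_id(&self, path: &Path) -> anyhow::Result<RemoteObjectId> {
        Ok(path.display().to_string())
    }
}

// The enum forwards each call to whichever backend it wraps, so the
// callers no longer match on the variant themselves.
enum GenericRemoteStorage {
    Local(LocalFs),
    S3(S3Bucket),
}

impl GenericRemoteStorage {
    fn remote_object_id(&self, path: &Path) -> anyhow::Result<RemoteObjectId> {
        match self {
            GenericRemoteStorage::Local(s) => s.remote_object_id(path),
            GenericRemoteStorage::S3(s) => s.remote_object_id(path),
        }
    }
}
```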