rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 04:30:38 +00:00

Author	SHA1	Message	Date
sharnoff	6f949e1556	Improve pageserver/safekeepeer HTTP API errors (#2461 ) Part of the general work on improving pageserver logs. Brief summary of changes: * Remove `ApiError::from_err` * Remove `impl From<anyhow::Error> for ApiError` * Convert `ApiError::{BadRequest, NotFound}` to use `anyhow::Error` * Note: `NotFound` has more verbose formatting because it's more likely to have useful information for the receiving "user" * Explicitly convert from `tokio::task::JoinError`s into `InternalServerError`s where appropriate Also note: many of the places where errors were implicitly converted to 500s have now been updated to return a more appropriate error. Some places where it's not yet possible to distinguish the error types have been left as 500s.	2022-09-20 17:02:10 -07:00
Kirill Bulatov	8d7024a8c2	Move path manipulation function to utils	2022-09-20 23:43:52 +03:00
sharnoff	4b25b9652a	Rename more zid-like idents (#2480 ) Follow-up to PR #2433 (`b8eb908a`). There's still a few more unresolved locations that have been left as-is for the same compatibility reasons in the original PR.	2022-09-20 11:06:31 -07:00
Kirill Bulatov	7863c4a702	Regenerate Hakari files, add a CI check for that	2022-09-20 11:39:10 +03:00
Arthur Petukhovsky	566e816298	Refactor safekeeper timelines handling (#2329 ) See https://github.com/neondatabase/neon/pull/2329 for details	2022-09-20 07:42:39 +00:00
sharnoff	9c35a09452	Improve build errors when `postgres_ffi` fails (#2460 ) This commit does two things of note: 1. Bumps the bindgen dependency from `0.59.1` to `0.60.1`. This gets us an actual error type from bindgen, so we can display what's wrong. 2. Adds `anyhow` as a build dependency, so our error message can be prettier. It's already used heavily elsewhere in the crates in this repo, so I figured the fact it's a build dependency doesn't matter much. I ran into this from running `cargo <cmd>` without running `make` first. Here's a comparison of the compiler output in those two cases. Before this commit: ``` error: failed to run custom build command for `postgres_ffi v0.1.0 ($repo_path/libs/postgres_ffi)` Caused by: process didn't exit successfully: `$repo_path/target/debug/build/postgres_ffi-2f7253b3ad3ca840/build-script-build` (exit status: 101) --- stdout cargo:rerun-if-changed=bindgen_deps.h --- stderr bindgen_deps.h:7:10: fatal error: 'c.h' file not found bindgen_deps.h:7:10: fatal error: 'c.h' file not found, err: true thread 'main' panicked at 'Unable to generate bindings: ()', libs/postgres_ffi/build.rs:135:14 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` After this commit: ``` error: failed to run custom build command for `postgres_ffi v0.1.0 ($repo_path/libs/postgres_ffi)` Caused by: process didn't exit successfully: `$repo_path/target/debug/build/postgres_ffi-e01fb59602596748/build-script-build` (exit status: 1) --- stdout cargo:rerun-if-changed=bindgen_deps.h --- stderr bindgen_deps.h:7:10: fatal error: 'c.h' file not found Error: Unable to generate bindings Caused by: clang diagnosed error: bindgen_deps.h:7:10: fatal error: 'c.h' file not found ```	2022-09-16 08:37:44 -07:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Heikki Linnakangas	40c845e57d	Switch to async for all concurrency in the pageserver. Instead of spawning helper threads, we now use Tokio tasks. There are multiple Tokio runtimes, for different kinds of tasks. One for serving libpq client connections, another for background operations like GC and compaction, and so on. That's not strictly required, we could use just one runtime, but with this you can still get an overview of what's happening with "top -H". There's one subtle behavior in how TenantState is updated. Before this patch, if you deleted all timelines from a tenant, its GC and compaction loops were stopped, and the tenant went back to Idle state. We no longer do that. The empty tenant stays Active. The changes to test_tenant_tasks.py are related to that. There's still plenty of synchronous code and blocking. For example, we still use blocking std::io functions for all file I/O, and the communication with WAL redo processes is still uses low-level unix poll(). We might want to rewrite those later, but this will do for now. The model is that local file I/O is considered to be fast enough that blocking - and preventing other tasks running in the same thread - is acceptable.	2022-09-12 14:21:00 +03:00
Kirill Bulatov	c9e7c2f014	Ensure all temporary and empty directories and files are cleansed on pageserver startup	2022-09-09 16:36:45 +03:00
Dmitry Rodionov	0b76b82e0e	review clean up	2022-09-08 19:59:42 +03:00
Heikki Linnakangas	35b4816f09	Turn GenericRemoteStorage into just a newtype around 'Arc<dyn RemoteStorage>' We had a pattern like this: match remote_storage { GenericRemoteStorage::Local(storage) => { let source = storage.remote_object_id(&file_path)?; ... storage .function(&source, ...) .await }, GenericRemoteStorage::S3(storage) => { ... exact same code as for the Local case ... }, This removes the code duplication, by allowing you to call the functions directly on GenericRemoteStorage. Also change RemoveObjectId to be just a type alias for String. Now that the callers of GenericRemoteStorage functions don't know whether they're dealing with the LocalFs or S3 implementation, RemoveObjectId must be the same type for both.	2022-09-08 19:59:42 +03:00
Anastasia Lubennikova	2794cd83c7	Prepare pg 15 support (generate bindings for pg15) (#2396 ) Another preparatory commit for pg15 support: * generate bindings for both pg14 and pg15; * update Makefile and CI scripts: now neon build depends on both PostgreSQL versions; * some code refactoring to decrease version-specific dependencies.	2022-09-07 12:40:48 +03:00
Anastasia Lubennikova	05e263d0d3	Prepare pg 15 support (build system and submodules) (#2337 ) * Add submodule postgres-15 * Support pg_15 in pgxn/neon * Renamed zenith -> neon in Makefile * fix name of codestyle check * Refactor build system to prepare for building multiple Postgres versions. Rename "vendor/postgres" to "vendor/postgres-v14" Change Postgres build and install directory paths to be version-specific: - tmp_install/build -> pg_install/build/14 - tmp_install/* -> pg_install/14/* And Makefile targets: - "make postgres" -> "make postgres-v14" - "make postgres-headers" -> "make postgres-v14-headers" - etc. Add Makefile aliases: - "make postgres" to build "postgres-v14" and in future, "postgres-v15" - similarly for "make postgres-headers" Fix POSTGRES_DISTRIB_DIR path in pytest scripts * Make postgres version a variable in codestyle workflow * Support vendor/postgres-v15 in codestyle check workflow * Support postgres-v15 building in Makefile * fix pg version in Dockerfile.compute-node * fix kaniko path * Build neon extensions in version-specific directories * fix obsolete mentions of vendor/postgres * use vendor/postgres-v14 in Dockerfile.compute-node.legacy * Use PG_VERSION_NUM to gate dependencies in inmem_smgr.c * Use versioned ECR repositories and image names for compute-node. The image name format is compute-node-vXX, where XX is postgres major version number. For now only v14 is supported. Old format unversioned name (compute-node) is left, because cloud repo depends on it. * update vendor/postgres submodule url (zenith->neondatabase rename) * Fix postgres path in python tests after rebase * fix path in regress test * Use separate dockerfiles to build compute-node: Dockerfile.compute-node-v15 should be identical to Dockerfile.compute-node-v14 except for the version number. This is a hack, because Kaniko doesn't support build ARGs properly * bump vendor/postgres-v14 and vendor/postgres-v15 * Don't use Kaniko cache for v14 and v15 compute-node images * Build compute-node images for different versions in different jobs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2022-09-05 18:30:54 +03:00
Kirill Bulatov	8a7333438a	Extract common remote storage operations into GenericRemoteStorage (#2373 )	2022-09-02 11:58:28 +03:00
Heikki Linnakangas	f0a0d7bb7a	Split RcuWriteGuard::store() into two stages: store and wait. This makes it easier to explain which stages allow concurrent readers and writers. Expand the comments with examples, too.	2022-09-02 00:34:37 +03:00
Kirill Bulatov	a4803233bb	Remove `RemoteObjectName` and many remote storage generics in pageserver (#2360 )	2022-08-30 22:19:52 +03:00
Heikki Linnakangas	f09bd6bc88	Fix size checks in the "local" remote storage implementation. The code correctly detected too short and too long inputs, but the error message was bogus for the case the input stream was too long: Error: Provided stream has actual size 5 fthat is smaller than the given stream size 4 That check was only supposed to check for too small inputs, but it in fact caught too long inputs too. That was good, because the check below that that was supposed to check for too long inputs was in fact broken, and never did anything. It tried to read input a buffer of size 0, to check if there is any extra data, but reading to a zero-sized buffer always returns 0.	2022-08-30 18:44:06 +03:00
Dmitry Ivanov	96a50e99cf	Forward various connection params to compute nodes. (#2336 ) Previously, proxy didn't forward auxiliary `options` parameter and other ones to the client's compute node, e.g. ``` $ psql "user=john host=localhost dbname=postgres options='-cgeqo=off'" postgres=# show geqo; ┌──────┐ │ geqo │ ├──────┤ │ on │ └──────┘ (1 row) ``` With this patch we now forward `options`, `application_name` and `replication`. Further reading: https://www.postgresql.org/docs/current/libpq-connect.html Fixes #1287.	2022-08-30 17:36:21 +03:00
Heikki Linnakangas	bfa1d91612	Introduce RCU, and use it to protect latest_gc_cutoff_lsn. `latest_gc_cutoff_lsn` tracks the cutoff point where GC has been performed. Anything older than the cutoff might already have been GC'd away, and cannot be queried by get_page_at_lsn requests. It's protected by an RWLock. Whenever a get_page_at_lsn requests comes in, it first grabs the lock and reads the current `latest_gc_cutoff`, and holds the lock it until the request has been served. The lock ensures that GC doesn't start concurrently and remove page versions that we still need to satisfy the request. With the lock, get_page_at_lsn request could potentially be blocked for a long time. GC only holds the lock in exclusive mode for a short duration, but depending on how whether the RWLock is "fair", a read request might be queued behind the GC's exclusive request, which in turn might be queued behind a long-running read operation, like a basebackup. If the lock implementation is not fair, i.e. if a reader can always jump the queue if the lock is already held in read mode, then another problem arises: GC might be starved if a constant stream of GetPage requests comes in. To avoid the long wait or starvation, introduce a Read-Copy-Update mechanism to replace the lock on `latest_gc_cutoff_lsn`. With the RCU, reader can always read the latest value without blocking (except for a very short duration if the lock protecting the RCU is contended; that's comparable to a spinlock). And a writer can always write a new value without waiting for readers to finish using the old value. The old readers will continue to see the old value through their guard object, while new readers will see the new value. This is purely theoretical ATM, we don't have any reports of either starvation or blocking behind GC happening in practice. But it's simple to fix, so let's nip that problem in the bud.	2022-08-29 11:23:37 +03:00
Heikki Linnakangas	34b5d7aa9f	Remove unused dependency	2022-08-27 18:14:33 +03:00
Heikki Linnakangas	88a339ed73	Update a few crates "cargo tree -d" showed that we're building multiple versions of some crates. Update some crates, to avoid depending on multiple versions.	2022-08-27 18:14:30 +03:00
Egor Suvorov	c952f022bb	waldecoder: fix comment	2022-08-25 15:03:22 +02:00
Dmitry Ivanov	8e1d6dd848	Minor cleanup in pq_proto (#2322 )	2022-08-23 18:00:02 +03:00
Heikki Linnakangas	4013290508	Fix module doc comment. `///` is used for comments on the next code that follows, so the comment actually applied to the `use std::collections::BTreeMap;` line that follows. rustfmt complained about that: error: an inner attribute is not permitted following an outer doc comment --> /home/heikki/git-sandbox/neon/libs/utils/src/seqwait_async.rs:7:1 \| 5 \| /// \| --- previous doc comment 6 \| 7 \| #![warn(missing_docs)] \| ^^^^^^^^^^^^^^^^^^^^^^ not permitted following an outer attribute 8 \| 9 \| use std::collections::BTreeMap; \| ------------------------------- the inner attribute doesn't annotate this `use` import \| = note: inner attributes, like `#![no_std]`, annotate the item enclosing them, and are usually found at the beginning of source files help: to annotate the `use` import, change the attribute from inner to outer style \| 7 - #![warn(missing_docs)] 7 + #[warn(missing_docs)] \| `//!` is the correct syntax for comments that apply to the whole file.	2022-08-23 12:58:54 +03:00
Heikki Linnakangas	84cd40b416	rustfmt fixes. Not sure why these don't show up as CI failures, but on my laptop, rustfmt insists.	2022-08-19 22:21:15 +03:00
Heikki Linnakangas	9bc12f7444	Move auto-generated 'bindings' to a separate inner module. Re-export only things that are used by other modules. In the future, I'm imagining that we run bindgen twice, for Postgres v14 and v15. The two sets of bindings would go into separate 'bindings_v14' and 'bindings_v15' modules. Rearrange postgres_ffi modules. Move function, to avoid Postgres version dependency in timelines.rs Move function to generate a logical-message WAL record to postgres_ffi.	2022-08-18 13:25:00 +03:00
Heikki Linnakangas	e94a5ce360	Rename pg_control_ffi.h to bindgen_deps.h, for clarity. The pg_control_ffi.h name implies that it only includes stuff related to pg_control.h. That's mostly true currently, but really the point of the file is to include everything that we need to generate Rust definitions from.	2022-08-16 19:37:36 +03:00
Kirill Bulatov	648e8bbefe	Fix 1.63 clippy lints (#2282 )	2022-08-16 18:49:22 +03:00
Arseny Sher	431393e361	Find end of WAL on safekeepers using WalStreamDecoder. We could make it inside wal_storage.rs, but taking into account that - wal_storage.rs reading is async - we don't need s3 here - error handling is different; error during decoding is normal I decided to put it separately. Test cargo test test_find_end_of_wal_last_crossing_segment prepared earlier by @yeputons passes now. Fixes https://github.com/neondatabase/neon/issues/544 https://github.com/neondatabase/cloud/issues/2004 Supersedes https://github.com/neondatabase/neon/pull/2066	2022-08-14 14:47:14 +03:00
Egor Suvorov	a7bf60631f	postgres_ffi/waldecoder: introduce explicit `enum State` Previously it was emulated with a combination of nullable fields. This change should make the logic more readable.	2022-08-12 11:40:46 +03:00
Egor Suvorov	07bb7a2afe	postgres_ffi/waldecoder: remove unused startlsn	2022-08-12 11:40:46 +03:00
Egor Suvorov	142e247e85	postgres_ffi/waldecoder: validate more header fields	2022-08-12 11:40:46 +03:00
Ankur Srivastava	84d1bc06a9	refactor: replace lazy-static with once-cell (#2195 ) - Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy` - fixes #1147 Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>	2022-08-05 19:34:04 +02:00
Arthur Petukhovsky	0a958b0ea1	Check find_end_of_wal errors instead of unwrap	2022-08-04 17:56:19 +03:00
Dmitry Rodionov	5f71aa09d3	support running tests against real s3 implementation without mocking	2022-08-04 11:14:05 +03:00
Thang Pham	6a664629fa	Add timeline physical size tracking (#2126 ) Ref #1902. - Track the layered timeline's `physical_size` using `pageserver_current_physical_size` metric when updating the layer map. - Report the local timeline's `physical_size` in timeline GET APIs. - Add `include-non-incremental-physical-size` URL flag to also report the local timeline's `physical_size_non_incremental` (similar to `logical_size_non_incremental`) - Add a `UIntGaugeVec` and `UIntGauge` to represent `u64` prometheus metrics Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2022-07-27 12:36:46 -04:00
Dmitry Ivanov	5f4ccae5c5	[proxy] Add the `password hack` authentication flow (#2095 ) [proxy] Add the `password hack` authentication flow This lets us authenticate users which can use neither SNI (due to old libpq) nor connection string `options` (due to restrictions in other client libraries). Note: `PasswordHack` will accept passwords which are not encoded in base64 via the "password" field. The assumption is that most user passwords will be valid utf-8 strings, and the rest may still be passed via "password_".	2022-07-25 17:23:10 +03:00
Heikki Linnakangas	b4c74c0ecd	Clean up unnecessary dependencies. Just to be tidy.	2022-07-20 16:31:25 +03:00
Thang Pham	160e52ec7e	Optimize branch creation (#2101 ) Resolves #2054 Context: branch creation needs to wait for GC to acquire `gc_cs` lock, which prevents creating new timelines during GC. However, because individual timeline GC iteration also requires `compaction_cs` lock, branch creation may also need to wait for compactions of multiple timelines. This results in large latency when creating a new branch, which we advertised as "instantly". This PR optimizes the latency of branch creation by separating GC into two phases: 1. Collect GC data (branching points, cutoff LSNs, etc) 2. Perform GC for each timeline The GC bottleneck comes from step 2, which must wait for compaction of multiple timelines. This PR modifies the branch creation and GC functions to allow GC to hold the GC lock only in step 1. As a result, branch creation doesn't need to wait for compaction to finish but only needs to wait for GC data collection step, which is fast.	2022-07-19 14:56:25 -04:00
Egor Suvorov	94003e1ebc	postgres_ffi: test restoring from intermediate LSNs by wal_craft	2022-07-15 19:06:50 +03:00
Egor Suvorov	19ea486cde	postgres_ffi/xlog_utils: refactor find_end_of_wal test * Deduce `last_segment` automatically * Get rid of local `wal_dir`/`wal_seg_size` variables * Prepare to test parsing of WAL from multiple specific points, not just the start; extract `check_end_of_wal` function to check both partial and non-partial WAL segments.	2022-07-15 19:06:50 +03:00
Alexander Bayandin	07acd6ddde	Fix clippy warnings in postgres_ffi/build.rs (#2081 )	2022-07-13 14:12:11 +01:00
Alexander Bayandin	61cc562822	Make POSTGRES_INSTALL_DIR configurable for build (#2067 )	2022-07-13 09:18:11 +01:00
Egor Suvorov	f540f115a3	postgres_ffi/wal_craft: simplify API	2022-07-08 18:30:56 +02:00
Egor Suvorov	0b5b2e8e0b	postgres_ffi/wal_craft: extract trait Crafter Make the intent of the code clearer.	2022-07-08 18:30:56 +02:00
Egor Suvorov	60e5dc10e6	postgres_ffi/wal_generate: use 'craft' instead of 'generate' It does very fine-tuned byte-to-byte WAL crafting, not a sloppy generation. Hence 'craft' sounds like a better description.	2022-07-08 18:30:56 +02:00
Egor Suvorov	80b7a3b51a	Test what happens when XLOG_SWITCH ends on page boundary, fix #1991	2022-07-08 15:37:26 +02:00
Egor Suvorov	85bda437de	postgres_ffi/wal_generate: add last_wal_record_xlog_switch and use it in tests Fix #1190: WalDecoder did not return correct LSN of the next record after processing a XLOG_SWITCH record	2022-07-08 15:37:26 +02:00
Egor Suvorov	c08fa9d562	postgres_ffi/wal_generate: support generating WAL for an already running Postgres server * ensure_server_config() function is added to ensure the server does not have background processes which intervene with WAL generation * Rework command line syntax * Add `print-postgres-config` subcommand which prints the required server configuration	2022-07-08 13:56:37 +02:00
Dmitry Rodionov	e1e24336b7	review adjustments, bring back timeline_detach and rename it to timeline_delete	2022-07-07 21:20:04 +03:00

1 2

98 Commits