rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 14:02:55 +00:00

Author	SHA1	Message	Date
Vadim Kharitonov	0b428f7c41	Enable licenses check for 3rd-parties	2023-01-03 15:11:50 +01:00
Egor Suvorov	cb61944982	Safekeeper: refactor auth validation * Load public auth key on startup and store it in the config. * Get rid of a separate `auth` parameter which was passed all over the place.	2022-12-31 02:27:08 +03:00
Arseny Sher	f6bf7b2003	Add tenant_id to safekeeper spans. Now that it's hard to map timeline id into project in the console, this should help a little.	2022-12-27 20:19:12 +03:00
Arseny Sher	fee8bf3a17	Remove global_commit_lsn. It is complicated and fragile to maintain and not really needed; update commit_lsn locally only when we have enough WAL flushed. ref https://github.com/neondatabase/neon/issues/3069	2022-12-27 20:19:12 +03:00
Arseny Sher	1ad6e186bc	Refuse ProposerElected if it is going to truncate correct WAL. Prevents commit_lsn monotonicity violation (otherwise harmless). closes https://github.com/neondatabase/neon/issues/3069	2022-12-27 20:19:12 +03:00
Kirill Bulatov	fca25edae8	Fix 1.66 Clippy warnings (#3178 ) 1.66 release speeds up compile times by over 10% according to tests. Also its Clippy finds plenty of old nits in our code: * useless conversion, `foo as u8` where `foo: u8` and similar, removed `as u8` and similar * useless references and dereferences (that were automatically adjusted by the compiler), removed various `&` and `*` * `bool -> u8` conversion via `if/else`, changed to `u8::from` * Map `.iter()` calls where only values were used, changed to `.values()` instead Standing out lints: * `Eq` is missing in our protoc generated structs. Silenced, does not seem crucial for us. * `fn default` looks like the one from `Default` trait, so I've implemented that instead and replaced the `dummy_` method in tests with `::default()` invocation Clippy detected that ``` if retry_attempt < u32::MAX { retry_attempt += 1; } ``` is a saturating add and proposed to replace it.	2022-12-22 14:27:48 +02:00
Kirill Bulatov	3735aece56	Safekeeper: Always use workdir as a full path	2022-12-19 21:43:36 +02:00
Dmitry Ivanov	61194ab2f4	Update rust-postgres everywhere I've rebased[1] Neon's fork of rust-postgres to incorporate latest upstream changes (including dependabot's fixes), so we need to advance revs here as well. [1] https://github.com/neondatabase/rust-postgres/commits/neon	2022-12-17 00:26:10 +03:00
Dmitry Ivanov	83baf49487	[proxy] Forward compute connection params to client This fixes all kinds of problems related to missing params, like broken timestamps (due to `integer_datetimes`). This solution is not ideal, but it will help. Meanwhile, I'm going to dedicate some time to improving connection machinery. Note that this does not fix problems with passing certain parameters in a reverse direction, i.e. from client to compute. This is a separate matter and will be dealt with in an upcoming PR.	2022-12-16 21:37:50 +03:00
Arseny Sher	e14bbb889a	Enable broker client keepalives. (#3127 ) Should fix stale connections. ref https://github.com/neondatabase/neon/issues/3108	2022-12-16 11:55:12 +02:00
Kirill Bulatov	02c1c351dc	Create initial timeline without remote storage (#3077 ) Removes the race during pageserver initial timeline creation that led to partial layer uploads. This race is only reproducible in test code, we do not create initial timelines in cloud (yet, at least), but still nice to remove the non-deterministic behavior.	2022-12-13 15:42:59 +02:00
Arseny Sher	f013d53230	Switch to clap derive API in safekeeper. Fewer lines and easier to read/modify. Practically no functional changes.	2022-12-12 16:25:23 +03:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Arseny Sher	28667ce724	Make safekeeper exit code 0. We don't have any useful graceful shutdown mode, so immediate one is normal. https://github.com/neondatabase/neon/issues/2956	2022-12-09 12:35:36 +03:00
Kirill Bulatov	b50e0793cf	Rework remote_storage interface (#2993 ) Changes: * Remove `RemoteObjectId` concept from remote_storage. Operate directly on /-separated names instead. These names are now represented by struct `RemotePath` which was renamed from struct `RelativePath` * Require remote storage to operate on relative paths for its contents, thus simplifying the way to derive them in pageserver and safekeeper * Make `IndexPart` to use `String` instead of `RelativePath` for its entries, since those are just the layer names	2022-12-07 23:11:02 +02:00
Christian Schwarz	ac0c167a85	improve pidfile handling This patch centralizes the logic of creating & reading pid files into the new pid_file module and improves upon / makes explicit a few race conditions that existed with the previous code. Starting Processes / Creating Pidfiles ====================================== Before this patch, we had three places that had very similar-looking match lock_file::create_lock_file { ... } blocks. After this change, they can use a straight-forward call provided by the pid_file: pid_file::claim_pid_file_for_pid() Stopping Processes / Reading Pidfiles ===================================== The new pid_file module provides a function to read a pidfile, called read_pidfile(), that returns a pub enum PidFileRead { NotExist, NotHeldByAnyProcess(PidFileGuard), LockedByOtherProcess(Pid), } If we get back NotExist, there is nothing to kill. If we get back NotHeldByAnyProcess, the pid file is stale and we must ignore its contents. If it's LockedByOtherProcess, it's either another pidfile reader or, more likely, the daemon that is still running. In this case, we can read the pid in the pidfile and kill it. There's still a small window where this is racy, but it's not a regression compared to what we have before. The NotHeldByAnyProcess is an improvement over what we had before this patch. Before, we would blindly read the pidfile contents and kill, even if no other process held the flock. If the pidfile was stale (NotHeldByAnyProcess), then that kill would either result in ESRCH or hit some other unrelated process on the system. This patch avoids the latter case by grabbing an exclusive flock before reading the pidfile, and returning the flock to the caller in the form of a guard object, to avoid concurrent reads / kills. It's hopefully irrelevant in practice, but it's a little robustness that we get for free here.
Maintain flock on Pidfile of ETCD / any InitialPidFile::Create() ================================================================ Pageserver and safekeeper create their pidfiles themselves. But for etcd, neon_local creates the pidfile (InitialPidFile::Create()). Before this change, we would unlock the etcd pidfile as soon as `neon_local start` exits, simply because no-one else kept the FD open. During `neon_local stop`, that results in a stale pid file, aka, NotHeldByAnyProcess, and it would henceforth not trust that the PID stored in the file is still valid. With this patch, we make the etcd process inherit the pidfile FD, thereby keeping the flock held until it exits.	2022-12-07 18:24:12 +01:00
Kliment Serafimov	8f2b3cbded	Sentry integration for storage. (#2926 ) Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes. Currently in sentry there are three projects, one for each process. Sentry url is sent to all three processes separately via cli args.	2022-12-06 18:57:54 +00:00
Egor Suvorov	ae53dc3326	Add authentication between Safekeeper and Pageserver/Compute * Fix https://github.com/neondatabase/neon/issues/1854 * Never log Safekeeper::conninfo in walproposer as it now contains a secret token * control_panel, test_runner: generate and pass JWT tokens for Safekeeper to compute and pageserver * Compute: load JWT token for Safekeeper from the environment variable. Do not reuse the token from pageserver_connstring because it's embedded in there weirdly. * Pageserver: load JWT token for Safekeeper from the environment variable. * Rewrite docs/authentication.md	2022-11-25 04:17:42 +03:00
Egor Suvorov	2ce5d8137d	Separate permission checks for Pageserver and Safekeeper There will be different scopes for those two, so authorization code should be different. The `check_permission` function is now not in the shared library. Its implementation is very similar to the one which will be added for Safekeeper. In fact, we may reuse the same existing root-like 'PageServerApi' scope, but I would prefer to have separate root-like scopes for services. Also, generate_management_token in tests is generate_pageserver_token now.	2022-11-25 04:17:42 +03:00
Alexey Kondratov	e6db4b63eb	[safekeeper] Serialize LSN in the `term_history` according to the spec (#2896 ) Use string format in the timeline status HTTP API response.	2022-11-24 17:19:01 +01:00
Arthur Petukhovsky	c6072d38c2	Remove debug logs in should_walsender_stop (#2791 )	2022-11-10 15:49:00 +00:00
Dmitry Ivanov	c38f38dab7	Move pq_proto to its own crate	2022-11-03 22:56:04 +03:00
Arseny Sher	63221e4b42	Fix sk->ps walsender shutdown on sk side on caughtup. This will fix the many-threads issue, but the surrounding code still badly needs improvement. https://github.com/neondatabase/neon/issues/2722	2022-11-03 16:20:55 +04:00
Kirill Bulatov	d42700280f	Remove daemonize from storage components (#2677 ) Move daemonization logic into `control_plane`. Storage binaries now only create a lockfile to avoid concurrent services running in the same directory.	2022-11-02 02:26:37 +02:00
Arseny Sher	9f49605041	Fix division by zero panic in determine_offloader.	2022-10-22 18:25:12 +03:00
Lassi Pölönen	321aeac3d4	Json logging capability (#2624 ) * Support configuring the log format as json or plain. Separately test json and plain logger. They would be competing on the same global subscriber otherwise. * Implement log_format for pageserver config * Implement configurable log format for safekeeper.	2022-10-21 17:30:20 +00:00
Arseny Sher	7480a0338a	Determine safekeeper for offloading WAL without etcd election API. This API is rather pointless, as sane choice anyway requires knowledge of peers status and leaders lifetime in any case can intersect, which is fine for us -- so manual elections are straightforward. Here, we deterministically choose among the reasonably caught up safekeepers, shifting by timeline id to spread the load. A step towards custom broker https://github.com/neondatabase/neon/issues/2394	2022-10-21 15:33:27 +03:00
Kirill Bulatov	c4ee62d427	Bump clap and other minor dependencies (#2623 )	2022-10-17 12:58:40 +03:00
Kirill Bulatov	f03b7c3458	Bump regular dependencies (#2618 ) * etcd-client is not updated, since we plan to replace it with another client and the new version errors with some missing prost library error * clap had released another major update that requires changing every CLI declaration again, deserves a separate PR	2022-10-15 01:55:31 +03:00
Arseny Sher	9fe4548e13	Reimplement explicit timeline creation on safekeepers. With the ability to pass commit_lsn. This allows performing project WAL recovery through a different (from the original) set of safekeepers (or under a different ttid) by 1) moving WAL files to s3 under the proper ttid; 2) explicitly creating the timeline on safekeepers, setting commit_lsn to the latest point; 3) putting the latest .partial file to the timeline directory on safekeepers, if desired. Extend test_s3_wal_replay to exercise this behaviour. Also extends timeline_status endpoint to return postgres information.	2022-10-13 21:43:10 +04:00
Lassi Pölönen	e520293090	Add build info metric to pageserver, safekeeper and proxy (#2596 ) * Test that we emit build info metric for pageserver, safekeeper and proxy with some non-zero length revision label * Emit libmetrics_build_info on startup of pageserver, safekeeper and proxy with label "revision" which tells the git revision.	2022-10-11 09:54:32 +03:00
Arthur Petukhovsky	f25dd75be9	Fix deadlock in safekeeper metrics (#2566 ) We had a problem where almost all of the threads were waiting on a futex syscall. More specifically: - `/metrics` handler was inside `TimelineCollector::collect()`, waiting on a mutex for a single Timeline - This exact timeline was inside `control_file::FileStorage::persist()`, waiting on a mutex for Lazy initialization of `PERSIST_CONTROL_FILE_SECONDS` - `PERSIST_CONTROL_FILE_SECONDS: Lazy<Histogram>` was blocked on `prometheus::register` - `prometheus::register` calls `DEFAULT_REGISTRY.write().register()` to take a write lock on Registry and add a new metric - `DEFAULT_REGISTRY` lock was already taken inside `DEFAULT_REGISTRY.gather()`, which was called by `/metrics` handler to collect all metrics This commit creates another Registry with a separate lock, to avoid deadlock in a case where `TimelineCollector` triggers registration of new metrics inside default registry.	2022-10-06 01:07:02 +03:00
sharnoff	580584c8fc	Remove control_plane deps on pageserver/safekeeper (#2513 ) Creates new `pageserver_api` and `safekeeper_api` crates to serve as the shared dependencies. Should reduce both recompile times and cold compile times. Decreases the size of the optimized `neon_local` binary: 380M -> 179M. No significant changes for anything else (mostly as expected).	2022-10-04 11:14:45 -07:00
Arthur Petukhovsky	dabb6d2675	Fix log level for sk startup logs (#2526 )	2022-09-27 10:36:17 +00:00
Arthur Petukhovsky	fc7087b16f	Add metric for loaded safekeeper timelines (#2509 )	2022-09-27 11:57:59 +03:00
Arthur Petukhovsky	d15116f2cc	Update pg_version for old timelines	2022-09-26 19:57:03 +03:00
Anastasia Lubennikova	c81ede8644	Hotfix for safekeeper timelines with unknown pg_version. Assume DEFAULT_PG_VERSION = 14	2022-09-22 21:17:36 +03:00
Anastasia Lubennikova	5e151192f5	Fix rebase conflicts in safekeeper code	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	3618c242b9	use version specific find_end_of_wal function	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	86bf491981	Support pg 15 - Split postgres_ffi into two version specific files. - Preserve pg_version in timeline metadata. - Use pg_version in safekeeper code. Check for postgres major version mismatch. - Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding. - Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture. To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable: 'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress' Currently not all tests pass, because rust code relies on the default version of PostgreSQL in a few places.	2022-09-22 14:15:13 +03:00
Arthur Petukhovsky	7eebb45ea6	Reduce metrics footprint in safekeeper (#2491 ) Fixes bugs with metrics in control_file and wal_storage, where we haven't deleted metrics for inactive timelines.	2022-09-21 19:13:30 +03:00
sharnoff	6f949e1556	Improve pageserver/safekeeper HTTP API errors (#2461 ) Part of the general work on improving pageserver logs. Brief summary of changes: * Remove `ApiError::from_err` * Remove `impl From<anyhow::Error> for ApiError` * Convert `ApiError::{BadRequest, NotFound}` to use `anyhow::Error` * Note: `NotFound` has more verbose formatting because it's more likely to have useful information for the receiving "user" * Explicitly convert from `tokio::task::JoinError`s into `InternalServerError`s where appropriate Also note: many of the places where errors were implicitly converted to 500s have now been updated to return a more appropriate error. Some places where it's not yet possible to distinguish the error types have been left as 500s.	2022-09-20 17:02:10 -07:00
sharnoff	4b25b9652a	Rename more zid-like idents (#2480 ) Follow-up to PR #2433 (`b8eb908a`). There's still a few more unresolved locations that have been left as-is for the same compatibility reasons in the original PR.	2022-09-20 11:06:31 -07:00
Arthur Petukhovsky	566e816298	Refactor safekeeper timelines handling (#2329 ) See https://github.com/neondatabase/neon/pull/2329 for details	2022-09-20 07:42:39 +00:00
Kirill Bulatov	6db6e7ddda	Use backward-compatible safekeeper code	2022-09-14 08:14:05 +03:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Heikki Linnakangas	35b4816f09	Turn GenericRemoteStorage into just a newtype around 'Arc<dyn RemoteStorage>' We had a pattern like this: match remote_storage { GenericRemoteStorage::Local(storage) => { let source = storage.remote_object_id(&file_path)?; ... storage .function(&source, ...) .await }, GenericRemoteStorage::S3(storage) => { ... exact same code as for the Local case ... }, This removes the code duplication, by allowing you to call the functions directly on GenericRemoteStorage. Also change RemoteObjectId to be just a type alias for String. Now that the callers of GenericRemoteStorage functions don't know whether they're dealing with the LocalFs or S3 implementation, RemoteObjectId must be the same type for both.	2022-09-08 19:59:42 +03:00
Anastasia Lubennikova	2794cd83c7	Prepare pg 15 support (generate bindings for pg15) (#2396 ) Another preparatory commit for pg15 support: * generate bindings for both pg14 and pg15; * update Makefile and CI scripts: now neon build depends on both PostgreSQL versions; * some code refactoring to decrease version-specific dependencies.	2022-09-07 12:40:48 +03:00
Kirill Bulatov	73f926c39a	Return safekeeper remote storage logging during downloads	2022-09-02 15:08:18 +03:00
Kirill Bulatov	8a7333438a	Extract common remote storage operations into GenericRemoteStorage (#2373 )	2022-09-02 11:58:28 +03:00
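
Commit fca25edae8 above notes that Clippy recognised a guarded increment as a saturating add, and that `bool -> u8` conversions written with `if/else` were changed to `u8::from`. As a rough illustration of those two rewrites (the function names here are hypothetical and not taken from the repository), the manual and standard-library forms can be compared side by side:

```rust
/// Manual form quoted in the commit message: increment unless already at the maximum.
fn bump_manual(mut retry_attempt: u32) -> u32 {
    if retry_attempt < u32::MAX {
        retry_attempt += 1;
    }
    retry_attempt
}

/// Equivalent form using the standard library's saturating addition.
fn bump_saturating(retry_attempt: u32) -> u32 {
    retry_attempt.saturating_add(1)
}

/// `bool -> u8` via `if/else`, the pattern the commit says Clippy flagged.
fn flag_manual(flag: bool) -> u8 {
    if flag {
        1
    } else {
        0
    }
}

/// The same conversion through the standard `From<bool> for u8` implementation.
fn flag_from(flag: bool) -> u8 {
    u8::from(flag)
}
```

Both pairs behave identically for every input; the standard-library forms simply state the intent directly.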

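
Commit ac0c167a85 above describes how the result of `read_pidfile()` drives the decision to kill a process or not. The following is only a sketch of that decision table: `PidFileGuard` and `Pid` are simplified stand-ins for the real types, and `StopAction` and `decide_stop_action` are hypothetical names introduced for illustration. No file locking is performed here.

```rust
/// Stand-in for the guard object that holds the flock (assumption: unit struct).
#[derive(Debug, PartialEq)]
struct PidFileGuard;

/// Stand-in for the process id type (assumption: plain integer).
type Pid = i32;

/// Variants as listed in the commit message.
#[derive(Debug, PartialEq)]
enum PidFileRead {
    NotExist,
    NotHeldByAnyProcess(PidFileGuard),
    LockedByOtherProcess(Pid),
}

/// Hypothetical summary of what the caller should do next.
#[derive(Debug, PartialEq)]
enum StopAction {
    NothingToKill,
    IgnoreStalePidFile,
    Kill(Pid),
}

fn decide_stop_action(read: PidFileRead) -> StopAction {
    match read {
        // No pidfile: there is nothing to kill.
        PidFileRead::NotExist => StopAction::NothingToKill,
        // Nobody holds the flock: the stored pid is stale and must not be trusted.
        PidFileRead::NotHeldByAnyProcess(_guard) => StopAction::IgnoreStalePidFile,
        // Someone holds the flock, most likely the running daemon: kill that pid.
        PidFileRead::LockedByOtherProcess(pid) => StopAction::Kill(pid),
    }
}
```

The stale case is the one the commit highlights: without the flock check, the stored pid could belong to an unrelated process.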

115 Commits
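
Commit 7480a0338a describes choosing the WAL offloader deterministically among the reasonably caught-up safekeepers, shifting by timeline id to spread the load, and commit 9f49605041 fixes a division by zero in `determine_offloader`. A minimal sketch of that idea follows. The `Peer` struct, its fields, the `max_lag` threshold, and the `timeline_seed` parameter are all assumptions made for illustration; the actual `determine_offloader` may differ.

```rust
/// Hypothetical view of a safekeeper peer; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Peer {
    node_id: u64,
    flush_lsn: u64,
}

/// Pick one peer deterministically among those within `max_lag` of the most
/// advanced peer. `timeline_seed` stands in for a number derived from the timeline id.
fn choose_offloader(peers: &[Peer], max_lag: u64, timeline_seed: u64) -> Option<u64> {
    // Returns None for an empty peer list, so the modulo below never divides by zero.
    let max_lsn = peers.iter().map(|p| p.flush_lsn).max()?;

    // Keep only peers that are reasonably caught up.
    let mut candidates: Vec<&Peer> = peers
        .iter()
        .filter(|p| max_lsn - p.flush_lsn <= max_lag)
        .collect();

    // Sort so that every node computes the same ordering regardless of input order.
    candidates.sort_by_key(|p| p.node_id);

    // Non-empty: the peer holding `max_lsn` always passes the filter.
    let idx = (timeline_seed % candidates.len() as u64) as usize;
    Some(candidates[idx].node_id)
}
```

Because every safekeeper evaluates the same pure function over the same peer information, they would agree on the choice without an election API, and different timelines land on different candidates.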