rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 22:12:56 +00:00

Author	SHA1	Message	Date
Dmitry Rodionov	7987889cb3	keep successfully downloaded index parts	2022-07-18 12:27:04 +03:00
Dmitry Rodionov	912a08317b	do not ignore errors during downloading of tenant index parts	2022-07-18 12:27:04 +03:00
Kirill Bulatov	c4b2347e21	Use less restricrtive lock guard during storage sync	2022-07-17 12:49:18 +03:00
Heikki Linnakangas	a342957aee	Use ok_or_else() instead of ok_or(), to silence clippy warnings. "cargo clippy" started to complain about these, after running "cargo update". Not sure why it didn't complain before, but seems reasonable to fix these. (The "cargo update" is not included in this commit)	2022-07-14 22:13:51 +03:00
Thang Pham	7f048abf3b	Add `close_fds` for `initdb` command and add close fd test (#2060 ) This PR adds a test for https://github.com/neondatabase/neon/pull/1834 and fixes the error in https://app.circleci.com/pipelines/github/neondatabase/neon/7753/workflows/94d1b796-10a3-4989-b23c-4c1eb4a49cf5/jobs/79586, which happens because `pageserver.pid` is held by `initdb` command on restart. Because the test requires `lsof` to be installed in the docker image, this PR also updates the caches and docker image specified in CircleCI config file.	2022-07-12 15:04:40 -04:00
bojanserafimov	5cf597044d	Allow prev_lsn hint for fullbackup (#2052 )	2022-07-11 10:31:14 -04:00
Heikki Linnakangas	95452e605a	Optimize importing a physical backup Before this patch, importing a physical backup followed the same path as ingesting any WAL records: 1. All the data pages from the backup are first collected in the DatadirModification object. 2. Then, they are "committed" to the Repository. They are written to the in-memory layer 3. Finally, the in-memory layer is frozen, and flushed to disk as a L0 delta layer file. This was pretty inefficient. In step 1, the whole physical backup was held in memory. If the backup is large, you simply run out of memory. And in step 3, the resulting L0 delta layer file is large, holding all the data again. That's a problem if the backup is larger than 5 GB: Amazon S3 doesn't allow uploading files larger than 5 GB (without using multi-part upload, see github issue #1910). So we want to avoid that. To alleviate those problems, optimize the codepath for importing a physical backup. The basic flow is the same as before, but step 1 is optimized so that it doesn't accumulate all the data in memory, and step 3 writes the data in image layers instead of one large delta layer.	2022-07-11 17:03:58 +03:00
Dmitry Rodionov	21da9199fa	take Value by reference to avoid calling .clone	2022-07-11 17:03:58 +03:00
Thang Pham	1f5918b36d	Delay calculating the starting LSN when doing timeline branching (#2053 ) Previously, upon branching, if no starting LSN is specified, we determine the start LSN based on the source timeline's last record LSN in `timelines::create_timeline` function, which then calls `Repository::branch_timeline` to create the timeline. Inside the `LayeredRepository::branch_timeline` function, to start branching, we try to acquire a GC lock to prevent GC from removing data needed for the new timeline. However, a GC iteration takes time, so the GC lock can be held for a long period of time. As a result, the previously determined starting LSN can become invalid because of GC. This PR fixes the above issue by delaying the LSN calculation part and moving it to be inside `LayeredRepository::branch_timeline` function.	2022-07-08 10:29:29 -04:00
Dmitry Rodionov	1a5af6d7a5	extend detach/delete tests	2022-07-07 21:20:04 +03:00
Dmitry Rodionov	520ffb341b	fix pageserver openapi spec	2022-07-07 21:20:04 +03:00
Dmitry Rodionov	9f2b40645d	review cleanup, point timeline/detach to timeline/delete	2022-07-07 21:20:04 +03:00
Dmitry Rodionov	168214e0b6	use tenant status endpoint to check whether timelines were downloaded or not	2022-07-07 21:20:04 +03:00
Dmitry Rodionov	d9d4ef12c3	review cleanup	2022-07-07 21:20:04 +03:00
Dmitry Rodionov	e1e24336b7	review adjustments, bring back timeline_detach and rename it to timeline_delete	2022-07-07 21:20:04 +03:00
Dmitry Rodionov	4c54e4b37d	switch to per-tenant attach/detach download operations of all timelines for one tenant are now grouped together so when attach is invoked pageserver downloads all of them and registers them in a single apply_sync_status_update call so branches can be used safely with attach/detach	2022-07-07 21:20:04 +03:00
Heikki Linnakangas	e6ea049165	If an error happens during import of base backup or WAL, log it. We only sent the error to the client, with no trace in the pageserver log. Log it, similar to how we log errors in GetPage@LSN requests.	2022-07-07 16:05:13 +03:00
Heikki Linnakangas	0e3456351f	Shrink thread pools used for WAL receivers and background tasks. I noticed that the pageserver has a very large virtual memory size, several GB, even though it doesn't actually use that much memory. That's not much of a problem normally, but I hit it because I wanted to run tests with a limited virtual memory size, by calling setrlimit(RLIMIT_AS), but the highest limit you can set is 2 GB. I was not able to start pageserver with a limit of 2 GB. On Linux, each thread allocates 32 MB of virtual memory. I read this on some random forum on the Internet, but unfortunately could not find the source again now. Empirically, reducing the number of threads clearly helps to bring down the virtual memory size. Aside from the virtual memory usage, it seems excessive to launch 40 threads in both of those thread pools. The tokio default is to have as many worker threads as there are CPU cores in the system. That seems like a fine heuristic for us, too, so remove the explicit setting of the pool size and rely on the default. Note that the GC and compaction tasks are actually run with tokio spawn_blocking, so the threads that are actually doing the work, and possibly waiting on I/O, are not consuming threads from the thread pool. The WAL receiver work is done in the tokio worker threads, but the WAL receivers are more CPU bound so that seems OK. Also remove the explicit maxinum on blocking tasks. I'm not sure what the right value for that would be, or whether the value we set (100) would be better than the tokio default (512). Since the value was arbitrary, let's just rely on the tokio default for that, too.	2022-07-06 22:36:38 +03:00
bojanserafimov	242af75653	Fix signal file parsing (#2042 )	2022-07-06 13:45:02 -04:00
Kirill Bulatov	50821c0a3c	Return download stream directly from the remote storage API	2022-07-05 21:45:15 +03:00
Dmitry Rodionov	cfdf79aceb	harden create_empty_timeline Reorder checks so it checks whether the timeline exists before writing something to disk, possibly replacing valid content	2022-07-05 16:44:18 +03:00
Heikki Linnakangas	bb69e0920c	Do not overwrite an existing image layer. See github issues #1594 and #1690 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2022-07-05 14:45:31 +03:00
bojanserafimov	d29c545b5d	Gc/compaction thread pool, take 2 (#1933 ) Decrease the number of pageserver threads by running gc and compaction in a blocking tokio thread pool	2022-07-05 02:06:40 -04:00
Kirill Bulatov	6abdb12724	Fix 1.62 Clippy errors	2022-07-04 23:46:37 +03:00
Bojan Serafimov	f09c09438a	Fix gc after import	2022-07-01 11:10:49 +03:00
Kirill Bulatov	1d0706cf25	Fix walreceiver connection selection mechanism * Avoid reconnecting to safekeeper immediately after its failure by limiting candidates to those with fewest connection attempts. Thus we don't have to wait lagging_wal_timeout (10s by default) before switch happens even if no new changes are generated, and current test_restarts_under_load expects some commits to happen within 4s. * Make default max_lsn_wal_lag larger, otherwise we constant reconnections happen during normal work. * Fix wal_connection_attempts maintanance, preventing busy loop of reconnections.	2022-06-30 00:40:12 +03:00
Anastasia Lubennikova	3c2b03cd87	Update timeline size on dropdb. Add the test (#1973 ) In addition, fix database size calculation: count not only main fork of the relation, but also vm and fsm.	2022-06-23 12:28:12 +03:00
Kirill Bulatov	7c49abe7d1	Rework etcd timeline updates and their handling	2022-06-23 09:11:27 +03:00
bojanserafimov	1ca28e6f3c	Import basebackup into pageserver (#1925 ) Allow importing basebackup taken from vanilla postgres or another pageserver via psql copy in protocol.	2022-06-21 11:04:10 -04:00
Anastasia Lubennikova	36ee182d26	Implement page servise 'fullbackup' endpoint (#1923 ) * Implement page servise 'fullbackup' endpoint that works like basebackup, but also sends relational files * Add test_runner/batch_others/test_fullbackup.py Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2022-06-16 14:07:11 +03:00
Kirill Bulatov	d8a37452c8	Rename ZenithFeedback (#1912 )	2022-06-11 00:44:05 +03:00
chaitanya sharma	e1336f451d	renamed .zenith data-dir to .neon.	2022-06-09 18:19:18 +02:00
Arthur Petukhovsky	a01999bc4a	Replace most common remote logs with metrics (#1909 )	2022-06-08 13:36:49 +03:00
Kirill Bulatov	8a53472e4f	Force etcd broker keys to not to intersect	2022-06-08 11:21:05 +03:00
Dmitry Rodionov	6e26588d17	Allow to customize shutdown condition in PostgresBackend Use it in PageServerHandler to check per thread shutdown condition from thread_mgr which takes into account tenants and timelines	2022-06-07 22:11:54 +03:00
Dmitry Rodionov	7dc6beacbd	make it possible to associate thread with a tenant after thread start	2022-06-07 12:59:35 +03:00
bojanserafimov	92de8423af	Remove dead code (#1886 )	2022-06-05 09:18:11 -04:00
Dmitry Rodionov	e442f5357b	unify two identical failpoints in flush_frozen_layer probably is a merge artfact	2022-06-03 19:36:09 +03:00
Kirill Bulatov	2623193876	Remove pageserver_connstr from WAL stream logic	2022-06-03 17:30:36 +03:00
huming	9c846a93e8	chore(doc)	2022-06-03 14:24:27 +03:00
Kirill Bulatov	1d16ee92d4	Fix the Lsn difference reconnection	2022-06-03 00:23:13 +03:00
Kirill Bulatov	b0c4ec0594	Log storage sync and etcd events a bit better	2022-06-03 00:23:13 +03:00
Egor Suvorov	aba5e5f8b5	GitHub Actions: pin Rust version to 1.58 like on CircleCI * Fix failing `cargo clippy` while we're here. The behavior has been changed in Rust 1.60: https://github.com/rust-lang/rust-clippy/issues/8928 * Add Rust version to the Cargo deps cache key	2022-06-02 17:45:53 +02:00
Ryan Russell	c71faae2c6	Docs readability cont Signed-off-by: Ryan Russell <git@ryanrussell.org>	2022-06-02 15:05:12 +02:00
Dmitry Rodionov	1188c9a95c	remove extra span as this code is already covered by create timeline span E g this log line contains duplicated data: INFO /timeline_create{tenant=8d367870988250a755101b5189bbbc17 new_timeline=Some(27e2580f51f5660642d8ce124e9ee4ac) lsn=None}: bootstrapping{timeline=27e2580f51f5660642d8ce124e9ee4ac tenant=8d367870988250a755101b5189bbbc17}: created root timeline 27e2580f51f5660642d8ce124e9ee4ac timeline.lsn 0/16960E8 this avoids variable duplication in `bootstrapping` subspan	2022-06-01 19:29:17 +03:00
Kirill Bulatov	e5cb727572	Replace callmemaybe with etcd subscriptions on safekeeper timeline info	2022-06-01 16:07:04 +03:00
bojanserafimov	ca10cc12c1	Close file descriptors for redo process (#1834 )	2022-05-31 14:14:09 -04:00
Ryan Russell	54e163ac03	Improve Readability in Docs Signed-off-by: Ryan Russell <ryanrussell@users.noreply.github.com>	2022-05-31 17:22:47 +03:00
Anastasia Lubennikova	6a867bce6d	Rename 'zenith_admin' role to 'cloud_admin'	2022-05-30 11:11:01 +03:00
Anastasia Lubennikova	3accde613d	Rename contrib/zenith to contrib/neon. Rename custom GUCs: - zenith.page_server_connstring -> neon.pageserver_connstring - zenith.zenith_tenant -> neon.tenantid - zenith.zenith_timeline -> neon.timelineid - zenith.max_cluster_size -> neon.max_cluster_size	2022-05-30 11:11:01 +03:00

1 2 3 4 5 ...

804 Commits