
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 05:30:38 +00:00

Author	SHA1	Message	Date
Alexey Kondratov	0eca1d19de	Add safety notes, benchmark. Optimize checksum calculation	2022-07-07 20:45:50 +02:00
Alexey Kondratov	53b9cb915e	Turn off data-checksums for old tenants by default and explicitly enable for new ones	2022-07-07 19:44:47 +02:00
Alexey Kondratov	cc6ffb558d	Verify checksum of the page and WAL records before sending to the redo process	2022-07-07 19:44:47 +02:00
Alexey Kondratov	6059801943	Enable Postgres data checksums (neondatabase/cloud#536 ) We need checksums to verify data integrity when we read it from an untrusted place (e.g. local disk) or via an untrusted communication channel (e.g. network). At the same time, we trust the pageserver <-> redo process communication channel, as it is just a pipe. Here we enable calculation of data checksums in the wal redo process and when we extract FPI during WAL ingestion. Compute node (Postgres) will verify the checksum of every page after receiving it back from pageserver. So it is pretty similar to how vanilla Postgres checks them. There are two other places where we should verify checksums to detect data corruption earlier: - when we receive WAL records from safekeepers (already implemented, see: WalStreamDecoder::poll_decode) - when we write layer files to disk and read back in memory from local disk or S3	2022-07-07 19:44:47 +02:00
Konstantin Knizhnik	2501afba6e	Calculate postgres checksum for FPI stored in pageserver (neondatabase/cloud#536 )	2022-07-07 19:44:47 +02:00
Heikki Linnakangas	e6ea049165	If an error happens during import of base backup or WAL, log it. We only sent the error to the client, with no trace in the pageserver log. Log it, similar to how we log errors in GetPage@LSN requests.	2022-07-07 16:05:13 +03:00
Heikki Linnakangas	0e3456351f	Shrink thread pools used for WAL receivers and background tasks. I noticed that the pageserver has a very large virtual memory size, several GB, even though it doesn't actually use that much memory. That's not much of a problem normally, but I hit it because I wanted to run tests with a limited virtual memory size, by calling setrlimit(RLIMIT_AS), but the highest limit you can set is 2 GB. I was not able to start pageserver with a limit of 2 GB. On Linux, each thread allocates 32 MB of virtual memory. I read this on some random forum on the Internet, but unfortunately could not find the source again now. Empirically, reducing the number of threads clearly helps to bring down the virtual memory size. Aside from the virtual memory usage, it seems excessive to launch 40 threads in both of those thread pools. The tokio default is to have as many worker threads as there are CPU cores in the system. That seems like a fine heuristic for us, too, so remove the explicit setting of the pool size and rely on the default. Note that the GC and compaction tasks are actually run with tokio spawn_blocking, so the threads that are actually doing the work, and possibly waiting on I/O, are not consuming threads from the thread pool. The WAL receiver work is done in the tokio worker threads, but the WAL receivers are more CPU bound so that seems OK. Also remove the explicit maximum on blocking tasks. I'm not sure what the right value for that would be, or whether the value we set (100) would be better than the tokio default (512). Since the value was arbitrary, let's just rely on the tokio default for that, too.	2022-07-06 22:36:38 +03:00
bojanserafimov	242af75653	Fix signal file parsing (#2042 )	2022-07-06 13:45:02 -04:00
Kirill Bulatov	50821c0a3c	Return download stream directly from the remote storage API	2022-07-05 21:45:15 +03:00
Dmitry Rodionov	cfdf79aceb	harden create_empty_timeline Reorder checks so it checks whether the timeline exists before writing something to disk, possibly replacing valid content	2022-07-05 16:44:18 +03:00
Heikki Linnakangas	bb69e0920c	Do not overwrite an existing image layer. See github issues #1594 and #1690 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2022-07-05 14:45:31 +03:00
bojanserafimov	d29c545b5d	Gc/compaction thread pool, take 2 (#1933 ) Decrease the number of pageserver threads by running gc and compaction in a blocking tokio thread pool	2022-07-05 02:06:40 -04:00
Kirill Bulatov	6abdb12724	Fix 1.62 Clippy errors	2022-07-04 23:46:37 +03:00
Bojan Serafimov	f09c09438a	Fix gc after import	2022-07-01 11:10:49 +03:00
Kirill Bulatov	1d0706cf25	Fix walreceiver connection selection mechanism * Avoid reconnecting to safekeeper immediately after its failure by limiting candidates to those with fewest connection attempts. Thus we don't have to wait lagging_wal_timeout (10s by default) before switch happens even if no new changes are generated, and current test_restarts_under_load expects some commits to happen within 4s. * Make default max_lsn_wal_lag larger, otherwise constant reconnections happen during normal work. * Fix wal_connection_attempts maintenance, preventing busy loop of reconnections.	2022-06-30 00:40:12 +03:00
Anastasia Lubennikova	3c2b03cd87	Update timeline size on dropdb. Add the test (#1973 ) In addition, fix database size calculation: count not only main fork of the relation, but also vm and fsm.	2022-06-23 12:28:12 +03:00
Kirill Bulatov	7c49abe7d1	Rework etcd timeline updates and their handling	2022-06-23 09:11:27 +03:00
bojanserafimov	1ca28e6f3c	Import basebackup into pageserver (#1925 ) Allow importing basebackup taken from vanilla postgres or another pageserver via psql copy in protocol.	2022-06-21 11:04:10 -04:00
Anastasia Lubennikova	36ee182d26	Implement page service 'fullbackup' endpoint (#1923 ) * Implement page service 'fullbackup' endpoint that works like basebackup, but also sends relational files * Add test_runner/batch_others/test_fullbackup.py Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2022-06-16 14:07:11 +03:00
Kirill Bulatov	d8a37452c8	Rename ZenithFeedback (#1912 )	2022-06-11 00:44:05 +03:00
chaitanya sharma	e1336f451d	renamed .zenith data-dir to .neon.	2022-06-09 18:19:18 +02:00
Arthur Petukhovsky	a01999bc4a	Replace most common remote logs with metrics (#1909 )	2022-06-08 13:36:49 +03:00
Kirill Bulatov	8a53472e4f	Force etcd broker keys to not to intersect	2022-06-08 11:21:05 +03:00
Dmitry Rodionov	6e26588d17	Allow to customize shutdown condition in PostgresBackend Use it in PageServerHandler to check per thread shutdown condition from thread_mgr which takes into account tenants and timelines	2022-06-07 22:11:54 +03:00
Dmitry Rodionov	7dc6beacbd	make it possible to associate thread with a tenant after thread start	2022-06-07 12:59:35 +03:00
bojanserafimov	92de8423af	Remove dead code (#1886 )	2022-06-05 09:18:11 -04:00
Dmitry Rodionov	e442f5357b	unify two identical failpoints in flush_frozen_layer; probably a merge artifact	2022-06-03 19:36:09 +03:00
Kirill Bulatov	2623193876	Remove pageserver_connstr from WAL stream logic	2022-06-03 17:30:36 +03:00
huming	9c846a93e8	chore(doc)	2022-06-03 14:24:27 +03:00
Kirill Bulatov	1d16ee92d4	Fix the Lsn difference reconnection	2022-06-03 00:23:13 +03:00
Kirill Bulatov	b0c4ec0594	Log storage sync and etcd events a bit better	2022-06-03 00:23:13 +03:00
Egor Suvorov	aba5e5f8b5	GitHub Actions: pin Rust version to 1.58 like on CircleCI * Fix failing `cargo clippy` while we're here. The behavior has been changed in Rust 1.60: https://github.com/rust-lang/rust-clippy/issues/8928 * Add Rust version to the Cargo deps cache key	2022-06-02 17:45:53 +02:00
Ryan Russell	c71faae2c6	Docs readability cont Signed-off-by: Ryan Russell <git@ryanrussell.org>	2022-06-02 15:05:12 +02:00
Dmitry Rodionov	1188c9a95c	remove extra span as this code is already covered by create timeline span. E.g. this log line contains duplicated data: INFO /timeline_create{tenant=8d367870988250a755101b5189bbbc17 new_timeline=Some(27e2580f51f5660642d8ce124e9ee4ac) lsn=None}: bootstrapping{timeline=27e2580f51f5660642d8ce124e9ee4ac tenant=8d367870988250a755101b5189bbbc17}: created root timeline 27e2580f51f5660642d8ce124e9ee4ac timeline.lsn 0/16960E8 This avoids variable duplication in the `bootstrapping` subspan	2022-06-01 19:29:17 +03:00
Kirill Bulatov	e5cb727572	Replace callmemaybe with etcd subscriptions on safekeeper timeline info	2022-06-01 16:07:04 +03:00
bojanserafimov	ca10cc12c1	Close file descriptors for redo process (#1834 )	2022-05-31 14:14:09 -04:00
Ryan Russell	54e163ac03	Improve Readability in Docs Signed-off-by: Ryan Russell <ryanrussell@users.noreply.github.com>	2022-05-31 17:22:47 +03:00
Anastasia Lubennikova	6a867bce6d	Rename 'zenith_admin' role to 'cloud_admin'	2022-05-30 11:11:01 +03:00
Anastasia Lubennikova	3accde613d	Rename contrib/zenith to contrib/neon. Rename custom GUCs: - zenith.page_server_connstring -> neon.pageserver_connstring - zenith.zenith_tenant -> neon.tenantid - zenith.zenith_timeline -> neon.timelineid - zenith.max_cluster_size -> neon.max_cluster_size	2022-05-30 11:11:01 +03:00
Heikki Linnakangas	4b4d3073b8	Fix misc typos	2022-05-28 14:56:23 +03:00
Kian-Meng Ang	f1c51a1267	Fix typos	2022-05-28 14:02:05 +03:00
Thang Pham	75f71a6380	Handle broken timelines on startup (#1809 ) Resolve #1663. ## Changes - ignore a "broken" [1] timeline on page server startup - fix the race condition when creating multiple timelines in parallel for a tenant - added tests for the above changes [1]: a timeline is marked as "broken" if either - failed to load the timeline's metadata or - the timeline's disk consistent LSN is zero	2022-05-27 11:43:06 -04:00
Arseny Sher	0e1bd57c53	Add WAL offloading to s3 on safekeepers. Separate task is launched for each timeline and stopped when timeline doesn't need offloading. The decision about who offloads is made through etcd leader election; currently there is no precondition for participating, that's a TODO. neon_local and tests infrastructure for remote storage in safekeepers added, along with the test itself. ref #1009 Co-authored-by: Anton Shyrabokau <ahtoxa@Antons-MacBook-Pro.local>	2022-05-27 06:19:23 +04:00
Thang Pham	7d565aa4b9	Reduce the logging level when PG client disconnected to `INFO` (#1713 ) Fixes #1683.	2022-05-26 12:21:15 -04:00
Dmitry Rodionov	72a7220dc8	Tidy up some log messages * turn println into an info with proper message * rename new_local_timeline to load_local_timeline because it does not create new timeline, it registers timeline that exists on disk in pageserver in-memory structures	2022-05-26 18:37:40 +03:00
Konstantin Knizhnik	b0d114ee3f	Initialize last_freeze_at with disk consistent LSN to avoid creation of small L0 delta layer on startup refer #1736	2022-05-26 15:42:18 +03:00
chaitanya sharma	c584d90bb9	initial commit, renamed znodeid to nodeid.	2022-05-25 20:11:26 +03:00
Heikki Linnakangas	7997fc2932	Fix error handling with 'basebackup' command. If the 'basebackup' command failed in the middle of building the tar archive, the client would not report the error, but would attempt to start up postgres with the partial contents of the data directory. That fails because the control file is missing (it's added to the archive last, precisely to make sure that you cannot start postgres from a partial archive). But the client doesn't see the proper error message that caused the basebackup to fail in the server, which is confusing. Two issues conspired to cause that: 1. The tar::Builder object that we use in the pageserver to construct the tar stream has a Drop handler that automatically writes a valid end-of-archive marker on drop. Because of that, the resulting tarball looks complete, even if an error happens while we're building it. The pageserver does send an ErrorResponse after the seemingly-valid tarball, but: 2. The client stops reading the Copy stream, as soon as it sees the tar end-of-archive marker. Therefore, it doesn't read the ErrorResponse that comes after it. We have two clients that call 'basebackup', one in `control_plane` used by the `neon_local` binary, and another one in `compute_tools`. Both had the same issue. This PR fixes both issues, even though fixing either one would be enough to fix the problem at hand. The pageserver now doesn't send the end-of-archive marker on error, and the client now reads the copy stream to the end, even if it sees an end-of-archive marker. Fixes github issue #1715 In passing, change Basebackup to use generic Write rather than 'dyn'.	2022-05-25 18:14:44 +03:00
KlimentSerafimov	8346aa3a29	Potential fix to #1626 . Fixed typo in Makefile. (#1781 ) * Potential fix to #1626. Fixed typo in Makefile. * Completed fix to #1626. Summary: changed 'error' to 'bail' in start_pageserver and start_safekeeper.	2022-05-24 04:55:38 -04:00
Heikki Linnakangas	2aceb6a309	Fix garbage collection to not remove image layers that are still needed. The logic would incorrectly remove an image layer, if a new image layer existed, even though the older image layer was still needed by some delta layers after it. See example given in the comment this adds. Without this fix, I was getting a lot of "could not find data for key 010000000000000000000000000000000000" errors from GC, with the new test case being added in PR #1735. Fixes #707	2022-05-23 20:58:27 +03:00
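
The basebackup fix in 7997fc2932 turns on the tar end-of-archive marker, which is two consecutive 512-byte blocks of zeros. The following is a minimal sketch of why stopping at that marker can hide data that follows it. It is not the pageserver or client code: the function names are hypothetical, the stream is modelled as a plain byte slice rather than a Postgres Copy stream, and the scan ignores tar headers, so a file whose contents include zero blocks would fool it.

```rust
/// Size of one tar block in bytes.
const BLOCK: usize = 512;

/// Returns the offset just past the first end-of-archive marker
/// (two consecutive all-zero 512-byte blocks), or None if there is none.
/// Simplification: scans block-aligned data without parsing tar headers.
fn end_of_archive_offset(stream: &[u8]) -> Option<usize> {
    let blocks: Vec<&[u8]> = stream.chunks_exact(BLOCK).collect();
    blocks
        .windows(2)
        .position(|pair| pair.iter().all(|b| b.iter().all(|&x| x == 0)))
        .map(|i| (i + 2) * BLOCK)
}

/// Hypothetical client that stops at the marker: anything after it is never seen.
fn read_until_marker(stream: &[u8]) -> &[u8] {
    match end_of_archive_offset(stream) {
        Some(end) => &stream[..end],
        None => stream,
    }
}

/// Hypothetical client that keeps reading: returns whatever follows the marker.
fn trailing_after_marker(stream: &[u8]) -> &[u8] {
    match end_of_archive_offset(stream) {
        Some(end) => &stream[end..],
        None => &[],
    }
}
```

A client built like `read_until_marker` treats the archive as complete as soon as the marker appears, which matches the second issue described in the commit message; reading to the end of the stream is what lets the trailing error surface.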


793 Commits
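
Commit 1d0706cf25 describes limiting walreceiver reconnection candidates to the safekeepers with the fewest connection attempts. A rough sketch of that filtering idea is shown here. The `Candidate` struct, its field names, and the tie-break on highest `commit_lsn` are assumptions made for illustration and are not taken from the neon source.

```rust
/// Hypothetical candidate record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    safekeeper_id: u64,
    connection_attempts: u32,
    commit_lsn: u64,
}

/// Keeps only the candidates with the fewest connection attempts, so a
/// safekeeper that just failed is not retried immediately, then picks the
/// one with the highest commit_lsn among them (assumed tie-break).
fn pick_candidate(candidates: &[Candidate]) -> Option<&Candidate> {
    // `?` returns None when there are no candidates at all.
    let fewest = candidates.iter().map(|c| c.connection_attempts).min()?;
    candidates
        .iter()
        .filter(|c| c.connection_attempts == fewest)
        .max_by_key(|c| c.commit_lsn)
}
```

With this shape, a safekeeper that has already been tried several times only becomes eligible again once the others have accumulated as many attempts, which is one way to avoid a busy loop of reconnections to the same node.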