rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-10 15:02:56 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	22cc8760b9	Move walredo process code under pgxn in the main 'neon' repository. - Refactor the way the WalProposerMain function is called when started with --sync-safekeepers. The postgres binary now explicitly loads the 'neon.so' library and calls the WalProposerMain in it. This is simpler than the global function callback "hook" we previously used. - Move the WAL redo process code to a new library, neon_walredo.so, and use the same mechanism as for --sync-safekeepers to call the WalRedoMain function, when launched with --walredo argument. - Also move the seccomp code to neon_walredo.so library. I kept the configure check in the postgres side for now, though.	2022-10-31 01:11:50 +01:00
Arseny Sher	596d622a82	Fix test_prepare_snapshot. It should checkpoint pageserver after waiting for all data arrival, not before.	2022-10-28 22:12:31 +04:00
Sergey Melnikov	7481fb082c	Fix bugs in #2713 (#2716 )	2022-10-28 14:12:49 +00:00
Arseny Sher	1eb9bd052a	Bump vendor/postgres-v15 to fix XLP_FIRST_IS_CONTRECORD issue. ref https://github.com/neondatabase/cloud/issues/2688	2022-10-28 16:45:11 +03:00
Sergey Melnikov	59a3ca4ec6	Deploy proxy to new prod regions (#2713 ) * Refactor proxy deploy * Test new prod deploy * Remove assume role * Add new values * Add all regions	2022-10-28 16:25:28 +03:00
Sergey Melnikov	e86a9105a4	Deploy storage to new prod regions (#2709 )	2022-10-28 10:17:27 +00:00
Stas Kelvich	d3c8749da5	Build compute postgres with openssl support The main reason for that change is that Postgres 15 requires OpenSSL for `pgcrypto` to work. Also not a bad idea to have SSL-enabled Postgres in general.	2022-10-28 10:39:22 +03:00
Alexander Bayandin	128dc8d405	Nightly Benchmarks: fix workflow (#2708 )	2022-10-27 19:26:10 +03:00
Alexander Bayandin	0cbae6e8f3	test_backward_compatibility: friendlier error message (#2707 )	2022-10-27 15:54:49 +00:00
Alexander Stanovoy	78e412b84b	The fix of #2650 . (#2686 ) * Wrappers and drop implementations for image and delta layer writers. * Two regression tests for the image and delta layer files.	2022-10-27 14:02:55 +00:00
Rory de Zoete	6dbf202e0d	Update crane copy target (#2704 ) Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>	2022-10-27 16:00:40 +02:00
Arseny Sher	b42bf9265a	Enable etcd compaction in neon_local.	2022-10-27 10:47:08 +03:00
Stas Kelvich	1f08ba5790	Avoid debian-testing packages in compute Dockerfiles plv8 can only be built with a fairly new gold linker version. We used to install it via binutils packages from testing, but it also updates libc and that causes troubles in the resulting image as different extensions were built against different libc versions. We could either use libc from debian-testing everywhere or refrain from using testing packages and install necessary programs manually. This patch uses the latter approach: gold for plv8 and cmake for h3 are installed manually. In passing, declare h3_postgis as a safe extension (previous omission).	2022-10-27 09:44:16 +03:00
bojanserafimov	0c54eb65fb	Move pagestream api to libs/pageserver_api (#2698 )	2022-10-26 17:32:31 -04:00
mikecaat	259a5f356e	Add a docker-compose example file (#1943 ) (#2666 ) Co-authored-by: Masahiro Ikeda <masahiro.ikeda.us@hco.ntt.co.jp>	2022-10-26 13:59:25 +03:00
Sergey Melnikov	a3cb8c11e0	Do not release to new staging proxies on release (#2685 )	2022-10-25 23:51:23 +00:00
bojanserafimov	9fb2287f87	Add draw_timeline binary (#2688 )	2022-10-25 11:25:22 -04:00
Alexander Bayandin	834ffe1bac	Add data format backward compatibility tests (#2626 )	2022-10-25 16:41:50 +02:00
Stas Kelvich	df18b041c0	Use apt version pinning instead of repo priorities Higher `bullseye` priority doesn't work for packages installed via `bullseye-updates`, e.g.: ``` libc-bin: Installed: 2.31-13+deb11u5 Candidate: 2.35-3 Version table: 2.35-3 500 500 http://ftp.debian.org/debian testing/main amd64 Packages *** 2.31-13+deb11u5 500 500 http://deb.debian.org/debian bullseye-updates/main amd64 Packages 100 /var/lib/dpkg/status 2.31-13+deb11u4 990 990 http://deb.debian.org/debian bullseye/main amd64 Packages ``` Try version pinning instead	2022-10-25 14:29:11 +03:00
Anastasia Lubennikova	39897105b2	Check postgres version and ensure that public schema exists before running GRANT query on it	2022-10-25 09:55:24 +03:00
Stas Kelvich	2f399f08b2	Hotfix to disable grant create on public schema `GRANT CREATE ON SCHEMA public` fails if there is no schema `public`. Disable it in release for now and make a better fix later (it is needed for v15 support).	2022-10-25 09:55:24 +03:00
Arseny Sher	9f49605041	Fix division by zero panic in determine_offloader.	2022-10-22 18:25:12 +03:00
Konstantin Knizhnik	7b6431cbd7	Disable wal_log_hints by default (#2598 ) * Disable wal_log_hints by default * Remove obsolete comment about wal_log_hints	2022-10-22 14:59:18 +03:00
Lassi Pölönen	321aeac3d4	Json logging capability (#2624 ) * Support configuring the log format as json or plain. Separately test json and plain logger. They would be competing on the same global subscriber otherwise. * Implement log_format for pageserver config * Implement configurable log format for safekeeper.	2022-10-21 17:30:20 +00:00
Andrés	71ef7b6663	Remove cached_property package (#2673 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-21 20:02:31 +03:00
Kirill Bulatov	5928cb33c5	Introduce timeline state (#2651 ) Similar to https://github.com/neondatabase/neon/pull/2395, introduces a state field in Timeline, that's possible to subscribe to. Adjusts * walreceiver to not to have any connections if timeline is not Active * remote storage sync to not to schedule uploads if timeline is Broken * not to create timelines if a tenant/timeline is broken * automatically switches timelines' states based on tenant state Does not adjust timeline's gc, checkpointing and layer flush behaviour much, since it's not safe to cancel these processes abruptly and there's task_mgr::shutdown_tasks that does similar thing.	2022-10-21 15:51:48 +00:00
Sergey Melnikov	6ff2c61ae0	Refactor safekeeper s3 config and change it for new account (#2672 )	2022-10-21 13:44:08 +00:00
Arseny Sher	7480a0338a	Determine safekeeper for offloading WAL without etcd election API. This API is rather pointless, as sane choice anyway requires knowledge of peers status and leaders lifetime in any case can intersect, which is fine for us -- so manual elections are straightforward. Here, we deterministically choose among the reasonably caught up safekeepers, shifting by timeline id to spread the load. A step towards custom broker https://github.com/neondatabase/neon/issues/2394	2022-10-21 15:33:27 +03:00
Sergey Melnikov	2709878b8b	Deploy scram proxies into new account (#2643 )	2022-10-21 14:21:22 +03:00
Kirill Bulatov	39e4bdb99e	Actualize tenant and timeline API modifiers (#2661 ) * Actualize tenant and timeline API modifiers * Use anyhow::Result explicitly	2022-10-21 10:58:43 +00:00
Anastasia Lubennikova	52e75fead9	Use anyhow::Result explicitly	2022-10-21 12:47:06 +03:00
Anastasia Lubennikova	a347d2b6ac	#2616 handle 'Unsupported pg_version' error properly	2022-10-21 12:47:06 +03:00
Heikki Linnakangas	fc4ea3553e	test_gc_cutoff.py fixes (#2655 ) * Fix bogus early exit from GC. Commit `91411c415a` added this failpoint, but the early exit was not intentional. * Cleanup test_gc_cutoff.py test. - Remove the 'scale' parameter, this isn't a benchmark - Tweak pgbench and pageserver options to create garbage faster than the GC can collect it. The test used to take just under 5 minutes, which was uncomfortably close to the default 5 minute test timeout, and annoyingly even without the hard limit. These changes bring it down to about 1-2 minutes. - Improve comments, fix typos - Rename the failpoint. The old name, 'gc-before-save-metadata' implied that the failpoint was before the metadata update, but it was in fact much later in the function. - Move the call to persist the metadata outside the lock, to avoid holding it for too long. To verify that this test still covers the original bug, https://github.com/neondatabase/neon/issues/2539, I commented out updating the metadata file like this: ``` diff --git a/pageserver/src/tenant/timeline.rs b/pageserver/src/tenant/timeline.rs index 1e857a9a..f8a9f34a 100644 --- a/pageserver/src/tenant/timeline.rs +++ b/pageserver/src/tenant/timeline.rs @@ -1962,7 +1962,7 @@ impl Timeline { } // Persist the new GC cutoff value in the metadata file, before // we actually remove anything. - self.update_metadata_file(self.disk_consistent_lsn.load(), HashMap::new())?; + //self.update_metadata_file(self.disk_consistent_lsn.load(), HashMap::new())?; info!("GC starting"); ``` It doesn't fail every time with that, but it did fail after about 5 runs.	2022-10-21 02:39:55 +03:00
Dmitry Rodionov	cca1ace651	make launch_wal_receiver infallible	2022-10-21 00:40:12 +03:00
Sergey Melnikov	30984c163c	Fix race between pushing image to ECR and copying to dockerhub (#2662 )	2022-10-20 23:01:01 +03:00
Konstantin Knizhnik	7404777efc	Pin pages with speculative insert tuples to prevent their reconstruction because spec_token is not wal logged (#2657 ) * Pin pages with speculative insert tuples to prevent their reconstruction because spec_token is not wal logged refer #2587 * Bump postgres versions	2022-10-20 20:06:05 +03:00
Heikki Linnakangas	eb1bdcc6cf	If an FSM or VM page cannot be reconstructed, fill it with zeros. If we cannot reconstruct an FSM or VM page, while creating image layers, fill it with zeros instead. That should always be safe, for the FSM and VM, in the sense that you won't lose actual user data. It will get cleaned up by VACUUM later. We had a bug with FSM/VM truncation, where we truncated the FSM and VM at WAL replay to a smaller size than PostgreSQL originally did. We thought that was harmless, as the FSM and VM are not critical for correctness and can be zeroed out or truncated without affecting user data. However, it led to a situation where PostgreSQL created incremental WAL records for pages that we had already truncated away in the pageserver, and when we tried to replay those WAL records, that failed. That led to a permanent error in image layer creation, and prevented it from ever finishing. See https://github.com/neondatabase/neon/issues/2601. With this patch, those pages will be filled with zeros in the image layer, which allows the image layer creation to finish.	2022-10-20 17:27:01 +03:00
Arthur Petukhovsky	f5ab9f761b	Remove flaky checks in test_delete_force (#2567 )	2022-10-20 17:14:32 +04:00
Kirill Bulatov	306a47c4fa	Use uninit mark files during timeline init for atomic creation (#2489 ) Part of https://github.com/neondatabase/neon/pull/2239 Regular, from scratch, timeline creation involves initdb to be run in a separate directory, data from this directory to be imported into pageserver and, finally, timeline-related background tasks to start. This PR ensures we don't leave behind any directories that are not marked as temporary and that pageserver removes such directories on restart, allowing timeline creation to be retried with the same IDs, if needed. It would be good to later rewrite the logic to use a temporary directory, similar to what tenant creation does. Yet currently it's harder than this change, so not done.	2022-10-20 14:19:17 +03:00
Kirill Bulatov	84c5f681b0	Fix test feature detection (#2659 ) Follow-up of #2636 and #2654 , fixing the test detection feature. Pageserver currently outputs features as ``` /target/debug/pageserver --version Neon page server git:7734929a8202c8cc41596a861ffbe0b51b5f3cb9 failpoints: true, features: ["testing", "profiling"] ```	2022-10-20 13:44:03 +03:00
Kirill Bulatov	50297bef9f	RFC about Tenant / Timeline guard objects (#2660 ) Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2022-10-20 12:49:54 +03:00
Andrés	9211923bef	Pageserver Python tests should not fail if the server is built with no testing feature (#2636 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-20 10:46:57 +03:00
bojanserafimov	7734929a82	Remove stale todos (#2630 )	2022-10-19 22:59:22 +00:00
Heikki Linnakangas	bc5ec43056	Fix flaky physical-size tests in test_timeline_size.py. These two tests, test_timeline_physical_size_post_compaction and test_timeline_physical_size_post_gc, assumed that after you have waited for the WAL from a bulk insertion to arrive, and you run a cycle of checkpoint and compaction, no new layer files are created. Because if a new layer file is created while we are calculating the incremental and non-incremental physical sizes, they might differ. However, the tests used a very small checkpoint_distance, so even a small amount of WAL generated in PostgreSQL could cause a new layer file to be created. Autovacuum can kick in at any time, and do that. That caused occasional failures in the test. I was able to reproduce it reliably by adding a long delay between the incremental and non-incremental size calculations: ``` --- a/pageserver/src/http/routes.rs +++ b/pageserver/src/http/routes.rs @@ -129,6 +129,9 @@ async fn build_timeline_info( } }; let current_physical_size = Some(timeline.get_physical_size()); + if include_non_incremental_physical_size { + std::thread::sleep(std::time::Duration::from_millis(60000)); + } let info = TimelineInfo { tenant_id: timeline.tenant_id, ``` To fix, disable autovacuum for the table. Autovacuum could still kick in for other tables, e.g. catalog tables, but that seems less likely to generate enough WAL to cause a new layer file to be flushed. If this continues to be a problem in the future, we could simply retry the physical size call a few times, if there's a mismatch. A mismatch could happen every once in a while, but it's very unlikely to happen more than once or twice in a row. Fixes https://github.com/neondatabase/neon/issues/2212	2022-10-19 23:50:21 +03:00
MMeent	b237feedab	Add more redo metrics: (#2645 ) - Measure size of redo WAL (new histogram), with bounds between 24B-32kB - Add 2 more buckets at the upper end of the redo time histogram We often (>0.1% of several hours each day) take more than 250ms to do the redo round-trip to the postgres process. We need to measure these redo times more precisely.	2022-10-19 22:47:11 +02:00
Alexey Kondratov	4d1e48f3b9	[compute_ctl] Use postgres::config to properly escape database names (#2652 ) We've got at least one user in production that cannot create a database with a trailing space in the name. This happens because we use `url` crate for manipulating the DATABASE_URL, but it follows a standard that doesn't fit really well with Postgres. For example, it trims all trailing spaces from the path: > Remove any leading and trailing C0 control or space from input. > See: https://url.spec.whatwg.org/#url-parsing But we used `set_path()` to set database name and it's totally valid to have trailing spaces in the database name in Postgres. Thus, use `postgres::config::Config` to modify database name in the connection details.	2022-10-19 19:20:06 +02:00
Anastasia Lubennikova	7576b18b14	[compute_tools] fix GRANT CREATE ON SCHEMA public - run the grant query in each database	2022-10-19 18:37:52 +03:00
Konstantin Knizhnik	6b49b370fc	Fix build after applying PR #2558	2022-10-19 13:55:30 +03:00
Konstantin Knizhnik	91411c415a	Persists latest_gc_cutoff_lsn before performing GC (#2558 ) * Persists latest_gc_cutoff_lsn before performing GC * Perform some refactoring and code deduplication refer #2539 * Add test for persisting GC cutoff * Fix python test style warnings * Bump postgres version * Reduce number of iterations in test_gc_cutoff test * Bump postgres version * Undo bumping postgres version	2022-10-19 12:32:03 +03:00
Kirill Bulatov	c67cf34040	Update GH Action version (#2646 )	2022-10-19 11:16:36 +03:00

2263 Commits