Commit 0cf7fd0fb8 "Compaction with on-demand download" (#3598)
introduced a subtle bug: if we don't have to do on-demand downloads,
we only take one ROUND in `fn compact()` and exit early.
As a result, we miss scheduling the index part upload for any layers
created by `fn compact_inner()`.
Before that commit, we didn't have this problem; this patch fixes it.
Since no regression test caught this, I went ahead and extended the
timeline size tests to assert that, if remote storage is configured,
1. pageserver_remote_physical_size matches the other physical sizes
2. file sizes reported by the layer map info endpoint match the other
physical size metrics
Without the pageserver code fix, the regression test fails at the
physical size assertion, complaining that the resident physical size
metric does not match the remote physical size metric:
50790400.0 != 18399232.0
I figured out what the problem was by comparing the remote storage
and local directories, and noticed that the image layer
in the local directory wasn't present on the remote side.
Its size was exactly the difference:
50790400.0 - 18399232.0 = 32391168.0
fixes https://github.com/neondatabase/neon/issues/3738
- use parse_metrics() in all places where we parse Prometheus metrics
- query_all: make `filter` argument optional
- encourage using properly parsed, typed metrics by changing get_metrics()
to return already-parsed metrics. The new get_metric_str() method,
like in the Safekeeper type, returns the raw text response.
Before this patch, GC would call PersistentLayer::delete()
on every GC'ed layer.
RemoteLayer::delete() returned Ok(()) unconditionally.
GC would then proceed by decrementing the resident size metric,
even though the layer is a RemoteLayer.
This patch makes the following changes:
- Rename PersistentLayer::delete() to delete_resident_layer_file().
That name is unambiguous.
- Make RemoteLayer::delete_resident_layer_file return an Err().
We would have uncovered this bug if we had done that from the start.
- Change GC / Timeline::delete_historic_layer to check whether
the layer is remote or not, and only call delete_resident_layer_file()
if it's not remote (see the sketch below). This brings us in line with how eviction does it.
- Add a regression test.
fixes https://github.com/neondatabase/neon/issues/3722
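A minimal sketch of the resulting check, with simplified types standing in
for the real `PersistentLayer` / `RemoteLayer` and `anyhow` as the error type;
this is an illustration, not the actual pageserver code:
```
use anyhow::{bail, Result};

trait PersistentLayer {
    fn is_remote_layer(&self) -> bool;
    /// Renamed from `delete()`: only meaningful for layers resident on local disk.
    fn delete_resident_layer_file(&self) -> Result<()>;
}

struct RemoteLayer;

impl PersistentLayer for RemoteLayer {
    fn is_remote_layer(&self) -> bool {
        true
    }
    fn delete_resident_layer_file(&self) -> Result<()> {
        // Returning an Err instead of Ok(()) would have surfaced the bug early.
        bail!("remote layer has no resident file to delete")
    }
}

/// Called by GC for each layer it decides to drop.
fn delete_historic_layer(layer: &dyn PersistentLayer) -> Result<()> {
    // Only local layers have a file to delete; for remote layers we skip the
    // call (and the resident-size metric decrement), matching eviction.
    if !layer.is_remote_layer() {
        layer.delete_resident_layer_file()?;
        // ...decrement the resident physical size metric here...
    }
    Ok(())
}
```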
Without this change, when actually setting this conf opt, the tenant
would become Broken next time we load it.
Why?
The serde_toml representation that persist_tenant_conf would write out
would be a TOML inline table of `secs` and `nsecs`.
But our hand-rolled TenantConf parser expects a TOML string.
I checked that all other `Duration` values in TenantConfOpt
use the humantime serialization.
Issues like this would likely be systematically prevented by
https://github.com/neondatabase/neon/issues/3682
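A minimal sketch of the two representations, using the `toml` and
`humantime_serde` crates and an illustrative field name rather than the real
`TenantConfOpt`:
```
use serde::Serialize;
use std::time::Duration;

#[derive(Serialize)]
struct DefaultRepr {
    // serde's built-in Duration format serializes as a table of seconds and
    // nanoseconds, which the hand-rolled TenantConf parser cannot read.
    some_period: Duration,
}

#[derive(Serialize)]
struct HumantimeRepr {
    // The humantime format serializes as a plain string (e.g. "20m"),
    // matching the other Duration fields in TenantConfOpt.
    #[serde(with = "humantime_serde")]
    some_period: Duration,
}

fn main() {
    let period = Duration::from_secs(20 * 60);
    println!("{}", toml::to_string(&DefaultRepr { some_period: period }).unwrap());
    println!("{}", toml::to_string(&HumantimeRepr { some_period: period }).unwrap());
}
```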
## Describe your changes
## Issue ticket number and link
#3479
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Describe your changes
Rebase vendored PostgreSQL onto 14.7 and 15.2
## Issue ticket number and link
#3579
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
```
The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL
14 and 15.2 for PostgreSQL 15.
```
We started rather frequently scraping some APIs for metadata. This includes
the layer eviction tester, and I believe the console does that too.
This should eliminate these logs:
https://neonprod.grafana.net/goto/rr_ace1Vz?orgId=1 (note the rate of
around 2k messages per minute)
## Describe your changes
This is yet another attempt to address the problem of storage size
ballooning (#2948).
The previous PR #3348 tried to address it by maintaining a list of
holes for each layer.
The problem with that approach is that we have to load all layers on
pageserver start; lazy loading of layers is no longer possible.
This PR instead collects information about the N largest holes at
compaction time and excludes those holes from the produced layers.
It can generate a larger number of layers (up to 2x) and produce
small layers, but it requires minimal changes in the code and doesn't
affect the storage format.
For a graphical explanation, please see this thread:
https://github.com/neondatabase/neon/pull/3597#discussion_r1112704451
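A minimal sketch of the hole-tracking idea, using integer keys and a min-heap
instead of the pageserver's real key and layer types:
```
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A gap between two consecutive keys seen during compaction: the open
/// interval (start, end) contains no keys at all.
#[derive(Debug)]
struct Hole {
    start: u64,
    end: u64,
}

/// Keep only the `n` largest holes while scanning the keys in order.
fn largest_holes(sorted_keys: &[u64], n: usize) -> Vec<Hole> {
    // Min-heap keyed by gap size: the smallest retained hole is evicted
    // whenever a larger one shows up, so memory stays O(n).
    let mut heap: BinaryHeap<Reverse<(u64, u64, u64)>> = BinaryHeap::new();
    for pair in sorted_keys.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        let gap = next - prev - 1; // assumes strictly increasing keys
        if gap == 0 {
            continue;
        }
        heap.push(Reverse((gap, prev, next)));
        if heap.len() > n {
            heap.pop(); // drop the currently smallest hole
        }
    }
    heap.into_iter()
        .map(|Reverse((_, start, end))| Hole { start, end })
        .collect()
}

fn main() {
    let keys = [1, 2, 10, 11, 50, 51, 52, 90];
    // Keeps the two widest gaps, (11, 50) and (52, 90), which compaction can
    // then exclude from the key ranges of the produced layers.
    println!("{:?}", largest_holes(&keys, 2));
}
```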
## Issue ticket number and link
#2948, #3348
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? If so, did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Describe your changes
When we deploy the proxy with the default Recreate strategy, there's
always some downtime and existing connections will be shut down. Change
the strategy to RollingUpdate and delay the kill signal by one week. AWS
Network Loadbalancer keeps the existing connections alive for as long as
the pods are alive, but will direct new connections to new pods.
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
Previously we applied the rate limiting only up to receiving the headers
from S3, or somewhere near it. This commit adds an adapter which carries
the permit until the AsyncRead has been disposed of.
Fixes #3662.
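A minimal sketch of that pattern (not the exact adapter from the commit):
wrap the download stream in a type that owns the semaphore permit, so the
permit is released only when the reader itself is dropped.
```
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{AsyncRead, ReadBuf};
use tokio::sync::OwnedSemaphorePermit;

/// Wraps any AsyncRead (e.g. an S3 body stream) together with the
/// rate-limiter permit that allowed the request in the first place.
struct PermitCarrying<R> {
    inner: R,
    // Dropped together with the reader, freeing a rate-limiter slot only
    // once the body has actually been consumed or discarded.
    _permit: OwnedSemaphorePermit,
}

impl<R: AsyncRead + Unpin> AsyncRead for PermitCarrying<R> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<std::io::Result<()>> {
        Pin::new(&mut self.inner).poll_read(cx, buf)
    }
}
```
The caller would acquire an `OwnedSemaphorePermit` from an `Arc<Semaphore>`
before issuing the request and move it into the wrapper along with the
response body.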
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. The use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file, which led to a deadlock.
Fixes: #3657
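A sketch of the general hazard, with hypothetical function names rather than
the pageserver's actual code: running an async computation on the blocking
pool via `block_on` can starve the very pool that `tokio::fs` /
`tokio::io::copy` need for file I/O.
```
use tokio::runtime::Handle;

// Stand-in for the logical size calculation, which is async because it may
// download layers and write them to disk.
async fn calculate_logical_size() -> u64 {
    42
}

async fn before() -> u64 {
    // Each call parks one blocking-pool thread inside block_on. Once the pool
    // is exhausted, the file writes the calculation waits on (which tokio
    // runs on that same pool) can never be scheduled: deadlock.
    tokio::task::spawn_blocking(|| Handle::current().block_on(calculate_logical_size()))
        .await
        .unwrap()
}

async fn after() -> u64 {
    // Async work belongs on the normal executor instead.
    tokio::spawn(calculate_logical_size()).await.unwrap()
}
```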
These are happening in tests because of #3655, but they sure took some
time to appear.
This makes `Compaction failed, retrying in 2s: Cannot run compaction
iteration on inactive tenant` a globally allowed error, because it
has been seen failing on different test cases.
## Describe your changes
Layer for building pg extensions written in Rust.
It required forking:
* `cargo-pgx` (in order not to hit an ABI mismatch error; `cargo-pgx`
hardcodes the ABI, tcdi/pgx#1032)
* `pg_jsonschema` (to use the forked `cargo-pgx` version)
* `pgx-contrib-spiext` (to use the forked `cargo-pgx`)
* `pg_graphql` (to use the forked `cargo-pgx` and `pgx-contrib-spiext`
versions)
Before the patch:
```
postgres=# create extension pg_jsonschema;
2023-02-02 17:45:23.120 UTC [35] ERROR: incompatible library "/usr/local/lib/pg_jsonschema.so": ABI mismatch
2023-02-02 17:45:23.120 UTC [35] DETAIL: Server has ABI "Neon Postgres", library has "PostgreSQL".
2023-02-02 17:45:23.120 UTC [35] STATEMENT: create extension pg_jsonschema;
ERROR: incompatible library "/usr/local/lib/pg_jsonschema.so": ABI mismatch
DETAIL: Server has ABI "Neon Postgres", library has "PostgreSQL".
```
After the patch:
```
postgres=# create extension pg_jsonschema;
CREATE EXTENSION
postgres=# select json_matches_schema('{"type": "object"}', '{}');
json_matches_schema
---------------------
t
postgres=# create extension pg_graphql;
CREATE EXTENSION
postgres=# create table book(id int primary key, title text);
CREATE TABLE
postgres=# insert into book(id, title) values (1, 'book 1');
INSERT 0 1
postgres=# select graphql.resolve($$
query {
bookCollection {
edges {
node {
id
}
}
}
}
$$);
resolve
----------------------------------------------------------------
{"data": {"bookCollection": {"edges": [{"node": {"id": 1}}]}}}
(1 row)
```
## Issue ticket number and link
Closes #3429, #3096
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [x] Do we need to implement analytics? If so, did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
The `pg_jsonschema` extension will be available for our customers.
Enables the tracing panic hook, introduced for the pageserver in #3475,
in these binaries as well:
- proxy
- safekeeper
- storage_broker
For the proxy, a drop guard which resets the original std panic hook was
added in the first commit. The other binaries don't need to reset
anything, so they never do: they `disarm` the drop guard.
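A minimal sketch of such a guard (illustrative, not the exact
implementation), using the `tracing` crate:
```
use std::panic;

/// Restores the previous panic hook on drop, unless disarmed.
struct PanicHookGuard {
    original: Option<Box<dyn Fn(&panic::PanicInfo<'_>) + Send + Sync + 'static>>,
}

fn install_tracing_panic_hook() -> PanicHookGuard {
    let original = panic::take_hook();
    panic::set_hook(Box::new(|info| {
        // In the real hook this goes through `tracing` so the panic carries
        // span information and lands in the same stream as other log lines.
        tracing::error!("panic: {}", info);
    }));
    PanicHookGuard { original: Some(original) }
}

impl PanicHookGuard {
    /// Keep the tracing hook installed for the rest of the process lifetime.
    fn disarm(mut self) {
        self.original = None;
    }
}

impl Drop for PanicHookGuard {
    fn drop(&mut self) {
        if let Some(original) = self.original.take() {
            panic::set_hook(original);
        }
    }
}
```
Under this sketch, the proxy keeps the guard so the original hook comes back
when the guard goes out of scope, while the other binaries call `disarm()`
right away.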
The aim of the change is to make sure all panics a) have span
information and b) are logged similarly to other messages, not interleaved
with them as happens right now. Interleaving happens because std prints
panics to stderr while other logging goes to stdout. Even if some utility
merged the two streams gracefully, the log message splitter would treat
panic lines as belonging to the previous message, because it expects each
message to start with a timestamp.
Cc: #3468
console-release.local is a legacy, manually created CNAME to
neon-internal-api.aws.neon.tech in Route 53.
We could use the neon-internal-api.aws.neon.tech name directly.
This was already deployed to staging in
https://github.com/neondatabase/neon/pull/3642