rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-14 17:02:56 +00:00

Author	SHA1	Message	Date
Kirill Bulatov	50297bef9f	RFC about Tenant / Timeline guard objects (#2660 ) Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2022-10-20 12:49:54 +03:00
Andrés	9211923bef	Pageserver Python tests should not fail if the server is built with no testing feature (#2636 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-20 10:46:57 +03:00
bojanserafimov	7734929a82	Remove stale todos (#2630 )	2022-10-19 22:59:22 +00:00
Heikki Linnakangas	bc5ec43056	Fix flaky physical-size tests in test_timeline_size.py. These two tests, test_timeline_physical_size_post_compaction and test_timeline_physical_size_post_gc, assumed that after you have waited for the WAL from a bulk insertion to arrive, and you run a cycle of checkpoint and compaction, no new layer files are created. Because if a new layer file is created while we are calculating the incremental and non-incremental physical sizes, they might differ. However, the tests used a very small checkpoint_distance, so even a small amount of WAL generated in PostgreSQL could cause a new layer file to be created. Autovacuum can kick in at any time, and do that. That caused occasional failures in the test. I was able to reproduce it reliably by adding a long delay between the incremental and non-incremental size calculations: ``` --- a/pageserver/src/http/routes.rs +++ b/pageserver/src/http/routes.rs @@ -129,6 +129,9 @@ async fn build_timeline_info( } }; let current_physical_size = Some(timeline.get_physical_size()); + if include_non_incremental_physical_size { + std::thread::sleep(std::time::Duration::from_millis(60000)); + } let info = TimelineInfo { tenant_id: timeline.tenant_id, ``` To fix, disable autovacuum for the table. Autovacuum could still kick in for other tables, e.g. catalog tables, but that seems less likely to generate enough WAL to cause a new layer file to be flushed. If this continues to be a problem in the future, we could simply retry the physical size call a few times, if there's a mismatch. A mismatch could happen every once in a while, but it's very unlikely to happen more than once or twice in a row. Fixes https://github.com/neondatabase/neon/issues/2212	2022-10-19 23:50:21 +03:00
MMeent	b237feedab	Add more redo metrics: (#2645 ) - Measure size of redo WAL (new histogram), with bounds between 24B-32kB - Add 2 more buckets at the upper end of the redo time histogram We often (>0.1% of several hours each day) take more than 250ms to do the redo round-trip to the postgres process. We need to measure these redo times more precisely.	2022-10-19 22:47:11 +02:00
Alexey Kondratov	4d1e48f3b9	[compute_ctl] Use postgres::config to properly escape database names (#2652 ) We've got at least one user in production that cannot create a database with a trailing space in the name. This happens because we use `url` crate for manipulating the DATABASE_URL, but it follows a standard that doesn't fit really well with Postgres. For example, it trims all trailing spaces from the path: > Remove any leading and trailing C0 control or space from input. > See: https://url.spec.whatwg.org/#url-parsing But we used `set_path()` to set database name and it's totally valid to have trailing spaces in the database name in Postgres. Thus, use `postgres::config::Config` to modify database name in the connection details.	2022-10-19 19:20:06 +02:00
Anastasia Lubennikova	7576b18b14	[compute_tools] fix GRANT CREATE ON SCHEMA public - run the grant query in each database	2022-10-19 18:37:52 +03:00
Konstantin Knizhnik	6b49b370fc	Fix build after applying PR #2558	2022-10-19 13:55:30 +03:00
Konstantin Knizhnik	91411c415a	Persists latest_gc_cutoff_lsn before performing GC (#2558 ) * Persists latest_gc_cutoff_lsn before performing GC * Perform some refactoring and code deduplication refer #2539 * Add test for persisting GC cutoff * Fix python test style warnings * Bump postgres version * Reduce number of iterations in test_gc_cutoff test * Bump postgres version * Undo bumping postgres version	2022-10-19 12:32:03 +03:00
Kirill Bulatov	c67cf34040	Update GH Action version (#2646 )	2022-10-19 11:16:36 +03:00
bojanserafimov	8fbe437768	Improve pageserver IO metrics (#2629 )	2022-10-18 11:53:28 -04:00
Heikki Linnakangas	989d78aac8	Buffer the TCP incoming stream on libpq connections. Reduces the number of syscalls needed to read the commands from the compute. Here's a snippet of strace output from the pageserver, when performing a sequential scan on a table, with prefetch: 3084934 recvfrom(47, "d", 1, 0, NULL, NULL) = 1 3084934 recvfrom(47, "\0\0\0\37", 4, 0, NULL, NULL) = 4 3084934 recvfrom(47, "\2\1\0\0\0\0\362\302\360\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\0\3", 27, 0, NULL, NULL) = 27 3084934 pread64(28, "\0\0\0\1\0\0\0\0\0\0\0\253 "..., 8192, 25190400) = 8192 3084934 write(45, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\3A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010 3084934 poll([{fd=46, events=POLLIN}, {fd=48, events=POLLIN}], 2, 60000) = 1 ([{fd=46, revents=POLLIN}]) 3084934 read(46, "\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192 3084934 sendto(47, "d\0\0 \5f\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198 3084934 recvfrom(47, "d", 1, 0, NULL, NULL) = 1 3084934 recvfrom(47, "\0\0\0\37", 4, 0, NULL, NULL) = 4 3084934 recvfrom(47, "\2\1\0\0\0\0\362\302\360\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\0\4", 27, 0, NULL, NULL) = 27 3084934 pread64(28, " \0=\0L\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0;;\0\0\0\4\4\0"..., 8192, 25198592) = 8192 3084934 write(45, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\4A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010 3084934 poll([{fd=46, events=POLLIN}, {fd=48, events=POLLIN}], 2, 60000) = 1 ([{fd=46, revents=POLLIN}]) 3084934 read(46, "\0\0\0\0\260\344q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192 3084934 sendto(47, "d\0\0 \5f\0\0\0\0\260\344q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198 3084934 recvfrom(47, "d", 1, 0, NULL, NULL) = 1 3084934 recvfrom(47, "\0\0\0\37", 4, 0, NULL, NULL) = 4 3084934 recvfrom(47, "\2\1\0\0\0\0\362\302\360\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\0\5", 27, 0, 
NULL, NULL) = 27 3084934 write(45, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\5A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010 3084934 poll([{fd=46, events=POLLIN}, {fd=48, events=POLLIN}], 2, 60000) = 1 ([{fd=46, revents=POLLIN}]) 3084934 read(46, "\0\0\0\0\330\377q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192 3084934 sendto(47, "d\0\0 \5f\0\0\0\0\330\377q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198 This shows the interaction for three get_page_at_lsn requests. For each request, the pageserver performs three recvfrom syscalls to read the incoming request from the socket. After this patch, those recvfrom calls are gone: 3086123 read(47, "\0\0\0\0\360\222q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192 3086123 sendto(45, "d\0\0 \5f\0\0\0\0\360\222q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198 3086123 pread64(29, " "..., 8192, 25182208) = 8192 3086123 write(46, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\2A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010 3086123 poll([{fd=47, events=POLLIN}, {fd=49, events=POLLIN}], 2, 60000) = 1 ([{fd=47, revents=POLLIN}]) 3086123 read(47, "\0\0\0\0000\256q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192 3086123 sendto(45, "d\0\0 \5f\0\0\0\0000\256q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198 3086123 pread64(29, "\0\0\0\1\0\0\0\0\0\0\0\253 "..., 8192, 25190400) = 8192 3086123 write(46, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\3A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010 3086123 poll([{fd=47, events=POLLIN}, {fd=49, events=POLLIN}], 2, 60000) = 1 ([{fd=47, revents=POLLIN}]) 3086123 read(47, "\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237\362\0\0\237\362\0"..., 8192) = 8192 3086123 sendto(45, "d\0\0 \5f\0\0\0\0p\311q\1\0\0\4\0\f\1\200\1\0 \4 \0\0\0\0\200\237"..., 8198, MSG_NOSIGNAL, NULL, 0) = 8198 3086123 pread64(29, " 
\0=\0L\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0;;\0\0\0\4\4\0"..., 8192, 25198592) = 8192 3086123 write(46, "B\0\0\0\25\0\0\0\6\177\0\0002\276\0\0@\f\0\0\0\4A\0\0\32\355\0\0\0\0\1"..., 7010) = 7010 3086123 poll([{fd=47, events=POLLIN}, {fd=49, events=POLLIN}], 2, 60000) = 1 ([{fd=47, revents=POLLIN}]) In this test, the compute sends a batch of prefetch requests, and they are read from the socket in one syscall. That syscall was not captured by the strace snippet above, but there are much fewer of them than before.	2022-10-18 18:46:07 +03:00
Stas Kelvich	7ca72578f9	Enable plv8 again Now with quickfix for https://github.com/plv8/plv8/issues/503	2022-10-18 18:34:27 +03:00
Heikki Linnakangas	41550ec8bf	Remove unnecessary indirections of libpqwalproposer functions In the Postgres backend, we cannot link directly with libpq (check the pgsql-hackers archive for all kinds of fun that ensued when we tried to do that). Therefore, the libpq functions are used through the thin wrapper functions in libpqwalreceiver.so, and libpqwalreceiver.so is loaded dynamically. To hide the dynamic loading and make the calls look like regular functions, we use macros to hide the function pointers. We had inherited the same indirections in libpqwalproposer, but it's not needed since the neon extension is already a shared library that's loaded dynamically. There's no problem calling the functions directly there. Remove the indirections.	2022-10-18 18:25:30 +03:00
Sergey Melnikov	0cd2d91b9d	Fix deploy-new job by installing sivel.toiletwater (#2641 )	2022-10-18 14:44:19 +00:00
Sergey Melnikov	546e9bdbec	Deploy storage into new account and migrate to management API v2 (#2619 ) Deploy storage into new account Migrate safekeeper and pageserver initialisation to management api v2	2022-10-18 15:52:15 +03:00
Heikki Linnakangas	59bc7e67e0	Use an optimized version of amplify_num. Speeds up layer_map::search somewhat. I also opened a PR in the upstream rust-amplify repository with these changes, see https://github.com/rust-amplify/rust-amplify/pull/148. We can switch back to upstream version when that's merged.	2022-10-18 15:00:10 +03:00
Heikki Linnakangas	2418e72649	Speed up layer_map::search, by remembering the "envelope" for each layer. Lookups in the R-tree call the "envelope" function for every comparison, and our envelope function isn't very cheap, so that overhead adds up. Create the envelope once, when the layer is inserted into the tree, and store it along with the layer. That uses some more memory per layer, but that's not very significant. Speeds up the search operation 2x	2022-10-18 15:00:10 +03:00
Heikki Linnakangas	80746b1c7a	Add micro-benchmark for layer map search function The test data was extracted from our pgbench benchmark project on the captest environment, the one we use for the 'neon-captest-reuse' test.	2022-10-18 15:00:10 +03:00
Dmitry Rodionov	129f7c82b7	remove redundant expect_tenant_to_download_timeline	2022-10-18 11:21:48 +03:00
Anastasia Lubennikova	0ec5ddea0b	GRANT CREATE ON SCHEMA public TO web_access	2022-10-17 22:42:51 +03:00
Kirill Bulatov	c4ee62d427	Bump clap and other minor dependencies (#2623 )	2022-10-17 12:58:40 +03:00
Joonas Koivunen	c709354579	Add layer sizes to index_part.json (#2582 ) This is the first step in verifying layer files. Next up on the road is hashing the files and verifying the hashes. The metadata additions do not require any migration. The idea is that the change is backward and forward-compatible with regard to `index_part.json` due to the softness of JSON schema and the deserialization options in use. New types added: - LayerFileMetadata for tracking the file metadata - starting with only the file size - in future hopefully a sha256 as well - IndexLayerMetadata, the serialized counterpart of LayerFileMetadata LayerFileMetadata needing to have all fields Option is a problem but that is not possible to handle without conflicting a lot more with other ongoing work. Co-authored-by: Kirill Bulatov <kirill@neon.tech>	2022-10-17 12:21:04 +03:00
Lassi Pölönen	5d6553d41d	Fix pageserver configuration generation bug (#2584 ) * We had an issue with `lineinfile` usage for pageserver configuration file: if the S3 bucket related values were changed, it would have resulted in duplicate keys, resulting in invalid toml. So to fix the issue, we should keep the configuration in structured format (yaml in this case) so we can always generate syntactically correct toml. Inventories are converted to yaml just so that it's easier to maintain the configuration there. Another alternative would have been separate variable files. * Keep the ansible collections dir, but locally installed collections should not be tracked.	2022-10-16 11:37:10 +00:00
Kirill Bulatov	f03b7c3458	Bump regular dependencies (#2618 ) * etcd-client is not updated, since we plan to replace it with another client and the new version errors with some missing prost library error * clap had released another major update that requires changing every CLI declaration again, deserves a separate PR	2022-10-15 01:55:31 +03:00
Heikki Linnakangas	9c24de254f	Add description and license fields to OpenAPI spec. These were added earlier to the control plane's copy of this file. This is the master version of this file, so let's keep it in sync.	2022-10-14 18:37:58 +03:00
Heikki Linnakangas	538876650a	Merge 'local' and 'remote' parts of TimelineInfo into one struct. The 'local' part was always filled in, so that was easy to merge into the TimelineInfo itself. 'remote' only contained two fields, 'remote_consistent_lsn' and 'awaits_download'. I made 'remote_consistent_lsn' an optional field, and 'awaits_download' is now false if the timeline is not present remotely. However, I kept stub versions of the 'local' and 'remote' structs for backwards-compatibility, with a few fields that are actively used by the control plane. They just duplicate the fields from TimelineInfo now. They can be removed later, once the control plane has been updated to use the new fields.	2022-10-14 18:37:14 +03:00
Heikki Linnakangas	500239176c	Make TimelineInfo.local field mandatory. It was only None when you queried the status of a timeline with 'timeline_detail' mgmt API call, and it was still being downloaded. You can check for that status with the 'tenant_status' API call instead, checking for has_in_progress_downloads field. Another case was if an error happened while trying to get the current logical size, in a 'timeline_detail' request. It might make sense to tolerate such errors, and leave the fields we cannot fill in as empty, None, 0 or similar, but it doesn't make sense to me to leave the whole 'local' struct empty in that case.	2022-10-14 18:37:14 +03:00
Anastasia Lubennikova	ee64a6b80b	Fix CI: push versioned compute images to production ECR	2022-10-14 18:12:50 +03:00
Anastasia Lubennikova	a13b486943	Bump vendor/postgres-v15. Rebase to 15.0	2022-10-14 18:12:50 +03:00
Arseny Sher	9fe4548e13	Reimplement explicit timeline creation on safekeepers. With the ability to pass commit_lsn. This allows performing project WAL recovery through a different (from the original) set of safekeepers (or under a different ttid) by 1) moving WAL files to s3 under proper ttid; 2) explicitly creating timeline on safekeepers, setting commit_lsn to the latest point; 3) putting the latest .partial file to the timeline directory on safekeepers, if desired. Extend test_s3_wal_replay to exercise this behaviour. Also extends timeline_status endpoint to return postgres information.	2022-10-13 21:43:10 +04:00
Heikki Linnakangas	14c623b254	Make it possible to build with old cargo version. I'm using the Rust compiler and cargo versions from Debian packages, but the latest available cargo Debian package is quite old, version 1.57. The 'named-profiles' feature was not stabilized at that version yet, so ever since commit `a463749f5`, I've had to manually add this line to the Cargo.toml file to compile. I've been wishing that someone would update the cargo Debian package, but it doesn't seem to be happening any time soon. This doesn't seem to bother anyone else but me, but it shouldn't hurt anyone else either. If there was a good reason, I could install a newer cargo version with 'rustup', but if all we need is this one line in Cargo.toml, I'd prefer to continue using the Debian packages.	2022-10-13 15:17:00 +03:00
Alexander Bayandin	ebf54b0de0	Nightly Benchmarks: Add 50 GB projects (#2612 )	2022-10-13 10:00:29 +01:00
Andrés	09dda35dac	Return broken tenants due to non existing timelines dir (#2552 ) (#2575 ) Co-authored-by: andres <andres.rodriguez@outlook.es>	2022-10-12 22:28:39 +03:00
Dmitry Ivanov	6ace79345d	[proxy] Add more context to console requests logging (#2583 )	2022-10-12 21:00:44 +03:00
danieltprice	771e61425e	Update release-pr.md (#2600 ) Update the Release Notes PR example that is referenced from the checklist. The Release Notes file structure changed recently.	2022-10-12 08:38:28 -03:00
Alexander Bayandin	93775f6ca7	GitHub Actions: replace deprecated set-output with GITHUB_OUTPUT (#2608 )	2022-10-12 10:22:24 +01:00
Arseny Sher	6d0dacc4ce	Recreate timeline on pageserver in s3_wal_replay test. That's closer to real usage than switching to brand new pageserver.	2022-10-12 11:46:21 +04:00
Heikki Linnakangas	e5e40a31f4	Clean up terms "delete timeline" and "detach tenant". You cannot attach/detach an individual timeline, attach/detach always applies to the whole tenant. However, you can delete a single timeline from a tenant. Fix some comments and error messages that confused these two operations.	2022-10-11 17:47:41 +03:00
Heikki Linnakangas	676c63c329	Improve comments.	2022-10-11 17:47:41 +03:00
Heikki Linnakangas	47366522a8	Make the return type of 'list_timelines' simpler. It's enough to return just the Timeline references. You can get the timeline's ID easily from Timeline.	2022-10-11 17:47:41 +03:00
Heikki Linnakangas	db26bc49cc	Remove obsolete FIXME comment. Commit `c634cb1d36` removed the trait and changed the function to return a &TimelineWriter, as the FIXME said we should do, but forgot to remove the FIXME.	2022-10-11 17:47:41 +03:00
Lassi Pölönen	e520293090	Add build info metric to pageserver, safekeeper and proxy (#2596 ) * Test that we emit build info metric for pageserver, safekeeper and proxy with some non-zero length revision label * Emit libmetrics_build_info on startup of pageserver, safekeeper and proxy with label "revision" which tells the git revision.	2022-10-11 09:54:32 +03:00
Sergey Melnikov	241e549757	Switch neon-stress etcd to dedicated instance (#2602 )	2022-10-10 22:07:19 +00:00
Sergey Melnikov	34bea270f0	Fix POSTGRES_DISTRIB_DIR for benchmarks on ec2 runner (#2594 )	2022-10-10 09:12:50 +00:00
Kirill Bulatov	13f0e7a5b4	Deploy pageserver_binutils to the envs	2022-10-09 08:21:11 +03:00
Kirill Bulatov	3e35f10adc	Add a script to reformat the project	2022-10-09 08:21:11 +03:00
Kirill Bulatov	3be3bb7730	Be more verbose with initdb for pageserver timeline creation	2022-10-09 08:21:11 +03:00
Kirill Bulatov	01d2c52c82	Tidy up feature reporting	2022-10-09 08:21:11 +03:00
Kirill Bulatov	9f79e7edea	Merge pageserver helper binaries and provide it for deployment (#2590 )	2022-10-08 12:42:17 +00:00
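
Commit 989d78aac8 above buffers the incoming TCP stream so that the tag, length, and payload of each request no longer cost one `recvfrom` apiece. The effect can be sketched with the Rust standard library alone. This is not the pageserver's code: `CountingReader`, `read_frame`, `make_frames`, and `count_reads` are hypothetical names, the frame layout (1-byte tag, 4-byte big-endian length that counts itself, 27-byte payload) is only modelled on the strace snippet, and an in-memory slice stands in for the socket.

```rust
use std::io::{self, BufReader, Read};

/// Wraps a reader and counts calls to `read`.
/// Each call stands in for one `recvfrom`/`read` syscall on a socket.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads one hypothetical frame: tag byte, 4-byte big-endian length
/// (which includes the 4 length bytes), then the payload.
fn read_frame<R: Read>(r: &mut R) -> io::Result<(u8, Vec<u8>)> {
    let mut tag = [0u8; 1];
    r.read_exact(&mut tag)?;
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let body_len = u32::from_be_bytes(len) as usize - 4;
    let mut body = vec![0u8; body_len];
    r.read_exact(&mut body)?;
    Ok((tag[0], body))
}

/// Builds `n` frames of 1 + 4 + 27 bytes each (assumed sizes).
fn make_frames(n: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for i in 0..n {
        out.push(b'd');
        out.extend_from_slice(&31u32.to_be_bytes());
        out.extend_from_slice(&[i as u8; 27]);
    }
    out
}

/// Returns how many reads hit the underlying source while parsing `n` frames.
fn count_reads(n: usize, buffered: bool) -> usize {
    let data = make_frames(n);
    let mut counting = CountingReader { inner: &data[..], calls: 0 };
    if buffered {
        // BufReader fetches a large chunk once and serves small reads from memory.
        let mut r = BufReader::new(&mut counting);
        for _ in 0..n {
            read_frame(&mut r).unwrap();
        }
    } else {
        // Every read_exact goes straight to the underlying source.
        for _ in 0..n {
            read_frame(&mut counting).unwrap();
        }
    }
    counting.calls
}
```

Under these assumptions, `count_reads(3, false)` performs three reads per frame, matching the three `recvfrom` calls per request in the trace, while `count_reads(3, true)` needs fewer because `BufReader` pulls in a whole batch at once.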

2373 Commits
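
Commit 2418e72649 in the log above speeds up `layer_map::search` by computing each layer's "envelope" once at insertion and storing it next to the layer. A rough sketch of that idea follows, assuming a simplified two-dimensional (key, LSN) box and a linear scan in place of the R-tree. `Envelope`, `Layer`, `CachedLayer`, and both search functions are illustrative names, not the pageserver's actual types, and the call counter exists only to make the saving visible.

```rust
use std::cell::Cell;

/// Axis-aligned box over (key, LSN) space; both ranges are half-open.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Envelope {
    key: (u64, u64),
    lsn: (u64, u64),
}

impl Envelope {
    fn contains(&self, key: u64, lsn: u64) -> bool {
        self.key.0 <= key && key < self.key.1 && self.lsn.0 <= lsn && lsn < self.lsn.1
    }
}

/// Hypothetical layer; `envelope_calls` counts envelope computations.
struct Layer {
    key_range: (u64, u64),
    lsn_range: (u64, u64),
    envelope_calls: Cell<usize>,
}

impl Layer {
    fn new(key_range: (u64, u64), lsn_range: (u64, u64)) -> Self {
        Layer { key_range, lsn_range, envelope_calls: Cell::new(0) }
    }

    /// Stand-in for an envelope computation that is not cheap.
    fn compute_envelope(&self) -> Envelope {
        self.envelope_calls.set(self.envelope_calls.get() + 1);
        Envelope { key: self.key_range, lsn: self.lsn_range }
    }
}

/// A layer stored together with its precomputed envelope.
struct CachedLayer {
    layer: Layer,
    envelope: Envelope,
}

impl CachedLayer {
    /// Computes the envelope exactly once, when the layer is inserted.
    fn new(layer: Layer) -> Self {
        let envelope = layer.compute_envelope();
        CachedLayer { layer, envelope }
    }
}

/// Recomputes the envelope for every comparison.
fn search_uncached(layers: &[Layer], key: u64, lsn: u64) -> Option<usize> {
    layers.iter().position(|l| l.compute_envelope().contains(key, lsn))
}

/// Uses the stored envelope; costs a little extra memory per layer.
fn search_cached(layers: &[CachedLayer], key: u64, lsn: u64) -> Option<usize> {
    layers.iter().position(|l| l.envelope.contains(key, lsn))
}
```

In this sketch, repeated calls to `search_uncached` grow each layer's `envelope_calls` with every comparison, whereas `search_cached` leaves it at one per layer regardless of how many lookups run. The trade-off is the one the commit message names: somewhat more memory per layer in exchange for cheaper comparisons.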