rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 13:32:57 +00:00

Author	SHA1	Message	Date
Kirill Bulatov	0aa2f5c9a5	Regroup CI testing (#3049 ) Part of https://github.com/neondatabase/neon/pull/2410 and https://github.com/neondatabase/neon/pull/2407 * adds `hashFiles('rust-toolchain.toml')` into Rust cache keys, thus removing one of the manual steps to do when upgrading rustc * copies Python and Rust style checks from the `codestyle.yml` workflow * adjusts shell defaults in the main workflow * replaces `codestyle.yml` with a `neon_extra_builds.yml` workflow The new workflow runs on commits to `main` (`codestyle.yml` was run per PR), and runs two custom builds on GH agents: * macos-latest, to ensure the entire project compiles on it (no tests run) There were no frequent breakages on macOS in our builds, so we can check it rarely without making every storage PR wait for it to complete. The updated mac build uses release builds now, so presumably should work a bit faster due to overall smaller files to cache between builds. * ubuntu-latest, without caches, to produce full compilation stats for Rust builds and upload it as an artifact to GitHub Old `clippy build --timings` stats were collected from the builds that use caches and incremental calculation hence never could produce a full report, it got removed.	2022-12-12 12:58:55 +02:00
Vadim Kharitonov	26f4ff949a	Add sentry to storage_broker.	2022-12-12 13:30:16 +03:00
Arseny Sher	a1fd0ba23b	set tag to make proper e2e tests run	2022-12-12 13:30:16 +03:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Arseny Sher	249d77c720	Deploy broker with L4 LB on old envs. To avoid having to configure MAX_CONCURRENT_STREAMS on L7 LB (as well as TLS & public DNS).	2022-12-12 13:00:37 +03:00
Alexander Bayandin	0f445827f5	test_seqscans: increase table size for remote test (#3057 ) Increase table size four times to fix the following error: ``` ______________________ test_seqscans[remote-100000-100-0] ______________________ test_runner/performance/test_seqscans.py:57: in test_seqscans assert int(shared_buffers) < int(table_size) E assert 536870912 < 181239808 E + where 536870912 = int(536870912) E + and 181239808 = int(181239808) ``` 536870912 / 181239808 ≈ 2.96	2022-12-10 23:35:05 +00:00
Kirill Bulatov	700a36ee6b	Wait for certain tenant status in the remote storage test (#3055 ) Closes https://github.com/neondatabase/neon/issues/3052 From what I could understand from the PR, we did not wait enough before the attach failed. Extended the wait period a bit and put a check for a status instead of plain `sleep` to fail if we don't get the expected status.	2022-12-10 10:18:55 +02:00
Joonas Koivunen	b8a5664fb9	test: kill spawned postgres (#3054 ) Fixes #2604.	2022-12-10 00:35:05 +02:00
Kirill Bulatov	861dc8e64e	Remove redundant once_cell usages	2022-12-09 22:14:32 +02:00
Arseny Sher	4d6137e0e6	Try to fix docker image tag in broker deploy.	2022-12-09 20:43:54 +03:00
Lassi Pölönen	8684b1b582	Reduce the storage-broker deployment timeout to 5 minutes. 15 minutes is (#3047 ) 15 minutes is way too long, at least at this point and we want to see the possible errors quicker. Hence drop it to 5min to have some safety margin.	2022-12-09 14:37:53 +00:00
MMeent	3321eea679	Fix for #3043 (#3048 )	2022-12-09 14:26:05 +01:00
Arseny Sher	28667ce724	Make safekeeper exit code 0. We don't have any useful graceful shutdown mode, so immediate one is normal. https://github.com/neondatabase/neon/issues/2956	2022-12-09 12:35:36 +03:00
Lassi Pölönen	6c8b2af1f8	Change storage brokers to internal subdomain (#3039 ) There's a bit of a clash with the naming, so dedicate a subdomain for storage brokers. Back to subdomain separation just to be consistent.	2022-12-09 11:12:42 +02:00
Dmitry Rodionov	3122f3282f	Ignore backup files (ones with .n.old suffix) in download_missing This is rather a hack to resolve immediate issue: https://github.com/neondatabase/neon/issues/3024 Properly cleaning this file from index part requires changes to initialization of remote queue. Because we need to clean it up earlier than we start walking around files. With on-demand there will be no walk around layer files because download_missing is no longer needed, so I believe it will be natural to unify this with load_layer_map	2022-12-09 12:07:50 +03:00
MMeent	4752385470	Update PostgreSQL to latest vendored releases (#3037 ) Several fixes are included, with among others: - Prefetching for index bulkdelete calls (e.g. during vacuum), plus v14 compiler warning fix - A fix for setting LSN on heap pages while setting vm bits - Some style updates that were lost in the previous wave (v15 only)	2022-12-09 11:02:23 +02:00
Alexander Bayandin	9747e90f3a	Nightly Benchmarks: Move from captest to staging (#2838 ) Migrate Nightly Benchmarks from captest to staging. - Migrate GitHub Workflows - Replace `zenith-benchmarker` with regular runners - Remove `environment` parameter from Neon GitHub Actions, add `postgres_version` - The only job left on captest is `neon-captest-reuse`, which will be moved to staging after its project migration. Ref https://github.com/neondatabase/cloud/issues/2836	2022-12-08 22:28:25 +00:00
Alexander Bayandin	a19c487766	Nightly Benchmarks: add TPC-H benchmark (#2978 ) Ref: https://www.tpc.org/tpch/	2022-12-08 15:32:49 +00:00
Alexander Bayandin	5c701f9a75	merge-allure-report: create report even if benchmarks is skipped (#3029 )	2022-12-08 15:13:40 +00:00
dependabot[bot]	4de4217247	Bump certifi from 2022.9.24 to 2022.12.7 (#3033 )	2022-12-08 14:50:59 +00:00
Arseny Sher	2baf6c09a8	Some more allowed pageserver errors. https://neondb.slack.com/archives/C033RQ5SPDH/p1670497680293859	2022-12-08 15:54:59 +03:00
Sergey Melnikov	f5a735ac3b	Add proxy and broker to us-west-2 (#3027 ) Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>	2022-12-08 12:24:24 +01:00
MMeent	0d04cd0b99	Run compaction on the buffer holding received buffers when useful (#3028 ) This cleans up unused entries and reduces the chance of prefetch buffer thrashing.	2022-12-08 09:49:43 +01:00
Konstantin Knizhnik	e1ef62f086	Print more information about context of failed walredo requests (#3003 )	2022-12-08 09:12:38 +02:00
Kirill Bulatov	b50e0793cf	Rework remote_storage interface (#2993 ) Changes: * Remove `RemoteObjectId` concept from remote_storage. Operate directly on /-separated names instead. These names are now represented by struct `RemotePath` which was renamed from struct `RelativePath` * Require remote storage to operate on relative paths for its contents, thus simplifying the way to derive them in pageserver and safekeeper * Make `IndexPart` to use `String` instead of `RelativePath` for its entries, since those are just the layer names	2022-12-07 23:11:02 +02:00
Christian Schwarz	ac0c167a85	improve pidfile handling This patch centralizes the logic of creating & reading pid files into the new pid_file module and improves upon / makes explicit a few race conditions that existed with the previous code. Starting Processes / Creating Pidfiles ====================================== Before this patch, we had three places that had very similar-looking match lock_file::create_lock_file { ... } blocks. After this change, they can use a straight-forward call provided by the pid_file: pid_file::claim_pid_file_for_pid() Stopping Processes / Reading Pidfiles ===================================== The new pid_file module provides a function to read a pidfile, called read_pidfile(), that returns a pub enum PidFileRead { NotExist, NotHeldByAnyProcess(PidFileGuard), LockedByOtherProcess(Pid), } If we get back NotExist, there is nothing to kill. If we get back NotHeldByAnyProcess, the pid file is stale and we must ignore its contents. If it's LockedByOtherProcess, it's either another pidfile reader or, more likely, the daemon that is still running. In this case, we can read the pid in the pidfile and kill it. There's still a small window where this is racy, but it's not a regression compared to what we have before. The NotHeldByAnyProcess is an improvement over what we had before this patch. Before, we would blindly read the pidfile contents and kill, even if no other process held the flock. If the pidfile was stale (NotHeldByAnyProcess), then that kill would either result in ESRCH or hit some other unrelated process on the system. This patch avoids the latter case by grabbing an exclusive flock before reading the pidfile, and returning the flock to the caller in the form of a guard object, to avoid concurrent reads / kills. It's hopefully irrelevant in practice, but it's a little robustness that we get for free here.
Maintain flock on Pidfile of ETCD / any InitialPidFile::Create() ================================================================ Pageserver and safekeeper create their pidfiles themselves. But for etcd, neon_local creates the pidfile (InitialPidFile::Create()). Before this change, we would unlock the etcd pidfile as soon as `neon_local start` exits, simply because no-one else kept the FD open. During `neon_local stop`, that results in a stale pid file, aka, NotHeldByAnyProcess, and it would henceforth not trust that the PID stored in the file is still valid. With this patch, we make the etcd process inherit the pidfile FD, thereby keeping the flock held until it exits.	2022-12-07 18:24:12 +01:00
Lassi Pölönen	6dfd7cb1d0	Neon storage broker helm value fixes (#3025 ) * We were missing one cluster in production: `prod-ap-southeast-1-epsilon` configs. * We had `metrics` enabled. This means creating `ServiceScrape` objects, but since those clusters don't have `kube-prometheus-stack` like older ones, we are missing the CRDs, so the helm deploy fails.	2022-12-07 17:15:51 +02:00
Heikki Linnakangas	a46a81b5cb	Fix updating "trace_read_requests" with /v1/tenant/config mgmt API. The new "trace_read_requests" option was missing from the parse_toml_tenant_conf function that reads the config file. Because of that, the option was ignored, which caused the test_read_trace.py test to fail. It used to work before commit `9a6c0be823`, because the TenantConfigOpt struct was constructed directly in tenant_create_handler, but now it is saved and read back from disk even for a newly created tenant. The abovementioned bug was fixed in commit `09393279c6` already, which added the missing code to parse_toml_tenant_conf() to parse the new "trace_read_requests" option. This commit fixes one more function that was missed earlier, and adds more detail to the error message if parsing the config file fails.	2022-12-07 15:03:39 +02:00
Lassi Pölönen	c74dca95fc	Helm values for old staging and one region in new staging (#2922 ) helm values for the new `storage-broker`. gRPC, over secure connection with a proper certificate, but no authentication. Uses alb ingress in the old cluster and nginx ingress for the new one. The chart is deployed and the addresses are functional, while the pipeline doesn't exist yet.	2022-12-07 14:24:07 +02:00
Heikki Linnakangas	b513619503	Remove obsolete 'awaits_download' field. It used to be a separate piece of state, but after `9a6c0be823` it's just an alias for the Tenant being in Attaching state. It was only used in one assertion in a test, but that check doesn't make sense anymore, so just remove it. Fixes https://github.com/neondatabase/neon/issues/2930	2022-12-07 13:13:54 +02:00
Shany Pozin	b447eb4d1e	Add postgres-v15 to source tree documentation (#3023 )	2022-12-07 12:56:42 +02:00
Kirill Bulatov	6a57d5bbf9	Make the request tracing test more useful	2022-12-06 23:52:16 +02:00
Kirill Bulatov	09393279c6	Fix tenant config parsing	2022-12-06 23:52:16 +02:00
Nikita Kalyanov	634d0eab68	pass availability zone to console during pageserver registration (#2991 ) this is safe because unknown fields are ignored. After the corresponding PR in control plane is merged this field is going to be required Part of https://github.com/neondatabase/cloud/issues/3131	2022-12-06 21:09:54 +02:00
Kliment Serafimov	8f2b3cbded	Sentry integration for storage. (#2926 ) Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes. Currently in sentry there are three projects, one for each process. Sentry url is sent to all three processes separately via cli args.	2022-12-06 18:57:54 +00:00
Christian Schwarz	4530544bb8	draw_timeline_dirs: accept paths as input	2022-12-06 18:17:48 +01:00
Dmitry Rodionov	98ff0396f8	tone down error log for successful process termination	2022-12-06 18:44:07 +03:00
Kirill Bulatov	d6bfe955c6	Add commands to unload and load the tenant in memory (#2977 ) Closes https://github.com/neondatabase/neon/issues/2537 Follow-up of https://github.com/neondatabase/neon/pull/2950 With the new model that prevents attaching without the remote storage, it has started to be even more odd to add attach-with-files functionality (in addition to the issues raised previously). Adds two separate commands: * `POST {tenant_id}/ignore` that places a mark file to skip such tenant on every start and removes it from memory * `POST {tenant_id}/schedule_load` that tries to load a tenant from local FS similar to what pageserver does now on startup, but without directory removals	2022-12-06 15:30:02 +00:00
danieltprice	046ba67d68	Update README.md (#3015 ) Update readme to remove reference to the invite gate.	2022-12-06 11:27:46 -04:00
Alexander Bayandin	61825dfb57	Update chrono to 0.4.23; use only clock feature from it	2022-12-06 15:45:58 +01:00
Kirill Bulatov	c0480facc1	Rename RelativePath to RemotePath Improve rustdocs a bit	2022-12-05 22:52:42 +02:00
Kirill Bulatov	b38473d367	Remove RelativePath conversions Function was unused, but publicly exported from the module lib, so not reported by rustc as unused	2022-12-05 22:52:42 +02:00
Kirill Bulatov	7a9cb75e02	Replace dynamic dispatch with static dispatch	2022-12-05 22:52:42 +02:00
Kirill Bulatov	38af453553	Use async RwLock around tenants (#3009 ) A step towards more async code in our repo, to help avoid most of the odd blocking calls, that might deadlock, as mentioned in https://github.com/neondatabase/neon/issues/2975	2022-12-05 22:48:45 +02:00
Shany Pozin	79fdd3d51b	Fix #2907 : Change missing_layers property to optional in the IndexPart struct (#3005 ) Move missing_layers property to Option<HashSet<RelativePath>> This will allow the safe removal of it once the upgrade of all page servers is done with this new code	2022-12-05 13:56:04 +02:00
Alexander Bayandin	ab073696d0	test_bulk_update: use new prefetch settings (#3007 ) Replace `seqscan_prefetch_buffers` with `effective_io_concurrency` & `maintenance_io_concurrency` in one more place (the last one!)	2022-12-05 10:56:01 +00:00
Kirill Bulatov	4f443c339d	Tone down retry error logs (#2999 ) Closes https://github.com/neondatabase/neon/issues/2990	2022-12-03 15:30:55 +00:00
Alexander Bayandin	ed27c98022	Nightly Benchmarks: use new prefetch settings (#3000 ) - Replace `seqscan_prefetch_buffers` with `effective_io_concurrency` and `maintenance_io_concurrency` for `clickbench-compare` job (see https://github.com/neondatabase/neon/pull/2876) - Get the database name at runtime (it can be `main` or `neondb` or something else)	2022-12-03 13:11:02 +00:00
Alexander Bayandin	788823ebe3	Fix named_arguments_used_positionally warnings (#2987 ) ``` warning: named argument `file` is not used by name --> pageserver/src/tenant/timeline.rs:1078:54 \| 1078 \| trace!("downloading image file: {}", file = path.display()); \| -- ^^^^ this named argument is referred to by position in formatting string \| \| \| this formatting argument uses named argument `file` by position \| = note: `#[warn(named_arguments_used_positionally)]` on by default help: use the named argument by name to avoid ambiguity \| 1078 \| trace!("downloading image file: {file}", file = path.display()); \| ++++ ``` Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2022-12-02 17:59:26 +00:00
MMeent	145e7e4b96	Prefetch cleanup: (#2876 ) - Enable `enable_seqscan_prefetch` by default - Drop use of `seqscan_prefetch_buffers` in favor of `[maintenance,effective]_io_concurrency` This includes adding some fields to the HeapScan execution node, and vacuum state. - Cleanup some conditionals in vacuumlazy.c - Clarify enable_seqscan_prefetch GUC description - Fix issues in heap SeqScan prefetching where synchronize_seqscan machinery wasn't handled properly.	2022-12-02 13:35:01 +01:00
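
The `named_arguments_used_positionally` warning fixed in commit 788823ebe3 can be reproduced outside the pageserver. The following is a minimal sketch, not code from the repository: the helper `describe_download` is hypothetical, and `format!` stands in for `trace!` so that no logging crate is needed.

```rust
use std::path::Path;

/// Hypothetical helper, not taken from the neon source tree.
/// `format!` is used instead of `trace!` to keep the sketch dependency-free.
fn describe_download(path: &Path) -> String {
    // The pre-fix form was "downloading image file: {}" with `file = ...`,
    // which refers to a named argument by position and triggers the lint.
    // Referring to the argument by name, as the compiler suggests, avoids it.
    format!("downloading image file: {file}", file = path.display())
}
```

Both forms produce the same string; the fix only removes the ambiguity the compiler warns about.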

2458 Commits
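
The stop-time decision described in commit ac0c167a85 (improve pidfile handling) can be summarised as a small match. This is a simplified sketch under stated assumptions: the real `PidFileRead` variants carry a `PidFileGuard` and a `Pid`, whereas here the guard is omitted and the pid is a plain `u32`, and the `StopAction` enum and `stop_action` function are hypothetical names introduced only for illustration. No actual `flock` is taken.

```rust
/// Simplified stand-in for the `PidFileRead` enum named in the commit message.
/// Assumption: the guard object is dropped and the pid is a plain `u32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PidFileRead {
    NotExist,
    NotHeldByAnyProcess,
    LockedByOtherProcess(u32),
}

/// Hypothetical description of what a stop command would do next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StopAction {
    NothingToKill,
    IgnoreStalePidFile,
    Kill(u32),
}

/// Maps the result of reading a pidfile to the action the commit message describes.
fn stop_action(read: PidFileRead) -> StopAction {
    match read {
        // No pidfile: there is nothing to kill.
        PidFileRead::NotExist => StopAction::NothingToKill,
        // Nobody holds the lock: the file is stale, so its contents are ignored.
        PidFileRead::NotHeldByAnyProcess => StopAction::IgnoreStalePidFile,
        // Someone holds the lock, most likely the running daemon: kill that pid.
        PidFileRead::LockedByOtherProcess(pid) => StopAction::Kill(pid),
    }
}
```

The point of the middle arm is that a stale pid is never passed to `kill`, which is the case the commit message says could previously hit an unrelated process.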