rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 05:30:38 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	6dec85b19d	Redefine the timeline_gc API to not perform a forced compaction Previously, the /v1/tenant/:tenant_id/timeline/:timeline_id/do_gc API call performed a flush and compaction on the timeline before GC. Change it not to do that, and change all the tests that used that API to perform compaction explicitly. The compaction happens at a slightly different point now. Previously, the code performed the `refresh_gc_info_internal` step first, and only then did compaction on all the timelines. I don't think that was what was originally intended here. Presumably the idea with compaction was to make some old layer files available for GC. But if we're going to flush the current in-memory layer to disk, surely you would want to include the newly-written layer in the compaction too. I guess this didn't make any difference to the tests in practice, but in any case, the tests now perform the flush and compaction before any of the GC steps. Some of the tests might not need the compaction at all, but I didn't try hard to determine which ones might need it. I left it out from a few tests that intentionally tested calling do_gc with an invalid tenant or timeline ID, though.	2022-12-16 11:05:55 +02:00
Arseny Sher	70ce01d84d	Deploy broker with L4 LB in new env. (#3125 ) Seems to be fixing issue with missing keepalives.	2022-12-15 22:42:30 +01:00
Christian Schwarz	b58f7710ff	seqwait: different error messages per variant Would have been handy to get slightly more details in https://github.com/neondatabase/neon/issues/3109 refs https://github.com/neondatabase/neon/issues/3109	2022-12-15 18:19:43 +01:00
MMeent	807b110946	Update Makefile configuration: (#3011 ) - Use only one templated section for most postgres-versioned steps - Clean up neon_walredo, too, when running neon-pg-ext-clean - Depend on the various cleanup steps for `clean` instead of manually executing those cleanup steps.	2022-12-15 17:06:17 +00:00
Christian Schwarz	397b60feab	common abstraction for waiting for SK commit_lsn to reach PS	2022-12-15 11:50:39 +01:00
Christian Schwarz	10cd64cf8d	make TaskHandle::next_task_event cancellation-safe If we get cancelled before jh.await returns we've take()n the join handle but drop the result on the floor. Fix it by setting self.join_handle = None after the .await fixes https://github.com/neondatabase/neon/issues/3104	2022-12-15 10:26:17 +01:00
Christian Schwarz	bf3ac2be2d	add remote_physical_size metric We do the accounting exclusively after updating remote IndexPart successfully. This is cleaner & more robust than doing it upon completion of individual layer file uploads / deletions since we can use .set() instead of add()/sub(). NB: Originally, this work was intended to be part of #3013 but it turns out that it's completely orthogonal. So, spin it out into this PR for easier review. Since this change is additive, it won't break anything.	2022-12-15 09:48:35 +01:00
Sergey Melnikov	c04c201520	Push proxy metrics to Victoria Metrics (#3106 )	2022-12-14 21:28:14 +01:00
Christian Schwarz	4132ae9dfe	always remove RemoteTimelineClient's metrics when dropping it	2022-12-14 19:25:29 +01:00
Alexander Bayandin	8fcba150db	test_seqscans: temporarily disable remote test (#3101 ) Temporarily disable `test_seqscans` for remote projects; they acquire too much space and time. We can try to reenable it back after switching to per-test projects.	2022-12-14 18:05:05 +00:00
Dmitry Rodionov	df09d0375b	ignore metadata_backup files in index_part	2022-12-14 19:00:19 +03:00
Vadim Kharitonov	62f6e969e7	Fix helm value for proxy	2022-12-14 16:41:26 +01:00
Kirill Bulatov	4d201619ed	Remove large database files after every test suite (#3090 ) Closes https://github.com/neondatabase/neon/issues/1984 Closes https://github.com/neondatabase/neon/pull/2830 A follow-up of https://github.com/neondatabase/neon/pull/2830, I've noticed that benchmarks failed again due to out of space issues. Removes most of the pageserver and safekeeper files from disk after every pytest suite run. ``` $ poetry run pytest -vvsk "test_tenant_redownloads_truncated_file_on_startup[local_fs]" # ... $ du -h test_output/test_tenant_redownloads_truncated_file_on_startup\[local_fs\] # ... 104K test_output/test_tenant_redownloads_truncated_file_on_startup[local_fs] $ poetry run pytest -vvsk "test_tenant_redownloads_truncated_file_on_startup[local_fs]" --preserve-database-files # ... $ du -h test_output/test_tenant_redownloads_truncated_file_on_startup\[local_fs\] # ... 123M test_output/test_tenant_redownloads_truncated_file_on_startup[local_fs] ``` Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com>	2022-12-14 13:09:08 +00:00
Alexander Bayandin	d3787f9b47	neon-project-create/delete: print project id to stdout (#3073 ) Print project_id to GitHub Actions stdout	2022-12-14 13:04:04 +00:00
Shany Pozin	ada5b7158f	Fix Issue #3014 (#3059 ) * TenantConfigRequest now supports tenant_id as hex string input instead of bytes array * Config file is truncated in each creation/update	2022-12-14 14:09:16 +02:00
Arseny Sher	f8ab5ef3b5	Update broker endpoint for prod-us-west-2. (#3095 )	2022-12-14 12:58:12 +01:00
Sergey Melnikov	827ee10b5a	Disable neon-stress deploy (#3093 )	2022-12-14 01:51:42 +01:00
Alexander Bayandin	c819b699be	Nightly Benchmark: run neon-captest-reuse from staging (#3086 ) The project has been migrated (now it is `restless-king-632302`), and now we should run tests from staging runners. Test run: https://github.com/neondatabase/neon/actions/runs/3686865543/jobs/6241367161 Ref https://github.com/neondatabase/cloud/issues/2836	2022-12-13 23:02:45 +00:00
Sergey Melnikov	228f9e4322	Use default folder for ansible collections (#3092 )	2022-12-13 23:59:49 +01:00
Sergey Melnikov	826214ae56	Force ansible-galaxy to also use local ansible.cfg (#3091 )	2022-12-13 21:06:18 +01:00
Sergey Melnikov	b39d6126bb	Force ansible to use local ansible.cfg (#3089 )	2022-12-13 21:57:39 +03:00
Vadim Kharitonov	0bc488b723	Add sentry environment for pageserver and safekeepers in new region (us-west-2)	2022-12-13 16:26:28 +01:00
Christian Schwarz	0c915dcb1d	Timeline::download_missing: fix handling of mismatched layer size Before this patch, when we decide to rename a layer file to backup because of layer file size mismatch, we would not remove the layer from the layer map, but remove the on-disk file. Because we re-download the file immediately after, we simply end up with two layer objects in memory that reference the same file in the layer map. So, GetPage() would work fine until one of the layers gets delete()'d. The other layer's delete() would then fail. Future work: prevent insertion of the same layer at LayerMap level so that we notice such bugs sooner.	2022-12-13 15:53:08 +01:00
Alexander Bayandin	feb07ed510	deploy (old): replace actions/setup-python@v4 with ansible image (#3081 ) Replace actions/setup-python@v4 with the ansible image to fix ``` Version 3.10 was not found in the local cache Error: The version '3.10' with architecture 'x64' was not found for this operating system. ```	2022-12-13 14:01:29 +00:00
Vadim Kharitonov	4603a4cbb5	Bypass SENTRY_ENVIRONMENT variable in order to filter panics in sentry by environment.	2022-12-13 14:52:04 +01:00
Kirill Bulatov	02c1c351dc	Create initial timeline without remote storage (#3077 ) Removes the race during pageserver initial timeline creation that led to partial layer uploads. This race is only reproducible in test code, we do not create initial timelines in cloud (yet, at least), but it is still nice to remove the non-deterministic behavior.	2022-12-13 15:42:59 +02:00
Dmitry Ivanov	607c0facfc	[proxy] Propagate more console API errors to the user This patch aims to fix some of the inconsistencies in error reporting, for example "Internal error" or "Console request failed" instead of "password authentication failed for user '<NAME>'".	2022-12-13 16:16:31 +03:00
Sergey Melnikov	e5d523c86a	Add new us-west-2 region (#3071 )	2022-12-13 14:11:40 +01:00
Kirill Bulatov	7a16cde737	Remove useless pub trait method (#3076 )	2022-12-13 12:06:20 +00:00
Arseny Sher	d6325aa79d	Disable body size limit in ingress broker deploy. We have infinite streams.	2022-12-13 13:06:30 +03:00
Arseny Sher	544777e86b	Fix storage_broker deploy typo.	2022-12-13 10:57:26 +03:00
Arseny Sher	e2ae4c09a6	Put e2e tag back. `32662ff1c4` required running e2e tests on patched branch of cloud repo; now that it is merged, put the tag back.	2022-12-13 09:53:22 +03:00
Christian Schwarz	22ae67af8d	refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath (#3026 ) Before this patch, we would sometimes carry around plain file names in `Path` types and/or awkwardly "rebase" paths to have a unified representation of the layer file name between local and remote. This patch introduces a new type `LayerFileName` which replaces the use of `Path` / `PathBuf` / `RemotePath` in the `storage_sync2` APIs. Instead of holding a string, it contains the parsed representation of the image and delta file name. When we need the file name, e.g., to construct a local path or remote object key, we construct the name ad-hoc. `LayerFileName` is also serde {Dese,Se}rializable, and in an initial version of this patch, it was supposed to be used directly inside `IndexPart`, replacing `RemotePath`. However, commit `3122f3282f` Ignore backup files (ones with .n.old suffix) in download_missing fixed handling of `.old` backup file names in IndexPart, and we need to carry that behavior forward. The solution is to remove `.old` backup file names during deserialization. When we re-serialize the IndexPart, the `*.old` file will be gone. This leaks the `.old` file in the remote storage, but makes it safe to clean it up later. There is additional churn by a preliminary refactoring that got squashed into this change: split off LayerMap's needs from trait Layer into super trait That refactoring renames `Layer` to `PersistentLayer` and splits off a subset of the functions into a super-trait called `Layer`. The super trait implements just the functions needed by `LayerMap`, whereas `PersistentLayer` adds the context of the pageserver. The naming is imperfect as some functions that reside in `PersistentLayer` have nothing persistence-specific to them. But it's a step in the right direction.	2022-12-13 01:27:59 +02:00
Rory de Zoete	d1edc8aa00	Deprecate old runner for deploy job (#3070 ) As we plan to no longer use them Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>	2022-12-12 16:55:40 +01:00
Arseny Sher	f013d53230	Switch to clap derive API in safekeeper. Less lines and easier to read/modify. Practically no functional changes.	2022-12-12 16:25:23 +03:00
Kirill Bulatov	0aa2f5c9a5	Regroup CI testing (#3049 ) Part of https://github.com/neondatabase/neon/pull/2410 and https://github.com/neondatabase/neon/pull/2407 * adds `hashFiles('rust-toolchain.toml')` into Rust cache keys, thus removing one of the manual steps to do when upgrading rustc * copies Python and Rust style checks from the `codestyle.yml` workflow * adjusts shell defaults in the main workflow * replaces `codestyle.yml` with a `neon_extra_builds.yml` workflow The new workflow runs on commits to `main` (`codestyle.yml` was run per PR), and runs two custom builds on GH agents: * macos-latest, to ensure the entire project compiles on it (no tests run) There were no frequent breakages on macOS in our builds, so we can check it rarely without making every storage PR wait for it to complete. The updated mac build uses release builds now, so presumably should work a bit faster due to overall smaller files to cache between builds. * ubuntu-latest, without caches, to produce full compilation stats for Rust builds and upload it as an artifact to GitHub Old `clippy build --timings` stats were collected from the builds that use caches and incremental calculation, hence never could produce a full report, so it got removed.	2022-12-12 12:58:55 +02:00
Vadim Kharitonov	26f4ff949a	Add sentry to storage_broker.	2022-12-12 13:30:16 +03:00
Arseny Sher	a1fd0ba23b	set tag to make proper e2e tests run	2022-12-12 13:30:16 +03:00
Arseny Sher	32662ff1c4	Replace etcd with storage_broker. This is the replacement itself, the binary landed earlier. See docs/storage_broker.md. ref https://github.com/neondatabase/neon/pull/2466 https://github.com/neondatabase/neon/issues/2394	2022-12-12 13:30:16 +03:00
Arseny Sher	249d77c720	Deploy broker with L4 LB on old envs. To avoid having to configure MAX_CONCURRENT_STREAMS on L7 LB (as well as TLS & public DNS).	2022-12-12 13:00:37 +03:00
Alexander Bayandin	0f445827f5	test_seqscans: increase table size for remote test (#3057 ) Increase table size four times to fix the following error: ``` ______________________ test_seqscans[remote-100000-100-0] ______________________ test_runner/performance/test_seqscans.py:57: in test_seqscans assert int(shared_buffers) < int(table_size) E assert 536870912 < 181239808 E + where 536870912 = int(536870912) E + and 181239808 = int(181239808) ``` 536870912 / 181239808 ≈ 2.96	2022-12-10 23:35:05 +00:00
Kirill Bulatov	700a36ee6b	Wait for certain tenant status in the remote storage test (#3055 ) Closes https://github.com/neondatabase/neon/issues/3052 From what I could understand from the PR, we did not wait enough before the attach failed. Extended the wait period a bit and put a check for a status instead of plain `sleep` to fail if we don't get the expected status.	2022-12-10 10:18:55 +02:00
Joonas Koivunen	b8a5664fb9	test: kill spawned postgres (#3054 ) Fixes #2604.	2022-12-10 00:35:05 +02:00
Kirill Bulatov	861dc8e64e	Remove redundant once_cell usages	2022-12-09 22:14:32 +02:00
Arseny Sher	4d6137e0e6	Try to fix docker image tag in broker deploy.	2022-12-09 20:43:54 +03:00
Lassi Pölönen	8684b1b582	Reduce the storage-broker deployment timeout to 5 minutes. 15 minutes is (#3047 ) 15 minutes is way too long, at least at this point and we want to see the possible errors quicker. Hence drop it to 5min to have some safety margin.	2022-12-09 14:37:53 +00:00
MMeent	3321eea679	Fix for #3043 (#3048 )	2022-12-09 14:26:05 +01:00
Arseny Sher	28667ce724	Make safekeeper exit code 0. We don't have any useful graceful shutdown mode, so immediate one is normal. https://github.com/neondatabase/neon/issues/2956	2022-12-09 12:35:36 +03:00
Lassi Pölönen	6c8b2af1f8	Change storage brokers to internal subdomain (#3039 ) There's a bit of a clash with the naming, so dedicate a subdomain for storage brokers. Back to subdomain separation just to be consistent.	2022-12-09 11:12:42 +02:00
Dmitry Rodionov	3122f3282f	Ignore backup files (ones with .n.old suffix) in download_missing This is rather a hack to resolve an immediate issue: https://github.com/neondatabase/neon/issues/3024 Properly cleaning this file from index part requires changes to initialization of remote queue. Because we need to clean it up earlier than we start walking around files. With on-demand there will be no walk around layer files because download_missing is no longer needed, so I believe it will be natural to unify this with load_layer_map	2022-12-09 12:07:50 +03:00
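
Commit bf3ac2be2d above argues that recomputing the remote size after each successful index update and calling `.set()` is more robust than applying `add()`/`sub()` deltas per layer upload or deletion. The following is a minimal Python sketch of that reasoning, not the pageserver's code: `Gauge`, `RemoteIndex`, and `publish_after_index_update` are hypothetical stand-ins invented for illustration, and the layer names and sizes are made up.

```python
class Gauge:
    """Tiny stand-in for a metrics gauge; not a real metrics library."""

    def __init__(self) -> None:
        self.value = 0

    def set(self, value: int) -> None:
        self.value = value

    def add(self, delta: int) -> None:
        self.value += delta

    def sub(self, delta: int) -> None:
        self.value -= delta


class RemoteIndex:
    """Hypothetical model of a remote index: layer name -> size in bytes."""

    def __init__(self) -> None:
        self.layer_sizes: dict[str, int] = {}

    def total_size(self) -> int:
        return sum(self.layer_sizes.values())


def publish_after_index_update(gauge: Gauge, index: RemoteIndex) -> None:
    # Absolute accounting: derive the metric from the index itself,
    # so it cannot drift from the state it describes.
    gauge.set(index.total_size())


index = RemoteIndex()
absolute = Gauge()     # updated with set() after the index changes
incremental = Gauge()  # updated with add()/sub() per upload or deletion

# Upload layer-a, then retry the same upload (assumed failure mode).
index.layer_sizes["layer-a"] = 100
incremental.add(100)
index.layer_sizes["layer-a"] = 100
incremental.add(100)  # the retry is counted twice

# Upload layer-b, then delete layer-a.
index.layer_sizes["layer-b"] = 40
incremental.add(40)
del index.layer_sizes["layer-a"]
incremental.sub(100)

publish_after_index_update(absolute, index)
print(absolute.value)     # 40, matches the index
print(incremental.value)  # 140, drifted because of the retried upload
```

The delta-based gauge is only correct if every event is observed exactly once, whereas the `set()` variant is idempotent: publishing twice from the same index yields the same value.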

4043 Commits