
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 17:32:56 +00:00

Author	SHA1	Message	Date
Christian Schwarz	d5bb030ea1	WIP	2023-12-13 12:02:58 +00:00
Christian Schwarz	b3c811731a	getpage bench: mode to rate limit per timeline	2023-12-13 10:50:51 +00:00
Christian Schwarz	b6cb717a76	WIP: add tracing_flame stuff, many more spans, looked at spans with Joonas	2023-12-07 18:07:45 +00:00
Christian Schwarz	9e128b58b4	add debug span on Timeline::get	2023-12-07 08:48:25 +00:00
Christian Schwarz	28eb4da171	add a debug span for blob_io	2023-12-07 08:17:37 +00:00
Christian Schwarz	c4f7cab042	basebackup bench: debug-log basebackup size	2023-12-06 18:07:24 +00:00
Christian Schwarz	cfff331da3	WIP: implement tracing_chrome support for utils::logging	2023-12-06 18:07:24 +00:00
Christian Schwarz	658c20bea4	jwt support; debug spans in basebackup	2023-12-06 17:20:04 +00:00
Christian Schwarz	8a555f1cf3	basebackup bench: fixup copy-pasta of wip	2023-12-05 23:55:49 +00:00
Christian Schwarz	4f79b6d140	pagebench: fixup some accidental WIP thing from last week	2023-12-05 23:55:49 +00:00
Christian Schwarz	d6b7bc2abc	implement a basebackup benchmark	2023-12-05 19:59:51 +00:00
Christian Schwarz	4fc3596677	client & getpage bench: distinguish between page_service client and client in pagestream mode	2023-12-05 19:59:51 +00:00
Christian Schwarz	60cc3a3397	pagebench: restructure dir a bit	2023-12-05 19:59:51 +00:00
Christian Schwarz	cb3dcb06cf	cargo fmt	2023-12-05 19:59:40 +00:00
Christian Schwarz	d75470280f	fixup: scale factors in the python benchmark	2023-11-24 18:16:58 +00:00
Christian Schwarz	687678c4ff	a mode where one task picks which work to do & dispatches it to per-timeline clients	2023-11-24 18:01:55 +00:00
Christian Schwarz	59c8a29569	WIP: failed attempt to have fixed number of clients going over all the key ranges of all tenants. The problem is that the connections are stateful; need to implement a client pool => sucks	2023-11-24 17:12:42 +00:00
Christian Schwarz	044e96ce50	fixup: few more percentiles	2023-11-24 16:00:59 +00:00
Christian Schwarz	12a60cd914	parameters for i3en.3xlarge (need to add more modes to the benchmark, e.g., time based)	2023-11-24 15:40:29 +00:00
Christian Schwarz	9f36d19383	few more percentiles for the benchmark	2023-11-24 15:39:02 +00:00
Christian Schwarz	a0909a2b80	make the benchmarking script work again	2023-11-24 14:58:21 +00:00
Christian Schwarz	bd06672cdd	have one HdrHistogram per thread instead of one per task	2023-11-24 14:27:52 +00:00
Christian Schwarz	f1a714e465	Revert "WIP: figure out overhead of linear histogram". This reverts commit `dc914ef368`.	2023-11-24 14:01:07 +00:00
Christian Schwarz	dc914ef368	WIP: figure out overhead of linear histogram	2023-11-24 14:00:54 +00:00
Christian Schwarz	568f6ae332	per-task & global mean + percentiles using hdrhistogram. Known problem: one hdrhistogram per task => too much memory usage	2023-11-24 12:35:22 +00:00
Christian Schwarz	857150dcee	CLI structure	2023-11-24 11:19:21 +00:00
Christian Schwarz	9d13d0015f	perftest: use new binary name	2023-11-24 11:06:03 +00:00
Christian Schwarz	281f05398e	further break up	2023-11-24 11:05:55 +00:00
Christian Schwarz	0bd5e3aedc	remove unnecessary return impl Future	2023-11-24 10:56:52 +00:00
Christian Schwarz	4f1197311e	break up client into library & cli	2023-11-24 10:55:54 +00:00
Christian Schwarz	dd5792e488	WIP use results	2023-11-24 10:18:05 +00:00
Christian Schwarz	135e37e5b2	implement the performance test in the Python test suite	2023-11-24 10:17:49 +00:00
Christian Schwarz	ccb9fe9b33	find a way to duplicate a tenant in local_fs. Use the script like so, against the tenant to duplicate: `poetry run python3 ./test_runner/duplicate_tenant.py 7ea51af32d42bfe7fb93bf5f28114d09 200 8`. Backup of pageserver.toml: `d =1 pg_distrib_dir ='/home/admin/neon-main/pg_install' http_auth_type ='Trust' pg_auth_type ='Trust' listen_http_addr ='127.0.0.1:9898' listen_pg_addr ='127.0.0.1:64000' broker_endpoint ='http://127.0.0.1:50051/' #control_plane_api ='http://127.0.0.1:1234/' # Initial configuration file created by 'pageserver --init' #listen_pg_addr = '127.0.0.1:64000' #listen_http_addr = '127.0.0.1:9898' #wait_lsn_timeout = '60 s' #wal_redo_timeout = '60 s' #max_file_descriptors = 10000 #page_cache_size = 160000 # initial superuser role name to use when creating a new tenant #initial_superuser_name = 'cloud_admin' #broker_endpoint = 'http://127.0.0.1:50051' #log_format = 'plain' #concurrent_tenant_size_logical_size_queries = '1' #metric_collection_interval = '10 min' #cached_metric_collection_interval = '0s' #synthetic_size_calculation_interval = '10 min' #disk_usage_based_eviction = { max_usage_pct = .., min_avail_bytes = .., period = "10s"} #background_task_maximum_delay = '10s' [tenant_config] #checkpoint_distance = 268435456 # in bytes #checkpoint_timeout = 10 m #compaction_target_size = 134217728 # in bytes #compaction_period = '20 s' #compaction_threshold = 10 #gc_period = '1 hr' #gc_horizon = 67108864 #image_creation_threshold = 3 #pitr_interval = '7 days' #min_resident_size_override = .. # in bytes #evictions_low_residence_duration_metric_threshold = '24 hour' #gc_feedback = false # make it deterministic gc_period = '0s' checkpoint_timeout = '3650 day' compaction_period = '20 s' compaction_threshold = 10 compaction_target_size = 134217728 checkpoint_distance = 268435456 image_creation_threshold = 3 [remote_storage] local_path = '/home/admin/neon-main/bench_repo_dir/repo/remote_storage_local_fs'`. remove http handler; switch to generalized rewrite_summary & impl page_ctl subcommand to use it; WIP: change duplicate_tenant.py script to use the pagectl command. The script works, but at restart we detach the created tenants because they're not known to the attachment service: `Detaching tenant, control plane omitted it in re-attach response tenant_id=1e399d390e3aee6b11c701cbc716bb6c` => figure out how to further integrate this	2023-11-24 10:17:49 +00:00
Christian Schwarz	1b81640290	random getpage benchmark	2023-11-24 10:17:49 +00:00
Anastasia Lubennikova	2a12e9c46b	Add documentation for our sample pre-commit hook (#5868)	2023-11-22 12:04:36 +00:00
Christian Schwarz	9e3c07611c	logging: support output to stderr (#5896) (part of the getpage benchmarking epic #5771). The plan is to make the benchmarking tool log on stderr and emit results as JSON on stdout. That way, the test suite can simply capture stdout and `json.loads()` it, while interactive users of the benchmarking tool have a reasonable experience as well. Existing logging users continue to print to stdout, so this change should be a no-op functionally and performance-wise.	2023-11-22 11:08:35 +00:00
Christian Schwarz	d353fa1998	refer to our rust-postgres.git fork by branch name (#5894). This way, `cargo update -p tokio-postgres` just works. The `Cargo.toml` communicates more clearly that we're referring to the `main` branch. And the git revision is still pinned in `Cargo.lock`.	2023-11-22 10:58:27 +00:00
Joonas Koivunen	0d10992e46	Cleanup compact_level0_phase1 fsyncing (#5852). While reviewing code, I noticed a scary `layer_paths.pop().unwrap()`, then realized this should be further asyncified, something I forgot to do when I switched `compact_level0_phase1` back to async in #4938. This keeps the double-fsync for new deltas, as #4749 is still unsolved.	2023-11-21 15:30:40 +02:00
Arpad Müller	3e131bb3d7	Update Rust to 1.74.0 (#5873). [Release notes](https://github.com/rust-lang/rust/releases/tag/1.74.0).	2023-11-21 11:41:41 +01:00
Sasha Krassovsky	81b2cefe10	Disallow CREATE DATABASE WITH OWNER neon_superuser (#5887). Problem: Currently, the control plane doesn't know about neon_superuser, so if a user creates a database with owner neon_superuser, it causes an exception when the control plane tries to forward it. It is also currently possible to ALTER ROLE neon_superuser. Summary of changes: Disallow creating a database with owner neon_superuser. This is probably fine, since I don't think you can create a database with a normal superuser as owner either. Also forbid altering neon_superuser.	2023-11-20 22:39:47 +00:00
Christian Schwarz	d2ca410919	build: back to opt-level=0 in debug builds, for faster compile times (#5751). This change brings down incremental compilation for me from > 1min to 10s (and this is a pretty old Ryzen 1700X). More details: "incremental compilation" here means changing one character in the `failed to read value from offset` string in `image_layer.rs`. The command for incremental compilation is `cargo build_testing`. The system on which I got these numbers uses `mold` via `~/.cargo/config.toml`. As a bonus, `rust-gdb` is now at least a little fun again. Some tests time out in debug builds due to these changes; this PR skips them in debug builds. We run both debug and release builds, so the loss of coverage is marginal. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-11-20 15:41:37 +01:00
Joonas Koivunen	d98ac04136	chore(background_tasks): missed allowed_error change, logging change (#5883). I am always confused by the log for the error wait time; now it will be `2s` or `2.0s`, not `2.0`. Also fix a missed string change introduced in #5881 ([evidence](https://neon-github-public-dev.s3.amazonaws.com/reports/main/6921062837/index.html#suites/f9eba3cfdb71aa6e2b54f6466222829b/87897fe1ddee3825)).	2023-11-20 07:33:17 +00:00
Joonas Koivunen	ac08072d2e	fix(layer): VirtualFile opening and read errors can be caused by contention (#5880). A very small number of layer loads were wrongly marked as permanent errors, as I did not remember that `VirtualFile::open` or reading could fail transiently due to contention. Return separate errors for transient and persistent errors from `{Delta,Image}LayerInner::load`. Includes drive-by comment changes. The implementation looks quite ugly because the same type serves as both the inner (operation error) and outer (critical error), but among the alternatives I tried I did not find a better way.	2023-11-19 14:57:39 +00:00
John Spray	d22dce2e31	pageserver: shut down idle walredo processes (#5877). The longer a pageserver runs, the more walredo processes it accumulates from tenants that are touched intermittently (e.g. by availability checks). This can lead to getting OOM killed. Changes: (1) add an `Instant` recording the last use of the walredo process for a tenant; (2) after each compaction iteration in the background task, check for idleness and stop the walredo process if it has been idle for more than 10x the compaction period. Cc: #3620. Co-authored-by: Joonas Koivunen <joonas@neon.tech>. Co-authored-by: Shany Pozin <shany@neon.tech>	2023-11-19 14:21:16 +00:00
Joonas Koivunen	3b3f040be3	fix(background_tasks): first backoff, compaction error stacktraces (#5881). The first compaction/GC error backoff started from 0s, which is less than the 2s it was before #5672. This is now fixed to be the intended 2**n. Additionally, I noticed that `compaction_iteration`, when creating an `anyhow::Error` via `into()`, always captures a stacktrace, even if the `CompactionError` already contained an anyhow error with a stacktrace, because there is no stable API for querying that.	2023-11-19 14:16:31 +00:00
Em Sharnoff	cad0dca4b8	compute_ctl: Remove deprecated flag `--file-cache-on-disk` (#5622). See neondatabase/cloud#7516 for more.	2023-11-18 12:43:54 +01:00
Em Sharnoff	5d13a2e426	Improve error message when neon.max_cluster_size reached (#4173). Changes the error message encountered when the `neon.max_cluster_size` limit is reached. The reasoning is that this message is user-visible, so it should use language closer to what users are familiar with.	2023-11-16 21:51:26 +00:00
khanova	0c243faf96	Proxy log pid hack (#5869). Problem: Improve observability for the compute node. Summary of changes: Log pid from the compute node. Doesn't work with pgbouncer.	2023-11-16 20:46:23 +00:00
Em Sharnoff	d0a842a509	Update vm-builder to v0.19.0 and move its customization here (#5783). See neondatabase/autoscaling#600 for more.	2023-11-16 18:17:42 +01:00
khanova	6b82f22ada	Collect number of connections by sni type (#5867). Problem: We don't know the number of users with each kind of authentication: ["sni", "endpoint in options" (A and B from [here](https://neon.tech/docs/connect/connection-errors)), "password_hack"]. Summary of changes: Collect metrics by sni kind.	2023-11-16 12:19:13 +00:00
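Commit `d22dce2e31` describes the idle-walredo shutdown rule: record an `Instant` of the last use of a tenant's walredo process and, after each compaction iteration, stop the process if it has been idle for more than 10x the compaction period. The following is a minimal sketch of just that idle check, using only the Rust standard library. `WalRedoHandle`, `touch`, and `should_shutdown` are hypothetical names chosen for illustration, not the pageserver's actual types or API, and the 20s period simply mirrors the `compaction_period = '20 s'` value from the config backup in commit `ccb9fe9b33`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a tenant's walredo process handle.
/// Only the idle tracking is modeled; no process is spawned or stopped.
struct WalRedoHandle {
    /// When the walredo process was last used.
    last_used: Instant,
}

impl WalRedoHandle {
    fn new(now: Instant) -> Self {
        WalRedoHandle { last_used: now }
    }

    /// Record a use of the process (e.g. after serving a redo request).
    fn touch(&mut self, now: Instant) {
        self.last_used = now;
    }

    /// Check run after a compaction iteration: the process counts as idle
    /// once it has been unused for strictly more than 10x the compaction period.
    fn should_shutdown(&self, now: Instant, compaction_period: Duration) -> bool {
        let idle_for = now.saturating_duration_since(self.last_used);
        idle_for > compaction_period * 10
    }
}

fn main() {
    let start = Instant::now();
    let compaction_period = Duration::from_secs(20); // threshold is 200s
    let mut handle = WalRedoHandle::new(start);

    // 150s idle: below the threshold, keep the process.
    assert!(!handle.should_shutdown(start + Duration::from_secs(150), compaction_period));

    // 201s idle: above the threshold, the process would be stopped.
    assert!(handle.should_shutdown(start + Duration::from_secs(201), compaction_period));

    // A use resets the idle timer.
    handle.touch(start + Duration::from_secs(201));
    assert!(!handle.should_shutdown(start + Duration::from_secs(250), compaction_period));

    println!("idle checks ok");
}
```

Tying the threshold to the compaction period, as the commit does, means the check needs no separate timer: it piggybacks on a background loop that already runs periodically.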


4083 Commits
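Commit `3b3f040be3` fixes the compaction/GC error backoff so that it follows the intended 2**n schedule instead of starting from 0s. A minimal sketch of that schedule, assuming n is the number of consecutive failures (so the first retry waits 2**1 = 2s), might look like this. The function name `error_backoff` and the `max` cap are hypothetical additions for illustration; the commit message does not state whether or where the real implementation caps the delay.

```rust
use std::time::Duration;

/// Hypothetical sketch: wait 2**failures seconds after `failures`
/// consecutive errors, so the first retry waits 2s rather than 0s.
/// The `max` cap is an assumption, not taken from the commit.
fn error_backoff(failures: u32, max: Duration) -> Duration {
    if failures == 0 {
        // No error has happened yet, so there is nothing to wait for.
        return Duration::ZERO;
    }
    // checked_pow guards against u64 overflow for large failure counts;
    // on overflow, fall back to the cap.
    match 2u64.checked_pow(failures) {
        Some(secs) => Duration::from_secs(secs).min(max),
        None => max,
    }
}

fn main() {
    let cap = Duration::from_secs(300);
    assert_eq!(error_backoff(1, cap), Duration::from_secs(2));
    assert_eq!(error_backoff(2, cap), Duration::from_secs(4));
    assert_eq!(error_backoff(3, cap), Duration::from_secs(8));
    // 2**10 = 1024s exceeds the cap, so the cap applies.
    assert_eq!(error_backoff(10, cap), cap);
    for failures in 1..=4 {
        println!("failure #{failures}: wait {:?}", error_backoff(failures, cap));
    }
}
```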