Christian Schwarz
31fec1fb4b
add the script that I used to generate the delta stack
...
(non-package-mode-py3.10) christian@neon-hetzner-dev-christian:[~/src/neon/test_runner]: poetry run python3 deep_layers_with_delta.py
2024-12-12 20:25:16 +01:00
Christian Schwarz
87755bf80e
concurrent-futures: poll before pushing into FuturesUnordered (this will do io_uring submission in most cases)
2024-12-12 19:37:23 +01:00
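The "poll before pushing into FuturesUnordered" idea in the commit above can be sketched with Python generators as an analogy (this is not the Rust code; `fake_io_future`, `poll_once`, and the `submissions` list are invented for illustration): polling a future once before enqueueing it drives it to its first suspension point, which is where io_uring-backed IO would push its submission, so the submissions happen eagerly rather than on the next poll of the whole set.

```python
# Sketch: eager first poll before enqueueing, modeled with generators.
submissions = []  # records when the "IO submission" side effect happens

def fake_io_future(i):
    submissions.append(i)  # side effect at the first poll, like an io_uring SQE push
    yield                  # suspend: "IO in flight"
    return i * 10          # result once "completed"

def poll_once(gen):
    """Poll a generator-future once; return ('ready', value) or ('pending', gen)."""
    try:
        next(gen)
    except StopIteration as e:
        return ("ready", e.value)
    return ("pending", gen)

pending = []
results = []
for i in range(3):
    state, payload = poll_once(fake_io_future(i))  # eager first poll
    (results if state == "ready" else pending).append(payload)

# All IO submissions already happened, before the pending set is polled at all:
assert submissions == [0, 1, 2]

# Drain the pending set (stand-in for polling FuturesUnordered):
for gen in pending:
    state, payload = poll_once(gen)
    results.append(payload)

print(results)  # [0, 10, 20]
```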
Christian Schwarz
1f7f947119
results
...
--------------------------------------------------- Benchmark results ---------------------------------------------------
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 47653
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.417 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.588 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.670 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 0.991 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 1.896 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.time: 20.1797
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_cpu_seconds_total: 15.1300
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1: 2209
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.4: 9683
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.8: 16795
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.16: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.32: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.64: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.128: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.256: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.512: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1024: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.9223372036854775807: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 46773
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.425 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.632 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.837 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.234 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.347 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.time: 20.2244
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_cpu_seconds_total: 15.9300
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1: 2072
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.4: 9362
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.8: 16558
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.16: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.32: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.64: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.128: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.256: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.512: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1024: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.9223372036854775807: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 47861
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.416 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.609 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.713 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.135 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 1.940 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.time: 20.1969
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_cpu_seconds_total: 15.7000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1: 2282
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.4: 9597
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.8: 16926
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.16: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.32: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.64: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.128: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.256: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.512: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1024: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.9223372036854775807: 50189
=========================================== 3 passed, 236 deselected in 101.20s (0:01:41) ===========================================
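The `*_buckets` counters above look like cumulative (Prometheus-style `le`) histogram counts: the flat 50022 from bucket 16 onward means every vectored read in the serial run visited at most 16 layers. A small sketch (bucket values copied from the serial run above) converts them to per-range counts:

```python
# Convert cumulative histogram buckets to per-range counts.
cumulative = {
    1: 2209, 4: 9683, 8: 16795, 16: 50022, 32: 50022, 64: 50022,
    128: 50022, 256: 50022, 512: 50022, 1024: 50022,
    9223372036854775807: 50022,  # the +Inf bucket
}

per_range = {}
prev = 0
for le, count in sorted(cumulative.items()):
    per_range[le] = count - prev  # reads falling in (prev_bucket, le]
    prev = count

print(per_range)
# {1: 2209, 4: 7474, 8: 7112, 16: 33227, 32: 0, ...}
```

So roughly two thirds of the reads visited between 9 and 16 layers, which is the "deep delta stack" the workload was set up to produce.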
2024-12-12 14:51:03 +01:00
Christian Schwarz
8b477ce4ee
super hacky way to get layer visit buckets
2024-12-12 14:50:47 +01:00
Christian Schwarz
02d0d89069
run bench on hetzner box
...
christian@neon-hetzner-dev-christian:[~/src/neon]: NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=true DEFAULT_PG_VERSION=16 BUILD_TYPE=release poetry run pytest --alluredir ~/tmp/alluredir --clean-alluredir test_runner/performance -k 'test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant' --maxfail=1
--------------------------------------------------- Benchmark results ---------------------------------------------------
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 46336
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.429 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.607 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.705 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.138 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.059 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 48772
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.408 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.592 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.701 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.139 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.231 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 47102
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.422 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.605 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.705 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.124 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.029 ms
=========================================== 3 passed, 236 deselected in 102.19s (0:01:42) ===========================================
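For a quick read of the run above, the three pipelining modes can be compared by request rate (request counts copied from the output; the 20 s figure is the configured duration from the test parameters, since `counters.time` is not shown in this run):

```python
# Rough throughput comparison across the three modes from the run above.
runs = {"serial": 46336, "parallel": 48772, "futures-unordered": 47102}
duration_s = 20  # configured benchmark duration

rps = {mode: n / duration_s for mode, n in runs.items()}
for mode, rate in rps.items():
    print(f"{mode}: {rate:.0f} getpage/s")

speedup = (runs["parallel"] - runs["serial"]) / runs["serial"]
print(f"parallel vs serial: +{speedup:.1%}")  # ≈ +5.3%
```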
2024-12-12 14:18:51 +01:00
Christian Schwarz
309edebb90
repurpose test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant to measure the latency improvement in an unbatchable-pagestream but parallelizable workload (multiple layers visited)
2024-12-12 14:18:42 +01:00
Christian Schwarz
80aebce3d6
bench results on my box
...
christian@neon-hetzner-dev-christian:[~/src/neon]: DEFAULT_PG_VERSION=16 BUILD_TYPE=release poetry run pytest --alluredir ~/tmp/alluredir --clean-alluredir test_runner/performance/pageserver/test_page_service_batching.py -k 'test_throughput' --maxfail=1
--------------------------------------------------- Benchmark results ---------------------------------------------------
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.0999
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.0450
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.2530
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.3367
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.0713
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.0550
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2825
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.5882
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.1176
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.5882
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3012
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4734
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.3162
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.8000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.5333
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.8000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3227
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4442
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2842
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.7647
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.1176
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.7647
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3135
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4740
==================================================================================================================================================================================================== 6 passed, 9 deselected in 55.08s ====================================================================================================================================================================================================
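For reference, the `perfmetric.batching_factor` reported above appears to be the mean batch size, i.e. `pageserver_batch_size_histo_sum / pageserver_batch_size_histo_count`. A minimal sketch of that arithmetic, using the values from the pipelining_config3 run above (the function name is illustrative, not from the codebase):

```rust
/// Mean batch size: total getpage requests divided by the number of batches served.
fn batching_factor(histo_sum: f64, histo_count: f64) -> f64 {
    histo_sum / histo_count
}

fn main() {
    // Serial mode: every request is its own batch, so the factor is 1.
    let serial = batching_factor(6403.0, 6403.0);
    // Pipelined mode with max_batch_size=32: ~21.5 requests per batch,
    // matching the perfmetric.batching_factor reported above.
    let pipelined = batching_factor(6401.5882, 298.1176);
    println!("serial: {serial:.4}, pipelined: {pipelined:.4}");
}
```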
2024-12-12 13:57:24 +01:00
Christian Schwarz
71b6aa2ab7
implement futures-unordered mode
2024-12-12 13:55:35 +01:00
Christian Schwarz
e051e916b6
results on my hetzner box
...
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark results ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.0955
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.0250
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.1962
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.2700
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2611
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.5000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.0556
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.5000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.2850
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4775
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.3033
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.6875
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.0625
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.6875
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3075
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4777
==================================================================================================================================================================================================== 4 passed, 6 deselected in 38.06s ====================================================================================================================================================================================================
2024-12-12 13:37:13 +01:00
Christian Schwarz
7f55a32edb
parametrization over direct io mode (only direct io for now)
2024-12-12 13:36:05 +01:00
Christian Schwarz
3779370f08
results on my hetzner box
...
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark results ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.time: 0.7328
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 0.8850
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.time: 0.7545
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 0.9667
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.1824
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.1111
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 297.8889
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.1111
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.2196
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4883
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2743
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.6667
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.4444
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.6667
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3350
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4501
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_mean: 0.145 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p95: 0.178 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p99: 0.199 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p99.9: 0.265 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p99.99: 0.366 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_mean: 0.168 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p95: 0.201 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p99: 0.224 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p99.9: 0.317 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p99.99: 0.416 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.149 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.184 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.205 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.289 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.359 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.180 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.219 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.244 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.341 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.522 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.154 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.189 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.211 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.268 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.307 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.167 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.211 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.238 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.338 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.407 ms
===================================================================================================================================================================================================== 10 passed in 120.65s (0:02:00) =====================================================================================================================================================================================================
2024-12-12 13:32:12 +01:00
Christian Schwarz
659da352a2
DO NOT MERGE trim down benchmark to what's relevant
2024-12-12 13:31:55 +01:00
Christian Schwarz
0e8ac43450
hacky parametrization of relevant benchmarks
2024-12-12 13:28:00 +01:00
Christian Schwarz
410283c9b1
serial mode without tasks
2024-12-12 12:26:15 +01:00
Christian Schwarz
4dc7434d0e
Merge remote-tracking branch 'origin/main' into vlad/read-path-concurrent-io
2024-12-10 13:22:03 +01:00
Erik Grinaker
ad472bd4a1
test_runner: add visibility map test ( #9940 )
...
Verifies that visibility map pages are correctly maintained across
shards.
Touches #9914 .
2024-12-10 12:07:00 +00:00
Arpad Müller
c51db1db61
Replace MAX_KEYS_PER_DELETE constant with function ( #10061 )
...
Azure has a per-request limit of 256 items for bulk deletion, compared to 1000
on AWS, so we need to support multiple values. Because of
`GenericRemoteStorage` we can't use an associated constant; it has to be a
function.
The PR replaces the `MAX_KEYS_PER_DELETE` constant with a function of
the same name, implemented on both the `RemoteStorage` trait as well as
on `GenericRemoteStorage`.
The value serves as a hint for how many objects to pass to the
`delete_objects` function.
Reading:
* https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch
* https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
Part of #7931
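A minimal sketch of the shape described above: a per-backend limit can't live in an associated constant behind a trait object, so it becomes a method on the trait (type and method names here follow the commit message; the bodies are illustrative, not the real implementation):

```rust
/// Trait-object-safe way to expose a backend-specific bulk-delete limit.
/// An associated constant would not be reachable through `dyn RemoteStorage`.
trait RemoteStorage {
    /// Hint: how many objects to pass to a single bulk-delete request.
    fn max_keys_per_delete(&self) -> usize;
}

struct S3Bucket;
struct AzureContainer;

impl RemoteStorage for S3Bucket {
    fn max_keys_per_delete(&self) -> usize {
        1000 // AWS S3 DeleteObjects limit
    }
}

impl RemoteStorage for AzureContainer {
    fn max_keys_per_delete(&self) -> usize {
        256 // Azure Blob Batch limit
    }
}

fn main() {
    // Callers chunk their deletions by whatever the backend reports.
    let backends: Vec<Box<dyn RemoteStorage>> =
        vec![Box::new(S3Bucket), Box::new(AzureContainer)];
    for b in &backends {
        println!("chunk size: {}", b.max_keys_per_delete());
    }
}
```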
2024-12-10 11:29:38 +00:00
Ivan Efremov
34c1295594
[proxy] impr: Additional logging for cancellation queries ( #10039 )
...
## Problem
Cancellation tasks spawned in the background sometimes log with missing
context.
https://neondb.slack.com/archives/C060N3SEF9D/p1733427801527419?thread_ts=1733419882.560159&cid=C060N3SEF9D
## Summary of changes
Add `session_id` and change the log level for cancellation queries
2024-12-10 10:14:28 +00:00
Evan Fleming
b593e51eae
safekeeper: use arc for global timelines and config ( #10051 )
...
Hello! I was interested in potentially making some contributions to Neon
and looking through the issue backlog I found
[8200](https://github.com/neondatabase/neon/issues/8200 ) which seemed
like a good first issue to attempt to tackle. I see it was assigned a
while ago so apologies if I'm stepping on any toes with this PR. I also
apologize for the size of this PR. I'm not sure if there is a simple way
to reduce it given the footprint of the components being changed.
## Problem
This PR is attempting to address part of the problem outlined in issue
[8200](https://github.com/neondatabase/neon/issues/8200 ). Namely to
remove global static usage of timeline state in favour of
`Arc<GlobalTimelines>` and to replace wasteful clones of
`SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle
`RemoteStorage` in this PR, to minimize the amount of changes, as this
PR is already quite large. I also did not opt to introduce a
`SafekeeperApp` wrapper struct, similarly to minimize changes, but I can
tackle either or both of these omissions in this PR if folks would like.
## Summary of changes
- Remove static usage of `GlobalTimelines` in favour of
`Arc<GlobalTimelines>`
- Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the
underlying struct
## Some additional thoughts
- We currently store `SafeKeeperConf` in `GlobalTimelines` and
then expose it through a public `get_global_config` function which
requires locking. This seems needlessly wasteful; based on observed
usage, we could remove this public accessor and force consumers to
acquire `SafeKeeperConf` through the new `Arc` reference.
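The pattern can be sketched as follows, with hypothetical stand-ins for the real safekeeper types: each task holds an `Arc` clone instead of reaching into a global static, and the config accessor clones the `Arc` rather than the underlying struct:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical simplifications of the real types.
struct SafeKeeperConf {
    listen_addr: String,
}

struct GlobalTimelines {
    conf: Arc<SafeKeeperConf>,
}

impl GlobalTimelines {
    fn new(conf: Arc<SafeKeeperConf>) -> Arc<Self> {
        Arc::new(GlobalTimelines { conf })
    }
    // Cheap accessor: clones the Arc, not the underlying config struct.
    fn get_global_config(&self) -> Arc<SafeKeeperConf> {
        Arc::clone(&self.conf)
    }
}

fn main() {
    let conf = Arc::new(SafeKeeperConf { listen_addr: "127.0.0.1:5454".into() });
    let timelines = GlobalTimelines::new(Arc::clone(&conf));

    // Each background task receives its own Arc clone instead of a global.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let tl = Arc::clone(&timelines);
            thread::spawn(move || tl.get_global_config().listen_addr.len())
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), conf.listen_addr.len());
    }
}
```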
2024-12-09 21:09:20 +00:00
Alex Chi Z.
4c4cb80186
fix(pageserver): fix gc-compaction racing with legacy gc ( #10052 )
...
## Problem
close https://github.com/neondatabase/neon/issues/10049 , close
https://github.com/neondatabase/neon/issues/10030 , close
https://github.com/neondatabase/neon/issues/8861
part of https://github.com/neondatabase/neon/issues/9114
The legacy gc process calls `get_latest_gc_cutoff`, which uses an Rcu
separate from the gc_info struct. In the gc_compaction_smoke test case,
the "latest" cutoff could be lower than the gc_info value, causing
gc-compaction to collect data that could still be accessed via
`latest_gc_cutoff`. Technically speaking, there's nothing wrong with
gc-compaction using gc_info without considering latest_gc_cutoff,
because gc_info is the source of truth, but let's fix it anyway.
## Summary of changes
* gc-compaction uses `latest_gc_cutoff` instead of gc_info to determine
the gc horizon.
* if a gc-compaction is scheduled via tenant compaction iteration, it
will take the gc_block lock to avoid racing with functionalities like
detach ancestor (if it's triggered via manual compaction API without
scheduling, then it won't take the lock)
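A deliberately simplified sketch of the horizon fix (plain `RwLock` values standing in for the real Rcu and gc_info structures):

```rust
use std::sync::RwLock;

type Lsn = u64;

// Simplified: two places track a cutoff LSN, and they can briefly disagree.
// gc-compaction now reads the same `latest_gc_cutoff` that legacy gc
// publishes, instead of the separate gc_info value.
struct Timeline {
    latest_gc_cutoff: RwLock<Lsn>, // an Rcu in the real code
    gc_info_cutoff: RwLock<Lsn>,
}

impl Timeline {
    // gc-compaction horizon: never collect above what legacy gc has published.
    fn gc_compaction_horizon(&self) -> Lsn {
        *self.latest_gc_cutoff.read().unwrap()
    }
}

fn main() {
    let tl = Timeline {
        latest_gc_cutoff: RwLock::new(100),
        gc_info_cutoff: RwLock::new(120), // may run ahead of the published cutoff
    };
    // Collecting by gc_info's 120 could drop data still readable at LSN 100.
    assert_eq!(tl.gc_compaction_horizon(), 100);
    assert!(*tl.gc_info_cutoff.read().unwrap() >= tl.gc_compaction_horizon());
}
```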
---------
Signed-off-by: Alex Chi Z <chi@neon.tech >
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com >
2024-12-09 20:06:06 +00:00
a-masterov
92273b6d5e
Enable the pg_regress tests on staging for PG17 ( #9978 )
...
## Problem
Currently, we run the `pg_regress` tests only for PG16.
However, PG17 is part of Neon and should be tested as well.
## Summary of changes
Modified the workflow and added a patch for PG17 enabling the
`pg_regress` tests.
The problem with leftovers was solved by using branches.
2024-12-09 19:30:39 +00:00
Arpad Müller
e74e7aac93
Use updated patched azure SDK crates ( #10036 )
...
For a while already, we've been unable to update the Azure SDK crates
due to Azure adopting use of a non-tokio async runtime, see #7545 .
The effort to upstream the fix got stalled, and I think it's better to
switch to a patched version of the SDK that is up to date.
Now we have a fork of the SDK under the neondatabase github org, to
which I have applied Conrad's rebased patches to:
https://github.com/neondatabase/azure-sdk-for-rust/tree/neon .
The existence of a fork will also help with shipping bulk delete support
before it's upstreamed (#7931 ).
Also, in related news, Azure SDK development has split in two: the
`main` branch pertains to a future, to-be-officially-blessed release of
the SDK, while the older versions we are currently using live on the
`legacy` branch. Upstream doesn't really want patches for the `legacy`
branch any more; they want to focus on the `main` efforts. Even so, the
`legacy` branch is still newer than what we have right now, so let's
switch to `legacy` for now.
Depending on how long it takes, we can switch to the official version of
the SDK once it's released, or to the upstream `main` branch if
there are changes we want before that.
As a nice side effect of this PR, we now use reqwest 0.12 everywhere,
dropping the dependency on version 0.11.
Fixes #7545
2024-12-09 15:50:06 +00:00
Vlad Lazar
4cca5cdb12
deps: update url to 2.5.4 for RUSTSEC-2024-0421 ( #10059 )
...
## Problem
See https://rustsec.org/advisories/RUSTSEC-2024-0421
## Summary of changes
Update url crate to 2.5.4.
2024-12-09 14:57:42 +00:00
Arpad Müller
9d425b54f7
Update AWS SDK crates ( #10056 )
...
Result of running:
cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p
aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p
aws-sdk-s3 -p aws-config
We want to keep the AWS SDK up to date as that way we benefit from new
developments and improvements.
2024-12-09 12:46:59 +00:00
John Spray
ec790870d5
storcon: automatically clear Pause/Stop scheduling policies to enable detaches ( #10011 )
...
## Problem
We saw a tenant get stuck when it had been put into Pause scheduling
mode to pin it to a pageserver, then it was left idle for a while and
the control plane tried to detach it.
Close: https://github.com/neondatabase/neon/issues/9957
## Summary of changes
- When changing policy to Detached or Secondary, set the scheduling
policy to Active.
- Add a test that exercises this
- When persisting tenant shards, set their `generation_pageserver` to
null if the placement policy is not Attached (this enables consistency
checks to work, and avoids leaving state in the DB that could be
confusing/misleading in future)
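The transition described in the bullets above can be sketched like this; the enum and field names follow the PR description but are a hypothetical simplification of the storage controller's types:

```rust
#[derive(PartialEq, Debug)]
enum PlacementPolicy { Attached, Secondary, Detached }

#[derive(PartialEq, Debug)]
enum SchedulingPolicy { Active, Pause, Stop }

struct TenantShard {
    placement: PlacementPolicy,
    scheduling: SchedulingPolicy,
    generation_pageserver: Option<u64>,
}

impl TenantShard {
    fn set_placement(&mut self, new: PlacementPolicy) {
        // Pause/Stop was meant to pin an attached tenant to a pageserver;
        // it must not block a later detach, so clear it back to Active.
        if matches!(new, PlacementPolicy::Detached | PlacementPolicy::Secondary) {
            self.scheduling = SchedulingPolicy::Active;
        }
        // Non-attached shards have no attachment generation to persist.
        if new != PlacementPolicy::Attached {
            self.generation_pageserver = None;
        }
        self.placement = new;
    }
}

fn main() {
    let mut shard = TenantShard {
        placement: PlacementPolicy::Attached,
        scheduling: SchedulingPolicy::Pause, // pinned to a pageserver
        generation_pageserver: Some(42),
    };
    shard.set_placement(PlacementPolicy::Detached);
    assert_eq!(shard.placement, PlacementPolicy::Detached);
    assert_eq!(shard.scheduling, SchedulingPolicy::Active);
    assert_eq!(shard.generation_pageserver, None);
}
```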
2024-12-07 13:05:09 +00:00
Christian Schwarz
4d7111f240
page_service: don't count time spent flushing towards smgr latency metrics ( #10042 )
...
## Problem
In #9962 I changed the smgr metrics to include time spent on flush.
It isn't under our (=storage team's) control how long that flush takes
because the client can stop reading requests.
## Summary of changes
Stop the timer as soon as we've buffered up the response in the
`pgb_writer`.
Track flush time in a separate metric.
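The timing split can be sketched as follows; the metric and function names are illustrative, not the pageserver's actual API:

```rust
use std::time::{Duration, Instant};

// Sketch: stop the smgr latency timer once the response is buffered, and
// account for the flush in its own metric.
struct Metrics {
    smgr_latency: Vec<Duration>,
    flush_time: Vec<Duration>,
}

fn handle_request(metrics: &mut Metrics) {
    let start = Instant::now();
    compute_response();
    // Response is buffered in the pgb_writer: smgr latency ends here,
    // before the flush, whose duration depends on how fast the client reads.
    metrics.smgr_latency.push(start.elapsed());

    let flush_start = Instant::now();
    flush_to_client();
    metrics.flush_time.push(flush_start.elapsed());
}

fn compute_response() { /* stand-in for the getpage work */ }
fn flush_to_client() { /* stand-in for the pgb_writer flush */ }

fn main() {
    let mut m = Metrics { smgr_latency: vec![], flush_time: vec![] };
    handle_request(&mut m);
    assert_eq!(m.smgr_latency.len(), 1);
    assert_eq!(m.flush_time.len(), 1);
}
```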
---------
Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com >
2024-12-07 08:57:55 +00:00
Alex Chi Z.
b1fd086c0c
test(pageserver): disable gc_compaction smoke test for now ( #10045 )
...
## Problem
The test is flaky.
## Summary of changes
Disable the test.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech >
2024-12-06 22:30:04 +00:00
Heikki Linnakangas
b6eea65597
Fix error message if PS connection is lost while receiving prefetch ( #9923 )
...
If the pageserver connection is lost while receiving the prefetch
request, the prefetch queue is cleared. The error message prints the
values from the prefetch slot, but because the slot was already cleared,
they're all zeros:
LOG: [NEON_SMGR] [shard 0] No response from reading prefetch entry 0:
0/0/0.0 block 0. This can be caused by a concurrent disconnect
To fix, make local copies of the values.
In passing, also add a sanity check that if the receive() call
succeeds, the prefetch slot is still intact.
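The fix can be sketched as below (the original code is C in the neon extension; this is a Rust rendering with illustrative field names): copy the identifying values out of the slot before anything can clear it, so the error message reports the real values instead of zeros:

```rust
#[derive(Clone, Copy)]
struct PrefetchSlot {
    rel: u32,
    fork: u8,
    block: u32,
}

fn receive(slot: &mut Option<PrefetchSlot>) -> Result<(), String> {
    // Make local copies first: a disconnect clears the slot, and an error
    // message built from the cleared slot would print all zeros.
    let saved = slot.expect("slot must be filled before receive");
    let ok = false; // pretend the pageserver connection dropped
    if !ok {
        *slot = None; // prefetch queue cleared on disconnect
        return Err(format!(
            "No response for prefetch entry rel {} fork {} block {}",
            saved.rel, saved.fork, saved.block
        ));
    }
    Ok(())
}

fn main() {
    let mut slot = Some(PrefetchSlot { rel: 16384, fork: 0, block: 7 });
    let err = receive(&mut slot).unwrap_err();
    assert!(err.contains("16384"), "message keeps the real values: {err}");
}
```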
2024-12-06 20:56:57 +00:00
Alex Chi Z.
c42c28b339
feat(pageserver): gc-compaction split job and partial scheduler ( #9897 )
...
## Problem
part of https://github.com/neondatabase/neon/issues/9114 , stacked PR
over #9809
The compaction scheduler now schedules partial compaction jobs.
## Summary of changes
* Add the compaction job splitter based on size.
* Schedule subcompactions using the compaction scheduler.
* Test subcompaction scheduler in the smoke regress test.
* Temporarily disable layer map checks
---------
Signed-off-by: Alex Chi Z <chi@neon.tech >
2024-12-06 18:44:26 +00:00
Tristan Partin
e4837b0a5a
Bump sql_exporter to 0.16.0 ( #10041 )
...
Signed-off-by: Tristan Partin <tristan@neon.tech >
2024-12-06 17:43:55 +00:00
Erik Grinaker
14c4fae64a
test_runner/performance: add improved bulk insert benchmark ( #9812 )
...
Adds an improved bulk insert benchmark, including S3 uploads.
Touches #9789 .
2024-12-06 15:17:15 +00:00
Vlad Lazar
cc70fc802d
pageserver: add metric for number of wal records received by each shard ( #10035 )
...
## Problem
With the current metrics we can't identify which shards are ingesting
data at any given time.
## Summary of changes
Add a metric for the number of wal records received for processing by
each shard. This is per (tenant, timeline, shard).
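A minimal sketch of such a labelled counter, with a plain `HashMap` standing in for the real Prometheus metric and illustrative label names:

```rust
use std::collections::HashMap;

// Counter keyed by (tenant, timeline, shard), bumped once per wal record
// handed to a shard for processing.
#[derive(Default)]
struct WalReceivedMetric {
    counts: HashMap<(String, String, u8), u64>,
}

impl WalReceivedMetric {
    fn inc(&mut self, tenant: &str, timeline: &str, shard: u8) {
        *self
            .counts
            .entry((tenant.to_string(), timeline.to_string(), shard))
            .or_insert(0) += 1;
    }
}

fn main() {
    let mut m = WalReceivedMetric::default();
    for _ in 0..3 {
        m.inc("tenant-a", "tl-1", 0);
    }
    m.inc("tenant-a", "tl-1", 1);
    // Shard 0 received most of this timeline's records.
    let shard0 = ("tenant-a".to_string(), "tl-1".to_string(), 0u8);
    assert_eq!(m.counts[&shard0], 3);
}
```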
2024-12-06 12:51:41 +00:00
Alexey Kondratov
fa07097f2f
chore: Reorganize and refresh CODEOWNERS ( #10008 )
...
## Problem
We didn't have a codeowner for `/compute`, so nobody was auto-assigned
for PRs like #9973
## Summary of changes
While on it:
1. Group codeowners into sections.
2. Remove control plane from the `/compute_tools` because it's primarily
the internal `compute_ctl` code.
3. Add control plane (and compute) to `/libs/compute_api` because that's
the shared public interface of the compute.
2024-12-06 11:44:50 +00:00
Erik Grinaker
7838659197
pageserver: assert that keys belong to shard ( #9943 )
...
We've seen cases where stray keys end up on the wrong shard. This
shouldn't happen. Add debug assertions to prevent this. In release
builds, we should be lenient in order to handle changing key ownership
policies.
Touches #9914 .
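The debug-strict/release-lenient pattern can be sketched as follows; the ownership predicate is a hypothetical stand-in, not Neon's real key-to-shard mapping:

```rust
// Hypothetical ownership check: a real pageserver hashes the key.
fn key_belongs_to_shard(key: u64, shard: u32, shard_count: u32) -> bool {
    key % shard_count as u64 == shard as u64
}

fn ingest_key(key: u64, shard: u32, shard_count: u32) {
    // Catch stray keys in debug/test builds...
    debug_assert!(
        key_belongs_to_shard(key, shard, shard_count),
        "key {key} does not belong to shard {shard}"
    );
    // ...but tolerate them in release builds, where changing key-ownership
    // policies may leave old keys behind until compaction removes them.
    let _ = key;
}

fn main() {
    ingest_key(8, 0, 4); // 8 % 4 == 0: owned by shard 0, assertion passes
}
```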
2024-12-06 10:24:13 +00:00
Vlad Lazar
3f1c542957
pageserver: add disk consistent and remote lsn metrics ( #10005 )
...
## Problem
There's no metrics for disk consistent LSN and remote LSN. This stuff is
useful when looking at ingest performance.
## Summary of changes
Two per-timeline metrics are added: `pageserver_disk_consistent_lsn` and
`pageserver_projected_remote_consistent_lsn`. I went for the projected
remote lsn instead of the visible one
because that more closely matches remote storage write throughput.
Ideally we would have both, but these metrics are expensive.
2024-12-06 10:21:52 +00:00
Erik Grinaker
ec4072f845
pageserver: add wait_until_flushed parameter for timeline checkpoint ( #10013 )
...
## Problem
I'm writing an ingest benchmark in #9812 . To time S3 uploads, I need to
schedule a flush of the Pageserver's in-memory layer, but don't actually
want to wait around for it to complete (which will take a minute).
## Summary of changes
Add a parameter `wait_until_flush` (default `true`) for
`timeline/checkpoint` to control whether to wait for the flush to
complete.
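The knob can be sketched like this; the parameter parsing and handler names are illustrative, not the actual checkpoint endpoint code:

```rust
// Parse an optional request parameter that defaults to true, preserving
// the old blocking behaviour for existing callers.
fn parse_wait_until_flushed(param: Option<&str>) -> Result<bool, String> {
    match param {
        None => Ok(true),
        Some(s) => s.parse::<bool>().map_err(|e| e.to_string()),
    }
}

fn checkpoint(wait_until_flushed: bool) -> &'static str {
    // Always schedule the flush of the in-memory layer; only optionally
    // wait for it to complete, which can take a minute for large layers.
    if wait_until_flushed {
        "scheduled flush and waited for completion"
    } else {
        "scheduled flush and returned immediately"
    }
}

fn main() {
    assert_eq!(parse_wait_until_flushed(None), Ok(true));
    assert_eq!(parse_wait_until_flushed(Some("false")), Ok(false));
    println!("{}", checkpoint(false));
}
```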
2024-12-06 10:12:39 +00:00
Erik Grinaker
56f867bde5
pageserver: only zero truncated FSM page on owning shard ( #10032 )
...
## Problem
FSM pages are managed like regular relation pages, and owned by a single
shard. However, when truncating the FSM relation the last FSM page was
zeroed out on all shards. This is unnecessary and potentially confusing.
The superfluous keys will be removed during compactions, as they do not
belong on these shards.
Resolves #10027 .
## Summary of changes
Only zero out the truncated FSM page on the owning shard.
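A rough sketch of the guard, with a simplified stripe-based ownership predicate standing in for the real shard mapping:

```rust
// Hypothetical ownership check: real sharding stripes blocks differently.
fn shard_owns_page(block: u32, shard: u32, shard_count: u32) -> bool {
    (block / 32) % shard_count == shard
}

// Returns whether this shard wrote the zeroed page.
fn truncate_fsm(last_fsm_block: u32, shard: u32, shard_count: u32) -> bool {
    // Before the fix every shard zeroed the last FSM page, leaving
    // superfluous keys on non-owning shards; now only the owner does.
    if shard_owns_page(last_fsm_block, shard, shard_count) {
        // zero out the truncated portion of the page here
        true
    } else {
        false
    }
}

fn main() {
    let writers = (0..4).filter(|&s| truncate_fsm(64, s, 4)).count();
    assert_eq!(writers, 1); // exactly one shard writes the zeroed page
}
```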
2024-12-06 07:22:22 +00:00
Arpad Müller
d1ab7471e2
Fix desc_str for Azure container ( #10021 )
...
Small logs fix I've noticed while working on
https://github.com/neondatabase/cloud/issues/19963 .
2024-12-05 20:51:57 +00:00
Tristan Partin
6ff4175fd7
Send Content-Type header on reconfigure request from neon_local ( #10029 )
...
Signed-off-by: Tristan Partin <tristan@neon.tech >
2024-12-05 20:30:35 +00:00
Tristan Partin
6331cb2161
Bump anyhow to 1.0.94 ( #10028 )
...
We were over a year out of date.
Signed-off-by: Tristan Partin <tristan@neon.tech >
2024-12-05 19:42:52 +00:00
Alex Chi Z.
71f38d1354
feat(pageserver): support schedule gc-compaction ( #9809 )
...
## Problem
part of https://github.com/neondatabase/neon/issues/9114
gc-compaction can take a long time. This patch adds support for
scheduling a gc-compaction job. The compaction loop will first handle
L0->L1 compaction, and then gc compaction. The scheduled jobs are stored
in a non-persistent queue within the tenant structure.
This will be the building block for the partial compaction trigger -- if
the system determines that we need to do a gc compaction, it will
partition the keyspace and schedule several jobs. Each of these jobs
will run for a short amount of time (i.e, 1 min). L0 compaction will be
prioritized over gc compaction.
## Summary of changes
* Add compaction scheduler in tenant.
* Run scheduled compaction in integration tests.
* Change the manual compaction API to allow scheduling a compaction
instead of immediately doing it.
* Add LSN upper bound as gc-compaction parameter. If we schedule partial
compactions, gc_cutoff might move across different runs. Therefore, we
need to pass a pre-determined gc_cutoff beforehand. (TODO: support LSN
lower bound so that we can compact arbitrary "rectangle" in the layer
map)
* Refactor the gc_compaction internal interface.
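The scheduling described above can be sketched as a small in-memory queue; the types are illustrative, not the tenant's actual structures:

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Job {
    L0Compaction,
    GcCompaction { key_range: (u64, u64), lsn_upper_bound: u64 },
}

// Non-persistent queue of scheduled jobs within the tenant structure.
#[derive(Default)]
struct CompactionQueue {
    scheduled: VecDeque<Job>,
}

impl CompactionQueue {
    fn schedule(&mut self, job: Job) {
        self.scheduled.push_back(job);
    }
    // One iteration of the compaction loop: L0 first, then one gc job.
    fn next_job(&mut self, l0_pending: bool) -> Option<Job> {
        if l0_pending {
            return Some(Job::L0Compaction);
        }
        self.scheduled.pop_front()
    }
}

fn main() {
    let mut q = CompactionQueue::default();
    // Partition the keyspace into several short-running jobs with a fixed
    // gc_cutoff, so the horizon doesn't move between partial runs.
    for range in [(0, 100), (100, 200)] {
        q.schedule(Job::GcCompaction { key_range: range, lsn_upper_bound: 0x42 });
    }
    assert_eq!(q.next_job(true), Some(Job::L0Compaction)); // L0 prioritized
    assert!(matches!(q.next_job(false), Some(Job::GcCompaction { .. })));
}
```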
---------
Signed-off-by: Alex Chi Z <chi@neon.tech >
Co-authored-by: Christian Schwarz <christian@neon.tech >
2024-12-05 19:37:17 +00:00
Tristan Partin
c0ba416967
Add compute_logical_snapshots_bytes metric ( #9887 )
...
This metric exposes the size of all non-temporary logical snapshot
files.
Signed-off-by: Tristan Partin <tristan@neon.tech >
2024-12-05 19:04:33 +00:00
Alexey Kondratov
13e8105740
feat(compute): Allow specifying the reconfiguration concurrency ( #10006 )
...
## Problem
We need a higher concurrency during reconfiguration in case of many DBs,
but the instance is already running and used by the client. We can
easily get out of `max_connections` limit, and the current code won't
handle that.
## Summary of changes
Default to 1, but also allow control plane to override this value for
specific projects. It's also recommended to bump
`superuser_reserved_connections` += `reconfigure_concurrency` for such
projects to ensure that we always have enough spare connections for
reconfiguration process to succeed.
Quick workaround for neondatabase/cloud#17846
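The bounded concurrency can be sketched as below, running reconfiguration in waves of at most `reconfigure_concurrency` connections; the function names are illustrative, not the real `compute_ctl` code:

```rust
use std::thread;

fn reconfigure_one(_db: &str) { /* stand-in for per-DB catalog updates */ }

// Process DBs in waves so at most `concurrency` connections are open at
// once, staying within max_connections on a busy instance.
// Returns the number of waves, for illustration.
fn reconfigure_all(dbs: &[&str], concurrency: usize) -> usize {
    let mut waves = 0;
    for wave in dbs.chunks(concurrency) {
        let handles: Vec<_> = wave
            .iter()
            .map(|db| {
                let db = db.to_string();
                thread::spawn(move || reconfigure_one(&db))
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        waves += 1;
    }
    waves
}

fn main() {
    let dbs = ["app", "analytics", "staging", "dev", "test"];
    // concurrency = 2 -> ceil(5 / 2) = 3 waves of connections
    assert_eq!(reconfigure_all(&dbs, 2), 3);
}
```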
2024-12-05 17:57:25 +00:00
Erik Grinaker
db79304416
storage_controller: increase shard scan timeout ( #10000 )
...
## Problem
The node shard scan timeout of 1 second is a bit too aggressive, and
we've seen this cause test failures. The scans are performed in parallel
across nodes, and the entire operation has a 15 second timeout.
Resolves #9801 .
## Summary of changes
Increase the timeout to 5 seconds. This is still enough to time out on a
network failure and retry successfully within 15 seconds.
2024-12-05 17:29:21 +00:00
Ivan Efremov
ffc9c33eb2
proxy: Present new auth backend cplane_proxy_v1 ( #10012 )
...
Implement a new auth backend based on the current Neon backend to switch
to the new Proxy V1 cplane API.
Implements [#21048 ](https://github.com/neondatabase/cloud/issues/21048 )
2024-12-05 05:30:38 +00:00
Yuchen Liang
ed2d892113
pageserver: fix buffered-writer on macos build ( #10019 )
...
## Problem
In https://github.com/neondatabase/neon/pull/9693 , we forgot to check
macos build. The [CI
run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468 )
on main showed that macos build failed with unused variables and dead
code.
## Summary of changes
- add `allow(dead_code)` and `allow(unused_variables)` to the relevant
code that is not used on macos.
Signed-off-by: Yuchen Liang <yuchen@neon.tech >
2024-12-05 02:16:09 +00:00
Conrad Ludgate
131585eb6b
chore: update rust-postgres ( #10002 )
...
Like #9931 but without rebasing upstream just yet, to try and minimise
the differences.
Removes all proxy-specific commits from the rust-postgres fork, now that
proxy no longer depends on them. Merging upstream changes to come later.
2024-12-04 21:07:44 +00:00
Conrad Ludgate
0bab7e3086
chore: update clap ( #10009 )
...
This updates clap to use a new version of anstream
2024-12-04 17:42:17 +00:00
Yuchen Liang
e6cd5050fc
pageserver: make BufferedWriter do double-buffering ( #9693 )
...
Closes #9387 .
## Problem
`BufferedWriter` cannot proceed while the owned buffer is flushing to
disk. We want to implement double buffering so that the flush can happen
in the background. See #9387 .
## Summary of changes
- Maintain two owned buffers in `BufferedWriter`.
- The writer is in charge of copying the data into an owned, aligned
buffer; once full, it submits the buffer to the flush task.
- The flush background task is in charge of flushing the owned buffer to
disk and returning the buffer to the writer for reuse.
- The writer and the flush background task communicate through a
bi-directional channel.
For the in-memory layer, we also need to be able to read from the
buffered writer in `get_values_reconstruct_data`. To handle this case,
we did the following:
- Replace `VirtualFile::write_all` with `VirtualFile::write_all_at`,
and use `Arc` to share it between the writer and the background task.
- Leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`;
one clone is submitted to the channel, the other is kept in the
writer to serve reads. When we want to reuse the buffer, we can invoke
`IoBuffer::into_mut`, which gives us back the mutable aligned buffer.
- InMemoryLayer reads are now aware of the maybe_flushed part of the
buffer.
**Caveat**
- We removed the owned version of write because it does not work well
with buffer alignment. The result is that, without direct IO enabled,
[`download_object`](a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243) )
does one more memcpy than before this PR due to the switch to the
`_borrowed` version of the write.
- "Bypass aligned part of write" could be implemented later to avoid
large amount of memcpy.
**Testing**
- use an oneshot channel based control mechanism to make flush behavior
deterministic in test.
- test reading from `EphemeralFile` when the last submitted buffer is
not flushed, in-progress, and done flushing to disk.
## Performance
We see a performance improvement for small values and a regression on
big values, likely due to being CPU-bound plus disk write latency.
[Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4 )
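The double-buffering scheme can be sketched as below, heavily simplified versus the real `BufferedWriter` (no alignment, no read path, plain `Vec<u8>` buffers and std channels standing in for `IoBufferMut` and the flush task):

```rust
use std::sync::mpsc;
use std::thread;

// The writer fills one owned buffer while the background task flushes the
// other; buffers travel over a channel pair and are reused.
fn buffered_write(chunks: &[&[u8]], buf_cap: usize) -> Vec<u8> {
    let (to_flush, flush_rx) = mpsc::channel::<Vec<u8>>();
    let (recycled_tx, recycled) = mpsc::channel::<Vec<u8>>();
    recycled_tx.send(Vec::with_capacity(buf_cap)).unwrap(); // second buffer

    // Flush task: "writes" each submitted buffer (stand-in for
    // VirtualFile::write_all_at), then hands the emptied buffer back.
    let flusher = thread::spawn(move || {
        let mut on_disk = Vec::new();
        for mut buf in flush_rx {
            on_disk.extend_from_slice(&buf);
            buf.clear();
            if recycled_tx.send(buf).is_err() {
                break; // writer is gone; stop recycling
            }
        }
        on_disk
    });

    // Writer side: copy into the current buffer; once full, submit it and
    // swap in the recycled one without waiting for the flush to finish.
    let mut current = Vec::with_capacity(buf_cap);
    for chunk in chunks {
        if current.len() + chunk.len() > buf_cap && !current.is_empty() {
            to_flush.send(current).unwrap();
            current = recycled.recv().unwrap();
        }
        current.extend_from_slice(chunk);
    }
    to_flush.send(current).unwrap();
    drop(to_flush); // close the channel so the flush task drains and exits
    flusher.join().unwrap()
}

fn main() {
    let chunks: &[&[u8]] = &[b"abcd", b"efgh", b"ijkl"];
    assert_eq!(buffered_write(chunks, 4), b"abcdefghijkl");
}
```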
---------
Signed-off-by: Yuchen Liang <yuchen@neon.tech >
Co-authored-by: Christian Schwarz <christian@neon.tech >
2024-12-04 16:54:56 +00:00
John Spray
60c0d19f57
tests: make storcon scale test AZ-aware ( #9952 )
...
## Problem
We have a scale test for the storage controller which also acts as a
good stress test for scheduling stability. However, it created nodes
with no AZs set.
## Summary of changes
- Bump node count to 6 and set AZs on them.
This is a precursor to other AZ-related PRs, to make sure any new code
that's landed is getting scale tested in an AZ-aware environment.
2024-12-04 15:04:04 +00:00