rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-14 08:52:56 +00:00

Author	SHA1	Message	Date
Bojan Serafimov	c74337e4a1	try using im crate	2023-01-13 02:27:02 -05:00
Bojan Serafimov	76055c7bbc	clippy	2023-01-12 20:16:33 -05:00
Bojan Serafimov	fe0851f2c4	clippy	2023-01-12 20:07:08 -05:00
Bojan Serafimov	21c85b1969	cargo hakari generate	2023-01-12 19:47:33 -05:00
Bojan Serafimov	5dc99f1b04	Fmt	2023-01-12 19:46:51 -05:00
Bojan Serafimov	2fcbb46338	Remove false assertion	2023-01-12 18:52:15 -05:00
Bojan Serafimov	87c3e55449	Add early return to get_difficulty_map for 10x speedup	2023-01-12 13:23:59 -05:00
Bojan Serafimov	9886741828	Compare against bruteforce, fix bugs	2023-01-11 18:04:44 -05:00
Bojan Serafimov	5557fc6062	Add todo	2023-01-11 14:02:55 -05:00
Bojan Serafimov	dbb5d0800d	Fix lsn bound	2023-01-11 13:59:26 -05:00
Bojan Serafimov	e392d25828	Improve image_layer_exists docstring	2023-01-11 12:54:13 -05:00
Bojan Serafimov	2951be5386	Improve search docstring	2023-01-11 12:52:15 -05:00
Bojan Serafimov	7c6af6d729	Comments	2023-01-10 21:43:32 -05:00
Bojan Serafimov	cb5b9375d2	Organize modules, rename structs	2023-01-10 21:03:26 -05:00
Bojan Serafimov	d3d17f2c7c	Simplify	2023-01-10 17:30:40 -05:00
Bojan Serafimov	c3f5e00ad1	Comments	2023-01-10 15:32:18 -05:00
Bojan Serafimov	d0095d4457	Comments	2023-01-10 15:02:06 -05:00
Bojan Serafimov	7d057e1038	Simplify	2023-01-10 11:52:01 -05:00
Bojan Serafimov	f22437086e	Rename file	2023-01-10 11:09:02 -05:00
Bojan Serafimov	7c6909c31f	Merge branch 'main' into immutable_bst_layer_map	2023-01-10 11:02:45 -05:00
Sergey Melnikov	95bf19b85a	Add --atomic to all helm upgrade operations (#3299) When the number of GitHub Actions workers is changed, some jobs get killed. When helm is killed during an upgrade, the release gets stuck in the pending-upgrade state. --atomic should initiate an automatic rollback in this case.	2023-01-10 10:05:27 +00:00
Vadim Kharitonov	80d4afab0c	Update tokio version (RUSTSEC-2023-0001)	2023-01-10 09:02:00 +01:00
Arthur Petukhovsky	0807522a64	Enable wss proxy in all regions (#3292) Follow-up to https://github.com/neondatabase/helm-charts/pull/24 and #3247	2023-01-09 19:56:12 +00:00
Christian Schwarz	8eebd5f039	run on-demand compaction in a task_mgr task. With this patch, tenant_detach and timeline_delete's task_mgr::shutdown_tasks() call will wait for on-demand compaction to finish. Before this patch, the on-demand compaction would grab the layer_removal_cs after tenant_detach / timeline_delete had removed the timeline directory. This resulted in the error `No such file or directory (os error 2)`. NB: I already implemented this pattern for ondemand GC a while back. fixes https://github.com/neondatabase/neon/issues/3136	2023-01-09 19:08:22 +01:00
Heikki Linnakangas	8c07ef413d	Minor cleanup of test_ondemand_download_timetravel test. - Fix and improve comments - Rename 'physical_size' local variable to 'resident_size' for clarity. - Remove one unnecessary 'wait_for_upload' call. The 'wait_for_sk_commit_lsn_to_reach_remote_storage' call after shutting down compute is sufficient.	2023-01-09 18:56:50 +02:00
Sergey Melnikov	14df37c108	Use GHA environments for gradual prod rollout (#3295) Each release will wait for manual approval for each region	2023-01-09 20:18:16 +04:00
Christian Schwarz	d4d0aa6ed6	gc_iteration_internal: better log message & debug log level if nothing to do. fixes https://github.com/neondatabase/neon/issues/3107	2023-01-09 13:53:59 +01:00
Kirill Bulatov	a457256fef	Fix log message matching (#3291) Spotted https://neon-github-public-dev.s3.amazonaws.com/reports/main/debug/3871991071/index.html#suites/158be07438eb5188d40b466b6acfaeb3/22966d740e33b677/ failing on `main`, fixes that by using a proper regex match string. Also removes one clippy lint suppression.	2023-01-09 14:25:12 +02:00
Shany Pozin	3a22e1335d	Adding a PR template (#3288) ## Describe your changes Added a PR template ## Issue ticket number and link #3162 ## Checklist before requesting a review - [ ] I have performed a self-review of my code - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-01-09 12:15:53 +00:00
Sergey Melnikov	93c77b0383	Use GHA environment for per-region deploy approvals on staging (#3293) Each main deploy will wait for manual approval for each region	2023-01-09 15:40:14 +04:00
Shany Pozin	7920b39a27	Adding transition reason to the log when a tenant is moved to Broken state (#3289) #3160	2023-01-09 10:24:50 +02:00
Kirill Bulatov	23d5e2bdaa	Fix common pg port in the CLI basics test (#3283) Closes https://github.com/neondatabase/neon/issues/3282	2023-01-07 00:46:42 +02:00
Christian Schwarz	3526323bc4	prepare Timeline::get_reconstruct_data for becoming async (#3271) This patch restructures the code so that PR https://github.com/neondatabase/neon/pull/3228 can seamlessly replace the return PageReconstructResult::NeedsDownload with a download_remote_layer().await. Background: PR https://github.com/neondatabase/neon/pull/3228 will turn get_reconstruct_data() async and do the on-demand download right in place, instead of returning a PageReconstructResult::NeedsDownload. Current rustc requires that the layers lock guard be not in scope across an await point. For on-demand download inside get_reconstruct_data(), we need to do download_remote_layer().await. Supersedes https://github.com/neondatabase/neon/pull/3260 See my comment there: https://github.com/neondatabase/neon/pull/3260#issuecomment-1370752407 Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-01-06 19:42:25 +02:00
Heikki Linnakangas	af9425394f	Print time taken by CREATE/ALTER DATABASE at compute start. Trying to investigate why the "apply_config" stage is taking longer than expected. This proves or disproves that it's the CREATE DATABASE statement.	2023-01-06 17:50:44 +02:00
Arthur Petukhovsky	debd134b15	Implement wss support in proxy (#3247) This is a hacky implementation of a WebSocket server, embedded into our postgres proxy. The server is used to allow https://github.com/neondatabase/serverless to connect to our postgres from browsers and serverless javascript functions. How it will work (general schema): - browser opens a websocket connection to `wss://ep-abc-xyz-123.xx-central-1.aws.neon.tech/` - proxy accepts this connection and terminates TLS (https) - inside the encrypted tunnel (HTTPS), browser initiates a plain (non-encrypted) postgres connection - proxy performs auth as in a usual plain pg connection and forwards the connection to the compute Related issue: #3225	2023-01-06 18:34:18 +03:00
Heikki Linnakangas	df42213dbb	Fix missing COMMIT in handle_role_deletions. There was no COMMIT, so the DROP ROLE commands were always implicitly rolled back. Fixes issue #3279.	2023-01-06 17:07:46 +02:00
Kirill Bulatov	b6237474d2	Fix README and basic startup example (#3275) Follow-up of https://github.com/neondatabase/neon/pull/3270 which broke an example from the main README.md. Fixes that by adding a way to specify a default tenant, and modifies the basic neon_local test to start postgres and check branching. Not all neon_local commands are implemented, so not all README.md contents are tested yet.	2023-01-06 12:26:14 +02:00
Bojan Serafimov	a2642966f2	Add arbitrary key partitioning for benchmark	2023-01-05 20:21:19 -05:00
Heikki Linnakangas	8b710b9753	Fix segfault if pageserver connection is lost during backend startup. It's not OK to return early from within a PG_TRY-CATCH block. The PG_TRY macro sets the global PG_exception_stack variable, and PG_END_TRY restores it. If we jump out in between with "return NULL", the PG_exception_stack is left to point to garbage. (I'm surprised the comments in PG_TRY_CATCH don't warn about this.) Add test that re-attaches tenant in pageserver while Postgres is running. If the tenant is detached while compute is connected and busy running queries, those queries will fail if they try to fetch any pages. But when the tenant is re-attached, things should start working again, without disconnecting the client <-> postgres connections. Without this fix, this reproduced the segfault. Fixes issue #3231	2023-01-05 18:51:47 +02:00
Heikki Linnakangas	c187de1101	Copy error message before it's freed. pageserver_disconnect() call invalidates 'pageserver_conn', including the error message pointer we got from PQerrorMessage(pageserver_conn). Copy the message to a temporary variable before disconnecting, like we do in a few other places. In passing, clear the 'pageserver_conn_wes' variable in a few places where it was free'd. I didn't see any live bug from this, but since pageserver_disconnect() checks if it's NULL, let's not leave it dangling to already-free'd memory.	2023-01-05 18:51:47 +02:00
Kirill Bulatov	8712e1899e	Move initial timeline creation into pytest (#3270) For every Python test, we start the storage first, and expect that later, in the test, when we start a compute, it will work without specific timeline and tenant creation or their IDs specified. For that, we have a concept of "default" branch that was created on the control plane level first, but that's not needed at all, given that it's only Python tests that need it: let them create the initial timeline during set-up. Before, control plane started and stopped pageserver for timeline creation, now Python harness runs an extra tenant creation request on test env init. I had to adjust the metrics test; it turns out it registered the metrics from the default tenant after an extra pageserver restart. The new model does not send the metrics before the collection time happens, and that was 30s before.	2023-01-05 17:48:27 +02:00
Christian Schwarz	d7f1e30112	remote_timeline_client: more metrics & metrics-related cleanups. - Clean up redundant metric removal in TimelineMetrics::drop. RemoteTimelineClientMetrics is responsible for cleaning up REMOTE_OPERATION_TIME and REMOTE_UPLOAD_QUEUE_UNFINISHED_TASKS. - Rename `pageserver_remote_upload_queue_unfinished_tasks` to `pageserver_remote_timeline_client_calls_unfinished`. The new name reflects that the metric is with respect to the entire call to remote timeline client. This includes wait time in the upload queue and hence it's a longer span than what `pageserver_remote_OPERATION_seconds` measures. - Add the `pageserver_remote_timeline_client_calls_started` histogram. See the metric description for why we need it. - Add helper functions `call_begin` etc to `RemoteTimelineClientMetrics` to centralize the logic for updating the metrics above (they relate to each other, see comments in code). - Use these constructs to track ongoing downloads in `pageserver_remote_timeline_client_calls_unfinished`. refs https://github.com/neondatabase/neon/issues/2029 fixes https://github.com/neondatabase/neon/issues/3249 closes https://github.com/neondatabase/neon/pull/3250	2023-01-05 11:50:17 +01:00
Christian Schwarz	6a9d1030a6	use RemoteTimelineClient for downloading index part during tenant_attach. Before this change, we would not .measure_remote_op for index part downloads. And more generally, it's good to pass not just uploads but also downloads through RemoteTimelineClient, e.g., if we ever want to implement some timeline-scoped policies there. Found this while working on https://github.com/neondatabase/neon/pull/3250 where I add a metric to measure the degree of concurrent downloads. Layer download was missing in a test that I added there.	2023-01-05 11:08:50 +01:00
Bojan Serafimov	3d6bc126ed	Partially implement get_difficulty_map bench	2023-01-05 03:23:10 -05:00
Bojan Serafimov	fb6569c880	Implement get_difficulty_map	2023-01-05 02:50:48 -05:00
Bojan Serafimov	115549261c	Add get_difficulty_map method	2023-01-05 01:58:18 -05:00
Bojan Serafimov	01e09fc56c	Cleanup bench_layer_map	2023-01-05 01:35:36 -05:00
Heikki Linnakangas	8c6e607327	Refactor send_tarball() (#3259) The Basebackup struct is really just a convenient place to carry the various parameters around in send_tarball and its subroutines. Make it internal to the send_tarball function.	2023-01-04 23:03:16 +02:00
Vadim Kharitonov	f436fb2dfb	Fix panics at compute_ctl:monitor	2023-01-04 17:26:42 +01:00
Kirill Bulatov	8932d14d50	Revert "Run Python tests in 8 threads (#3206)" (#3264) This reverts commit `56a4466d0a`. Seems that flakiness increased after this commit, while the time decrease was a couple of seconds. With every regular Python test spawning 1 etcd, 3 safekeepers, 1 pageserver, a few CLI commands and post-run cleanup hooks, it might be hard to run many such tests in parallel. We could return to this later, after we consider alternative test structure and/or CI runner structure.	2023-01-04 17:31:51 +02:00

2671 Commits
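
Note on commit df42213dbb ("Fix missing COMMIT in handle_role_deletions"): the message says the DROP ROLE commands were implicitly rolled back because no COMMIT was issued. The sketch below illustrates that general failure mode only. It is not the Neon code: it uses Python's standard `sqlite3` module as a stand-in for Postgres, a hypothetical `roles` table, a `DELETE` in place of `DROP ROLE`, and hypothetical helper names (`delete_roles`, `count_roles`, `run_demo`).

```python
import os
import sqlite3
import tempfile


def delete_roles(db_path, commit):
    """Delete all rows from the hypothetical `roles` table, optionally committing."""
    conn = sqlite3.connect(db_path)
    try:
        # With default settings, sqlite3 implicitly opens a transaction before a DELETE.
        conn.execute("DELETE FROM roles")
        if commit:
            conn.commit()
    finally:
        # Closing with an open transaction discards the uncommitted DELETE.
        conn.close()


def count_roles(db_path):
    """Return the number of rows visible to a fresh connection."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM roles").fetchone()[0]
    finally:
        conn.close()


def run_demo():
    """Return the row counts seen after an uncommitted and a committed delete."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")

        # Set up two rows and commit them so they are durable.
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE roles (name TEXT)")
        conn.execute("INSERT INTO roles VALUES ('alice'), ('bob')")
        conn.commit()
        conn.close()

        delete_roles(db_path, commit=False)
        without_commit = count_roles(db_path)

        delete_roles(db_path, commit=True)
        with_commit = count_roles(db_path)

        return without_commit, with_commit


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the first delete has no lasting effect because the connection is closed without a commit; only the second, committed delete removes the rows. Transaction semantics in Postgres differ in detail, so treat this as an analogy rather than a reproduction of issue #3279.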