rust/neon - neon - Gitea: Git with a cup of tea


mirror of https://github.com/neondatabase/neon.git synced 2026-05-18 21:50:37 +00:00

Author	SHA1	Message	Date
Alek Westover	b4abbfe6fb	fix clippy; use `OnceLock` instead of `Mutex` because it is more appropriate	2023-06-28 16:49:04 -04:00
Alek Westover	a6c9a4abe7	cache extensions	2023-06-28 14:51:55 -04:00
Alek Westover	ce0da2889e	cache available libraries: increases efficiency, and reduces code duplication	2023-06-28 11:48:55 -04:00
Anastasia Lubennikova	7357b7cad5	Code cleanup	2023-06-27 15:29:33 +03:00
Alek Westover	a2e154f07b	real s3 and tenant specific files	2023-06-26 15:25:20 -04:00
Anastasia Lubennikova	1104de0b9b	refactoring - enable CREATE EXTENSION and LOAD test - change test_file_download to use mock_s3 - some code cleanup - add caching of extensions_list - WIP downloading of shared_preload_libraries (not tested yet)	2023-06-26 18:54:38 +03:00
Alek Westover	a8f848b5de	test reals3	2023-06-23 15:51:49 -04:00
Alek Westover	ca59330df8	modify file names	2023-06-23 15:28:30 -04:00
Alek Westover	36bb5ad527	refactor	2023-06-23 14:47:48 -04:00
Alek Westover	8bc128e474	real s3 tests	2023-06-23 13:41:08 -04:00
Alek Westover	c3994541eb	add real s3 tests	2023-06-23 13:26:28 -04:00
Anastasia Lubennikova	7cdcc8a500	Fix downloading of sql files for extension and libraries. Rust code refactoring and C code fixes. Add test for CREATE EXTENSION and LOAD 'library'	2023-06-23 20:25:14 +03:00
Alek Westover	31aa0283b0	More Extension Features (#4555 ) Added tenant specific extensions and more tests	2023-06-23 09:30:49 -04:00
Alek Westover	9c35c06c58	small refactor	2023-06-22 14:24:59 -04:00
Alek Westover	44ac7a45be	Merge branch 'main' into extension_server	2023-06-22 10:30:19 -04:00
Alek Westover	a79b0d69c4	made remote_ext_config an optional parameter	2023-06-22 10:21:07 -04:00
Anastasia Lubennikova	2f618f46be	Use BUILD_TAG in compute_ctl binary. (#4541 ) Pass BUILD_TAG to compute_ctl binary. We need it to access versioned extension storage.	2023-06-22 17:06:16 +03:00
Alek Westover	bf3b83b504	fix code style for clippy	2023-06-22 09:37:07 -04:00
Alek Westover	f984f9e7d3	seems close to working	2023-06-21 15:25:06 -04:00
Alek Westover	605c30e5c5	fixed an issue where pgconfig was pointing at global installation of postgres rather than the correct local version	2023-06-21 14:34:24 -04:00
Alek Westover	0b11d8e836	replaced download_files function with more appropriate download_extensions function	2023-06-21 14:01:44 -04:00
Alek Westover	02a1d4d8c1	refactoring a bit	2023-06-21 11:32:04 -04:00
Alek Westover	559e318328	remove dead imports	2023-06-21 11:01:45 -04:00
Alek Westover	bb414e5a0a	removing debugging	2023-06-21 10:45:37 -04:00
Alek Westover	c99e203094	I think it's working	2023-06-21 10:10:02 -04:00
Alek Westover	356f7d3a7e	more debugging	2023-06-20 22:39:02 -04:00
Alek Westover	890061d371	arg passing is mostly working	2023-06-20 19:35:13 -04:00
Alek Westover	6b74d1a76a	partils	2023-06-19 15:25:53 -04:00
Alek Westover	a936b8a92b	add ext cli args	2023-06-16 17:05:39 -04:00
Alek Westover	c7bea52849	adding command line argument	2023-06-16 16:58:13 -04:00
Anastasia Lubennikova	34f22e9b12	Request extension files from compute_ctl	2023-06-13 16:52:37 +02:00
Heikki Linnakangas	df3bae2ce3	Use `compute_ctl` to manage Postgres in tests. (#3886 ) This adds test coverage for 'compute_ctl', as it is now used by all the python tests. There are a few differences in how 'compute_ctl' is called in the tests, compared to the real web console: - In the tests, the postgresql.conf file is included as one large string in the spec file, and it is written out as it is to the data directory. I added a new field for that to the spec file. The real web console, however, sets all the necessary settings in the 'settings' field, and 'compute_ctl' creates the postgresql.conf from those settings. - In the tests, the information needed to connect to the storage, i.e. tenant_id, timeline_id, connection strings to pageserver and safekeepers, are now passed as new fields in the spec file. The real web console includes them as the GUCs in the 'settings' field. (Both of these are different from what the test control plane used to do: It used to write the GUCs directly in the postgresql.conf file). The plan is to change the control plane to use the new method, and remove the old method, but for now, support both. Some tests that were sensitive to the amount of WAL generated needed small changes, to accommodate that compute_ctl runs the background health monitor which makes a few small updates. Also some tests shut down the pageserver, and now that the background health check can run some queries while the pageserver is down, that can produce a few extra errors in the logs, which needed to be allowlisted. Other changes: - remove obsolete comments about PostgresNode; - create standby.signal file for Static compute node; - log output of `compute_ctl` and `postgres` is merged into `endpoints/compute.log`. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-06-06 14:59:36 +01:00
Heikki Linnakangas	66b06e416a	Pass tracing context in env variables instead of the spec file. (#4174 ) If compute_ctl is launched without a spec file, it fetches it from the control plane with an HTTP request. We cannot get the startup tracing context from the compute spec in that case, because we don't have it available on start. We could still read the tracing context from the compute spec after we have fetched it, but that would leave the fetch itself out of the context. Pass the tracing context in environment variables instead.	2023-05-09 17:08:02 +03:00
Alexey Kondratov	7ba5c286b7	[compute_ctl] Improve 'empty' compute startup sequence (#4034 ) Do several attempts to get spec from the control-plane and retry network errors and all reasonable HTTP response codes. Do not hang waiting for spec without confirmation from the control-plane that compute is known and is in the `Empty` state. Adjust the way we track `total_startup_ms` metric, it should be calculated since the moment we received spec, not from the moment `compute_ctl` started. Also introduce a new `wait_for_spec_ms` metric to track the time spent sleeping and waiting for spec to be delivered from control-plane. Part of neondatabase/cloud#3533	2023-04-21 11:10:48 +02:00
Alexey Kondratov	db8dd6f380	[compute_ctl] Implement live reconfiguration (#3980 ) With this commit one can request compute reconfiguration from the running `compute_ctl` with compute in `Running` state by sending a new spec: `curl -d "{\"spec\": $(cat ./compute-spec-new.json)}" http://localhost:3080/configure`. Internally, we start a separate configurator thread that is waiting on `Condvar` for `ConfigurationPending` compute state in a loop. Then it does reconfiguration, sets compute back to `Running` state and notifies other waiters. It will need some follow-ups, e.g. for retry logic for control-plane requests, but should be useful for testing in the current state. This shouldn't affect any existing environment, since computes are configured in a different way there. Resolves neondatabase/cloud#4433	2023-04-13 18:07:29 +02:00
Heikki Linnakangas	6064a26963	Refactor 'spec' in ComputeState. Sometimes, it contained real values, sometimes just defaults if the spec was not received yet. Make the state more clear by making it an Option instead. One consequence is that if some of the required settings like neon.tenant_id are missing from the spec file sent to the /configure endpoint, it is spotted earlier and you get an immediate HTTP error response. Not that it matters very much, but it's nicer nevertheless.	2023-04-12 01:55:40 +03:00
Alexey Kondratov	40a68e9077	[compute_ctl] Add timeout for `tracing_utils::shutdown_tracing()` (#3982 ) Shutting down OTEL tracing provider may hang for quite some time, see, for example: - https://github.com/open-telemetry/opentelemetry-rust/issues/868 - and our problems with staging https://github.com/neondatabase/cloud/issues/3707#issuecomment-1493983636 Yet, we want computes to shut down fast enough, as we may need a new one for the same timeline ASAP. So wait no longer than 2s for the shutdown to complete, then just error out and exit the main thread. Related to neondatabase/cloud#3707	2023-04-11 15:05:35 +02:00
Heikki Linnakangas	f0b2e076d9	Move compute_ctl structs used in HTTP API and spec file to separate crate. This is in preparation of using compute_ctl to launch postgres nodes in the neon_local control plane. And seems like a good idea to separate the public interfaces anyway. One non-mechanical change here is that the 'metrics' field is moved under the Mutex, instead of using atomics. We were not using atomics for performance but for convenience here, and it seems more clear to not use atomics in the model for the HTTP response type.	2023-04-09 21:52:28 +03:00
Alexey Kondratov	e42982fb1e	[compute_ctl] Empty computes and /configure API (#3963 ) This commit adds an option to start compute without spec and then pass it a valid spec via `POST /configure` API endpoint. This is a main prerequisite for maintaining the pool of compute nodes in the control-plane. For example: 1. Start compute with `cargo run --bin compute_ctl -- -i no-compute -p http://localhost:9095 -D compute_pgdata -C "postgresql://cloud_admin@127.0.0.1:5434/postgres" -b ./pg_install/v15/bin/postgres`. 2. Configure it with `curl -d "{\"spec\": $(cat ./compute-spec.json)}" http://localhost:3080/configure`. Internally, it's implemented using a `Condvar` + `Mutex`. Compute spec is moved under Mutex, as it can now be updated in the http handler. Also `RwLock` was replaced with `Mutex` because the latter works well with `Condvar`. First part of the neondatabase/cloud#4433	2023-04-06 21:21:58 +02:00
Lassi Pölönen	41d364a8f1	Add more detailed logging to compute_ctl's shutdown (#3915 ) Currently we can't see from the logs whether shutting down tracing takes a long time. We do see that shutting down computes gets delayed for some reason and hits the grace period limit. Move the shutdown message slightly later, to the point where nothing is left to do except exit.	2023-03-30 22:02:39 +03:00
Heikki Linnakangas	6fdd9c10d1	Read storage auth token from spec file. We read the pageserver connection string from the spec file, so let's read the auth token from the same place. We've been talking about pre-launching compute nodes that are not associated with any particular tenant at startup, so that the spec file is delivered to the compute node later. We cannot change the env variables after the process has been launched. We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN env variable, but compute_ctl is now responsible for setting it.	2023-03-21 20:12:09 +02:00
Sam Kleinman	c79dd8d458	compute_ctl: support for fetching spec from control plane (#3610 )	2023-02-23 13:19:39 -05:00
sharnoff	2153d2e00a	Run compute_ctl in a cgroup in VMs (#3577 )	2023-02-17 14:14:41 -08:00
Heikki Linnakangas	3e94fd5af3	Inherit OpenTelemetry context for compute startup from cloud console. This allows fine-grained distributed tracing of the 'start_compute' operation from the cloud console. The startup actions performed by 'compute_ctl' are now performed in a child of the 'start_compute' context, so you can trace through the whole compute start operation. This needs a corresponding change in the cloud console to fill in the 'startup_tracing_context' field in the json spec. If it's missing, the startup operations are simply traced as a separate trace, without a parent.	2023-01-26 15:20:03 +02:00
Heikki Linnakangas	006ee5f94a	Configure 'compute_ctl' to use OpenTelemetry exporter. This allows tracing the startup actions e.g. with Jaeger (https://www.jaegertracing.io/). We use the "tracing-opentelemetry" crate, which turns tracing spans into OpenTelemetry spans, so you can use the usual "#[instrument]" directives to add tracing. I put the tracing initialization code to a separate crate, `tracing-utils`, so that we can reuse it in other programs. We probably want to set up tracing in the same way in all our programs. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-01-26 15:20:03 +02:00
Heikki Linnakangas	e5cc2f92c4	Switch to 'tracing' for logging, restructure code to make use of spans. Refactors Compute::prepare_and_run. It's split into subroutines differently, to make it easier to attach tracing spans to the different stages. The high-level logic for waiting for Postgres to exit is moved to the caller. Replace 'env_logger' with 'tracing', and add `#[instrument]` directives to different stages of the startup process. This is a fairly mechanical change, except for the changes in 'spec.rs'. 'spec.rs' contained some complicated formatting, where parts of log messages were printed directly to stdout with `print`s. That was a bit messed up because the log normally goes to stderr, but those lines were printed to stdout. In our docker images, stderr and stdout both go to the same place so you wouldn't notice, but I don't think it was intentional. This changes the log format to the default 'tracing_subscriber::format' format. It's different from the Postgres log format, however, and because both compute_tools and Postgres print to the same log, it's now a mix of two different formats. I'm not sure how the Grafana log parsing pipeline can handle that. If it's a problem, we can build a custom formatter to change the compute_tools log format to be the same as Postgres's, like it was before this commit, or we can change the Postgres log format to match tracing_formatter's, or we can start printing compute_tool's log output to a different destination than Postgres.	2023-01-18 19:42:47 +02:00
sharnoff	5c6a7a17cb	Add VM informant to vm-compute-node (#3324 ) The general idea is that the VM informant binary is added to the vm-compute-node images only. `compute_tools` then will run whatever's at `/bin/vm-informant`, if the path exists.	2023-01-16 07:05:29 -08:00
Vadim Kharitonov	9b71215906	Simplify some functions in compute_tools and fix typo errors in func name	2022-12-22 15:05:43 +01:00
Kirill Bulatov	c4ee62d427	Bump clap and other minor dependencies (#2623 )	2022-10-17 12:58:40 +03:00
Heikki Linnakangas	d865892a06	Print full error with stacktrace, if compute node startup fails. It failed in staging environment a few times, and all we got in the logs was: ERROR could not start the compute node: failed to get basebackup@0/2D6194F8 from pageserver host=zenith-us-stage-ps-2.local port=6400 giving control plane 30s to collect the error before shutdown That's missing all the detail on why it failed.	2022-07-29 16:41:55 +03:00
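The `Condvar` + `Mutex` handshake described in commits e42982fb1e and db8dd6f380 (start an empty compute, then deliver its spec through `POST /configure`) can be illustrated with the standard library alone. This is a minimal sketch, not the compute_ctl code: `ComputeNode`, `ComputeState`, `configure`, `wait_for_spec`, and the use of a `String` as the spec are hypothetical simplifications; only the `Empty`, `ConfigurationPending`, and `Running` state names come from the commit messages.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical, simplified compute states; the names follow the commit messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComputeStatus {
    Empty,
    ConfigurationPending,
    Running,
}

// Hypothetical state; a String stands in for the real structured spec.
struct ComputeState {
    status: ComputeStatus,
    spec: Option<String>,
}

// The spec lives under a Mutex so an HTTP handler can update it;
// the Condvar wakes threads that are waiting for a state change.
struct ComputeNode {
    state: Mutex<ComputeState>,
    state_changed: Condvar,
}

impl ComputeNode {
    fn new() -> Self {
        ComputeNode {
            state: Mutex::new(ComputeState {
                status: ComputeStatus::Empty,
                spec: None,
            }),
            state_changed: Condvar::new(),
        }
    }

    fn status(&self) -> ComputeStatus {
        self.state.lock().unwrap().status
    }

    /// Roughly what a `POST /configure` handler would do: store the spec and wake waiters.
    fn configure(&self, spec: String) {
        let mut state = self.state.lock().unwrap();
        state.spec = Some(spec);
        state.status = ComputeStatus::ConfigurationPending;
        self.state_changed.notify_all();
    }

    /// Block while the compute is still `Empty`, then mark it `Running` and return the spec.
    fn wait_for_spec(&self) -> String {
        let guard = self.state.lock().unwrap();
        let mut state = self
            .state_changed
            .wait_while(guard, |s| s.status == ComputeStatus::Empty)
            .unwrap();
        // A real compute would start Postgres here; the sketch only flips the status.
        state.status = ComputeStatus::Running;
        self.state_changed.notify_all();
        state.spec.clone().expect("configure() sets the spec before leaving Empty")
    }
}

fn main() {
    let compute = Arc::new(ComputeNode::new());
    let waiter = {
        let compute = Arc::clone(&compute);
        thread::spawn(move || compute.wait_for_spec())
    };
    // Stands in for the HTTP handler receiving a spec.
    compute.configure("{\"spec\": \"...\"}".to_string());
    let spec = waiter.join().unwrap();
    println!("received spec {spec}, status {:?}", compute.status());
}
```

Using `wait_while` instead of a bare `wait` protects against spurious wakeups, and because `configure` changes the status under the same lock before notifying, the waiter still proceeds correctly if the spec arrives before it starts waiting.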


62 Commits
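Commits ce0da2889e and b4abbfe6fb mention caching the list of available libraries and switching from a `Mutex` to `OnceLock`. The sketch below shows that general write-once cache pattern under this reading of the messages; `list_remote_libraries`, `available_libraries`, `LIST_CALLS`, and the library names and paths are made up for illustration, and the actual compute_ctl code may be structured differently. `std::sync::OnceLock` requires Rust 1.70 or later.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how often the (hypothetical) expensive listing actually runs.
static LIST_CALLS: AtomicUsize = AtomicUsize::new(0);

// Process-wide cache: written at most once, then read without taking a lock.
static AVAILABLE_LIBRARIES: OnceLock<HashMap<String, String>> = OnceLock::new();

/// Hypothetical stand-in for listing library files in remote extension storage.
fn list_remote_libraries() -> HashMap<String, String> {
    LIST_CALLS.fetch_add(1, Ordering::SeqCst);
    HashMap::from([
        ("library_a".to_string(), "lib/library_a.so".to_string()),
        ("library_b".to_string(), "lib/library_b.so".to_string()),
    ])
}

/// Returns the cached map, computing it on first use only.
fn available_libraries() -> &'static HashMap<String, String> {
    AVAILABLE_LIBRARIES.get_or_init(list_remote_libraries)
}

fn main() {
    let first = available_libraries();
    let second = available_libraries();
    // Both calls see the same map, and the listing ran exactly once.
    assert!(std::ptr::eq(first, second));
    assert_eq!(LIST_CALLS.load(Ordering::SeqCst), 1);
    println!("library_a -> {}", first["library_a"]);
}
```

Compared with a `Mutex<Option<...>>`, a `OnceLock` suits write-once, read-many data: readers get a plain `&'static` reference with no guard to hold, and if several threads race on the first call, only one initializer runs while the others wait for its result.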