rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-29 19:10:38 +00:00

Author	SHA1	Message	Date
Arpad Müller	9d93dd4807	Rename hyper 1.0 to hyper and hyper 0.14 to hyper0 (#9254 ) Follow-up of #9234 to give hyper 1.0 the version-free name, and the legacy version of hyper the one with the version number inside. As we move away from hyper 0.14, we can remove the `hyper0` name piece by piece. Part of #9255	2024-10-03 16:33:43 +02:00
Anastasia Lubennikova	ce73db9316	Fix post_apply_config() (#9220 ) Bring back post_apply_config() step that was accidentally removed in `78938d1`	2024-10-01 16:28:58 +01:00
Folke Behrens	2e508b1ff9	Upgrade OpenTelemetry and other tracing crates (#9200 ) * tracing-utils now returns a `Layer` impl. Removes the need for crates to import OTel crates. * Drop the /v1/traces URI check. Verified that the code does the right thing. * Leave a TODO to hook in an error handler for OTel to log errors to when it assumes the regular pipeline cannot be used/is broken.	2024-10-01 11:02:54 +02:00
Arthur Petukhovsky	ba498a630a	Set disk quotas on bind in compute_ctl (#8936 ) Part of https://github.com/neondatabase/cloud/issues/13127. Resolves #9153 What changed in this PR: 1. Adds `ComputeSpec.disk_quota_bytes: Option<u64>` 2. Adds new arg to compute_ctl: `--set-disk-quota-for-fs <mountpoint>` 3. Implements running `/neonvm/bin/set-disk-quota` with the right value if both cmdline arg AND field in the spec are specified 4. Patches `/etc/sudoers.d` to allow `compute_ctl` to set quota with sudo This PR is very similar to the swap support added earlier, you can take a look at it as prior art: #7434 In theory, it can be implemented outside of compute_ctl when we will have a separate neonvm daemon, but we are not there yet. Current implementation is the simplest possible to unblock computes with larger disks. All code related to usage of `/neonvm/bin/set-disk-quota` is located in `disk_quota.rs`. We need to call this script with the following arguments: `/neonvm/bin/set-disk-quota {size_kb} {mountpoint}`. Quotas are set on the filesystem level, so we need to provide path to the directory that filesystem was mounted to. I tested this change locally with https://github.com/neondatabase/cloud/pull/17270. It should be safe to merge, because this feature is gated by both cmdline arg and field in the spec. If control-plane doesn't set values in both places, compute_ctl won't be affected by this change.	2024-09-27 20:52:22 +01:00
Yuchen Liang	42ef08db47	fix(pageserver): LSN lease edge cases around restarts/migrations (#9055 ) Part of #7497, closes #8817. ## Problem See #8817. ## Summary of changes compute_ctl - Renew lsn lease as soon as `/configure` updates pageserver_connstr, use `state_changed` Condvar for synchronization. pageserver As mentioned in https://github.com/neondatabase/neon/issues/8817#issuecomment-2315768076, we still want some permanent error reported if a lease cannot be granted. By considering attachment mode and the added `lsn_lease_deadline` when processing lease requests, we can also bound the case of bad requests to a very short period after migration/restart. - Refactor https://github.com/neondatabase/neon/pull/9024 and move `lsn_lease_deadline` to `AttachedTenantConf` so timeline can easily access it. - Have separate HTTP `init_lsn_lease` and libpq `renew_lsn_lease` API. - Always do LSN verification for the initial HTTP lease request. - LSN verification for the renewal is still done when tenants are not in `AttachedSingle` and we have pass the `lsn_lease_deadline`, which give plenty of time for compute to renew the lease. neon_local - add and call `timeline_init_lsn_lease` mgmt_api at static endpoint start. The initial lsn lease http request is sent when we run `cargo neon endpoint start <static endpoint>`. ## Testing - Extend `test_readonly_node_gc` to do pageserver restarts and migration. ## Future Work - The control plane should make the initial lease request through HTTP when creating a static endpoint. This is currently only done in `neon_local`. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-27 09:56:52 -04:00
Tristan Partin	fc962c9605	Use long options when calling initdb Verbosity in this case is good when reading the code. Short options are better when operating in an interactive shell. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-27 08:22:16 -05:00
Heikki Linnakangas	02cdd37b56	Dump backtrace if a core dump is called just "core" (#9125 ) I hope this lets us capture backtraces in CI. At least it makes it work on my laptop, which is valuable even if we need to do more for CI. See issue #2800.	2024-09-27 15:36:24 +03:00
Arthur Petukhovsky	80e974d05b	fix(compute_ctl): race condition in configurator (#9162 ) There was a tricky race condition in compute_ctl, that sometimes makes configurator skip updates. It makes a deadlock because: - control-plane cannot configure compute, because it's in ConfigurationPending state - compute_ctl doesn't do any reconfiguration because `configurator_main_loop` missed notification for it Full sequence that reproduces the issue: 1. `start_compute` finishes works and changes status `self.set_status(ComputeStatus::Running);` 2. configurator received update about `Running` state and dropped the mutex lock in the iteration 3. `/configure` request was triggered at the same time as step 1, and got the mutex lock 4. same `/configure` request set the spec and updated the state to `ConfigurationPending`, also sent a notification 5. next iteration in configurator got the mutex lock, but missed the notification There are more details in this slack thread: https://neondb.slack.com/archives/C03438W3FLZ/p1727281028478689?thread_ts=1727261220.483799&cid=C03438W3FLZ --------- Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2024-09-26 15:42:17 +01:00
Heikki Linnakangas	d211f00f05	Remove unnecessary dependencies (#9000 ) Found by "cargo machete"	2024-09-17 17:55:45 +03:00
Tristan Partin	5876c441ab	Grant access to pg_show_replication_origin_status for neon_superuser Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-16 16:38:55 +01:00
Matthias van de Meent	78938d1b59	[compute/postgres] feature: PostgreSQL 17 (#8573 ) This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04 `07b828e9d4` NOTICE: The data produced by the included version of the PostgreSQL fork may not be compatible with the future full release of PostgreSQL 17 due to expected or unexpected future changes in magic numbers and internals. DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0 RELEASE! Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-09-12 23:18:41 +01:00
Arpad Müller	cbcd4058ed	Fix 1.82 clippy lint too_long_first_doc_paragraph (#8941 ) Addresses the 1.82 beta clippy lint `too_long_first_doc_paragraph` by adding newlines to the first sentence if it is short enough, and making a short first sentence if there is the need.	2024-09-06 14:33:52 +02:00
Andrew Rudenko	acc075071d	feat(compute_ctl): add periodic `lease lsn` requests for static computes (#7994 ) Part of #7497 ## Problem Static computes pinned at some fix LSN could be created initially within PITR interval but eventually go out it. To make sure that Static computes are not affected by GC, we need to start using the LSN lease API (introduced in #8084) in compute_ctl. ## Summary of changes compute_ctl - Spawn a thread for when a static compute starts to periodically ping pageserver(s) to make LSN lease requests. - Add `test_readonly_node_gc` to test if static compute can read all pages without error. - (test will fail on main without the code change here) page_service - `wait_or_get_last_lsn` will now allow `request_lsn` less than `latest_gc_cutoff_lsn` to proceed if there is a lease on `request_lsn`. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2024-08-28 19:09:26 +00:00
Arpad Müller	8c828c586e	Wait for completion of the upload queue in flush_frozen_layer (#8550 ) Makes `flush_frozen_layer` add a barrier to the upload queue and makes it wait for that barrier to be reached until it lets the flushing be completed. This gives us backpressure and ensures that writes can't build up in an unbounded fashion. Fixes #7317	2024-08-02 13:07:12 +02:00
Tristan Partin	ba17025a57	Run each migration in its own transaction Previously, every migration was run in the same transaction. This is preparatory work for fixing CVE-2024-4317.	2024-07-16 12:12:29 -05:00
Tristan Partin	b5ab055526	Rename compute migrations to start at 1 This matches what we put into the neon_migration.migration_id table.	2024-07-16 12:12:29 -05:00
Conrad Ludgate	411a130675	Fix nightly warnings 2024 june (#8151 ) ## Problem new clippy warnings on nightly. ## Summary of changes broken up each commit by warning type. 1. Remove some unnecessary refs. 2. In edition 2024, inference will default to `!` and not `()`. 3. Clippy complains about doc comment indentation 4. Fix `Trait + ?Sized` where `Trait: Sized`. 5. diesel_derives triggering `non_local_defintions`	2024-07-12 13:58:04 +01:00
Sasha Krassovsky	82b9a44ab4	Grant execute on snapshot functions to neon_superuser (#8346 ) ## Problem I need `neon_superuser` to be allowed to create snapshots for replication tests ## Summary of changes Adds a migration that grants these functions to neon_superuser	2024-07-11 20:29:35 +00:00
Stas Kelvich	6bbd34a216	Enable core dumps for postgres (#8272 ) Set core rmilit to ulimited in compute_ctl, so that all child processes inherit it. We could also set rlimit in relevant startup script, but that way we would depend on external setup and might inadvertently disable it again (core dumping worked in pods, but not in VMs with inittab-based startup).	2024-07-11 10:20:14 +03:00
Tristan Partin	abc330e095	Add an application_name to more Neon connections Helps identify connections in the logs.	2024-07-09 12:42:09 -05:00
Tristan Partin	6d3cb222ee	Refactor how migrations are ran Just a small improvement I noticed while looking at fixing CVE-2024-4317 in Neon.	2024-07-09 12:42:09 -05:00
Alexey Kondratov	84b039e615	compute_ctl: Use 'fast' shutdown for Postgres termination (#8289 ) ## Problem We currently use 'immediate' mode in the most commonly used shutdown path, when the control plane calls a `compute_ctl` API to terminate Postgres inside compute without waiting for the actual pod / VM termination. Yet, 'immediate' shutdown doesn't create a shutdown checkpoint and ROs have bad times figuring out the list of running xacts during next start. ## Summary of changes Use 'fast' mode, which creates a shutdown checkpoint that is important for ROs to get a list of running xacts faster instead of going through the CLOG. On the control plane side, we poll this `compute_ctl` termination API for 10s, it should be enough as we don't really write any data at checkpoint time. If it times out, we anyway switch to the slow k8s-based termination. See https://www.postgresql.org/docs/current/server-shutdown.html for the list of modes and signals. The default VM shutdown hook already uses `fast` mode, see [1] [1] `c9fd8d7693/vm-image-spec.yaml (L30-L31)` Related to #6211	2024-07-08 19:54:02 +02:00
Japin Li	cdaed4d79c	Fix outdated comment (#8149 ) Commit `97b48c23f` changes the log wait timeout from 1 second to 100 milliseconds but forgets to update the comment.	2024-07-03 13:55:36 -04:00
Tristan Partin	5700233a47	Add application_name to compute activity monitor connection string This was missed in my previous attempt to mark every connection string with an application name. See `0c3e3a8667`.	2024-06-27 12:38:15 -07:00
Heikki Linnakangas	fdadd6a152	Remove primary_is_running (#8162 ) This was a half-finished mechanism to allow a replica to enter hot standby mode sooner, without waiting for a running-xacts record. It had issues, and we are working on a better mechanism to replace it. The control plane might still set the flag in the spec file, but compute_ctl will simply ignore it.	2024-06-26 15:13:03 +03:00
MMeent	e729f28205	Fix log rates (#8035 ) ## Summary of changes - Stop logging HealthCheck message passing at INFO level (moved to DEBUG) - Stop logging /status accesses at INFO (moved to DEBUG) - Stop logging most occurances of `missing config file "compute_ctl_temp_override.conf"` - Log memory usage only when the data has changed significantly, or if we've not recently logged the data, rather than always every 2 seconds.	2024-06-17 18:57:49 +00:00
Tristan Partin	0c3e3a8667	Set application_name for internal connections to computes This will help when analyzing the origins of connections to a compute like in [0]. [0]: https://github.com/neondatabase/cloud/issues/14247	2024-06-13 12:06:10 -07:00
Tristan Partin	26c68f91f3	Move SQL migrations out of line It makes them much easier to reason about, and allows other SQL tooling to operate on them like language servers, formatters, etc. I also brought back the removed migrations such that we can more easily understand what they were. I included a "-- SKIP" comment describing why those migrations are now skipped. We no longer skip migrations by checking if it is empty, but instead check to see if the migration starts with "-- SKIP".	2024-06-07 08:35:55 -07:00
Em Sharnoff	00d66e8012	compute_ctl: Fix handling of missing /neonvm/bin/resize-swap (#7832 ) The logic added in the original PR (#7434) only worked before sudo was used, because 'sudo foo' will only fail with NotFound if 'sudo' doesn't exist; if 'foo' doesn't exist, then sudo will fail with a normal error exit. This means that compute_ctl may fail to restart if it exits after successfully enabling swap.	2024-05-21 16:52:48 -07:00
Andrew Rudenko	923cf91aa4	compute_ctl: catalog API endpoints (#7575 ) ## Problem There are two cloud's features that require extra compute endpoints. 1. We are running pg_dump to get DB schemas. Currently, we are using a special service for this. But it would be great to execute pg_dump in an isolated environment. And we already have such an environment, it's our compute! And likely enough pg_dump already exists there too! (see https://github.com/neondatabase/cloud/issues/11644#issuecomment-2084617832) 2. We need to have a way to get databases and roles from compute after time travel (see https://github.com/neondatabase/cloud/issues/12109) ## Summary of changes It adds two API endpoints to compute_ctl HTTP API that target both of the aforementioned cases. --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-05-16 12:04:16 +02:00
Em Sharnoff	b827e7b330	compute_ctl: Fix unused variable on non-Linux (#7646 ) Introduced by refactorings from #7577. See an example check-macos-build failure here: https://github.com/neondatabase/neon/actions/runs/8992211409/job/24701531264	2024-05-07 22:35:23 +00:00
Em Sharnoff	26b1483204	compute_ctl: Lift drop(startup_context_guard) into main() (#7577 ) Part of applying the changes from #7600. This piece technically can change the semantics because now the context guard is held before process_cli, but... the difference is likely quite small. Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-07 13:58:46 -07:00
Em Sharnoff	d709bcba81	compute_ctl: Break up main() into discrete phases (#7577 ) This commit is intentionally designed to have as small a diff as possible. To that end, the basic idea is that each distinct "chunk" of the previous main() has been wrapped in its own function, with the return values from each function being passed directly into the next. The structure of main() is now visible from its contents, which have a handful of smaller functions. There's a lot of other work that can / should(?) be done beyond this, but I figure that's more opinionated, and this should be a solid start. Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-05-07 13:58:46 -07:00
Em Sharnoff	b158a5eda0	compute_ctl: Non-functional prep changes to reduce diff (#7577 ) A couple lines moved further down in main(), and one case of using Option<&str> instead of Option<&String>.	2024-05-07 13:58:46 -07:00
Heikki Linnakangas	4deb8dc52e	compute_ctl: Be more precise in how startup time is calculated (#7601 ) - On a non-pooled start, do not reset the 'start_time' after launching the HTTP service. In a non-pooled start, it's fair to include that in the total startup time. - When setting wait_for_spec_ms and resetting start_time, call Utc::now() only once. It's a waste of cycles to call it twice, but also, it ensures the time between setting wait_for_spec_ms and resetting start_time is included in one or the other time period. These differences should be insignificant in practice, in the microsecond range, but IMHO it seems more logical and readable this way too. Also fix and clarify some of the surrounding comments. (This caught my eye while reviewing PR #7577)	2024-05-04 08:44:18 +03:00
Em Sharnoff	64f0613edf	compute_ctl: Add support for swap resizing (#7434 ) Part of neondatabase/cloud#12047. Resolves #7239. In short, this PR: 1. Adds `ComputeSpec.swap_size_bytes: Option<u64>` 2. Adds a flag to compute_ctl: `--resize-swap-on-bind` 3. Implements running `/neonvm/bin/resize-swap` with the value from the compute spec before starting postgres, if both the value in the spec AND the flag are specified. 4. Adds `sudo` to the final image 5. Adds a file in `/etc/sudoers.d` to allow `compute_ctl` to resize swap Various bits of reasoning about design decisions in the added comments. In short: We have both a compute spec field and a flag to make rollout easier to implement. The flag will most likely be removed as part of cleanups for neondatabase/cloud#12047.	2024-05-03 12:57:45 -07:00
Arpad Müller	426598cf76	Update rust to 1.78.0 (#7598 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html Prior update was in #7198	2024-05-03 15:59:28 +02:00
Alex Chi Z	cb4b40f9c1	chore(compute_ctl): add error context to apply_spec (#7374 ) Make it faster to identify which part of apply spec goes wrong by adding an error context. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-04-17 09:11:04 +03:00
Em Sharnoff	f86845f64b	compute_ctl: Auto-set dynamic_shared_memory_type (#7348 ) Part of neondatabase/cloud#12047. The basic idea is that for our VMs, we want to enable swap and disable Linux memory overcommit. Alongside these, we should set postgres' dynamic_shared_memory_type to mmap, but we want to avoid setting it to mmap if swap is not enabled. Implementing this in the control plane would be fiddly, but it's relatively straightforward to add to compute_ctl.	2024-04-10 13:13:48 +00:00
Alex Chi Z	3ab9f56f5f	fixup(#7278/compute_ctl): remote extension download permission (#7280 ) Fix #7278 ## Summary of changes * Explicitly create the extension download directory and assign correct permissoins. * Fix the problem that the extension download failure will cause all future downloads to fail. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-29 17:59:30 +00:00
Alex Chi Z	90be79fcf5	spec: allow neon extension auto-upgrade + softfail upgrade (#7231 ) reverts https://github.com/neondatabase/neon/pull/7128, unblocks https://github.com/neondatabase/cloud/issues/10742 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-28 17:22:35 +00:00
Sasha Krassovsky	24c5a5ac16	Revert "Revoke REPLICATION" (#7261 ) Reverts neondatabase/neon#7052	2024-03-27 18:07:51 +00:00
Arpad Müller	3ee34a3f26	Update Rust to 1.77.0 (#7198 ) Release notes: https://blog.rust-lang.org/2024/03/21/Rust-1.77.0.html Thanks to #6886 the diff is reasonable, only for one new lint `clippy::suspicious_open_options`. I added `truncate()` calls to the places where it is obviously the right choice to me, and added allows everywhere else, leaving it for followups. I had to specify cargo install --locked because the build would fail otherwise. This was also recommended by upstream.	2024-03-22 06:52:31 +00:00
Tristan Partin	041b653a1a	Add state diagram for compute Models a compute's lifetime.	2024-03-20 17:10:46 -05:00
Alex Chi Z	76c44dc140	spec: disable neon extension auto upgrade (#7128 ) This pull request disables neon extension auto upgrade to help the next compute image upgrade smooth. ## Summary of changes We have two places to auto-upgrade neon extension: during compute spec update, and when the compute node starts. The compute spec update logic is always there, and the compute node start logic is added in https://github.com/neondatabase/neon/pull/7029. In this pull request, we disable both of them, so that we can still roll back to an older version of compute before figuring out the best way of extension upgrade-downgrade. https://github.com/neondatabase/neon/issues/6936 We will enable auto-upgrade in the next release following this release. There are no other extension upgrades from release 4917 and therefore after this pull request, it would be safe to revert to release 4917. Impact: * Project created after unpinning the compute image -> if we need to roll back, they will stuck, because the default neon extension version is 1.3. Need to manually pin the compute image version if such things happen. * Projects already stuck on staging due to not downgradeable -> I don't know their current status, maybe they are already running the latest compute image? * Other projects -> can be rolled back to release 4917. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-14 19:45:38 +00:00
Arseny Sher	0cf0731d8b	SIGQUIT instead of SIGKILL prewarmed postgres. To avoid orphaned processes using wiped datadir with confusing logging.	2024-03-11 22:36:52 +04:00
Sasha Krassovsky	4834d22d2d	Revoke REPLICATION (#7052 ) ## Problem Currently users can cause problems with replication ## Summary of changes Don't let them replicate	2024-03-08 22:24:30 +00:00
Sasha Krassovsky	2fc89428c3	Hopefully stabilize test_bad_connection.py (#6976 ) ## Problem It seems that even though we have a retry on basebackup, it still sometimes fails to fetch it with the failpoint enabled, resulting in a test error. ## Summary of changes If we fail to get the basebackup, disable the failpoint and try again.	2024-03-07 10:12:06 -08:00
Alex Chi Z	0b330e1310	upgrade neon extension on startup (#7029 ) ## Problem Fix https://github.com/neondatabase/neon/issues/7003. Fix https://github.com/neondatabase/neon/issues/6982. Currently, neon extension is only upgraded when new compute spec gets applied, for example, when creating a new role or creating a new database. This also resolves `neon.lfc_stat` not found warnings in prod. ## Summary of changes This pull request adds the logic to spawn a background thread to upgrade the neon extension version if the compute is a primary. If for whatever reason the upgrade fails, it reports an error to the console and does not impact compute node state. This change can be further applied to 3rd-party extension upgrades. We can silently upgrade the version of 3rd party extensions in the background in the future. Questions: * Does alter extension takes some kind of lock that will block user requests? * Does `ALTER EXTENSION` writes to the database if nothing needs to be upgraded? (may impact storage size). Otherwise it's safe to land this pull request. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-06 12:20:44 -05:00
Alex Chi Z	b7db912be6	compute_ctl: only try zenith_admin if could not authenticate (#6955 ) ## Problem Fix https://github.com/neondatabase/neon/issues/6498 ## Summary of changes Only re-authenticate with zenith_admin if authentication fails. Otherwise, directly return the error message. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-03-04 14:28:45 -05:00

1 2 3 4 5 ...

255 Commits