This pull request disables the neon extension auto-upgrade so that the next
compute image upgrade goes smoothly.
## Summary of changes
We auto-upgrade the neon extension in two places: during compute spec
updates, and when the compute node starts. The spec-update logic has
always been there; the start-time logic was added in
https://github.com/neondatabase/neon/pull/7029. In this pull request, we
disable both, so that we can still roll back to an older version of
compute while we figure out the best way to handle extension
upgrades and downgrades. https://github.com/neondatabase/neon/issues/6936
We will re-enable auto-upgrade in the release following this one.
There are no other extension upgrades since release 4917, so after this
pull request it is safe to revert to release 4917.
Impact:
* Projects created after unpinning the compute image -> if we need to
roll back, **they will get stuck**, because their default neon extension
version is 1.3. We will need to manually pin the compute image version if
that happens.
* Projects already stuck on staging because they cannot be downgraded ->
I don't know their current status; maybe they are already running the
latest compute image?
* All other projects -> can be rolled back to release 4917.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
It seems that even though we retry basebackup, fetching it still
sometimes fails while the failpoint is enabled, resulting in a test
error.
## Summary of changes
If we fail to get the basebackup, disable the failpoint and try again.
## Problem
Fix https://github.com/neondatabase/neon/issues/7003. Fix
https://github.com/neondatabase/neon/issues/6982. Currently, the neon
extension is only upgraded when a new compute spec gets applied, for
example when creating a new role or a new database. This also resolves
the `neon.lfc_stat` not found warnings in prod.
## Summary of changes
This pull request adds logic to spawn a background thread that upgrades
the neon extension version if the compute is a primary. If the upgrade
fails for whatever reason, it reports an error to the console and does
not affect the compute node's state.
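A minimal sketch of that background step, assuming the `postgres` crate and a hypothetical `connstr` parameter; the real error-reporting path is more involved:

```rust
use postgres::{Client, NoTls};

/// Hypothetical sketch: upgrade the neon extension in the background;
/// failures are only logged and never change compute node state.
fn spawn_neon_extension_upgrade(connstr: String) {
    std::thread::spawn(move || {
        let result = Client::connect(&connstr, NoTls)
            .and_then(|mut client| client.simple_query("ALTER EXTENSION neon UPDATE"));
        if let Err(e) = result {
            // Report the error; do not touch compute state.
            eprintln!("failed to upgrade neon extension: {e}");
        }
    });
}
```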
This change could later be extended to third-party extensions, silently
upgrading their versions in the background as well.
Questions:
* Does `ALTER EXTENSION` take some kind of lock that would block user
requests?
* Does `ALTER EXTENSION` write to the database if nothing needs to be
upgraded? (This may impact storage size.)
Otherwise, it is safe to land this pull request.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Fix https://github.com/neondatabase/neon/issues/6498
## Summary of changes
Only re-authenticate as zenith_admin if authentication fails.
Otherwise, return the error message directly.
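A hedged sketch of that control flow (illustrative names; `SqlState::INVALID_PASSWORD` is the `postgres` crate's constant for the 28P01 authentication error):

```rust
use postgres::{error::SqlState, Client, NoTls};

/// Illustrative sketch: fall back to zenith_admin only on an
/// authentication failure; any other error is returned directly.
fn connect_with_fallback(connstr: &str, admin_connstr: &str) -> Result<Client, postgres::Error> {
    match Client::connect(connstr, NoTls) {
        Err(e) if e.code() == Some(&SqlState::INVALID_PASSWORD) => {
            // Authentication failed: retry as zenith_admin.
            Client::connect(admin_connstr, NoTls)
        }
        other => other, // success, or a non-auth error passed through
    }
}
```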
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Nightly has added a bunch of compiler and linter warnings. There are also
two dependencies that fail to compile on the latest nightly because they
use the old `stdsimd` feature name. This PR fixes both.
We set it for a neon replica if the primary is running. Postgres uses
this GUC at startup to determine whether the replica should wait for
RUNNING_XACTS from the primary or not.
Corresponding cloud PR is
https://github.com/neondatabase/cloud/pull/10183
* Add a test for hot-standby replica startup.
* Extract oldest_running_xid from `XlRunningXacts` WAL records.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
Following up on https://github.com/neondatabase/neon/pull/6884,
hopefully a real final fix for
https://github.com/neondatabase/neon/issues/6236.
## Summary of changes
`handle_migrations` runs over the main `postgres` db connection.
Therefore, the privileges assigned there do not apply to databases
created later (i.e., `neondb`). This pull request moves the grants into
`handle_grants`, so that they run for each created DB. The SQL is added
inside the `BEGIN/END` block, so that applying all of the grants takes
only one RTT.
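For illustration, the batched block could look like this (the statements are assumptions, not the PR's exact SQL); the simple-query protocol executes the whole string in a single round trip:

```rust
use postgres::Client;

/// Illustrative sketch: batch the per-database grants into a single
/// simple-query call so they cost one RTT in total.
fn apply_grants(client: &mut Client) -> Result<(), postgres::Error> {
    client.simple_query(
        "BEGIN;
         GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO neon_superuser;
         GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO neon_superuser;
         COMMIT;",
    )?;
    Ok(())
}
```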
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Following up on https://github.com/neondatabase/neon/pull/6845: we did
not make the default privileges grantable before, so even users with
full privileges could not grant them to others.
This should be a final fix for
https://github.com/neondatabase/neon/issues/6236.
## Summary of changes
Add `WITH GRANT OPTION` to the migrations so that neon_superuser can
grant the permissions to other roles.
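The migration presumably takes a shape like this (assumed statements; `WITH GRANT OPTION` is the SQL clause that makes a privilege re-grantable):

```rust
use postgres::Client;

/// Assumed shape of the migration: the grant option lets neon_superuser
/// pass these default privileges on to other roles.
fn make_privileges_grantable(client: &mut Client) -> Result<(), postgres::Error> {
    client.simple_query(
        "ALTER DEFAULT PRIVILEGES IN SCHEMA public
             GRANT ALL ON TABLES TO neon_superuser WITH GRANT OPTION;
         ALTER DEFAULT PRIVILEGES IN SCHEMA public
             GRANT ALL ON SEQUENCES TO neon_superuser WITH GRANT OPTION;",
    )?;
    Ok(())
}
```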
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Fix https://github.com/neondatabase/neon/issues/6236 again.
## Summary of changes
This pull request adds a setup command to the compute spec that modifies
the default privileges of the public schema to grant full
table/sequence permissions to neon_superuser. If an extension escalates
to superuser during creation, the tables/sequences it creates in the
public schema will automatically be granted to neon_superuser.
Questions:
* Does this introduce any security flaws? The public schema should be
fine...
* Extensions that create tables in schemas other than public will still
need manual handling (e.g., pg_anon).
* We could modify some extensions to remove their superuser requirement
in the future.
* We could contribute to Postgres to allow creating extensions as a
specific user in the future.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
This is to speed up suspends; see
https://github.com/neondatabase/cloud/issues/10284
## Problem
Sharded tenants would sometimes try to write empty image layers during
compaction; this was more noticeable on larger databases.
- https://github.com/neondatabase/neon/issues/6755
**Note to reviewers: the last commit is a refactor that de-indents a
whole block; I recommend reviewing the earlier commits one by one to see
the real changes.**
## Summary of changes
- Fix a case where, when we drop a key during compaction, we might fail
to write out keys (this was broken when vectored get was added).
- If an image layer is empty, do not try to write it out, but leave
`start` where it is, so that if the subsequent key range meets the
criteria for writing an image layer, we extend its key range to cover
the empty area.
- Add a compaction test that configures small layers and compaction
thresholds, and asserts that image layer generation really happened.
This test fails without the fix.
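A self-contained sketch of the skip-empty-layer rule; every type and name here is a hypothetical stand-in for the pageserver's real ones:

```rust
use std::ops::Range;

/// Hypothetical stand-in for the pageserver's image layer type.
struct ImageLayer {
    key_range: Range<u64>,
}

/// Only write a layer when it actually has keys, and only then advance
/// `start`; an empty range is folded into the next candidate layer.
fn maybe_write_image_layer(keys: &[u64], start: &mut u64, end: u64, out: &mut Vec<ImageLayer>) {
    if keys.is_empty() {
        // Leave `start` untouched so the next layer's key range also
        // covers this empty area.
        return;
    }
    out.push(ImageLayer { key_range: *start..end });
    *start = end;
}
```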
Set up the anon extension if it is in `shared_preload_libraries`.
Users cannot install it themselves, because superuser is required.
Grant all privileges needed to use it to db_owner.
We use the neon fork of the extension, because a small change to the SQL
file is needed to allow db_owner to use it.
This feature is behind the AnonExtension feature flag,
so it is not enabled by default.
create_neon_superuser runs the first queries in the database after a
cold start. Traces suggest that those first queries can make up a
significant fraction of the cold start time. Make this more visible by
adding an explicit tracing span for it; currently you have to deduce it
by subtracting all the other child spans from the time spent in the
parent 'apply_config' span.
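A minimal sketch of the added span, assuming the `tracing` crate that compute_ctl already uses for its spans:

```rust
use tracing::info_span;

fn create_neon_superuser(client: &mut postgres::Client) -> Result<(), postgres::Error> {
    // Explicit span, so this step shows up in traces on its own instead
    // of having to be deduced from the parent 'apply_config' span.
    let _span = info_span!("create_neon_superuser").entered();
    client.simple_query("SELECT 1 /* placeholder for the role setup queries */")?;
    Ok(())
}
```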
## Problem
The idea is to keep the compute up and running if there are any active
logical replication subscriptions.
### Rationale
- The Write-Ahead Log (WAL) files, which contain the data changes, need
to be retained on the publisher side until the subscriber is able to
connect again and apply them. This could lead to increased disk usage on
the publisher, and we do not want to disrupt the source: I think it is
more painful for our customers to resolve storage issues on the source
than to pay for the compute at the target.
- Upon resuming the compute, the subscriber will start consuming and
applying the changes from the retained WAL files. The time taken to
catch up depends on the volume of changes and the configured vCPUs. By
staying up, we avoid having to explain complex situations where we lag
behind (in extreme cases we could lag behind by hours, days, or even
months).
- I think an important use case for logical replication from a source is
a one-time migration or a release upgrade. In this case the customer
would not mind the compute staying unsuspended for the duration of the
migration.
We need to document this in the release notes and in the documentation
on logical replication where Neon is the target (subscriber).
### See internal discussion here
https://neondb.slack.com/archives/C04DGM6SMTM/p1706793400746539?thread_ts=1706792628.701279&cid=C04DGM6SMTM
## Problem
Currently we have no retry mechanism for fetching the basebackup. If the
connection is unstable, starting the compute will simply fail.
## Summary of changes
Adds an exponential backoff with 7 retries to get the basebackup.
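A sketch of the retry loop; the base delay and multiplier are assumptions, only the retry count comes from the PR:

```rust
use std::{thread, time::Duration};

/// Illustrative exponential backoff: one initial attempt plus up to
/// 7 retries, doubling the delay between attempts.
fn fetch_with_retries<T, E>(mut fetch: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut delay = Duration::from_millis(100); // assumed base delay
    let mut result = fetch();
    for _ in 0..7 {
        if result.is_ok() {
            break;
        }
        thread::sleep(delay);
        delay *= 2;
        result = fetch();
    }
    result
}
```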
## Problem
We currently can't create subscriptions in PG14 and PG15 because only
superusers can; PG16 requires adding roles to the pg_create_subscription
role.
## Summary of changes
I added changes to PG14 and PG15 that allow neon_superuser to bypass the
superuser requirement. For PG16 I instead added a migration that adds
neon_superuser to pg_create_subscription. I also added a test to make
sure it works.
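The PG16 migration is presumably a one-liner along these lines (assumed statement; `pg_create_subscription` is the predefined role added in Postgres 16):

```rust
use postgres::Client;

/// Let neon_superuser create subscriptions without being a superuser.
fn migrate_pg16(client: &mut Client) -> Result<(), postgres::Error> {
    client.simple_query("GRANT pg_create_subscription TO neon_superuser")?;
    Ok(())
}
```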
- Start pgbouncer in the VM as the postgres user, to allow connecting to
the pgbouncer admin console.
- Remove the unused compute_ctl options --pgbouncer-connstr
and --pgbouncer-ini-path.
- Fix and clean up the pgbouncer connection code, adding retries
because pgbouncer may not be instantly ready when compute_ctl starts.
## Problem
See
https://github.com/orgs/neondatabase/projects/49/views/13?pane=issue&itemId=48282912
## Summary of changes
Do not suspend the compute if there are active autovacuum workers.
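One way to express the check (hypothetical sketch; autovacuum workers report `backend_type = 'autovacuum worker'` in `pg_stat_activity`):

```rust
use postgres::Client;

/// Illustrative check: is any autovacuum worker currently running?
fn has_active_autovacuum(client: &mut Client) -> Result<bool, postgres::Error> {
    let row = client.query_one(
        "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'autovacuum worker'",
        &[],
    )?;
    let workers: i64 = row.get(0);
    Ok(workers > 0)
}
```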
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
Currently, the activity monitor in `compute_ctl` uses a 500 ms polling
interval. It also looks at the list of current client backends,
searching for an active one or the one with the most recent state
change. This means we can miss short-lived connections.
That said, while testing this PR I realized this is usually not a
problem with pooled connections, as pgbouncer maintains connections to
Postgres even when client connections are short-lived. We can still miss
direct connections, though.
## Summary of changes
This commit introduces another way to detect user activity on the
compute. It polls the sum of `active_time` and the sum of `sessions`
across all non-system databases in `pg_stat_database` [1]. If the user
runs some queries or just opens a direct connection, the value rises; if
the user drops a DB, it can go down, but that is still a change and will
be detected as activity.
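The polled quantity is roughly this (sketch; the system-database filter shown is an assumption):

```rust
use postgres::Client;

/// Illustrative activity probe: any change in this value between two
/// polls is treated as user activity.
fn activity_counter(client: &mut Client) -> Result<f64, postgres::Error> {
    let row = client.query_one(
        "SELECT coalesce(sum(active_time), 0)::float8
              + coalesce(sum(sessions), 0)::float8
         FROM pg_stat_database
         WHERE datname NOT IN ('postgres', 'template0', 'template1')",
        &[],
    )?;
    Ok(row.get(0))
}
```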
The new statistics-based logic seems to work fine. Yet, after having it
run for a couple of hours, I've seen several odd cases with connections
via pgbouncer:
1. Sometimes, if you run just `psql pooler_connstr -c 'select 1;'`,
`active_time` is not updated immediately; it may take a couple of dozen
seconds. This doesn't seem critical, though.
2. With the same query via the pooler, `active_time` can be bumped a
bit, then pgbouncer keeps the connection to Postgres open for ~10
minutes, then disconnects, and `active_time` *could be* bumped a bit
again. 'Could be' because I've seen it once, but it didn't reproduce on
a second try.
I think this can create false positives (hopefully rare), where we do
not suspend some computes because of a lagging statistics update OR
because some non-user processes try to connect to user databases.
Currently, we don't touch user databases outside of startup, and
`postgres_exporter` is configured not to discover other databases, but
this can change in the future.
The new behavior is behind the feature flag
`activity_monitor_experimental`, which should be provided by the control
plane via neondatabase/cloud#9171
[1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW
Related to neondatabase/cloud#7966, neondatabase/cloud#7198
Postgres can write multiline logs, and they are difficult to handle once
they are mixed with other logs. This PR combines multiline logs from
postgres into a single line, replacing the line breaks with unicode
zero-width spaces. The postgres logs are then written to stderr with a
`PG:` prefix.
This makes it easy to distinguish postgres logs from all other compute
logs with a simple grep, e.g. `|= "PG:"`
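The transformation itself is tiny; a minimal sketch (U+200B is the zero-width space):

```rust
/// Collapse a multiline postgres log entry into a single stderr line.
fn emit_pg_log(entry: &str) {
    // Replace line breaks with zero-width spaces so the entry stays on
    // one line, then add the prefix used for grepping (|= "PG:").
    eprintln!("PG: {}", entry.trim_end().replace('\n', "\u{200B}"));
}
```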
Otherwise they are left orphaned when compute_ctl is terminated by a
signal. This was invisible most of the time, because normally neon_local
or k8s kills postgres directly and compute_ctl then finishes gracefully.
However, in some tests compute_ctl gets stuck waiting for
sync-safekeepers, which intentionally never finishes because the
safekeepers are offline, and we want to stop compute_ctl without leaving
orphans behind.
This is a rather rough approach that doesn't wait for the children to
terminate. A better way would be to convert compute_ctl to async, which
would make waiting easy.
- Add a pgbouncer_settings section to the compute spec.
- Add a pgbouncer-connstr option to compute_ctl.
- Add a pgbouncer-ini-path option to compute_ctl (default:
/etc/pgbouncer/pgbouncer.ini).
Apply the pgbouncer config on compute start, and on respec to override
the default spec. Save pgbouncer config updates to pgbouncer.ini to
preserve them across pgbouncer restarts.
There is still the default 'postgres' database, which may contain
objects owned by the role, or some ACLs. We need to reassign objects in
this database too.
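The per-database cleanup presumably boils down to statements like these (assumed SQL, not the PR's literal text):

```rust
use postgres::Client;

/// Illustrative cleanup before dropping a role: hand its objects in this
/// database to neon_superuser and drop the ACL entries it still holds.
fn release_role_objects(client: &mut Client, role: &str) -> Result<(), postgres::Error> {
    // NOTE: `role` must be a validated/quoted identifier in real code.
    client.simple_query(&format!(
        "REASSIGN OWNED BY {role} TO neon_superuser; DROP OWNED BY {role};"
    ))?;
    Ok(())
}
```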
## Problem
If a customer deletes all databases and then tries to delete a role that
has some non-standard ACLs, the `apply_config` operation gets stuck
because the role deletion keeps failing.