Commit Graph

9 Commits

Author SHA1 Message Date
Anastasia Lubennikova
e3512340c1 Override neon.max_cluster_size for the time of compute_ctl (#5998)
Temporarily reset neon.max_cluster_size to avoid
the possibility of hitting the limit, while we are applying config:
creating new extensions, roles, etc...
2023-12-03 15:21:44 +00:00
Chengpeng Yan
dfe2e5159a remove the duplicate entries in postgresql.conf (#5090) 2023-09-06 13:57:03 -04:00
Alek Westover
d005c77ea3 Tar Remote Extensions (#4715)
Add infrastructure to dynamically load postgres extensions and shared libraries from remote extension storage.

Before postgres start downloads list of available remote extensions and libraries, and also downloads 'shared_preload_libraries'. After postgres is running, 'compute_ctl' listens for HTTP requests to load files.

Postgres has new GUC 'extension_server_port' to specify port on which 'compute_ctl' listens for requests.

When PostgreSQL requests a file, 'compute_ctl' downloads it.

See more details about feature design and remote extension storage layout in docs/rfcs/024-extension-loading.md

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Alek Westover <alek.westover@gmail.com>
2023-08-02 12:38:12 +03:00
Alexey Kondratov
ed938885ff [compute_ctl] Fix deletion of template databases (#4661)
If database was created with `is_template true` Postgres doesn't allow
dropping it right away and throws error
```
ERROR:  cannot drop a template database
```
so we have to unset `is_template` first.

Fixing it, I noticed that our `escape_literal` isn't exactly correct
and following the same logic as in `quote_literal_internal`, we need to
prepend string with `E`. Otherwise, it's not possible to filter
`pg_database` using `escape_literal()` result if name contains `\`, for
example.

Also use `FORCE` to drop database even if there are active connections.
We run this from `cloud_admin`, so it should have enough privileges.

NB: there could be other db states, which prevent us from dropping
the database. For example, if db is used by any active subscription
or logical replication slot.
TODO: deal with it once we allow logical replication. Proper fix should
involve returning an error code to the control plane, so it could
figure out that this is a non-retryable error, return it to the user and
mark operation as permanently failed.

Related to neondatabase/cloud#4258
2023-07-13 13:18:35 +02:00
Heikki Linnakangas
df3bae2ce3 Use compute_ctl to manage Postgres in tests. (#3886)
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.
    
There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:
    
- In the tests, the postgresql.conf file is included as one large
  string in the spec file, and it is written out as it is to the data
  directory.  I added a new field for that to the spec file. The real
  web console, however, sets all the necessary settings in the
  'settings' field, and 'compute_ctl' creates the postgresql.conf from
  those settings.

- In the tests, the information needed to connect to the storage, i.e.
  tenant_id, timeline_id, connection strings to pageserver and
  safekeepers, are now passed as new fields in the spec file. The real
  web console includes them as the GUCs in the 'settings' field. (Both
  of these are different from what the test control plane used to do:
  It used to write the GUCs directly in the postgresql.conf file). The
  plan is to change the control plane to use the new method, and
  remove the old method, but for now, support both.

Some tests that were sensitive to the amount of WAL generated needed
small changes, to accommodate that compute_ctl runs the background
health monitor which makes a few small updates. Also some tests shut
down the pageserver, and now that the background health check can run
some queries while the pageserver is down, that can produce a few
extra errors in the logs, which needed to be allowlisted.

Other changes:
- remove obsolete comments about PostgresNode;
- create standby.signal file for Static compute node;
- log output of `compute_ctl` and `postgres` is merged into
`endpoints/compute.log`.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-06-06 14:59:36 +01:00
Heikki Linnakangas
b627fa71e4 Make read-only replicas explicit in compute spec (#4136)
This builds on top of PR #4058, and supersedes #4018
2023-05-04 17:41:42 +03:00
Heikki Linnakangas
f0b2e076d9 Move compute_ctl structs used in HTTP API and spec file to separate crate.
This is in preparation of using compute_ctl to launch postgres nodes
in the neon_local control plane. And seems like a good idea to
separate the public interfaces anyway.

One non-mechanical change here is that the 'metrics' field is moved
under the Mutex, instead of using atomics. We were not using atomics
for performance but for convenience here, and it seems more clear to
not use atomics in the model for the HTTP response type.
2023-04-09 21:52:28 +03:00
Alexey Kondratov
772c2fb4ff Report startup metrics and failure reason from compute_ctl (#1581)
+ neondatabase/cloud#1103

This adds a couple of control endpoints to simplify compute state
discovery for control-plane. For example, now we may figure out
that Postgres wasn't able to start or basebackup failed within
seconds instead of just blindly polling the compute readiness
for a minute or two.

Also we now expose startup metrics (time of the each step: basebackup,
sync safekeepers, config, total). Console grabs them after each
successful start and report as histogram to prometheus and grafana.

OpenAPI spec is added and up-tp date, but is not currently used in the
console yet.
2022-05-18 13:03:29 +04:00
Alexey Kondratov
f64074c609 Move compute_tools from console repo (zenithdb/console#383)
Currently it's included with minimal changes and lives aside of the main
workspace. Later we may re-use and combine common parts with zenith
control_plane.

This change is mostly needed to unify cloud deployment pipeline:
1.1. build compute-tools image
1.2. build compute-node image based on the freshly built compute-tools
2. build zenith image

So we can roll new compute image and new storage required by it to
operate properly. Also it becomes easier to test console against some
specific version of compute-node/-tools.
2021-12-28 20:17:29 +03:00