Commit Graph

22 Commits

Author SHA1 Message Date
Tristan Partin
cf7a596a15 Generate sql_exporter config files with Jsonnet
There are quite a few benefits to this approach:

- Reduce config duplication
  - The two sql_exporter configs were super similar with just a few
    differences
- Pull SQL queries into standalone files
  - That means we could run a SQL formatter on the file in the future
  - It also means access to syntax highlighting
- In the future, run different queries for different PG versions
  - This is relevant because right now, we have queries that are failing
    on PG 17 due to catalog updates

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 11:18:38 -05:00
Alexander Bayandin
0fc4ada3ca Switch CI, Storage and Proxy to Debian 12 (Bookworm) (#9170)
## Problem

This PR switches CI and Storage to Debain 12 (Bookworm) based images.

## Summary of changes
- Add Debian codename (`bookworm`/`bullseye`) to most of docker tags,
create un-codenamed images to be used by default
- `vm-compute-node-image`: create a separate spec for `bookworm` (we
don't need to build cgroups in the future)
- `neon-image`: Switch to `bookworm`-based `build-tools` image
  - Storage components and Proxy use it
- CI: run lints and tests on `bookworm`-based `build-tools` image
2024-10-14 21:12:43 +01:00
Heikki Linnakangas
357fa070a3 Add gdb to build-tools (#9125)
So that compute_ctl can use it to print backtrace on core dumps

See issue #2800.
2024-09-27 15:36:24 +03:00
Arpad Müller
fa3fc73c1b Address 1.82 clippy lints (#8944)
Addresses the clippy lints of the beta 1.82 toolchain.

The `too_long_first_doc_paragraph` lint complained a lot and was
addressed separately: #8941
2024-09-06 21:05:18 +02:00
Arpad Müller
a1323231bc Update Rust to 1.81.0 (#8939)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1810-2024-09-05).

Prior update was in #8667 and #8518
2024-09-06 12:40:19 +02:00
Alexander Bayandin
4a53cd0fc3 Dockerfiles: remove cachepot (#8666)
## Problem
We install and try to use `cachepot`. But it is not configured correctly
and doesn't work (after https://github.com/neondatabase/neon/pull/2290)

## Summary of changes
- Remove `cachepot`
2024-08-09 15:48:16 +01:00
Alexander Bayandin
8acce00953 Dockerfiles: fix LegacyKeyValueFormat & JSONArgsRecommended (#8664)
## Problem
CI complains in all PRs:
```
"ENV key=value" should be used instead of legacy "ENV key value" format 
```
https://docs.docker.com/reference/build-checks/legacy-key-value-format/

See 
- https://github.com/neondatabase/neon/pull/8644/files ("Unchanged files
with check annotations" section)
- https://github.com/neondatabase/neon/actions/runs/10304090562?pr=8644
("Annotations" section)


## Summary of changes
- Use `ENV key=value` instead of `ENV key value` in all Dockerfiles
2024-08-09 07:54:54 +01:00
Alexander Bayandin
d28a6f2576 CI(build-tools): update Rust, Python, Mold (#8667)
## Problem
- Rust 1.80.1 has been released:
https://blog.rust-lang.org/2024/08/08/Rust-1.80.1.html
- Python 3.9.19 has been released:
https://www.python.org/downloads/release/python-3919/
- Mold 2.33.0 has been released:
https://github.com/rui314/mold/releases/tag/v2.33.0
- Unpinned `cargo-deny` in `build-tools` got updated to the latest
version and doesn't work anymore with the current config file

## Summary of changes
- Bump Rust to 1.80.1
- Bump Python to 3.9.19
- Bump Mold to 2.33.0 
- Pin `cargo-deny`, `cargo-hack`, `cargo-hakari`, `cargo-nextest`,
`rustfilt` versions
- Update `deny.toml` to the latest format, see
https://github.com/EmbarkStudios/cargo-deny/pull/611
2024-08-09 06:17:16 +00:00
Arpad Müller
bb2a3f9b02 Update Rust to 1.80.0 (#8518)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust ecosystem as well.

[Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-180-2024-07-25).

Prior update was in #8048
2024-07-26 11:17:33 +02:00
Alexander Bayandin
bf9fc77061 CI(pg-clients): unify workflow with build-and-test (#8160)
## Problem

`pg-clients` workflow looks different from the main `build-and-test`
workflow for historical reasons (it was my very first task at Neon, and 
back then I wasn't really familiar with the rest of the CI pipelines).
This PR unifies `pg-clients` workflow with `build-and-test`

## Summary of changes
- Rename `pg_clients.yml` to `pg-clients.yml`
- Run the workflow on changes in relevant files
- Create Allure report for tests
- Send slack notifications to `#on-call-qa-staging-stream` channel
(instead of `#on-call-staging-stream`)
- Update Client libraries once we're here
2024-07-04 14:58:01 +01:00
Alexander Bayandin
e823b92947 CI(build-tools): Remove libpq from build image (#8206)
## Problem
We use `build-tools` image as a base image to build other images, and it
has a pretty old `libpq-dev` installed (v13; it wasn't that old until I
removed system Postgres 14 from `build-tools` image in
https://github.com/neondatabase/neon/pull/6540)

## Summary of changes
- Remove `libpq-dev` from `build-tools` image
- Set `LD_LIBRARY_PATH` for tests (for different Postgres binaries that
we use, like psql and pgbench)
- Set `PQ_LIB_DIR` to build Storage Controller
- Set `LD_LIBRARY_PATH`/`DYLD_LIBRARY_PATH` in the Storage Controller
where it calls Postgres binaries
2024-07-01 13:11:55 +01:00
Vlad Lazar
dd3adc3693 docker: downgrade openssl to 1.1.1w (#8168)
## Problem
We have seen numerous segfault and memory corruption issue for clients
using libpq and openssl 3.2.2. I don't know if this is a bug in openssl
or libpq. Downgrading to 1.1.1w fixes the issues for the storage
controller and pgbench.

## Summary of Changes:
Use openssl 1.1.1w instead of 3.2.2
2024-06-26 17:27:23 +00:00
Alexander Bayandin
5af9660b9e CI(build-tools): don't install Postgres 14 (#6540)
## Problem

We install Postgres 14 in `build-tools` image, but we don't need
it. We use Postgres binaries, which we build ourselves.

## Summary of changes
- Remove Postgresql 14 installation from `build-tools` image
2024-06-26 16:37:04 +01:00
Christian Schwarz
6c6a7f9ace [v2] Include openssl and ICU statically linked (#8074)
We had to revert the earlier static linking change due to libicu version
incompatibilities:

- original PR: https://github.com/neondatabase/neon/pull/7956
- revert PR: https://github.com/neondatabase/neon/pull/8003

Specifically, the problem manifests for existing projects as error

```
DETAIL:  The collation in the database was created using version 153.120.42, but the operating system provides version 153.14.37.
```

So, this PR reintroduces the original change but with the exact same
libicu version as in Debian `bullseye`, i.e., the libicu version that
we're using today.
This avoids the version incompatibility.


Additional changes made by Christian
====================================
- `hashFiles` can take multiple arguments, use that feature
- validation of the libicu tarball checksum
- parallel build (`-j $(nproc)`) for openssl and libicu

Follow-ups
==========

Debian bullseye has a few patches on top of libicu:
https://sources.debian.org/patches/icu/67.1-7/
We still decide whether we need to include these patches or not.
=> https://github.com/neondatabase/cloud/issues/14527

Eventually, we'll have to figure out an upgrade story for libicu.
That work is tracked in epic
https://github.com/neondatabase/cloud/issues/14525.

The OpenSSL version in this PR is arbitrary.
We should use `1.1.1w` + Debian patches if applicable.
See https://github.com/neondatabase/cloud/issues/14526.

Longer-term:
* https://github.com/neondatabase/cloud/issues/14519
* https://github.com/neondatabase/cloud/issues/14525

Refs
====

Co-authored-by: Christian Schwarz <christian@neon.tech>

refs https://github.com/neondatabase/cloud/issues/12648

---------

Co-authored-by: Rahul Patil <rahul@neon.tech>
2024-06-18 09:42:22 +02:00
Conrad Ludgate
e6eb0020a1 update rust to 1.79.0 (#8048)
## Problem

rust 1.79 new enabled by default lints

## Summary of changes

* update to rust 1.79
* `s/default_features/default-features/`
* fix proxy dead code.
* fix pageserver dead code.
2024-06-14 13:23:52 +02:00
Christian Schwarz
ae5badd375 Revert "Include openssl and ICU statically linked" (#8003)
Reverts neondatabase/neon#7956

Rationale: compute incompatibilties

Slack thread:
https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH

Relevant quotes from @hlinnaka 

> If we go through with the current release candidate, but the compute
is pinned, people who create new projects will get that warning, which
is silly. To them, it looks like the ICU version was downgraded, because
initdb was run with newer version.

> We should upgrade the ICU version eventually. And when we do that,
users with old projects that use ICU will start to see that warning. I
think that's acceptable, as long as we do homework, notify users, and
communicate that properly.
> When do that, we should to try to upgrade the storage and compute
versions at roughly the same time.
2024-06-10 13:20:20 +02:00
Rahul Patil
3b647cd55d Include openssl and ICU statically linked (#7956)
## Problem

Due to the upcoming End of Life (EOL) for Debian 11, we need to upgrade 
the base OS for Pageservers from Debian 11 to Debian 12 for security
reasons.

When deploying a new Pageserver on Debian 12 with the same binary built
on
Debian 11, we encountered the following errors:

```
could not execute operation: pageserver error, status: 500, 
msg: Command failed with status ExitStatus(unix_wait_status(32512)): 
/usr/local/neon/v16/bin/initdb: error while loading shared libraries: 
libicuuc.so.67: cannot open shared object file: No such file or directory
```

and 

```
could not execute operation: pageserver error, status: 500, 
msg: Command failed with status ExitStatus(unix_wait_status(32512)):
 /usr/local/neon/v14/bin/initdb: error while loading shared libraries: 
 libssl.so.1.1: cannot open shared object file: No such file or directory
```

These issues occur when creating new projects.


## Summary of changes

- To address these issues, we configured PostgreSQL build to use 
  statically linked OpenSSL and ICU libraries. 

- This resolves the missing shared library errors when running the 
  binaries on Debian 12.
  
Closes: https://github.com/neondatabase/cloud/issues/12648 

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [x] Do not forget to reformat commit message to not include the above
checklist
2024-06-07 17:28:10 +00:00
Arpad Müller
e67fcf9563 Update mold to 2.31 (#7757)
The [2.31.0 release](https://github.com/rui314/mold/releases/tag/v2.31.0) of mold
includes a 10% speed improvement for binaries with a lot of debug info.
As we have such, it might be useful to update mold to the latest
release. The jump is from 2.4.0 to 2.31.0, but it's not been many
releases in between as the version number was raised by the mold
maintainers to 2.30.0 after 2.4.1 [to avoid confusion for some
tools](https://github.com/rui314/mold/releases/tag/v2.30.0).
2024-05-14 17:49:19 +02:00
Arpad Müller
426598cf76 Update rust to 1.78.0 (#7598)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html

Prior update was in #7198
2024-05-03 15:59:28 +02:00
Alexander Bayandin
94505fd672 CI: speed up Allure reports upload (#7362)
## Problem

`create-test-report` job takes more than 8 minutes, the longest step is
uploading Allure report to S3:

Before:
```
+ aws s3 cp --recursive --only-show-errors /tmp/pr-7362-1712847045/report s3://neon-github-public-dev/reports/pr-7362/8647730612

real	6m10.572s
user	6m37.717s
sys	1m9.429s
```

After:
```
+ s5cmd --log error cp '/tmp/pr-7362-1712858221/report/*' s3://neon-github-public-dev/reports/pr-7362/8650636861/

real	0m9.698s
user	1m9.438s
sys	0m6.419s
```

## Summary of changes
- Add `s5cmd`(https://github.com/peak/s5cmd) to build-tools image
- Use `s5cmd` instead of `aws s3` for uploading Allure reports
2024-04-11 23:35:30 +01:00
Arpad Müller
3ee34a3f26 Update Rust to 1.77.0 (#7198)
Release notes: https://blog.rust-lang.org/2024/03/21/Rust-1.77.0.html

Thanks to #6886 the diff is reasonable, only for one new lint
`clippy::suspicious_open_options`. I added `truncate()` calls to the
places where it is obviously the right choice to me, and added allows
everywhere else, leaving it for followups.

I had to specify cargo install --locked because the build would fail otherwise.
This was also recommended by upstream.
2024-03-22 06:52:31 +00:00
Alexander Bayandin
1d5e476c96 CI: use build-tools image from dockerhub (#6795)
## Problem

Currently, after updating `Dockerfile.build-tools` in a PR, it requires
a manual action to make it `pinned`, i.e., the default for everyone. It
also makes all opened PRs use such images (even created in the PR and
without such changes).
This PR overhauls the way we build and use `build-tools` image (and uses
the image from Docker Hub).

## Summary of changes
- The `neondatabase/build-tools` image gets tagged with the latest
commit sha for the `Dockerfile.build-tools` file
- Each PR calculates the tag for `neondatabase/build-tools`, tries to
pull it, and rebuilds the image with such tag if it doesn't exist.
- Use `neondatabase/build-tools` as a default image
- When running on `main` branch — create a `pinned` tag and push it to
ECR
- Use `concurrency` to ensure we don't build `build-tools` image for the
same commit in parallel from different PRs
2024-02-28 12:38:11 +00:00