Commit Graph

2203 Commits

Author SHA1 Message Date
Anastasia Lubennikova
0ec5ddea0b GRANT CREATE ON SCHEMA public TO web_access 2022-10-17 22:42:51 +03:00
Kirill Bulatov
c4ee62d427 Bump clap and other minor dependencies (#2623) 2022-10-17 12:58:40 +03:00
Joonas Koivunen
c709354579 Add layer sizes to index_part.json (#2582)
This is the first step in verifying layer files. Next up on the road is
hashing the files and verifying the hashes.

The metadata additions do not require any migration. The idea is that
the change is backward and forward-compatible with regard to
`index_part.json` due to the softness of JSON schema and the
deserialization options in use.

New types added:

- LayerFileMetadata for tracking the file metadata
    - starting with only the file size
    - in future hopefully a sha256 as well
- IndexLayerMetadata, the serialized counterpart of LayerFileMetadata

Having to make every field of LayerFileMetadata an Option is a problem,
but it cannot be fixed without conflicting a lot more with other
ongoing work.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2022-10-17 12:21:04 +03:00
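
The compatibility argument in c709354579 can be illustrated with a minimal sketch. It is not the pageserver code: the key names `layers` and `layer_metadata` are assumptions made for illustration, and only `file_size` is taken from the commit message. Missing metadata is read as "unknown" and unknown keys are ignored, which is what lets old and new writers coexist without a migration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LayerFileMetadata:
    # Only the file size exists so far; it is optional because older
    # index files were written without any per-layer metadata.
    file_size: Optional[int] = None


def read_layer_metadata(index_part_json: str) -> Dict[str, LayerFileMetadata]:
    """Parse an index document, tolerating both the old and the new shape."""
    doc = json.loads(index_part_json)
    # Hypothetical key names; the commit message does not spell them out.
    layers = doc.get("layers", [])
    metadata = doc.get("layer_metadata", {})
    return {
        name: LayerFileMetadata(file_size=metadata.get(name, {}).get("file_size"))
        for name in layers
    }


# Old writer: no metadata at all. New writer: sizes plus a key this reader
# does not know about, which is simply ignored.
old_format = '{"layers": ["layer_a"]}'
new_format = (
    '{"layers": ["layer_a"],'
    ' "layer_metadata": {"layer_a": {"file_size": 4096}},'
    ' "future_field": 1}'
)

print(read_layer_metadata(old_format))
print(read_layer_metadata(new_format))
```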
Lassi Pölönen
5d6553d41d Fix pageserver configuration generation bug (#2584)
* We had an issue with `lineinfile` usage for pageserver configuration
file: if the S3 bucket related values were changed, it would have
resulted in duplicate keys, resulting in invalid toml.

So to fix the issue, we should keep the configuration in structured
format (yaml in this case) so we can always generate syntactically
correct toml.

Inventories are converted to yaml just so that it's easier to maintain
the configuration there. Another alternative would have been separate
variable files.

* Keep the ansible collections dir, but locally installed collections
should not be tracked.
2022-10-16 11:37:10 +00:00
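
The idea behind 5d6553d41d, rendering the whole file from structured data instead of patching lines in place, can be sketched as follows. This is an illustration only, not the Ansible template from the commit: the key names and values are made up, and the renderer handles only flat scalars and one level of inline tables. Because the source is a dictionary, a key cannot appear twice no matter how often a value changes.

```python
import json


def toml_value(value) -> str:
    """Render a scalar or a flat dict as a TOML value."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # For ordinary text, JSON string escaping is also valid in a TOML
        # basic string; ensure_ascii=False keeps non-ASCII characters raw.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, dict):
        inner = ", ".join(f"{k} = {toml_value(v)}" for k, v in value.items())
        return "{ " + inner + " }"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def render_toml(settings: dict) -> str:
    # Keys are assumed to be bare TOML keys (letters, digits, '_' and '-').
    return "".join(f"{key} = {toml_value(value)}\n" for key, value in settings.items())


# Illustrative keys and values only.
config = {
    "listen_addr": "0.0.0.0:6400",
    "remote_storage": {"bucket_name": "bucket-a", "bucket_region": "eu-west-1"},
}
first = render_toml(config)

# Changing a bucket value and rendering again replaces the line
# instead of appending a second one.
config["remote_storage"]["bucket_name"] = "bucket-b"
second = render_toml(config)
print(second)
```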
Kirill Bulatov
f03b7c3458 Bump regular dependencies (#2618)
* etcd-client is not updated, since we plan to replace it with another client and the new version errors with some missing prost library error
* clap had released another major update that requires changing every CLI declaration again, deserves a separate PR
2022-10-15 01:55:31 +03:00
Heikki Linnakangas
9c24de254f Add description and license fields to OpenAPI spec.
These were added earlier to the control plane's copy of this file.
This is the master version of this file, so let's keep it in sync.
2022-10-14 18:37:58 +03:00
Heikki Linnakangas
538876650a Merge 'local' and 'remote' parts of TimelineInfo into one struct.
The 'local' part was always filled in, so that was easy to merge into
the TimelineInfo itself. 'remote' only contained two fields,
'remote_consistent_lsn' and 'awaits_download'. I made
'remote_consistent_lsn' an optional field, and 'awaits_download' is now
false if the timeline is not present remotely.

However, I kept stub versions of the 'local' and 'remote' structs for
backwards-compatibility, with a few fields that are actively used by
the control plane. They just duplicate the fields from TimelineInfo
now. They can be removed later, once the control plane has been
updated to use the new fields.
2022-10-14 18:37:14 +03:00
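
The merged shape described in 538876650a might look roughly like the sketch below. It mirrors the idea in Python rather than the actual Rust struct: `timeline_id` and `current_logical_size` are hypothetical stand-ins for the formerly 'local' fields, and which fields the legacy stubs repeat is an assumption. Only `remote_consistent_lsn` and `awaits_download` are named in the commit message.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TimelineInfo:
    timeline_id: str                  # hypothetical field name
    current_logical_size: int         # hypothetical formerly-'local' field
    # None when the timeline is not present remotely.
    remote_consistent_lsn: Optional[str] = None
    # False when the timeline is not present remotely.
    awaits_download: bool = False

    def to_json(self) -> dict:
        doc = asdict(self)
        # Stub structs kept for backwards compatibility: they only
        # duplicate the flat fields and can be dropped later.
        doc["local"] = {"current_logical_size": self.current_logical_size}
        doc["remote"] = {
            "remote_consistent_lsn": self.remote_consistent_lsn,
            "awaits_download": self.awaits_download,
        }
        return doc


local_only = TimelineInfo("timeline-1", current_logical_size=123)
print(local_only.to_json())
```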
Heikki Linnakangas
500239176c Make TimelineInfo.local field mandatory.
It was only None when you queried the status of a timeline with
'timeline_detail' mgmt API call, and it was still being downloaded. You
can check for that status with the 'tenant_status' API call instead,
checking for has_in_progress_downloads field.

Another case was if an error happened while trying to get the current
logical size, in a 'timeline_detail' request. It might make sense to
tolerate such errors, and leave the fields we cannot fill in as empty,
None, 0 or similar, but it doesn't make sense to me to leave the whole
'local' struct empty in that case.
2022-10-14 18:37:14 +03:00
Anastasia Lubennikova
ee64a6b80b Fix CI: push versioned compute images to production ECR 2022-10-14 18:12:50 +03:00
Anastasia Lubennikova
a13b486943 Bump vendor/postgres-v15. Rebase to 15.0 2022-10-14 18:12:50 +03:00
Arseny Sher
9fe4548e13 Reimplement explicit timeline creation on safekeepers.
With the ability to pass commit_lsn. This allows performing project WAL recovery
through a different (from the original) set of safekeepers (or under a different
ttid) by
1) moving WAL files to s3 under the proper ttid;
2) explicitly creating the timeline on safekeepers, setting commit_lsn to the
latest point;
3) putting the latest .partial file to the timeline directory on safekeepers, if
desired.

Extend test_s3_wal_replay to exercise this behaviour.

Also extends timeline_status endpoint to return postgres information.
2022-10-13 21:43:10 +04:00
Heikki Linnakangas
14c623b254 Make it possible to build with old cargo version.
I'm using the Rust compiler and cargo versions from Debian packages,
but the latest available cargo Debian package is quite old, version
1.57. The 'named-profiles' feature was not stabilized at that
version yet, so ever since commit a463749f5, I've had to manually add
this line to the Cargo.toml file to compile. I've been wishing that
someone would update the cargo Debian package, but it doesn't seem to
be happening any time soon.

This doesn't seem to bother anyone else but me, but it shouldn't hurt
anyone else either. If there was a good reason, I could install a
newer cargo version with 'rustup', but if all we need is this one line
in Cargo.toml, I'd prefer to continue using the Debian packages.
2022-10-13 15:17:00 +03:00
Alexander Bayandin
ebf54b0de0 Nightly Benchmarks: Add 50 GB projects (#2612) 2022-10-13 10:00:29 +01:00
Andrés
09dda35dac Return broken tenants due to non existing timelines dir (#2552) (#2575)
Co-authored-by: andres <andres.rodriguez@outlook.es>
2022-10-12 22:28:39 +03:00
Dmitry Ivanov
6ace79345d [proxy] Add more context to console requests logging (#2583) 2022-10-12 21:00:44 +03:00
danieltprice
771e61425e Update release-pr.md (#2600)
Update the Release Notes PR example that is referenced from the checklist. The Release Notes file structure changed recently.
2022-10-12 08:38:28 -03:00
Alexander Bayandin
93775f6ca7 GitHub Actions: replace deprecated set-output with GITHUB_OUTPUT (#2608) 2022-10-12 10:22:24 +01:00
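
For context on 93775f6ca7: the deprecated form printed a `::set-output name=...::value` command to stdout, while the replacement appends `name=value` lines to the file whose path the runner exposes in the `GITHUB_OUTPUT` environment variable. A minimal sketch of the new form from a Python step follows; the helper name `set_output` and the output name `build_tag` are hypothetical, and multi-line values (which need a delimiter syntax) are deliberately left out.

```python
import os


def set_output(name: str, value: str) -> None:
    """Append a single-line step output to the file named by GITHUB_OUTPUT."""
    if "\n" in value:
        raise ValueError("multi-line values need the delimiter syntax, not shown here")
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as output_file:
        output_file.write(f"{name}={value}\n")


# Outside of a workflow run the variable is not set, so do nothing.
if "GITHUB_OUTPUT" in os.environ:
    set_output("build_tag", "1234")
```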
Arseny Sher
6d0dacc4ce Recreate timeline on pageserver in s3_wal_replay test.
That's closer to real usage than switching to brand new pageserver.
2022-10-12 11:46:21 +04:00
Heikki Linnakangas
e5e40a31f4 Clean up terms "delete timeline" and "detach tenant".
You cannot attach/detach an individual timeline, attach/detach always
applies to the whole tenant. However, you can *delete* a single timeline
from a tenant. Fix some comments and error messages that confused these
two operations.
2022-10-11 17:47:41 +03:00
Heikki Linnakangas
676c63c329 Improve comments. 2022-10-11 17:47:41 +03:00
Heikki Linnakangas
47366522a8 Make the return type of 'list_timelines' simpler.
It's enough to return just the Timeline references. You can get the
timeline's ID easily from Timeline.
2022-10-11 17:47:41 +03:00
Heikki Linnakangas
db26bc49cc Remove obsolete FIXME comment.
Commit c634cb1d36 removed the trait and changed the function to return
a &TimelineWriter, as the FIXME said we should do, but forgot to remove
the FIXME.
2022-10-11 17:47:41 +03:00
Lassi Pölönen
e520293090 Add build info metric to pageserver, safekeeper and proxy (#2596)
* Test that we emit build info metric for pageserver, safekeeper and proxy with some non-zero length revision label

* Emit libmetrics_build_info on startup of pageserver, safekeeper and
proxy with label "revision" which tells the git revision.
2022-10-11 09:54:32 +03:00
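
What e520293090 emits, and what its test checks, can be sketched in the Prometheus text exposition format. The metric name and the `revision` label come from the commit message; the constant value `1` is an assumption based on the usual convention for info-style metrics, and both helper functions are illustrative rather than code from the repository.

```python
import re


def build_info_line(revision: str) -> str:
    """Render the build info sample in Prometheus text exposition format."""
    # Label values escape backslash, double quote and newline.
    escaped = revision.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'libmetrics_build_info{{revision="{escaped}"}} 1'


def revision_label(metrics_text: str) -> str:
    """Extract the revision label from scraped metrics, or '' if absent."""
    match = re.search(
        r'^libmetrics_build_info\{revision="([^"]*)"\}', metrics_text, re.MULTILINE
    )
    return match.group(1) if match else ""


scraped = "some_other_metric 42\n" + build_info_line("0ec5ddea0b") + "\n"
print(scraped)

# The check described in the commit: the revision label has non-zero length.
assert len(revision_label(scraped)) > 0
```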
Sergey Melnikov
241e549757 Switch neon-stress etcd to dedicated instance (#2602) 2022-10-10 22:07:19 +00:00
Sergey Melnikov
34bea270f0 Fix POSTGRES_DISTRIB_DIR for benchmarks on ec2 runner (#2594) 2022-10-10 09:12:50 +00:00
Kirill Bulatov
13f0e7a5b4 Deploy pageserver_binutils to the envs 2022-10-09 08:21:11 +03:00
Kirill Bulatov
3e35f10adc Add a script to reformat the project 2022-10-09 08:21:11 +03:00
Kirill Bulatov
3be3bb7730 Be more verbose with initdb for pageserver timeline creation 2022-10-09 08:21:11 +03:00
Kirill Bulatov
01d2c52c82 Tidy up feature reporting 2022-10-09 08:21:11 +03:00
Kirill Bulatov
9f79e7edea Merge pageserver helper binaries and provide it for deployment (#2590) 2022-10-08 12:42:17 +00:00
Heikki Linnakangas
a22165d41e Add tests for comparing root and child branch performance.
Author: Thang Pham <thang@neon.tech>
2022-10-08 10:07:33 +03:00
Arseny Sher
725be60bb7 Storage messaging rfc 2. 2022-10-07 21:22:17 +04:00
Dmitry Ivanov
e516c376d6 [proxy] Improve logging (#2554)
* [proxy] Use `tracing::*` instead of `println!` for logging

* Fix a minor misnomer

* Log more stuff
2022-10-07 14:34:57 +03:00
Kirill Bulatov
8e51c27e1a Restore artifact versions (#2578)
Context: https://github.com/neondatabase/neon/pull/2128/files#r989489965

Co-authored-by: Rory de Zoete <rory@neon.tech>
2022-10-07 10:58:31 +00:00
Heikki Linnakangas
9e1eb69d55 Increase default compaction_period setting to 20 s.
The previous default of 1 s caused excessive CPU usage when there were
a lot of projects. Polling every timeline once a second was too aggressive
so let's reduce it.

Fixes https://github.com/neondatabase/neon/issues/2542, but we
probably also want to do something so that we don't poll timelines
that have received no new WAL or layers since last check.
2022-10-07 13:55:19 +03:00
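
A back-of-the-envelope view of 9e1eb69d55, assuming each timeline is polled once per compaction period as the message describes. The timeline count is illustrative, not a measured figure.

```python
def polls_per_second(timeline_count: int, compaction_period_s: float) -> float:
    """Aggregate polling rate when every timeline is polled once per period."""
    return timeline_count / compaction_period_s


timelines = 1000  # illustrative number of timelines on one pageserver
before = polls_per_second(timelines, 1)   # old default: 1 s
after = polls_per_second(timelines, 20)   # new default: 20 s
print(before, after, before / after)
```

The rate still grows linearly with the number of timelines, which is why the message also suggests skipping timelines that have received nothing new since the last check.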
Arthur Petukhovsky
687ba81366 Display sync safekeepers output in compute_ctl (#2571)
Pipe postgres output to compute_ctl stdout and create a test to check that compute_ctl works and prints postgres logs.
2022-10-06 13:53:52 +00:00
Andrés
47bae68a2e Make get_lsn_by_timestamp available in mgmt API (#2536) (#2560)
Co-authored-by: andres <andres.rodriguez@outlook.es>
2022-10-06 12:42:50 +03:00
Joonas Koivunen
e8b195acb7 fix: apply notify workaround on m1 mac docker (#2564)
workaround as discussed in the notify repository.
2022-10-06 11:13:40 +03:00
Anastasia Lubennikova
254cb7dc4f Update CI script to push compute-node-v15 to dockerhub 2022-10-06 10:50:08 +03:00
Anastasia Lubennikova
ed85d97f17 bump vendor/postgres-v15. Rebase it to Stamp 15rc2 2022-10-06 10:50:08 +03:00
Anastasia Lubennikova
4a216c5f7f Use PostGIS 3.3.1 that is compatible with pg 15 2022-10-06 10:50:08 +03:00
Anastasia Lubennikova
c5a428a61a Update Dockerfile.compute-node-v15 to match v14 version.
Fix build script to promote the image for v15 to neon dockerhub
2022-10-06 10:50:08 +03:00
Konstantin Knizhnik
ff8c481777 Normalize last_record LSN in wal receiver (#2529)
* Add test for branching on page boundary

* Normalize start recovery point

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>

Co-authored-by: Thang Pham <thang@neon.tech>
2022-10-06 09:01:56 +03:00
Arthur Petukhovsky
f25dd75be9 Fix deadlock in safekeeper metrics (#2566)
We had a problem where almost all of the threads were waiting on a futex syscall. More specifically:
- `/metrics` handler was inside `TimelineCollector::collect()`, waiting on a mutex for a single Timeline
- This exact timeline was inside `control_file::FileStorage::persist()`, waiting on a mutex for Lazy initialization of `PERSIST_CONTROL_FILE_SECONDS`
- `PERSIST_CONTROL_FILE_SECONDS: Lazy<Histogram>` was blocked on `prometheus::register`
- `prometheus::register` calls `DEFAULT_REGISTRY.write().register()` to take a write lock on Registry and add a new metric
- `DEFAULT_REGISTRY` lock was already taken inside `DEFAULT_REGISTRY.gather()`, which was called by `/metrics` handler to collect all metrics

This commit creates another Registry with a separate lock, to avoid deadlock in a case where `TimelineCollector` triggers registration of new metrics inside default registry.
2022-10-06 01:07:02 +03:00
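
The lock cycle fixed by f25dd75be9 can be reproduced in miniature. The sketch below is a deliberate simplification: it collapses the chain of threads and mutexes from the message onto one thread, uses a plain lock where the real registry uses a read-write lock, and replaces the hang with a timed acquire so the example terminates. `Registry` and `make_timeline_collector` are toy stand-ins, not the prometheus crate API.

```python
import threading


class Registry:
    """Toy metric registry guarded by a single non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._metrics = []
        self._collectors = []

    def add_collector(self, collector):
        self._collectors.append(collector)

    def register(self, name, timeout=0.05):
        # A bounded wait stands in for the real, unbounded one, so the
        # sketch reports the would-be deadlock instead of hanging.
        if not self._lock.acquire(timeout=timeout):
            raise RuntimeError(f"registry lock already held while registering {name}")
        try:
            self._metrics.append(name)
        finally:
            self._lock.release()

    def gather(self):
        # The lock is held for the whole gather, including collector calls.
        with self._lock:
            samples = list(self._metrics)
            for collect in self._collectors:
                samples.extend(collect())
            return samples


def make_timeline_collector(lazy_target):
    """Collector whose first run lazily registers a metric in lazy_target."""
    state = {"registered": False}

    def collect():
        if not state["registered"]:
            lazy_target.register("persist_control_file_seconds")
            state["registered"] = True
        return ["timeline_metric"]

    return collect


# Before: the collector lives in the same registry it registers into.
shared = Registry()
shared.add_collector(make_timeline_collector(shared))
try:
    shared.gather()
    single_registry_ok = True
except RuntimeError:
    single_registry_ok = False

# After: the collector is gathered through a registry with its own lock.
default = Registry()
timelines = Registry()
timelines.add_collector(make_timeline_collector(default))
timeline_metrics = timelines.gather()
default_metrics = default.gather()
print(single_registry_ok, timeline_metrics, default_metrics)
```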
Sergey Melnikov
b99bed510d Move proxies to neon-proxy namespace (#2555) 2022-10-05 16:14:09 +03:00
sharnoff
580584c8fc Remove control_plane deps on pageserver/safekeeper (#2513)
Creates new `pageserver_api` and `safekeeper_api` crates to serve as the
shared dependencies. Should reduce both recompile times and cold compile
times.

Decreases the size of the optimized `neon_local` binary: 380M -> 179M.
No significant changes for anything else (mostly as expected).
2022-10-04 11:14:45 -07:00
Kirill Bulatov
d823e84ed5 Allow attaching tenants with zero timelines 2022-10-04 18:13:51 +03:00
Kirill Bulatov
231dfbaed6 Do not remove empty timelines/ directory for tenants 2022-10-04 18:13:51 +03:00
Dmitry Rodionov
5cf53786f9 Improve pytest ergonomics
1. Disable perf tests by default
2. Add instruction to run tests in parallel
2022-10-04 14:53:01 +03:00
Heikki Linnakangas
9b9bbad462 Use 'notify' crate to wait for PostgreSQL startup.
Compute node startup time is very important. After launching
PostgreSQL, use 'notify' to be notified immediately when it has
updated the PID file, instead of polling. The polling loop had a 100 ms
interval so this shaves up to 100 ms from the startup time.
2022-10-04 13:00:15 +03:00