947 Commits

Author SHA1 Message Date
Bojan Serafimov
0d7bce5f72 Fix segfault 2022-10-06 15:35:59 -04:00
Bojan Serafimov
42a0f5be00 WIP 2022-10-06 13:18:31 -04:00
Bojan Serafimov
d4065f2e85 Merge branch 'main' into ps-trace 2022-10-05 13:24:06 -04:00
Bojan Serafimov
5315614974 WIP 2022-10-05 13:21:42 -04:00
sharnoff
580584c8fc Remove control_plane deps on pageserver/safekeeper (#2513)
Creates new `pageserver_api` and `safekeeper_api` crates to serve as the
shared dependencies. Should reduce both recompile times and cold compile
times.

Decreases the size of the optimized `neon_local` binary: 380M -> 179M.
No significant changes for anything else (mostly as expected).
2022-10-04 11:14:45 -07:00
Kirill Bulatov
d823e84ed5 Allow attaching tenants with zero timelines 2022-10-04 18:13:51 +03:00
Kirill Bulatov
231dfbaed6 Do not remove empty timelines/ directory for tenants 2022-10-04 18:13:51 +03:00
Joonas Koivunen
31123d1fa8 Silence clippies, minor doc fix (#2543)
* doc: remove stray backtick

* chore: clippy::let_unit_value

* chore: silence useless_transmute, duplicate_mod

* chore: remove allowing deref_nullptr

not needed since bindgen 0.60.0.

* chore: remove repeated allowed lints

they are already allowed from the crate root.
2022-10-03 17:44:17 +03:00
Kirill Bulatov
7b2f9dc908 Reuse existing tenants during attach (#2540) 2022-10-03 13:33:55 +03:00
Dmitry Rodionov
fb68d01449 Preserve task result in TaskHandle by keeping join handle around (#2521)
* Preserve task result in TaskHandle by keeping join handle around

The solution is not great, but it should hep to debug staging issue
I tried to do it in a least destructive way. TaskHandle used only in
one place so it is ok to use something less generic unless we want
to extend its usage across the codebase. In its current current form
for its single usage place it looks too abstract

Some problems around this code:
1. Task can drop event sender and continue running
2. Task cannot be joined several times (probably not needed,
    but still, can be surprising)
3. Had to split task event into two types because ahyhow::Error
    does not implement clone. So TaskContinueEvent derives clone
    but usual task evend does not. Clone requirement appears
    because we clone the current value in next_task_event.
    Taking it by reference is complicated.
4. Split between Init and Started is artificial and comes from
    watch::channel requirement to have some initial value.

    To summarize from 3 and 4. It may be a better idea to use
    RWLock or a bounded channel instead
2022-09-26 20:57:02 +00:00
sharnoff
805bb198c2 Miscellaneous small fixups (#2503)
Changes are:

 * Correct typo "firts" -> "first"
 * Change <empty panic with comment explaining> to <panic with message
   taken from the comment>
 * Fix weird indentation that rustfmt was failing to handle
 * Use existing `anyhow::{anyhow,bail}!` as `{anyhow,bail}!` if it's
   already in scope
 * Spell `Result<T, anyhow::Error>` as `anyhow::Result<T>`
   * In general, closer to matching the rest of the codebase
 * Change usages of `hash_map::Entry` to `Entry` when it's already in
   scope
   * A quick search shows our style on this one varies across the files
     it's used in
2022-09-23 11:49:28 -07:00
MMeent
bc3ba23e0a Fix extreme metrics bloat in storage sync (#2506)
* Fix extreme metrics bloat in storage sync

From 78 metrics per (timeline, tenant) pair down to (max) 10 metrics per
(timeline, tenant) pair, plus another 117 metrics in a global histogram that
replaces the previous per-timeline histogram.

* Drop image sync operation metric series when dropping TimelineMetrics.
2022-09-23 14:35:36 +02:00
Dmitry Rodionov
43560506c0 remove duplicate walreceiver connection span 2022-09-23 00:27:24 +03:00
Egor Suvorov
262fa3be09 pageserver pg proto: add missing auth checks (#2494)
Fixes #1858
2022-09-22 17:07:08 +03:00
Anastasia Lubennikova
2d012f0d32 Fix rebase conflicts in pageserver code 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
eba419fda3 Clean up the pg_version choice code 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
ed6b75e301 show pg_version in create_timeline info span 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
862902f9e5 Update readme and openapi spec 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
8d890b3cbb fix clippy warnings 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
5dddeb8d88 Use non-versioned pg_distrib dir 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
a69e060f0f fix clippy warning 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
03c606f7c5 Pass pg_version parameter to timeline import command.
Add pg_version field to LocalTimelineInfo.
Use pg_version in the export_import_between_pageservers script
2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
9dfede8146 Handle backwards-compatibility of TimelineMetadata.
This commit bumps TimelineMetadata format version and makes it independent from STORAGE_FORMAT_VERSION.
2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
86bf491981 Support pg 15
- Split postgres_ffi into two version specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding.

-  Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture.
   To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
   'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
 Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.
2022-09-22 14:15:13 +03:00
Dmitry Rodionov
e764c1e60f remove self argument from several spans 2022-09-22 11:41:04 +03:00
Konstantin Knizhnik
f3073a4db9 R-Tree layer map (#2317)
Replace the layer array and linear search with R-tree

So far, the in-memory layer map that holds information about layer
files that exist, has used a simple Vec, in no particular order, to
hold information about all the layers. That obviously doesn't scale
very well; with thousands of layer files the linear search was
consuming a lot of CPU. Replace it with a two-dimensional R-tree, with
Key and LSN ranges as the dimensions.

For the R-tree, use the 'rstar' crate. To be able to use that, we
convert the Keys and LSNs into 256-bit integers. 64 bits would be
enough to represent LSNs, and 128 bits would be enough to represent
Keys. However, we use 256 bits, because rstar internally performs
multiplication to calculate the area of rectangles, and the result of
multiplying two 128 bit integers doesn't necessarily fit in 128 bits,
causing integer overflow and, if overflow-checks are enabled,
panic. To avoid that, we use 256 bit integers.

Add a performance test that creates a lot of layer files, to
demonstrate the benefit.
2022-09-22 08:35:06 +03:00
sharnoff
6f949e1556 Improve pageserver/safekeepeer HTTP API errors (#2461)
Part of the general work on improving pageserver logs.

Brief summary of changes:

* Remove `ApiError::from_err`
* Remove `impl From<anyhow::Error> for ApiError`
* Convert `ApiError::{BadRequest, NotFound}` to use `anyhow::Error`
  * Note: `NotFound` has more verbose formatting because it's more
    likely to have useful information for the receiving "user"
* Explicitly convert from `tokio::task::JoinError`s into
  `InternalServerError`s where appropriate

Also note: many of the places where errors were implicitly converted to
500s have now been updated to return a more appropriate error. Some
places where it's not yet possible to distinguish the error types have
been left as 500s.
2022-09-20 17:02:10 -07:00
Kirill Bulatov
8d7024a8c2 Move path manipulation function to utils 2022-09-20 23:43:52 +03:00
Kirill Bulatov
6b8dcad1bb Unify timeline creation steps 2022-09-20 23:43:52 +03:00
Kirill Bulatov
310c507303 Merge path retrieval methods in config.rs 2022-09-20 23:43:52 +03:00
Kirill Bulatov
6fc719db13 Merge timelines.rs with tenant.rs 2022-09-20 23:43:52 +03:00
sharnoff
4a3b3ff11d Move testing pageserver libpq cmds to HTTP api (#2429)
Closes #2422.

The APIs have been feature gated with the `testing_api!` macro so that
they return 400s when support hasn't been compiled in.
2022-09-20 11:28:12 -07:00
sharnoff
4b25b9652a Rename more zid-like idents (#2480)
Follow-up to PR #2433 (b8eb908a). There's still a few more unresolved
locations that have been left as-is for the same compatibility reasons
in the original PR.
2022-09-20 11:06:31 -07:00
Arthur Petukhovsky
566e816298 Refactor safekeeper timelines handling (#2329)
See https://github.com/neondatabase/neon/pull/2329 for details
2022-09-20 07:42:39 +00:00
Dmitry Rodionov
fcb4a61a12 Adjust spans around gc and compaction
So compaction and gc loops have their own span to always show tenant id
in log messages.
2022-09-19 20:08:38 +03:00
Dmitry Rodionov
44fd4e3c9f add more logs 2022-09-16 18:14:05 +03:00
sharnoff
db5ec0dae7 Cleanup/simplify logical size calculation (#2459)
Should produce identical results; replaces an error case that shouldn't
be possible with `expect`.
2022-09-15 23:50:46 -07:00
Kirill Bulatov
031e57a973 Disable failpoints by default 2022-09-16 09:26:29 +03:00
Bojan Serafimov
293bc69e4f Implement drawer 2022-09-15 21:57:16 -04:00
Bojan Serafimov
e9b158e8f5 Add draw binary 2022-09-15 21:14:07 -04:00
Bojan Serafimov
a66a3b4ed8 Fix merge bugs 2022-09-15 20:38:09 -04:00
Bojan Serafimov
6913bedc09 Merge branch 'main' into ps-trace 2022-09-15 20:15:00 -04:00
Kirill Bulatov
b8eb908a3d Rename old project name references 2022-09-14 08:14:05 +03:00
Bojan Serafimov
a180273dc5 Add todo 2022-09-13 16:34:56 -04:00
Bojan Serafimov
6c6aa04ce4 Fix bugs 2022-09-13 16:24:09 -04:00
Bojan Serafimov
cd8c96233f Actually connect 2022-09-13 15:29:36 -04:00
Dmitry Rodionov
1d53173e62 update openapi spec (tenant state has changed) 2022-09-13 21:53:59 +03:00
Dmitry Rodionov
db0c49148d clean up metrics in handle_pagerequests 2022-09-13 21:21:45 +03:00
Bojan Serafimov
ecbda94790 Connect via pagestream 2022-09-13 14:09:15 -04:00
Kirill Bulatov
1a8c8b04d7 Merge Repository and Tenant entities, rework tenant background jobs 2022-09-13 15:39:39 +03:00