Commit Graph

1383 Commits

Author SHA1 Message Date
Alex Chi Z
b1f0bbd12a add reduce num sorted run trigger
Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-28 15:38:14 -04:00
Alex Chi Z
7d16a9f96f fix again
Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-28 15:07:31 -04:00
Alex Chi Z
878627161c revert not
Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-28 14:58:42 -04:00
Alex Chi Z
4db4f42dec correctly handle compaction
Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-28 14:41:26 -04:00
Alex Chi Z
6cb149e3c3 enable tiered again
Signed-off-by: Alex Chi Z <chi@neon.tech>
2023-06-28 14:30:17 -04:00
Alex Chi
f3fdaf8ef1 parallel compaction
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-27 16:42:56 -04:00
Alex Chi
eb93e686ab fix deletion
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-27 14:47:04 -04:00
Alex Chi
2cb79ae3ff fix deletion
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-27 14:43:20 -04:00
Alex Chi
dfe8527806 remove assertion
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-27 13:52:12 -04:00
Alex Chi
335710cec6 bring back original compaction
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-27 13:38:02 -04:00
Alex Chi
a78008ad82 max_merge_width
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-27 13:30:23 -04:00
Alex Chi
30e7ffcd28 adjust compaction strategy
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-26 15:52:37 -04:00
Alex Chi
43d564ce0a incremental image layer
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-26 15:25:35 -04:00
Alex Chi
f86ff5e54b dump more
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-26 14:57:00 -04:00
Alex Chi
9ed6ad1d24 fix weak ptr
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-26 14:33:00 -04:00
Alex Chi
91f28cb516 include delta l0 in compaction, more metrics
Signed-off-by: Alex Chi <chi@neon.tech>
2023-06-26 13:56:29 -04:00
Alex Chi
0b459eb414 fix ratio compute
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 15:11:15 -04:00
Alex Chi
0865ed623c fix comment
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 15:01:25 -04:00
Alex Chi
9e0f103c7b insert at 0
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 15:00:58 -04:00
Alex Chi
9f216a78a1 print
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 15:00:10 -04:00
Alex Chi
6967b4837b fix
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 14:53:53 -04:00
Alex Chi
9b50350857 threshold = 3
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 14:37:29 -04:00
Alex Chi
8ebfa32a0c compaction l0 adds to sorted runs
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 14:24:57 -04:00
Alex Chi
9905d75715 dump file size
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 14:18:54 -04:00
Alex Chi
b0b616f3ac dump
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 14:12:33 -04:00
Alex Chi
867b656ef2 bypass ut
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-22 11:28:34 -04:00
Alex Chi
76b339b150 create partial image layers
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-21 14:38:11 -04:00
Alex Chi
9b3fa1a2e1 fix compile error
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-21 14:16:44 -04:00
Alex Chi
17781776c8 add two compaction triggers
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-21 10:27:24 -04:00
Alex Chi
5274f487e4 add tiered compaction skeleton
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-20 14:43:07 -04:00
Alex Chi
9b7747436c incremental image?
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-20 11:05:16 -04:00
Alex Chi
a2056666ae pgserver: move mapping logic to layer cache
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-14 15:07:38 -04:00
Alex Chi
fc190a2a19 resolve merge conflicts
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-13 13:56:50 -04:00
Alex Chi
faee3152f3 refactor: use LayerDesc in LayerMap (part 2)
Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-06-13 13:54:59 -04:00
Christian Schwarz
3693d1f431 turn Timeline::layers into tokio::sync::RwLock (#4441)
This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).

# Full Stack Of Preliminary PRs

Thanks to the countless preliminary PRs, this conversion is relatively
straightforward.

1. Clean-ups
  * https://github.com/neondatabase/neon/pull/4316
  * https://github.com/neondatabase/neon/pull/4317
  * https://github.com/neondatabase/neon/pull/4318
  * https://github.com/neondatabase/neon/pull/4319
  * https://github.com/neondatabase/neon/pull/4321
  * Note: these were mostly to find an alternative to #4291, which I
    thought we'd need in my original plan where we would need to convert
    `Tenant::timelines` into an async locking primitive (#4333). In reviews,
    we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485

# Significant Changes In This PR

## `compact_level0_phase1` & `create_delta_layer`

This commit partially reverts

   "pgserver: spawn_blocking in compaction (#4265)"
    4e359db4c7.

Specifically, it reverts the `spawn_blocking`-ification of
`compact_level0_phase1`.
If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()`
inside `compact_level0_phase1`. That would use up a thread in the
`spawn_blocking` thread pool, which is hard-capped.

I considered wrapping the code that follows the second
`layers.read().await` into `spawn_blocking`, but there are lifetime
issues with `deltas_to_compact`.

Also, this PR switches the `create_delta_layer` _function_ back to
async, and uses `spawn_blocking` inside to run the code that does sync
IO, while keeping the code that needs to lock `Timeline::layers` async.

## `LayerIter` and `LayerKeyIter` `Send` bounds

I had to add a `Send` bound on the `dyn` type that `LayerIter`
and `LayerKeyIter` wrap. Why? Because we now have the second
`layers.read().await` inside `compact_level0_phase1`, and these
iterator instances are held across that await-point.

More background:
https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960

## `DatadirModification::flush`

Needed to replace the `HashMap::retain` with a hand-rolled variant
because `TimelineWriter::put` is now async.
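
A minimal sketch of what such a hand-rolled "async retain" can look like. `HashMap::retain` takes a synchronous closure, so it cannot `.await` inside; draining the map and re-inserting the kept entries is one way around that. The `Writer` type, the `u32` keys, the `should_flush` predicate and the tiny `block_on` executor are all assumptions made for illustration, not the actual pageserver code.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `TimelineWriter`; only records what was put.
pub struct Writer {
    pub written: Vec<(u32, Vec<u8>)>,
}

impl Writer {
    /// Async like the real `put`, but it completes immediately here.
    pub async fn put(&mut self, key: u32, value: &[u8]) {
        self.written.push((key, value.to_vec()));
    }
}

/// Hand-rolled replacement for `HashMap::retain` with an async body:
/// entries matching `should_flush` are written out, the rest are kept.
pub async fn flush_matching(
    pending: &mut HashMap<u32, Vec<u8>>,
    writer: &mut Writer,
    should_flush: impl Fn(u32) -> bool,
) {
    let mut retained = HashMap::new();
    for (key, value) in pending.drain() {
        if should_flush(key) {
            writer.put(key, &value).await;
        } else {
            retained.insert(key, value);
        }
    }
    *pending = retained;
}

/// Minimal busy-polling executor so the sketch runs without external crates.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

In real code the future would run on the tokio runtime; `block_on` is only here to keep the sketch self-contained.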
2023-06-13 18:38:41 +02:00
Christian Schwarz
fdf7a67ed2 init_empty_layer_map: use try_write (#4485)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
Or more specifically, #4441, where we turn Timeline::layers into a
tokio::sync::RwLock.

By using try_write() here, we can avoid turning init_empty_layer_map
async, which is nice because much of its transitive call(er) graph isn't
async.
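
A rough sketch of the idea, using `std::sync::RwLock` as a stand-in for `tokio::sync::RwLock` so that it runs without external crates (both offer a non-blocking `try_write`). The `Timeline` struct and its `Option<u64>` "layer map" are hypothetical simplifications, and the sketch assumes nothing else holds the lock while the timeline is being initialized.

```rust
use std::sync::RwLock;

/// Hypothetical, heavily simplified timeline: the "layer map" is just
/// the LSN it was initialized with.
pub struct Timeline {
    layers: RwLock<Option<u64>>,
}

impl Timeline {
    pub fn new() -> Self {
        Timeline { layers: RwLock::new(None) }
    }

    /// Stays a plain (non-async) function: instead of awaiting the lock,
    /// it takes it with `try_write` and treats contention as a bug.
    pub fn init_empty_layer_map(&self, start_lsn: u64) {
        let mut layers = self
            .layers
            .try_write()
            .expect("no other task should hold the lock during init");
        *layers = Some(start_lsn);
    }

    pub fn start_lsn(&self) -> Option<u64> {
        *self.layers.read().unwrap()
    }
}
```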
2023-06-13 13:49:40 +02:00
Christian Schwarz
754ceaefac make TimelineWriter Send by using tokio::sync Mutex internally (#4477)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).

There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.

That will require the `TimelineWriter` to become async, because at times
its functions need to lock `Timeline::layers` in order to freeze the
open layer.

While doing that, rustc complains that we're now holding
`Timeline::write_lock` across await points (lock order is that
`write_lock` must be acquired before `Timeline::layers`).

So, we need to switch it over to an async primitive.
2023-06-13 10:15:25 +02:00
Christian Schwarz
939593d0d3 refactor check_checkpoint_distance to prepare for async Timeline::layers (#4476)
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).

There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.

That will require the `TimelineWriter` to become async.

That will require `freeze_inmem_layer` to become async.

So, inside check_checkpoint_distance, we will have
`freeze_inmem_layer().await`.

But current rustc isn't smart enough to understand that we
`drop(layers)` earlier, and hence, will complain about the `!Send`
`layers` being held across the `freeze_inmem_layer().await`-point.

This patch puts the guard into a scope, so rustc will shut up in the
next patch where we make the transition for `TimelineWriter`.
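
The scoping trick can be sketched as follows. The `Timeline` struct, the byte-count threshold and the body of `freeze_inmem_layer` are assumptions for illustration; only the shape (guard confined to an inner block, `.await` afterwards) reflects the description above.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in: the "layer map" is a list of layer sizes.
pub struct Timeline {
    layers: RwLock<Vec<u64>>,
}

impl Timeline {
    pub fn new(layer_sizes: Vec<u64>) -> Self {
        Timeline { layers: RwLock::new(layer_sizes) }
    }
}

/// Stand-in for the async freeze step.
async fn freeze_inmem_layer(timeline: &Timeline) {
    timeline.layers.write().unwrap().clear();
}

pub async fn check_checkpoint_distance(timeline: &Timeline, max_bytes: u64) -> bool {
    // The `!Send` read guard lives only inside this block, so it is
    // not part of the future's state at the await point below.
    let should_freeze = {
        let layers = timeline.layers.read().unwrap();
        let total: u64 = layers.iter().sum();
        total > max_bytes
    };
    if should_freeze {
        freeze_inmem_layer(timeline).await;
    }
    should_freeze
}

/// Compile-time check that a value is `Send`.
pub fn assert_send<T: Send>(_value: &T) {}
```

Calling `assert_send(&check_checkpoint_distance(&timeline, 5))` compiles only because the guard never crosses the `.await`.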

obsoletes https://github.com/neondatabase/neon/pull/4474
2023-06-12 17:45:56 +01:00
Christian Schwarz
2011cc05cd make Delta{Value,Key}Iter Send (#4472)
... by switching the internal RwLock to a OnceCell.

This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).

See https://github.com/neondatabase/neon/pull/4462#issuecomment-1587398883
for more context.

fixes https://github.com/neondatabase/neon/issues/4471
2023-06-12 17:45:56 +01:00
Heikki Linnakangas
e4f05ce0a2 Enable sanity check that disk_consistent_lsn is valid on created timeline.
Commit `create_test_timeline: always put@initdb_lsn the minimum required keys`
already switched us over to using valid initdb_lsns.

All that's left to do is to actually flush the minimum keys so that
we move from disk_consistent_lsn=Lsn(0) to disk_consistent_lsn=initdb_lsn.

Co-authored-by: Christian Schwarz <christian@neon.tech>

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Heikki Linnakangas
8d106708d7 Clean up timeline initialization code.
Clarify who's responsible for initializing the layer map. There were
previously two different ways to do it:

- create_empty_timeline and bootstrap_timeline let prepare_timeline()
  initialize an empty layer map.

- branch_timeline passed a flag to initialize_with_lock() to tell
  initialize_with_lock to call load_layer_map(). Because it was a
  newly created timeline, load_layer_map() never found any layer
  files, so it just initialized an empty layer map.

With this commit, prepare_new_timeline() always does it. The LSN to
initialize it with is passed as argument.

Other changes per function:

prepare_timeline:
- rename to 'prepare_new_timeline' to make it clear that it's only used
  when creating a new timeline, not when loading an existing timeline
- always initialize an empty layer map. The caller can pass the LSN to
  initialize it with. (Previously, prepare_timeline would optionally
  load the layer map at 'initdb_lsn'. Some callers used that, while others
  let initialize_with_lock do it.)

initialize_with_lock:
- As mentioned above, remove the option to load the layer map
- Acquire the 'timelines' lock in the function itself. None of the callers
  did any other work while holding the lock.
- Rename it to finish_creation() to make its intent more clear. It's only
  used when creating a new timeline now.

create_timeline_data:
- Rename to create_timeline_struct() for clarity. It just initializes
  the Timeline struct, not any other "data"

create_timeline_files:
- use create_dir rather than create_dir_all, to be a little more strict.
  We know that the parent directory should already exist, and the timeline
  directory should not exist.
- Move the call to create_timeline_struct() to the caller. It was just
  being "passed through"
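
To illustrate the `create_dir` vs. `create_dir_all` point: `std::fs::create_dir` fails both when the parent is missing and when the target already exists, which is exactly the strictness wanted here. The helper below is a hypothetical sketch, not the actual `create_timeline_files`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: create the directory for one timeline.
/// Errors if `parent` does not exist or the timeline dir already does.
pub fn create_timeline_dir(parent: &Path, timeline_id: &str) -> io::Result<PathBuf> {
    let dir = parent.join(timeline_id);
    // Unlike `create_dir_all`, this neither creates missing parents
    // nor silently succeeds on an existing directory.
    fs::create_dir(&dir)?;
    Ok(dir)
}
```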

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Christian Schwarz
f450369b20 timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map
This patch inlines `initialize_with_lock` and then reorganizes the code
such that we can `load_layer_map` without holding the
`Tenant::timelines` lock.

As a nice aside, we can get rid of the dummy() uninit mark, which has
always been a terrible hack.

Part of https://github.com/neondatabase/neon/pull/4364
2023-06-12 11:56:49 +01:00
Christian Schwarz
aad918fb56 create_test_timeline: tests for put@initdb_lsn optimization code
2023-06-12 11:04:49 +01:00
Christian Schwarz
86dd8c96d3 add infrastructure to expect use of initdb_lsn flush optimization
2023-06-12 11:04:49 +01:00
Christian Schwarz
6a65c4a4fe create_test_timeline: always put@initdb_lsn the minimum required keys (#4451)
See the added comment on `create_empty_timeline`.

The various test cases now need to set a valid `Lsn` instead of
`Lsn(0)`.

Rough context:
https://github.com/neondatabase/neon/pull/4364#discussion_r1221995691
2023-06-12 09:28:34 +00:00
Alex Chi Z
cdce04d721 pgserver: add local manifest for atomic operation (#4422)
## Problem

Part of https://github.com/neondatabase/neon/issues/4418

## Summary of changes

This PR implements the local manifest interfaces. After the refactor of
timeline is done, we can integrate this with the current storage. The
reader will stop at the first corrupted record.
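
A sketch of the "stop at the first corrupted record" behavior, under an assumed record layout of `[len: u32 LE][checksum: u32 LE][payload]` with an FNV-1a checksum. The layout, the checksum choice and the function names are all hypothetical; the PR's real on-disk format may differ.

```rust
/// FNV-1a (32-bit), used here only as a simple illustrative checksum.
fn checksum(payload: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c9dc5;
    for &byte in payload {
        hash ^= byte as u32;
        hash = hash.wrapping_mul(0x01000193);
    }
    hash
}

/// Append one record: length, checksum, then the payload bytes.
pub fn append_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Read records in order; stop at the first truncated or corrupted one
/// and return everything that was valid before it.
pub fn read_records(mut log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while log.len() >= 8 {
        let len = u32::from_le_bytes([log[0], log[1], log[2], log[3]]) as usize;
        let expected = u32::from_le_bytes([log[4], log[5], log[6], log[7]]);
        let payload = match log.get(8..8 + len) {
            Some(payload) => payload,
            None => break, // truncated record
        };
        if checksum(payload) != expected {
            break; // corrupted record: ignore it and everything after
        }
        records.push(payload.to_vec());
        log = &log[8 + len..];
    }
    records
}
```

Stopping at the first bad record means a torn write at the tail of the file is simply ignored on the next read.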

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-08 19:34:25 -04:00
Dmitry Rodionov
d53f9ab3eb delete timelines from s3 (#4384)
Delete data from s3 when timeline deletion is requested

## Summary of changes

UploadQueue is altered to support scheduling of delete operations in
the stopped state. This looks weird, and I'm wondering whether there are
better options/refactorings for the upload client to make it look better.

Probably can be part of https://github.com/neondatabase/neon/issues/4378

Deletion is implemented directly in the existing endpoint because the changes are not
that significant. If we want more safety, we can separate those or create a
feature flag for the new behavior.

resolves [#4193](https://github.com/neondatabase/neon/issues/4193)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-06-08 15:01:22 +03:00
Dmitry Rodionov
8560a98d68 fix openapi spec to pass swagger editor validation (#4445)
There shouldn't be a dash before `type: object`. Also added a description.
2023-06-08 13:25:30 +03:00
Alex Chi Z
2e687bca5b refactor: use LayerDesc in layer map (part 1) (#4408)
## Problem

part of https://github.com/neondatabase/neon/issues/4392

## Summary of changes

This PR adds a new HashMap that maps persistent layer desc to the layer
object *inside* LayerMap. Originally I went directly towards adding such a
layer cache in Timeline, but the changes were too many to be
reviewed as a reasonably-sized PR. Therefore, we take this intermediate
step to change part of the codebase to use persistent layer desc, and
will come up with other PRs to move this hash map of layer desc to the
timeline struct.

Also, file_size is now part of the layer desc.
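
Roughly, the shape of that mapping could look like the sketch below. The field names of `PersistentLayerDesc`, the `Layer` trait and its `DummyLayer` implementation are hypothetical placeholders; only the idea of a `HashMap` from layer desc to layer object inside `LayerMap` (and `file_size` living in the desc) comes from the description above.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical descriptor; the real struct has different fields.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PersistentLayerDesc {
    pub key_start: u64,
    pub key_end: u64,
    pub lsn_start: u64,
    pub lsn_end: u64,
    pub file_size: u64,
}

/// Placeholder for the layer object trait.
pub trait Layer {
    fn name(&self) -> String;
}

/// Trivial layer implementation used only for the example.
pub struct DummyLayer(pub String);

impl Layer for DummyLayer {
    fn name(&self) -> String {
        self.0.clone()
    }
}

/// The layer map owns the desc -> layer object mapping.
#[derive(Default)]
pub struct LayerMap {
    mapping: HashMap<PersistentLayerDesc, Arc<dyn Layer>>,
}

impl LayerMap {
    pub fn insert(&mut self, desc: PersistentLayerDesc, layer: Arc<dyn Layer>) {
        self.mapping.insert(desc, layer);
    }

    /// Resolve a lightweight descriptor to the full layer object.
    pub fn get_from_desc(&self, desc: &PersistentLayerDesc) -> Option<Arc<dyn Layer>> {
        self.mapping.get(desc).cloned()
    }

    pub fn remove(&mut self, desc: &PersistentLayerDesc) -> Option<Arc<dyn Layer>> {
        self.mapping.remove(desc)
    }
}
```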

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-06-07 18:28:18 +03:00
Dmitry Rodionov
1a1019990a map TenantState::Broken to TenantAttachmentStatus::Failed (#4371)
## Problem

Attach failures are not reported in the public part of the API (in the
`attachment_status` field of TenantInfo).

## Summary of changes

Expose TenantState::Broken as TenantAttachmentStatus::Failed

As written, the Failed status will be reported even if no
attachment happened (i.e. if the tenant becomes broken on startup). This is
in line with other members, i.e. Active will be resolved to Attached even
if no actual attach took place.
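
As a simplified sketch of the mapping: the two enums below contain only the variants named in this message plus one placeholder each (`OtherState`, `Undetermined`), which are hypothetical and stand in for everything not discussed here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenantState {
    Active,
    Broken,
    /// Placeholder for all states not discussed here (hypothetical).
    OtherState,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenantAttachmentStatus {
    Attached,
    Failed,
    /// Placeholder for "not known yet" (hypothetical).
    Undetermined,
}

/// The status is derived purely from the current state, so Broken maps
/// to Failed whether or not an attach was ever attempted.
pub fn attachment_status(state: TenantState) -> TenantAttachmentStatus {
    match state {
        TenantState::Active => TenantAttachmentStatus::Attached,
        TenantState::Broken => TenantAttachmentStatus::Failed,
        TenantState::OtherState => TenantAttachmentStatus::Undetermined,
    }
}
```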

This can be tweaked if needed. At the current stage it would be overengineering without a clear motivation.

resolves #4344
2023-06-07 18:25:30 +03:00