doc: comment why this is not so easy

Joonas Koivunen
2024-03-25 15:32:47 +00:00
parent fc82512dde
commit 78ca8679e8


@@ -263,6 +263,22 @@ impl Timeline {
}
}
// drop the read lock for now; in theory, gc could also remove the same layers we are now
// compacting. FIXME: how to prepare such a test case?
// 0. tenant with minimal pitr
// 1. create 10 layers
// 2. await on pausable_failpoint after dropping the read guard
// 3. delete all data, vacuum, checkpoint
// 4. gc
//
// now gc deletes the layers, and when we finally get to writing our results back in
// finish_compact_batch, LayerManager::finish_compact_l0 will panic in
// LayerMap::remove_historic_noflush or LayerFileManager::remove when deleting a layer
// which no longer exists.
//
// is the easy solution just to make the deletions from compaction more lenient? gc
// currently holds the write lock for its whole run, so it cannot hit this problem. if gc
// were loosened to take the read lock only momentarily and the write lock just for
// applying its results, it would have the mirror-image issue of trying to gc layers
// which have just been compacted away.
stats.read_lock_held_compute_holes_micros = stats.read_lock_held_key_sort_micros.till_now();
drop_rlock(guard);
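
To make the interleaving in steps 0-4 of the comment concrete, here is a minimal stand-alone sketch of the race, under the assumption that a strict removal asserts the layer is still present. The names below (Layers, remove_strict, the two barriers standing in for the pausable_failpoint) are illustrative stand-ins, not the pageserver's actual LayerManager types or test harness.

// Hypothetical sketch: `Layers` stands in for LayerMap/LayerFileManager, the two
// barriers stand in for the pausable_failpoint, and the assert plays the role of
// the panic in finish_compact_l0. Not the pageserver's real types or test harness.
use std::collections::HashSet;
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

#[derive(Default)]
struct Layers {
    historic: HashSet<String>,
}

impl Layers {
    /// Strict removal: panics if the layer is already gone.
    fn remove_strict(&mut self, name: &str) {
        assert!(self.historic.remove(name), "layer {name} was already removed");
    }
}

fn main() {
    let layers = Arc::new(RwLock::new(Layers::default()));
    layers.write().unwrap().historic.insert("L0-000".to_owned());

    // planned: compaction has computed its victims and dropped the read lock
    // resumed: the test lets compaction continue past the "failpoint"
    let planned = Arc::new(Barrier::new(2));
    let resumed = Arc::new(Barrier::new(2));

    let compaction = {
        let (layers, planned, resumed) = (layers.clone(), planned.clone(), resumed.clone());
        thread::spawn(move || {
            // plan the compaction under the read lock, then drop the guard (steps 1-2)
            let victims: Vec<String> = layers.read().unwrap().historic.iter().cloned().collect();
            planned.wait(); // parked at the pausable_failpoint stand-in
            resumed.wait();

            // write the results back: the strict removal panics because gc won the race
            let mut guard = layers.write().unwrap();
            for name in &victims {
                guard.remove_strict(name);
            }
        })
    };

    // gc removes the same layer while compaction is parked (steps 3-4)
    planned.wait();
    layers.write().unwrap().historic.remove("L0-000");
    resumed.wait();

    // the compaction thread panicked in remove_strict, mirroring finish_compact_l0
    assert!(compaction.join().is_err());
    println!("compaction panicked after gc removed its layer, as described above");
}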
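
On the "more lenient deletions" question, one possible shape, sketched with the same stand-in types rather than the real LayerMap/LayerFileManager signatures, is to have removal report whether the layer was still present and let compaction treat an already-removed layer as "gc got there first":

// Hypothetical sketch of "more lenient deletions": removal reports whether the
// layer was still present instead of asserting, and compaction treats "already
// gone" as "gc got there first". Again, not the real LayerManager API.
use std::collections::HashSet;

#[derive(Default)]
struct Layers {
    historic: HashSet<String>,
}

impl Layers {
    /// Lenient removal: returns whether the layer was still present.
    fn remove_lenient(&mut self, name: &str) -> bool {
        self.historic.remove(name)
    }
}

fn finish_compact(layers: &mut Layers, victims: &[String]) {
    for name in victims {
        if !layers.remove_lenient(name) {
            // between the read-locked planning phase and this write-back someone
            // else (gc) removed the layer; log it and carry on instead of panicking
            eprintln!("layer {name} already removed, assuming gc deleted it");
        }
    }
}

fn main() {
    let mut layers = Layers::default();
    layers.historic.insert("L0-000".to_owned());

    // gc deletes the layer before compaction writes its results back
    layers.historic.remove("L0-000");

    // with the strict variant this would panic; the lenient one just logs
    finish_compact(&mut layers, &["L0-000".to_owned()]);
}

As the comment notes, gc holding the write lock for its whole run means the race cannot happen today; the leniency would only start to matter if either side released the lock between planning and applying its results.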