refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath (#3026)

refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath

Before this patch, we would sometimes carry around plain file names in
`Path` types and/or awkwardly "rebase" paths to have a unified
representation of the layer file name between local and remote.

This patch introduces a new type `LayerFileName` which replaces the use
of `Path` / `PathBuf` / `RemotePath` in the `storage_sync2` APIs.

Instead of holding a string, it contains the parsed representation of
the image and delta file name.
When we need the file name, e.g., to construct a local path or
remote object key, we construct the name ad-hoc.

`LayerFileName` is also serde {Dese,Se}rializable, and in an initial   
version of this patch, it was supposed to be used directly inside      
`IndexPart`, replacing `RemotePath`.                                   
However,                                                               
  commit 3122f3282f                      
      Ignore backup files (ones with .n.old suffix) in download_missing
fixed handling of `*.old` backup file names in IndexPart, and we need  
to carry that behavior forward.                                        
The solution is to remove `*.old` backup files names during            
deserialization. When we re-serialize the IndexPart, the `*.old` file  
will be gone.                                                          
This leaks the `.old` file in the remote storage, but makes it safe    
to clean it up later.           

There is additional churn by a preliminary refactoring that got squashed
into this change:

   split off LayerMap's needs from trait Layer into super trait

That refactoring renames `Layer` to `PersistentLayer` and splits off a subset
of the functions into a super-trait called `Layer`.
The upser trait implements just the functions needed by `LayerMap`, whereas
`PersisentLayer` adds the context of the pageserver.

The naming is imperfect as some functions that reside in `PersistentLayer`
have nothing persistence-specific to it. But it's a step in the right direction.
This commit is contained in:
Christian Schwarz
2022-12-13 00:27:59 +01:00
committed by GitHub
parent d1edc8aa00
commit 22ae67af8d
15 changed files with 804 additions and 652 deletions

View File

@@ -411,12 +411,10 @@ def test_tenant_ignores_backup_file(
env.postgres.stop_all()
env.pageserver.stop()
# file is still mentioned in the index. Removing it requires more hacking on remote queue initialization
# Will be easier to do once there will be no .download_missing so it will be only one cycle through the layers
# in load_layer_map
# the .old file is gone from newly serialized index_part
new_index_part = local_fs_index_part(env, tenant_id, timeline_id)
backup_layers = filter(lambda x: x.endswith(".old"), new_index_part["timeline_layers"])
assert len(list(backup_layers)) == 1
assert len(list(backup_layers)) == 0
@pytest.mark.parametrize("remote_storage_kind", [RemoteStorageKind.LOCAL_FS])