feat: a utility for creating "permutation views" (#2552)

I'm working on a lancedb version of pytorch data loading (and hopefully
addressing https://github.com/lancedb/lance/issues/3727).

However, rather than rely on pytorch for everything I'm moving some of
the things that pytorch does into rust. This gives us more control over
data loading (e.g. using shards or a hash-based split) and it allows
permutations to be persistent. In particular I hope to be able to:

* Create a persistent permutation
* This permutation can handle splits, filtering, shuffling, and sharding
* Create a rust data loader that can read a permutation (one or more
splits), or a subset of a permutation (for DDP)
* Create a python data loader that delegates to the rust data loader

Eventually create integrations for other data loading libraries,
including rust & node
This commit is contained in:
Weston Pace
2025-10-09 18:07:31 -07:00
committed by GitHub
parent 3dcec724b7
commit 5a19cf15a6
38 changed files with 3786 additions and 58 deletions

View File

@@ -16,6 +16,9 @@ rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.38.2", default-features = false, "features" = ["dynamodb"] }
lance-core = "=0.38.2"
lance-datagen = "=0.38.2"
lance-file = "=0.38.2"
lance-io = { "version" = "=0.38.2", default-features = false }
lance-index = "=0.38.2"
lance-linalg = "=0.38.2"
@@ -24,6 +27,7 @@ lance-testing = "=0.38.2"
lance-datafusion = "=0.38.2"
lance-encoding = "=0.38.2"
lance-namespace = "0.0.18"
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "56.2", optional = false }
arrow-array = "56.2"
@@ -48,6 +52,7 @@ log = "0.4"
moka = { version = "0.12", features = ["future"] }
object_store = "0.12.0"
pin-project = "1.0.7"
rand = "0.9"
snafu = "0.8"
url = "2"
num-traits = "0.2"