mirror of
https://github.com/lancedb/lancedb.git
synced 2026-05-18 20:40:41 +00:00
## Summary

Split out from #3354

Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec` to install and clear the spec that selects Lance's MemWAL LSM-style write path for `merge_insert`.

`LsmWriteSpec` offers three sharding strategies, all built on Lance's `InitializeMemWalBuilder`:

- `LsmWriteSpec::bucket(column, num_buckets)`: hash-bucket sharding by the single-column unenforced primary key.
- `LsmWriteSpec::identity(column)`: identity sharding by the raw value of a scalar column.
- `LsmWriteSpec::unsharded()`: a single MemWAL shard.

Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL keeps up to date as rows are appended) and `with_writer_config_defaults(...)` (default `ShardWriter` configuration recorded in the MemWAL index, so every writer starts from the same defaults). All variants require the table to have an unenforced primary key.

- `set_lsm_write_spec` installs the spec by initializing the MemWAL index; `unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting to the standard `merge_insert` path. `unset` is idempotent.
- Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`, `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` / `"unsharded"`). `RemoteTable` returns `NotSupported`.

The actual `merge_insert` LSM dispatch and `ShardWriter` write path are a follow-up; this PR only installs and clears the spec.
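As a rough illustration of the three strategies, the sketch below builds the corresponding TypeScript spec objects. It assumes a local copy of the documented `LsmWriteSpec` shape instead of importing it from `@lancedb/lancedb`. The helper functions (`bucketSpec`, `identitySpec`, `unshardedSpec`, `withMaintainedIndexes`, `withWriterConfigDefaults`) are hypothetical and only mirror the Rust/Python constructor names. The column name, index name, and config key are placeholders.

```typescript
// Local mirror of the documented interface; not imported from the library.
interface LsmWriteSpec {
  specType: "bucket" | "identity" | "unsharded";
  column?: string;
  numBuckets?: number;
  maintainedIndexes?: string[];
  writerConfigDefaults?: Record<string, string>;
}

// Hypothetical helpers, one per sharding strategy.
function bucketSpec(column: string, numBuckets: number): LsmWriteSpec {
  return { specType: "bucket", column, numBuckets };
}

function identitySpec(column: string): LsmWriteSpec {
  return { specType: "identity", column };
}

function unshardedSpec(): LsmWriteSpec {
  return { specType: "unsharded" };
}

// Hypothetical refinements; each returns a new object and leaves the input untouched.
function withMaintainedIndexes(spec: LsmWriteSpec, names: string[]): LsmWriteSpec {
  return { ...spec, maintainedIndexes: [...names] };
}

function withWriterConfigDefaults(
  spec: LsmWriteSpec,
  defaults: Record<string, string>,
): LsmWriteSpec {
  return { ...spec, writerConfigDefaults: { ...defaults } };
}

// Placeholder names: "id" (primary-key column), "id_idx" (index), "example_key" (config key).
const bucketed = withWriterConfigDefaults(
  withMaintainedIndexes(bucketSpec("id", 16), ["id_idx"]),
  { example_key: "example_value" },
);
const byIdentity = identitySpec("id");
const single = unshardedSpec();
```

With a live table handle, one of these objects would presumably be passed to `setLsmWriteSpec`; that call is left out here because it needs a real database connection.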
Interface: LsmWriteSpec
Specification selecting Lance's MemWAL LSM-style write path for
`mergeInsert`.
`specType` is `"bucket"`, `"identity"`, or `"unsharded"`. For `"bucket"`,
`column` and `numBuckets` are required; for `"identity"`, `column` is
required.
Properties
column?
optional column: string;
Bucket and identity variants: the sharding column.
maintainedIndexes?
optional maintainedIndexes: string[];
Names of indexes the MemWAL should keep up to date during writes.
numBuckets?
optional numBuckets: number;
Bucket variant: the number of buckets, in [1, 1024].
specType
specType: "bucket" | "identity" | "unsharded";
One of "bucket", "identity", or "unsharded".
writerConfigDefaults?
optional writerConfigDefaults: Record<string, string>;
Default ShardWriter configuration recorded in the MemWAL index.
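The per-variant requirements described above can be checked on the client before calling into the library. The following is a minimal sketch, not part of `@lancedb/lancedb`: `checkLsmWriteSpec` is a hypothetical helper, the interface is a local copy of the documented shape, and it assumes `numBuckets` must be an integer and that an empty `column` string counts as missing.

```typescript
// Local copy of the documented interface; not imported from the library.
interface LsmWriteSpec {
  specType: "bucket" | "identity" | "unsharded";
  column?: string;
  numBuckets?: number;
  maintainedIndexes?: string[];
  writerConfigDefaults?: Record<string, string>;
}

// Hypothetical helper: returns a list of problems, empty when the spec
// satisfies the documented constraints.
function checkLsmWriteSpec(spec: LsmWriteSpec): string[] {
  const problems: string[] = [];

  // "bucket" and "identity" both need a sharding column.
  const needsColumn = spec.specType === "bucket" || spec.specType === "identity";
  if (needsColumn && (spec.column === undefined || spec.column === "")) {
    problems.push(`column is required for specType "${spec.specType}"`);
  }

  // "bucket" additionally needs numBuckets in the documented range [1, 1024].
  if (spec.specType === "bucket") {
    const n = spec.numBuckets;
    if (n === undefined) {
      problems.push('numBuckets is required for specType "bucket"');
    } else if (!Number.isInteger(n) || n < 1 || n > 1024) {
      // Integer-only is an assumption; the docs state only the range.
      problems.push("numBuckets must be an integer in [1, 1024]");
    }
  }

  return problems;
}

// Usage: a valid bucket spec and one that exceeds the bucket limit.
const okProblems = checkLsmWriteSpec({ specType: "bucket", column: "id", numBuckets: 16 });
const tooManyProblems = checkLsmWriteSpec({ specType: "bucket", column: "id", numBuckets: 2048 });
```

Returning a list instead of throwing lets a caller report every problem at once. The library may well perform its own validation and report errors differently.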