lancedb/docs/src/js/interfaces/LsmWriteSpec.md
Heng Ge 0d30b31998 feat: support setting LSM write spec for a table (#3396)
## Summary

Split out from #3354

Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec`
to install and clear the spec that selects Lance's MemWAL LSM-style write path
for `merge_insert`.

`LsmWriteSpec` offers three sharding strategies, all built on Lance's
`InitializeMemWalBuilder`:

- `LsmWriteSpec::bucket(column, num_buckets)`: hash-bucket sharding by the
  single-column unenforced primary key.
- `LsmWriteSpec::identity(column)`: identity sharding by the raw value of a
  scalar column.
- `LsmWriteSpec::unsharded()`: a single MemWAL shard.
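
The three strategies differ only in how a row's key is mapped to a shard. The following is a toy sketch of that mapping, not Lance's implementation: the FNV-1a hash is a stand-in (the hash Lance actually uses is not stated here), and `ShardingStrategy`, `fnv1a`, `shardFor`, and the shard labels are hypothetical names introduced for illustration.

```typescript
// Illustrative only: the hash function and shard labels are assumptions,
// not Lance's actual MemWAL behavior.
type ShardingStrategy =
  | { kind: "bucket"; numBuckets: number }
  | { kind: "identity" }
  | { kind: "unsharded" };

// FNV-1a 32-bit hash, used purely as a stand-in hash function.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a key value to a shard label under each strategy.
function shardFor(strategy: ShardingStrategy, key: string | number): string {
  switch (strategy.kind) {
    case "bucket":
      // Hash the key, then fold it into one of numBuckets buckets.
      return `bucket-${fnv1a(String(key)) % strategy.numBuckets}`;
    case "identity":
      // One shard per distinct raw value of the column.
      return `value-${String(key)}`;
    case "unsharded":
      // Every row lands in the same single shard.
      return "single";
  }
}
```

Under this model, bucket sharding bounds the number of shards at `numBuckets` regardless of key cardinality, whereas identity sharding produces one shard per distinct column value.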

Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL
keeps up to date as rows are appended) and `with_writer_config_defaults(...)`
(default `ShardWriter` configuration recorded in the MemWAL index, so every
writer starts from the same defaults). All variants require the table to have
an unenforced primary key.

- `set_lsm_write_spec` installs the spec by initializing the MemWAL index;
  `unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting to
  the standard `merge_insert` path. `unset` is idempotent.
- Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`,
  `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript
  (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` /
  `"unsharded"`). `RemoteTable` returns `NotSupported`.

The actual `merge_insert` LSM dispatch and `ShardWriter` write path are a
follow-up; this PR only installs and clears the spec.
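
The install and clear semantics described in this summary can be modelled with a small in-memory stand-in. This is a sketch under stated assumptions, not LanceDB code: `FakeTable` and `usesLsmWritePath` are hypothetical, `unsetLsmWriteSpec` is assumed to be the TypeScript counterpart of `unset_lsm_write_spec`, and overwriting an already-installed spec is an assumption, since the summary does not say what happens in that case.

```typescript
type SpecType = "bucket" | "identity" | "unsharded";

// Hypothetical in-memory model of a table; it does not touch LanceDB.
class FakeTable {
  private installedSpec: SpecType | undefined = undefined;
  private readonly hasUnenforcedPrimaryKey: boolean;

  constructor(hasUnenforcedPrimaryKey: boolean) {
    this.hasUnenforcedPrimaryKey = hasUnenforcedPrimaryKey;
  }

  // Installing a spec requires an unenforced primary key on the table.
  setLsmWriteSpec(specType: SpecType): void {
    if (!this.hasUnenforcedPrimaryKey) {
      throw new Error("table has no unenforced primary key");
    }
    // Assumption: a second install simply overwrites the first.
    this.installedSpec = specType;
  }

  // Clearing is idempotent: calling it with nothing installed is a no-op.
  unsetLsmWriteSpec(): void {
    this.installedSpec = undefined;
  }

  // True while a spec is installed, i.e. the LSM write path is selected.
  usesLsmWritePath(): boolean {
    return this.installedSpec !== undefined;
  }
}

const demoTable = new FakeTable(true);
demoTable.setLsmWriteSpec("unsharded");
demoTable.unsetLsmWriteSpec();
demoTable.unsetLsmWriteSpec(); // second call is harmless
```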
2026-05-18 00:11:33 -07:00

Interface: LsmWriteSpec

Specification selecting Lance's MemWAL LSM-style write path for `mergeInsert`.

`specType` is `"bucket"`, `"identity"`, or `"unsharded"`. For `"bucket"`, `column` and `numBuckets` are required; for `"identity"`, `column` is required.

Properties

column?

optional column: string;

Bucket and identity variants: the sharding column.


maintainedIndexes?

optional maintainedIndexes: string[];

Names of indexes the MemWAL should keep up to date during writes.


numBuckets?

optional numBuckets: number;

Bucket variant: the number of buckets, in [1, 1024].


specType

specType: "bucket" | "identity" | "unsharded";

One of "bucket", "identity", or "unsharded".


writerConfigDefaults?

optional writerConfigDefaults: Record<string, string>;

Default ShardWriter configuration recorded in the MemWAL index.
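
Example

A minimal sketch of the three variants and the field rules documented above. The interface is mirrored locally so the snippet compiles on its own; `checkLsmWriteSpec` is a hypothetical helper, not part of `@lancedb/lancedb`, and the column and index names are placeholders. Treating `numBuckets` as an integer is an assumption, since the documentation only gives the range.

```typescript
// Local mirror of the documented interface, so the sketch is standalone.
interface LsmWriteSpec {
  specType: "bucket" | "identity" | "unsharded";
  column?: string;
  numBuckets?: number;
  maintainedIndexes?: string[];
  writerConfigDefaults?: Record<string, string>;
}

// Hypothetical helper: returns a list of problems; an empty list means the
// spec satisfies the documented required-field and range rules.
function checkLsmWriteSpec(spec: LsmWriteSpec): string[] {
  const problems: string[] = [];
  if (spec.specType === "bucket" || spec.specType === "identity") {
    if (!spec.column) {
      problems.push(`"${spec.specType}" requires column`);
    }
  }
  if (spec.specType === "bucket") {
    const n = spec.numBuckets;
    if (n === undefined) {
      problems.push('"bucket" requires numBuckets');
    } else if (!Number.isInteger(n) || n < 1 || n > 1024) {
      // Integer check is an assumption; the range [1, 1024] is documented.
      problems.push("numBuckets must be an integer in [1, 1024]");
    }
  }
  return problems;
}

// One spec per variant; "id", "region", and "id_idx" are placeholder names.
const bucketSpec: LsmWriteSpec = {
  specType: "bucket",
  column: "id",
  numBuckets: 16,
  maintainedIndexes: ["id_idx"],
};
const identitySpec: LsmWriteSpec = { specType: "identity", column: "region" };
const unshardedSpec: LsmWriteSpec = { specType: "unsharded" };
```

A spec that passes this check would then be handed to `setLsmWriteSpec` on a table; the check itself only covers the rules stated on this page, not any validation the library performs.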