lancedb/docs/src/js/interfaces/LsmWriteSpec.md
Heng Ge 048f52c2aa feat(table): route merge_insert through the MemWAL LSM write path (#3354)
## Summary

When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert` upsert calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style append) instead of the standard merge path.

- **`use_lsm_write`**: a `merge_insert` builder option, default `true`; set it to `false` to use the standard path for that call even when a spec is set.
- **`assume_pre_sharded`**: a `merge_insert` builder option, default `false`; skips the per-row shard check and routes the whole input by its first row only.
- **`close_lsm_writers`**: drains and closes the table's cached MemWAL shard writers.
- The `merge_insert` **`on`** columns default to, and are validated against, the table's unenforced primary key.
- Shard writers are cached alongside the dataset (in `DatasetConsistencyWrapper`) and reused for the rest of the session.
- `MergeResult` gains **`num_rows`**: on the LSM path the insert/update breakdown is unknown until compaction, so only the total is reported.
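The option defaults and routing rules above can be modeled as a small planning step. This is a sketch under stated assumptions, not LanceDB's implementation. `planMergeInsert`, `MergeInsertOptions`, and `MergeInsertPlan` are hypothetical names. The sketch also assumes that validating `on` against the primary key means `on` must name exactly the primary-key columns.

```typescript
// Hypothetical model of how a merge_insert call picks its write path.
// None of these names are LanceDB APIs.
type WritePath = "lsm" | "standard";

interface MergeInsertOptions {
  on?: string[]; // merge key columns
  useLsmWrite?: boolean; // defaults to true
  assumePreSharded?: boolean; // defaults to false
}

interface MergeInsertPlan {
  path: WritePath;
  on: string[];
  perRowShardCheck: boolean;
}

function planMergeInsert(
  opts: MergeInsertOptions,
  primaryKey: string[],
  lsmSpecInstalled: boolean,
): MergeInsertPlan {
  const useLsmWrite = opts.useLsmWrite ?? true;
  const assumePreSharded = opts.assumePreSharded ?? false;

  // The LSM path needs an installed spec AND a call that has not opted out.
  const path: WritePath = lsmSpecInstalled && useLsmWrite ? "lsm" : "standard";

  // `on` defaults to the unenforced primary key.
  const on = opts.on ?? primaryKey;

  // Assumption: on the LSM path, `on` must name exactly the primary-key columns.
  if (path === "lsm") {
    const matches =
      on.length === primaryKey.length && on.every((c) => primaryKey.includes(c));
    if (!matches) {
      throw new Error(
        `on [${on.join(", ")}] does not match primary key [${primaryKey.join(", ")}]`,
      );
    }
  }

  // The per-row shard check runs only on the LSM path, unless pre-sharded.
  return { path, on, perRowShardCheck: path === "lsm" && !assumePreSharded };
}
```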

Routing covers all three sharding strategies: bucket (murmur3, Iceberg-compatible), identity, and unsharded. Each `merge_insert` call targets a single shard; the whole input is collected and validated before a single atomic `ShardWriter::put`, so a validation failure leaves the MemWAL untouched.
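The single-shard, validate-then-put flow described above can be sketched with an in-memory stand-in. `InMemoryShardWriter`, `ShardWriterCache`, `routeToSingleShard`, and `Row` are hypothetical. They loosely model Lance's `ShardWriter` and the writer cache kept in `DatasetConsistencyWrapper`. Here `close()` only marks a writer closed, because the in-memory writer has nothing to drain. The shard function is passed in, so any of the three strategies can be plugged in.

```typescript
type Row = Record<string, unknown>;

// Hypothetical stand-in for a MemWAL shard writer: an append-only log.
class InMemoryShardWriter {
  readonly entries: Row[][] = [];
  private closed = false;

  // One put appends the whole batch as a single entry (the "atomic put").
  put(batch: Row[]): void {
    if (this.closed) throw new Error("writer is closed");
    this.entries.push([...batch]);
  }

  // Stands in for drain-and-close; there is nothing buffered to flush here.
  close(): void {
    this.closed = true;
  }
}

// Hypothetical per-table cache: one writer per shard, reused across calls.
class ShardWriterCache {
  private writers = new Map<string, InMemoryShardWriter>();

  get(shardId: string): InMemoryShardWriter {
    let w = this.writers.get(shardId);
    if (!w) {
      w = new InMemoryShardWriter();
      this.writers.set(shardId, w);
    }
    return w;
  }

  closeAll(): void {
    this.writers.forEach((w) => w.close());
    this.writers.clear();
  }
}

// Collect and validate the whole input first, then issue exactly one put.
function routeToSingleShard(
  rows: Row[],
  shardOf: (row: Row) => string,
  cache: ShardWriterCache,
  assumePreSharded = false,
): string {
  const first = rows[0];
  if (first === undefined) throw new Error("empty input");
  const shardId = shardOf(first);
  if (!assumePreSharded) {
    for (const row of rows) {
      const other = shardOf(row);
      if (other !== shardId) {
        // Nothing has been written yet, so the log is untouched.
        throw new Error(`input spans multiple shards: ${shardId}, ${other}`);
      }
    }
  }
  cache.get(shardId).put(rows);
  return shardId;
}
```

Every row is checked before `put` is called, so a rejected batch never reaches the log. With `assumePreSharded`, the check is skipped and the caller is trusted to have grouped rows by shard.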

Bindings: Python (`merge_insert(...).use_lsm_write(...)` /
`.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript
(`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`,
`Table.closeLsmWriters`).

## Context

Reconstructed from the original #3354 branch onto current `main`: the branch predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`) split and has been rebuilt on that merged foundation. Depends on Lance `v7.0.0-beta.13`.

The MemWAL read path (reading unflushed shard data back into queries) and remote (LanceDB Cloud) LSM support are follow-ups.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-05-29 08:48:11 -07:00



# Interface: LsmWriteSpec

Specification selecting Lance's MemWAL LSM-style write path for `mergeInsert`.

`specType` is `"bucket"`, `"identity"`, or `"unsharded"`. For `"bucket"`, `column` and `numBuckets` are required. For `"identity"`, `column` is required and must be a deterministic function of the unenforced primary key: every row with a given primary key must always produce the same `column` value. Otherwise, upserts of that key can land in different shards, and a stale version can win.
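A batch can be screened for violations of the identity requirement before it is written. `checkIdentityDeterminism` is a hypothetical helper, not a LanceDB API. It compares identity-column values with `!==`, so it only suits primitive values. It also only catches conflicts within a single batch, not across separate `mergeInsert` calls.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper: returns the primary keys (JSON-encoded) whose rows
// disagree on the identity column within this batch.
function checkIdentityDeterminism(
  rows: Row[],
  primaryKey: string[],
  identityColumn: string,
): string[] {
  const seen = new Map<string, unknown>();
  const conflicts = new Set<string>();
  for (const row of rows) {
    // Encode composite keys as JSON so ["a", "b"] and ["ab"] cannot collide.
    const key = JSON.stringify(primaryKey.map((c) => row[c]));
    const value = row[identityColumn];
    if (!seen.has(key)) {
      seen.set(key, value);
    } else if (seen.get(key) !== value) {
      // Same primary key, different shard value: upserts could diverge.
      conflicts.add(key);
    }
  }
  return Array.from(conflicts);
}
```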

## Properties

### column?

`optional column: string;`

Bucket and identity variants: the sharding column.


### maintainedIndexes?

`optional maintainedIndexes: string[];`

Names of indexes the MemWAL should keep up to date during writes.


### numBuckets?

`optional numBuckets: number;`

Bucket variant: the number of buckets, in [1, 1024].
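The PR describes bucket routing as murmur3-based and Iceberg-compatible. The sketch below follows the bucket transform in the Apache Iceberg spec, `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`. It hashes integers as 8-byte little-endian longs and strings as UTF-8 bytes. Whether LanceDB uses exactly this byte encoding for every column type is an assumption. `murmur3X86_32`, `icebergHash`, and `icebergBucket` are illustrative names.

```typescript
// MurmurHash3 x86 32-bit, returning a signed 32-bit int (as Java does).
function murmur3X86_32(bytes: Uint8Array, seed = 0): number {
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  let h = seed | 0;
  const nblocks = bytes.length >>> 2;

  const mixK = (k: number): number => {
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    return Math.imul(k, c2);
  };

  // Body: 4-byte little-endian blocks.
  for (let i = 0; i < nblocks; i++) {
    const o = i * 4;
    const k = bytes[o] | (bytes[o + 1] << 8) | (bytes[o + 2] << 16) | (bytes[o + 3] << 24);
    h ^= mixK(k);
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: the remaining 1-3 bytes.
  const tail = nblocks * 4;
  const rem = bytes.length & 3;
  if (rem > 0) {
    let k = 0;
    if (rem === 3) k ^= bytes[tail + 2] << 16;
    if (rem >= 2) k ^= bytes[tail + 1] << 8;
    k ^= bytes[tail];
    h ^= mixK(k);
  }

  // Finalization mix (fmix32).
  h ^= bytes.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h | 0;
}

// Iceberg-style hash: integer values as 8-byte little-endian longs, strings as UTF-8.
function icebergHash(value: number | bigint | string): number {
  if (typeof value === "string") {
    return murmur3X86_32(new TextEncoder().encode(value));
  }
  const view = new DataView(new ArrayBuffer(8));
  view.setBigInt64(0, BigInt(value), true); // throws for non-integer numbers
  return murmur3X86_32(new Uint8Array(view.buffer));
}

function icebergBucket(value: number | bigint | string, numBuckets: number): number {
  if (!Number.isInteger(numBuckets) || numBuckets < 1 || numBuckets > 1024) {
    throw new RangeError("numBuckets must be an integer in [1, 1024]");
  }
  // Clear the sign bit (Java's Integer.MAX_VALUE mask), then take the remainder.
  return (icebergHash(value) & 0x7fffffff) % numBuckets;
}
```

Masking the sign bit before `%` keeps the result in `[0, numBuckets)` even when the signed hash is negative.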


### specType

`specType: "bucket" | "identity" | "unsharded";`

One of `"bucket"`, `"identity"`, or `"unsharded"`.


### writerConfigDefaults?

`optional writerConfigDefaults: Record<string, string>;`

Default `ShardWriter` configuration recorded in the MemWAL index.
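The constraints stated on this page can be checked before a spec is installed. `validateLsmWriteSpec` is a hypothetical helper written against the interface as documented here. LanceDB validates specs itself, and this sketch does not reproduce its error messages or any checks not listed above.

```typescript
// The interface as documented on this page.
interface LsmWriteSpec {
  specType: "bucket" | "identity" | "unsharded";
  column?: string;
  numBuckets?: number;
  maintainedIndexes?: string[];
  writerConfigDefaults?: Record<string, string>;
}

// Hypothetical helper: returns a list of problems; empty means the spec looks valid.
function validateLsmWriteSpec(spec: LsmWriteSpec): string[] {
  const problems: string[] = [];

  // "bucket" and "identity" both need a sharding column.
  const needsColumn = spec.specType === "bucket" || spec.specType === "identity";
  if (needsColumn && !spec.column) {
    problems.push(`"${spec.specType}" requires column`);
  }

  // "bucket" also needs an integer bucket count in [1, 1024].
  if (spec.specType === "bucket") {
    const n = spec.numBuckets;
    if (n === undefined) {
      problems.push(`"bucket" requires numBuckets`);
    } else if (!Number.isInteger(n) || n < 1 || n > 1024) {
      problems.push(`numBuckets must be an integer in [1, 1024], got ${n}`);
    }
  }
  return problems;
}
```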