## Summary

When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert` upsert calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style append) instead of the standard merge path.

- **`use_lsm_write`**: a `merge_insert` builder option, default `true`; set it `false` to use the standard path for a call even when a spec is set.
- **`assume_pre_sharded`**: a `merge_insert` builder option, default `false`; skips the per-row shard check and routes by the first row only.
- **`close_lsm_writers`**: drains and closes the table's cached MemWAL shard writers.
- The `merge_insert` **`on`** columns default to, and are validated against, the table's unenforced primary key.
- Shard writers are cached alongside the dataset (in `DatasetConsistencyWrapper`) and reused for the session.
- `MergeResult` gains **`num_rows`**: on the LSM path the insert/update breakdown is unknown until compaction, so only the total is reported.

Routing covers all three sharding strategies: bucket (murmur3, Iceberg-compatible), identity, and unsharded. Each `merge_insert` call targets a single shard; the whole input is collected and validated before a single atomic `ShardWriter::put`, so a validation failure leaves the MemWAL untouched.

Bindings: Python (`merge_insert(...).use_lsm_write(...)` / `.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript (`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`, `Table.closeLsmWriters`).

## Context

Reconstructed from the original #3354 branch onto current `main`: the branch predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`) split and has been rebuilt on that merged foundation. Depends on Lance `v7.0.0-beta.13`.

The MemWAL read path (reading un-flushed shard data back into queries) and remote (LanceDB Cloud) LSM support are follow-ups.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
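The summary lists murmur3, Iceberg-compatible bucketing as one routing strategy. Below is a minimal sketch of the bucket transform as the Apache Iceberg spec defines it: 32-bit x86 MurmurHash3 with seed 0, integers hashed as 8-byte little-endian longs, strings hashed as their UTF-8 bytes, and `bucket = (hash & Integer.MAX_VALUE) % N`. It assumes Lance's bucket sharding follows that definition; every function name here is hypothetical and not part of Lance or LanceDB.

```typescript
// Sketch of the Iceberg bucket transform. Assumption: Lance's "bucket" sharding
// follows the Iceberg spec. All helper names are hypothetical.

const MURMUR_C1 = 0xcc9e2d51;
const MURMUR_C2 = 0x1b873593;

// 32-bit rotate left.
function rotl32(x: number, r: number): number {
  return (x << r) | (x >>> (32 - r));
}

// MurmurHash3 x86 32-bit, returned as a signed 32-bit integer
// (matching the Java int results the Iceberg spec is written against).
function murmur3x86_32(bytes: Uint8Array, seed = 0): number {
  let h = seed | 0;
  const nBlocks = bytes.length >>> 2;
  // Body: mix each 4-byte little-endian block into the state.
  for (let i = 0; i < nBlocks; i++) {
    const o = i * 4;
    let k = bytes[o] | (bytes[o + 1] << 8) | (bytes[o + 2] << 16) | (bytes[o + 3] << 24);
    k = Math.imul(k, MURMUR_C1);
    k = rotl32(k, 15);
    k = Math.imul(k, MURMUR_C2);
    h ^= k;
    h = rotl32(h, 13);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }
  // Tail: mix the remaining 1 to 3 bytes, if any.
  const tail = nBlocks * 4;
  const rem = bytes.length & 3;
  if (rem > 0) {
    let k1 = 0;
    if (rem === 3) k1 ^= bytes[tail + 2] << 16;
    if (rem >= 2) k1 ^= bytes[tail + 1] << 8;
    k1 ^= bytes[tail];
    k1 = Math.imul(k1, MURMUR_C1);
    k1 = rotl32(k1, 15);
    k1 = Math.imul(k1, MURMUR_C2);
    h ^= k1;
  }
  // Finalization: fold in the length, then avalanche the bits.
  h ^= bytes.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h | 0;
}

// Encode a safe integer as an 8-byte little-endian two's-complement long.
function longBytesLE(v: number): Uint8Array {
  if (!Number.isSafeInteger(v)) throw new RangeError("expected a safe integer");
  const TWO_32 = 0x100000000;
  const buf = new Uint8Array(8);
  const view = new DataView(buf.buffer);
  view.setUint32(0, ((v % TWO_32) + TWO_32) % TWO_32, true); // low 32 bits
  view.setInt32(4, Math.floor(v / TWO_32), true); // high 32 bits
  return buf;
}

// Iceberg bucket: clear the sign bit, then take the remainder.
function icebergBucket(hash: number, numBuckets: number): number {
  return (hash & 0x7fffffff) % numBuckets;
}

// Route a key: integers hash as longs, strings as UTF-8 bytes.
function bucketForKey(key: number | string, numBuckets: number): number {
  const bytes = typeof key === "number" ? longBytesLE(key) : new TextEncoder().encode(key);
  return icebergBucket(murmur3x86_32(bytes), numBuckets);
}
```

For example, `bucketForKey(34, 16)` yields bucket 3: the Iceberg spec's test vector for the integer 34 is the hash 2017239379, which is 3 modulo 16. Clearing the sign bit (rather than taking an absolute value) keeps the result in `[0, N)` for every hash, including the minimum 32-bit integer.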
## Class: MergeInsertBuilder

Package: `@lancedb/lancedb`

A builder used to create and run a merge insert operation.
### Constructors

#### new MergeInsertBuilder()

`new MergeInsertBuilder(native, schema): MergeInsertBuilder`

Construct a MergeInsertBuilder. Internal use only.

**Parameters**

- `native`: `NativeMergeInsertBuilder`
- `schema`: `Schema<any> | Promise<Schema<any>>`

**Returns**

`MergeInsertBuilder`
### Methods
#### execute()

`execute(data, execOptions?): Promise<MergeResult>`

Executes the merge insert operation.

**Parameters**

- `data`: `Data`
- `execOptions?`: `Partial<WriteExecutionOptions>`

**Returns**

`Promise<MergeResult>`: the merge result.
#### useIndex()

`useIndex(useIndex): MergeInsertBuilder`

Controls whether to use indexes for the merge operation.

When set to `true` (the default), the operation will use an index if available on the join key for improved performance. When set to `false`, it forces a full table scan even if an index exists. This can be useful for benchmarking or when the query optimizer chooses a suboptimal path.

**Parameters**

- `useIndex`: `boolean`. Whether to use indices for the merge operation. Defaults to `true`.

**Returns**

`MergeInsertBuilder`
#### useLsmWrite()

`useLsmWrite(useLsmWrite): MergeInsertBuilder`

Controls whether the merge uses the MemWAL LSM write path.

By default (unset), a `mergeInsert` on a table with an LSM write spec is routed through Lance's MemWAL shard writer, and a table without one uses the standard path. Pass `false` to force the standard path even when a spec is set. Pass `true` to require a spec: `execute()` rejects if none is installed.

**Parameters**

- `useLsmWrite`: `boolean`. Whether to use the LSM write path.

**Returns**

`MergeInsertBuilder`
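The three cases described for `useLsmWrite` reduce to a small decision function. This is a minimal sketch that encodes only the documented rules; `chooseWritePath` and `WritePath` are hypothetical names, not part of `@lancedb/lancedb`, and the sketch throws synchronously where the real library reports the error through its returned promise.

```typescript
// Hypothetical model of the documented useLsmWrite rules; not LanceDB's code.
type WritePath = "lsm" | "standard";

function chooseWritePath(hasLsmWriteSpec: boolean, useLsmWrite?: boolean): WritePath {
  if (useLsmWrite === undefined) {
    // Unset: follow whether the table has an LSM write spec.
    return hasLsmWriteSpec ? "lsm" : "standard";
  }
  if (useLsmWrite) {
    // true: the LSM path is required, so a missing spec is an error.
    if (!hasLsmWriteSpec) {
      throw new Error("useLsmWrite(true) requires an LSM write spec on the table");
    }
    return "lsm";
  }
  // false: always the standard path, even when a spec is set.
  return "standard";
}
```

In application code the choice is made on the builder returned by `table.mergeInsert(...)`, for example `.useLsmWrite(false)`, before calling `execute(data)`.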
#### validateSingleShard()

`validateSingleShard(validateSingleShard): MergeInsertBuilder`

Controls how an LSM merge checks that its input targets a single shard.

When a table has an LSM write spec, every row in a `mergeInsert` call must route to the same shard. When `true` (the default), every row is inspected to verify this. When `false`, only the first row is inspected and the shard it routes to is used for the whole input: a faster path for callers that have already pre-sharded their input. Has no effect on tables without an LSM write spec.

**Parameters**

- `validateSingleShard`: `boolean`. Whether to check that every row routes to one shard. Defaults to `true`.

**Returns**

`MergeInsertBuilder`
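The check that `validateSingleShard` toggles can be sketched as follows, given a caller-supplied routing function (for instance one of the bucket, identity, or unsharded strategies named in the summary). `resolveShard` is a hypothetical helper, and rejecting an empty input is an assumption of this sketch rather than documented behavior. As in the summary's description of the real path, the whole input is checked before a shard is returned, so a mismatch surfaces before any write would happen.

```typescript
// Hypothetical sketch of the single-shard check; not LanceDB's implementation.
function resolveShard<Row>(
  rows: readonly Row[],
  routeRow: (row: Row) => number,
  validateSingleShard = true,
): number {
  // Assumption of this sketch: an empty input has no shard to target.
  if (rows.length === 0) throw new Error("cannot route an empty input");
  const shard = routeRow(rows[0]);
  if (!validateSingleShard) {
    // Fast path: trust that the caller pre-sharded the input.
    return shard;
  }
  // Full check: every remaining row must route to the first row's shard.
  for (let i = 1; i < rows.length; i++) {
    const other = routeRow(rows[i]);
    if (other !== shard) {
      throw new Error(`row ${i} routes to shard ${other}, expected shard ${shard}`);
    }
  }
  return shard;
}
```

With `validateSingleShard = false` the loop is skipped entirely, which is why mis-sharded input is silently written to the first row's shard on that path.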
#### whenMatchedUpdateAll()

`whenMatchedUpdateAll(options?): MergeInsertBuilder`

Rows that exist in both the source table (new data) and the target table (old data) will be updated, replacing the old row with the corresponding matching row.

If there are multiple matches then the behavior is undefined. Currently this causes multiple copies of the row to be created, but that behavior is subject to change.

An optional condition may be specified. If it is, then only matched rows that satisfy the condition will be updated. Any rows that do not satisfy the condition will be left as they are. Failing to satisfy the condition does not cause a "matched row" to become a "not matched" row.

The condition should be an SQL string. Use the prefix `target.` to refer to rows in the target table (old data) and the prefix `source.` to refer to rows in the source table (new data). For example, `"target.last_update < source.last_update"`.

**Parameters**

- `options?`
  - `options.where?`: `string`. An optional condition limiting which matched rows are updated.

**Returns**

`MergeInsertBuilder`
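To make the matched-row rules concrete, here is a small in-memory model of `whenMatchedUpdateAll({ where })`. It is a sketch, not LanceDB's implementation: rows are plain objects keyed by a hypothetical `id` field, the SQL condition is stood in for by a TypeScript predicate over `(target, source)`, and source keys are assumed unique because the multi-match case is undefined. The model covers only this clause, so source-only rows are not inserted here.

```typescript
// Hypothetical row shape for this sketch.
interface VersionedRow {
  id: number;
  lastUpdate: number;
  value: string;
}

function mergeMatchedUpdateAll(
  target: VersionedRow[],
  source: VersionedRow[],
  where?: (t: VersionedRow, s: VersionedRow) => boolean,
): VersionedRow[] {
  // Index the new data by key; assumes at most one source row per key.
  const sourceById = new Map<number, VersionedRow>();
  for (const s of source) sourceById.set(s.id, s);
  return target.map((t) => {
    const s = sourceById.get(t.id);
    // Not matched: this clause leaves the old row alone.
    if (s === undefined) return t;
    // Matched: replace only if the condition holds; a failed condition keeps
    // the old row and does not turn it into a "not matched" row.
    return where === undefined || where(t, s) ? { ...s } : t;
  });
}
```

The predicate `(t, s) => t.lastUpdate < s.lastUpdate` plays the role of the SQL example `"target.last_update < source.last_update"`.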
#### whenNotMatchedBySourceDelete()

`whenNotMatchedBySourceDelete(options?): MergeInsertBuilder`

Rows that exist only in the target table (old data) will be deleted. An optional condition can be provided to limit what data is deleted.

**Parameters**

- `options?`
  - `options.where?`: `string`. An optional condition to limit what data is deleted.

**Returns**

`MergeInsertBuilder`
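The delete clause can be modeled the same way. In this sketch, which is not LanceDB's implementation, target rows carry a hypothetical `id` key and `archived` flag, and a TypeScript predicate over the target row stands in for the SQL `where` string. A target row survives if the source has its key, or if a condition is given and the row does not satisfy it.

```typescript
// Hypothetical row shape for this sketch.
interface StoredRow {
  id: number;
  archived: boolean;
}

function deleteNotMatchedBySource(
  target: StoredRow[],
  source: readonly { id: number }[],
  where?: (t: StoredRow) => boolean,
): StoredRow[] {
  const sourceIds = new Set<number>(source.map((s) => s.id));
  return target.filter((t) => {
    // Matched by the source: this clause never deletes it.
    if (sourceIds.has(t.id)) return true;
    // Only in the target: delete, unless a condition is given and fails.
    return where !== undefined && !where(t);
  });
}
```

Passing `(t) => t.archived` restricts the deletion to target-only rows that are archived, leaving other target-only rows in place.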
#### whenNotMatchedInsertAll()

`whenNotMatchedInsertAll(): MergeInsertBuilder`

Rows that exist only in the source table (new data) should be inserted into the target table.