mirror of https://github.com/lancedb/lancedb.git synced 2026-06-01 19:30:45 +00:00

Files

Heng Ge 048f52c2aa feat(table): route merge_insert through the MemWAL LSM write path (#3354 )

## Summary

When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert`
upsert
calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style
append)
instead of the standard merge path.

- **`use_lsm_write`** — a `merge_insert` builder option, default `true`;
set it
  `false` to use the standard path for a call even when a spec is set.
- **`assume_pre_sharded`** — a `merge_insert` builder option, default
`false`;
  skips the per-row shard check and routes by the first row only.
- **`close_lsm_writers`** — drains and closes the table's cached MemWAL
shard
  writers.
- The `merge_insert` **`on`** columns default to, and are validated
against,
  the table's unenforced primary key.
- Shard writers are cached alongside the dataset (in
  `DatasetConsistencyWrapper`) and reused for the session.
- `MergeResult` gains **`num_rows`** — on the LSM path the insert/update
  breakdown is unknown until compaction, so only the total is reported.

Routing covers all three sharding strategies — bucket (murmur3,
Iceberg-compatible), identity, and unsharded. Each `merge_insert` call
targets
a single shard; the whole input is collected and validated before a
single
atomic `ShardWriter::put`, so a validation failure leaves the MemWAL
untouched.

Bindings: Python (`merge_insert(...).use_lsm_write(...)` /
`.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript
(`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`,
`Table.closeLsmWriters`).

## Context

Reconstructed from the original #3354 branch onto current `main`: the
branch
predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`)
split and
has been rebuilt on that merged foundation. Depends on Lance
`v7.0.0-beta.13`.

The MemWAL read path (reading un-flushed shard data back into queries)
and
remote (LanceDB Cloud) LSM support are follow-ups.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>

2026-05-29 08:48:11 -07:00

4.6 KiB

Raw Blame History

@lancedb/lancedb • Docs

@lancedb/lancedb / MergeInsertBuilder

Class: MergeInsertBuilder

A builder used to create and run a merge insert operation

Constructors

new MergeInsertBuilder()

new MergeInsertBuilder(native, schema): MergeInsertBuilder

Construct a MergeInsertBuilder. Internal use only.

Parameters

native: NativeMergeInsertBuilder
schema: Schema<any> | Promise<Schema<any>>

Returns

MergeInsertBuilder

Methods

execute()

execute(data, execOptions?): Promise<MergeResult>

Executes the merge insert operation

Parameters

data: Data
execOptions?: Partial<WriteExecutionOptions>

Returns

Promise<MergeResult>

the merge result

useIndex()

useIndex(useIndex): MergeInsertBuilder

Controls whether to use indexes for the merge operation.

When set to true (the default), the operation will use an index if available on the join key for improved performance. When set to false, it forces a full table scan even if an index exists. This can be useful for benchmarking or when the query optimizer chooses a suboptimal path.

Parameters

useIndex: boolean Whether to use indices for the merge operation. Defaults to true.

Returns

MergeInsertBuilder

useLsmWrite()

useLsmWrite(useLsmWrite): MergeInsertBuilder

Controls whether the merge uses the MemWAL LSM write path.

By default (unset), a mergeInsert on a table with an LSM write spec is routed through Lance's MemWAL shard writer, and a table without one uses the standard path. Pass false to force the standard path even when a spec is set. Pass true to require a spec — mergeInsert rejects if none is installed.

Parameters

useLsmWrite: boolean Whether to use the LSM write path.

Returns

MergeInsertBuilder

validateSingleShard()

validateSingleShard(validateSingleShard): MergeInsertBuilder

Controls how an LSM merge checks that its input targets a single shard.

When a table has an LSM write spec, every row in a mergeInsert call must route to the same shard. When true (the default), every row is inspected to verify this. When false, only the first row is inspected and the shard it routes to is used for the whole input — a faster path for callers that have already pre-sharded their input. Has no effect on tables without an LSM write spec.

Parameters

validateSingleShard: boolean Whether to check every row routes to one shard. Defaults to true.

Returns

MergeInsertBuilder

whenMatchedUpdateAll()

whenMatchedUpdateAll(options?): MergeInsertBuilder

Rows that exist in both the source table (new data) and the target table (old data) will be updated, replacing the old row with the corresponding matching row.

If there are multiple matches then the behavior is undefined. Currently this causes multiple copies of the row to be created but that behavior is subject to change.

An optional condition may be specified. If it is, then only matched rows that satisfy the condtion will be updated. Any rows that do not satisfy the condition will be left as they are. Failing to satisfy the condition does not cause a "matched row" to become a "not matched" row.

The condition should be an SQL string. Use the prefix target. to refer to rows in the target table (old data) and the prefix source. to refer to rows in the source table (new data).

For example, "target.last_update < source.last_update"

Parameters

options?
options.where?: string

Returns

MergeInsertBuilder

whenNotMatchedBySourceDelete()

whenNotMatchedBySourceDelete(options?): MergeInsertBuilder

Rows that exist only in the target table (old data) will be deleted. An optional condition can be provided to limit what data is deleted.

Parameters

options?
options.where?: string An optional condition to limit what data is deleted

Returns

MergeInsertBuilder

whenNotMatchedInsertAll()

whenNotMatchedInsertAll(): MergeInsertBuilder

Rows that exist only in the source table (new data) should be inserted into the target table.

Returns

MergeInsertBuilder

4.6 KiB Raw Blame History

Class: MergeInsertBuilder

Constructors

new MergeInsertBuilder()

Parameters

Returns

Methods

execute()

Parameters

Returns

useIndex()

Parameters

Returns

useLsmWrite()

Parameters

Returns

validateSingleShard()

Parameters

Returns

whenMatchedUpdateAll()

Parameters

Returns

whenNotMatchedBySourceDelete()

Parameters

Returns

whenNotMatchedInsertAll()

Returns

4.6 KiB

Raw Blame History