Files
lancedb/docs
Heng Ge 048f52c2aa feat(table): route merge_insert through the MemWAL LSM write path (#3354)
## Summary

When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert`
upsert
calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style
append)
instead of the standard merge path.

- **`use_lsm_write`** — a `merge_insert` builder option, default `true`;
set it
  `false` to use the standard path for a call even when a spec is set.
- **`assume_pre_sharded`** — a `merge_insert` builder option, default
`false`;
  skips the per-row shard check and routes by the first row only.
- **`close_lsm_writers`** — drains and closes the table's cached MemWAL
shard
  writers.
- The `merge_insert` **`on`** columns default to, and are validated
against,
  the table's unenforced primary key.
- Shard writers are cached alongside the dataset (in
  `DatasetConsistencyWrapper`) and reused for the session.
- `MergeResult` gains **`num_rows`** — on the LSM path the insert/update
  breakdown is unknown until compaction, so only the total is reported.

Routing covers all three sharding strategies — bucket (murmur3,
Iceberg-compatible), identity, and unsharded. Each `merge_insert` call
targets
a single shard; the whole input is collected and validated before a
single
atomic `ShardWriter::put`, so a validation failure leaves the MemWAL
untouched.

Bindings: Python (`merge_insert(...).use_lsm_write(...)` /
`.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript
(`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`,
`Table.closeLsmWriters`).

## Context

Reconstructed from the original #3354 branch onto current `main`: the
branch
predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`)
split and
has been rebuilt on that merged foundation. Depends on Lance
`v7.0.0-beta.13`.

The MemWAL read path (reading un-flushed shard data back into queries)
and
remote (LanceDB Cloud) LSM support are follow-ups.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-05-29 08:48:11 -07:00
..

LanceDB Documentation

LanceDB docs are available at docs.lancedb.com.

The SDK docs are built and deployed automatically by Github Actions whenever a commit is pushed to the main branch. So it is possible for the docs to show unreleased features.

Building the docs

Setup

  1. Install LanceDB Python. See setup in Python contributing guide. Run make develop to install the Python package.
  2. Install documentation dependencies. From LanceDB repo root: pip install -r docs/requirements.txt

Preview the docs

cd docs
mkdocs serve

If you want to just generate the HTML files:

PYTHONPATH=. mkdocs build -f docs/mkdocs.yml

If successful, you should see a docs/site directory that you can verify locally.

Adding examples

To make sure examples are correct, we put examples in test files so they can be run as part of our test suites.

You can see the tests are at:

  • Python: python/python/tests/docs
  • Typescript: nodejs/examples/

Checking python examples

cd python
pytest -vv python/tests/docs

Checking typescript examples

The @lancedb/lancedb package must be built before running the tests:

pushd nodejs
npm ci
npm run build
popd

Then you can run the examples by going to the nodejs/examples directory and running the tests like a normal npm package:

pushd nodejs/examples
npm ci
npm test
popd

API documentation

Python

The Python API documentation is organized based on the file docs/src/python/python.md. We manually add entries there so we can control the organization of the reference page. However, this means any new types must be manually added to the file. No additional steps are needed to generate the API documentation.

Typescript

The typescript API documentation is generated from the typescript source code using typedoc.

When new APIs are added, you must manually re-run the typedoc command to update the API documentation. The new files should be checked into the repository.

pushd nodejs
npm run docs
popd