## Summary

When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert` upsert calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style append) instead of the standard merge path.

- **`use_lsm_write`**: a `merge_insert` builder option, default `true`; set it `false` to use the standard path for a call even when a spec is set.
- **`assume_pre_sharded`**: a `merge_insert` builder option, default `false`; skips the per-row shard check and routes by the first row only.
- **`close_lsm_writers`**: drains and closes the table's cached MemWAL shard writers.
- The `merge_insert` **`on`** columns default to, and are validated against, the table's unenforced primary key.
- Shard writers are cached alongside the dataset (in `DatasetConsistencyWrapper`) and reused for the session.
- `MergeResult` gains **`num_rows`**: on the LSM path the insert/update breakdown is unknown until compaction, so only the total is reported.

Routing covers all three sharding strategies: bucket (murmur3, Iceberg-compatible), identity, and unsharded. Each `merge_insert` call targets a single shard; the whole input is collected and validated before a single atomic `ShardWriter::put`, so a validation failure leaves the MemWAL untouched.

Bindings: Python (`merge_insert(...).use_lsm_write(...)` / `.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript (`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`, `Table.closeLsmWriters`).

## Context

Reconstructed from the original #3354 branch onto current `main`: the branch predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`) split and has been rebuilt on that merged foundation.

Depends on Lance `v7.0.0-beta.13`. The MemWAL read path (reading un-flushed shard data back into queries) and remote (LanceDB Cloud) LSM support are follow-ups.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
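The single-shard, validate-then-put behavior described in the summary can be illustrated with a small toy model. This is a sketch only, not the Lance or LanceDB implementation: `ToyShardWriter`, `lsm_merge_insert`, and `shard_of` are hypothetical names, the in-memory dictionary stands in for the MemWAL, and routing uses a plain identity key rather than the murmur3 bucket strategy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List

Row = Dict[str, object]


@dataclass
class ToyShardWriter:
    """Hypothetical stand-in for a shard writer: one in-memory list per shard."""

    shards: Dict[Hashable, List[Row]] = field(default_factory=dict)

    def put(self, shard: Hashable, rows: List[Row]) -> None:
        # The whole batch is appended to one shard in a single call.
        self.shards.setdefault(shard, []).extend(rows)


def lsm_merge_insert(
    writer: ToyShardWriter,
    rows: List[Row],
    on: List[str],
    shard_of: Callable[[Row], Hashable],
    assume_pre_sharded: bool = False,
) -> int:
    """Validate the full input first, then issue exactly one put.

    Returns only the total row count, since a toy append log cannot
    tell inserts from updates either.
    """
    if not rows:
        return 0
    # Every row must carry the key columns named in `on`.
    for row in rows:
        missing = [col for col in on if col not in row]
        if missing:
            raise ValueError(f"row is missing key columns: {missing}")
    # Route by the first row; check the rest unless the caller vouches for them.
    target = shard_of(rows[0])
    if not assume_pre_sharded:
        for row in rows:
            if shard_of(row) != target:
                raise ValueError("input spans more than one shard")
    # Nothing has been written up to this point, so a failure above
    # leaves the writer untouched.
    writer.put(target, list(rows))
    return len(rows)


def by_region(row: Row) -> Hashable:
    # Identity-style sharding on a hypothetical "region" column.
    return row["region"]


writer = ToyShardWriter()
total = lsm_merge_insert(
    writer,
    [{"id": 1, "region": "eu"}, {"id": 2, "region": "eu"}],
    on=["id"],
    shard_of=by_region,
)
```

Because validation finishes before the only `put`, a batch that mixes shards raises without changing `writer.shards`. Passing `assume_pre_sharded=True` skips the per-row check in this model, so a mixed batch would be written wholesale to the first row's shard.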
LanceDB Documentation
LanceDB docs are available at docs.lancedb.com.
The SDK docs are built and deployed automatically by GitHub Actions
whenever a commit is pushed to the main branch, so the docs may show
unreleased features.
Building the docs
Setup
- Install LanceDB Python. See setup in Python contributing guide. Run `make develop` to install the Python package.
- Install documentation dependencies. From LanceDB repo root:
pip install -r docs/requirements.txt
Preview the docs
cd docs
mkdocs serve
To generate only the HTML files:
PYTHONPATH=. mkdocs build -f docs/mkdocs.yml
If successful, you should see a docs/site directory that you can verify locally.
Adding examples
To make sure examples are correct, we put examples in test files so they can be run as part of our test suites.
The tests are located at:
- Python: python/python/tests/docs
- TypeScript: nodejs/examples/
Checking Python examples
cd python
pytest -vv python/tests/docs
Checking TypeScript examples
The @lancedb/lancedb package must be built before running the tests:
pushd nodejs
npm ci
npm run build
popd
Then you can run the examples by going to the nodejs/examples directory and
running the tests like a normal npm package:
pushd nodejs/examples
npm ci
npm test
popd
API documentation
Python
The Python API documentation is organized based on the file docs/src/python/python.md.
We manually add entries there so we can control the organization of the reference page.
However, this means any new types must be manually added to the file. No additional
steps are needed to generate the API documentation.
TypeScript
The TypeScript API documentation is generated from the TypeScript source code using typedoc.
When new APIs are added, you must manually re-run the typedoc command to update the API documentation. The regenerated files should be checked into the repository.
pushd nodejs
npm run docs
popd