lancedb/docs/src/js/classes/MergeInsertBuilder.md
Alex Pilon f315f9665a feat: implement bindings to return merge stats (#2367)
Based on this comment:
https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075
and https://github.com/lancedb/lance/pull/2357

Here is my attempt at implementing bindings for returning merge stats
from a `merge_insert.execute` call for lancedb.

Note: I have almost no idea what I am doing in Rust but tried to follow
existing code patterns and pay attention to compiler hints.
- The change in the nodejs binding appeared to be necessary to get
compilation to work; presumably this could actually work properly by
returning some kind of NAPI JS object of the stats data?
- I am unsure of what to do with the remote/table.rs changes, which were
necessary for compilation to work. I assume this is related to LanceDB
cloud, but I am unsure of the best way to handle that at this point.

Proof of function:

```python
import pandas as pd
import lancedb


db = lancedb.connect("/tmp/test.db")

test_data = pd.DataFrame(
    {
        "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"],
        "id": [1, 2, 3, 4, 5],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
        ],
    }
)

table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite")

update_data = pd.DataFrame(
    {
        "title": [
            "Hello, World",
            "Test Document, it's good",
            "Example",
            "Data Sample",
            "Last One",
            "New One",
        ],
        "id": [1, 2, 3, 4, 5, 6],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
            "New content",
        ],
    }
)

stats = (
    table.merge_insert(on="id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(update_data)
)

print(stats)
```

returns

```
{'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0}
```

## Summary by CodeRabbit

- **New Features**
  - Merge-insert operations now return detailed statistics, including counts of inserted, updated, and deleted rows.
- **Bug Fixes**
  - Tests updated to validate returned merge-insert statistics for accuracy.
- **Documentation**
  - Method documentation improved to reflect new return values and clarify merge operation results.
  - Added documentation for the new `MergeStats` interface detailing operation statistics.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-05-01 10:00:20 -07:00


Class: MergeInsertBuilder

A builder used to create and run a merge insert operation

Constructors

new MergeInsertBuilder()

new MergeInsertBuilder(native, schema): MergeInsertBuilder

Construct a MergeInsertBuilder. Internal use only.

Parameters

  • native: NativeMergeInsertBuilder

  • schema: Schema<any> | Promise<Schema<any>>

Returns

MergeInsertBuilder

Methods

execute()

execute(data): Promise<MergeStats>

Executes the merge insert operation

Parameters

  • data

Returns

Promise<MergeStats>

Statistics about the merge operation: counts of inserted, updated, and deleted rows
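
To make the returned counts concrete, here is a minimal in-memory sketch in plain Python of how the three statistics could relate to the clauses enabled on the builder. It is not LanceDB's implementation: the function `merge_insert_sketch`, its keyword flags, and the list-of-dicts row representation are illustrative assumptions. Only the key names `num_inserted_rows`, `num_updated_rows`, and `num_deleted_rows` are taken from the Python output shown in the commit message.

```python
def merge_insert_sketch(target, source, on, *, update_matched=False,
                        insert_not_matched=False, delete_not_in_source=False):
    """Toy model of a merge insert over lists of dict rows (not LanceDB code)."""
    source_by_key = {row[on]: row for row in source}
    target_keys = {row[on] for row in target}
    stats = {"num_inserted_rows": 0, "num_updated_rows": 0, "num_deleted_rows": 0}
    merged = []

    for row in target:
        key = row[on]
        if key in source_by_key:
            if update_matched:
                # Matched row: replaced by the source row and counted as updated,
                # even when the new values happen to equal the old ones.
                merged.append(dict(source_by_key[key]))
                stats["num_updated_rows"] += 1
            else:
                merged.append(row)
        elif delete_not_in_source:
            # Row exists only in the target: dropped and counted as deleted.
            stats["num_deleted_rows"] += 1
        else:
            merged.append(row)

    if insert_not_matched:
        for row in source:
            if row[on] not in target_keys:
                # Row exists only in the source: appended and counted as inserted.
                merged.append(dict(row))
                stats["num_inserted_rows"] += 1

    return merged, stats


# Same shape as the commit-message example: ids 1..5 exist, ids 1..6 arrive.
target = [{"id": i, "title": f"old {i}"} for i in range(1, 6)]
source = [{"id": i, "title": f"new {i}"} for i in range(1, 7)]

merged, stats = merge_insert_sketch(
    target, source, "id", update_matched=True, insert_not_matched=True
)
print(stats)
```

In this toy model every matched row counts as updated, which agrees with the commit-message output, where five rows are reported as updated although three of them carry identical values.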


whenMatchedUpdateAll()

whenMatchedUpdateAll(options?): MergeInsertBuilder

Rows that exist in both the source table (new data) and the target table (old data) will be updated, replacing the old row with the corresponding matching row.

If there are multiple matches then the behavior is undefined. Currently this causes multiple copies of the row to be created but that behavior is subject to change.

An optional condition may be specified. If it is, then only matched rows that satisfy the condition will be updated. Any rows that do not satisfy the condition will be left as they are. Failing to satisfy the condition does not cause a "matched row" to become a "not matched" row.

The condition should be an SQL string. Use the prefix "target." to refer to rows in the target table (old data) and the prefix "source." to refer to rows in the source table (new data).

For example, "target.last_update < source.last_update".
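
The condition semantics can be sketched in plain Python. This is a toy model, not LanceDB code: `update_matched_sketch` is a hypothetical helper, and the Python predicate `is_newer` stands in for the SQL string "target.last_update < source.last_update". The insert step at the end is included only to show that a matched row which fails the condition is not treated as "not matched".

```python
def update_matched_sketch(target, source, on, where=None):
    """Toy model (not LanceDB code): update matched rows, then insert unmatched ones."""
    source_by_key = {row[on]: row for row in source}
    target_keys = {row[on] for row in target}
    result = []
    for old in target:
        new = source_by_key.get(old[on])
        # A row is "matched" when its key appears on both sides.
        if new is not None and (where is None or where(old, new)):
            result.append(dict(new))  # condition holds: replace the old row
        else:
            result.append(old)        # unmatched, or matched but condition failed
    # Only keys absent from the target count as "not matched", so a matched row
    # that failed the condition is never inserted as a new row.
    result.extend(dict(row) for row in source if row[on] not in target_keys)
    return result


def is_newer(target_row, source_row):
    # Stand-in for the SQL condition "target.last_update < source.last_update".
    return target_row["last_update"] < source_row["last_update"]


target = [
    {"id": 1, "last_update": 10, "value": "old-1"},
    {"id": 2, "last_update": 30, "value": "old-2"},
]
source = [
    {"id": 1, "last_update": 20, "value": "new-1"},  # newer: replaces id 1
    {"id": 2, "last_update": 25, "value": "new-2"},  # older: id 2 is left alone
    {"id": 3, "last_update": 5, "value": "new-3"},   # key not in target: inserted
]

result = update_matched_sketch(target, source, "id", where=is_newer)
for row in result:
    print(row)
```

Row 2 keeps its old values and does not appear twice, because failing the condition leaves it matched but untouched.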

Parameters

  • options?

  • options.where?: string

Returns

MergeInsertBuilder


whenNotMatchedBySourceDelete()

whenNotMatchedBySourceDelete(options?): MergeInsertBuilder

Rows that exist only in the target table (old data) will be deleted. An optional condition can be provided to limit what data is deleted.

Parameters

  • options?

  • options.where?: string An optional condition to limit what data is deleted

Returns

MergeInsertBuilder
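
As a rough illustration of the delete clause, the sketch below models it in plain Python. The helper `delete_not_in_source_sketch` and the `archived` column are hypothetical, and the predicate stands in for the SQL condition string. It is assumed here that the condition sees only the target row, since a row that exists only in the target has no source counterpart.

```python
def delete_not_in_source_sketch(target, source, on, where=None):
    """Toy model (not LanceDB code): drop target rows whose key is absent from source."""
    source_keys = {row[on] for row in source}
    kept, num_deleted = [], 0
    for row in target:
        only_in_target = row[on] not in source_keys
        # Delete when the row has no source match and the optional condition holds.
        if only_in_target and (where is None or where(row)):
            num_deleted += 1
        else:
            kept.append(row)
    return kept, num_deleted


target = [
    {"id": 1, "archived": False},
    {"id": 2, "archived": True},
    {"id": 3, "archived": False},
]
source = [{"id": 1, "archived": False}]

# No condition: every target-only row (ids 2 and 3) is removed.
kept_all, deleted_all = delete_not_in_source_sketch(target, source, "id")

# With a condition: only target-only rows that are archived (id 2) are removed.
kept_some, deleted_some = delete_not_in_source_sketch(
    target, source, "id", where=lambda row: row["archived"]
)
print(deleted_all, deleted_some)
```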


whenNotMatchedInsertAll()

whenNotMatchedInsertAll(): MergeInsertBuilder

Rows that exist only in the source table (new data) will be inserted into the target table.

Returns

MergeInsertBuilder
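
Used on its own, this clause behaves like an "insert if not exists". A minimal sketch in plain Python, assuming rows are dicts keyed by a single column (the helper `insert_not_matched_sketch` is hypothetical and is not LanceDB code):

```python
def insert_not_matched_sketch(target, source, on):
    """Toy model (not LanceDB code): append source rows whose key is new to the target."""
    target_keys = {row[on] for row in target}
    inserted = [dict(row) for row in source if row[on] not in target_keys]
    # Existing target rows are returned unchanged, even if the source has a match.
    return target + inserted, len(inserted)


target = [{"id": 1, "title": "one"}, {"id": 2, "title": "two"}]
source = [{"id": 2, "title": "TWO (ignored)"}, {"id": 3, "title": "three"}]

merged, num_inserted = insert_not_matched_sketch(target, source, "id")
print(num_inserted, [row["title"] for row in merged])
```

The source row with id 2 is ignored because no update clause is modeled here; only the row with id 3 is added.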