Based on this comment: https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075 and https://github.com/lancedb/lance/pull/2357 Here is my attempt at implementing bindings for returning merge stats from a `merge_insert.execute` call for lancedb. Note: I have almost no idea what I am doing in Rust but tried to follow existing code patterns and pay attention to compiler hints. - The change in nodejs binding appeared to be necessary to get compilation to work, presumably this could actual work properly by returning some kind of NAPI JS object of the stats data? - I am unsure of what to do with the remote/table.rs changes - necessarily for compilation to work; I assume this is related to LanceDB cloud, but unsure the best way to handle that at this point. Proof of function: ```python import pandas as pd import lancedb db = lancedb.connect("/tmp/test.db") test_data = pd.DataFrame( { "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"], "id": [1, 2, 3, 4, 5], "content": [ "World", "This is a test", "Another example", "More test data", "Final entry", ], } ) table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite") update_data = pd.DataFrame( { "title": [ "Hello, World", "Test Document, it's good", "Example", "Data Sample", "Last One", "New One", ], "id": [1, 2, 3, 4, 5, 6], "content": [ "World", "This is a test", "Another example", "More test data", "Final entry", "New content", ], } ) stats = ( table.merge_insert(on="id") .when_matched_update_all() .when_not_matched_insert_all() .execute(update_data) ) print(stats) ``` returns ``` {'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0} ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Summary by CodeRabbit - **New Features** - Merge-insert operations now return detailed statistics, including counts of inserted, updated, and deleted rows. - **Bug Fixes** - Tests updated to validate returned merge-insert statistics for accuracy. - **Documentation** - Method documentation improved to reflect new return values and clarify merge operation results. - Added documentation for the new `MergeStats` interface detailing operation statistics. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Will Jones <willjones127@gmail.com>
2.6 KiB
@lancedb/lancedb • Docs
@lancedb/lancedb / MergeInsertBuilder
Class: MergeInsertBuilder
A builder used to create and run a merge insert operation
Constructors
new MergeInsertBuilder()
new MergeInsertBuilder(native, schema): MergeInsertBuilder
Construct a MergeInsertBuilder. Internal use only.
Parameters
-
native:
NativeMergeInsertBuilder -
schema:
Schema<any> |Promise<Schema<any>>
Returns
Methods
execute()
execute(data): Promise<MergeStats>
Executes the merge insert operation
Parameters
- data:
Data
Returns
Promise<MergeStats>
Statistics about the merge operation: counts of inserted, updated, and deleted rows
whenMatchedUpdateAll()
whenMatchedUpdateAll(options?): MergeInsertBuilder
Rows that exist in both the source table (new data) and the target table (old data) will be updated, replacing the old row with the corresponding matching row.
If there are multiple matches then the behavior is undefined. Currently this causes multiple copies of the row to be created but that behavior is subject to change.
An optional condition may be specified. If it is, then only matched rows that satisfy the condtion will be updated. Any rows that do not satisfy the condition will be left as they are. Failing to satisfy the condition does not cause a "matched row" to become a "not matched" row.
The condition should be an SQL string. Use the prefix target. to refer to rows in the target table (old data) and the prefix source. to refer to rows in the source table (new data).
For example, "target.last_update < source.last_update"
Parameters
-
options?
-
options.where?:
string
Returns
whenNotMatchedBySourceDelete()
whenNotMatchedBySourceDelete(options?): MergeInsertBuilder
Rows that exist only in the target table (old data) will be deleted. An optional condition can be provided to limit what data is deleted.
Parameters
-
options?
-
options.where?:
stringAn optional condition to limit what data is deleted
Returns
whenNotMatchedInsertAll()
whenNotMatchedInsertAll(): MergeInsertBuilder
Rows that exist only in the source table (new data) should be inserted into the target table.