lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-15 02:50:44 +00:00

Author	SHA1	Message	Date
Alex Pilon	f315f9665a	feat: implement bindings to return merge stats (#2367)

Based on this comment: https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075 and https://github.com/lancedb/lance/pull/2357

Here is my attempt at implementing bindings for returning merge stats from a `merge_insert.execute` call for lancedb.

Note: I have almost no idea what I am doing in Rust but tried to follow existing code patterns and pay attention to compiler hints.

- The change in the nodejs binding appeared to be necessary to get compilation to work; presumably this could actually work properly by returning some kind of NAPI JS object of the stats data?
- I am unsure of what to do with the remote/table.rs changes, which were necessary for compilation to work. I assume this is related to LanceDB Cloud, but I am unsure of the best way to handle that at this point.

Proof of function:

```python
import pandas as pd
import lancedb

db = lancedb.connect("/tmp/test.db")

# Initial table contents: five rows keyed by "id".
test_data = pd.DataFrame(
    {
        "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"],
        "id": [1, 2, 3, 4, 5],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
        ],
    }
)

table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite")

# Update batch: ids 1-5 match existing rows, id 6 is new.
update_data = pd.DataFrame(
    {
        "title": [
            "Hello, World",
            "Test Document, it's good",
            "Example",
            "Data Sample",
            "Last One",
            "New One",
        ],
        "id": [1, 2, 3, 4, 5, 6],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
            "New content",
        ],
    }
)

# Upsert on "id" and capture the returned statistics.
stats = (
    table.merge_insert(on="id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(update_data)
)

print(stats)
```

returns

```
{'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0}
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- New Features
  - Merge-insert operations now return detailed statistics, including counts of inserted, updated, and deleted rows.
- Bug Fixes
  - Tests updated to validate returned merge-insert statistics for accuracy.
- Documentation
  - Method documentation improved to reflect new return values and clarify merge operation results.
  - Added documentation for the new `MergeStats` interface detailing operation statistics.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-01 10:00:20 -07:00
Andrew C. Oliver	5deb26bc8b	fix: prevent embedded objects from returning null in all of their fields (#2355)

For example, given metadata{filename=xyz}, filename would be present structurally but ALWAYS null.

I didn't include this as a file, but it may be useful for understanding the problem for people searching on this issue, so I'm including it here as documentation. Before this patch, any field that is more than one level deep is accepted but returns null values for its subfields when queried.

```js
const lancedb = require('@lancedb/lancedb');

// Debug logger
function debug(message, data) {
  console.log(`[TEST] ${message}`, data !== undefined ? data : '');
}

// Log when our unwrapArrowObject is called
const kParent = Symbol.for("parent");
const kRowIndex = Symbol.for("rowIndex");

// Override console.log for our test
const originalConsoleLog = console.log;
console.log = function() {
  // Filter out noisy logs
  if (arguments[0] && typeof arguments[0] === 'string' && arguments[0].includes('[INFO] [LanceDB]')) {
    return;
  }
  originalConsoleLog.apply(console, arguments);
};

async function main() {
  debug('Starting test...');

  // Connect to the database
  debug('Connecting to database...');
  const db = await lancedb.connect('./.lancedb');

  // Try to open an existing table, or create a new one if it doesn't exist
  let table;
  try {
    table = await db.openTable('test_nested_fields');
    debug('Opened existing table');
  } catch (e) {
    debug('Creating new table...');

    // Create test data with nested metadata structure
    const data = [
      {
        id: 'test1',
        vector: [1, 2, 3],
        metadata: {
          filePath: "/path/to/file1.ts",
          startLine: 10,
          endLine: 20,
          text: "function test() { return true; }"
        }
      },
      {
        id: 'test2',
        vector: [4, 5, 6],
        metadata: {
          filePath: "/path/to/file2.ts",
          startLine: 30,
          endLine: 40,
          text: "function test2() { return false; }"
        }
      }
    ];
    debug('Data to be inserted:', JSON.stringify(data, null, 2));

    // Create the table
    table = await db.createTable('test_nested_fields', data);
    debug('Table created successfully');
  }

  // Query the table and get results
  debug('Querying table...');
  const results = await table.search([1, 2, 3]).limit(10).toArray();

  // Log the results
  debug('Number of results:', results.length);
  if (results.length > 0) {
    const firstResult = results[0];
    debug('First result properties:', Object.keys(firstResult));

    // Check if metadata is accessible and what properties it has
    if (firstResult.metadata) {
      debug('Metadata properties:', Object.keys(firstResult.metadata));
      debug('Metadata filePath:', firstResult.metadata.filePath);
      debug('Metadata startLine:', firstResult.metadata.startLine);

      // Destructure to see if that helps
      const { filePath, startLine, endLine, text } = firstResult.metadata;
      debug('Destructured values:', { filePath, startLine, endLine, text });

      // Check if it's a proxy object
      debug('Result is proxy?', Object.getPrototypeOf(firstResult) !== Object.prototype);
      debug('Metadata is proxy?', Object.getPrototypeOf(firstResult.metadata) !== Object.prototype);
    } else {
      debug('Metadata is not accessible!');
    }
  }

  // Close the database
  await db.close();
}

main().catch(e => {
  console.error('Error:', e);
});
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- Bug Fixes
  - Improved handling of nested struct fields to ensure accurate preservation of values during serialization and deserialization.
  - Enhanced robustness when accessing nested object properties, reducing errors with missing or null values.
- Tests
  - Added tests to verify correct handling of nested struct fields through serialization and deserialization.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-01 09:38:55 -07:00
Lance Release	4ade3e31e2	Updating package-lock.json	2025-04-29 22:19:46 +00:00
Lance Release	508e621f3d	Bump version: 0.19.1-beta.0 → 0.19.1-beta.1	2025-04-29 22:19:14 +00:00
Ryan Green	af54e0ce06	feat: add table stats API (#2363 ) * Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations ### Questions * This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - Tests - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-29 15:19:08 -02:30
Lance Release	554939e5d2	Updating package-lock.json	2025-04-28 17:20:58 +00:00
Lance Release	e9f25f6a12	Bump version: 0.19.0 → 0.19.1-beta.0	2025-04-28 17:20:26 +00:00
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353) Add tag-related APIs to list existing tags, attach a tag to a version, update a tag's version, delete a tag, get the version of a tag, and check out the version that a tag is bound to. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. - Enabled checking out a table by either version number or tag name. - Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. - Bug Fixes - None. - Documentation - Updated documentation to describe the new tagging features, including usage examples. - Tests - Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-28 10:04:46 -07:00
Lance Release	e8c0c52315	Updating package-lock.json	2025-04-25 21:17:03 +00:00
Lance Release	726d629b9b	Bump version: 0.19.0-beta.12 → 0.19.0	2025-04-25 21:16:30 +00:00
Lance Release	b493f56dee	Bump version: 0.19.0-beta.11 → 0.19.0-beta.12	2025-04-25 21:16:25 +00:00
Will Jones	d8517117f1	feat: upgrade Lance to v0.26.0 (#2359 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.26.0 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated dependency management to use published crate versions for improved reliability and maintainability. - Added a temporary workaround for build issues by pinning a specific version of a dependency. - Refactor - Improved resource management and concurrency by updating internal ownership models for object storage components. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-25 13:59:12 -07:00
Lance Release	cbb9a7877c	Updating package-lock.json	2025-04-25 05:02:47 +00:00
Lance Release	1fdaf7a1a4	Bump version: 0.19.0-beta.10 → 0.19.0-beta.11	2025-04-25 05:02:16 +00:00
Lance Release	c19bdd9a24	Updating package-lock.json	2025-04-22 18:24:16 +00:00
Lance Release	a705621067	Bump version: 0.19.0-beta.9 → 0.19.0-beta.10	2025-04-22 18:23:39 +00:00
Lance Release	db853c4041	Updating package-lock.json	2025-04-21 22:50:56 +00:00
Lance Release	d8746c61c6	Bump version: 0.19.0-beta.8 → 0.19.0-beta.9	2025-04-21 22:50:20 +00:00
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added optional waiting for index creation completion with configurable timeout. - Introduced methods to poll and wait for indices to be fully built across sync and async tables. - Extended index creation APIs to accept a wait timeout parameter. - Bug Fixes - Added a new timeout error variant for improved error reporting on index operations. - Tests - Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 08:41:21 -02:30
Lance Release	edc4e40a7b	Updating package-lock.json	2025-04-17 22:16:36 +00:00
Lance Release	35cff12e31	Bump version: 0.19.0-beta.7 → 0.19.0-beta.8	2025-04-17 22:16:02 +00:00
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. - Bug Fixes - Ensured prewarming an index does not interfere with subsequent search operations. - Tests - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. - Chores - Updated dependencies for improved compatibility and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
Lance Release	8a50944061	Updating package-lock.json	2025-04-15 04:11:16 +00:00
Lance Release	b3ad105fa0	Bump version: 0.19.0-beta.6 → 0.19.0-beta.7	2025-04-15 04:10:43 +00:00
BubbleCal	2248aa9508	fix: bugs for new FTS APIs (#2314 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities with support for phrase queries, fuzzy matching, boosting, and multi-column matching. - Search methods now accept full-text query objects directly, improving query flexibility and precision. - Python and JavaScript SDKs updated to handle full-text queries seamlessly, including async search support. - Tests - Added comprehensive tests covering fuzzy search, phrase search, and boosted queries to ensure robust full-text search functionality. - Documentation - Updated query class documentation to reflect new constructor options and removal of deprecated methods for clarity and simplicity. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-15 11:51:35 +08:00
Will Jones	b3a4efd587	fix: revert change default read_consistency_interval=5s (#2327 ) This reverts commit `a547c523c2` or #2281 The current implementation can cause panics and performance degradation. I will bring this back with more testing in https://github.com/lancedb/lancedb/pull/2311 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Enhanced clarity on read consistency settings with updated descriptions and default behavior. - Removed outdated warnings about eventual consistency from the troubleshooting guide. - Refactor - Streamlined the handling of the read consistency interval across integrations, now defaulting to "None" for improved performance. - Simplified internal logic to offer a more consistent experience. - Tests - Updated test expectations to reflect the new default representation for the read consistency interval. - Removed redundant tests related to "no consistency" settings for streamlined testing. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-04-14 08:48:15 -07:00
Lance Release	f23aa0a793	Updating package-lock.json	2025-04-08 06:17:03 +00:00
Lance Release	56aa133ee6	Bump version: 0.19.0-beta.5 → 0.19.0-beta.6	2025-04-08 06:16:30 +00:00
BubbleCal	ec8271931f	feat: support to create FTS index on list of strings (#2317 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated internal library dependencies to the latest beta version for improved system stability. - Tests - Added automated tests to validate full-text search functionality on list-based text fields. - Refactor - Enhanced the search processing logic to provide robust support for list-type text data, ensuring more reliable results. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-08 14:12:35 +08:00
Lance Release	2e170c3c7b	Updating package-lock.json	2025-04-04 21:50:28 +00:00
Lance Release	c298482ee1	Bump version: 0.19.0-beta.4 → 0.19.0-beta.5	2025-04-04 21:49:53 +00:00
Will Jones	1cd76b8498	feat: add timeout to query execution options (#2288 ) Closes #2287 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added configurable timeout support for query executions. Users can now specify maximum wait times for queries, enhancing control over long-running operations across various integrations. - Tests - Expanded test coverage to validate timeout behavior in both synchronous and asynchronous query flows, ensuring timely error responses when query execution exceeds the specified limit. - Introduced a new test suite to verify query operations when a timeout is reached, checking for appropriate error handling. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-04 12:34:41 -07:00
Lance Release	0844c2dd64	Updating package-lock.json	2025-04-02 21:23:50 +00:00
Lance Release	d4ea50fba1	Bump version: 0.19.0-beta.3 → 0.19.0-beta.4	2025-04-02 21:23:19 +00:00
Lance Release	5c32a99e61	Updating package-lock.json	2025-04-02 09:28:46 +00:00
Lance Release	bd62c2384f	Bump version: 0.19.0-beta.2 → 0.19.0-beta.3	2025-04-02 09:28:14 +00:00
Will Jones	f091f57594	ci: fix lancedb musl builds (#2296 ) Fixes #2255 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Enhanced the build process to improve performance and reliability across Linux platforms. - Updated environment settings for more accurate compiler integration. - Activated previously inactive build configurations to support advanced feature support. - Added support for the x86_64 architecture on Linux systems utilizing the musl C library. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-01 14:44:27 -07:00
Lance Release	a997fd4108	Updating package-lock.json	2025-04-01 17:28:57 +00:00
Lance Release	a505bc3965	Bump version: 0.19.0-beta.1 → 0.19.0-beta.2	2025-04-01 17:28:21 +00:00
Lance Release	2eb2c8862a	Updating package-lock.json	2025-04-01 14:27:26 +00:00
Lance Release	e4485a630e	Bump version: 0.19.0-beta.0 → 0.19.0-beta.1	2025-04-01 14:26:47 +00:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
Lance Release	c44fa3abc4	Updating package-lock.json	2025-03-30 18:05:07 +00:00
Lance Release	e67cd0baf9	Bump version: 0.18.3-beta.0 → 0.19.0-beta.0	2025-03-30 18:04:32 +00:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280) Add an analyze_plan API that executes a query and reports its runtime metrics, which helps identify query IO overhead and the causes of slow queries.	2025-03-28 14:28:52 -07:00
Will Jones	a547c523c2	feat!: change default read_consistency_interval=5s (#2281) Previously, when we loaded the next version of the table, we would block all reads with a write lock. Now, we only do that if `read_consistency_interval=0`. Otherwise, we load the next version asynchronously in the background. This should mean that `read_consistency_interval > 0` won't have a meaningful impact on latency. Along with this change, I felt it was safe to change the default consistency interval to 5 seconds. The current default is `None`, which means we never check for a new version by default. I think that default is contrary to most users' expectations.	2025-03-28 11:04:31 -07:00
Lance Release	c1600cdc06	Updating package-lock.json	2025-03-28 16:04:01 +00:00
Lance Release	346cbf8bf7	Bump version: 0.18.2-beta.0 → 0.18.3-beta.0	2025-03-28 16:03:31 +00:00
Lance Release	315a24c2bc	Updating package-lock.json	2025-03-21 20:03:43 +00:00
Lance Release	f97e751b3c	Bump version: 0.18.1 → 0.18.2-beta.0	2025-03-21 20:02:59 +00:00
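
For readers skimming the log, the statistics reported in the merge stats commit (f315f9665a) can be illustrated without LanceDB installed. The following is a minimal pure-Python sketch, not the library's implementation: the helper `merge_insert_stats` is hypothetical, and it assumes that every matched row counts as updated whether or not its values changed, which is consistent with the output quoted in that commit message.

```python
from typing import Any

Row = dict[str, Any]


def merge_insert_stats(
    existing: list[Row], incoming: list[Row], on: str
) -> tuple[list[Row], dict[str, int]]:
    """Hypothetical helper: upsert `incoming` into `existing` by key `on`.

    Mirrors when_matched_update_all + when_not_matched_insert_all and
    returns the merged rows together with merge statistics.
    """
    by_key = {row[on]: dict(row) for row in existing}
    stats = {"num_inserted_rows": 0, "num_updated_rows": 0, "num_deleted_rows": 0}
    for row in incoming:
        if row[on] in by_key:
            # Assumption: a matched row counts as updated even if unchanged.
            stats["num_updated_rows"] += 1
        else:
            stats["num_inserted_rows"] += 1
        by_key[row[on]] = dict(row)
    return list(by_key.values()), stats


# Same key layout as the commit's example: ids 1-5 exist, ids 1-6 arrive.
existing = [{"id": i, "title": f"doc {i}"} for i in range(1, 6)]
incoming = [{"id": i, "title": f"doc {i} (v2)"} for i in range(1, 7)]

merged, stats = merge_insert_stats(existing, incoming, on="id")
print(stats)
```

With five matching keys and one new key, the sketch reports one inserted row and five updated rows, the same counts as in the commit's example. No delete clause is modelled, so `num_deleted_rows` stays at zero.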


290 Commits