lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-26 06:39:57 +00:00

Author	SHA1	Message	Date
Lance Release	8bf89f887c	Bump version: 0.22.1-beta.2 → 0.22.1-beta.3 python-v0.22.1-beta.3	2025-05-06 02:44:39 +00:00
LuQQiu	b2160b2304	fix: fix backward compatibility with the add API (#2375) The add API originally returned a struct with request_id; add backward compatibility for that. ## Summary by CodeRabbit - Bug Fixes - Improved handling of empty server responses for various data operations to ensure consistent behavior across server versions. - Added default values to version and numeric fields to prevent errors when response data is incomplete. - Tests - Expanded tests to cover multiple server response scenarios, validating correct version handling in data operations.	2025-05-05 19:26:27 -07:00
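A minimal sketch of the defaulting behaviour described in #2375 and #2373. It assumes the client receives a JSON body for a write operation. An empty body, or a legacy body carrying only `request_id`, falls back to version 0. `AddResult` and `parse_add_response` are hypothetical names for illustration and are not the LanceDB client's actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class AddResult:
    # Hypothetical result type; 0 means "the server did not report a version".
    version: int = 0


def parse_add_response(body: str) -> AddResult:
    """Parse a (hypothetical) add-response body, tolerating legacy servers."""
    if not body.strip():
        # Older servers may return an empty body: fall back to defaults.
        return AddResult()
    payload = json.loads(body)
    # Legacy servers returned a struct with only `request_id`, so a missing
    # `version` field also falls back to the default of 0.
    return AddResult(version=int(payload.get("version", 0)))


print(parse_add_response(""))                       # AddResult(version=0)
print(parse_add_response('{"request_id": "abc"}'))  # AddResult(version=0)
print(parse_add_response('{"version": 7}'))         # AddResult(version=7)
```

Defaulting to 0 instead of issuing a second request for the version matches the trade-off stated in #2373: old servers get a placeholder version rather than extra API calls.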
Lance Release	1bb82597be	Updating package-lock.json	2025-05-06 01:21:13 +00:00
Lance Release	e4eee38b3c	Updating package-lock.json	2025-05-06 00:09:39 +00:00
Lance Release	64fc2be503	Updating package-lock.json	2025-05-06 00:09:19 +00:00
Lance Release	dc8054e90d	Bump version: 0.19.1-beta.1 → 0.19.1-beta.2 v0.19.1-beta.2	2025-05-06 00:08:55 +00:00
Lance Release	1684940946	Bump version: 0.22.1-beta.1 → 0.22.1-beta.2 python-v0.22.1-beta.2	2025-05-06 00:08:29 +00:00
LuQQiu	695813463c	chore: reduce unneeded API calls for return version for write operations and improve test (#2373) Reduce duplicate code in the remote write-operation tests. Avoid a second call to the remote to fetch version info; just return 0 instead of suddenly adding extra API calls for end users who are still on old servers. ## Summary by CodeRabbit - New Features - Added version tracking to table operation results, allowing users to see the commit version associated with add, update, delete, merge, and column modification operations. - Bug Fixes - Improved compatibility with legacy servers by standardizing version information as zero when the server does not return a version. - Documentation - Clarified the meaning of the version field in operation results, especially for cases involving legacy server responses. - Tests - Enhanced test coverage to verify correct behavior with both legacy and modern server responses.	2025-05-05 16:47:19 -07:00
LuQQiu	ed594b0f76	feat: return version for all write operations (#2368) Return version info for all write operations (add, update, merge_insert, and column modification operations). ## Summary by CodeRabbit - New Features - Table modification operations (add, update, delete, merge, add/alter/drop columns) now return detailed result objects including version numbers and operation statistics. - Result objects provide clearer feedback such as rows affected and new table version after each operation. - Documentation - Updated documentation to describe new result objects and their fields for all relevant table operations. - Added documentation for new result interfaces and updated method return types in Node.js and Python APIs. - Tests - Enhanced test coverage to assert correctness of returned versioning and operation metadata after table modifications.	2025-05-05 14:25:34 -07:00
Will Jones	cee2b5ea42	chore: upgrade pyarrow pin (#2192) Closes #2191 ## Summary by CodeRabbit - Chores - Updated the required version of the pyarrow package to version 16 or higher. - Adjusted automated testing workflows to install pyarrow version 16 for compatibility checks.	2025-05-05 11:23:13 -07:00
Alex Pilon	f315f9665a	feat: implement bindings to return merge stats (#2367)

Based on this comment: https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075 and https://github.com/lancedb/lance/pull/2357

Here is my attempt at implementing bindings for returning merge stats from a `merge_insert.execute` call for lancedb.

Note: I have almost no idea what I am doing in Rust, but I tried to follow existing code patterns and pay attention to compiler hints.

- The change in the nodejs binding appeared to be necessary to get compilation to work; presumably this could actually work properly by returning some kind of NAPI JS object of the stats data?
- I am unsure of what to do with the remote/table.rs changes, which were necessary for compilation to work. I assume this is related to LanceDB Cloud, but I'm unsure of the best way to handle that at this point.

Proof of function:

```python
import pandas as pd
import lancedb

db = lancedb.connect("/tmp/test.db")

test_data = pd.DataFrame(
    {
        "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"],
        "id": [1, 2, 3, 4, 5],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
        ],
    }
)

table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite")

update_data = pd.DataFrame(
    {
        "title": [
            "Hello, World",
            "Test Document, it's good",
            "Example",
            "Data Sample",
            "Last One",
            "New One",
        ],
        "id": [1, 2, 3, 4, 5, 6],
        "content": [
            "World",
            "This is a test",
            "Another example",
            "More test data",
            "Final entry",
            "New content",
        ],
    }
)

stats = (
    table.merge_insert(on="id")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(update_data)
)
print(stats)
```

returns

```
{'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0}
```

## Summary by CodeRabbit

- New Features
  - Merge-insert operations now return detailed statistics, including counts of inserted, updated, and deleted rows.
- Bug Fixes
  - Tests updated to validate returned merge-insert statistics for accuracy.
- Documentation
  - Method documentation improved to reflect new return values and clarify merge operation results.
  - Added documentation for the new `MergeStats` interface detailing operation statistics.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-01 10:00:20 -07:00
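The counts in the output above follow from the upsert semantics of `when_matched_update_all()` combined with `when_not_matched_insert_all()`. Below is a plain-Python sketch of those semantics over lists of dicts, with no LanceDB involved. `merge_insert` here is a hypothetical helper. Consistent with the output above, it assumes every matched row counts as updated whether or not its values changed, and it models no delete clause.

```python
from typing import Dict, List


def merge_insert(target: List[dict], source: List[dict], on: str) -> Dict[str, int]:
    """Upsert `source` rows into `target` in place, keyed on column `on`.

    Matched rows are replaced wholesale; unmatched rows are appended.
    """
    index = {row[on]: i for i, row in enumerate(target)}
    inserted = updated = 0
    for row in source:
        key = row[on]
        if key in index:
            # Matched: count as updated even if the values are identical.
            target[index[key]] = dict(row)
            updated += 1
        else:
            # Not matched: insert the new row and remember its position.
            index[key] = len(target)
            target.append(dict(row))
            inserted += 1
    # No delete clause is modeled, so nothing is ever deleted here.
    return {
        "num_inserted_rows": inserted,
        "num_updated_rows": updated,
        "num_deleted_rows": 0,
    }


titles = ["Hello", "Test Document", "Example", "Data Sample", "Last One"]
table = [{"id": i, "title": t} for i, t in enumerate(titles, start=1)]
updates = [dict(row) for row in table] + [{"id": 6, "title": "New One"}]
updates[0]["title"] = "Hello, World"

print(merge_insert(table, updates, on="id"))
# {'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0}
```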
Andrew C. Oliver	5deb26bc8b	fix: prevent embedded objects from returning null in all of their fields (#2355)

For a nested field such as `metadata{filename=xyz}`, `filename` would be there structurally, but ALWAYS null.

I didn't include this as a file, but it may be useful for understanding the problem for people searching on this issue, so I'm including it here as documentation. Before this patch, any field nested more than one level deep is accepted, but its subfields return null values when queried.

```js
const lancedb = require('@lancedb/lancedb');

// Debug logger
function debug(message, data) {
  console.log(`[TEST] ${message}`, data !== undefined ? data : '');
}

// Log when our unwrapArrowObject is called
const kParent = Symbol.for("parent");
const kRowIndex = Symbol.for("rowIndex");

// Override console.log for our test
const originalConsoleLog = console.log;
console.log = function() {
  // Filter out noisy LanceDB INFO logs; pass everything else through once
  if (arguments[0] && typeof arguments[0] === 'string' && arguments[0].includes('[INFO] [LanceDB]')) {
    return;
  }
  originalConsoleLog.apply(console, arguments);
};

async function main() {
  debug('Starting test...');

  // Connect to the database
  debug('Connecting to database...');
  const db = await lancedb.connect('./.lancedb');

  // Try to open an existing table, or create a new one if it doesn't exist
  let table;
  try {
    table = await db.openTable('test_nested_fields');
    debug('Opened existing table');
  } catch (e) {
    debug('Creating new table...');

    // Create test data with nested metadata structure
    const data = [
      {
        id: 'test1',
        vector: [1, 2, 3],
        metadata: {
          filePath: "/path/to/file1.ts",
          startLine: 10,
          endLine: 20,
          text: "function test() { return true; }"
        }
      },
      {
        id: 'test2',
        vector: [4, 5, 6],
        metadata: {
          filePath: "/path/to/file2.ts",
          startLine: 30,
          endLine: 40,
          text: "function test2() { return false; }"
        }
      }
    ];

    debug('Data to be inserted:', JSON.stringify(data, null, 2));

    // Create the table
    table = await db.createTable('test_nested_fields', data);
    debug('Table created successfully');
  }

  // Query the table and get results
  debug('Querying table...');
  const results = await table.search([1, 2, 3]).limit(10).toArray();

  // Log the results
  debug('Number of results:', results.length);

  if (results.length > 0) {
    const firstResult = results[0];
    debug('First result properties:', Object.keys(firstResult));

    // Check if metadata is accessible and what properties it has
    if (firstResult.metadata) {
      debug('Metadata properties:', Object.keys(firstResult.metadata));
      debug('Metadata filePath:', firstResult.metadata.filePath);
      debug('Metadata startLine:', firstResult.metadata.startLine);

      // Destructure to see if that helps
      const { filePath, startLine, endLine, text } = firstResult.metadata;
      debug('Destructured values:', { filePath, startLine, endLine, text });

      // Check if it's a proxy object
      debug('Result is proxy?', Object.getPrototypeOf(firstResult) !== Object.prototype);
      debug('Metadata is proxy?', Object.getPrototypeOf(firstResult.metadata) !== Object.prototype);
    } else {
      debug('Metadata is not accessible!');
    }
  }

  // Close the database
  await db.close();
}

main().catch(e => {
  console.error('Error:', e);
});
```

## Summary by CodeRabbit

- Bug Fixes
  - Improved handling of nested struct fields to ensure accurate preservation of values during serialization and deserialization.
  - Enhanced robustness when accessing nested object properties, reducing errors with missing or null values.
- Tests
  - Added tests to verify correct handling of nested struct fields through serialization and deserialization.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-01 09:38:55 -07:00
Lance Release	3cc670ac38	Updating package-lock.json	2025-04-29 23:21:19 +00:00
Lance Release	4ade3e31e2	Updating package-lock.json	2025-04-29 22:19:46 +00:00
Lance Release	a222d2cd91	Updating package-lock.json	2025-04-29 22:19:30 +00:00
Lance Release	508e621f3d	Bump version: 0.19.1-beta.0 → 0.19.1-beta.1 v0.19.1-beta.1	2025-04-29 22:19:14 +00:00
Lance Release	a1a0472f3f	Bump version: 0.22.1-beta.0 → 0.22.1-beta.1 python-v0.22.1-beta.1	2025-04-29 22:18:53 +00:00
Wyatt Alt	3425a6d339	feat: upgrade lance to v0.27.0-beta.2 (#2364) ## Summary by CodeRabbit - Chores - Updated dependencies for related components to use the latest version from a specific repository source. No changes to features or public functionality.	2025-04-29 14:59:56 -07:00
Ryan Green	af54e0ce06	feat: add table stats API (#2363) * Add a new "table stats" API to expose basic table and fragment statistics, with local and remote table implementations ### Questions * This uses `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation; are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level, but there does not seem to be an easy way to calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. ## Summary by CodeRabbit - New Features - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - Tests - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables.	2025-04-29 15:19:08 -02:30
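A rough sketch of the fragment-size summary #2363 describes: minimum, maximum, mean, and percentiles. It takes a list of per-fragment row counts, because the notes above say per-fragment byte sizes are not yet available. The function names, the output keys, and the nearest-rank percentile method are all assumptions for illustration. The real implementation may compute percentiles differently.

```python
import math
from typing import Dict, List


def nearest_rank(sorted_vals: List[int], pct: float) -> int:
    """Nearest-rank percentile: smallest value with >= pct% of the data at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def fragment_stats(row_counts: List[int]) -> Dict[str, float]:
    """Summarize per-fragment row counts (hypothetical output keys)."""
    if not row_counts:
        # An empty table has no fragments; report zeros instead of failing.
        return {"num_fragments": 0, "min": 0, "max": 0, "mean": 0.0,
                "p25": 0, "p50": 0, "p75": 0, "p99": 0}
    vals = sorted(row_counts)
    return {
        "num_fragments": len(vals),
        "min": vals[0],
        "max": vals[-1],
        "mean": sum(vals) / len(vals),
        "p25": nearest_rank(vals, 25),
        "p50": nearest_rank(vals, 50),
        "p75": nearest_rank(vals, 75),
        "p99": nearest_rank(vals, 99),
    }


print(fragment_stats([100, 400, 200, 300]))
# {'num_fragments': 4, 'min': 100, 'max': 400, 'mean': 250.0, 'p25': 100, 'p50': 200, 'p75': 300, 'p99': 400}
```

Nearest-rank always returns an observed fragment size. That is easy to reason about for small fragment counts, but interpolating methods give smoother values on large tables.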
Lance Release	089905fe8f	Updating package-lock.json	2025-04-28 19:13:36 +00:00
Lance Release	554939e5d2	Updating package-lock.json	2025-04-28 17:20:58 +00:00
Lance Release	7a13814922	Updating package-lock.json	2025-04-28 17:20:42 +00:00
Lance Release	e9f25f6a12	Bump version: 0.19.0 → 0.19.1-beta.0 v0.19.1-beta.0	2025-04-28 17:20:26 +00:00
Lance Release	419a433244	Bump version: 0.22.0 → 0.22.1-beta.0 python-v0.22.1-beta.0	2025-04-28 17:20:10 +00:00
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353) Add tag-related APIs to list existing tags, attach a tag to a version, update a tag's version, delete a tag, get the version a tag points to, and check out the version a tag is bound to. ## Summary by CodeRabbit - New Features - Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. - Enabled checking out a table by either version number or tag name. - Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. - Bug Fixes - None. - Documentation - Updated documentation to describe the new tagging features, including usage examples. - Tests - Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments.	2025-04-28 10:04:46 -07:00
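The tag operations listed in #2353 amount to maintaining a mapping from human-readable names to table versions. Below is a hypothetical in-memory `TagRegistry` that models those semantics: create, update, delete, list, look up, and resolve a checkout target by version number or tag name. It is not LanceDB's API. Numbering versions from 1 to `latest_version` and raising `KeyError`/`ValueError` on bad input are assumptions.

```python
from typing import Dict, Union


class TagRegistry:
    """Hypothetical in-memory model of table-version tags."""

    def __init__(self, latest_version: int) -> None:
        self.latest_version = latest_version
        self._tags: Dict[str, int] = {}

    def _check_version(self, version: int) -> None:
        # Assumed convention: versions are numbered 1..latest_version.
        if not 1 <= version <= self.latest_version:
            raise ValueError(f"version {version} does not exist")

    def create(self, tag: str, version: int) -> None:
        if tag in self._tags:
            raise ValueError(f"tag {tag!r} already exists")
        self._check_version(version)
        self._tags[tag] = version

    def update(self, tag: str, version: int) -> None:
        if tag not in self._tags:
            raise KeyError(tag)
        self._check_version(version)
        self._tags[tag] = version

    def delete(self, tag: str) -> None:
        del self._tags[tag]  # raises KeyError for an unknown tag

    def list_tags(self) -> Dict[str, int]:
        return dict(self._tags)  # a copy, so callers cannot mutate the registry

    def get_version(self, tag: str) -> int:
        return self._tags[tag]

    def resolve_checkout(self, target: Union[int, str]) -> int:
        """Checkout accepts either a version number or a tag name."""
        if isinstance(target, str):
            return self.get_version(target)
        self._check_version(target)
        return target


tags = TagRegistry(latest_version=3)
tags.create("v1-prod", 1)
tags.update("v1-prod", 2)
print(tags.resolve_checkout("v1-prod"))  # 2
```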
LuQQiu	178bcf9c90	fix: hybrid search explain plan analyze plan (#2360) Fix the explain-plan and analyze-plan APIs for hybrid search. ## Summary by CodeRabbit - New Features - Added options to view the execution plan and analyze the runtime performance of hybrid queries. - Refactor - Improved internal handling of query setup for better modularity and maintainability.	2025-04-27 18:39:43 -07:00
Lance Release	b9be092cb1	Updating package-lock.json	2025-04-25 22:05:57 +00:00
Lance Release	e8c0c52315	Updating package-lock.json	2025-04-25 21:17:03 +00:00
Lance Release	a60fa0d3b7	Updating package-lock.json	2025-04-25 21:16:48 +00:00
Lance Release	726d629b9b	Bump version: 0.19.0-beta.12 → 0.19.0 v0.19.0	2025-04-25 21:16:30 +00:00
Lance Release	b493f56dee	Bump version: 0.19.0-beta.11 → 0.19.0-beta.12	2025-04-25 21:16:25 +00:00
Lance Release	a8b5ad7e74	Bump version: 0.22.0-beta.12 → 0.22.0 python-v0.22.0	2025-04-25 21:16:07 +00:00
Lance Release	f8f6264883	Bump version: 0.22.0-beta.11 → 0.22.0-beta.12	2025-04-25 21:16:07 +00:00
Will Jones	d8517117f1	feat: upgrade Lance to v0.26.0 (#2359) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.26.0 ## Summary by CodeRabbit - Chores - Updated dependency management to use published crate versions for improved reliability and maintainability. - Added a temporary workaround for build issues by pinning a specific version of a dependency. - Refactor - Improved resource management and concurrency by updating internal ownership models for object storage components.	2025-04-25 13:59:12 -07:00
Lance Release	ab66dd5ed2	Updating package-lock.json	2025-04-25 06:04:06 +00:00
Lance Release	cbb9a7877c	Updating package-lock.json	2025-04-25 05:02:47 +00:00
Lance Release	b7fc223535	Updating package-lock.json	2025-04-25 05:02:32 +00:00
Lance Release	1fdaf7a1a4	Bump version: 0.19.0-beta.10 → 0.19.0-beta.11 v0.19.0-beta.11	2025-04-25 05:02:16 +00:00
Lance Release	d11819c90c	Bump version: 0.22.0-beta.10 → 0.22.0-beta.11 python-v0.22.0-beta.11	2025-04-25 05:01:57 +00:00
BubbleCal	9b902272f1	fix: sync hybrid search ignores the distance range params (#2356) ## Summary by CodeRabbit - New Features - Added support for distance range filtering in hybrid vector queries, allowing users to specify lower and upper bounds for search results. - Tests - Introduced new tests to validate distance range filtering and reranking in both synchronous and asynchronous hybrid query scenarios. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-25 13:01:22 +08:00
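To illustrate what distance-range filtering means for the vector side of a hybrid query, here is a plain-Python sketch. It keeps only candidates whose distance falls within optional bounds, before any reranking. Three things are assumptions and are not taken from the LanceDB source: the half-open interval `lower <= d < upper`, the convention that `None` means unbounded, and the name `filter_by_distance`.

```python
from typing import List, Optional, Tuple


def filter_by_distance(
    results: List[Tuple[str, float]],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Keep (id, distance) pairs with lower <= distance < upper.

    A bound of None leaves that side unbounded (assumed convention).
    """
    kept = []
    for doc_id, dist in results:
        if lower is not None and dist < lower:
            continue  # closer than the lower bound
        if upper is not None and dist >= upper:
            continue  # at or beyond the upper bound
        kept.append((doc_id, dist))
    return kept


hits = [("a", 0.05), ("b", 0.2), ("c", 0.5), ("d", 0.9)]
print(filter_by_distance(hits, lower=0.1, upper=0.5))  # [('b', 0.2)]
```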
Will Jones	8c0622fa2c	fix: remote limit to avoid "Limit must be non-negative" (#2354) To work around this issue: https://github.com/lancedb/lancedb/issues/2211 ## Summary by CodeRabbit - Bug Fixes - Improved handling of large query parameters to prevent potential overflow issues when using the "k" parameter in queries.	2025-04-24 15:04:06 -07:00
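One plausible way a very large `k` becomes a negative limit is that an unsigned 64-bit value gets reinterpreted as signed. The snippet below uses the standard-library `ctypes` module to show that wraparound, then applies a clamp before conversion. The `clamp_limit` helper and the use of the signed 64-bit maximum as the ceiling are assumptions for illustration. This is not the actual change made in #2354.

```python
import ctypes

INT64_MAX = 2**63 - 1


def as_signed_64(value: int) -> int:
    """Reinterpret the low 64 bits of `value` as a signed 64-bit integer."""
    return ctypes.c_int64(value).value  # ctypes does no overflow checking


def clamp_limit(k: int) -> int:
    """Hypothetical clamp: cap k so it survives a signed 64-bit conversion."""
    return min(k, INT64_MAX)


huge_k = 2**64 - 1  # e.g. an unsigned 64-bit maximum used to mean "no limit"
print(as_signed_64(huge_k))               # -1, which fails a non-negative check
print(as_signed_64(clamp_limit(huge_k)))  # 9223372036854775807
```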
Philip Meier	2191f948c3	fix: add missing pydantic model config compat (#2316) Fixes #2315. ## Summary by CodeRabbit - Refactor - Enhanced query processing to maintain smooth functionality across different dependency versions, ensuring improved stability and performance.	2025-04-22 14:46:10 -07:00
Will Jones	acc3b03004	ci: fix docs deploy (#2351) ## Summary by CodeRabbit - Chores - Improved CI workflow for documentation builds by optimizing Rust build settings and updating the runner environment. - Fixed a typo in a workflow step name. - Streamlined caching steps to reduce redundancy and improve efficiency.	2025-04-22 13:55:34 -07:00
Lance Release	7f091b8c8e	Updating package-lock.json	2025-04-22 19:16:43 +00:00
Lance Release	c19bdd9a24	Updating package-lock.json	2025-04-22 18:24:16 +00:00
Lance Release	dad0ff5cd2	Updating package-lock.json	2025-04-22 18:23:59 +00:00
Lance Release	a705621067	Bump version: 0.19.0-beta.9 → 0.19.0-beta.10 v0.19.0-beta.10	2025-04-22 18:23:39 +00:00
Lance Release	39614fdb7d	Bump version: 0.22.0-beta.9 → 0.22.0-beta.10 python-v0.22.0-beta.10	2025-04-22 18:23:17 +00:00
Ryan Green	96d534d4bc	feat: add retries to remote client for requests with stream bodies (#2349) Closes https://github.com/lancedb/lancedb/issues/2307 * Adds retries to remote operations with stream bodies (add, merge_insert) * Change default retryable status codes to 409, 429, 500, 502, 503, 504 * Don't retry add or merge_insert operations on 5xx responses Notes: * Supporting retries on stream bodies means we have to buffer the body into memory so it can be cloned on retry. This will impact memory use patterns for the remote client. This buffering can be disabled by disabling retries (i.e. setting retries to 0 in RetryConfig) * It does not seem that retry config can be specified by env vars as the documentation suggests. I added a follow-up issue [here](https://github.com/lancedb/lancedb/issues/2350) ## Summary by CodeRabbit - New Features - Enhanced retry support for remote requests with configurable limits and exponential backoff with jitter. - Added robust retry logic for streaming data uploads, enabling retries with buffered data to ensure reliability. - Bug Fixes - Improved error handling and retry behavior for HTTP status codes 409 and 504. - Refactor - Centralized and modularized HTTP request sending and retry logic across remote database and table operations. - Streamlined request ID management for improved traceability. - Simplified error message construction in index waiting functionality. - Tests - Added a test verifying merge-insert retries on HTTP 409 responses.	2025-04-22 15:40:44 -02:30
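A minimal Python sketch of the retry policy summarized in #2349:

- retryable status codes 409, 429, 500, 502, 503, 504;
- no retry of add or merge_insert on 5xx;
- exponential backoff with jitter;
- buffering a streamed body in memory so it can be resent.

The function names, the base and cap delay values, and the "full jitter" variant are assumptions. The real client is written in Rust, and its `RetryConfig` fields may differ.

```python
import random
import time
from typing import Callable, Iterable, Optional, Tuple

RETRYABLE_STATUSES = {409, 429, 500, 502, 503, 504}
STREAMING_WRITE_OPS = {"add", "merge_insert"}


def should_retry(op: str, status: int) -> bool:
    """Retry only retryable statuses, and never retry add/merge_insert on 5xx."""
    if status not in RETRYABLE_STATUSES:
        return False
    if op in STREAMING_WRITE_OPS and 500 <= status < 600:
        return False
    return True


def backoff_delay(attempt: int, base: float = 0.25, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def send_with_retries(op: str, body_chunks: Iterable[bytes],
                      send: Callable[[bytes], int], max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Buffer a streamed body so it can be resent, then retry per the policy.

    Returns (final_status, attempts_made). `send` stands in for the HTTP call.
    """
    body = b"".join(body_chunks)  # the whole body is held in memory for resends
    attempt = 0
    while True:
        status = send(body)
        if status < 400 or attempt >= max_retries or not should_retry(op, status):
            return status, attempt + 1
        sleep(backoff_delay(attempt))
        attempt += 1


responses = iter([409, 409, 200])
status, attempts = send_with_retries(
    "merge_insert", [b"row1", b"row2"], lambda body: next(responses),
    sleep=lambda s: None,  # skip real waiting in this demo
)
print(status, attempts)  # 200 3
```

Per the notes in the commit, the real client avoids this buffering entirely when retries are set to 0. This sketch always buffers, for simplicity.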
Lance Release	5051d30d09	Updating package-lock.json	2025-04-21 23:55:43 +00:00


1838 Commits