lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-23 05:19:58 +00:00

Author	SHA1	Message	Date
Ryan Green	af54e0ce06	feat: add table stats API (#2363 ) * Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations ### Questions * This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - Tests - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-29 15:19:08 -02:30
Lance Release	419a433244	Bump version: 0.22.0 → 0.22.1-beta.0	2025-04-28 17:20:10 +00:00
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353 ) Add tag-related APIs to list existing tags, attach a tag to a version, update a tag's version, delete a tag, get the version of a tag, and check out the version that a tag is bound to. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. - Enabled checking out a table by either version number or tag name. - Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. - Bug Fixes - None. - Documentation - Updated documentation to describe the new tagging features, including usage examples. - Tests - Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-28 10:04:46 -07:00
LuQQiu	178bcf9c90	fix: hybrid search explain plan analyze plan (#2360 ) Fix hybrid search explain plan analyze plan API <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added options to view the execution plan and analyze the runtime performance of hybrid queries. - Refactor - Improved internal handling of query setup for better modularity and maintainability. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-27 18:39:43 -07:00
Lance Release	a8b5ad7e74	Bump version: 0.22.0-beta.12 → 0.22.0	2025-04-25 21:16:07 +00:00
Lance Release	f8f6264883	Bump version: 0.22.0-beta.11 → 0.22.0-beta.12	2025-04-25 21:16:07 +00:00
Lance Release	d11819c90c	Bump version: 0.22.0-beta.10 → 0.22.0-beta.11	2025-04-25 05:01:57 +00:00
BubbleCal	9b902272f1	fix: sync hybrid search ignores the distance range params (#2356 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for distance range filtering in hybrid vector queries, allowing users to specify lower and upper bounds for search results. - Tests - Introduced new tests to validate distance range filtering and reranking in both synchronous and asynchronous hybrid query scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-25 13:01:22 +08:00
Philip Meier	2191f948c3	fix: add missing pydantic model config compat (#2316 ) Fixes #2315. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Refactor - Enhanced query processing to maintain smooth functionality across different dependency versions, ensuring improved stability and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-22 14:46:10 -07:00
Lance Release	39614fdb7d	Bump version: 0.22.0-beta.9 → 0.22.0-beta.10	2025-04-22 18:23:17 +00:00
Lance Release	1a66df2627	Bump version: 0.22.0-beta.8 → 0.22.0-beta.9	2025-04-21 22:49:59 +00:00
Will Jones	92f0b16e46	fix(python): make sure pandas is optional (#2346 ) Fixes #2344 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Tests - Updated tests to use PyArrow Tables instead of pandas DataFrames where possible, reducing reliance on pandas. - Tests that require pandas are now automatically skipped if pandas is not installed. - Chores - Improved workflow to uninstall both pylance and pandas in a specific test step. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 13:42:13 -07:00
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338 ) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added optional waiting for index creation completion with configurable timeout. - Introduced methods to poll and wait for indices to be fully built across sync and async tables. - Extended index creation APIs to accept a wait timeout parameter. - Bug Fixes - Added a new timeout error variant for improved error reporting on index operations. - Tests - Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 08:41:21 -02:30
Magnus	4f07fea6df	feat: add ColPali embedding support with MultiVector type (#2170 ) This PR adds ColPali support with a ColPaliEmbeddings class (tagged "colpali") using ColQwen2.5 for multi-vector text/image embeddings. It also adds a MultiVector Pydantic type to handle the vector lists. I've added some integration tests for the embedding model and some unit tests for the new Pydantic type. This could serve as a template for other ColPali variants as well, or until transformers🤗 starts supporting it. Still `TODO`: - [ ] Documentation - [ ] Add an example _Could also allow Image as query, but didn't work well when testing it._ [ColPali-Engine](https://github.com/illuin-tech/colpali) version: 0.3.9.dev17+g3faee24 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced support for ColPali-based multimodal multi-vector embeddings for both text and images. - Added a new embedding class for generating multi-vector embeddings, configurable for various model and processing options. - Added a new Pydantic type for multi-vector embeddings, supporting validation and schema generation for lists of fixed-dimension vectors. - Bug Fixes - Ensured proper asynchronous index creation in query tests for improved reliability. - Tests - Added integration tests for ColPali embeddings, including text-to-image search and validation of multi-vector fields. - Added comprehensive tests for the new multi-vector Pydantic type, covering schema, validation, and default value behavior. - Chores - Updated optional dependencies to include the ColPali engine. - Added utility to check for availability of flash attention support. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 11:47:37 +08:00
Lance Release	c6c20cb2bd	Bump version: 0.22.0-beta.7 → 0.22.0-beta.8	2025-04-17 22:15:46 +00:00
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. - Bug Fixes - Ensured prewarming an index does not interfere with subsequent search operations. - Tests - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. - Chores - Updated dependencies for improved compatibility and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
Lance Release	6e701d3e1b	Bump version: 0.22.0-beta.6 → 0.22.0-beta.7	2025-04-15 04:10:26 +00:00
BubbleCal	2248aa9508	fix: bugs for new FTS APIs (#2314 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities with support for phrase queries, fuzzy matching, boosting, and multi-column matching. - Search methods now accept full-text query objects directly, improving query flexibility and precision. - Python and JavaScript SDKs updated to handle full-text queries seamlessly, including async search support. - Tests - Added comprehensive tests covering fuzzy search, phrase search, and boosted queries to ensure robust full-text search functionality. - Documentation - Updated query class documentation to reflect new constructor options and removal of deprecated methods for clarity and simplicity. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-15 11:51:35 +08:00
PhorstenkampFuzzy	a6fa69ab89	fix(python): add pylance as its own optional dependency (#2336 ) This change makes it possible to manage the pylance dependency centrally, so that users do not each need to monitor compatibility manually. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced an optional dependency that enhances development support. Users can now benefit from improved static analysis capabilities when installing the recommended version (0.23.2 or later). <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-14 09:28:16 -07:00
Will Jones	b3a4efd587	fix: revert change default read_consistency_interval=5s (#2327 ) This reverts commit `a547c523c2` or #2281 The current implementation can cause panics and performance degradation. I will bring this back with more testing in https://github.com/lancedb/lancedb/pull/2311 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Enhanced clarity on read consistency settings with updated descriptions and default behavior. - Removed outdated warnings about eventual consistency from the troubleshooting guide. - Refactor - Streamlined the handling of the read consistency interval across integrations, now defaulting to "None" for improved performance. - Simplified internal logic to offer a more consistent experience. - Tests - Updated test expectations to reflect the new default representation for the read consistency interval. - Removed redundant tests related to "no consistency" settings for streamlined testing. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-04-14 08:48:15 -07:00
Lei Xu	080ea2f9a4	chore: fix 1.86 warnings (#2312 ) Fix rust 1.86 warnings	2025-04-12 08:29:10 -05:00
Ayush Chaurasia	32fdde23f8	fix: robust handling of empty result when reranking (#2313 ) I found some edge cases while running experiments that - depending on the base reranking libraries, some of them don't handle empty lists well. This PR manually checks if the result set to be reranked is empty <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Bug Fixes - Enhanced search result processing by ensuring that reordering only occurs when valid, non-empty results are available, thereby preventing unnecessary operations and potential errors. - Tests - Added automated tests to verify that empty search result sets are handled correctly, ensuring consistent behavior across various rerankers. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-09 16:26:05 +05:30
Lance Release	27d9e5c596	Bump version: 0.22.0-beta.5 → 0.22.0-beta.6	2025-04-08 06:16:14 +00:00
BubbleCal	ec8271931f	feat: support to create FTS index on list of strings (#2317 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated internal library dependencies to the latest beta version for improved system stability. - Tests - Added automated tests to validate full-text search functionality on list-based text fields. - Refactor - Enhanced the search processing logic to provide robust support for list-type text data, ensuring more reliable results. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-08 14:12:35 +08:00
Lance Release	d59f64b5a3	Bump version: 0.22.0-beta.4 → 0.22.0-beta.5	2025-04-04 21:49:34 +00:00
fzowl	30ed8c4c43	fix: voyageai regression multimodal supersedes text models (#2268 ) fix #2160	2025-04-04 14:45:56 -07:00
Will Jones	1cd76b8498	feat: add timeout to query execution options (#2288 ) Closes #2287 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added configurable timeout support for query executions. Users can now specify maximum wait times for queries, enhancing control over long-running operations across various integrations. - Tests - Expanded test coverage to validate timeout behavior in both synchronous and asynchronous query flows, ensuring timely error responses when query execution exceeds the specified limit. - Introduced a new test suite to verify query operations when a timeout is reached, checking for appropriate error handling. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-04 12:34:41 -07:00
Lei Xu	a38f784081	chore: add numpy as dependency (#2308 )	2025-04-04 10:33:39 -07:00
Lance Release	0d42297cf8	Bump version: 0.22.0-beta.3 → 0.22.0-beta.4	2025-04-02 21:23:02 +00:00
Lance Release	f0bc08c0d7	Bump version: 0.22.0-beta.2 → 0.22.0-beta.3	2025-04-02 09:27:55 +00:00
BubbleCal	e52ac79c69	fix: can't do structured FTS in python (#2300 ) Structured FTS was not supported in the `search()` API, and there were some pydantic errors. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities by incorporating additional parameters, enabling more flexible query definitions. - Extended table search functionality to support full-text queries alongside existing search types. - Tests - Introduced new tests that validate both structured and conditional full-text search behaviors. - Expanded test coverage for various query types, including MatchQuery, BoostQuery, MultiMatchQuery, and PhraseQuery. - Bug Fixes - Fixed a logic issue in query processing to ensure correct handling of full-text search queries. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-02 17:27:15 +08:00
Lance Release	c1738250a3	Bump version: 0.22.0-beta.1 → 0.22.0-beta.2	2025-04-01 17:27:57 +00:00
Weston Pace	1ee63984f5	feat: allow FSB to be used for btree indices (#2297 ) We recently allowed this for lance but there was a check in lancedb as well that was preventing it <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for indexing fixed-size binary data using B-tree structures for efficient data storage and retrieval. - Tests - Implemented automated tests to ensure the new binary indexing works correctly and meets the expected configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-01 10:27:22 -07:00
Lance Release	fb95f9b3bd	Bump version: 0.22.0-beta.0 → 0.22.0-beta.1	2025-04-01 14:26:28 +00:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
Lance Release	26dab93f2a	Bump version: 0.21.3-beta.0 → 0.22.0-beta.0	2025-03-30 18:04:14 +00:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280 ) Add the analyze plan API, which executes a query and reports runtime metrics. These metrics help identify query IO overhead and the causes of slow queries.	2025-03-28 14:28:52 -07:00
Will Jones	a547c523c2	feat!: change default read_consistency_interval=5s (#2281 ) Previously, when we loaded the next version of the table, we would block all reads with a write lock. Now, we only do that if `read_consistency_interval=0`. Otherwise, we load the next version asynchronously in the background. This should mean that `read_consistency_interval > 0` won't have a meaningful impact on latency. Along with this change, I felt it was safe to change the default consistency interval to 5 seconds. The current default is `None`, which means we will never check for a new version by default. I think that default is contrary to most users' expectations.	2025-03-28 11:04:31 -07:00
Lance Release	3c7dfe9f28	Bump version: 0.21.2-beta.0 → 0.21.3-beta.0	2025-03-28 16:03:17 +00:00
Lei Xu	f52d05d3fa	feat: add columns using pyarrow schema (#2284 )	2025-03-28 08:51:50 -07:00
LuQQiu	cba14a5743	feat: add restore remote api (#2282 )	2025-03-27 16:33:52 -07:00
LuQQiu	698f329598	feat: add explain plan remote api (#2263 ) Add explain plan remote api	2025-03-26 11:22:40 -07:00
Wyatt Alt	f882f5b69a	fix: update Query pydoc (#2273 ) Removes reference of nonexistent method.	2025-03-25 08:50:23 -07:00
Benjamin Clavié	a68311a893	fix: answerdotai rerankers argument passing (#2117 ) This fixes an issue for people wishing to use different kinds of rerankers in lancedb via AnswerDotAI rerankers. Currently, the arguments are passed positionally, but they don't match the [Reranker class implementation](`d604a8c47d/rerankers/reranker.py (L179)`): the second argument is expected to be an optional "lang" for default models, while model_type should be passed explicitly. The one-line change in this PR fixes this and enables the use of other methods (e.g. LLMs-as-rerankers).	2025-03-24 12:31:59 +05:30
Will Jones	abe06fee3d	feat(python): warn on fork (#2258 ) Closes #768	2025-03-21 17:18:10 -07:00
Lance Release	e803a626a1	Bump version: 0.21.1 → 0.21.2-beta.0	2025-03-21 20:02:25 +00:00
Weston Pace	9403254442	feat: add to_query_object method (#2239 ) This PR adds a `to_query_object` method to the various query builders (hybrid queries are not yet covered). This makes it possible to inspect the query that is built. In addition, this PR does some normalization between the sync and async query paths. A few custom defaults were removed in favor of None (with the default getting set once, in rust). Also, the synchronous to_batches method will now actually stream results. Also, the remote API now defaults to prefiltering.	2025-03-21 13:01:51 -07:00
Will Jones	b2a38ac366	fix: make pylance optional again (#2209 ) The two remaining blockers were: * A method `with_embeddings` that was deprecated a year ago * A typecheck for `LanceDataset`	2025-03-21 11:26:32 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248 ) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Ayush Chaurasia	ba1ded933a	fix: add better check for empty results in hybrid search (#2252 ) fixes: https://github.com/lancedb/lancedb/issues/2249	2025-03-21 13:05:05 +05:30

846 Commits