lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-27 15:12:53 +00:00

Author	SHA1	Message	Date
Weston Pace	ed640a76d9	feat: add take_offsets and take_row_ids (#2584) These operations have existed in lance for a long while and many users need to drop down to lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that it doesn't currently add any load to `BaseTable`. I'm not sure that is sustainable, as base table implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work (SQL filters cannot guarantee result order).	2025-08-15 06:48:24 -07:00
Weston Pace	1dadb2aefa	feat: upgrade to lance 0.31.0-beta.1 (#2469) ## Summary by CodeRabbit * Chores * Updated dependencies to newer versions for improved compatibility and stability. * Refactor * Improved internal handling of data ranges and stream lifetimes for enhanced performance and reliability. * Simplified code style for Python query object conversions without affecting functionality.	2025-06-30 11:10:53 -07:00
Will Jones	4beb2d2877	fix(python): make sure `explain_plan` works with FTS queries (#2466) ## Summary Fixes issue #2465 where FTS explain plans only showed basic `LanceScan` instead of detailed execution plans with FTS query details, limits, and offsets. ## Root Cause The `FTSQuery::explain_plan()` and `analyze_plan()` methods were missing the `.full_text_search()` call before calling explain/analyze plan, causing them to operate on the base query without FTS context. ## Changes - Fixed `explain_plan()` and `analyze_plan()` in `src/query.rs` to call `.full_text_search()` - Added comprehensive test coverage for FTS explain plans with limits, offsets, and filters - Updated existing tests to expect correct behavior instead of buggy behavior ## Before/After Before (broken):
```
LanceScan: uri=..., projection=[...], row_id=false, row_addr=false, ordered=true
```
After (fixed):
```
ProjectionExec: expr=[id@2 as id, text@3 as text, _score@1 as _score]
  Take: columns="_rowid, _score, (id), (text)"
    CoalesceBatchesExec: target_batch_size=1024
      GlobalLimitExec: skip=2, fetch=4
        MatchQuery: query=test
```
## Test Plan - [x] All new FTS explain plan tests pass - [x] Existing tests continue to pass - [x] FTS queries now show proper execution plans with MatchQuery, limits, filters Closes #2465 🤖 Generated with [Claude Code](https://claude.ai/code) ## Summary by CodeRabbit * Tests * Added new test cases to verify explain plan output for full-text search, vector queries with pagination, and queries with filters. * Bug Fixes * Improved the accuracy of explain plan and analysis output for full-text search queries, ensuring the correct query details are reflected. * Refactor * Enhanced the formatting and hierarchical structure of execution plans for hybrid queries, providing clearer and more detailed plan representations. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-06-26 23:35:14 -07:00
BubbleCal	cbb5a841b1	feat: support prefix matching and must_not clause (#2441)	2025-06-19 10:32:32 +08:00
Wyatt Alt	627ca4c810	chore: update lance to v0.29.1-beta.2 (#2442) ## Summary by CodeRabbit - Chores - Updated internal dependencies to use a newer version of the Lance library. - New Features - Added support for a new query occurrence type labeled "MUST NOT" in search filters.	2025-06-17 14:02:13 -07:00
Weston Pace	59b57e30ed	feat: add maximum and minimum nprobes properties (#2430) This exposes the maximum_nprobes and minimum_nprobes feature that was added in https://github.com/lancedb/lance/pull/3903 ## Summary by CodeRabbit - New Features - Added support for specifying minimum and maximum probe counts in vector search queries, allowing finer control over search behavior. - Users can now independently set minimum and maximum probes for vector and hybrid queries via new methods and parameters in Python, Node.js, and Rust APIs. - Bug Fixes - Improved parameter validation to ensure correct usage of minimum and maximum probe values. - Tests - Expanded test coverage to validate correct handling, serialization, and error cases for the new probe parameters.	2025-06-13 15:18:29 -07:00
BubbleCal	84ded9d678	feat: support new FTS features in python SDK (#2411) - AND operator - phrase query slop param - boolean query ## Summary by CodeRabbit - New Features - Added support for combining full-text search queries using AND/OR operators, enabling more flexible query composition. - Introduced new query types and parameters, including boolean queries, operator selection, occurrence constraints, and phrase slop for advanced search scenarios. - Enhanced asynchronous search to accept rich full-text query objects directly. - Bug Fixes - Improved handling and validation of full-text search queries in both synchronous and asynchronous search operations. - Tests - Updated and expanded tests to cover new full-text query types and their usage in search functions. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-06-06 14:33:46 +08:00
BubbleCal	9b902272f1	fix: sync hybrid search ignores the distance range params (#2356) ## Summary by CodeRabbit - New Features - Added support for distance range filtering in hybrid vector queries, allowing users to specify lower and upper bounds for search results. - Tests - Introduced new tests to validate distance range filtering and reranking in both synchronous and asynchronous hybrid query scenarios. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-25 13:01:22 +08:00
Lei Xu	080ea2f9a4	chore: fix 1.86 warnings (#2312) Fix Rust 1.86 warnings.	2025-04-12 08:29:10 -05:00
Will Jones	1cd76b8498	feat: add timeout to query execution options (#2288) Closes #2287 ## Summary by CodeRabbit - New Features - Added configurable timeout support for query executions. Users can now specify maximum wait times for queries, enhancing control over long-running operations across various integrations. - Tests - Expanded test coverage to validate timeout behavior in both synchronous and asynchronous query flows, ensuring timely error responses when query execution exceeds the specified limit. - Introduced a new test suite to verify query operations when a timeout is reached, checking for appropriate error handling.	2025-04-04 12:34:41 -07:00
BubbleCal	e52ac79c69	fix: can't do structured FTS in python (#2300) Support for it was missing from the `search()` API, and there were some pydantic errors. ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities by incorporating additional parameters, enabling more flexible query definitions. - Extended table search functionality to support full-text queries alongside existing search types. - Tests - Introduced new tests that validate both structured and conditional full-text search behaviors. - Expanded test coverage for various query types, including MatchQuery, BoostQuery, MultiMatchQuery, and PhraseQuery. - Bug Fixes - Fixed a logic issue in query processing to ensure correct handling of full-text search queries. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-02 17:27:15 +08:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294) ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280) Add an analyze plan API that executes the query and reports runtime metrics, which helps identify query IO overhead and the causes of query slowness.	2025-03-28 14:28:52 -07:00
LuQQiu	698f329598	feat: add explain plan remote api (#2263) Add explain plan remote API.	2025-03-26 11:22:40 -07:00
Weston Pace	9403254442	feat: add to_query_object method (#2239) This PR adds a `to_query_object` method to the various query builders (except hybrid queries, for now). This makes it possible to inspect the query that is built. In addition, this PR does some normalization between the sync and async query paths. A few custom defaults were removed in favor of None (with the default getting set once, in Rust). Also, the synchronous to_batches method will now actually stream results. Also, the remote API now defaults to prefiltering.	2025-03-21 13:01:51 -07:00
Weston Pace	6bf742c759	feat: expose table trait (#2097) Similar to `c269524b2f` this PR reworks and exposes an internal trait (this time `TableInternal`) to be a public trait. These two PRs together should make it possible for others to integrate LanceDB on top of other catalogs. This PR also adds a basic `TableProvider` implementation for tables, although some work still needs to be done here (pushdown not yet enabled).	2025-02-05 18:13:51 -08:00
Will Jones	15f8f4d627	ci: check license headers (#2076) Based on the same workflow in Lance.	2025-01-29 08:27:07 -08:00
BubbleCal	f4dea72cc5	feat: support vector search with distance thresholds (#1993) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-01-06 13:23:39 +08:00
BubbleCal	445a312667	fix: selecting columns failed on FTS and hybrid search (#1991) It reported the error `AttributeError: 'builtins.FTSQuery' object has no attribute 'select_columns'` because the `select_columns` method was missing in Rust. Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-01-03 13:08:12 +08:00
Bert	2a9e3e2084	feat(python): support hybrid search in async sdk (#1915) fixes: https://github.com/lancedb/lancedb/issues/1765 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-12-06 13:53:15 -05:00
Will Jones	5f261cf2d8	feat: upgrade to Lance v0.20.0 (#1908) Upstream change log: https://github.com/lancedb/lance/releases/tag/v0.20.0	2024-12-05 10:53:59 -08:00
BubbleCal	b2f88f0b29	feat: support specifying ef search param (#1844) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-11-19 23:12:25 +08:00
Will Jones	abd75e0ead	feat: search multiple query vectors as one query (#1811) Allows users to pass multiple query vectors as part of a single query plan. This just runs the queries in parallel without any further optimization. It's mostly a convenience. Previously, I think this was only handled by the sync Python remote API. This makes it common across all SDKs. Closes https://github.com/lancedb/lancedb/issues/1803
```python
>>> import lancedb
>>> import asyncio
>>>
>>> async def main():
...     db = await lancedb.connect_async("./demo")
...     table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite")
...     return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas()
...
>>> asyncio.run(main())
   query_index  id           vector  _distance
0            2   2  [4.0, 5.0, 6.0]        0.0
1            1   2  [4.0, 5.0, 6.0]        0.0
2            0   1  [1.0, 2.0, 3.0]        0.0
```
	2024-11-13 16:05:16 -08:00
Will Jones	3604d20ad3	feat(python,node): support with_row_id in Python and remote (#1784) Needed to support hybrid search in Remote SDK.	2024-11-04 11:25:45 -08:00
Will Jones	15ed7f75a0	feat(python): support post filter on FTS (#1783)	2024-11-01 10:05:05 -07:00
Will Jones	96181ab421	feat: `fast_search` in Python and Node (#1623) Sometimes it is acceptable to users to search only indexed data and skip any new un-indexed data, for example when the un-indexed data will be indexed shortly and they don't mind the delay. In these cases, we can save a lot of CPU time in search and provide better latency. Users can activate this on queries using `fast_search()`.	2024-11-01 09:29:09 -07:00
Gagan Bhullar	b24810a011	feat(python, rust): expose offset in query (#1556) PR is part of #1555	2024-09-05 08:33:07 -07:00
BubbleCal	0fa50775d6	feat: support to query/index FTS on RemoteTable/AsyncTable (#1537) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-16 12:01:05 +08:00
Will Jones	9555efacf9	feat: upgrade lance to 0.15.0 (#1477) Changelog: https://github.com/lancedb/lance/releases/tag/v0.15.0 * Fixes #1466 * Closes #1475 * Fixes #1446	2024-07-26 09:13:49 -07:00
Will Jones	4f601a2d4c	fix: handle camelCase column names in select (#1460) Fixes #1385	2024-07-22 12:53:17 -07:00
Nuvic	46c6ff889d	feat: add the explain_plan function (#1328) It's useful to see the underlying query plan for debugging purposes. This exposes LanceScanner's `explain_plan` function. Addresses #1288 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-07-02 11:10:01 -07:00
Weston Pace	d5586c9c32	feat: make it possible to opt in to using the v2 format (#1352) This also exposed the max_batch_length configuration option in python/node (it was needed to verify if we are actually in v2 mode or not)	2024-06-04 21:52:14 -07:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00

33 Commits