Commit Graph

33 Commits

Author SHA1 Message Date
Weston Pace
ed640a76d9 feat: add take_offsets and take_row_ids (#2584)
These operations have existed in lance for a long while and many users
need to drop down to lance for this capability. This PR adds the API and
implements it using filters (e.g. `_rowid IN (...)`) so that in doesn't
currently add any load to `BaseTable`. I'm not sure that is sustainable
as base table implementations may want to specialize how they handle
this method. However, I figure it is a good starting point.

In addition, unlike Lance, this API does not currently guarantee
anything about the order of the take results. This is necessary for the
fallback filter approach to work (SQL filters cannot guarantee result
order)
2025-08-15 06:48:24 -07:00
Weston Pace
1dadb2aefa feat: upgrade to lance 0.31.0-beta.1 (#2469)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Updated dependencies to newer versions for improved compatibility and
stability.

* **Refactor**
* Improved internal handling of data ranges and stream lifetimes for
enhanced performance and reliability.
* Simplified code style for Python query object conversions without
affecting functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-30 11:10:53 -07:00
Will Jones
4beb2d2877 fix(python): make sure explain_plan works with FTS queries (#2466)
## Summary

Fixes issue #2465 where FTS explain plans only showed basic `LanceScan`
instead of detailed execution plans with FTS query details, limits, and
offsets.

## Root Cause

The `FTSQuery::explain_plan()` and `analyze_plan()` methods were missing
the `.full_text_search()` call before calling explain/analyze plan,
causing them to operate on the base query without FTS context.

## Changes

- **Fixed** `explain_plan()` and `analyze_plan()` in `src/query.rs` to
call `.full_text_search()`
- **Added comprehensive test coverage** for FTS explain plans with
limits, offsets, and filters
- **Updated existing tests** to expect correct behavior instead of buggy
behavior

## Before/After

**Before (broken):**
```
LanceScan: uri=..., projection=[...], row_id=false, row_addr=false, ordered=true
```

**After (fixed):**
```
ProjectionExec: expr=[id@2 as id, text@3 as text, _score@1 as _score]
  Take: columns="_rowid, _score, (id), (text)"
    CoalesceBatchesExec: target_batch_size=1024
      GlobalLimitExec: skip=2, fetch=4
        MatchQuery: query=test
```

## Test Plan

- [x] All new FTS explain plan tests pass 
- [x] Existing tests continue to pass
- [x] FTS queries now show proper execution plans with MatchQuery,
limits, filters

Closes #2465

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Added new test cases to verify explain plan output for full-text
search, vector queries with pagination, and queries with filters.

* **Bug Fixes**
* Improved the accuracy of explain plan and analysis output for
full-text search queries, ensuring the correct query details are
reflected.

* **Refactor**
* Enhanced the formatting and hierarchical structure of execution plans
for hybrid queries, providing clearer and more detailed plan
representations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-26 23:35:14 -07:00
BubbleCal
cbb5a841b1 feat: support prefix matching and must_not clause (#2441) 2025-06-19 10:32:32 +08:00
Wyatt Alt
627ca4c810 chore: update lance to v0.29.1-beta.2 (#2442)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies to use a newer version of the Lance
library.
- **New Features**
- Added support for a new query occurrence type labeled "MUST NOT" in
search filters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-17 14:02:13 -07:00
Weston Pace
59b57e30ed feat: add maximum and minimum nprobes properties (#2430)
This exposes the maximum_nprobes and minimum_nprobes feature that was
added in https://github.com/lancedb/lance/pull/3903

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for specifying minimum and maximum probe counts in
vector search queries, allowing finer control over search behavior.
- Users can now independently set minimum and maximum probes for vector
and hybrid queries via new methods and parameters in Python, Node.js,
and Rust APIs.

- **Bug Fixes**
- Improved parameter validation to ensure correct usage of minimum and
maximum probe values.

- **Tests**
- Expanded test coverage to validate correct handling, serialization,
and error cases for the new probe parameters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-06-13 15:18:29 -07:00
BubbleCal
84ded9d678 feat: support new FTS features in python SDK (#2411)
- AND operator
- phrase query slop param
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for combining full-text search queries using AND/OR
operators, enabling more flexible query composition.
- Introduced new query types and parameters, including boolean queries,
operator selection, occurrence constraints, and phrase slop for advanced
search scenarios.
- Enhanced asynchronous search to accept rich full-text query objects
directly.

- **Bug Fixes**
- Improved handling and validation of full-text search queries in both
synchronous and asynchronous search operations.

- **Tests**
- Updated and expanded tests to cover new full-text query types and
their usage in search functions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-06 14:33:46 +08:00
BubbleCal
9b902272f1 fix: sync hybrid search ignores the distance range params (#2356)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added support for distance range filtering in hybrid vector queries,
allowing users to specify lower and upper bounds for search results.

- **Tests**
- Introduced new tests to validate distance range filtering and
reranking in both synchronous and asynchronous hybrid query scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-25 13:01:22 +08:00
Lei Xu
080ea2f9a4 chore: fix 1.86 warnings (#2312)
Fix rust 1.86 warnings
2025-04-12 08:29:10 -05:00
Will Jones
1cd76b8498 feat: add timeout to query execution options (#2288)
Closes #2287


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added configurable timeout support for query executions. Users can now
specify maximum wait times for queries, enhancing control over
long-running operations across various integrations.
- **Tests**
- Expanded test coverage to validate timeout behavior in both
synchronous and asynchronous query flows, ensuring timely error
responses when query execution exceeds the specified limit.
- Introduced a new test suite to verify query operations when a timeout
is reached, checking for appropriate error handling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-04 12:34:41 -07:00
BubbleCal
e52ac79c69 fix: can't do structured FTS in python (#2300)
missed to support it in `search()` API and there were some pydantic
errors

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced full-text search capabilities by incorporating additional
parameters, enabling more flexible query definitions.
- Extended table search functionality to support full-text queries
alongside existing search types.

- **Tests**
- Introduced new tests that validate both structured and conditional
full-text search behaviors.
- Expanded test coverage for various query types, including MatchQuery,
BoostQuery, MultiMatchQuery, and PhraseQuery.

- **Bug Fixes**
- Fixed a logic issue in query processing to ensure correct handling of
full-text search queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-04-02 17:27:15 +08:00
Weston Pace
625bab3f21 feat: update to lance 0.25.3b1 (#2294)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated dependency versions for improved performance and
compatibility.

- **New Features**
- Added support for structured full-text search with expanded query
types (e.g., match, phrase, boost, multi-match) and flexible input
formats.
- Introduced a new method to check server support for structural
full-text search features.
- Enhanced the query system with new classes and interfaces for handling
various full-text queries.
- Expanded the functionality of existing methods to accept more complex
query structures, including updates to method signatures.

- **Bug Fixes**
  - Improved error handling and reporting for full-text search queries.

- **Refactor**
- Enhanced query processing with streamlined input handling and improved
error reporting, ensuring more robust and consistent search results
across platforms.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
2025-04-01 06:36:42 -07:00
LuQQiu
a1d1833a40 feat: add analyze_plan api (#2280)
add analyze plan api to allow executing the queries and see runtime
metrics.
Which help identify the query IO overhead and help identify query
slowness
2025-03-28 14:28:52 -07:00
LuQQiu
698f329598 feat: add explain plan remote api (#2263)
Add explain plan remote api
2025-03-26 11:22:40 -07:00
Weston Pace
9403254442 feat: add to_query_object method (#2239)
This PR adds a `to_query_object` method to the various query builders
(except not hybrid queries yet). This makes it possible to inspect the
query that is built.

In addition this PR does some normalization between the sync and async
query paths. A few custom defaults were removed in favor of None (with
the default getting set once, in rust).

Also, the synchronous to_batches method will now actually stream results

Also, the remote API now defaults to prefiltering
2025-03-21 13:01:51 -07:00
Weston Pace
6bf742c759 feat: expose table trait (#2097)
Similar to
c269524b2f
this PR reworks and exposes an internal trait (this time
`TableInternal`) to be a public trait. These two PRs together should
make it possible for others to integrate LanceDB on top of other
catalogs.

This PR also adds a basic `TableProvider` implementation for tables,
although some work still needs to be done here (pushdown not yet
enabled).
2025-02-05 18:13:51 -08:00
Will Jones
15f8f4d627 ci: check license headers (#2076)
Based on the same workflow in Lance.
2025-01-29 08:27:07 -08:00
BubbleCal
f4dea72cc5 feat: support vector search with distance thresholds (#1993)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-06 13:23:39 +08:00
BubbleCal
445a312667 fix: selecting columns failed on FTS and hybrid search (#1991)
it reports error `AttributeError: 'builtins.FTSQuery' object has no
attribute 'select_columns'`
because we missed `select_columns` method in rust

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-03 13:08:12 +08:00
Bert
2a9e3e2084 feat(python): support hybrid search in async sdk (#1915)
fixes: https://github.com/lancedb/lancedb/issues/1765

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-12-06 13:53:15 -05:00
Will Jones
5f261cf2d8 feat: upgrade to Lance v0.20.0 (#1908)
Upstream change log:
https://github.com/lancedb/lance/releases/tag/v0.20.0
2024-12-05 10:53:59 -08:00
BubbleCal
b2f88f0b29 feat: support to sepcify ef search param (#1844)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-11-19 23:12:25 +08:00
Will Jones
abd75e0ead feat: search multiple query vectors as one query (#1811)
Allows users to pass multiple query vector as part of a single query
plan. This just runs the queries in parallel without any further
optimization. It's mostly a convenience.

Previously, I think this was only handled by the sync Python remote API.
This makes it common across all SDKs.

Closes https://github.com/lancedb/lancedb/issues/1803

```python
>>> import lancedb
>>> import asyncio
>>> 
>>> async def main():
...     db = await lancedb.connect_async("./demo")
...     table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite")
...     return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas()
... 
>>> asyncio.run(main())
   query_index  id           vector  _distance
0            2   2  [4.0, 5.0, 6.0]        0.0
1            1   2  [4.0, 5.0, 6.0]        0.0
2            0   1  [1.0, 2.0, 3.0]        0.0
```
2024-11-13 16:05:16 -08:00
Will Jones
3604d20ad3 feat(python,node): support with_row_id in Python and remote (#1784)
Needed to support hybrid search in Remote SDK.
2024-11-04 11:25:45 -08:00
Will Jones
15ed7f75a0 feat(python): support post filter on FTS (#1783) 2024-11-01 10:05:05 -07:00
Will Jones
96181ab421 feat: fast_search in Python and Node (#1623)
Sometimes it is acceptable to users to only search indexed data and skip
and new un-indexed data. For example, if un-indexed data will be shortly
indexed and they don't mind the delay. In these cases, we can save a lot
of CPU time in search, and provide better latency. Users can activate
this on queries using `fast_search()`.
2024-11-01 09:29:09 -07:00
Gagan Bhullar
b24810a011 feat(python, rust): expose offset in query (#1556)
PR is part of #1555
2024-09-05 08:33:07 -07:00
BubbleCal
0fa50775d6 feat: support to query/index FTS on RemoteTable/AsyncTable (#1537)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-08-16 12:01:05 +08:00
Will Jones
9555efacf9 feat: upgrade lance to 0.15.0 (#1477)
Changelog: https://github.com/lancedb/lance/releases/tag/v0.15.0

* Fixes #1466
* Closes #1475
* Fixes #1446
2024-07-26 09:13:49 -07:00
Will Jones
4f601a2d4c fix: handle camelCase column names in select (#1460)
Fixes #1385
2024-07-22 12:53:17 -07:00
Nuvic
46c6ff889d feat: add the explain_plan function (#1328)
It's useful to see the underlying query plan for debugging purposes.
This exposes LanceScanner's `explain_plan` function. Addresses #1288

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-07-02 11:10:01 -07:00
Weston Pace
d5586c9c32 feat: make it possible to opt in to using the v2 format (#1352)
This also exposed the max_batch_length configuration option in
python/node (it was needed to verify if we are actually in v2 mode or
not)
2024-06-04 21:52:14 -07:00
Weston Pace
4180b44472 feat: refactor the query API and add query support to the python async API (#1113)
In addition, there are also a number of changes in nodejs to the
docstrings of existing methods because this PR adds a jsdoc linter.
2024-04-05 16:32:47 -07:00