Mark McCaskey fe76496a59 fix: .nprobes method in python bindings, improve error messages (#2556)
Calling `nprobes` with a value greater than 20 fails with the `minimum_nprobes` error:

```
self = <lancedb.query.AsyncVectorQuery object at 0x10b749720>, minimum_nprobes = 30

    def minimum_nprobes(self, minimum_nprobes: int) -> Self:
        """Set the minimum number of probes to use.

        See `nprobes` for more details.

        These partitions will be searched on every indexed vector query and will
        increase recall at the expense of latency.
        """
>       self._inner.minimum_nprobes(minimum_nprobes)
E       ValueError: Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes

python/lancedb/query.py:2744: ValueError
```
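The threshold of 20 makes sense in a toy model of the two validating setters. This sketch assumes both bounds default to 20 (inferred from where the failure starts, not confirmed here) and that `nprobes(n)` sets the minimum before the maximum. `ProbeBounds` and `naive_nprobes` are hypothetical names, not LanceDB APIs.

```python
class ProbeBounds:
    """Hypothetical model of the query's probe settings (defaults assumed to be 20)."""

    def __init__(self, minimum: int = 20, maximum: int = 20) -> None:
        self.minimum = minimum
        self.maximum = maximum  # 0 would mean "no limit"

    def set_minimum(self, value: int) -> "ProbeBounds":
        # Same check as the traceback above: min may not exceed a finite max.
        if self.maximum != 0 and value > self.maximum:
            raise ValueError(
                "Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes"
            )
        self.minimum = value
        return self

    def set_maximum(self, value: int) -> "ProbeBounds":
        # Symmetric check: a finite max may not drop below min.
        if value != 0 and value < self.minimum:
            raise ValueError(
                "Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes"
            )
        self.maximum = value
        return self


def naive_nprobes(bounds: ProbeBounds, n: int) -> ProbeBounds:
    # Minimum first: fails whenever n exceeds the current maximum of 20.
    return bounds.set_minimum(n).set_maximum(n)


print(vars(naive_nprobes(ProbeBounds(), 10)))  # {'minimum': 10, 'maximum': 10}
try:
    naive_nprobes(ProbeBounds(), 30)
except ValueError as err:
    print(err)
```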

Setting the maximum before the minimum seems like the obvious fix, but it breaks this
reasonable case:
```python
from lancedb.query import LanceVectorQueryBuilder

def test_nprobes_min_max_works_sync(table):  # `table` is a pytest fixture
    LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(2).maximum_nprobes(4).to_list()
```

It fails with:

```
self = <lancedb.query.AsyncVectorQuery object at 0x1203f1c90>, maximum_nprobes = 4

    def maximum_nprobes(self, maximum_nprobes: int) -> Self:
        """Set the maximum number of probes to use.

        See `nprobes` for more details.

        If this value is greater than `minimum_nprobes` then the excess partitions
        will be searched only if we have not found enough results.

        This can be useful when there is a narrow filter to allow these queries to
        spend more time searching and avoid potential false negatives.

        If this value is 0 then no limit will be applied and all partitions could be
        searched if needed to satisfy the limit.
        """
>       self._inner.maximum_nprobes(maximum_nprobes)
E       ValueError: Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes

python/lancedb/query.py:2761: ValueError
```
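The `maximum_nprobes` docstring in the traceback describes the probing rule: the first `minimum_nprobes` partitions are always searched, extra partitions up to `maximum_nprobes` are searched only while results are short, and 0 removes the cap. Below is a minimal sketch of that rule, assuming partitions are already sorted nearest-first and ignoring distance ranking within a partition. `probe_partitions` is a hypothetical helper, not LanceDB's implementation.

```python
from typing import Callable, List, Sequence


def probe_partitions(
    partitions: Sequence[Sequence[int]],
    predicate: Callable[[int], bool],
    limit: int,
    minimum_nprobes: int,
    maximum_nprobes: int,
) -> List[int]:
    """Hypothetical sketch of the min/max probing rule.

    `partitions` is assumed to be sorted nearest-first; each holds row ids.
    Distance ranking inside a partition is ignored.
    """
    # 0 means no cap: every partition may be searched if needed.
    cap = len(partitions) if maximum_nprobes == 0 else min(maximum_nprobes, len(partitions))
    matches: List[int] = []
    for probed, part in enumerate(partitions[:cap]):
        # The first `minimum_nprobes` partitions are always searched; later
        # ones only while we still have fewer than `limit` results.
        if probed >= minimum_nprobes and len(matches) >= limit:
            break
        matches.extend(row for row in part if predicate(row))
    return matches[:limit]


parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(probe_partitions(parts, lambda r: r % 2 == 0, 3, 1, 1))  # [2]
print(probe_partitions(parts, lambda r: r % 2 == 0, 3, 1, 0))  # [2, 4, 6]
print(probe_partitions(parts, lambda r: r == 11, 1, 1, 2))     # []
print(probe_partitions(parts, lambda r: r == 11, 1, 1, 0))     # [11]
```

With the narrow `r == 11` filter, a cap of 2 misses the only match in the fourth partition while a cap of 0 finds it, which is the false-negative case the docstring warns about.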

The case I care about is min == max, but this solution also handles the case
where they differ. If both min and max are set, we first set both to the
minimum and then set the max. This isn't exactly equivalent to calling the
setters, because the `minimum_nprobes` setter checks for 0 on the min while
`.nprobes` does no sanity checking at all. Still, I figured this was the most
reasonable and general solution without touching more of this code.

While working on this, I noticed the error messages were a bit ambiguous, so I
made them symmetric and clarified them.
2025-07-30 09:23:25 -07:00

LanceDB Cloud Public Beta

LanceDB Website Blog Discord Twitter LinkedIn

LanceDB

The Multimodal AI Lakehouse

How to Install | Detailed Documentation | Tutorials and Recipes | Contributors

The ultimate multimodal data platform for AI/ML applications.

LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.


Demo: Multimodal Search by Keyword, Vector or with SQL

LanceDB Multimodal Search

Star LanceDB to get updates!


Key Features:

  • Fast Vector Search: Search billions of vectors in milliseconds with state-of-the-art indexing.
  • Comprehensive Search: Support for vector similarity search, full-text search and SQL.
  • Multimodal Support: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
  • Advanced Features: Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure. GPU support for building vector indexes.

Products:

  • Open Source & Local: 100% open source, runs locally or in your cloud. No vendor lock-in.
  • Cloud and Enterprise: Production-scale vector search with no servers to manage. Complete data sovereignty and security.

Ecosystem:

  • Columnar Storage: Built on the Lance columnar format for efficient storage and analytics.
  • Seamless Integration: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and JavaScript/TypeScript support.
  • Rich Ecosystem: Integrations with LangChain 🦜🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

How to Install:

Follow the Quickstart doc to set up LanceDB locally.

API & SDK: We also support Python, TypeScript and Rust SDKs.

| Interface | Documentation |
| --- | --- |
| Python SDK | https://lancedb.github.io/lancedb/python/python/ |
| Typescript SDK | https://lancedb.github.io/lancedb/js/globals/ |
| Rust SDK | https://docs.rs/lancedb/latest/lancedb/index.html |
| REST API | https://docs.lancedb.com/api-reference/introduction |

Join Us and Contribute

We welcome contributions from everyone, whether you're a developer, a researcher, or just someone who wants to help out.

If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our Discord server.

Check out the GitHub Issues if you would like to work on features planned for the future.

Contributors

Stay in Touch With Us


Website Blog Discord Twitter LinkedIn

Languages
Rust 42.8%
Python 41.9%
TypeScript 14.2%
Shell 0.6%
Java 0.3%