`nprobes` with a value greater than 20 fails with the minimum error:
```
self = <lancedb.query.AsyncVectorQuery object at 0x10b749720>, minimum_nprobes = 30
def minimum_nprobes(self, minimum_nprobes: int) -> Self:
"""Set the minimum number of probes to use.
See `nprobes` for more details.
These partitions will be searched on every indexed vector query and will
increase recall at the expense of latency.
"""
> self._inner.minimum_nprobes(minimum_nprobes)
E ValueError: Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes
python/lancedb/query.py:2744: ValueError
```
Putting the max set before the min seems reasonable but it causes this
reasonable case to fail:
```
def test_nprobes_min_max_works_sync(table):
LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(2).maximum_nprobes(4).to_list()
```
with
```
self = <lancedb.query.AsyncVectorQuery object at 0x1203f1c90>, maximum_nprobes = 4
def maximum_nprobes(self, maximum_nprobes: int) -> Self:
"""Set the maximum number of probes to use.
See `nprobes` for more details.
If this value is greater than `minimum_nprobes` then the excess partitions
will be searched only if we have not found enough results.
This can be useful when there is a narrow filter to allow these queries to
spend more time searching and avoid potential false negatives.
If this value is 0 then no limit will be applied and all partitions could be
searched if needed to satisfy the limit.
"""
> self._inner.maximum_nprobes(maximum_nprobes)
E ValueError: Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes
python/lancedb/query.py:2761: ValueError
```.
The case I care about is where min == max, but this solution handles it
even if they're not. If both min and max exist, we set both to the
minimum and then set the max. This isn't 100% the same as the minimum
setter checks for 0 on the min and `.nprobes` does not do any sanity
checking at all. But I figured this was the most reasonable and general
solution without touching more of this code.
As part of this I noticed the error messages were a bit ambiguous so I
made them symmetric and clarified them while I was here.
## Summary
- Exposes `Session` in Python and Typescript so users can set the
`index_cache_size_bytes` and `metadata_cache_size_bytes`
* The `Session` is attached to the `Connection`, and thus shared across
all tables in that connection.
- Adds deprecation warnings for table-level cache configuration
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
Fixes#2515 by implementing comprehensive support for missing columns in
Arrow table inputs when using embedding functions.
### Problem
Previously, when an Arrow table was passed to `fromDataToBuffer` with
missing columns and a schema containing embedding functions, the system
would fail because `applyEmbeddingsFromMetadata` expected all columns to
be present in the table.
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Thanks for all your work.
The docstring for `OptimizeOptions ` seems to reference a non-existent
method on `Table`. I believe this is the correct example for
`cleanupOlderThan`.
This also appears in the generated docs, but I assume they live
downstream from this code?
- Enhanced error messages for schema inference failures to suggest
providing an explicit schema.
- Updated embedding application logic to check for existing destination
columns, allowing for filling embeddings in columns that are all null.
- Added comments for clarity on handling existing columns during
embedding application.
Fixes https://github.com/lancedb/lancedb/issues/2183
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Summary by CodeRabbit
- **Bug Fixes**
- Improved error messages for schema inference to enhance readability.
- Prevented redundant embedding application by skipping columns that
already contain data, avoiding unnecessary errors and computations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This exposes the maximum_nprobes and minimum_nprobes feature that was
added in https://github.com/lancedb/lance/pull/3903
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added support for specifying minimum and maximum probe counts in
vector search queries, allowing finer control over search behavior.
- Users can now independently set minimum and maximum probes for vector
and hybrid queries via new methods and parameters in Python, Node.js,
and Rust APIs.
- **Bug Fixes**
- Improved parameter validation to ensure correct usage of minimum and
maximum probe values.
- **Tests**
- Expanded test coverage to validate correct handling, serialization,
and error cases for the new probe parameters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
- operator for match query
- slop for phrase query
- boolean query
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced support for boolean full-text search queries with AND/OR
logic and occurrence conditions.
- Added operator options for match and multi-match queries to control
term combination logic.
- Enabled phrase queries to specify proximity (slop) for flexible phrase
matching.
- Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
class for enhanced query expressiveness.
- **Bug Fixes**
- Improved validation and error handling for invalid operator and
occurrence inputs in full-text queries.
- **Tests**
- Expanded test coverage with new cases for boolean queries and
operator-based full-text searches.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Downgraded version numbers for the Node.js, Python, and Rust packages.
No other user-facing changes were made.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
this introduces some breaking changes in terms of rust API of creating
FTS index, and the default index params changed
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Updated default settings for full-text search (FTS) index creation:
stemming, stop word removal, and ASCII folding are now enabled by
default, while token position storage is disabled by default.
- **Refactor**
- Simplified and streamlined the configuration and handling of FTS index
parameters for improved maintainability and consistency across
interfaces.
- Enhanced serialization and request construction for FTS index
parameters to reduce manual handling and improve code clarity.
- Improved test coverage by explicitly enabling positional indexing in
FTS tests to support phrase queries.
- **Chores**
- Upgraded all internal dependencies related to FTS indexing to the
latest version for enhanced compatibility and performance.
- Updated package versions for Node.js, Python, and Rust components to
the latest beta releases.
- Improved CI workflows by adding Rust toolchain setup with formatting
and linting tools.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Provides the ability to set a timeout for merge insert. The default
underlying timeout is however long the first attempt takes, or if there
are multiple attempts, 30 seconds. This has two use cases:
1. Make the timeout shorter, when you want to fail if it takes too long.
2. Allow taking more time to do retries.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added support for specifying a timeout when performing merge insert
operations in Python, Node.js, and Rust APIs.
- Introduced a new option to control the maximum allowed execution time
for merge inserts, including retry timeout handling.
- **Documentation**
- Updated and added documentation to describe the new timeout option and
its usage in APIs.
- **Tests**
- Added and updated tests to verify correct timeout behavior during
merge insert operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
add restore with tag API in python and nodejs API and add tests to guard
them
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- The restore functionality now supports using version tags in addition
to numeric version identifiers, allowing you to revert tables to a state
marked by a tag.
- **Bug Fixes**
- Restoring with an unknown tag now properly raises an error.
- **Documentation**
- Updated documentation and examples to clarify that restore accepts
both version numbers and tags.
- **Tests**
- Added new tests to verify restore behavior with version tags and error
handling for unknown tags.
- Added tests for checkout and restore operations involving tags.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->