Compare commits

19 Commits

Author SHA1 Message Date
Lance Release
e08d45e090 Bump version: 0.23.1-beta.0 → 0.23.1-beta.1 2025-06-17 23:22:00 +00:00
Will Jones
2e3ddb8382 ci: fix lockfile failure for vectordb node (#2443)

## Summary by CodeRabbit

- **Chores**
  - Updated release workflow to set a specific Git user name and email for
    automated commits during the package publishing process.
2025-06-17 15:14:11 -07:00
Wyatt Alt
627ca4c810 chore: update lance to v0.29.1-beta.2 (#2442)

## Summary by CodeRabbit

- **Chores**
  - Updated internal dependencies to use a newer version of the Lance
    library.
- **New Features**
  - Added support for a new query occurrence type labeled "MUST NOT" in
    search filters.
2025-06-17 14:02:13 -07:00
Lance Release
f8dae4ffe9 Bump version: 0.20.0 → 0.20.1-beta.0 2025-06-16 16:30:14 +00:00
Lance Release
9eb6119468 Bump version: 0.23.0 → 0.23.1-beta.0 2025-06-16 16:29:22 +00:00
Weston Pace
59b57e30ed feat: add maximum and minimum nprobes properties (#2430)
This exposes the maximum_nprobes and minimum_nprobes feature that was
added in https://github.com/lancedb/lance/pull/3903

## Summary by CodeRabbit

- **New Features**
  - Added support for specifying minimum and maximum probe counts in
    vector search queries, allowing finer control over search behavior.
  - Users can now independently set minimum and maximum probes for vector
    and hybrid queries via new methods and parameters in Python, Node.js,
    and Rust APIs.

- **Bug Fixes**
  - Improved parameter validation to ensure correct usage of minimum and
    maximum probe values.

- **Tests**
  - Expanded test coverage to validate correct handling, serialization,
    and error cases for the new probe parameters.
2025-06-13 15:18:29 -07:00
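
The commit above mentions parameter validation for the minimum and maximum probe counts. A minimal sketch of what such a check might look like follows. The function name `validate_nprobes` is hypothetical, and treating `None` as "no upper bound" is an assumption of this example, not the LanceDB API.

```python
from typing import Optional, Tuple


def validate_nprobes(
    minimum: int, maximum: Optional[int] = None
) -> Tuple[int, Optional[int]]:
    """Check a (minimum, maximum) probe pair. Hypothetical helper, not the LanceDB API."""
    if minimum < 1:
        raise ValueError("minimum_nprobes must be at least 1")
    # Assumption: None means "no upper bound" in this sketch.
    if maximum is not None and maximum < minimum:
        raise ValueError("maximum_nprobes must be >= minimum_nprobes")
    return minimum, maximum


print(validate_nprobes(10, 50))
```

Rejecting a maximum below the minimum up front keeps the error close to the caller rather than surfacing later inside the search itself.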
BubbleCal
fec8d58f06 feat: support a bunch of FTS features in JS SDK (#2431)
- operator for match query
- slop for phrase query
- boolean query

## Summary by CodeRabbit

- **New Features**
  - Introduced support for boolean full-text search queries with AND/OR
    logic and occurrence conditions.
  - Added operator options for match and multi-match queries to control
    term combination logic.
  - Enabled phrase queries to specify proximity (slop) for flexible phrase
    matching.
  - Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
    class for enhanced query expressiveness.

- **Bug Fixes**
  - Improved validation and error handling for invalid operator and
    occurrence inputs in full-text queries.

- **Tests**
  - Expanded test coverage with new cases for boolean queries and
    operator-based full-text searches.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-12 17:04:19 +08:00
BubbleCal
84ded9d678 feat: support new FTS features in python SDK (#2411)
- AND operator
- phrase query slop param
- boolean query

## Summary by CodeRabbit

- **New Features**
  - Added support for combining full-text search queries using AND/OR
    operators, enabling more flexible query composition.
  - Introduced new query types and parameters, including boolean queries,
    operator selection, occurrence constraints, and phrase slop for advanced
    search scenarios.
  - Enhanced asynchronous search to accept rich full-text query objects
    directly.

- **Bug Fixes**
  - Improved handling and validation of full-text search queries in both
    synchronous and asynchronous search operations.

- **Tests**
  - Updated and expanded tests to cover new full-text query types and
    their usage in search functions.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-06-06 14:33:46 +08:00
Wyatt Alt
65696d9713 chore: update lance in lancedb (#2424)
This updates lance to v0.29.1-beta.1.

## Summary by CodeRabbit

- **Chores**
  - Updated workspace dependencies for improved consistency and
    reliability. No changes to user-facing functionality.
2025-06-04 19:06:51 -07:00
Will Jones
e2f2ea32e4 ci: fix vectordb release (#2422)

## Summary by CodeRabbit

- **Chores**
  - Updated the release workflow to include an additional step for
    improved process reliability. No changes to user-facing functionality.
2025-06-04 17:06:02 -07:00
Lance Release
d5f2eca754 Bump version: 0.20.0-beta.3 → 0.20.0 2025-06-04 21:08:31 +00:00
Lance Release
7fa455a8a5 Bump version: 0.20.0-beta.2 → 0.20.0-beta.3 2025-06-04 21:07:59 +00:00
Lance Release
8f42b5874e Bump version: 0.23.0-beta.3 → 0.23.0 2025-06-04 21:07:39 +00:00
Lance Release
274f19f560 Bump version: 0.23.0-beta.2 → 0.23.0-beta.3 2025-06-04 21:07:38 +00:00
Will Jones
fbcbc75b5b feat: upgrade lance to stable version (#2420)
Adds a script to change the lance dependency easily. To make this
change, I just had to run:

```bash
python ci/set_lance_version.py stable
```

## Summary by CodeRabbit

- **New Features**
  - Added a script to automate updating the Lance package version in
    project dependencies.
- **Chores**
  - Updated workflows to improve lockfile management and automate updates
    during releases and publishing.
  - Switched Lance dependencies from git-based references to fixed version
    numbers for improved stability.
  - Enhanced lockfile update script with an option to amend commits and
    quieter output.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-06-04 13:34:30 -07:00
Will Jones
008f389bd0 ci: commit updated Cargo.lock (#2418)
Follow up to #2416

Forgot to do `git add`.
Also need to delete old actions updating package lock.

## Summary by CodeRabbit

- **Chores**
  - Removed legacy workflows related to updating package lock files.
  - Improved the update lockfiles script to ensure updated lockfiles are
    always included in amended commits.
2025-06-04 08:40:38 -07:00
Lance Release
91af6518d9 Updating package-lock.json 2025-06-04 07:15:07 +00:00
Lance Release
af6819762c Updating package-lock.json 2025-06-04 07:14:50 +00:00
Lance Release
7acece493d Bump version: 0.20.0-beta.1 → 0.20.0-beta.2 2025-06-04 07:14:39 +00:00
53 changed files with 1450 additions and 531 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
- current_version = "0.20.0-beta.1"
+ current_version = "0.20.1-beta.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
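
The `parse` expression is cut off by the diff context, so only its first two groups are visible. As a rough illustration of how a verbose regular expression of this shape splits a version such as `0.20.1-beta.0` into named parts, here is a sketch. Everything after the `minor` group (the `patch`, `label`, and `build` groups) is an assumption made for this example and may not match the project's actual pattern.

```python
import re

# Hypothetical pattern for illustration only: the first two groups mirror the
# visible part of the config, the rest is assumed and may differ from the
# project's real `parse` expression.
VERSION_RE = re.compile(
    r"""
    (?P<major>0|[1-9]\d*)\.
    (?P<minor>0|[1-9]\d*)\.
    (?P<patch>0|[1-9]\d*)
    (?:-(?P<label>[a-z]+)\.(?P<build>\d+))?   # optional pre-release, e.g. -beta.0
    $
    """,
    re.VERBOSE,
)


def parse_version(text: str) -> dict:
    """Split a version string into its named components."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised version: {text!r}")
    return match.groupdict()


print(parse_version("0.20.1-beta.0"))
```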

View File

@@ -84,7 +84,7 @@ jobs:
      run: |
        pip install bump-my-version PyGithub packaging
        bash ci/bump_version.sh ${{ inputs.type }} ${{ inputs.bump-minor }} v $COMMIT_BEFORE_BUMP
-       bash ci/update_lockfiles.sh
+       bash ci/update_lockfiles.sh --amend
    - name: Push new version tag
      if: ${{ !inputs.dry_run }}
      uses: ad-m/github-push-action@master
@@ -93,11 +93,3 @@ jobs:
        github_token: ${{ secrets.LANCEDB_RELEASE_TOKEN }}
        branch: ${{ github.ref }}
        tags: true
-   - uses: ./.github/workflows/update_package_lock
-     if: ${{ !inputs.dry_run && inputs.other }}
-     with:
-       github_token: ${{ secrets.GITHUB_TOKEN }}
-   - uses: ./.github/workflows/update_package_lock_nodejs
-     if: ${{ !inputs.dry_run && inputs.other }}
-     with:
-       github_token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -505,6 +505,8 @@ jobs:
name: vectordb NPM Publish
needs: [node, node-macos, node-linux-gnu, node-windows]
runs-on: ubuntu-latest
+ permissions:
+   contents: write
# Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v')
steps:
@@ -537,6 +539,15 @@ jobs:
      # We need to deprecate the old package to avoid confusion.
      # Each time we publish a new version, it gets undeprecated.
      run: npm deprecate vectordb "Use @lancedb/lancedb instead."
+   - name: Checkout
+     uses: actions/checkout@v4
+   - name: Update package-lock.json
+     run: |
+       git config user.name 'Lance Release'
+       git config user.email 'lance-dev@lancedb.com'
+       bash ci/update_lockfiles.sh
+   - name: Push new commit
+     uses: ad-m/github-push-action@master
    - name: Notify Slack Action
      uses: ravsamhq/notify-slack-action@2.3.0
      if: ${{ always() }}

View File

@@ -1,33 +0,0 @@
name: update_package_lock
description: "Update node's package.lock"
inputs:
  github_token:
    required: true
    description: "github token for the repo"
runs:
  using: "composite"
  steps:
    - uses: actions/setup-node@v3
      with:
        node-version: 20
    - name: Set git configs
      shell: bash
      run: |
        git config user.name 'Lance Release'
        git config user.email 'lance-dev@lancedb.com'
    - name: Update package-lock.json file
      working-directory: ./node
      run: |
        npm install
        git add package-lock.json
        git commit -m "Updating package-lock.json"
      shell: bash
    - name: Push changes
      if: ${{ inputs.dry_run }} == "false"
      uses: ad-m/github-push-action@master
      with:
        github_token: ${{ inputs.github_token }}
        branch: main
        tags: true

View File

@@ -1,33 +0,0 @@
name: update_package_lock_nodejs
description: "Update nodejs's package.lock"
inputs:
  github_token:
    required: true
    description: "github token for the repo"
runs:
  using: "composite"
  steps:
    - uses: actions/setup-node@v3
      with:
        node-version: 20
    - name: Set git configs
      shell: bash
      run: |
        git config user.name 'Lance Release'
        git config user.email 'lance-dev@lancedb.com'
    - name: Update package-lock.json file
      working-directory: ./nodejs
      run: |
        npm install
        git add package-lock.json
        git commit -m "Updating package-lock.json"
      shell: bash
    - name: Push changes
      if: ${{ inputs.dry_run }} == "false"
      uses: ad-m/github-push-action@master
      with:
        github_token: ${{ inputs.github_token }}
        branch: main
        tags: true

62
Cargo.lock generated
View File

@@ -2835,8 +2835,8 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"rand 0.8.5",
]
@@ -3928,8 +3928,8 @@ dependencies = [
[[package]]
name = "lance"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow",
"arrow-arith",
@@ -3992,8 +3992,8 @@ dependencies = [
[[package]]
name = "lance-arrow"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4010,8 +4010,8 @@ dependencies = [
[[package]]
name = "lance-core"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4047,8 +4047,8 @@ dependencies = [
[[package]]
name = "lance-datafusion"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow",
"arrow-array",
@@ -4077,8 +4077,8 @@ dependencies = [
[[package]]
name = "lance-datagen"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow",
"arrow-array",
@@ -4093,8 +4093,8 @@ dependencies = [
[[package]]
name = "lance-encoding"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrayref",
"arrow",
@@ -4133,8 +4133,8 @@ dependencies = [
[[package]]
name = "lance-file"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4168,8 +4168,8 @@ dependencies = [
[[package]]
name = "lance-index"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow",
"arrow-array",
@@ -4189,7 +4189,7 @@ dependencies = [
"datafusion-physical-expr",
"datafusion-sql",
"deepsize",
- "dirs 5.0.1",
+ "dirs 6.0.0",
"fst",
"futures",
"half",
@@ -4224,8 +4224,8 @@ dependencies = [
[[package]]
name = "lance-io"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow",
"arrow-arith",
@@ -4264,8 +4264,8 @@ dependencies = [
[[package]]
name = "lance-linalg"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow-array",
"arrow-ord",
@@ -4288,8 +4288,8 @@ dependencies = [
[[package]]
name = "lance-table"
- version = "0.29.0"
- source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
+ version = "0.29.1"
+ source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow",
"arrow-array",
@@ -4328,8 +4328,8 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "0.29.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.29.0-beta.2#0cfaf95bb914d589fde2f01c6f5ef0ef0beddbca"
version = "0.29.1"
source = "git+https://github.com/lancedb/lance.git?tag=v0.29.1-beta.2#77bbcb2453f8fa37d825652b56bbe611e02589de"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4340,7 +4340,7 @@ dependencies = [
[[package]]
name = "lancedb"
- version = "0.20.0-beta.1"
+ version = "0.20.1-beta.0"
dependencies = [
"arrow",
"arrow-array",
@@ -4427,7 +4427,7 @@ dependencies = [
[[package]]
name = "lancedb-node"
- version = "0.20.0-beta.1"
+ version = "0.20.1-beta.0"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4452,7 +4452,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
- version = "0.20.0-beta.1"
+ version = "0.20.1-beta.0"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4472,7 +4472,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
- version = "0.23.0-beta.1"
+ version = "0.23.1-beta.0"
dependencies = [
"arrow",
"env_logger",

View File

@@ -21,14 +21,14 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
- lance = { "version" = "=0.29.0", "features" = ["dynamodb"], tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-io = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-index = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-linalg = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-table = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-testing = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-datafusion = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
- lance-encoding = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance = { "version" = "=0.29.1", "features" = ["dynamodb"], tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-io = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-index = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-linalg = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-table = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-testing = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-datafusion = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
+ lance-encoding = { version = "=0.29.1", tag = "v0.29.1-beta.2", git="https://github.com/lancedb/lance.git" }
# Note that this one does not include pyarrow
arrow = { version = "55.1", optional = false }
arrow-array = "55.1"

174
ci/set_lance_version.py Normal file
View File

@@ -0,0 +1,174 @@
import argparse
import sys
import json


def run_command(command: str) -> str:
    """
    Run a shell command and return stdout as a string.
    If exit code is not 0, raise an exception with the stderr output.
    """
    import subprocess

    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise Exception(f"Command failed with error: {result.stderr.strip()}")
    return result.stdout.strip()


def get_latest_stable_version() -> str:
    version_line = run_command("cargo info lance | grep '^version:'")
    version = version_line.split(" ")[1].strip()
    return version


def get_latest_preview_version() -> str:
    lance_tags = run_command(
        "git ls-remote --tags https://github.com/lancedb/lance.git | grep 'refs/tags/v[0-9beta.-]\\+$'"
    ).splitlines()
    lance_tags = (
        tag.split("refs/tags/")[1]
        for tag in lance_tags
        if "refs/tags/" in tag and "beta" in tag
    )
    from packaging.version import Version

    latest = max(
        (tag[1:] for tag in lance_tags if tag.startswith("v")), key=lambda t: Version(t)
    )
    return str(latest)


def extract_features(line: str) -> list:
    """
    Extracts the features from a line in Cargo.toml.
    Example: 'lance = { "version" = "=0.29.0", "features" = ["dynamodb"] }'
    Returns: ['dynamodb']
    """
    import re

    match = re.search(r'"features"\s*=\s*\[(.*?)\]', line)
    if match:
        features_str = match.group(1)
        return [f.strip('"') for f in features_str.split(",")]
    return []


def update_cargo_toml(line_updater):
    """
    Updates the Cargo.toml file by applying the line_updater function to each line.
    The line_updater function should take a line as input and return the updated line.
    """
    with open("Cargo.toml", "r") as f:
        lines = f.readlines()
    new_lines = []
    for line in lines:
        if line.startswith("lance"):
            # Update the line using the provided function
            new_lines.append(line_updater(line))
        else:
            # Keep the line unchanged
            new_lines.append(line)
    with open("Cargo.toml", "w") as f:
        f.writelines(new_lines)


def set_stable_version(version: str):
    """
    Sets lines to
    lance = { "version" = "=0.29.0", "features" = ["dynamodb"] }
    lance-io = "=0.29.0"
    ...
    """

    def line_updater(line: str) -> str:
        package_name = line.split("=", maxsplit=1)[0].strip()
        features = extract_features(line)
        if features:
            return f'{package_name} = {{ "version" = "={version}", "features" = {json.dumps(features)} }}\n'
        else:
            return f'{package_name} = "={version}"\n'

    update_cargo_toml(line_updater)


def set_preview_version(version: str):
    """
    Sets lines to
    lance = { "version" = "=0.29.0", "features" = ["dynamodb"], tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
    lance-io = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }
    ...
    """

    def line_updater(line: str) -> str:
        package_name = line.split("=", maxsplit=1)[0].strip()
        features = extract_features(line)
        base_version = version.split("-")[0]  # Get the base version without beta suffix
        if features:
            return f'{package_name} = {{ "version" = "={base_version}", "features" = {json.dumps(features)}, "tag" = "v{version}", "git" = "https://github.com/lancedb/lance.git" }}\n'
        else:
            return f'{package_name} = {{ "version" = "={base_version}", "tag" = "v{version}", "git" = "https://github.com/lancedb/lance.git" }}\n'

    update_cargo_toml(line_updater)


def set_local_version():
    """
    Sets lines to
    lance = { path = "../lance/rust/lance", features = ["dynamodb"] }
    lance-io = { path = "../lance/rust/lance-io" }
    ...
    """

    def line_updater(line: str) -> str:
        package_name = line.split("=", maxsplit=1)[0].strip()
        features = extract_features(line)
        if features:
            return f'{package_name} = {{ "path" = "../lance/rust/{package_name}", "features" = {json.dumps(features)} }}\n'
        else:
            return f'{package_name} = {{ "path" = "../lance/rust/{package_name}" }}\n'

    update_cargo_toml(line_updater)


parser = argparse.ArgumentParser(description="Set the version of the Lance package.")
parser.add_argument(
    "version",
    type=str,
    help="The version to set for the Lance package. Use 'stable' for the latest stable version, 'preview' for latest preview version, or a specific version number (e.g., '0.1.0'). You can also specify 'local' to use a local path.",
)
args = parser.parse_args()

if args.version == "stable":
    latest_stable_version = get_latest_stable_version()
    print(
        f"Found latest stable version: \033[1mv{latest_stable_version}\033[0m",
        file=sys.stderr,
    )
    set_stable_version(latest_stable_version)
elif args.version == "preview":
    latest_preview_version = get_latest_preview_version()
    print(
        f"Found latest preview version: \033[1mv{latest_preview_version}\033[0m",
        file=sys.stderr,
    )
    set_preview_version(latest_preview_version)
elif args.version == "local":
    set_local_version()
else:
    # Parse the version number.
    version = args.version
    # Ignore initial v if present.
    if version.startswith("v"):
        version = version[1:]
    if "beta" in version:
        set_preview_version(version)
    else:
        set_stable_version(version)

print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
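
To make the line-rewriting step in the script concrete, here is a small self-contained sketch of the stable-version case applied to single `Cargo.toml` lines, without touching any file. `stable_line` is a hypothetical stand-alone wrapper written for this example. It also strips whitespace around each feature name, which the script's `extract_features` does not do, so it is an approximation rather than a copy.

```python
import json
import re


def extract_features(line: str) -> list:
    """Pull the quoted feature names out of a `"features" = [...]` entry."""
    match = re.search(r'"features"\s*=\s*\[(.*?)\]', line)
    if match:
        # Unlike the script, also strip surrounding whitespace (assumption of this sketch).
        return [f.strip().strip('"') for f in match.group(1).split(",")]
    return []


def stable_line(line: str, version: str) -> str:
    """Rewrite one `lance*` dependency line to a pinned registry version."""
    package_name = line.split("=", maxsplit=1)[0].strip()
    features = extract_features(line)
    if features:
        return f'{package_name} = {{ "version" = "={version}", "features" = {json.dumps(features)} }}\n'
    return f'{package_name} = "={version}"\n'


preview = 'lance-io = { version = "=0.29.0", tag = "v0.29.0-beta.2", git="https://github.com/lancedb/lance.git" }\n'
print(stable_line(preview, "0.29.1"), end="")
```

Because each line is rebuilt from just the package name and its features, any `tag` or `git` keys from a preview pin are dropped when switching to a stable version.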

View File

@@ -1,18 +1,30 @@
  #!/usr/bin/env bash
  set -euo pipefail
+ AMEND=false
+ for arg in "$@"; do
+   if [[ "$arg" == "--amend" ]]; then
+     AMEND=true
+   fi
+ done
  # This updates the lockfile without building
- cargo metadata > /dev/null
+ cargo metadata --quiet > /dev/null
  pushd nodejs || exit 1
- npm install --package-lock-only
+ npm install --package-lock-only --silent
  popd
  pushd node || exit 1
- npm install --package-lock-only
+ npm install --package-lock-only --silent
  popd
  if git diff --quiet --exit-code; then
    echo "No lockfile changes to commit; skipping amend."
- else
+ elif $AMEND; then
    git add Cargo.lock nodejs/package-lock.json node/package-lock.json
    git commit --amend --no-edit
+ else
+   git add Cargo.lock nodejs/package-lock.json node/package-lock.json
+   git commit -m "Update lockfiles"
  fi
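
The branching at the end of the updated script can be summarised as a three-way decision. The sketch below restates it in Python purely for illustration. The function name `lockfile_action` and the returned labels are hypothetical and do not appear in the repository.

```python
from typing import List


def lockfile_action(argv: List[str], has_changes: bool) -> str:
    """Mirror the script's decision: skip, amend the last commit, or create a new one."""
    amend = "--amend" in argv
    if not has_changes:
        # Corresponds to `git diff --quiet --exit-code` succeeding.
        return "skip"
    return "amend" if amend else "commit"


print(lockfile_action(["--amend"], has_changes=True))
```

Amending suits the release workflow, where the lockfile change belongs in the version-bump commit, while the plain commit path suits the publish job, which runs the script without `--amend`.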

View File

@@ -42,6 +42,7 @@ duckdb.query("SELECT * FROM arrow_table")
Have the required imports before doing any querying.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:import-lancedb"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-session-context"
@@ -51,6 +52,7 @@ Have the required imports before doing any querying.
Register the table created with the Datafusion session context.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:lance_sql_basic"
```

View File

@@ -0,0 +1,53 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BooleanQuery
# Class: BooleanQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BooleanQuery()
```ts
new BooleanQuery(queries): BooleanQuery
```
Creates an instance of BooleanQuery.
#### Parameters
* **queries**: [[`Occur`](../enumerations/Occur.md), [`FullTextQuery`](../interfaces/FullTextQuery.md)][]
An array of (Occur, FullTextQuery objects) to combine.
Occur specifies whether the query must match, or should match.
#### Returns
[`BooleanQuery`](BooleanQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
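
As a rough illustration of how occurrence conditions might combine in a boolean query, the toy evaluator below works on plain strings. It is a sketch, not the LanceDB implementation. The `MUST_NOT` member is taken from the "MUST NOT" occurrence type mentioned in the commit log, and the rule that at least one `SHOULD` clause has to match when there are no `MUST` clauses is an assumption borrowed from common search-engine behaviour.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Occur(Enum):
    MUST = "MUST"
    SHOULD = "SHOULD"
    MUST_NOT = "MUST NOT"  # assumed spelling for this sketch


Matcher = Callable[[str], bool]
Clause = Tuple[Occur, Matcher]


def term(word: str) -> Matcher:
    """Return a matcher that checks whether `word` occurs in a document."""
    return lambda doc: word in doc.lower().split()


def boolean_match(doc: str, clauses: List[Clause]) -> bool:
    musts = [q for occur, q in clauses if occur is Occur.MUST]
    shoulds = [q for occur, q in clauses if occur is Occur.SHOULD]
    must_nots = [q for occur, q in clauses if occur is Occur.MUST_NOT]
    if any(q(doc) for q in must_nots):
        return False
    if not all(q(doc) for q in musts):
        return False
    # Assumption: with no MUST clauses, at least one SHOULD clause has to match.
    if not musts and shoulds:
        return any(q(doc) for q in shoulds)
    return True


doc = "fast vector search"
print(boolean_match(doc, [(Occur.MUST, term("vector")), (Occur.MUST_NOT, term("slow"))]))
```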

View File

@@ -40,6 +40,7 @@ Creates an instance of MatchQuery.
  - `boost`: The boost factor for the query (default is 1.0).
  - `fuzziness`: The fuzziness level for the query (default is 0).
  - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
+ - `operator`: The logical operator to use for combining terms in the query (default is "OR").
  * **options.boost?**: `number`
@@ -47,6 +48,8 @@ Creates an instance of MatchQuery.
* **options.maxExpansions?**: `number`
+ * **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MatchQuery`](MatchQuery.md)

View File

@@ -38,9 +38,12 @@ Creates an instance of MultiMatchQuery.
  * **options?**
  Optional parameters for the multi-match query.
  - `boosts`: An array of boost factors for each column (default is 1.0 for all).
+ - `operator`: The logical operator to use for combining terms in the query (default is "OR").
  * **options.boosts?**: `number`[]
+ * **options.operator?**: [`Operator`](../enumerations/Operator.md)
  #### Returns
  [`MultiMatchQuery`](MultiMatchQuery.md)

View File

@@ -19,7 +19,10 @@ including methods to retrieve the query type and convert the query to a dictiona
### new PhraseQuery()
```ts
- new PhraseQuery(query, column): PhraseQuery
+ new PhraseQuery(
+    query,
+    column,
+    options?): PhraseQuery
```
Creates an instance of `PhraseQuery`.
@@ -32,6 +35,12 @@ Creates an instance of `PhraseQuery`.
* **column**: `string`
The name of the column to search within.
+ * **options?**
+ Optional parameters for the phrase query.
+ - `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
+ * **options.slop?**: `number`
#### Returns
[`PhraseQuery`](PhraseQuery.md)
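
To illustrate what `slop` means, the toy matcher below counts the unmatched positions that fall between the words of a phrase and compares that count with the allowed slop. It is a simplified sketch that only handles words appearing in order. The function name `phrase_matches` is hypothetical, and a real full-text engine may define slop more generally.

```python
from typing import List


def phrase_matches(tokens: List[str], phrase: List[str], slop: int = 0) -> bool:
    """Return True if `phrase` occurs in order with at most `slop` extra positions."""
    if not phrase:
        return True
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        pos = start
        found = True
        for word in phrase[1:]:
            try:
                # Take the earliest later occurrence, which keeps the span minimal.
                pos = tokens.index(word, pos + 1)
            except ValueError:
                found = False
                break
        # Positions spanned beyond a tight phrase are the intervening, unmatched ones.
        if found and (pos - start) - (len(phrase) - 1) <= slop:
            return True
    return False


tokens = "the quick brown fox".split()
print(phrase_matches(tokens, ["quick", "fox"], slop=1))
```

With `slop=0` the same query does not match, because "brown" sits between "quick" and "fox".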

View File

@@ -15,6 +15,14 @@ Enum representing the types of full-text queries supported.
## Enumeration Members
### Boolean
```ts
Boolean: "boolean";
```
***
### Boost
```ts

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Occur
# Enumeration: Occur
Enum representing the occurrence of terms in full-text queries.
- `Must`: The term must be present in the document.
- `Should`: The term should contribute to the document score, but is not required.
## Enumeration Members
### Must
```ts
Must: "MUST";
```
***
### Should
```ts
Should: "SHOULD";
```

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Operator
# Enumeration: Operator
Enum representing the logical operators used in full-text queries.
- `And`: All terms must match.
- `Or`: At least one term must match.
## Enumeration Members
### And
```ts
And: "AND";
```
***
### Or
```ts
Or: "OR";
```
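
The difference between the two operators can be shown with a toy term matcher. This is an illustrative sketch on whitespace-split strings, not the library's matching logic. The `matches` helper is hypothetical, and only the enum values mirror the documentation above.

```python
from enum import Enum


class Operator(Enum):
    AND = "AND"
    OR = "OR"


def matches(doc: str, query: str, operator: Operator = Operator.OR) -> bool:
    """Check whether all (AND) or any (OR) of the query terms occur in the document."""
    doc_terms = set(doc.lower().split())
    hits = [term in doc_terms for term in query.lower().split()]
    return all(hits) if operator is Operator.AND else any(hits)


print(matches("fast vector search", "vector database", Operator.OR))
print(matches("fast vector search", "vector database", Operator.AND))
```

The default of `OR` in the sketch follows the documented default for the `operator` option.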

View File

@@ -12,9 +12,12 @@
  ## Enumerations
  - [FullTextQueryType](enumerations/FullTextQueryType.md)
+ - [Occur](enumerations/Occur.md)
+ - [Operator](enumerations/Operator.md)
  ## Classes
+ - [BooleanQuery](classes/BooleanQuery.md)
  - [BoostQuery](classes/BoostQuery.md)
  - [Connection](classes/Connection.md)
  - [Index](classes/Index.md)

View File

@@ -7,3 +7,4 @@ tantivy==0.20.1
--extra-index-url https://download.pytorch.org/whl/cpu
torch
polars>=0.19, <=1.3.0
+ datafusion

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
- <version>0.20.0-beta.1</version>
+ <version>0.20.1-beta.0</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
- <version>0.20.0-beta.1</version>
+ <version>0.20.1-beta.0</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>

49
node/package-lock.json generated
View File

@@ -1,12 +1,12 @@
{
"name": "vectordb",
- "version": "0.20.0-beta.1",
+ "version": "0.20.1-beta.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
- "version": "0.20.0-beta.1",
+ "version": "0.20.1-beta.0",
"cpu": [
"x64",
"arm64"
@@ -52,11 +52,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
- "@lancedb/vectordb-darwin-arm64": "0.20.0-beta.1",
- "@lancedb/vectordb-darwin-x64": "0.20.0-beta.1",
- "@lancedb/vectordb-linux-arm64-gnu": "0.20.0-beta.1",
- "@lancedb/vectordb-linux-x64-gnu": "0.20.0-beta.1",
- "@lancedb/vectordb-win32-x64-msvc": "0.20.0-beta.1"
+ "@lancedb/vectordb-darwin-arm64": "0.20.1-beta.0",
+ "@lancedb/vectordb-darwin-x64": "0.20.1-beta.0",
+ "@lancedb/vectordb-linux-arm64-gnu": "0.20.1-beta.0",
+ "@lancedb/vectordb-linux-x64-gnu": "0.20.1-beta.0",
+ "@lancedb/vectordb-win32-x64-msvc": "0.20.1-beta.0"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",
@@ -327,65 +327,60 @@
}
},
"node_modules/@lancedb/vectordb-darwin-arm64": {
- "version": "0.20.0-beta.1",
- "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.20.0-beta.1.tgz",
- "integrity": "sha512-yds8wFjni68RfA+KziTz/8v4YKku1i6q4JF8I2EhpzDI8tT0fk1YqGlVhtdn9fHDWq/9m1M05kGVuyzLypZ2Yw==",
+ "version": "0.20.1-beta.0",
+ "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.20.1-beta.0.tgz",
+ "integrity": "sha512-EZl1nvF/2MbLkB8DkNPg+9SpYWpqnNR9kY5a1JWtNWQWw735oT2VPnH3B2htDKU42gJ/9DJGBdEvIJwzeHT85w==",
"cpu": [
"arm64"
],
- "license": "Apache-2.0",
"optional": true,
"os": [
"darwin"
]
},
"node_modules/@lancedb/vectordb-darwin-x64": {
- "version": "0.20.0-beta.1",
- "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.20.0-beta.1.tgz",
- "integrity": "sha512-oF2MNtkWaJQWyUSIKU/zrbgygK94MzomUKc/Z9CYs7Ar3PI4CIfG72e5o/Zbhjpl318BkR4AbQQYX8BZaNIPVw==",
+ "version": "0.20.1-beta.0",
+ "resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.20.1-beta.0.tgz",
+ "integrity": "sha512-1ZkMcsXsysLRohAeHGpbytVHUp4yEU89A34rrh48vcQUNvYtqxbAw+TLjAbN0vvNvOZOI4DRllxSL1O+Dbybbg==",
"cpu": [
"x64"
],
- "license": "Apache-2.0",
"optional": true,
"os": [
"darwin"
]
},
"node_modules/@lancedb/vectordb-linux-arm64-gnu": {
- "version": "0.20.0-beta.1",
- "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.20.0-beta.1.tgz",
- "integrity": "sha512-3Si0+K5T4awMiUVu0dD9NizcqIiGnEdsTu4YxbKKq1aI4xoaHrYGERkz58mtIFoBQHfre42ujPDoahTkAQ1j/Q==",
+ "version": "0.20.1-beta.0",
+ "resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.20.1-beta.0.tgz",
+ "integrity": "sha512-CxjSGaLJNRYxljdrC8MSirnHu73jctv3S3Q90CbsWMsij9za87zvnrjoiRIn7kv7UNS4ArwS9yyH6gNorCBf6Q==",
"cpu": [
"arm64"
],
- "license": "Apache-2.0",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@lancedb/vectordb-linux-x64-gnu": {
"version": "0.20.0-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.20.0-beta.1.tgz",
"integrity": "sha512-5umO9XaDIxmqUiFnWaHxJtgkCO7oFWtEvLtzM4hG1mkEnwnE3bmXEO+cm+jPro7zwdKEzsnXh0GoCSUvuHk0tA==",
"version": "0.20.1-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.20.1-beta.0.tgz",
"integrity": "sha512-WI2XWYYO5ygL0Az7SlX98VpNqrz8hKuTK/xC/PoM99s1xnfcCukM28DaDGZJpXOGnLbVnexcO2RW4daJ2xDPaQ==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@lancedb/vectordb-win32-x64-msvc": {
"version": "0.20.0-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.20.0-beta.1.tgz",
"integrity": "sha512-EKyDamAi3RmDTu+BFYxr41eGLggZ3FVGu289gCprzljk38d8uxdgKhvDtYN9FWoMew4VvVk/EJQJx6L8sJJRng==",
"version": "0.20.1-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.20.1-beta.0.tgz",
"integrity": "sha512-Mxd7V3Y8whEBoQFQZhZGFQi0avq8ujHRI2c0LhjhYTdwGylrBS3bfGD+/nbDGhAjp7dp5U8P4kiBi30QNwoedA==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"win32"

View File

@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -89,10 +89,10 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.20.0-beta.1",
"@lancedb/vectordb-darwin-arm64": "0.20.0-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.20.0-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.20.0-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.20.0-beta.1"
"@lancedb/vectordb-darwin-x64": "0.20.1-beta.0",
"@lancedb/vectordb-darwin-arm64": "0.20.1-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.20.1-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.20.1-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.20.1-beta.0"
}
}

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.20.0-beta.1"
version = "0.20.1-beta.0"
license.workspace = true
description.workspace = true
repository.workspace = true

View File

@@ -33,7 +33,12 @@ import {
register,
} from "../lancedb/embedding";
import { Index } from "../lancedb/indices";
import { instanceOfFullTextQuery } from "../lancedb/query";
import {
BooleanQuery,
Occur,
Operator,
instanceOfFullTextQuery,
} from "../lancedb/query";
import exp = require("constants");
describe.each([arrow15, arrow16, arrow17, arrow18])(
@@ -554,6 +559,32 @@ describe("When creating an index", () => {
rst = await tbl.query().limit(2).offset(1).nearestTo(queryVec).toArrow();
expect(rst.numRows).toBe(1);
// test nprobes
rst = await tbl.query().nearestTo(queryVec).limit(2).nprobes(50).toArrow();
expect(rst.numRows).toBe(2);
rst = await tbl
.query()
.nearestTo(queryVec)
.limit(2)
.minimumNprobes(15)
.toArrow();
expect(rst.numRows).toBe(2);
rst = await tbl
.query()
.nearestTo(queryVec)
.limit(2)
.minimumNprobes(10)
.maximumNprobes(20)
.toArrow();
expect(rst.numRows).toBe(2);
expect(() => tbl.query().nearestTo(queryVec).minimumNprobes(0)).toThrow(
"Invalid input, minimum_nprobes must be greater than 0",
);
expect(() => tbl.query().nearestTo(queryVec).maximumNprobes(5)).toThrow(
"Invalid input, maximum_nprobes must be greater than minimum_nprobes",
);
await tbl.dropIndex("vec_idx");
const indices2 = await tbl.listIndices();
expect(indices2.length).toBe(0);
@@ -1531,6 +1562,18 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
const results = await table.search("hello").toArray();
expect(results[0].text).toBe(data[0].text);
const results2 = await table
.search(new MatchQuery("hello world", "text"))
.toArray();
expect(results2.length).toBe(2);
const results3 = await table
.search(
new MatchQuery("hello world", "text", { operator: Operator.And }),
)
.toArray();
expect(results3.length).toBe(1);
});
test("full text search without lowercase", async () => {
@@ -1609,6 +1652,38 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(resultSet.has("food")).toBe(true);
});
test("full text search boolean query", async () => {
const db = await connect(tmpDir.name);
const data = [
{ text: "hello world", vector: [0.1, 0.2, 0.3] },
{ text: "goodbye world", vector: [0.4, 0.5, 0.6] },
];
const table = await db.createTable("test", data);
await table.createIndex("text", {
config: Index.fts({ withPosition: false }),
});
const shouldResults = await table
.search(
new BooleanQuery([
[Occur.Should, new MatchQuery("hello", "text")],
[Occur.Should, new MatchQuery("goodbye", "text")],
]),
)
.toArray();
expect(shouldResults.length).toBe(2);
const mustResults = await table
.search(
new BooleanQuery([
[Occur.Must, new MatchQuery("hello", "text")],
[Occur.Must, new MatchQuery("world", "text")],
]),
)
.toArray();
expect(mustResults.length).toBe(1);
});
test.each([
[0.4, 0.5, 0.599], // number[]
Float32Array.of(0.4, 0.5, 0.599), // Float32Array

View File

@@ -64,7 +64,10 @@ export {
PhraseQuery,
BoostQuery,
MultiMatchQuery,
BooleanQuery,
FullTextQueryType,
Operator,
Occur,
} from "./query";
export {

View File

@@ -448,6 +448,10 @@ export class VectorQuery extends QueryBase<NativeVectorQuery> {
* For best results we recommend tuning this parameter with a benchmark against
* your actual data to find the smallest possible value that will still give
* you the desired recall.
*
* For more fine-grained control over behavior when you have a very narrow filter
* you can use `minimumNprobes` and `maximumNprobes`. This method sets both
* the minimum and maximum to the same value.
*/
nprobes(nprobes: number): VectorQuery {
super.doCall((inner) => inner.nprobes(nprobes));
@@ -455,6 +459,33 @@ export class VectorQuery extends QueryBase<NativeVectorQuery> {
return this;
}
/**
* Set the minimum number of probes used.
*
* This controls the minimum number of partitions that will be searched. This
* parameter will impact every query against a vector index, regardless of the
* filter. See `nprobes` for more details. Higher values will increase recall
* but will also increase latency.
*/
minimumNprobes(minimumNprobes: number): VectorQuery {
super.doCall((inner) => inner.minimumNprobes(minimumNprobes));
return this;
}
/**
* Set the maximum number of probes used.
*
* This controls the maximum number of partitions that will be searched. If this
* number is greater than minimumNprobes then the excess partitions will _only_ be
* searched if we have not found enough results. This can be useful when there is
* a narrow filter to allow these queries to spend more time searching and avoid
* potential false negatives.
*/
maximumNprobes(maximumNprobes: number): VectorQuery {
super.doCall((inner) => inner.maximumNprobes(maximumNprobes));
return this;
}
/*
* Set the distance range to use
*
@@ -762,6 +793,29 @@ export enum FullTextQueryType {
MatchPhrase = "match_phrase",
Boost = "boost",
MultiMatch = "multi_match",
Boolean = "boolean",
}
/**
* Enum representing the logical operators used in full-text queries.
*
* - `And`: All terms must match.
* - `Or`: At least one term must match.
*/
export enum Operator {
And = "AND",
Or = "OR",
}
/**
* Enum representing the occurrence of terms in full-text queries.
*
* - `Must`: The term must be present in the document.
* - `Should`: The term should contribute to the document score, but is not required.
*/
export enum Occur {
Must = "MUST",
Should = "SHOULD",
}
/**
@@ -791,6 +845,7 @@ export function instanceOfFullTextQuery(obj: any): obj is FullTextQuery {
export class MatchQuery implements FullTextQuery {
/** @ignore */
public readonly inner: JsFullTextQuery;
/**
* Creates an instance of MatchQuery.
*
@@ -800,6 +855,7 @@ export class MatchQuery implements FullTextQuery {
* - `boost`: The boost factor for the query (default is 1.0).
* - `fuzziness`: The fuzziness level for the query (default is 0).
* - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
* - `operator`: The logical operator to use for combining terms in the query (default is "OR").
*/
constructor(
query: string,
@@ -808,6 +864,7 @@ export class MatchQuery implements FullTextQuery {
boost?: number;
fuzziness?: number;
maxExpansions?: number;
operator?: Operator;
},
) {
let fuzziness = options?.fuzziness;
@@ -820,6 +877,7 @@ export class MatchQuery implements FullTextQuery {
options?.boost ?? 1.0,
fuzziness,
options?.maxExpansions ?? 50,
options?.operator ?? Operator.Or,
);
}
@@ -836,9 +894,11 @@ export class PhraseQuery implements FullTextQuery {
*
* @param query - The phrase to search for in the specified column.
* @param column - The name of the column to search within.
* @param options - Optional parameters for the phrase query.
* - `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
*/
constructor(query: string, column: string) {
this.inner = JsFullTextQuery.phraseQuery(query, column);
constructor(query: string, column: string, options?: { slop?: number }) {
this.inner = JsFullTextQuery.phraseQuery(query, column, options?.slop ?? 0);
}
queryType(): FullTextQueryType {
@@ -889,18 +949,21 @@ export class MultiMatchQuery implements FullTextQuery {
* @param columns - An array of column names to search within.
* @param options - Optional parameters for the multi-match query.
* - `boosts`: An array of boost factors for each column (default is 1.0 for all).
* - `operator`: The logical operator to use for combining terms in the query (default is "OR").
*/
constructor(
query: string,
columns: string[],
options?: {
boosts?: number[];
operator?: Operator;
},
) {
this.inner = JsFullTextQuery.multiMatchQuery(
query,
columns,
options?.boosts,
options?.operator ?? Operator.Or,
);
}
@@ -908,3 +971,23 @@ export class MultiMatchQuery implements FullTextQuery {
return FullTextQueryType.MultiMatch;
}
}
export class BooleanQuery implements FullTextQuery {
/** @ignore */
public readonly inner: JsFullTextQuery;
/**
* Creates an instance of BooleanQuery.
*
* @param queries - An array of `[Occur, FullTextQuery]` pairs to combine.
* `Occur` specifies whether each query must match or should match.
*/
constructor(queries: [Occur, FullTextQuery][]) {
this.inner = JsFullTextQuery.booleanQuery(
queries.map(([occur, query]) => [occur, query.inner]),
);
}
queryType(): FullTextQueryType {
return FullTextQueryType.Boolean;
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.20.0-beta.1",
"version": "0.20.1-beta.0",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -4,7 +4,8 @@
use std::sync::Arc;
use lancedb::index::scalar::{
BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, PhraseQuery,
BooleanQuery, BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, Occur,
Operator, PhraseQuery,
};
use lancedb::query::ExecutableQuery;
use lancedb::query::Query as LanceDbQuery;
@@ -177,6 +178,31 @@ impl VectorQuery {
self.inner = self.inner.clone().nprobes(nprobe as usize);
}
#[napi]
pub fn minimum_nprobes(&mut self, minimum_nprobe: u32) -> napi::Result<()> {
self.inner = self
.inner
.clone()
.minimum_nprobes(minimum_nprobe as usize)
.default_error()?;
Ok(())
}
#[napi]
pub fn maximum_nprobes(&mut self, maximum_nprobes: u32) -> napi::Result<()> {
let maximum_nprobes = if maximum_nprobes == 0 {
None
} else {
Some(maximum_nprobes as usize)
};
self.inner = self
.inner
.clone()
.maximum_nprobes(maximum_nprobes)
.default_error()?;
Ok(())
}
#[napi]
pub fn distance_range(&mut self, lower_bound: Option<f64>, upper_bound: Option<f64>) {
// napi doesn't support f32, so we accept f64 here and convert to f32
@@ -308,6 +334,7 @@ impl JsFullTextQuery {
boost: f64,
fuzziness: Option<u32>,
max_expansions: u32,
operator: String,
) -> napi::Result<Self> {
Ok(Self {
inner: MatchQuery::new(query)
@@ -315,14 +342,22 @@ impl JsFullTextQuery {
.with_boost(boost as f32)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions as usize)
.with_operator(
Operator::try_from(operator.as_str()).map_err(|e| {
napi::Error::from_reason(format!("Invalid operator: {}", e))
})?,
)
.into(),
})
}
#[napi(factory)]
pub fn phrase_query(query: String, column: String) -> napi::Result<Self> {
pub fn phrase_query(query: String, column: String, slop: u32) -> napi::Result<Self> {
Ok(Self {
inner: PhraseQuery::new(query).with_column(Some(column)).into(),
inner: PhraseQuery::new(query)
.with_column(Some(column))
.with_slop(slop)
.into(),
})
}
@@ -348,6 +383,7 @@ impl JsFullTextQuery {
query: String,
columns: Vec<String>,
boosts: Option<Vec<f64>>,
operator: String,
) -> napi::Result<Self> {
let q = match boosts {
Some(boosts) => MultiMatchQuery::try_new(query, columns)
@@ -358,7 +394,37 @@ impl JsFullTextQuery {
napi::Error::from_reason(format!("Failed to create multi match query: {}", e))
})?;
Ok(Self { inner: q.into() })
let operator = Operator::try_from(operator.as_str()).map_err(|e| {
napi::Error::from_reason(format!("Invalid operator for multi match query: {}", e))
})?;
Ok(Self {
inner: q.with_operator(operator).into(),
})
}
#[napi(factory)]
pub fn boolean_query(queries: Vec<(String, &JsFullTextQuery)>) -> napi::Result<Self> {
let mut sub_queries = Vec::with_capacity(queries.len());
for (occur, q) in queries {
let occur = Occur::try_from(occur.as_str())
.map_err(|e| napi::Error::from_reason(e.to_string()))?;
sub_queries.push((occur, q.inner.clone()));
}
Ok(Self {
inner: BooleanQuery::new(sub_queries).into(),
})
}
#[napi(getter)]
pub fn query_type(&self) -> String {
match self.inner {
FtsQuery::Match(_) => "match".to_string(),
FtsQuery::Phrase(_) => "phrase".to_string(),
FtsQuery::Boost(_) => "boost".to_string(),
FtsQuery::MultiMatch(_) => "multi_match".to_string(),
FtsQuery::Boolean(_) => "boolean".to_string(),
}
}
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.23.0-beta.2"
current_version = "0.23.1-beta.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.23.0-beta.2"
version = "0.23.1-beta.1"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

View File

@@ -143,6 +143,8 @@ class VectorQuery:
def postfilter(self): ...
def refine_factor(self, refine_factor: int): ...
def nprobes(self, nprobes: int): ...
def minimum_nprobes(self, minimum_nprobes: int): ...
def maximum_nprobes(self, maximum_nprobes: int): ...
def bypass_vector_index(self): ...
def nearest_to_text(self, query: dict) -> HybridQuery: ...
def to_query_request(self) -> PyQueryRequest: ...
@@ -158,6 +160,8 @@ class HybridQuery:
def distance_type(self, distance_type: str): ...
def refine_factor(self, refine_factor: int): ...
def nprobes(self, nprobes: int): ...
def minimum_nprobes(self, minimum_nprobes: int): ...
def maximum_nprobes(self, maximum_nprobes: int): ...
def bypass_vector_index(self): ...
def to_vector_query(self) -> VectorQuery: ...
def to_fts_query(self) -> FTSQuery: ...
@@ -165,23 +169,21 @@ class HybridQuery:
def get_with_row_id(self) -> bool: ...
def to_query_request(self) -> PyQueryRequest: ...
class PyFullTextSearchQuery:
columns: Optional[List[str]]
query: str
limit: Optional[int]
wand_factor: Optional[float]
class FullTextQuery:
pass
class PyQueryRequest:
limit: Optional[int]
offset: Optional[int]
filter: Optional[Union[str, bytes]]
full_text_search: Optional[PyFullTextSearchQuery]
full_text_search: Optional[FullTextQuery]
select: Optional[Union[str, List[str]]]
fast_search: Optional[bool]
with_row_id: Optional[bool]
column: Optional[str]
query_vector: Optional[List[pa.Array]]
nprobes: Optional[int]
minimum_nprobes: Optional[int]
maximum_nprobes: Optional[int]
lower_bound: Optional[float]
upper_bound: Optional[float]
ef: Optional[int]

View File

@@ -4,7 +4,6 @@
from __future__ import annotations
from abc import ABC, abstractmethod
import abc
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from datetime import timedelta
@@ -88,15 +87,27 @@ def ensure_vector_query(
return val
class FullTextQueryType(Enum):
class FullTextQueryType(str, Enum):
MATCH = "match"
MATCH_PHRASE = "match_phrase"
BOOST = "boost"
MULTI_MATCH = "multi_match"
BOOLEAN = "boolean"
class FullTextQuery(abc.ABC, pydantic.BaseModel):
@abc.abstractmethod
class FullTextOperator(str, Enum):
AND = "AND"
OR = "OR"
class Occur(str, Enum):
MUST = "MUST"
SHOULD = "SHOULD"
@pydantic.dataclasses.dataclass
class FullTextQuery(ABC):
@abstractmethod
def query_type(self) -> FullTextQueryType:
"""
Get the query type of the query.
@@ -106,193 +117,174 @@ class FullTextQuery(abc.ABC, pydantic.BaseModel):
str
The type of the query.
"""
pass
@abc.abstractmethod
def to_dict(self) -> dict:
def __and__(self, other: "FullTextQuery") -> "FullTextQuery":
"""
Convert the query to a dictionary.
Returns
-------
dict
The query as a dictionary.
"""
class MatchQuery(FullTextQuery):
query: str
column: str
boost: float = 1.0
fuzziness: int = 0
max_expansions: int = 50
def __init__(
self,
query: str,
column: str,
*,
boost: float = 1.0,
fuzziness: int = 0,
max_expansions: int = 50,
):
"""
Match query for full-text search.
Combine two queries with a logical AND operation.
Parameters
----------
query : str
The query string to match against.
column : str
The name of the column to match against.
boost : float, default 1.0
The boost factor for the query.
The score of each matching document is multiplied by this value.
fuzziness : int, optional
The maximum edit distance for each term in the match query.
Defaults to 0 (exact match).
If None, fuzziness is applied automatically by the rules:
- 0 for terms with length <= 2
- 1 for terms with length <= 5
- 2 for terms with length > 5
max_expansions : int, optional
The maximum number of terms to consider for fuzzy matching.
Defaults to 50.
other : FullTextQuery
The other query to combine with.
Returns
-------
FullTextQuery
A new query that combines both queries with AND.
"""
super().__init__(
query=query,
column=column,
boost=boost,
fuzziness=fuzziness,
max_expansions=max_expansions,
)
return BooleanQuery([(Occur.MUST, self), (Occur.MUST, other)])
def __or__(self, other: "FullTextQuery") -> "FullTextQuery":
"""
Combine two queries with a logical OR operation.
Parameters
----------
other : FullTextQuery
The other query to combine with.
Returns
-------
FullTextQuery
A new query that combines both queries with OR.
"""
return BooleanQuery([(Occur.SHOULD, self), (Occur.SHOULD, other)])
@pydantic.dataclasses.dataclass
class MatchQuery(FullTextQuery):
"""
Match query for full-text search.
Parameters
----------
query : str
The query string to match against.
column : str
The name of the column to match against.
boost : float, default 1.0
The boost factor for the query.
The score of each matching document is multiplied by this value.
fuzziness : int, optional
The maximum edit distance for each term in the match query.
Defaults to 0 (exact match).
If None, fuzziness is applied automatically by the rules:
- 0 for terms with length <= 2
- 1 for terms with length <= 5
- 2 for terms with length > 5
max_expansions : int, optional
The maximum number of terms to consider for fuzzy matching.
Defaults to 50.
operator : FullTextOperator, default OR
The operator to use for combining the query results.
Can be either `AND` or `OR`.
If `AND`, all terms in the query must match.
If `OR`, at least one term in the query must match.
"""
query: str
column: str
boost: float = pydantic.Field(1.0, kw_only=True)
fuzziness: int = pydantic.Field(0, kw_only=True)
max_expansions: int = pydantic.Field(50, kw_only=True)
operator: FullTextOperator = pydantic.Field(FullTextOperator.OR, kw_only=True)
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MATCH
def to_dict(self) -> dict:
return {
"match": {
self.column: {
"query": self.query,
"boost": self.boost,
"fuzziness": self.fuzziness,
"max_expansions": self.max_expansions,
}
}
}
@pydantic.dataclasses.dataclass
class PhraseQuery(FullTextQuery):
"""
Phrase query for full-text search.
Parameters
----------
query : str
The query string to match against.
column : str
The name of the column to match against.
"""
query: str
column: str
def __init__(self, query: str, column: str):
"""
Phrase query for full-text search.
Parameters
----------
query : str
The query string to match against.
column : str
The name of the column to match against.
"""
super().__init__(query=query, column=column)
slop: int = pydantic.Field(0, kw_only=True)
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MATCH_PHRASE
def to_dict(self) -> dict:
return {
"match_phrase": {
self.column: self.query,
}
}
@pydantic.dataclasses.dataclass
class BoostQuery(FullTextQuery):
"""
Boost query for full-text search.
Parameters
----------
positive : FullTextQuery
The positive query object.
negative : FullTextQuery
The negative query object.
negative_boost : float, default 0.5
The boost factor for the negative query.
"""
positive: FullTextQuery
negative: FullTextQuery
negative_boost: float = 0.5
def __init__(
self,
positive: FullTextQuery,
negative: FullTextQuery,
*,
negative_boost: float = 0.5,
):
"""
Boost query for full-text search.
Parameters
----------
positive : dict
The positive query object.
negative : dict
The negative query object.
negative_boost : float
The boost factor for the negative query.
"""
super().__init__(
positive=positive, negative=negative, negative_boost=negative_boost
)
negative_boost: float = pydantic.Field(0.5, kw_only=True)
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.BOOST
def to_dict(self) -> dict:
return {
"boost": {
"positive": self.positive.to_dict(),
"negative": self.negative.to_dict(),
"negative_boost": self.negative_boost,
}
}
@pydantic.dataclasses.dataclass
class MultiMatchQuery(FullTextQuery):
"""
Multi-match query for full-text search.
Parameters
----------
query : str
The query string to match against.
columns : list[str]
The list of columns to match against.
boosts : list[float], optional
The list of boost factors for each column. If not provided,
all columns will have the same boost factor.
operator : FullTextOperator, default OR
The operator to use for combining the query results.
Can be either `AND` or `OR`.
It is applied to each column individually.
For example, if the operator is `AND`,
then the query "hello world" is equivalent to
`match("hello AND world", column1) OR match("hello AND world", column2)`.
"""
query: str
columns: list[str]
boosts: list[float]
def __init__(
self,
query: str,
columns: list[str],
*,
boosts: Optional[list[float]] = None,
):
"""
Multi-match query for full-text search.
Parameters
----------
query : str
The query string to match against.
columns : list[str]
The list of columns to match against.
boosts : list[float], optional
The list of boost factors for each column. If not provided,
all columns will have the same boost factor.
"""
if boosts is None:
boosts = [1.0] * len(columns)
super().__init__(query=query, columns=columns, boosts=boosts)
boosts: Optional[list[float]] = pydantic.Field(None, kw_only=True)
operator: FullTextOperator = pydantic.Field(FullTextOperator.OR, kw_only=True)
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MULTI_MATCH
def to_dict(self) -> dict:
return {
"multi_match": {
"query": self.query,
"columns": self.columns,
"boost": self.boosts,
}
}
@pydantic.dataclasses.dataclass
class BooleanQuery(FullTextQuery):
"""
Boolean query for full-text search.
Parameters
----------
queries : list[tuple[Occur, FullTextQuery]]
The list of queries with their occurrence requirements.
"""
queries: list[tuple[Occur, FullTextQuery]]
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.BOOLEAN
class FullTextSearchQuery(pydantic.BaseModel):
@@ -445,8 +437,18 @@ class Query(pydantic.BaseModel):
# which columns to return in the results
columns: Optional[Union[List[str], Dict[str, str]]] = None
# number of IVF partitions to search
nprobes: Optional[int] = None
# minimum number of IVF partitions to search
#
# If None then a default value (20) will be used.
minimum_nprobes: Optional[int] = None
# maximum number of IVF partitions to search
#
# If None then a default value (20) will be used.
#
# If 0 then no limit will be applied and all partitions could be searched
# if needed to satisfy the limit.
maximum_nprobes: Optional[int] = None
# lower bound for distance search
lower_bound: Optional[float] = None
@@ -484,7 +486,8 @@ class Query(pydantic.BaseModel):
query.vector_column = req.column
query.vector = req.query_vector
query.distance_type = req.distance_type
query.nprobes = req.nprobes
query.minimum_nprobes = req.minimum_nprobes
query.maximum_nprobes = req.maximum_nprobes
query.lower_bound = req.lower_bound
query.upper_bound = req.upper_bound
query.ef = req.ef
@@ -493,10 +496,8 @@ class Query(pydantic.BaseModel):
query.postfilter = req.postfilter
if req.full_text_search is not None:
query.full_text_query = FullTextSearchQuery(
columns=req.full_text_search.columns,
query=req.full_text_search.query,
limit=req.full_text_search.limit,
wand_factor=req.full_text_search.wand_factor,
columns=None,
query=req.full_text_search,
)
return query
@@ -1047,7 +1048,8 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
super().__init__(table)
self._query = query
self._distance_type = None
self._nprobes = None
self._minimum_nprobes = None
self._maximum_nprobes = None
self._lower_bound = None
self._upper_bound = None
self._refine_factor = None
@@ -1110,6 +1112,10 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
See discussion in [Querying an ANN Index][querying-an-ann-index] for
tuning advice.
This method sets both the minimum and maximum number of probes to the same
value. See `minimum_nprobes` and `maximum_nprobes` for more fine-grained
control.
Parameters
----------
nprobes: int
@@ -1120,7 +1126,36 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
LanceVectorQueryBuilder
The LanceQueryBuilder object.
"""
self._nprobes = nprobes
self._minimum_nprobes = nprobes
self._maximum_nprobes = nprobes
return self
def minimum_nprobes(self, minimum_nprobes: int) -> LanceVectorQueryBuilder:
"""Set the minimum number of probes to use.
See `nprobes` for more details.
These partitions will be searched on every vector query and will increase recall
at the expense of latency.
"""
self._minimum_nprobes = minimum_nprobes
return self
def maximum_nprobes(self, maximum_nprobes: int) -> LanceVectorQueryBuilder:
"""Set the maximum number of probes to use.
See `nprobes` for more details.
If this value is greater than `minimum_nprobes` then the excess partitions
will be searched only if we have not found enough results.
This can be useful when there is a narrow filter to allow these queries to
spend more time searching and avoid potential false negatives.
If this value is 0 then no limit will be applied and all partitions could be
searched if needed to satisfy the limit.
"""
self._maximum_nprobes = maximum_nprobes
return self
def distance_range(
@@ -1224,7 +1259,8 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
limit=self._limit,
distance_type=self._distance_type,
columns=self._columns,
nprobes=self._nprobes,
minimum_nprobes=self._minimum_nprobes,
maximum_nprobes=self._maximum_nprobes,
lower_bound=self._lower_bound,
upper_bound=self._upper_bound,
refine_factor=self._refine_factor,
@@ -1588,7 +1624,8 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
self._fts_columns = fts_columns
self._norm = None
self._reranker = None
self._nprobes = None
self._minimum_nprobes = None
self._maximum_nprobes = None
self._refine_factor = None
self._distance_type = None
self._phrase_query = None
@@ -1820,7 +1857,24 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
LanceHybridQueryBuilder
The LanceHybridQueryBuilder object.
"""
self._nprobes = nprobes
self._minimum_nprobes = nprobes
self._maximum_nprobes = nprobes
return self
def minimum_nprobes(self, minimum_nprobes: int) -> LanceHybridQueryBuilder:
"""Set the minimum number of probes to use.
See `nprobes` for more details.
"""
self._minimum_nprobes = minimum_nprobes
return self
def maximum_nprobes(self, maximum_nprobes: int) -> LanceHybridQueryBuilder:
"""Set the maximum number of probes to use.
See `nprobes` for more details.
"""
self._maximum_nprobes = maximum_nprobes
return self
def distance_range(
@@ -2049,8 +2103,10 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
self._fts_query.phrase_query(True)
if self._distance_type:
self._vector_query.metric(self._distance_type)
if self._nprobes:
self._vector_query.nprobes(self._nprobes)
if self._minimum_nprobes:
self._vector_query.minimum_nprobes(self._minimum_nprobes)
if self._maximum_nprobes is not None:
self._vector_query.maximum_nprobes(self._maximum_nprobes)
if self._refine_factor:
self._vector_query.refine_factor(self._refine_factor)
if self._ef:
@@ -2513,7 +2569,7 @@ class AsyncQuery(AsyncQueryBase):
self._inner.nearest_to_text({"query": query, "columns": columns})
)
# FullTextQuery object
return AsyncFTSQuery(self._inner.nearest_to_text({"query": query.to_dict()}))
return AsyncFTSQuery(self._inner.nearest_to_text({"query": query}))
class AsyncFTSQuery(AsyncQueryBase):
@@ -2661,6 +2717,34 @@ class AsyncVectorQueryBase:
self._inner.nprobes(nprobes)
return self
def minimum_nprobes(self, minimum_nprobes: int) -> Self:
"""Set the minimum number of probes to use.
See `nprobes` for more details.
These partitions will be searched on every indexed vector query and will
increase recall at the expense of latency.
"""
self._inner.minimum_nprobes(minimum_nprobes)
return self
def maximum_nprobes(self, maximum_nprobes: int) -> Self:
"""Set the maximum number of probes to use.
See `nprobes` for more details.
If this value is greater than `minimum_nprobes` then the excess partitions
will be searched only if we have not found enough results.
This can be useful when there is a narrow filter to allow these queries to
spend more time searching and avoid potential false negatives.
If this value is 0 then no limit will be applied and all partitions could be
searched if needed to satisfy the limit.
"""
self._inner.maximum_nprobes(maximum_nprobes)
return self
def distance_range(
self, lower_bound: Optional[float] = None, upper_bound: Optional[float] = None
) -> Self:
@@ -2835,7 +2919,7 @@ class AsyncVectorQuery(AsyncQueryBase, AsyncVectorQueryBase):
self._inner.nearest_to_text({"query": query, "columns": columns})
)
# FullTextQuery object
return AsyncHybridQuery(self._inner.nearest_to_text({"query": query.to_dict()}))
return AsyncHybridQuery(self._inner.nearest_to_text({"query": query}))
async def to_batches(
self,
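The docstrings above describe the intended behaviour: the minimum number of partitions is always searched, and partitions beyond that are searched only while the result limit is unmet, up to the maximum (or without a cap when the maximum is `0`). The toy model below restates that in plain Python. It is a sketch of the documented semantics only; `probe_partitions` and its list-of-lists input are assumptions made for illustration, and the real search in Lance may schedule partitions differently.

```python
from typing import List, Tuple


def probe_partitions(
    partitions: List[List[int]],
    limit: int,
    minimum_nprobes: int,
    maximum_nprobes: int,
) -> Tuple[List[int], int]:
    """Toy model: return (results, number of partitions searched).

    `partitions` holds the matching rows of each partition, ordered from the
    closest partition to the farthest. `maximum_nprobes == 0` means no cap.
    """
    if maximum_nprobes == 0:
        cap = len(partitions)
    else:
        cap = min(maximum_nprobes, len(partitions))

    results: List[int] = []
    searched = 0
    for rows in partitions[:cap]:
        # Stop only once the minimum is met AND enough results were found.
        if searched >= minimum_nprobes and len(results) >= limit:
            break
        results.extend(rows)
        searched += 1
    return results[:limit], searched


# A narrow filter leaves most partitions empty, so the search keeps going.
sparse = [[1], [], [], [2], [3]]
print(probe_partitions(sparse, limit=2, minimum_nprobes=2, maximum_nprobes=4))
print(probe_partitions(sparse, limit=2, minimum_nprobes=2, maximum_nprobes=2))
```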


@@ -3637,8 +3637,10 @@ class AsyncTable:
)
if query.distance_type is not None:
async_query = async_query.distance_type(query.distance_type)
if query.nprobes is not None:
async_query = async_query.nprobes(query.nprobes)
if query.minimum_nprobes is not None:
async_query = async_query.minimum_nprobes(query.minimum_nprobes)
if query.maximum_nprobes is not None:
async_query = async_query.maximum_nprobes(query.maximum_nprobes)
if query.refine_factor is not None:
async_query = async_query.refine_factor(query.refine_factor)
if query.vector_column:


@@ -215,6 +215,19 @@ def test_search_fts(table, use_tantivy):
assert len(results) == 5
assert len(results[0]) == 3 # id, text, _score
# Test boolean query
results = (
table.search(MatchQuery("puppy", "text") & MatchQuery("runs", "text"))
.select(["id", "text"])
.limit(5)
.to_list()
)
assert len(results) == 5
assert len(results[0]) == 3 # id, text, _score
for r in results:
assert "puppy" in r["text"]
assert "runs" in r["text"]
@pytest.mark.asyncio
async def test_fts_select_async(async_table):


@@ -25,6 +25,8 @@ from lancedb.query import (
AsyncQueryBase,
AsyncVectorQuery,
LanceVectorQueryBuilder,
MatchQuery,
PhraseQuery,
Query,
FullTextSearchQuery,
)
@@ -437,6 +439,33 @@ def test_query_builder_with_filter(table):
assert all(np.array(rs[0]["vector"]) == [3, 4])
def test_invalid_nprobes_sync(table):
with pytest.raises(ValueError, match="minimum_nprobes must be greater than 0"):
LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(0).to_list()
with pytest.raises(
ValueError, match="maximum_nprobes must be greater than minimum_nprobes"
):
LanceVectorQueryBuilder(table, [0, 0], "vector").maximum_nprobes(5).to_list()
with pytest.raises(
ValueError, match="minimum_nprobes must be less or equal to maximum_nprobes"
):
LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(100).to_list()
@pytest.mark.asyncio
async def test_invalid_nprobes_async(table_async: AsyncTable):
with pytest.raises(ValueError, match="minimum_nprobes must be greater than 0"):
await table_async.vector_search([0, 0]).minimum_nprobes(0).to_list()
with pytest.raises(
ValueError, match="maximum_nprobes must be greater than minimum_nprobes"
):
await table_async.vector_search([0, 0]).maximum_nprobes(5).to_list()
with pytest.raises(
ValueError, match="minimum_nprobes must be less or equal to maximum_nprobes"
):
await table_async.vector_search([0, 0]).minimum_nprobes(100).to_list()
def test_query_builder_with_prefilter(table):
df = (
LanceVectorQueryBuilder(table, [0, 0], "vector")
@@ -583,6 +612,21 @@ async def test_query_async(table_async: AsyncTable):
table_async.query().nearest_to(pa.array([1, 2])).nprobes(10),
expected_num_rows=2,
)
await check_query(
table_async.query().nearest_to(pa.array([1, 2])).minimum_nprobes(10),
expected_num_rows=2,
)
await check_query(
table_async.query().nearest_to(pa.array([1, 2])).maximum_nprobes(30),
expected_num_rows=2,
)
await check_query(
table_async.query()
.nearest_to(pa.array([1, 2]))
.minimum_nprobes(10)
.maximum_nprobes(20),
expected_num_rows=2,
)
await check_query(
table_async.query().nearest_to(pa.array([1, 2])).bypass_vector_index(),
expected_num_rows=2,
@@ -909,7 +953,39 @@ def test_query_serialization_sync(table: lancedb.table.Table):
q = table.search([5.0, 6.0]).nprobes(10).refine_factor(5).to_query_object()
check_set_props(
q, vector_column="vector", vector=[5.0, 6.0], nprobes=10, refine_factor=5
q,
vector_column="vector",
vector=[5.0, 6.0],
minimum_nprobes=10,
maximum_nprobes=10,
refine_factor=5,
)
q = table.search([5.0, 6.0]).minimum_nprobes(10).to_query_object()
check_set_props(
q,
vector_column="vector",
vector=[5.0, 6.0],
minimum_nprobes=10,
maximum_nprobes=None,
)
q = table.search([5.0, 6.0]).nprobes(50).to_query_object()
check_set_props(
q,
vector_column="vector",
vector=[5.0, 6.0],
minimum_nprobes=50,
maximum_nprobes=50,
)
q = table.search([5.0, 6.0]).maximum_nprobes(10).to_query_object()
check_set_props(
q,
vector_column="vector",
vector=[5.0, 6.0],
maximum_nprobes=10,
minimum_nprobes=None,
)
q = table.search([5.0, 6.0]).distance_range(0.0, 1.0).to_query_object()
@@ -961,7 +1037,8 @@ async def test_query_serialization_async(table_async: AsyncTable):
limit=10,
vector=sample_vector,
postfilter=False,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
bypass_vector_index=False,
)
@@ -971,7 +1048,20 @@ async def test_query_serialization_async(table_async: AsyncTable):
q,
vector=sample_vector,
postfilter=False,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
bypass_vector_index=False,
limit=10,
)
q = (await table_async.search([5.0, 6.0])).nprobes(50).to_query_object()
check_set_props(
q,
vector=sample_vector,
postfilter=False,
minimum_nprobes=50,
maximum_nprobes=50,
with_row_id=False,
bypass_vector_index=False,
limit=10,
@@ -990,7 +1080,8 @@ async def test_query_serialization_async(table_async: AsyncTable):
filter="id = 1",
postfilter=True,
vector=sample_vector,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
bypass_vector_index=False,
)
@@ -1004,7 +1095,8 @@ async def test_query_serialization_async(table_async: AsyncTable):
check_set_props(
q,
vector=sample_vector,
nprobes=10,
minimum_nprobes=10,
maximum_nprobes=10,
refine_factor=5,
postfilter=False,
with_row_id=False,
@@ -1012,6 +1104,18 @@ async def test_query_serialization_async(table_async: AsyncTable):
limit=10,
)
q = (await table_async.search([5.0, 6.0])).minimum_nprobes(5).to_query_object()
check_set_props(
q,
vector=sample_vector,
minimum_nprobes=5,
maximum_nprobes=20,
postfilter=False,
with_row_id=False,
bypass_vector_index=False,
limit=10,
)
q = (
(await table_async.search([5.0, 6.0]))
.distance_range(0.0, 1.0)
@@ -1023,7 +1127,8 @@ async def test_query_serialization_async(table_async: AsyncTable):
lower_bound=0.0,
upper_bound=1.0,
postfilter=False,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
bypass_vector_index=False,
limit=10,
@@ -1035,7 +1140,8 @@ async def test_query_serialization_async(table_async: AsyncTable):
distance_type="cosine",
vector=sample_vector,
postfilter=False,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
bypass_vector_index=False,
limit=10,
@@ -1047,7 +1153,8 @@ async def test_query_serialization_async(table_async: AsyncTable):
ef=7,
vector=sample_vector,
postfilter=False,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
bypass_vector_index=False,
limit=10,
@@ -1059,24 +1166,34 @@ async def test_query_serialization_async(table_async: AsyncTable):
bypass_vector_index=True,
vector=sample_vector,
postfilter=False,
nprobes=20,
minimum_nprobes=20,
maximum_nprobes=20,
with_row_id=False,
limit=10,
)
# FTS queries
q = (await table_async.search("foo")).limit(10).to_query_object()
match_query = MatchQuery("foo", "text")
q = (await table_async.search(match_query)).limit(10).to_query_object()
check_set_props(
q,
limit=10,
full_text_query=FullTextSearchQuery(columns=[], query="foo"),
full_text_query=FullTextSearchQuery(columns=None, query=match_query),
with_row_id=False,
)
q = (await table_async.search("foo", query_type="fts")).to_query_object()
q = (await table_async.search(match_query)).to_query_object()
check_set_props(
q,
full_text_query=FullTextSearchQuery(columns=[], query="foo"),
full_text_query=FullTextSearchQuery(columns=None, query=match_query),
with_row_id=False,
)
phrase_query = PhraseQuery("foo", "text", slop=1)
q = (await table_async.search(phrase_query)).to_query_object()
check_set_props(
q,
full_text_query=FullTextSearchQuery(columns=None, query=phrase_query),
with_row_id=False,
)


@@ -496,6 +496,8 @@ def test_query_sync_minimal():
"ef": None,
"vector": [1.0, 2.0, 3.0],
"nprobes": 20,
"minimum_nprobes": 20,
"maximum_nprobes": 20,
"version": None,
}
@@ -536,6 +538,8 @@ def test_query_sync_maximal():
"refine_factor": 10,
"vector": [1.0, 2.0, 3.0],
"nprobes": 5,
"minimum_nprobes": 5,
"maximum_nprobes": 5,
"lower_bound": None,
"upper_bound": None,
"ef": None,
@@ -564,6 +568,66 @@ def test_query_sync_maximal():
)
def test_query_sync_nprobes():
def handler(body):
assert body == {
"distance_type": "l2",
"k": 10,
"prefilter": True,
"fast_search": True,
"vector_column": "vector2",
"refine_factor": None,
"lower_bound": None,
"upper_bound": None,
"ef": None,
"vector": [1.0, 2.0, 3.0],
"nprobes": 5,
"minimum_nprobes": 5,
"maximum_nprobes": 15,
"version": None,
}
return pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
with query_test_table(handler) as table:
(
table.search([1, 2, 3], vector_column_name="vector2", fast_search=True)
.minimum_nprobes(5)
.maximum_nprobes(15)
.to_list()
)
def test_query_sync_no_max_nprobes():
def handler(body):
assert body == {
"distance_type": "l2",
"k": 10,
"prefilter": True,
"fast_search": True,
"vector_column": "vector2",
"refine_factor": None,
"lower_bound": None,
"upper_bound": None,
"ef": None,
"vector": [1.0, 2.0, 3.0],
"nprobes": 5,
"minimum_nprobes": 5,
"maximum_nprobes": 0,
"version": None,
}
return pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
with query_test_table(handler) as table:
(
table.search([1, 2, 3], vector_column_name="vector2", fast_search=True)
.minimum_nprobes(5)
.maximum_nprobes(0)
.to_list()
)
@pytest.mark.parametrize("server_version", [Version("0.1.0"), Version("0.2.0")])
def test_query_sync_batch_queries(server_version):
def handler(body):
@@ -666,6 +730,8 @@ def test_query_sync_hybrid():
"refine_factor": None,
"vector": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
"nprobes": 20,
"minimum_nprobes": 20,
"maximum_nprobes": 20,
"lower_bound": None,
"upper_bound": None,
"ef": None,


@@ -9,15 +9,16 @@ use arrow::array::Array;
use arrow::array::ArrayData;
use arrow::pyarrow::FromPyArrow;
use arrow::pyarrow::IntoPyArrow;
use lancedb::index::scalar::{FtsQuery, FullTextSearchQuery, MatchQuery, PhraseQuery};
use lancedb::index::scalar::{
BooleanQuery, BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, Occur,
Operator, PhraseQuery,
};
use lancedb::query::QueryExecutionOptions;
use lancedb::query::QueryFilter;
use lancedb::query::{
ExecutableQuery, Query as LanceDbQuery, QueryBase, Select, VectorQuery as LanceDbVectorQuery,
};
use lancedb::table::AnyQuery;
use pyo3::exceptions::PyRuntimeError;
use pyo3::exceptions::{PyNotImplementedError, PyValueError};
use pyo3::prelude::{PyAnyMethods, PyDictMethods};
use pyo3::pymethods;
use pyo3::types::PyList;
@@ -27,34 +28,183 @@ use pyo3::IntoPyObject;
use pyo3::PyAny;
use pyo3::PyRef;
use pyo3::PyResult;
use pyo3::{exceptions::PyRuntimeError, FromPyObject};
use pyo3::{
exceptions::{PyNotImplementedError, PyValueError},
intern,
};
use pyo3::{pyclass, PyErr};
use pyo3_async_runtimes::tokio::future_into_py;
use crate::arrow::RecordBatchStream;
use crate::error::PythonErrorExt;
use crate::util::{parse_distance_type, parse_fts_query};
use crate::util::parse_distance_type;
use crate::{arrow::RecordBatchStream, util::PyLanceDB};
use crate::{error::PythonErrorExt, index::class_name};
// Python representation of full text search parameters
#[derive(Clone)]
#[pyclass(get_all)]
pub struct PyFullTextSearchQuery {
pub columns: Vec<String>,
pub query: String,
pub limit: Option<i64>,
pub wand_factor: Option<f32>,
impl FromPyObject<'_> for PyLanceDB<FtsQuery> {
fn extract_bound(ob: &Bound<'_, PyAny>) -> PyResult<Self> {
match class_name(ob)?.as_str() {
"MatchQuery" => {
let query = ob.getattr("query")?.extract()?;
let column = ob.getattr("column")?.extract()?;
let boost = ob.getattr("boost")?.extract()?;
let fuzziness = ob.getattr("fuzziness")?.extract()?;
let max_expansions = ob.getattr("max_expansions")?.extract()?;
let operator = ob.getattr("operator")?.extract::<String>()?;
Ok(PyLanceDB(
MatchQuery::new(query)
.with_column(Some(column))
.with_boost(boost)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions)
.with_operator(Operator::try_from(operator.as_str()).map_err(|e| {
PyValueError::new_err(format!("Invalid operator: {}", e))
})?)
.into(),
))
}
"PhraseQuery" => {
let query = ob.getattr("query")?.extract()?;
let column = ob.getattr("column")?.extract()?;
let slop = ob.getattr("slop")?.extract()?;
Ok(PyLanceDB(
PhraseQuery::new(query)
.with_column(Some(column))
.with_slop(slop)
.into(),
))
}
"BoostQuery" => {
let positive: PyLanceDB<FtsQuery> = ob.getattr("positive")?.extract()?;
let negative: PyLanceDB<FtsQuery> = ob.getattr("negative")?.extract()?;
let negative_boost = ob.getattr("negative_boost")?.extract()?;
Ok(PyLanceDB(
BoostQuery::new(positive.0, negative.0, negative_boost).into(),
))
}
"MultiMatchQuery" => {
let query = ob.getattr("query")?.extract()?;
let columns = ob.getattr("columns")?.extract()?;
let boosts: Option<Vec<f32>> = ob.getattr("boosts")?.extract()?;
let operator: String = ob.getattr("operator")?.extract()?;
let q = MultiMatchQuery::try_new(query, columns)
.map_err(|e| PyValueError::new_err(format!("Invalid query: {}", e)))?;
let q = if let Some(boosts) = boosts {
q.try_with_boosts(boosts)
.map_err(|e| PyValueError::new_err(format!("Invalid boosts: {}", e)))?
} else {
q
};
let op = Operator::try_from(operator.as_str())
.map_err(|e| PyValueError::new_err(format!("Invalid operator: {}", e)))?;
Ok(PyLanceDB(q.with_operator(op).into()))
}
"BooleanQuery" => {
let queries: Vec<(String, PyLanceDB<FtsQuery>)> =
ob.getattr("queries")?.extract()?;
let mut sub_queries = Vec::with_capacity(queries.len());
for (occur, q) in queries {
let occur = Occur::try_from(occur.as_str())
.map_err(|e| PyValueError::new_err(e.to_string()))?;
sub_queries.push((occur, q.0));
}
Ok(PyLanceDB(BooleanQuery::new(sub_queries).into()))
}
name => Err(PyValueError::new_err(format!(
"Unsupported FTS query type: {}",
name
))),
}
}
}
impl From<FullTextSearchQuery> for PyFullTextSearchQuery {
fn from(query: FullTextSearchQuery) -> Self {
Self {
columns: query.columns().into_iter().collect(),
query: query.query.query().to_owned(),
limit: query.limit,
wand_factor: query.wand_factor,
impl<'py> IntoPyObject<'py> for PyLanceDB<FtsQuery> {
type Target = PyAny;
type Output = Bound<'py, Self::Target>;
type Error = PyErr;
fn into_pyobject(self, py: pyo3::Python<'py>) -> PyResult<Self::Output> {
let namespace = py
.import(intern!(py, "lancedb"))
.and_then(|m| m.getattr(intern!(py, "query")))
.expect("Failed to import namespace");
match self.0 {
FtsQuery::Match(query) => {
let kwargs = PyDict::new(py);
kwargs.set_item("boost", query.boost)?;
kwargs.set_item("fuzziness", query.fuzziness)?;
kwargs.set_item("max_expansions", query.max_expansions)?;
kwargs.set_item("operator", operator_to_str(query.operator))?;
namespace
.getattr(intern!(py, "MatchQuery"))?
.call((query.terms, query.column.unwrap()), Some(&kwargs))
}
FtsQuery::Phrase(query) => {
let kwargs = PyDict::new(py);
kwargs.set_item("slop", query.slop)?;
namespace
.getattr(intern!(py, "PhraseQuery"))?
.call((query.terms, query.column.unwrap()), Some(&kwargs))
}
FtsQuery::Boost(query) => {
let positive = PyLanceDB(query.positive.as_ref().clone()).into_pyobject(py)?;
let negative = PyLanceDB(query.negative.as_ref().clone()).into_pyobject(py)?;
let kwargs = PyDict::new(py);
kwargs.set_item("negative_boost", query.negative_boost)?;
namespace
.getattr(intern!(py, "BoostQuery"))?
.call((positive, negative), Some(&kwargs))
}
FtsQuery::MultiMatch(query) => {
let first = &query.match_queries[0];
let (columns, boosts): (Vec<_>, Vec<_>) = query
.match_queries
.iter()
.map(|q| (q.column.as_ref().unwrap().clone(), q.boost))
.unzip();
let kwargs = PyDict::new(py);
kwargs.set_item("boosts", boosts)?;
kwargs.set_item("operator", operator_to_str(first.operator))?;
namespace
.getattr(intern!(py, "MultiMatchQuery"))?
.call((first.terms.clone(), columns), Some(&kwargs))
}
FtsQuery::Boolean(query) => {
let mut queries = Vec::with_capacity(query.must.len() + query.should.len());
for q in query.must {
queries.push((occur_to_str(Occur::Must), PyLanceDB(q).into_pyobject(py)?));
}
for q in query.should {
queries.push((occur_to_str(Occur::Should), PyLanceDB(q).into_pyobject(py)?));
}
namespace
.getattr(intern!(py, "BooleanQuery"))?
.call1((queries,))
}
}
}
}
fn operator_to_str(op: Operator) -> &'static str {
match op {
Operator::And => "AND",
Operator::Or => "OR",
}
}
fn occur_to_str(occur: Occur) -> &'static str {
match occur {
Occur::Must => "MUST",
Occur::Should => "SHOULD",
Occur::MustNot => "MUST NOT",
}
}
// Python representation of query vector(s)
#[derive(Clone)]
pub struct PyQueryVectors(Vec<Arc<dyn Array>>);
@@ -80,13 +230,16 @@ pub struct PyQueryRequest {
pub limit: Option<usize>,
pub offset: Option<usize>,
pub filter: Option<PyQueryFilter>,
pub full_text_search: Option<PyFullTextSearchQuery>,
pub full_text_search: Option<PyLanceDB<FtsQuery>>,
pub select: PySelect,
pub fast_search: Option<bool>,
pub with_row_id: Option<bool>,
pub column: Option<String>,
pub query_vector: Option<PyQueryVectors>,
pub nprobes: Option<usize>,
pub minimum_nprobes: Option<usize>,
// None means the user did not set it and the default should be used (currently 20)
// Some(0) means the user set it to None and there is no limit
pub maximum_nprobes: Option<usize>,
pub lower_bound: Option<f32>,
pub upper_bound: Option<f32>,
pub ef: Option<usize>,
@@ -106,13 +259,14 @@ impl From<AnyQuery> for PyQueryRequest {
filter: query_request.filter.map(PyQueryFilter),
full_text_search: query_request
.full_text_search
.map(PyFullTextSearchQuery::from),
.map(|fts| PyLanceDB(fts.query)),
select: PySelect(query_request.select),
fast_search: Some(query_request.fast_search),
with_row_id: Some(query_request.with_row_id),
column: None,
query_vector: None,
nprobes: None,
minimum_nprobes: None,
maximum_nprobes: None,
lower_bound: None,
upper_bound: None,
ef: None,
@@ -132,7 +286,11 @@ impl From<AnyQuery> for PyQueryRequest {
with_row_id: Some(vector_query.base.with_row_id),
column: vector_query.column,
query_vector: Some(PyQueryVectors(vector_query.query_vector)),
nprobes: Some(vector_query.nprobes),
minimum_nprobes: Some(vector_query.minimum_nprobes),
maximum_nprobes: match vector_query.maximum_nprobes {
None => Some(0),
Some(value) => Some(value),
},
lower_bound: vector_query.lower_bound,
upper_bound: vector_query.upper_bound,
ef: vector_query.ef,
@@ -269,8 +427,8 @@ impl Query {
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
match columns {
Some(cols) if !cols.is_empty() => {
query = query.with_columns(&cols).map_err(|e| {
PyValueError::new_err(format!(
"Failed to set full text search columns: {}",
@@ -278,15 +436,12 @@ impl Query {
))
})?;
}
_ => {}
}
query
} else if let Ok(query) = fts_query.downcast::<PyDict>() {
let query = parse_fts_query(query)?;
FullTextSearchQuery::new_query(query)
} else {
return Err(PyValueError::new_err(
"query must be a string or a Query object",
));
let query = fts_query.extract::<PyLanceDB<FtsQuery>>()?;
FullTextSearchQuery::new_query(query.0)
};
Ok(FTSQuery {
@@ -509,6 +664,29 @@ impl VectorQuery {
self.inner = self.inner.clone().nprobes(nprobe as usize);
}
pub fn minimum_nprobes(&mut self, minimum_nprobes: u32) -> PyResult<()> {
self.inner = self
.inner
.clone()
.minimum_nprobes(minimum_nprobes as usize)
.infer_error()?;
Ok(())
}
pub fn maximum_nprobes(&mut self, maximum_nprobes: u32) -> PyResult<()> {
let maximum_nprobes = if maximum_nprobes == 0 {
None
} else {
Some(maximum_nprobes as usize)
};
self.inner = self
.inner
.clone()
.maximum_nprobes(maximum_nprobes)
.infer_error()?;
Ok(())
}
#[pyo3(signature = (lower_bound=None, upper_bound=None))]
pub fn distance_range(&mut self, lower_bound: Option<f32>, upper_bound: Option<f32>) {
self.inner = self.inner.clone().distance_range(lower_bound, upper_bound);
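The binding above uses `0` as a sentinel at the Python boundary: `maximum_nprobes(0)` becomes `None` ("no limit") in the core query, and a core value of `None` is reported back as `Some(0)` in `PyQueryRequest`. A minimal Python sketch of that round trip follows; `to_core` and `to_request` are hypothetical names used only to illustrate the mapping.

```python
from typing import Optional


def to_core(maximum_nprobes: int) -> Optional[int]:
    """Binding input to core value: 0 is the 'no limit' sentinel."""
    return None if maximum_nprobes == 0 else maximum_nprobes


def to_request(core_value: Optional[int]) -> int:
    """Core value to the reported request field: None is reported as 0."""
    return 0 if core_value is None else core_value


# The mapping round-trips for the sentinel and for ordinary values.
for value in (0, 1, 20):
    assert to_request(to_core(value)) == value
```

Using `0` as the sentinel is safe here because the core builder rejects an explicit maximum of `0`, so the value cannot be confused with a real bound.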


@@ -3,15 +3,11 @@
use std::sync::Mutex;
use lancedb::index::scalar::{BoostQuery, FtsQuery, MatchQuery, MultiMatchQuery, PhraseQuery};
use lancedb::DistanceType;
use pyo3::prelude::{PyAnyMethods, PyDictMethods, PyListMethods};
use pyo3::types::PyDict;
use pyo3::{
exceptions::{PyRuntimeError, PyValueError},
pyfunction, PyResult,
};
use pyo3::{Bound, PyAny};
/// A wrapper around a rust builder
///
@@ -64,116 +60,6 @@ pub fn validate_table_name(table_name: &str) -> PyResult<()> {
.map_err(|e| PyValueError::new_err(e.to_string()))
}
pub fn parse_fts_query(query: &Bound<'_, PyDict>) -> PyResult<FtsQuery> {
let query_type = query.keys().get_item(0)?.extract::<String>()?;
let query_value = query
.get_item(&query_type)?
.ok_or(PyValueError::new_err(format!(
"Query type {} not found",
query_type
)))?;
let query_value = query_value.downcast::<PyDict>()?;
match query_type.as_str() {
"match" => {
let column = query_value.keys().get_item(0)?.extract::<String>()?;
let params = query_value
.get_item(&column)?
.ok_or(PyValueError::new_err(format!(
"column {} not found",
column
)))?;
let params = params.downcast::<PyDict>()?;
let query = params
.get_item("query")?
.ok_or(PyValueError::new_err("query not found"))?
.extract::<String>()?;
let boost = params
.get_item("boost")?
.ok_or(PyValueError::new_err("boost not found"))?
.extract::<f32>()?;
let fuzziness = params
.get_item("fuzziness")?
.ok_or(PyValueError::new_err("fuzziness not found"))?
.extract::<Option<u32>>()?;
let max_expansions = params
.get_item("max_expansions")?
.ok_or(PyValueError::new_err("max_expansions not found"))?
.extract::<usize>()?;
let query = MatchQuery::new(query)
.with_column(Some(column))
.with_boost(boost)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions);
Ok(query.into())
}
"match_phrase" => {
let column = query_value.keys().get_item(0)?.extract::<String>()?;
let query = query_value
.get_item(&column)?
.ok_or(PyValueError::new_err(format!(
"column {} not found",
column
)))?
.extract::<String>()?;
let query = PhraseQuery::new(query).with_column(Some(column));
Ok(query.into())
}
"boost" => {
let positive: Bound<'_, PyAny> = query_value
.get_item("positive")?
.ok_or(PyValueError::new_err("positive not found"))?;
let positive = positive.downcast::<PyDict>()?;
let negative = query_value
.get_item("negative")?
.ok_or(PyValueError::new_err("negative not found"))?;
let negative = negative.downcast::<PyDict>()?;
let negative_boost = query_value
.get_item("negative_boost")?
.ok_or(PyValueError::new_err("negative_boost not found"))?
.extract::<f32>()?;
let positive_query = parse_fts_query(positive)?;
let negative_query = parse_fts_query(negative)?;
let query = BoostQuery::new(positive_query, negative_query, Some(negative_boost));
Ok(query.into())
}
"multi_match" => {
let query = query_value
.get_item("query")?
.ok_or(PyValueError::new_err("query not found"))?
.extract::<String>()?;
let columns = query_value
.get_item("columns")?
.ok_or(PyValueError::new_err("columns not found"))?
.extract::<Vec<String>>()?;
let boost = query_value
.get_item("boost")?
.ok_or(PyValueError::new_err("boost not found"))?
.extract::<Vec<f32>>()?;
let query = MultiMatchQuery::try_new(query, columns)
.and_then(|q| q.try_with_boosts(boost))
.map_err(|e| {
PyValueError::new_err(format!("Error creating MultiMatchQuery: {}", e))
})?;
Ok(query.into())
}
_ => Err(PyValueError::new_err(format!(
"Unsupported query type: {}",
query_type
))),
}
}
/// A wrapper around a LanceDB type to allow it to be used in Python
#[derive(Debug, Clone)]
pub struct PyLanceDB<T>(pub T);


@@ -1,6 +1,6 @@
[package]
name = "lancedb-node"
version = "0.20.0-beta.1"
version = "0.20.1-beta.0"
description = "Serverless, low-latency vector database for AI applications"
license.workspace = true
edition.workspace = true


@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.20.0-beta.1"
version = "0.20.1-beta.0"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true


@@ -796,8 +796,10 @@ pub struct VectorQueryRequest {
pub column: Option<String>,
/// The vector(s) to search for
pub query_vector: Vec<Arc<dyn Array>>,
/// The number of partitions to search
pub nprobes: usize,
/// The minimum number of partitions to search
pub minimum_nprobes: usize,
/// The maximum number of partitions to search
pub maximum_nprobes: Option<usize>,
/// The lower bound (inclusive) of the distance to search for.
pub lower_bound: Option<f32>,
/// The upper bound (exclusive) of the distance to search for.
@@ -819,7 +821,8 @@ impl Default for VectorQueryRequest {
base: QueryRequest::default(),
column: None,
query_vector: Vec::new(),
nprobes: 20,
minimum_nprobes: 20,
maximum_nprobes: Some(20),
lower_bound: None,
upper_bound: None,
ef: None,
@@ -925,11 +928,75 @@ impl VectorQuery {
/// For best results we recommend tuning this parameter with a benchmark against
/// your actual data to find the smallest possible value that will still give
/// you the desired recall.
///
/// This method sets both the minimum and maximum number of partitions to search.
/// For more fine-grained control see [`VectorQuery::minimum_nprobes`] and
/// [`VectorQuery::maximum_nprobes`].
pub fn nprobes(mut self, nprobes: usize) -> Self {
self.request.nprobes = nprobes;
self.request.minimum_nprobes = nprobes;
self.request.maximum_nprobes = Some(nprobes);
self
}
/// Set the minimum number of partitions to search
///
/// This argument is only used when the vector column has an IVF PQ index.
/// If there is no index then this value is ignored.
///
/// See [`VectorQuery::nprobes`] for more details.
///
/// These partitions will be searched on every indexed vector query.
///
/// Will return an error if the value is not greater than 0 or if maximum_nprobes
/// has been set and is less than the minimum_nprobes.
pub fn minimum_nprobes(mut self, minimum_nprobes: usize) -> Result<Self> {
if minimum_nprobes == 0 {
return Err(Error::InvalidInput {
message: "minimum_nprobes must be greater than 0".to_string(),
});
}
if let Some(maximum_nprobes) = self.request.maximum_nprobes {
if minimum_nprobes > maximum_nprobes {
return Err(Error::InvalidInput {
message: "minimum_nprobes must be less or equal to maximum_nprobes".to_string(),
});
}
}
self.request.minimum_nprobes = minimum_nprobes;
Ok(self)
}
/// Set the maximum number of partitions to search
///
/// This argument is only used when the vector column has an IVF PQ index.
/// If there is no index then this value is ignored.
///
/// See [`VectorQuery::nprobes`] for more details.
///
/// If this value is greater than minimum_nprobes then the excess partitions will
/// only be searched if the initial search does not return enough results.
///
/// This can be useful when there is a narrow filter to allow these queries to
/// spend more time searching and avoid potential false negatives.
///
/// Set to None to search all partitions, if needed, to satisfy the limit.
pub fn maximum_nprobes(mut self, maximum_nprobes: Option<usize>) -> Result<Self> {
if let Some(maximum_nprobes) = maximum_nprobes {
if maximum_nprobes == 0 {
return Err(Error::InvalidInput {
message: "maximum_nprobes must be greater than 0".to_string(),
});
}
if maximum_nprobes < self.request.minimum_nprobes {
return Err(Error::InvalidInput {
message: "maximum_nprobes must be greater than minimum_nprobes".to_string(),
});
}
}
self.request.maximum_nprobes = maximum_nprobes;
Ok(self)
}
/// Set the distance range for vector search,
/// only rows with distances in the range [lower_bound, upper_bound) will be returned
pub fn distance_range(mut self, lower_bound: Option<f32>, upper_bound: Option<f32>) -> Self {
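The validation rules in this hunk, together with the defaults (`minimum_nprobes: 20`, `maximum_nprobes: Some(20)`), explain the error cases exercised by the Python tests. The sketch below restates them as a small Python model. `NprobesSettings` and its method names are hypothetical stand-ins rather than the real `VectorQuery` builder, and the error strings are copied from the Rust code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NprobesSettings:
    """Hypothetical model of the nprobes fields of VectorQueryRequest."""

    minimum_nprobes: int = 20
    maximum_nprobes: Optional[int] = 20  # None means "no upper limit"

    def set_nprobes(self, nprobes: int) -> "NprobesSettings":
        # nprobes() pins both bounds to the same value, without validation.
        self.minimum_nprobes = nprobes
        self.maximum_nprobes = nprobes
        return self

    def set_minimum(self, value: int) -> "NprobesSettings":
        if value == 0:
            raise ValueError("minimum_nprobes must be greater than 0")
        if self.maximum_nprobes is not None and value > self.maximum_nprobes:
            raise ValueError(
                "minimum_nprobes must be less or equal to maximum_nprobes"
            )
        self.minimum_nprobes = value
        return self

    def set_maximum(self, value: Optional[int]) -> "NprobesSettings":
        if value is not None:
            if value == 0:
                raise ValueError("maximum_nprobes must be greater than 0")
            if value < self.minimum_nprobes:
                raise ValueError(
                    "maximum_nprobes must be greater than minimum_nprobes"
                )
        self.maximum_nprobes = value
        return self


# With the defaults of 20/20, a maximum of 5 or a minimum of 100 is rejected.
for attempt in (lambda s: s.set_maximum(5), lambda s: s.set_minimum(100)):
    try:
        attempt(NprobesSettings())
    except ValueError as err:
        print(err)
```

Because each setter validates against the current state, call order matters in this model: raising both bounds above the default requires setting the maximum first, and lowering both requires setting the minimum first.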
@@ -1208,7 +1275,8 @@ mod tests {
);
assert_eq!(query.request.base.limit.unwrap(), 100);
assert_eq!(query.request.base.offset.unwrap(), 1);
assert_eq!(query.request.nprobes, 1000);
assert_eq!(query.request.minimum_nprobes, 1000);
assert_eq!(query.request.maximum_nprobes, Some(1000));
assert!(query.request.use_index);
assert_eq!(query.request.distance_type, Some(DistanceType::Cosine));
assert_eq!(query.request.refine_factor, Some(999));


@@ -32,6 +32,7 @@ use lance::dataset::{ColumnAlteration, NewColumnTransform, Version};
use lance_datafusion::exec::{execute_plan, OneShotExec};
use reqwest::{RequestBuilder, Response};
use serde::{Deserialize, Serialize};
use serde_json::Number;
use std::collections::HashMap;
use std::io::Cursor;
use std::pin::Pin;
@@ -438,7 +439,18 @@ impl<S: HttpSend> RemoteTable<S> {
// Apply general parameters, before we dispatch based on number of query vectors.
body["distance_type"] = serde_json::json!(query.distance_type.unwrap_or_default());
body["nprobes"] = query.nprobes.into();
// In 0.23.1 we migrated from `nprobes` to `minimum_nprobes` and `maximum_nprobes`.
// Old client / new server: minimum_nprobes is missing, so the server falls back to nprobes.
// New client / old server: the old server only reads nprobes, so the client must set both
// nprobes and minimum_nprobes.
// New client / new server: minimum_nprobes is present, so the server can ignore nprobes.
body["nprobes"] = query.minimum_nprobes.into();
body["minimum_nprobes"] = query.minimum_nprobes.into();
if let Some(maximum_nprobes) = query.maximum_nprobes {
body["maximum_nprobes"] = maximum_nprobes.into();
} else {
body["maximum_nprobes"] = serde_json::Value::Number(Number::from_u128(0).unwrap())
}
body["lower_bound"] = query.lower_bound.into();
body["upper_bound"] = query.upper_bound.into();
body["ef"] = query.ef.into();
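The compatibility scheme in the comment above can be summarised as: always send the legacy `nprobes` field (set to the minimum), always send `minimum_nprobes`, and send `maximum_nprobes` with `0` standing in for "no limit". The following Python sketch builds just those three fields; `nprobes_fields` is a hypothetical helper, not the client's actual code, and the values match the request bodies asserted in the remote tests.

```python
from typing import Any, Dict, Optional


def nprobes_fields(
    minimum_nprobes: int, maximum_nprobes: Optional[int]
) -> Dict[str, Any]:
    """Build the probe-related part of a remote query body (illustrative only)."""
    return {
        # Legacy field, kept so that older servers still see a probe count.
        "nprobes": minimum_nprobes,
        "minimum_nprobes": minimum_nprobes,
        # None ("no limit") is serialised as 0.
        "maximum_nprobes": 0 if maximum_nprobes is None else maximum_nprobes,
    }


print(nprobes_fields(5, 15))
print(nprobes_fields(5, None))
```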
@@ -2075,6 +2087,8 @@ mod tests {
"prefilter": true,
"distance_type": "l2",
"nprobes": 20,
"minimum_nprobes": 20,
"maximum_nprobes": 20,
"lower_bound": Option::<f32>::None,
"upper_bound": Option::<f32>::None,
"k": 10,
@@ -2175,6 +2189,8 @@ mod tests {
"bypass_vector_index": true,
"columns": ["a", "b"],
"nprobes": 12,
"minimum_nprobes": 12,
"maximum_nprobes": 12,
"lower_bound": Option::<f32>::None,
"upper_bound": Option::<f32>::None,
"ef": Option::<usize>::None,
@@ -2302,6 +2318,7 @@ mod tests {
"fuzziness": 0,
"max_expansions": 50,
"operator": "Or",
"prefix_length": 0,
},
}
},


@@ -2354,12 +2354,15 @@ impl BaseTable for NativeTable {
query.base.limit.unwrap_or(DEFAULT_TOP_K),
)?;
}
scanner.minimum_nprobes(query.minimum_nprobes);
if let Some(maximum_nprobes) = query.maximum_nprobes {
scanner.maximum_nprobes(maximum_nprobes);
}
}
scanner.limit(
query.base.limit.map(|limit| limit as i64),
query.base.offset.map(|offset| offset as i64),
)?;
scanner.nprobs(query.nprobes);
if let Some(ef) = query.ef {
scanner.ef(ef);
}