Compare commits

...

27 Commits

Author SHA1 Message Date
Lance Release
c1738250a3 Bump version: 0.22.0-beta.1 → 0.22.0-beta.2 2025-04-01 17:27:57 +00:00
Weston Pace
1ee63984f5 feat: allow FSB to be used for btree indices (#2297)
We recently allowed this in lance, but a check in lancedb was still preventing
it.

## Summary by CodeRabbit

- **New Features**
  - Added support for indexing fixed-size binary data using B-tree structures for efficient data storage and retrieval.
- **Tests**
  - Implemented automated tests to ensure the new binary indexing works correctly and meets the expected configuration.

2025-04-01 10:27:22 -07:00
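A minimal sketch of what the change above enables in the TypeScript client (the table and the `fingerprint` column are hypothetical; it assumes `fingerprint` is an Arrow `FixedSizeBinary` field):

```ts
import * as lancedb from "@lancedb/lancedb";

const db = await lancedb.connect("./.lancedb");
const table = await db.openTable("my_table");

// Before this change, lancedb rejected btree indices on FixedSizeBinary
// columns even though lance itself allows them; this now succeeds.
await table.createIndex("fingerprint", {
  config: lancedb.Index.btree(),
});
```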
Lance Release
2eb2c8862a Updating package-lock.json 2025-04-01 14:27:26 +00:00
Lance Release
4ea8e178d3 Updating package-lock.json 2025-04-01 14:27:07 +00:00
Lance Release
e4485a630e Bump version: 0.19.0-beta.0 → 0.19.0-beta.1 2025-04-01 14:26:47 +00:00
Lance Release
fb95f9b3bd Bump version: 0.22.0-beta.0 → 0.22.0-beta.1 2025-04-01 14:26:28 +00:00
Weston Pace
625bab3f21 feat: update to lance 0.25.3b1 (#2294)
## Summary by CodeRabbit

- **Chores**
  - Updated dependency versions for improved performance and compatibility.

- **New Features**
  - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats.
  - Introduced a new method to check server support for structured full-text search features.
  - Enhanced the query system with new classes and interfaces for handling various full-text queries.
  - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures.

- **Bug Fixes**
  - Improved error handling and reporting for full-text search queries.

- **Refactor**
  - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
2025-04-01 06:36:42 -07:00
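A short usage sketch of the structured query types added in this change (constructor signatures per the diff; the `docs` table and `text` column are hypothetical):

```ts
import { connect, MatchQuery, PhraseQuery, BoostQuery } from "@lancedb/lancedb";

const db = await connect("./.lancedb");
const table = await db.openTable("docs");

// Fuzzy single-column match, tolerating one edit.
const hits = await table
  .query()
  .fullTextSearch(new MatchQuery("puppy", "text", 1.0, 1))
  .limit(10)
  .toArray();

// Boost documents matching one phrase while demoting another term.
const boosted = await table
  .query()
  .fullTextSearch(
    new BoostQuery(
      new PhraseQuery("golden retriever", "text"),
      new MatchQuery("cat", "text"),
      0.5, // negativeBoost
    ),
  )
  .limit(10)
  .toArray();
```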
Will Jones
e59f9382a0 ci: deprecate vectordb each release (#2292)
I realized that each time we published, the package was no longer deprecated.
This re-deprecates the package after each new publish.
2025-03-31 12:03:04 -07:00
Lance Release
fdee7ba477 Updating package-lock.json 2025-03-30 19:09:17 +00:00
Lance Release
c44fa3abc4 Updating package-lock.json 2025-03-30 18:05:07 +00:00
Lance Release
fc43aac0ed Updating package-lock.json 2025-03-30 18:04:51 +00:00
Lance Release
e67cd0baf9 Bump version: 0.18.3-beta.0 → 0.19.0-beta.0 2025-03-30 18:04:32 +00:00
Lance Release
26dab93f2a Bump version: 0.21.3-beta.0 → 0.22.0-beta.0 2025-03-30 18:04:14 +00:00
LuQQiu
b9bdb8d937 fix: fix remote restore api to always checkout latest version (#2291)
Fix restore to always check out the latest version, following the local restore
API implementation:

a1d1833a40/rust/lancedb/src/table.rs (L1910)

Otherwise:
table.create_table -> version 1
table.add -> version 2
table.checkout(1), table.restore() -> the version stays at 1 (restore should
call checkout_latest internally so the table moves to the latest version and
write operations are allowed)
table.checkout_latest() -> version is 3 and write operations succeed
2025-03-29 22:46:57 -07:00
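A hedged TypeScript sketch of the sequence above (the remote URI is hypothetical; version numbers assume a fresh table):

```ts
import * as lancedb from "@lancedb/lancedb";

const db = await lancedb.connect("db://my-project"); // hypothetical remote URI
const table = await db.createTable("t", [{ id: 1 }]); // version 1
await table.add([{ id: 2 }]);                         // version 2

await table.checkout(1); // pinned, read-only view of version 1
await table.restore();   // re-commits version 1's data as a new version;
                         // after this fix the remote client also checks out
                         // the latest version, so writes work again
await table.add([{ id: 3 }]);
```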
LuQQiu
a1d1833a40 feat: add analyze_plan api (#2280)
Add an analyze_plan API that executes queries and reports runtime metrics,
which helps identify query I/O overhead and diagnose query slowness.
2025-03-28 14:28:52 -07:00
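For example, in the TypeScript client (the table and query vector are hypothetical):

```ts
// Executes the query and returns the physical plan annotated with
// elapsed time, row counts, and I/O statistics.
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
console.log(plan);
```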
Will Jones
a547c523c2 feat!: change default read_consistency_interval=5s (#2281)
Previously, when we loaded the next version of the table, we would block
all reads with a write lock. Now, we only do that if
`read_consistency_interval=0`. Otherwise, we load the next version
asynchronously in the background. This should mean that
`read_consistency_interval > 0` won't have a meaningful impact on
latency.

Along with this change, I felt it was safe to change the default
consistency interval to 5 seconds. The current default is `None`, which
means we will **never** check for a new version by default. I think that
default is contrary to most users' expectations.
2025-03-28 11:04:31 -07:00
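A sketch of how the new default interacts with explicit settings in the TypeScript client (option name per the `ConnectionOptions` diff in this change):

```ts
import * as lancedb from "@lancedb/lancedb";

// New default: eventual consistency, checking for new versions about every 5 s.
const db = await lancedb.connect("./.lancedb");

// Strong consistency: check for a new table version on every read.
const strong = await lancedb.connect({
  uri: "./.lancedb",
  readConsistencyInterval: 0,
});

// Old default behavior: never check; refresh manually with checkoutLatest().
const manual = await lancedb.connect({
  uri: "./.lancedb",
  readConsistencyInterval: null,
});
```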
Lance Release
dc8b75feab Updating package-lock.json 2025-03-28 17:15:17 +00:00
Lance Release
c1600cdc06 Updating package-lock.json 2025-03-28 16:04:01 +00:00
Lance Release
f5dee46970 Updating package-lock.json 2025-03-28 16:03:46 +00:00
Lance Release
346cbf8bf7 Bump version: 0.18.2-beta.0 → 0.18.3-beta.0 2025-03-28 16:03:31 +00:00
Lance Release
3c7dfe9f28 Bump version: 0.21.2-beta.0 → 0.21.3-beta.0 2025-03-28 16:03:17 +00:00
Lei Xu
f52d05d3fa feat: add columns using pyarrow schema (#2284) 2025-03-28 08:51:50 -07:00
vinoyang
c321cccc12 chore(java): make rust release to be a switch option (#2277) 2025-03-28 11:26:24 +08:00
LuQQiu
cba14a5743 feat: add restore remote api (#2282) 2025-03-27 16:33:52 -07:00
vinoyang
72057b743d chore(java): introduce spotless plugin (#2278) 2025-03-27 10:38:39 +08:00
LuQQiu
698f329598 feat: add explain plan remote api (#2263)
2025-03-26 11:22:40 -07:00
BubbleCal
79fa745130 feat: upgrade lance to v0.25.1-beta.3 (#2276)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-26 23:14:27 +08:00
76 changed files with 2760 additions and 615 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.18.2-beta.0"
current_version = "0.19.0-beta.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -43,7 +43,7 @@ jobs:
- uses: Swatinem/rust-cache@v2
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: "1.79.0"
toolchain: "1.81.0"
cache-workspaces: "./java/core/lancedb-jni"
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
@@ -97,7 +97,7 @@ jobs:
- name: Dry run
if: github.event_name == 'pull_request'
run: |
mvn --batch-mode -DskipTests package
mvn --batch-mode -DskipTests -Drust.release.build=true package
- name: Set github
run: |
git config --global user.email "LanceDB Github Runner"
@@ -108,7 +108,7 @@ jobs:
echo "use-agent" >> ~/.gnupg/gpg.conf
echo "pinentry-mode loopback" >> ~/.gnupg/gpg.conf
export GPG_TTY=$(tty)
mvn --batch-mode -DskipTests -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
mvn --batch-mode -DskipTests -Drust.release.build=true -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
env:
SONATYPE_USER: ${{ secrets.SONATYPE_USER }}
SONATYPE_TOKEN: ${{ secrets.SONATYPE_TOKEN }}

View File

@@ -535,6 +535,10 @@ jobs:
for filename in *.tgz; do
npm publish $PUBLISH_ARGS $filename
done
- name: Deprecate
# We need to deprecate the old package to avoid confusion.
# Each time we publish a new version, it gets undeprecated.
run: npm deprecate vectordb "Use @lancedb/lancedb instead."
- name: Notify Slack Action
uses: ravsamhq/notify-slack-action@2.3.0
if: ${{ always() }}

Cargo.lock (generated, 634 lines changed)

File diff suppressed because it is too large

View File

@@ -21,16 +21,16 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.25.0", "features" = [
lance = { "version" = "=0.25.3", "features" = [
"dynamodb",
], tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.25.0", tag = "v0.25.0-beta.5", git = "https://github.com/lancedb/lance.git" }
], tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-io = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-index = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-linalg = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-table = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-testing = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-datafusion = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
lance-encoding = { version = "=0.25.3", tag = "v0.25.3-beta.1", git = "https://github.com/lancedb/lance" }
# Note that this one does not include pyarrow
arrow = { version = "54.1", optional = false }
arrow-array = "54.1"
@@ -41,12 +41,12 @@ arrow-schema = "54.1"
arrow-arith = "54.1"
arrow-cast = "54.1"
async-trait = "0"
datafusion = { version = "45.0", default-features = false }
datafusion-catalog = "45.0"
datafusion-common = { version = "45.0", default-features = false }
datafusion-execution = "45.0"
datafusion-expr = "45.0"
datafusion-physical-plan = "45.0"
datafusion = { version = "46.0", default-features = false }
datafusion-catalog = "46.0"
datafusion-common = { version = "46.0", default-features = false }
datafusion-execution = "46.0"
datafusion-expr = "46.0"
datafusion-physical-plan = "46.0"
env_logger = "0.11"
half = { "version" = "=2.4.1", default-features = false, features = [
"num-traits",

View File

@@ -1001,9 +1001,11 @@ In LanceDB OSS, users can set the `read_consistency_interval` parameter on conne
There are three possible settings for `read_consistency_interval`:
1. **Unset (default)**: The database does not check for updates to tables made by other processes. This provides the best query performance, but means that clients may not see the most up-to-date data. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
2. **Zero seconds (Strong consistency)**: The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS.
3. **Custom interval (Eventual consistency)**: The database checks for updates at a custom interval, such as every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.
1. **Unset**: The database does not check for updates to tables made by other processes. This setting is suitable for applications where the data does not change during the lifetime of the table reference.
2. **Zero seconds (Strong consistency)**: The database checks for updates on every read. This provides the strongest consistency guarantees, ensuring that all clients see the latest committed data. However, it has the most overhead. This setting is suitable when consistency matters more than having high QPS. For best performance, combine this setting with the storage option `new_table_enable_v2_manifest_paths` set to `true`.
3. **Custom interval (Eventual consistency, the default)**: The database checks for updates at a custom interval. By default, this is every 5 seconds. This provides eventual consistency, allowing for some lag between write and read operations. Performance-wise, this is a middle ground between strong consistency and no consistency check. This setting is suitable for applications where immediate consistency is not critical, but clients should see updated data eventually.
You can always force a synchronization by calling `checkout_latest()` / `checkoutLatest()` on a table.
!!! tip "Consistency in LanceDB Cloud"
@@ -1041,7 +1043,21 @@ There are three possible settings for `read_consistency_interval`:
--8<-- "python/python/tests/docs/test_guide_tables.py:table_async_eventual_consistency"
```
By default, a `Table` will never check for updates from other writers. To manually check for updates you can use `checkout_latest`:
For no consistency, use `None`:
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:table_no_consistency"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:table_async_no_consistency"
```
To manually check for updates you can use `checkout_latest`:
=== "Sync API"
@@ -1059,15 +1075,25 @@ There are three possible settings for `read_consistency_interval`:
To set strong consistency, use `0`:
```ts
const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 0 });
const tbl = await db.openTable("my_table");
--8<-- "nodejs/examples/basic.test.ts:table_strong_consistency"
```
For eventual consistency, specify the update interval as seconds:
```ts
const db = await lancedb.connect({ uri: "./.lancedb", readConsistencyInterval: 5 });
const tbl = await db.openTable("my_table");
--8<-- "nodejs/examples/basic.test.ts:table_eventual_consistency"
```
For no consistency, use `null`:
```ts
--8<-- "nodejs/examples/basic.test.ts:table_no_consistency"
```
To manually check for updates you can use `checkoutLatest`:
```ts
--8<-- "nodejs/examples/basic.test.ts:table_checkout_latest"
```
<!-- Node doesn't yet support the version time travel: https://github.com/lancedb/lancedb/issues/1007

View File

@@ -0,0 +1,75 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BoostQuery
# Class: BoostQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BoostQuery()
```ts
new BoostQuery(
positive,
negative,
negativeBoost): BoostQuery
```
Creates an instance of BoostQuery.
#### Parameters
* **positive**: [`FullTextQuery`](../interfaces/FullTextQuery.md)
The positive query that boosts the relevance score.
* **negative**: [`FullTextQuery`](../interfaces/FullTextQuery.md)
The negative query that reduces the relevance score.
* **negativeBoost**: `number`
The factor by which the negative query reduces the score.
#### Returns
[`BoostQuery`](BoostQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)
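Example usage (a hedged sketch; the `text` column is hypothetical):

```ts
// Rank documents matching "lance" higher while demoting ones matching "spear".
const query = new BoostQuery(
  new MatchQuery("lance", "text"),
  new MatchQuery("spear", "text"),
  0.5, // negativeBoost
);
```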

View File

@@ -0,0 +1,83 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MatchQuery
# Class: MatchQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new MatchQuery()
```ts
new MatchQuery(
query,
column,
boost,
fuzziness,
maxExpansions): MatchQuery
```
Creates an instance of MatchQuery.
#### Parameters
* **query**: `string`
The text query to search for.
* **column**: `string`
The name of the column to search within.
* **boost**: `number` = `1.0`
(Optional) The boost factor to influence the relevance score of this query. Default is `1.0`.
* **fuzziness**: `number` = `0`
(Optional) The allowed edit distance for fuzzy matching. Default is `0`.
* **maxExpansions**: `number` = `50`
(Optional) The maximum number of terms to consider for fuzzy matching. Default is `50`.
#### Returns
[`MatchQuery`](MatchQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)
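Example usage (a hedged sketch; the `text` column is hypothetical):

```ts
// Fuzzy match tolerating one edit, with up to 50 term expansions.
const query = new MatchQuery("databse", "text", 1.0, 1, 50);
```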

View File

@@ -0,0 +1,77 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / MultiMatchQuery
# Class: MultiMatchQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new MultiMatchQuery()
```ts
new MultiMatchQuery(
query,
columns,
boosts): MultiMatchQuery
```
Creates an instance of MultiMatchQuery.
#### Parameters
* **query**: `string`
The text query to search for across multiple columns.
* **columns**: `string`[]
An array of column names to search within.
* **boosts**: `number`[] = `...`
(Optional) An array of boost factors corresponding to each column. Default is an array of 1.0 for each column.
The `boosts` array should have the same length as `columns`. If not provided, all columns will have a default boost of 1.0.
If the length of `boosts` is less than `columns`, it will be padded with 1.0s.
#### Returns
[`MultiMatchQuery`](MultiMatchQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)
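Example usage (a hedged sketch; the column names are hypothetical):

```ts
// Search both columns, weighting "title" twice as heavily as "body".
const query = new MultiMatchQuery("vector search", ["title", "body"], [2.0, 1.0]);
```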

View File

@@ -0,0 +1,69 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / PhraseQuery
# Class: PhraseQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new PhraseQuery()
```ts
new PhraseQuery(query, column): PhraseQuery
```
Creates an instance of `PhraseQuery`.
#### Parameters
* **query**: `string`
The phrase to search for in the specified column.
* **column**: `string`
The name of the column to search within.
#### Returns
[`PhraseQuery`](PhraseQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict)
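Example usage (a hedged sketch; the `text` column is hypothetical):

```ts
// Match the exact phrase, with terms in order, in the "text" column.
const query = new PhraseQuery("approximate nearest neighbor", "text");
```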

View File

@@ -30,6 +30,53 @@ protected inner: Query | Promise<Query>;
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### execute()
```ts
@@ -159,7 +206,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;
@@ -262,7 +309,7 @@ nearestToText(query, columns?): Query
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **columns?**: `string`[]

View File

@@ -36,6 +36,49 @@ protected inner: NativeQueryType | Promise<NativeQueryType>;
## Methods
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
***
### execute()
```ts
@@ -149,7 +192,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;

View File

@@ -48,6 +48,53 @@ addQueryVector(vector): VectorQuery
***
### analyzePlan()
```ts
analyzePlan(): Promise<string>
```
Executes the query and returns the physical query plan annotated with runtime metrics.
This is useful for debugging and performance analysis, as it shows how the query was executed
and includes metrics such as elapsed time, rows processed, and I/O statistics.
#### Returns
`Promise`&lt;`string`&gt;
A query execution plan with runtime metrics for each step.
#### Example
```ts
import * as lancedb from "@lancedb/lancedb"
const db = await lancedb.connect("./.lancedb");
const table = await db.createTable("my_table", [
{ vector: [1.1, 0.9], id: "1" },
]);
const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
Example output (with runtime metrics inlined):
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
```
#### Inherited from
[`QueryBase`](QueryBase.md).[`analyzePlan`](QueryBase.md#analyzeplan)
***
### bypassVectorIndex()
```ts
@@ -300,7 +347,7 @@ fullTextSearch(query, options?): this
#### Parameters
* **query**: `string`
* **query**: `string` \| [`FullTextQuery`](../interfaces/FullTextQuery.md)
* **options?**: `Partial`&lt;[`FullTextSearchOptions`](../interfaces/FullTextSearchOptions.md)&gt;

View File

@@ -0,0 +1,46 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FullTextQueryType
# Enumeration: FullTextQueryType
Enum representing the types of full-text queries supported.
- `Match`: Performs a full-text search for terms in the query string.
- `MatchPhrase`: Searches for an exact phrase match in the text.
- `Boost`: Boosts the relevance score of specific terms in the query.
- `MultiMatch`: Searches across multiple fields for the query terms.
## Enumeration Members
### Boost
```ts
Boost: "boost";
```
***
### Match
```ts
Match: "match";
```
***
### MatchPhrase
```ts
MatchPhrase: "match_phrase";
```
***
### MultiMatch
```ts
MultiMatch: "multi_match";
```

View File

@@ -9,12 +9,20 @@
- [embedding](namespaces/embedding/README.md)
- [rerankers](namespaces/rerankers/README.md)
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
## Classes
- [BoostQuery](classes/BoostQuery.md)
- [Connection](classes/Connection.md)
- [Index](classes/Index.md)
- [MakeArrowTableOptions](classes/MakeArrowTableOptions.md)
- [MatchQuery](classes/MatchQuery.md)
- [MergeInsertBuilder](classes/MergeInsertBuilder.md)
- [MultiMatchQuery](classes/MultiMatchQuery.md)
- [PhraseQuery](classes/PhraseQuery.md)
- [Query](classes/Query.md)
- [QueryBase](classes/QueryBase.md)
- [RecordBatchIterator](classes/RecordBatchIterator.md)
@@ -33,6 +41,7 @@
- [CreateTableOptions](interfaces/CreateTableOptions.md)
- [ExecutableQuery](interfaces/ExecutableQuery.md)
- [FtsOptions](interfaces/FtsOptions.md)
- [FullTextQuery](interfaces/FullTextQuery.md)
- [FullTextSearchOptions](interfaces/FullTextSearchOptions.md)
- [HnswPqOptions](interfaces/HnswPqOptions.md)
- [HnswSqOptions](interfaces/HnswSqOptions.md)

View File

@@ -44,7 +44,7 @@ for testing purposes.
### readConsistencyInterval?
```ts
optional readConsistencyInterval: number;
optional readConsistencyInterval: null | number;
```
(For LanceDB OSS only): The interval, in seconds, at which to check for

View File

@@ -0,0 +1,35 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / FullTextQuery
# Interface: FullTextQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
***
### toDict()
```ts
toDict(): Record<string, unknown>
```
#### Returns
`Record`&lt;`string`, `unknown`&gt;
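Any of the concrete query classes (`MatchQuery`, `PhraseQuery`, `BoostQuery`, `MultiMatchQuery`) can be passed wherever a `FullTextQuery` is accepted, such as `fullTextSearch`. A hedged sketch:

```ts
// Works with any FullTextQuery implementation.
function describe(q: FullTextQuery): string {
  return `${q.queryType()}: ${JSON.stringify(q.toDict())}`;
}

describe(new PhraseQuery("hello world", "text"));
```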

View File

@@ -11,6 +11,7 @@ likely that someone who knows the answer will see your question.
## Common issues
* Multiprocessing with `fork` is not supported. You should use `spawn` instead.
* Data returned by queries may not reflect the most recent writes, depending on configuration. LanceDB uses eventual consistency by default. See [consistency](/docs/src/guides/tables.md#consistency) for more information.
## Enabling logging
@@ -35,3 +36,9 @@ print the resolved query plan. You can use the `explain_plan` method to do this:
* Python Sync: [LanceQueryBuilder.explain_plan][lancedb.query.LanceQueryBuilder.explain_plan]
* Python Async: [AsyncQueryBase.explain_plan][lancedb.query.AsyncQueryBase.explain_plan]
* Node @lancedb/lancedb: [LanceQueryBuilder.explainPlan](/lancedb/js/classes/QueryBase/#explainplan)
To understand how a query was actually executed—including metrics like execution time, number of rows processed, I/O stats, and more—use the analyze_plan method. This executes the query and returns a physical execution plan annotated with runtime metrics, making it especially helpful for performance tuning and debugging.
* Python Sync: [LanceQueryBuilder.analyze_plan][lancedb.query.LanceQueryBuilder.analyze_plan]
* Python Async: [AsyncQueryBase.analyze_plan][lancedb.query.AsyncQueryBase.analyze_plan]
* Node @lancedb/lancedb: [LanceQueryBuilder.analyzePlan](/lancedb/js/classes/QueryBase/#analyzeplan)

View File

@@ -8,13 +8,16 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.18.2-beta.0</version>
<version>0.19.0-beta.1</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-core</artifactId>
<name>LanceDB Core</name>
<packaging>jar</packaging>
<properties>
<rust.release.build>false</rust.release.build>
</properties>
<dependencies>
<dependency>
@@ -68,7 +71,7 @@
</goals>
<configuration>
<path>lancedb-jni</path>
<release>true</release>
<release>${rust.release.build}</release>
<!-- Copy native libraries to target/classes for runtime access -->
<copyTo>${project.build.directory}/classes/nativelib</copyTo>
<copyWithPlatformDir>true</copyWithPlatformDir>

View File

@@ -1,16 +1,25 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import io.questdb.jar.jni.JarJniLoader;
import java.io.Closeable;
import java.util.List;
import java.util.Optional;
/**
* Represents LanceDB database.
*/
/** Represents LanceDB database. */
public class Connection implements Closeable {
static {
JarJniLoader.loadLib(Connection.class, "/nativelib", "lancedb_jni");
@@ -18,14 +27,11 @@ public class Connection implements Closeable {
private long nativeConnectionHandle;
/**
* Connect to a LanceDB instance.
*/
/** Connect to a LanceDB instance. */
public static native Connection connect(String uri);
/**
* Get the names of all tables in the database. The names are sorted in
* ascending order.
* Get the names of all tables in the database. The names are sorted in ascending order.
*
* @return the table names
*/
@@ -34,8 +40,7 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param limit The number of results to return.
* @return the table names
@@ -45,12 +50,11 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @return the table names
*/
public List<String> tableNames(String startAfter) {
@@ -58,12 +62,11 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
@@ -72,22 +75,19 @@ public class Connection implements Closeable {
}
/**
* Get the names of filtered tables in the database. The names are sorted in
* ascending order.
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination
* by setting this to the last table name from the previous page.
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
public native List<String> tableNames(
Optional<String> startAfter, Optional<Integer> limit);
public native List<String> tableNames(Optional<String> startAfter, Optional<Integer> limit);
/**
* Closes this connection and releases any system resources associated with it. If
* the connection is
* already closed, then invoking this method has no effect.
* Closes this connection and releases any system resources associated with it. If the connection
* is already closed, then invoking this method has no effect.
*/
@Override
public void close() {
@@ -98,8 +98,7 @@ public class Connection implements Closeable {
}
/**
* Native method to release the Lance connection resources associated with the
* given handle.
* Native method to release the Lance connection resources associated with the given handle.
*
* @param handle The native handle to the connection resource.
*/

View File

@@ -1,27 +1,35 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.nio.file.Path;
import java.util.List;
import java.net.URL;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import java.net.URL;
import java.nio.file.Path;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
public class ConnectionTest {
private static final String[] TABLE_NAMES = {
"dataset_version",
"new_empty_dataset",
"test",
"write_stream"
"dataset_version", "new_empty_dataset", "test", "write_stream"
};
@TempDir
static Path tempDir; // Temporary directory for the tests
@TempDir static Path tempDir; // Temporary directory for the tests
private static URL lanceDbURL;
@BeforeAll
@@ -53,18 +61,21 @@ public class ConnectionTest {
@Test
void tableNamesStartAfter() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
assertTableNamesStartAfter(conn, TABLE_NAMES[0], 3, TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(
conn, TABLE_NAMES[0], 3, TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[1], 2, TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[2], 1, TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[3], 0);
assertTableNamesStartAfter(conn, "a_dataset", 4, TABLE_NAMES[0], TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(
conn, "a_dataset", 4, TABLE_NAMES[0], TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "o_dataset", 2, TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "v_dataset", 1, TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "z_dataset", 0);
}
}
private void assertTableNamesStartAfter(Connection conn, String startAfter, int expectedSize, String... expectedNames) {
private void assertTableNamesStartAfter(
Connection conn, String startAfter, int expectedSize, String... expectedNames) {
List<String> tableNames = conn.tableNames(startAfter);
assertEquals(expectedSize, tableNames.size());
for (int i = 0; i < expectedNames.length; i++) {
@@ -74,7 +85,7 @@ public class ConnectionTest {
@Test
void tableNamesLimit() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
for (int i = 0; i <= TABLE_NAMES.length; i++) {
List<String> tableNames = conn.tableNames(i);
assertEquals(i, tableNames.size());

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.18.2-beta.0</version>
<version>0.19.0-beta.1</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>
@@ -29,6 +29,25 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
<spotless.delimiter>package</spotless.delimiter>
<spotless.license.header>
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
</spotless.license.header>
</properties>
<modules>
@@ -127,7 +146,8 @@
<configuration>
<configLocation>google_checks.xml</configLocation>
<consoleOutput>true</consoleOutput>
<failsOnError>true</failsOnError>
<failsOnError>false</failsOnError>
<failOnViolation>false</failOnViolation>
<violationSeverity>warning</violationSeverity>
<linkXRef>false</linkXRef>
</configuration>
@@ -141,6 +161,10 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
</plugin>
</plugins>
<pluginManagement>
<plugins>
@@ -179,6 +203,54 @@
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
<version>${spotless.version}</version>
<configuration>
<skip>${spotless.skip}</skip>
<upToDateChecking>
<enabled>true</enabled>
</upToDateChecking>
<java>
<includes>
<include>src/main/java/**/*.java</include>
<include>src/test/java/**/*.java</include>
</includes>
<googleJavaFormat>
<version>${spotless.java.googlejavaformat.version}</version>
<style>GOOGLE</style>
</googleJavaFormat>
<importOrder>
<order>com.lancedb.lance,,javax,java,\#</order>
</importOrder>
<removeUnusedImports />
</java>
<scala>
<includes>
<include>src/main/scala/**/*.scala</include>
<include>src/main/scala-*/**/*.scala</include>
<include>src/test/scala/**/*.scala</include>
<include>src/test/scala-*/**/*.scala</include>
</includes>
</scala>
<licenseHeader>
<content>${spotless.license.header}</content>
<delimiter>${spotless.delimiter}</delimiter>
</licenseHeader>
</configuration>
<executions>
<execution>
<id>spotless-check</id>
<phase>validate</phase>
<goals>
<goal>apply</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</pluginManagement>
</build>

node/package-lock.json (generated, 79 lines changed)
View File

@@ -1,12 +1,12 @@
{
"name": "vectordb",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"cpu": [
"x64",
"arm64"
@@ -52,11 +52,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.18.2-beta.0",
"@lancedb/vectordb-darwin-x64": "0.18.2-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.18.2-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.18.2-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.18.2-beta.0"
"@lancedb/vectordb-darwin-arm64": "0.19.0-beta.1",
"@lancedb/vectordb-darwin-x64": "0.19.0-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.0-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.19.0-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.19.0-beta.1"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",
@@ -326,71 +326,6 @@
"@jridgewell/sourcemap-codec": "^1.4.10"
}
},
"node_modules/@lancedb/vectordb-darwin-arm64": {
"version": "0.18.2-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.18.2-beta.0.tgz",
"integrity": "sha512-FzIcElkS6R5I5kU1S5m7yLVTB1Duv1XcmZQtVmYl/JjNlfxS1WTtMzdzMqSBFohDcgU2Tkc5+1FpK1B94dUUbg==",
"cpu": [
"arm64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"darwin"
]
},
"node_modules/@lancedb/vectordb-darwin-x64": {
"version": "0.18.2-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.18.2-beta.0.tgz",
"integrity": "sha512-jv+XludfLNBDm1DjdqyghwDMtd4E+ygwycQpkpK72wyZSh6Qytrgq+4dNi/zCZ3UChFLbKbIxrVxv9yENQn2Pg==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"darwin"
]
},
"node_modules/@lancedb/vectordb-linux-arm64-gnu": {
"version": "0.18.2-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.18.2-beta.0.tgz",
"integrity": "sha512-8/fBpbNYhhpetf/pZv0DyPnQkeAbsiICMyCoRiNu5auvQK4AsGF1XvLWrDi68u9F0GysBKvuatYuGqa/yh+Anw==",
"cpu": [
"arm64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@lancedb/vectordb-linux-x64-gnu": {
"version": "0.18.2-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.18.2-beta.0.tgz",
"integrity": "sha512-7a1Kc/2V2ff4HlLzXyXVdK0Z0VIFUt50v2SBRdlcycJ0NLW9ZqV+9UjB/NAOwMXVgYd7d3rKjACGkQzkpvcyeg==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"linux"
]
},
"node_modules/@lancedb/vectordb-win32-x64-msvc": {
"version": "0.18.2-beta.0",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.18.2-beta.0.tgz",
"integrity": "sha512-EeCiSf2RtJMESnkIca28GI6rAStYj2q9sVIyNCXpmIZSkJVpfQ3iswHGAbHrEfaPl0J1Re9cnRHLLuqkumwiIQ==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"win32"
]
},
"node_modules/@neon-rs/cli": {
"version": "0.0.160",
"resolved": "https://registry.npmjs.org/@neon-rs/cli/-/cli-0.0.160.tgz",

View File

@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -89,10 +89,10 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.18.2-beta.0",
"@lancedb/vectordb-darwin-arm64": "0.18.2-beta.0",
"@lancedb/vectordb-linux-x64-gnu": "0.18.2-beta.0",
"@lancedb/vectordb-linux-arm64-gnu": "0.18.2-beta.0",
"@lancedb/vectordb-win32-x64-msvc": "0.18.2-beta.0"
"@lancedb/vectordb-darwin-x64": "0.19.0-beta.1",
"@lancedb/vectordb-darwin-arm64": "0.19.0-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.19.0-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.0-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.19.0-beta.1"
}
}

View File

@@ -110,7 +110,7 @@ describe('LanceDB Mirrored Store Integration test', function () {
fs.readdir(path.join(mirroredPath, 'data'), { withFileTypes: true }, (err, files) => {
if (err != null) throw err
assert.equal(files.length, 1)
assert.equal(files.length, 1, `Found files: ${files.map(f => f.name)}`)
assert.isTrue(files[0].name.endsWith('.lance'))
})

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.18.2-beta.0"
version = "0.19.0-beta.1"
license.workspace = true
description.workspace = true
repository.workspace = true

View File

@@ -17,7 +17,7 @@ describe("when connecting", () => {
it("should connect", async () => {
const db = await connect(tmpDir.name);
expect(db.display()).toBe(
`ListingDatabase(uri=${tmpDir.name}, read_consistency_interval=None)`,
`ListingDatabase(uri=${tmpDir.name}, read_consistency_interval=5s)`,
);
});

View File

@@ -58,7 +58,7 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
it("be displayable", async () => {
expect(table.display()).toMatch(
/NativeTable\(some_table, uri=.*, read_consistency_interval=None\)/,
/NativeTable\(some_table, uri=.*, read_consistency_interval=5s\)/,
);
table.close();
expect(table.display()).toBe("ClosedTable(some_table)");
@@ -633,6 +633,23 @@ describe("When creating an index", () => {
expect(plan2).not.toMatch("LanceScan");
});
it("should be able to run analyze plan", async () => {
await tbl.createIndex("vec");
await tbl.add([
{
id: 300,
vec: Array(32)
.fill(1)
.map(() => Math.random()),
tags: [],
},
]);
const plan = await tbl.query().nearestTo(queryVec).analyzePlan();
expect(plan).toMatch("AnalyzeExec");
expect(plan).toMatch("metrics=");
});
it("should be able to query with row id", async () => {
const results = await tbl
.query()
@@ -1346,6 +1363,30 @@ describe("when calling explainPlan", () => {
});
});
describe("when calling analyzePlan", () => {
let tmpDir: tmp.DirResult;
let table: Table;
let queryVec: number[];
beforeEach(async () => {
tmpDir = tmp.dirSync({ unsafeCleanup: true });
const con = await connect(tmpDir.name);
table = await con.createTable("vectors", [{ id: 1, vector: [1.1, 0.9] }]);
});
afterEach(() => {
tmpDir.removeCallback();
});
it("retrieves runtime metrics", async () => {
queryVec = Array(2)
.fill(1)
.map(() => Math.random());
const plan = await table.query().nearestTo(queryVec).analyzePlan();
console.log("Query Plan:\n", plan); // <--- Print the plan
expect(plan).toMatch("AnalyzeExec");
});
});
describe("column name options", () => {
let tmpDir: tmp.DirResult;
let table: Table;

View File

@@ -202,5 +202,35 @@ test("basic table examples", async () => {
// --8<-- [end:create_f16_table]
await db.dropTable("f16_tbl");
}
const uri = databaseDir;
await db.createTable("my_table", [{ id: 1 }, { id: 2 }]);
{
// --8<-- [start:table_strong_consistency]
const db = await lancedb.connect({ uri, readConsistencyInterval: 0 });
const tbl = await db.openTable("my_table");
// --8<-- [end:table_strong_consistency]
}
{
// --8<-- [start:table_eventual_consistency]
const db = await lancedb.connect({ uri, readConsistencyInterval: 5 });
const tbl = await db.openTable("my_table");
// --8<-- [end:table_eventual_consistency]
}
{
// --8<-- [start:table_no_consistency]
const db = await lancedb.connect({ uri, readConsistencyInterval: null });
const tbl = await db.openTable("my_table");
// --8<-- [end:table_no_consistency]
}
{
// --8<-- [start:table_checkout_latest]
const tbl = await db.openTable("my_table");
// (Other writes happen to test_table_async from another process)
// Check for updates
tbl.checkoutLatest();
// --8<-- [end:table_checkout_latest]
}
});
});

View File

@@ -47,6 +47,12 @@ export {
QueryExecutionOptions,
FullTextSearchOptions,
RecordBatchIterator,
FullTextQuery,
MatchQuery,
PhraseQuery,
BoostQuery,
MultiMatchQuery,
FullTextQueryType,
} from "./query";
export {

View File

@@ -17,6 +17,7 @@ import {
VectorQuery as NativeVectorQuery,
} from "./native";
import { Reranker } from "./rerankers";
export class RecordBatchIterator implements AsyncIterator<RecordBatch> {
private promisedInner?: Promise<NativeBatchIterator>;
private inner?: NativeBatchIterator;
@@ -152,7 +153,7 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
}
fullTextSearch(
query: string,
query: string | FullTextQuery,
options?: Partial<FullTextSearchOptions>,
): this {
let columns: string[] | null = null;
@@ -164,9 +165,18 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
}
}
this.doCall((inner: NativeQueryType) =>
inner.fullTextSearch(query, columns),
);
this.doCall((inner: NativeQueryType) => {
if (typeof query === "string") {
inner.fullTextSearch({
query: query,
columns: columns,
});
} else {
// If query is a FullTextQuery object, convert it to a dict
const queryObj = query.toDict();
inner.fullTextSearch(queryObj);
}
});
return this;
}
@@ -348,6 +358,43 @@ export class QueryBase<NativeQueryType extends NativeQuery | NativeVectorQuery>
return this.inner.explainPlan(verbose);
}
}
/**
* Executes the query and returns the physical query plan annotated with runtime metrics.
*
* This is useful for debugging and performance analysis, as it shows how the query was executed
* and includes metrics such as elapsed time, rows processed, and I/O statistics.
*
* @example
* import * as lancedb from "@lancedb/lancedb"
*
* const db = await lancedb.connect("./.lancedb");
* const table = await db.createTable("my_table", [
* { vector: [1.1, 0.9], id: "1" },
* ]);
*
* const plan = await table.query().nearestTo([0.5, 0.2]).analyzePlan();
*
* Example output (with runtime metrics inlined):
* AnalyzeExec verbose=true, metrics=[]
* ProjectionExec: expr=[id@3 as id, vector@0 as vector, _distance@2 as _distance], metrics=[output_rows=1, elapsed_compute=3.292µs]
* Take: columns="vector, _rowid, _distance, (id)", metrics=[output_rows=1, elapsed_compute=66.001µs, batches_processed=1, bytes_read=8, iops=1, requests=1]
* CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=1, elapsed_compute=3.333µs]
* GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=1, elapsed_compute=167ns]
* FilterExec: _distance@2 IS NOT NULL, metrics=[output_rows=1, elapsed_compute=8.542µs]
* SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST], metrics=[output_rows=1, elapsed_compute=63.25µs, row_replacements=1]
* KNNVectorDistance: metric=l2, metrics=[output_rows=1, elapsed_compute=114.333µs, output_batches=1]
* LanceScan: uri=/path/to/data, projection=[vector], row_id=true, row_addr=false, ordered=false, metrics=[output_rows=1, elapsed_compute=103.626µs, bytes_read=549, iops=2, requests=2]
*
* @returns A query execution plan with runtime metrics for each step.
*/
async analyzePlan(): Promise<string> {
if (this.inner instanceof Promise) {
return this.inner.then((inner) => inner.analyzePlan());
} else {
return this.inner.analyzePlan();
}
}
}
/**
@@ -681,8 +728,167 @@ export class Query extends QueryBase<NativeQuery> {
}
}
nearestToText(query: string, columns?: string[]): Query {
this.doCall((inner) => inner.fullTextSearch(query, columns));
nearestToText(query: string | FullTextQuery, columns?: string[]): Query {
this.doCall((inner) => {
if (typeof query === "string") {
inner.fullTextSearch({
query: query,
columns: columns,
});
} else {
const queryObj = query.toDict();
inner.fullTextSearch(queryObj);
}
});
return this;
}
}
/**
* Enum representing the types of full-text queries supported.
*
* - `Match`: Performs a full-text search for terms in the query string.
* - `MatchPhrase`: Searches for an exact phrase match in the text.
* - `Boost`: Boosts the relevance score of specific terms in the query.
* - `MultiMatch`: Searches across multiple fields for the query terms.
*/
export enum FullTextQueryType {
Match = "match",
MatchPhrase = "match_phrase",
Boost = "boost",
MultiMatch = "multi_match",
}
/**
* Represents a full-text query interface.
* This interface defines the structure and behavior for full-text queries,
* including methods to retrieve the query type and convert the query to a dictionary format.
*/
export interface FullTextQuery {
queryType(): FullTextQueryType;
toDict(): Record<string, unknown>;
}
export class MatchQuery implements FullTextQuery {
/**
* Creates an instance of MatchQuery.
*
* @param query - The text query to search for.
* @param column - The name of the column to search within.
* @param boost - (Optional) The boost factor to influence the relevance score of this query. Default is `1.0`.
* @param fuzziness - (Optional) The allowed edit distance for fuzzy matching. Default is `0`.
* @param maxExpansions - (Optional) The maximum number of terms to consider for fuzzy matching. Default is `50`.
*/
constructor(
private query: string,
private column: string,
private boost: number = 1.0,
private fuzziness: number = 0,
private maxExpansions: number = 50,
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.Match;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
[this.column]: {
query: this.query,
boost: this.boost,
fuzziness: this.fuzziness,
// biome-ignore lint/style/useNamingConvention: use underscore for consistency with the other APIs
max_expansions: this.maxExpansions,
},
},
};
}
}
export class PhraseQuery implements FullTextQuery {
/**
* Creates an instance of `PhraseQuery`.
*
* @param query - The phrase to search for in the specified column.
* @param column - The name of the column to search within.
*/
constructor(
private query: string,
private column: string,
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.MatchPhrase;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
[this.column]: this.query,
},
};
}
}
export class BoostQuery implements FullTextQuery {
/**
* Creates an instance of BoostQuery.
*
* @param positive - The positive query that boosts the relevance score.
* @param negative - The negative query that reduces the relevance score.
* @param negativeBoost - The factor by which the negative query reduces the score.
*/
constructor(
private positive: FullTextQuery,
private negative: FullTextQuery,
private negativeBoost: number,
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.Boost;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
positive: this.positive.toDict(),
negative: this.negative.toDict(),
// biome-ignore lint/style/useNamingConvention: use underscore for consistency with the other APIs
negative_boost: this.negativeBoost,
},
};
}
}
export class MultiMatchQuery implements FullTextQuery {
/**
* Creates an instance of MultiMatchQuery.
*
* @param query - The text query to search for across multiple columns.
* @param columns - An array of column names to search within.
* @param boosts - (Optional) An array of boost factors corresponding to each column. Default is an array of 1.0 for each column.
*
 * The `boosts` array must have the same length as `columns`; a mismatched length is rejected when the query is parsed.
 * If not provided, all columns default to a boost of 1.0.
*/
constructor(
private query: string,
private columns: string[],
private boosts: number[] = columns.map(() => 1.0),
) {}
queryType(): FullTextQueryType {
return FullTextQueryType.MultiMatch;
}
toDict(): Record<string, unknown> {
return {
[this.queryType()]: {
query: this.query,
columns: this.columns,
boost: this.boosts,
},
};
}
}
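Taken together, these classes serialize to a small dictionary wire format. A sketch of the four shapes, written as Python literals for brevity (the column names `text` and `title` are illustrative, not part of the diff):

```python
# Shapes produced by toDict() / to_dict(); field names match the code above.
match_q = {
    "match": {
        "text": {"query": "puppy", "boost": 1.0, "fuzziness": 0, "max_expansions": 50}
    }
}
phrase_q = {"match_phrase": {"text": "happy puppy"}}
boost_q = {
    "boost": {"positive": match_q, "negative": phrase_q, "negative_boost": 0.5}
}
multi_match_q = {
    "multi_match": {"query": "puppy", "columns": ["text", "title"], "boost": [1.0, 2.0]}
}
```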

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.18.2-beta.0",
"version": "0.19.0-beta.1",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -48,8 +48,16 @@ impl Connection {
pub async fn new(uri: String, options: ConnectionOptions) -> napi::Result<Self> {
let mut builder = ConnectBuilder::new(&uri);
if let Some(interval) = options.read_consistency_interval {
builder =
builder.read_consistency_interval(std::time::Duration::from_secs_f64(interval));
match interval {
Either::A(seconds) => {
builder = builder.read_consistency_interval(Some(
std::time::Duration::from_secs_f64(seconds),
));
}
Either::B(_) => {
builder = builder.read_consistency_interval(None);
}
}
}
if let Some(storage_options) = options.storage_options {
for (key, value) in storage_options {

View File

@@ -4,6 +4,7 @@
use std::collections::HashMap;
use env_logger::Env;
use napi::{bindgen_prelude::Null, Either};
use napi_derive::*;
mod connection;
@@ -18,7 +19,6 @@ mod table;
mod util;
#[napi(object)]
#[derive(Debug)]
pub struct ConnectionOptions {
/// (For LanceDB OSS only): The interval, in seconds, at which to check for
/// updates to the table from other processes. If None, then consistency is not
@@ -29,7 +29,7 @@ pub struct ConnectionOptions {
/// has passed since the last check, then the table will be checked for updates.
/// Note: this consistency only applies to read operations. Write operations are
/// always consistent.
pub read_consistency_interval: Option<f64>,
pub read_consistency_interval: Option<Either<f64, Null>>,
/// (For LanceDB OSS only): configuration for object storage.
///
/// The available options are described at https://lancedb.github.io/lancedb/guides/storage/

View File

@@ -3,7 +3,7 @@
use std::sync::Arc;
use lancedb::index::scalar::FullTextSearchQuery;
use lancedb::index::scalar::{FtsQuery, FullTextSearchQuery, MatchQuery, PhraseQuery};
use lancedb::query::ExecutableQuery;
use lancedb::query::Query as LanceDbQuery;
use lancedb::query::QueryBase;
@@ -18,7 +18,7 @@ use crate::error::NapiErrorExt;
use crate::iterator::RecordBatchIterator;
use crate::rerankers::Reranker;
use crate::rerankers::RerankerCallbacks;
use crate::util::parse_distance_type;
use crate::util::{parse_distance_type, parse_fts_query};
#[napi]
pub struct Query {
@@ -38,9 +38,53 @@ impl Query {
}
#[napi]
pub fn full_text_search(&mut self, query: String, columns: Option<Vec<String>>) {
let query = FullTextSearchQuery::new(query).columns(columns);
pub fn full_text_search(&mut self, query: napi::JsUnknown) -> napi::Result<()> {
let query = unsafe { query.cast::<napi::JsObject>() };
let query = if let Some(query_text) = query.get::<_, String>("query").transpose() {
let mut query_text = query_text?;
let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten();
let is_phrase =
query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false);
if is_phrase {
// Remove the surrounding quotes for phrase queries
query_text = query_text[1..query_text.len() - 1].to_string();
}
let query: FtsQuery = match (is_phrase, is_multi_match) {
(false, _) => MatchQuery::new(query_text).into(),
(true, false) => PhraseQuery::new(query_text).into(),
(true, true) => {
return Err(napi::Error::from_reason(
"Phrase queries cannot be used with multiple columns.",
));
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
}
query
} else if let Some(query) = query.get::<_, napi::JsObject>("query")? {
let query = parse_fts_query(&query)?;
FullTextSearchQuery::new_query(query)
} else {
return Err(napi::Error::from_reason(
"Invalid full text search query object".to_string(),
));
};
self.inner = self.inner.clone().full_text_search(query);
Ok(())
}
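The string path above applies two heuristics: a query wrapped in double quotes is parsed as a phrase query (with the quotes stripped), and a quoted phrase is rejected when more than one column is supplied. A sketch of the resulting caller-side behavior, using the async Python API that appears later in this diff (`table` is a hypothetical `AsyncTable` with an FTS index):

```python
q1 = table.query().nearest_to_text("happy puppy")    # parsed as a MatchQuery
q2 = table.query().nearest_to_text('"happy puppy"')  # parsed as a PhraseQuery
# Expected to error: phrase queries cannot be used with multiple columns.
q3 = table.query().nearest_to_text('"happy puppy"', columns=["a", "b"])
```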
#[napi]
@@ -114,6 +158,16 @@ impl Query {
))
})
}
#[napi(catch_unwind)]
pub async fn analyze_plan(&self) -> napi::Result<String> {
self.inner.analyze_plan().await.map_err(|e| {
napi::Error::from_reason(format!(
"Failed to execute analyze plan: {}",
convert_error(&e)
))
})
}
}
#[napi]
@@ -185,9 +239,53 @@ impl VectorQuery {
}
#[napi]
pub fn full_text_search(&mut self, query: String, columns: Option<Vec<String>>) {
let query = FullTextSearchQuery::new(query).columns(columns);
pub fn full_text_search(&mut self, query: napi::JsUnknown) -> napi::Result<()> {
let query = unsafe { query.cast::<napi::JsObject>() };
let query = if let Some(query_text) = query.get::<_, String>("query").transpose() {
let mut query_text = query_text?;
let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten();
let is_phrase =
query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false);
if is_phrase {
// Remove the surrounding quotes for phrase queries
query_text = query_text[1..query_text.len() - 1].to_string();
}
let query: FtsQuery = match (is_phrase, is_multi_match) {
(false, _) => MatchQuery::new(query_text).into(),
(true, false) => PhraseQuery::new(query_text).into(),
(true, true) => {
return Err(napi::Error::from_reason(
"Phrase queries cannot be used with multiple columns.",
));
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
}
query
} else if let Some(query) = query.get::<_, napi::JsObject>("query")? {
let query = parse_fts_query(&query)?;
FullTextSearchQuery::new_query(query)
} else {
return Err(napi::Error::from_reason(
"Invalid full text search query object".to_string(),
));
};
self.inner = self.inner.clone().full_text_search(query);
Ok(())
}
#[napi]
@@ -259,4 +357,14 @@ impl VectorQuery {
))
})
}
#[napi(catch_unwind)]
pub async fn analyze_plan(&self) -> napi::Result<String> {
self.inner.analyze_plan().await.map_err(|e| {
napi::Error::from_reason(format!(
"Failed to execute analyze plan: {}",
convert_error(&e)
))
})
}
}

View File

@@ -1,6 +1,7 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use lancedb::index::scalar::{BoostQuery, FtsQuery, MatchQuery, MultiMatchQuery, PhraseQuery};
use lancedb::DistanceType;
pub fn parse_distance_type(distance_type: impl AsRef<str>) -> napi::Result<DistanceType> {
@@ -15,3 +16,144 @@ pub fn parse_distance_type(distance_type: impl AsRef<str>) -> napi::Result<Dista
))),
}
}
pub fn parse_fts_query(query: &napi::JsObject) -> napi::Result<FtsQuery> {
let query_type = query
.get_property_names()?
.get_element::<napi::JsString>(0)?;
let query_type = query_type.into_utf8()?.into_owned()?;
let query_value =
query
.get::<_, napi::JsObject>(&query_type)?
.ok_or(napi::Error::from_reason(format!(
"query value {} not found",
query_type
)))?;
match query_type.as_str() {
"match" => {
let column = query_value
.get_property_names()?
.get_element::<napi::JsString>(0)?
.into_utf8()?
.into_owned()?;
let params =
query_value
.get::<_, napi::JsObject>(&column)?
.ok_or(napi::Error::from_reason(format!(
"column {} not found",
column
)))?;
let query = params
.get::<_, napi::JsString>("query")?
.ok_or(napi::Error::from_reason("query not found"))?
.into_utf8()?
.into_owned()?;
let boost = params
.get::<_, napi::JsNumber>("boost")?
.ok_or(napi::Error::from_reason("boost not found"))?
.get_double()? as f32;
let fuzziness = params
.get::<_, napi::JsNumber>("fuzziness")?
.map(|f| f.get_uint32())
.transpose()?;
let max_expansions = params
.get::<_, napi::JsNumber>("max_expansions")?
.ok_or(napi::Error::from_reason("max_expansions not found"))?
.get_uint32()? as usize;
let query = MatchQuery::new(query)
.with_column(Some(column))
.with_boost(boost)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions);
Ok(query.into())
}
"match_phrase" => {
let column = query_value
.get_property_names()?
.get_element::<napi::JsString>(0)?
.into_utf8()?
.into_owned()?;
let query = query_value
.get::<_, napi::JsString>(&column)?
.ok_or(napi::Error::from_reason(format!(
"column {} not found",
column
)))?
.into_utf8()?
.into_owned()?;
let query = PhraseQuery::new(query).with_column(Some(column));
Ok(query.into())
}
"boost" => {
let positive = query_value
.get::<_, napi::JsObject>("positive")?
.ok_or(napi::Error::from_reason("positive not found"))?;
let negative = query_value
.get::<_, napi::JsObject>("negative")?
.ok_or(napi::Error::from_reason("negative not found"))?;
let negative_boost = query_value
.get::<_, napi::JsNumber>("negative_boost")?
.ok_or(napi::Error::from_reason("negative_boost not found"))?
.get_double()? as f32;
let positive = parse_fts_query(&positive)?;
let negative = parse_fts_query(&negative)?;
let query = BoostQuery::new(positive, negative, Some(negative_boost));
Ok(query.into())
}
"multi_match" => {
let query = query_value
.get::<_, napi::JsString>("query")?
.ok_or(napi::Error::from_reason("query not found"))?
.into_utf8()?
.into_owned()?;
let columns_array = query_value
.get::<_, napi::JsTypedArray>("columns")?
.ok_or(napi::Error::from_reason("columns not found"))?;
let columns_num = columns_array.get_array_length()?;
let mut columns = Vec::with_capacity(columns_num as usize);
for i in 0..columns_num {
let column = columns_array
.get_element::<napi::JsString>(i)?
.into_utf8()?
.into_owned()?;
columns.push(column);
}
let boost_array = query_value
.get::<_, napi::JsTypedArray>("boost")?
.ok_or(napi::Error::from_reason("boost not found"))?;
if boost_array.get_array_length()? != columns_num {
return Err(napi::Error::from_reason(format!(
"boost array length ({}) does not match columns length ({})",
boost_array.get_array_length()?,
columns_num
)));
}
let mut boost = Vec::with_capacity(columns_num as usize);
for i in 0..columns_num {
let b = boost_array.get_element::<napi::JsNumber>(i)?.get_double()? as f32;
boost.push(b);
}
let query =
MultiMatchQuery::try_new_with_boosts(query, columns, boost).map_err(|e| {
napi::Error::from_reason(format!("Error creating MultiMatchQuery: {}", e))
})?;
Ok(query.into())
}
_ => Err(napi::Error::from_reason(format!(
"Unsupported query type: {}",
query_type
))),
}
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.21.2-beta.0"
current_version = "0.22.0-beta.2"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.21.2-beta.0"
version = "0.22.0-beta.2"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

View File

@@ -26,7 +26,7 @@ def connect(
api_key: Optional[str] = None,
region: str = "us-east-1",
host_override: Optional[str] = None,
read_consistency_interval: Optional[timedelta] = None,
read_consistency_interval: Optional[timedelta] = timedelta(seconds=5),
request_thread_pool: Optional[Union[int, ThreadPoolExecutor]] = None,
client_config: Union[ClientConfig, Dict[str, Any], None] = None,
storage_options: Optional[Dict[str, str]] = None,
@@ -49,9 +49,8 @@ def connect(
read_consistency_interval: timedelta, default timedelta(seconds=5)
(For LanceDB OSS only)
The interval at which to check for updates to the table from other
processes. If None, then consistency is not checked. For performance
reasons, this is the default. For strong consistency, set this to
zero seconds. Then every read will check for updates from other
processes. If None, then consistency is not checked. For strong consistency,
set this to zero seconds. Then every read will check for updates from other
processes. As a compromise, you can set this to a non-zero timedelta
for eventual consistency. If more than that interval has passed since
the last check, then the table will be checked for updates. Note: this
@@ -122,7 +121,7 @@ async def connect_async(
api_key: Optional[str] = None,
region: str = "us-east-1",
host_override: Optional[str] = None,
read_consistency_interval: Optional[timedelta] = None,
read_consistency_interval: Optional[timedelta] = timedelta(seconds=5),
client_config: Optional[Union[ClientConfig, Dict[str, Any]]] = None,
storage_options: Optional[Dict[str, str]] = None,
) -> AsyncConnection:
@@ -143,9 +142,8 @@ async def connect_async(
read_consistency_interval: timedelta, default timedelta(seconds=5)
(For LanceDB OSS only)
The interval at which to check for updates to the table from other
processes. If None, then consistency is not checked. For performance
reasons, this is the default. For strong consistency, set this to
zero seconds. Then every read will check for updates from other
processes. If None, then consistency is not checked. For strong consistency,
set this to zero seconds. Then every read will check for updates from other
processes. As a compromise, you can set this to a non-zero timedelta
for eventual consistency. If more than that interval has passed since
the last check, then the table will be checked for updates. Note: this
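With the default moving from `None` to five seconds, the three consistency modes look like this (a minimal sketch; the URI is illustrative):

```python
from datetime import timedelta

import lancedb

db = lancedb.connect("data/sample-lancedb")  # eventual consistency, 5 s default
db = lancedb.connect("data/sample-lancedb", read_consistency_interval=timedelta(0))  # strong
db = lancedb.connect("data/sample-lancedb", read_consistency_interval=None)  # never checked
```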

View File

@@ -48,10 +48,11 @@ class Table:
async def version(self) -> int: ...
async def checkout(self, version: int): ...
async def checkout_latest(self): ...
async def restore(self): ...
async def restore(self, version: Optional[int] = None): ...
async def list_indices(self) -> list[IndexConfig]: ...
async def delete(self, filter: str): ...
async def add_columns(self, columns: list[tuple[str, str]]) -> None: ...
async def add_columns_with_schema(self, schema: pa.Schema) -> None: ...
async def alter_columns(self, columns: list[dict[str, Any]]) -> None: ...
async def optimize(
self,
@@ -94,6 +95,8 @@ class Query:
def nearest_to(self, query_vec: pa.Array) -> VectorQuery: ...
def nearest_to_text(self, query: dict) -> FTSQuery: ...
async def execute(self, max_batch_length: Optional[int]) -> RecordBatchStream: ...
async def explain_plan(self, verbose: Optional[bool]) -> str: ...
async def analyze_plan(self) -> str: ...
def to_query_request(self) -> PyQueryRequest: ...
class FTSQuery:
@@ -108,7 +111,6 @@ class FTSQuery:
def add_query_vector(self, query_vec: pa.Array) -> None: ...
def nearest_to(self, query_vec: pa.Array) -> HybridQuery: ...
async def execute(self, max_batch_length: Optional[int]) -> RecordBatchStream: ...
async def explain_plan(self) -> str: ...
def to_query_request(self) -> PyQueryRequest: ...
class VectorQuery:

View File

@@ -6,6 +6,7 @@ from __future__ import annotations
from abc import abstractmethod
from pathlib import Path
from datetime import timedelta
from typing import TYPE_CHECKING, Dict, Iterable, List, Literal, Optional, Union
from lancedb.embeddings.registry import EmbeddingFunctionRegistry
@@ -32,7 +33,6 @@ import deprecation
if TYPE_CHECKING:
import pyarrow as pa
from .pydantic import LanceModel
from datetime import timedelta
from ._lancedb import Connection as LanceDbConnection
from .common import DATA, URI
@@ -318,9 +318,8 @@ class LanceDBConnection(DBConnection):
The root uri of the database.
read_consistency_interval: timedelta, default timedelta(seconds=5)
The interval at which to check for updates to the table from other
processes. If None, then consistency is not checked. For performance
reasons, this is the default. For strong consistency, set this to
zero seconds. Then every read will check for updates from other
processes. If None, then consistency is not checked. For strong consistency,
set this to zero seconds. Then every read will check for updates from other
processes. As a compromise, you can set this to a non-zero timedelta
for eventual consistency. If more than that interval has passed since
the last check, then the table will be checked for updates. Note: this
@@ -352,7 +351,7 @@ class LanceDBConnection(DBConnection):
self,
uri: URI,
*,
read_consistency_interval: Optional[timedelta] = None,
read_consistency_interval: Optional[timedelta] = timedelta(seconds=5),
storage_options: Optional[Dict[str, str]] = None,
):
if not isinstance(uri, Path):

View File

@@ -4,7 +4,9 @@
from __future__ import annotations
from abc import ABC, abstractmethod
import abc
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import (
TYPE_CHECKING,
Dict,
@@ -83,6 +85,196 @@ def ensure_vector_query(
return val
class FullTextQueryType(Enum):
MATCH = "match"
MATCH_PHRASE = "match_phrase"
BOOST = "boost"
MULTI_MATCH = "multi_match"
class FullTextQuery(abc.ABC, pydantic.BaseModel):
@abc.abstractmethod
def query_type(self) -> FullTextQueryType:
"""
Get the query type of the query.
Returns
-------
str
The type of the query.
"""
@abc.abstractmethod
def to_dict(self) -> dict:
"""
Convert the query to a dictionary.
Returns
-------
dict
The query as a dictionary.
"""
class MatchQuery(FullTextQuery):
def __init__(
self,
query: str,
column: str,
*,
boost: float = 1.0,
fuzziness: int = 0,
max_expansions: int = 50,
):
"""
Match query for full-text search.
Parameters
----------
query : str
The query string to match against.
column : str
The name of the column to match against.
boost : float, default 1.0
The boost factor for the query.
The score of each matching document is multiplied by this value.
fuzziness : int, optional
The maximum edit distance for each term in the match query.
Defaults to 0 (exact match).
If None, fuzziness is applied automatically by the rules:
- 0 for terms with length <= 2
- 1 for terms with length <= 5
- 2 for terms with length > 5
max_expansions : int, optional
The maximum number of terms to consider for fuzzy matching.
Defaults to 50.
"""
self.column = column
self.query = query
self.boost = boost
self.fuzziness = fuzziness
self.max_expansions = max_expansions
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MATCH
def to_dict(self) -> dict:
return {
"match": {
self.column: {
"query": self.query,
"boost": self.boost,
"fuzziness": self.fuzziness,
"max_expansions": self.max_expansions,
}
}
}
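A short usage sketch of the class above (the `text` column is hypothetical); passing `fuzziness=None` defers to the automatic length-based rules described in the docstring:

```python
q = MatchQuery("pupy", "text", fuzziness=1)  # tolerate one edit, so "puppy" matches
q.to_dict()
# {'match': {'text': {'query': 'pupy', 'boost': 1.0,
#                     'fuzziness': 1, 'max_expansions': 50}}}
```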
class PhraseQuery(FullTextQuery):
def __init__(self, query: str, column: str):
"""
Phrase query for full-text search.
Parameters
----------
query : str
The query string to match against.
column : str
The name of the column to match against.
"""
self.column = column
self.query = query
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MATCH_PHRASE
def to_dict(self) -> dict:
return {
"match_phrase": {
self.column: self.query,
}
}
class BoostQuery(FullTextQuery):
def __init__(
self,
positive: FullTextQuery,
negative: FullTextQuery,
negative_boost: float,
):
"""
Boost query for full-text search.
Parameters
----------
positive : FullTextQuery
The positive query object.
negative : FullTextQuery
The negative query object.
negative_boost : float
The boost factor for the negative query.
"""
self.positive = positive
self.negative = negative
self.negative_boost = negative_boost
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.BOOST
def to_dict(self) -> dict:
return {
"boost": {
"positive": self.positive.to_dict(),
"negative": self.negative.to_dict(),
"negative_boost": self.negative_boost,
}
}
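Because `positive` and `negative` are themselves `FullTextQuery` objects, boost queries compose with the other query types; a sketch with a hypothetical `text` column:

```python
q = BoostQuery(
    MatchQuery("puppy", "text"),
    PhraseQuery("training manual", "text"),
    negative_boost=0.5,  # halve the score of documents matching the negative query
)
```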
class MultiMatchQuery(FullTextQuery):
def __init__(
self,
query: str,
columns: list[str],
*,
boosts: Optional[list[float]] = None,
):
"""
Multi-match query for full-text search.
Parameters
----------
query : str
The query string to match against.
columns : list[str]
The list of columns to match against.
boosts : list[float], optional
The list of boost factors for each column. If not provided,
all columns will have the same boost factor.
"""
self.query = query
self.columns = columns
if boosts is None:
boosts = [1.0] * len(columns)
self.boosts = boosts
def query_type(self) -> FullTextQueryType:
return FullTextQueryType.MULTI_MATCH
def to_dict(self) -> dict:
return {
"multi_match": {
"query": self.query,
"columns": self.columns,
"boost": self.boosts,
}
}
class FullTextSearchQuery(pydantic.BaseModel):
"""A LanceDB Full Text Search Query
@@ -92,18 +284,13 @@ class FullTextSearchQuery(pydantic.BaseModel):
The columns to search
If None, then the table should select the column automatically.
query: str
The query to search for
limit: Optional[int] = None
The limit on the number of results to return
wand_factor: Optional[float] = None
The wand factor to use for the search
query: str | FullTextQuery
If a string, it is treated as a MatchQuery.
If a FullTextQuery object, it is used directly.
"""
columns: Optional[List[str]] = None
query: str
limit: Optional[int] = None
wand_factor: Optional[float] = None
query: Union[str, FullTextQuery]
class Query(pydantic.BaseModel):
@@ -657,7 +844,45 @@ class LanceQueryBuilder(ABC):
-------
plan : str
""" # noqa: E501
return self._table._explain_plan(self.to_query_object())
return self._table._explain_plan(self.to_query_object(), verbose=verbose)
def analyze_plan(self) -> str:
"""
Run the query and return its execution plan with runtime metrics.
This returns detailed metrics for each step, such as elapsed time,
rows processed, bytes read, and I/O stats. It is useful for debugging
and performance tuning.
Examples
--------
>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> table = db.create_table("my_table", [{"vector": [99.0, 99]}])
>>> query = [100, 100]
>>> plan = table.search(query).analyze_plan()
>>> print(plan) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
AnalyzeExec verbose=true, metrics=[]
ProjectionExec: expr=[...], metrics=[...]
GlobalLimitExec: skip=0, fetch=10, metrics=[...]
FilterExec: _distance@2 IS NOT NULL,
metrics=[output_rows=..., elapsed_compute=...]
SortExec: TopK(fetch=10), expr=[...],
preserve_partitioning=[...],
metrics=[output_rows=..., elapsed_compute=..., row_replacements=...]
KNNVectorDistance: metric=l2,
metrics=[output_rows=..., elapsed_compute=..., output_batches=...]
LanceScan: uri=..., projection=[vector], row_id=true,
row_addr=false, ordered=false,
metrics=[output_rows=..., elapsed_compute=...,
bytes_read=..., iops=..., requests=...]
Returns
-------
plan : str
The physical query execution plan with runtime metrics.
"""
return self._table._analyze_plan(self.to_query_object())
def vector(self, vector: Union[np.ndarray, list]) -> Self:
"""Set the vector to search for.
@@ -674,13 +899,14 @@ class LanceQueryBuilder(ABC):
"""
raise NotImplementedError
def text(self, text: str) -> Self:
def text(self, text: str | FullTextQuery) -> Self:
"""Set the text to search for.
Parameters
----------
text: str
The text to search for.
text: str | FullTextQuery
If a string, it is treated as a MatchQuery.
If a FullTextQuery object, it is used directly.
Returns
-------
@@ -1046,7 +1272,7 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
def __init__(
self,
table: "Table",
query: str,
query: str | FullTextQuery,
ordering_field_name: Optional[str] = None,
fts_columns: Optional[Union[str, List[str]]] = None,
):
@@ -1653,7 +1879,7 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
self._vector = vector
return self
def text(self, text: str) -> LanceHybridQueryBuilder:
def text(self, text: str | FullTextQuery) -> LanceHybridQueryBuilder:
self._text = text
return self
@@ -1941,6 +2167,15 @@ class AsyncQueryBase(object):
""" # noqa: E501
return await self._inner.explain_plan(verbose)
async def analyze_plan(self):
"""Execute the query and display with runtime metrics.
Returns
-------
plan : str
"""
return await self._inner.analyze_plan()
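A minimal async usage sketch, mirroring the sync doctest earlier in this file (vector values are illustrative):

```python
plan = await table.query().nearest_to([0.1, 0.1]).analyze_plan()
assert "AnalyzeExec" in plan and "metrics=" in plan
```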
class AsyncQuery(AsyncQueryBase):
def __init__(self, inner: LanceQuery):
@@ -2041,7 +2276,7 @@ class AsyncQuery(AsyncQueryBase):
)
def nearest_to_text(
self, query: str, columns: Union[str, List[str], None] = None
self, query: str | FullTextQuery, columns: Union[str, List[str], None] = None
) -> AsyncFTSQuery:
"""
Find the documents that are most relevant to the given text query.
@@ -2067,9 +2302,13 @@ class AsyncQuery(AsyncQueryBase):
columns = [columns]
if columns is None:
columns = []
return AsyncFTSQuery(
self._inner.nearest_to_text({"query": query, "columns": columns})
)
if isinstance(query, str):
return AsyncFTSQuery(
self._inner.nearest_to_text({"query": query, "columns": columns})
)
# FullTextQuery object
return AsyncFTSQuery(self._inner.nearest_to_text(query.to_dict()))
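Both accepted input forms, as a sketch (the `description` column is hypothetical):

```python
# Plain string: wrapped as {"query": ..., "columns": [...]} and parsed as a MatchQuery.
q = table.query().nearest_to_text("puppy", columns="description")

# Structured query: passed through via to_dict() and used directly.
q = table.query().nearest_to_text(MatchQuery("puppy", "description", fuzziness=1))
```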
class AsyncFTSQuery(AsyncQueryBase):
@@ -2352,7 +2591,7 @@ class AsyncVectorQuery(AsyncQueryBase, AsyncVectorQueryBase):
return self
def nearest_to_text(
self, query: str, columns: Union[str, List[str], None] = None
self, query: str | FullTextQuery, columns: Union[str, List[str], None] = None
) -> AsyncHybridQuery:
"""
Find the documents that are most relevant to the given text query,
@@ -2382,9 +2621,13 @@ class AsyncVectorQuery(AsyncQueryBase, AsyncVectorQueryBase):
columns = [columns]
if columns is None:
columns = []
return AsyncHybridQuery(
self._inner.nearest_to_text({"query": query, "columns": columns})
)
if isinstance(query, str):
return AsyncHybridQuery(
self._inner.nearest_to_text({"query": query, "columns": columns})
)
# FullTextQuery object
return AsyncHybridQuery(self._inner.nearest_to_text(query.to_dict()))
async def to_batches(
self, *, max_batch_length: Optional[int] = None
@@ -2510,7 +2753,7 @@ class AsyncHybridQuery(AsyncQueryBase, AsyncVectorQueryBase):
Returns
-------
plan
plan : str
""" # noqa: E501
results = ["Vector Search Plan:"]
@@ -2519,3 +2762,23 @@ class AsyncHybridQuery(AsyncQueryBase, AsyncVectorQueryBase):
results.append(await self._inner.to_fts_query().explain_plan(verbose))
return "\n".join(results)
async def analyze_plan(self):
"""
Execute the query and return the physical execution plan with runtime metrics.
This runs both the vector and FTS (full-text search) queries and returns
detailed metrics for each step of execution, such as rows processed,
elapsed time, and I/O stats. It's useful for debugging and
performance analysis.
Returns
-------
plan : str
"""
results = ["Vector Search Query:"]
results.append(await self._inner.to_vector_query().analyze_plan())
results.append("FTS Search Query:")
results.append(await self._inner.to_fts_query().analyze_plan())
return "\n".join(results)

View File

@@ -87,6 +87,9 @@ class RemoteTable(Table):
def checkout_latest(self):
return LOOP.run(self._table.checkout_latest())
def restore(self, version: Optional[int] = None):
return LOOP.run(self._table.restore(version))
def list_indices(self) -> Iterable[IndexConfig]:
"""List all the indices on the table"""
return LOOP.run(self._table.list_indices())
@@ -365,6 +368,12 @@ class RemoteTable(Table):
return pa.RecordBatchReader.from_batches(async_iter.schema, iter_sync())
def _explain_plan(self, query: Query, verbose: Optional[bool] = False) -> str:
return LOOP.run(self._table._explain_plan(query, verbose))
def _analyze_plan(self, query: Query) -> str:
return LOOP.run(self._table._analyze_plan(query))
def merge_insert(self, on: Union[str, Iterable[str]]) -> LanceMergeInsertBuilder:
"""Returns a [`LanceMergeInsertBuilder`][lancedb.merge.LanceMergeInsertBuilder]
that can be used to create a "merge insert" operation.

View File

@@ -1007,6 +1007,12 @@ class Table(ABC):
self, query: Query, batch_size: Optional[int] = None
) -> pa.RecordBatchReader: ...
@abstractmethod
def _explain_plan(self, query: Query, verbose: Optional[bool] = False) -> str: ...
@abstractmethod
def _analyze_plan(self, query: Query) -> str: ...
@abstractmethod
def _do_merge(
self,
@@ -1262,16 +1268,21 @@ class Table(ABC):
"""
@abstractmethod
def add_columns(self, transforms: Dict[str, str]):
def add_columns(
self, transforms: Dict[str, str] | pa.Field | List[pa.Field] | pa.Schema
):
"""
Add new columns with defined values.
Parameters
----------
transforms: Dict[str, str]
transforms: Dict[str, str], pa.Field, List[pa.Field], pa.Schema
A map of column name to a SQL expression to use to calculate the
value of the new column. These expressions will be evaluated for
each row in the table, and can reference existing columns.
Alternatively, a pyarrow Field or Schema can be provided to add
new columns with the specified data types. The new columns will
be initialized with null values.
"""
@abstractmethod
@@ -1339,6 +1350,21 @@ class Table(ABC):
It can also be used to undo a `[Self::checkout]` operation
"""
@abstractmethod
def restore(self, version: Optional[int] = None):
"""Restore a version of the table. This is an in-place operation.
This creates a new version where the data is equivalent to the
specified previous version. Data is not copied (as of python-v0.2.1).
Parameters
----------
version : int, default None
The version to restore. If unspecified then restores the currently
checked out version. If the currently checked out version is the
latest version then this is a no-op.
"""
@abstractmethod
def list_versions(self) -> List[Dict[str, Any]]:
"""List all versions of the table"""
@@ -2292,8 +2318,11 @@ class LanceTable(Table):
return pa.RecordBatchReader.from_batches(async_iter.schema, iter_sync())
def _explain_plan(self, query: Query) -> str:
return LOOP.run(self._table._explain_plan(query))
def _explain_plan(self, query: Query, verbose: Optional[bool] = False) -> str:
return LOOP.run(self._table._explain_plan(query, verbose))
def _analyze_plan(self, query: Query) -> str:
return LOOP.run(self._table._analyze_plan(query))
def _do_merge(
self,
@@ -2442,7 +2471,9 @@ class LanceTable(Table):
"""
return LOOP.run(self._table.index_stats(index_name))
def add_columns(self, transforms: Dict[str, str]):
def add_columns(
self, transforms: Dict[str, str] | pa.Field | List[pa.Field] | pa.Schema
):
LOOP.run(self._table.add_columns(transforms))
def alter_columns(self, *alterations: Iterable[Dict[str, str]]):
@@ -3342,8 +3373,6 @@ class AsyncTable:
async_query = async_query.nearest_to_text(
query.full_text_query.query, query.full_text_query.columns
)
if query.full_text_query.limit is not None:
async_query = async_query.limit(query.full_text_query.limit)
return async_query
@@ -3358,10 +3387,15 @@ class AsyncTable:
return await async_query.to_batches(max_batch_length=batch_size)
async def _explain_plan(self, query: Query) -> str:
async def _explain_plan(self, query: Query, verbose: Optional[bool]) -> str:
# This method is used by the sync table
async_query = self._sync_query_to_async(query)
return await async_query.explain_plan()
return await async_query.explain_plan(verbose)
async def _analyze_plan(self, query: Query) -> str:
# This method is used by the sync table
async_query = self._sync_query_to_async(query)
return await async_query.analyze_plan()
async def _do_merge(
self,
@@ -3501,7 +3535,9 @@ class AsyncTable:
return await self._inner.update(updates_sql, where)
async def add_columns(self, transforms: dict[str, str]):
async def add_columns(
self, transforms: dict[str, str] | pa.Field | List[pa.Field] | pa.Schema
):
"""
Add new columns with defined values.
@@ -3511,8 +3547,19 @@ class AsyncTable:
A map of column name to a SQL expression to use to calculate the
value of the new column. These expressions will be evaluated for
each row in the table, and can reference existing columns.
Alternatively, you can pass a pyarrow field or schema to add
new columns with NULLs.
"""
await self._inner.add_columns(list(transforms.items()))
if isinstance(transforms, pa.Field):
transforms = [transforms]
if isinstance(transforms, list) and all(
isinstance(f, pa.Field) for f in transforms
):
transforms = pa.schema(transforms)
if isinstance(transforms, pa.Schema):
await self._inner.add_columns_with_schema(transforms)
else:
await self._inner.add_columns(list(transforms.items()))
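The branching above accepts four input shapes; a sketch with hypothetical column names:

```python
await table.add_columns({"double_id": "id * 2"})        # SQL expression per new column
await table.add_columns(pa.field("x", pa.int64()))      # one field, initialized to null
await table.add_columns([pa.field("x", pa.int64()),     # list of fields
                         pa.field("y", pa.float32())])
await table.add_columns(pa.schema([pa.field("z", pa.utf8())]))  # whole schema
```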
async def alter_columns(self, *alterations: Iterable[dict[str, Any]]):
"""
@@ -3610,7 +3657,7 @@ class AsyncTable:
"""
await self._inner.checkout_latest()
async def restore(self):
async def restore(self, version: Optional[int] = None):
"""
Restore the table to the currently checked out version
@@ -3623,7 +3670,7 @@ class AsyncTable:
Once the operation concludes the table will no longer be in a checked
out state and the read_consistency_interval, if any, will apply.
"""
await self._inner.restore()
await self._inner.restore(version)
async def optimize(
self,

View File

@@ -315,6 +315,11 @@ def test_table():
db = lancedb.connect(uri, read_consistency_interval=timedelta(seconds=5))
tbl = db.open_table("test_table")
# --8<-- [end:table_eventual_consistency]
# --8<-- [start:table_no_consistency]
uri = "data/sample-lancedb"
db = lancedb.connect(uri, read_consistency_interval=None)
tbl = db.open_table("test_table")
# --8<-- [end:table_no_consistency]
# --8<-- [start:table_checkout_latest]
tbl = db.open_table("test_table")
@@ -562,13 +567,19 @@ async def test_table_async():
async_db = await lancedb.connect_async(uri, read_consistency_interval=timedelta(0))
async_tbl = await async_db.open_table("test_table_async")
# --8<-- [end:table_async_strong_consistency]
# --8<-- [start:table_async_ventual_consistency]
# --8<-- [start:table_async_eventual_consistency]
uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(
uri, read_consistency_interval=timedelta(seconds=5)
)
async_tbl = await async_db.open_table("test_table_async")
# --8<-- [end:table_async_eventual_consistency]
# --8<-- [start:table_async_no_consistency]
uri = "data/sample-lancedb"
async_db = await lancedb.connect_async(uri, read_consistency_interval=None)
async_tbl = await async_db.open_table("test_table_async")
# --8<-- [end:table_async_no_consistency]
# --8<-- [start:table_async_checkout_latest]
async_tbl = await async_db.open_table("test_table_async")

View File

@@ -3,7 +3,6 @@
import re
from datetime import timedelta
import os
import lancedb
@@ -299,13 +298,11 @@ def test_create_exist_ok(tmp_db: lancedb.DBConnection):
@pytest.mark.asyncio
async def test_connect(tmp_path):
db = await lancedb.connect_async(tmp_path)
assert str(db) == f"ListingDatabase(uri={tmp_path}, read_consistency_interval=None)"
db = await lancedb.connect_async(
tmp_path, read_consistency_interval=timedelta(seconds=5)
)
assert str(db) == f"ListingDatabase(uri={tmp_path}, read_consistency_interval=5s)"
db = await lancedb.connect_async(tmp_path, read_consistency_interval=None)
assert str(db) == f"ListingDatabase(uri={tmp_path}, read_consistency_interval=None)"
@pytest.mark.asyncio
async def test_close(mem_db_async: lancedb.AsyncConnection):
@@ -453,7 +450,7 @@ async def test_open_table(tmp_path):
assert tbl.name == "test"
assert (
re.search(
r"NativeTable\(test, uri=.*test\.lance, read_consistency_interval=None\)",
r"NativeTable\(test, uri=.*test\.lance, read_consistency_interval=5s\)",
str(tbl),
)
is not None

View File

@@ -114,6 +114,16 @@ async def test_explain_plan(table: AsyncTable):
assert "LanceScan" in plan
@pytest.mark.asyncio
async def test_analyze_plan(table: AsyncTable):
res = await (
table.query().nearest_to_text("dog").nearest_to([0.1, 0.1]).analyze_plan()
)
assert "AnalyzeExec" in res
assert "metrics=" in res
def test_normalize_scores():
cases = [
(pa.array([0.1, 0.4]), pa.array([0.0, 1.0])),

View File

@@ -31,6 +31,7 @@ async def some_table(db_async):
{
"id": list(range(NROWS)),
"vector": sample_fixed_size_list_array(NROWS, DIM),
"fsb": pa.array([bytes([i]) for i in range(NROWS)], pa.binary(1)),
"tags": [
[f"tag{random.randint(0, 8)}" for _ in range(2)] for _ in range(NROWS)
],
@@ -85,6 +86,16 @@ async def test_create_scalar_index(some_table: AsyncTable):
assert len(indices) == 0
@pytest.mark.asyncio
async def test_create_fixed_size_binary_index(some_table: AsyncTable):
await some_table.create_index("fsb", config=BTree())
indices = await some_table.list_indices()
assert str(indices) == '[Index(BTree, columns=["fsb"], name="fsb_idx")]'
assert len(indices) == 1
assert indices[0].index_type == "BTree"
assert indices[0].columns == ["fsb"]
@pytest.mark.asyncio
async def test_create_bitmap_index(some_table: AsyncTable):
await some_table.create_index("id", config=Bitmap())

View File

@@ -702,6 +702,20 @@ async def test_fast_search_async(tmp_path):
assert "LanceScan" not in plan
def test_analyze_plan(table):
q = LanceVectorQueryBuilder(table, [0, 0], "vector")
res = q.analyze_plan()
assert "AnalyzeExec" in res
assert "metrics=" in res
@pytest.mark.asyncio
async def test_analyze_plan_async(table_async: AsyncTable):
res = await table_async.query().nearest_to(pa.array([1, 2])).analyze_plan()
assert "AnalyzeExec" in res
assert "metrics=" in res
def test_explain_plan(table):
q = LanceVectorQueryBuilder(table, [0, 0], "vector")
plan = q.explain_plan(verbose=True)

View File

@@ -444,6 +444,16 @@ def test_query_sync_fts():
"prefilter": True,
"with_row_id": True,
"version": None,
} or body == {
"full_text_query": {
"query": "puppy",
"columns": ["description", "name"],
},
"k": 42,
"vector": [],
"prefilter": True,
"with_row_id": True,
"version": None,
}
return pa.table({"id": [1, 2, 3]})

View File

@@ -32,7 +32,11 @@ def test_basic(mem_db: DBConnection):
table = mem_db.create_table("test", data=data)
assert table.name == "test"
assert "LanceTable(name='test', version=1, _conn=LanceDBConnection(" in repr(table)
assert (
"LanceTable(name='test', version=1, "
"read_consistency_interval=datetime.timedelta(seconds=5), "
"_conn=LanceDBConnection("
) in repr(table)
expected_schema = pa.schema(
{
"vector": pa.list_(pa.float32(), 2),
@@ -1384,6 +1388,37 @@ async def test_add_columns_async(mem_db_async: AsyncConnection):
assert data["new_col"].to_pylist() == [2, 3]
@pytest.mark.asyncio
async def test_add_columns_with_schema(mem_db_async: AsyncConnection):
data = pa.table({"id": [0, 1]})
table = await mem_db_async.create_table("my_table", data=data)
await table.add_columns(
[pa.field("x", pa.int64()), pa.field("vector", pa.list_(pa.float32(), 8))]
)
assert await table.schema() == pa.schema(
[
pa.field("id", pa.int64()),
pa.field("x", pa.int64()),
pa.field("vector", pa.list_(pa.float32(), 8)),
]
)
table = await mem_db_async.create_table("table2", data=data)
await table.add_columns(
pa.schema(
[pa.field("y", pa.int64()), pa.field("emb", pa.list_(pa.float32(), 8))]
)
)
assert await table.schema() == pa.schema(
[
pa.field("id", pa.int64()),
pa.field("y", pa.int64()),
pa.field("emb", pa.list_(pa.float32(), 8)),
]
)
def test_alter_columns(mem_db: DBConnection):
data = pa.table({"id": [0, 1]})
table = mem_db.create_table("my_table", data=data)

View File

@@ -204,7 +204,9 @@ pub fn connect(
}
if let Some(read_consistency_interval) = read_consistency_interval {
let read_consistency_interval = Duration::from_secs_f64(read_consistency_interval);
builder = builder.read_consistency_interval(read_consistency_interval);
builder = builder.read_consistency_interval(Some(read_consistency_interval));
} else {
builder = builder.read_consistency_interval(None);
}
if let Some(storage_options) = storage_options {
builder = builder.storage_options(storage_options);

View File

@@ -8,19 +8,19 @@ use arrow::array::Array;
use arrow::array::ArrayData;
use arrow::pyarrow::FromPyArrow;
use arrow::pyarrow::IntoPyArrow;
use lancedb::index::scalar::FullTextSearchQuery;
use lancedb::index::scalar::{FtsQuery, FullTextSearchQuery, MatchQuery, PhraseQuery};
use lancedb::query::QueryExecutionOptions;
use lancedb::query::QueryFilter;
use lancedb::query::{
ExecutableQuery, Query as LanceDbQuery, QueryBase, Select, VectorQuery as LanceDbVectorQuery,
};
use lancedb::table::AnyQuery;
use pyo3::exceptions::PyNotImplementedError;
use pyo3::exceptions::PyRuntimeError;
use pyo3::exceptions::{PyNotImplementedError, PyValueError};
use pyo3::prelude::{PyAnyMethods, PyDictMethods};
use pyo3::pymethods;
use pyo3::types::PyDict;
use pyo3::types::PyList;
use pyo3::types::{PyDict, PyString};
use pyo3::Bound;
use pyo3::IntoPyObject;
use pyo3::PyAny;
@@ -31,7 +31,7 @@ use pyo3_async_runtimes::tokio::future_into_py;
use crate::arrow::RecordBatchStream;
use crate::error::PythonErrorExt;
use crate::util::parse_distance_type;
use crate::util::{parse_distance_type, parse_fts_query};
// Python representation of full text search parameters
#[derive(Clone)]
@@ -46,8 +46,8 @@ pub struct PyFullTextSearchQuery {
impl From<FullTextSearchQuery> for PyFullTextSearchQuery {
fn from(query: FullTextSearchQuery) -> Self {
PyFullTextSearchQuery {
columns: query.columns,
query: query.query,
columns: query.columns().into_iter().collect(),
query: query.query.query().to_owned(),
limit: query.limit,
wand_factor: query.wand_factor,
}
@@ -236,22 +236,61 @@ impl Query {
}
pub fn nearest_to_text(&mut self, query: Bound<'_, PyDict>) -> PyResult<FTSQuery> {
let query_text = query
let fts_query = query
.get_item("query")?
.ok_or(PyErr::new::<PyRuntimeError, _>(
"Query text is required for nearest_to_text",
))?
.extract::<String>()?;
let columns = query
.get_item("columns")?
.map(|columns| columns.extract::<Vec<String>>())
.transpose()?;
))?;
let fts_query = FullTextSearchQuery::new(query_text).columns(columns);
let query = if let Ok(query_text) = fts_query.downcast::<PyString>() {
let mut query_text = query_text.to_string();
let columns = query
.get_item("columns")?
.map(|columns| columns.extract::<Vec<String>>())
.transpose()?;
let is_phrase =
query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false);
if is_phrase {
// Remove the surrounding quotes for phrase queries
query_text = query_text[1..query_text.len() - 1].to_string();
}
let query: FtsQuery = match (is_phrase, is_multi_match) {
(false, _) => MatchQuery::new(query_text).into(),
(true, false) => PhraseQuery::new(query_text).into(),
(true, true) => {
return Err(PyValueError::new_err(
"Phrase queries cannot be used with multiple columns.",
));
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
PyValueError::new_err(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
}
query
} else if let Ok(query) = query.downcast::<PyDict>() {
let query = parse_fts_query(query)?;
FullTextSearchQuery::new_query(query)
} else {
return Err(PyValueError::new_err(
"query must be a string or a Query object",
));
};
Ok(FTSQuery {
fts_query,
inner: self.inner.clone(),
fts_query: query,
})
}
@@ -271,7 +310,7 @@ impl Query {
})
}
fn explain_plan(self_: PyRef<'_, Self>, verbose: bool) -> PyResult<Bound<'_, PyAny>> {
pub fn explain_plan(self_: PyRef<'_, Self>, verbose: bool) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
future_into_py(self_.py(), async move {
inner
@@ -281,6 +320,16 @@ impl Query {
})
}
pub fn analyze_plan(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
future_into_py(self_.py(), async move {
inner
.analyze_plan()
.await
.map_err(|e| PyRuntimeError::new_err(e.to_string()))
})
}
pub fn to_query_request(&self) -> PyQueryRequest {
PyQueryRequest::from(AnyQuery::Query(self.inner.clone().into_request()))
}
@@ -365,8 +414,18 @@ impl FTSQuery {
})
}
pub fn analyze_plan(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
future_into_py(self_.py(), async move {
inner
.analyze_plan()
.await
.map_err(|e| PyRuntimeError::new_err(e.to_string()))
})
}
pub fn get_query(&self) -> String {
self.fts_query.query.clone()
self.fts_query.query.query().to_owned()
}
pub fn to_query_request(&self) -> PyQueryRequest {
@@ -470,7 +529,7 @@ impl VectorQuery {
})
}
fn explain_plan(self_: PyRef<'_, Self>, verbose: bool) -> PyResult<Bound<'_, PyAny>> {
pub fn explain_plan(self_: PyRef<'_, Self>, verbose: bool) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
future_into_py(self_.py(), async move {
inner
@@ -480,6 +539,16 @@ impl VectorQuery {
})
}
pub fn analyze_plan(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner.clone();
future_into_py(self_.py(), async move {
inner
.analyze_plan()
.await
.map_err(|e| PyRuntimeError::new_err(e.to_string()))
})
}
pub fn nearest_to_text(&mut self, query: Bound<'_, PyDict>) -> PyResult<HybridQuery> {
let base_query = self.inner.clone().into_plain();
let fts_query = Query::new(base_query).nearest_to_text(query)?;

View File

@@ -1,9 +1,11 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::{collections::HashMap, sync::Arc};
use arrow::{
datatypes::DataType,
datatypes::{DataType, Schema},
ffi_stream::ArrowArrayStreamReader,
pyarrow::{FromPyArrow, ToPyArrow},
pyarrow::{FromPyArrow, PyArrowType, ToPyArrow},
};
use lancedb::table::{
AddDataMode, ColumnAlteration, Duration, NewColumnTransform, OptimizeAction, OptimizeOptions,
@@ -16,7 +18,6 @@ use pyo3::{
Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
use std::collections::HashMap;
use crate::{
error::PythonErrorExt,
@@ -303,12 +304,16 @@ impl Table {
})
}
pub fn restore(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
#[pyo3(signature = (version=None))]
pub fn restore(self_: PyRef<'_, Self>, version: Option<u64>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
future_into_py(
self_.py(),
async move { inner.restore().await.infer_error() },
)
future_into_py(self_.py(), async move {
if let Some(version) = version {
inner.checkout(version).await.infer_error()?;
}
inner.restore().await.infer_error()
})
}
pub fn query(&self) -> Query {
@@ -440,6 +445,20 @@ impl Table {
})
}
pub fn add_columns_with_schema(
self_: PyRef<'_, Self>,
schema: PyArrowType<Schema>,
) -> PyResult<Bound<'_, PyAny>> {
let arrow_schema = &schema.0;
let transform = NewColumnTransform::AllNulls(Arc::new(arrow_schema.clone()));
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {
inner.add_columns(transform, None).await.infer_error()?;
Ok(())
})
}
pub fn alter_columns<'a>(
self_: PyRef<'a, Self>,
alterations: Vec<Bound<PyDict>>,

View File

@@ -3,11 +3,15 @@
use std::sync::Mutex;
use lancedb::index::scalar::{BoostQuery, FtsQuery, MatchQuery, MultiMatchQuery, PhraseQuery};
use lancedb::DistanceType;
use pyo3::prelude::{PyAnyMethods, PyDictMethods, PyListMethods};
use pyo3::types::PyDict;
use pyo3::{
exceptions::{PyRuntimeError, PyValueError},
pyfunction, PyResult,
};
use pyo3::{Bound, PyAny};
/// A wrapper around a rust builder
///
@@ -59,3 +63,116 @@ pub fn validate_table_name(table_name: &str) -> PyResult<()> {
lancedb::utils::validate_table_name(table_name)
.map_err(|e| PyValueError::new_err(e.to_string()))
}
pub fn parse_fts_query(query: &Bound<'_, PyDict>) -> PyResult<FtsQuery> {
let query_type = query.keys().get_item(0)?.extract::<String>()?;
let query_value = query
.get_item(&query_type)?
.ok_or(PyValueError::new_err(format!(
"Query type {} not found",
query_type
)))?;
let query_value = query_value.downcast::<PyDict>()?;
match query_type.as_str() {
"match" => {
let column = query_value.keys().get_item(0)?.extract::<String>()?;
let params = query_value
.get_item(&column)?
.ok_or(PyValueError::new_err(format!(
"column {} not found",
column
)))?;
let params = params.downcast::<PyDict>()?;
let query = params
.get_item("query")?
.ok_or(PyValueError::new_err("query not found"))?
.extract::<String>()?;
let boost = params
.get_item("boost")?
.ok_or(PyValueError::new_err("boost not found"))?
.extract::<f32>()?;
let fuzziness = params
.get_item("fuzziness")?
.ok_or(PyValueError::new_err("fuzziness not found"))?
.extract::<Option<u32>>()?;
let max_expansions = params
.get_item("max_expansions")?
.ok_or(PyValueError::new_err("max_expansions not found"))?
.extract::<usize>()?;
let query = MatchQuery::new(query)
.with_column(Some(column))
.with_boost(boost)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions);
Ok(query.into())
}
"match_phrase" => {
let column = query_value.keys().get_item(0)?.extract::<String>()?;
let query = query_value
.get_item(&column)?
.ok_or(PyValueError::new_err(format!(
"column {} not found",
column
)))?
.extract::<String>()?;
let query = PhraseQuery::new(query).with_column(Some(column));
Ok(query.into())
}
"boost" => {
let positive: Bound<'_, PyAny> = query_value
.get_item("positive")?
.ok_or(PyValueError::new_err("positive not found"))?;
let positive = positive.downcast::<PyDict>()?;
let negative = query_value
.get_item("negative")?
.ok_or(PyValueError::new_err("negative not found"))?;
let negative = negative.downcast::<PyDict>()?;
let negative_boost = query_value
.get_item("negative_boost")?
.ok_or(PyValueError::new_err("negative_boost not found"))?
.extract::<f32>()?;
let positive_query = parse_fts_query(positive)?;
let negative_query = parse_fts_query(negative)?;
let query = BoostQuery::new(positive_query, negative_query, Some(negative_boost));
Ok(query.into())
}
"multi_match" => {
let query = query_value
.get_item("query")?
.ok_or(PyValueError::new_err("query not found"))?
.extract::<String>()?;
let columns = query_value
.get_item("columns")?
.ok_or(PyValueError::new_err("columns not found"))?
.extract::<Vec<String>>()?;
let boost = query_value
.get_item("boost")?
.ok_or(PyValueError::new_err("boost not found"))?
.extract::<Vec<f32>>()?;
let query =
MultiMatchQuery::try_new_with_boosts(query, columns, boost).map_err(|e| {
PyValueError::new_err(format!("Error creating MultiMatchQuery: {}", e))
})?;
Ok(query.into())
}
_ => Err(PyValueError::new_err(format!(
"Unsupported query type: {}",
query_type
))),
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-node"
version = "0.18.2-beta.0"
version = "0.19.0-beta.1"
description = "Serverless, low-latency vector database for AI applications"
license.workspace = true
edition.workspace = true

View File

@@ -60,7 +60,7 @@ fn database_new(mut cx: FunctionContext) -> JsResult<JsPromise> {
let mut conn_builder = connect(&path).storage_options(storage_options);
if let Some(interval) = read_consistency_interval {
conn_builder = conn_builder.read_consistency_interval(interval);
conn_builder = conn_builder.read_consistency_interval(Some(interval));
}
rt.spawn(async move {
let database = conn_builder.execute().await;

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.18.2-beta.0"
version = "0.19.0-beta.1"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -12,7 +12,7 @@ use super::{
Catalog, CatalogOptions, CreateDatabaseMode, CreateDatabaseRequest, DatabaseNamesRequest,
OpenDatabaseRequest,
};
use crate::connection::ConnectRequest;
use crate::connection::{ConnectRequest, DEFAULT_READ_CONSISTENCY_INTERVAL};
use crate::database::listing::{ListingDatabase, ListingDatabaseOptions};
use crate::database::{Database, DatabaseOptions};
use crate::error::{CreateDirSnafu, Error, Result};
@@ -214,7 +214,7 @@ impl Catalog for ListingCatalog {
uri: db_uri,
#[cfg(feature = "remote")]
client_config: Default::default(),
read_consistency_interval: None,
read_consistency_interval: DEFAULT_READ_CONSISTENCY_INTERVAL,
options: Default::default(),
};
@@ -241,7 +241,7 @@ impl Catalog for ListingCatalog {
uri: db_path.to_string(),
#[cfg(feature = "remote")]
client_config: Default::default(),
read_consistency_interval: None,
read_consistency_interval: DEFAULT_READ_CONSISTENCY_INTERVAL,
options: Default::default(),
};
@@ -311,7 +311,7 @@ mod tests {
#[cfg(feature = "remote")]
client_config: Default::default(),
options: Default::default(),
read_consistency_interval: None,
read_consistency_interval: DEFAULT_READ_CONSISTENCY_INTERVAL,
};
let catalog = ListingCatalog::connect(&request).await.unwrap();

View File

@@ -36,6 +36,9 @@ pub use lance_encoding::version::LanceFileVersion;
#[cfg(feature = "remote")]
use lance_io::object_store::StorageOptions;
pub(crate) const DEFAULT_READ_CONSISTENCY_INTERVAL: Option<std::time::Duration> =
Some(std::time::Duration::from_secs(5));
/// A builder for configuring a [`Connection::table_names`] operation
pub struct TableNamesBuilder {
parent: Arc<dyn Database>,
@@ -618,14 +621,15 @@ pub struct ConnectRequest {
/// The interval at which to check for updates from other processes.
///
/// If None, then consistency is not checked. For performance
/// reasons, this is the default. For strong consistency, set this to
/// If None, then consistency is not checked. For strong consistency, set this to
/// zero seconds. Then every read will check for updates from other
/// processes. As a compromise, you can set this to a non-zero timedelta
/// for eventual consistency. If more than that interval has passed since
/// the last check, then the table will be checked for updates. Note: this
/// consistency only applies to read operations. Write operations are
/// always consistent.
///
/// The default is 5 seconds.
pub read_consistency_interval: Option<std::time::Duration>,
}
@@ -643,7 +647,7 @@ impl ConnectBuilder {
uri: uri.to_string(),
#[cfg(feature = "remote")]
client_config: Default::default(),
read_consistency_interval: None,
read_consistency_interval: DEFAULT_READ_CONSISTENCY_INTERVAL,
options: HashMap::new(),
},
embedding_registry: None,
@@ -782,8 +786,7 @@ impl ConnectBuilder {
/// The interval at which to check for updates from other processes. This
/// only affects LanceDB OSS.
///
/// If left unset, consistency is not checked. For maximum read
/// performance, this is the default. For strong consistency, set this to
/// If left unset, consistency is not checked. For strong consistency, set this to
/// zero seconds. Then every read will check for updates from other processes.
/// As a compromise, set this to a non-zero duration for eventual consistency.
/// If more than that duration has passed since the last read, the read will
@@ -792,13 +795,15 @@ impl ConnectBuilder {
/// This only affects read operations. Write operations are always
/// consistent.
///
/// The default is 5 seconds.
///
/// LanceDB Cloud uses eventual consistency under the hood, and is not
/// currently configurable.
pub fn read_consistency_interval(
mut self,
read_consistency_interval: std::time::Duration,
read_consistency_interval: Option<std::time::Duration>,
) -> Self {
self.request.read_consistency_interval = Some(read_consistency_interval);
self.request.read_consistency_interval = read_consistency_interval;
self
}
@@ -882,7 +887,7 @@ impl CatalogConnectBuilder {
uri: uri.to_string(),
#[cfg(feature = "remote")]
client_config: Default::default(),
-read_consistency_interval: None,
+read_consistency_interval: DEFAULT_READ_CONSISTENCY_INTERVAL,
options: HashMap::new(),
},
}


@@ -80,5 +80,6 @@ impl FtsIndexBuilder {
}
}
pub use lance_index::scalar::inverted::query::*;
pub use lance_index::scalar::inverted::TokenizerConfig;
pub use lance_index::scalar::FullTextSearchQuery;
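A hedged sketch of what these re-exports enable at a call site (column names illustrative; the fallible `with_columns` matches the test updates later in this diff):

let fts = FullTextSearchQuery::new("hello world".into())
    .with_columns(&["title".into(), "body".into()])
    .unwrap();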


@@ -579,6 +579,15 @@ pub trait ExecutableQuery {
) -> impl Future<Output = Result<SendableRecordBatchStream>> + Send;
fn explain_plan(&self, verbose: bool) -> impl Future<Output = Result<String>> + Send;
fn analyze_plan(&self) -> impl Future<Output = Result<String>> + Send {
self.analyze_plan_with_options(QueryExecutionOptions::default())
}
fn analyze_plan_with_options(
&self,
options: QueryExecutionOptions,
) -> impl Future<Output = Result<String>> + Send;
}
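Because `analyze_plan` is a provided method that delegates to `analyze_plan_with_options`, implementors only supply the latter. A usage sketch (table setup assumed; the `metrics=` marker mirrors the tests added below):

let report = table.query().limit(10).analyze_plan().await?;
assert!(report.contains("metrics="));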
/// A query filter that can be applied to a query
@@ -765,6 +774,11 @@ impl ExecutableQuery for Query {
let query = AnyQuery::Query(self.request.clone());
self.parent.explain_plan(&query, verbose).await
}
async fn analyze_plan_with_options(&self, options: QueryExecutionOptions) -> Result<String> {
let query = AnyQuery::Query(self.request.clone());
self.parent.analyze_plan(&query, options).await
}
}
/// A request for a nearest-neighbors search into a table
@@ -1042,7 +1056,7 @@ impl VectorQuery {
})?;
let mut results = reranker
-    .rerank_hybrid(&fts_query.query, vec_results, fts_results)
+    .rerank_hybrid(&fts_query.query.query(), vec_results, fts_results)
.await?;
check_reranker_result(&results)?;
@@ -1089,6 +1103,11 @@ impl ExecutableQuery for VectorQuery {
let query = AnyQuery::VectorQuery(self.request.clone());
self.parent.explain_plan(&query, verbose).await
}
async fn analyze_plan_with_options(&self, options: QueryExecutionOptions) -> Result<String> {
let query = AnyQuery::VectorQuery(self.request.clone());
self.parent.analyze_plan(&query, options).await
}
}
impl HasQuery for VectorQuery {
@@ -1370,6 +1389,31 @@ mod tests {
}
}
#[tokio::test]
async fn test_analyze_plan() {
let tmp_dir = tempdir().unwrap();
let table = make_test_table(&tmp_dir).await;
let result = table.query().analyze_plan().await.unwrap();
assert!(result.contains("metrics="));
}
#[tokio::test]
async fn test_analyze_plan_with_options() {
let tmp_dir = tempdir().unwrap();
let table = make_test_table(&tmp_dir).await;
let result = table
.query()
.analyze_plan_with_options(QueryExecutionOptions {
max_batch_length: 10,
..Default::default()
})
.await
.unwrap();
assert!(result.contains("metrics="));
}
fn assert_plan_exists(plan: &Arc<dyn ExecutionPlan>, name: &str) -> bool {
if plan.name() == name {
return true;


@@ -52,6 +52,10 @@ impl ServerVersion {
pub fn support_multivector(&self) -> bool {
self.0 >= semver::Version::new(0, 2, 0)
}
pub fn support_structural_fts(&self) -> bool {
self.0 >= semver::Version::new(0, 3, 0)
}
}
pub const OPT_REMOTE_PREFIX: &str = "remote_database_";
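A small illustration of the version gating this enables (construction shown for the sketch only; the wrapped `semver::Version` may not be constructible outside the crate):

let v = ServerVersion(semver::Version::new(0, 3, 0));
assert!(v.support_structural_fts()); // first version advertising structural FTS
assert!(v.support_multivector());    // anything >= 0.2.0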


@@ -155,7 +155,11 @@ impl<S: HttpSend> RemoteTable<S> {
Ok(Box::pin(RecordBatchStreamAdapter::new(schema, stream)))
}
-fn apply_query_params(body: &mut serde_json::Value, params: &QueryRequest) -> Result<()> {
+fn apply_query_params(
+    &self,
+    body: &mut serde_json::Value,
+    params: &QueryRequest,
+) -> Result<()> {
body["prefilter"] = params.prefilter.into();
if let Some(offset) = params.offset {
body["offset"] = serde_json::Value::Number(serde_json::Number::from(offset));
@@ -209,10 +213,17 @@ impl<S: HttpSend> RemoteTable<S> {
message: "Wand factor is not yet supported in LanceDB Cloud".into(),
});
}
body["full_text_query"] = serde_json::json!({
"columns": full_text_search.columns,
"query": full_text_search.query,
})
if self.server_version.support_structural_fts() {
body["full_text_query"] = serde_json::json!({
"query": full_text_search.query.clone(),
});
} else {
body["full_text_query"] = serde_json::json!({
"columns": full_text_search.columns().into_iter().collect::<Vec<_>>(),
"query": full_text_search.query.query(),
})
}
}
Ok(())
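For reference, a sketch of the two request shapes this branch produces, inferred from the `json!` calls above and the updated test near the end of this file's diff:

// Servers >= 0.3.0: the structured query object is serialized as-is.
//   { "full_text_query": { "query": <structured match/phrase/boost query> } }
// Older servers: flattened column list plus the raw query text.
//   { "full_text_query": { "columns": ["a", "b"], "query": "hello world" } }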
@@ -223,7 +234,7 @@ impl<S: HttpSend> RemoteTable<S> {
mut body: serde_json::Value,
query: &VectorQueryRequest,
) -> Result<Vec<serde_json::Value>> {
Self::apply_query_params(&mut body, &query.base)?;
self.apply_query_params(&mut body, &query.base)?;
// Apply general parameters, before we dispatch based on number of query vectors.
body["distance_type"] = serde_json::json!(query.distance_type.unwrap_or_default());
@@ -325,24 +336,11 @@ impl<S: HttpSend> RemoteTable<S> {
) -> Result<Vec<Pin<Box<dyn RecordBatchStream + Send>>>> {
let request = self.client.post(&format!("/v1/table/{}/query/", self.name));
-let version = self.current_version().await;
-let mut body = serde_json::json!({ "version": version });
-let requests = match query {
-    AnyQuery::Query(query) => {
-        Self::apply_query_params(&mut body, query)?;
-        // Empty vector can be passed if no vector search is performed.
-        body["vector"] = serde_json::Value::Array(Vec::new());
-        vec![request.json(&body)]
-    }
-    AnyQuery::VectorQuery(query) => {
-        let bodies = self.apply_vector_query_params(body, query)?;
-        bodies
-            .into_iter()
-            .map(|body| request.try_clone().unwrap().json(&body))
-            .collect()
-    }
-};
+let query_bodies = self.prepare_query_bodies(query).await?;
+let requests: Vec<reqwest::RequestBuilder> = query_bodies
+    .into_iter()
+    .map(|body| request.try_clone().unwrap().json(&body))
+    .collect();
let futures = requests.into_iter().map(|req| async move {
let (request_id, response) = self.client.send(req, true).await?;
@@ -351,6 +349,22 @@ impl<S: HttpSend> RemoteTable<S> {
let streams = futures::future::try_join_all(futures).await?;
Ok(streams)
}
async fn prepare_query_bodies(&self, query: &AnyQuery) -> Result<Vec<serde_json::Value>> {
let version = self.current_version().await;
let base_body = serde_json::json!({ "version": version });
match query {
AnyQuery::Query(query) => {
let mut body = base_body.clone();
self.apply_query_params(&mut body, query)?;
// Empty vector can be passed if no vector search is performed.
body["vector"] = serde_json::Value::Array(Vec::new());
Ok(vec![body])
}
AnyQuery::VectorQuery(query) => self.apply_vector_query_params(base_body, query),
}
}
}
#[derive(Deserialize)]
@@ -422,10 +436,17 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
Ok(())
}
async fn restore(&self) -> Result<()> {
-self.check_mutable().await?;
-Err(Error::NotSupported {
-    message: "restore is not supported on LanceDB cloud.".into(),
-})
+let mut request = self
+    .client
+    .post(&format!("/v1/table/{}/restore/", self.name));
+let version = self.current_version().await;
+let body = serde_json::json!({ "version": version });
+request = request.json(&body);
+let (request_id, response) = self.client.send(request, true).await?;
+self.check_table_response(&request_id, response).await?;
+self.checkout_latest().await?;
+Ok(())
}
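With restore implemented remotely, the flow now matches the local API: the server restores the checked-out version, and the handle moves back to the (new) latest version so writes succeed. A hedged sketch (version number illustrative; `checkout` assumed from this crate's table API):

table.checkout(1).await?;        // time travel to version 1 (read-only)
table.restore().await?;          // POSTs /restore/, then checks out latest
table.delete("id = 42").await?;  // mutations are permitted again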
async fn list_versions(&self) -> Result<Vec<Version>> {
@@ -559,6 +580,94 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
)?))
}
}
async fn explain_plan(&self, query: &AnyQuery, verbose: bool) -> Result<String> {
let base_request = self
.client
.post(&format!("/v1/table/{}/explain_plan/", self.name));
let query_bodies = self.prepare_query_bodies(query).await?;
let requests: Vec<reqwest::RequestBuilder> = query_bodies
.into_iter()
.map(|query_body| {
let explain_request = serde_json::json!({
"verbose": verbose,
"query": query_body
});
base_request.try_clone().unwrap().json(&explain_request)
})
.collect::<Vec<_>>();
let futures = requests.into_iter().map(|req| async move {
let (request_id, response) = self.client.send(req, true).await?;
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
serde_json::from_str(&body).map_err(|e| Error::Http {
source: format!("Failed to parse explain plan: {}", e).into(),
request_id,
status_code: None,
})
});
let plan_texts = futures::future::try_join_all(futures).await?;
let final_plan = if plan_texts.len() > 1 {
plan_texts
.into_iter()
.enumerate()
.map(|(i, plan)| format!("--- Plan #{} ---\n{}", i + 1, plan))
.collect::<Vec<_>>()
.join("\n\n")
} else {
plan_texts.into_iter().next().unwrap_or_default()
};
Ok(final_plan)
}
async fn analyze_plan(
&self,
query: &AnyQuery,
_options: QueryExecutionOptions,
) -> Result<String> {
let request = self
.client
.post(&format!("/v1/table/{}/analyze_plan/", self.name));
let query_bodies = self.prepare_query_bodies(query).await?;
let requests: Vec<reqwest::RequestBuilder> = query_bodies
.into_iter()
.map(|body| request.try_clone().unwrap().json(&body))
.collect();
let futures = requests.into_iter().map(|req| async move {
let (request_id, response) = self.client.send(req, true).await?;
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
serde_json::from_str(&body).map_err(|e| Error::Http {
source: format!("Failed to execute analyze plan: {}", e).into(),
request_id,
status_code: None,
})
});
let analyze_result_texts = futures::future::try_join_all(futures).await?;
let final_analyze = if analyze_result_texts.len() > 1 {
analyze_result_texts
.into_iter()
.enumerate()
.map(|(i, plan)| format!("--- Query #{} ---\n{}", i + 1, plan))
.collect::<Vec<_>>()
.join("\n\n")
} else {
analyze_result_texts.into_iter().next().unwrap_or_default()
};
Ok(final_analyze)
}
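When a query fans out into several requests (e.g. multiple query vectors), the per-request results are stitched together; the combined output would look roughly like this sketch:

// explain_plan with two query vectors:
//   --- Plan #1 ---
//   <plan text from request 1>
//
//   --- Plan #2 ---
//   <plan text from request 2>
//
// analyze_plan joins results the same way, with "--- Query #N ---" headers.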
async fn update(&self, update: UpdateBuilder) -> Result<u64> {
self.check_mutable().await?;
let request = self
@@ -581,6 +690,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
Ok(0) // TODO: support returning number of modified rows once supported in SaaS.
}
async fn delete(&self, predicate: &str) -> Result<()> {
self.check_mutable().await?;
let body = serde_json::json!({ "predicate": predicate });
@@ -1584,7 +1694,18 @@ mod tests {
"prefilter": true,
"version": null
});
-assert_eq!(body, expected_body);
+let expected_body_2 = serde_json::json!({
+    "full_text_query": {
+        "columns": ["b","a"],
+        "query": "hello world",
+    },
+    "k": 10,
+    "vector": [],
+    "with_row_id": true,
+    "prefilter": true,
+    "version": null
+});
+assert!(body == expected_body || body == expected_body_2);
let data = RecordBatch::try_new(
Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)])),
@@ -1603,7 +1724,8 @@ mod tests {
.query()
.full_text_search(
FullTextSearchQuery::new("hello world".into())
-        .columns(Some(vec!["a".into(), "b".into()])),
+        .with_columns(&["a".into(), "b".into()])
+        .unwrap(),
)
.with_row_id()
.limit(10)


@@ -33,7 +33,7 @@ use lance::dataset::{
use lance::dataset::{MergeInsertBuilder as LanceMergeInsertBuilder, WhenNotMatchedBySource};
use lance::index::vector::utils::infer_vector_dim;
use lance::io::WrappingObjectStore;
-use lance_datafusion::exec::execute_plan;
+use lance_datafusion::exec::{analyze_plan as lance_analyze_plan, execute_plan};
use lance_datafusion::utils::StreamingWriteSource;
use lance_index::vector::hnsw::builder::HnswBuildParams;
use lance_index::vector::ivf::IvfBuildParams;
@@ -433,6 +433,12 @@ pub trait BaseTable: std::fmt::Display + std::fmt::Debug + Send + Sync {
Ok(format!("{}", display.indent(verbose)))
}
async fn analyze_plan(
&self,
query: &AnyQuery,
options: QueryExecutionOptions,
) -> Result<String>;
/// Add new records to the table.
async fn add(
&self,
@@ -2192,6 +2198,15 @@ impl BaseTable for NativeTable {
self.generic_query(query, options).await
}
async fn analyze_plan(
&self,
query: &AnyQuery,
options: QueryExecutionOptions,
) -> Result<String> {
let plan = self.create_plan(query, options).await?;
Ok(lance_analyze_plan(plan, Default::default()).await?)
}
async fn merge_insert(
&self,
params: MergeInsertBuilder,
@@ -2611,7 +2626,7 @@ mod tests {
let dataset_path = tmp_dir.path().join("test.lance");
let uri = dataset_path.to_str().unwrap();
let conn = connect(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();
@@ -2694,7 +2709,7 @@ mod tests {
let dataset_path = tmp_dir.path().join("test.lance");
let uri = dataset_path.to_str().unwrap();
let conn = connect(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();
@@ -2891,7 +2906,7 @@ mod tests {
let dataset_path = tmp_dir.path().join("test.lance");
let uri = dataset_path.to_str().unwrap();
let conn = connect(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();
@@ -3462,7 +3477,8 @@ mod tests {
let mut conn2 = ConnectBuilder::new(uri);
if let Some(interval) = interval {
-conn2 = conn2.read_consistency_interval(std::time::Duration::from_millis(interval));
+conn2 = conn2
+    .read_consistency_interval(Some(std::time::Duration::from_millis(interval)));
}
let conn2 = conn2.execute().await.unwrap();
let table2 = conn2.open_table("my_table").execute().await.unwrap();
@@ -3498,7 +3514,7 @@ mod tests {
let uri = tmp_dir.path().to_str().unwrap();
let conn = ConnectBuilder::new(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();
@@ -3519,7 +3535,7 @@ mod tests {
let uri = tmp_dir.path().to_str().unwrap();
let conn = ConnectBuilder::new(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();
@@ -3594,7 +3610,7 @@ mod tests {
let uri = tmp_dir.path().to_str().unwrap();
let conn = ConnectBuilder::new(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();
@@ -3656,7 +3672,7 @@ mod tests {
let uri = tmp_dir.path().to_str().unwrap();
let conn = ConnectBuilder::new(uri)
-    .read_consistency_interval(Duration::from_secs(0))
+    .read_consistency_interval(Some(Duration::from_secs(0)))
.execute()
.await
.unwrap();


@@ -7,6 +7,7 @@ use std::{
time::{self, Duration, Instant},
};
use futures::FutureExt;
use lance::Dataset;
use tokio::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};
@@ -22,13 +23,16 @@ pub struct DatasetConsistencyWrapper(Arc<RwLock<DatasetRef>>);
///
/// The dataset is lazily loaded, and starts off as None. On the first access,
/// the dataset is loaded.
-#[derive(Debug, Clone)]
+#[derive(Debug)]
enum DatasetRef {
/// In this mode, the dataset is always the latest version.
Latest {
dataset: Dataset,
read_consistency_interval: Option<Duration>,
last_consistency_check: Option<time::Instant>,
/// A background task loading the next version of the dataset. This happens
/// in the background so as not to block the current thread.
refresh_task: Option<tokio::task::JoinHandle<Result<Dataset>>>,
},
/// In this mode, the dataset is a specific version. It cannot be mutated.
TimeTravel { dataset: Dataset, version: u64 },
@@ -41,9 +45,19 @@ impl DatasetRef {
Self::Latest {
dataset,
last_consistency_check,
refresh_task,
..
} => {
-dataset.checkout_latest().await?;
+// Replace the refresh task
+if let Some(refresh_task) = refresh_task {
+    refresh_task.abort();
+}
+let mut new_dataset = dataset.clone();
+refresh_task.replace(tokio::spawn(async move {
+    new_dataset.checkout_latest().await?;
+    Ok(new_dataset)
+}));
last_consistency_check.replace(Instant::now());
}
Self::TimeTravel { dataset, version } => {
@@ -57,26 +71,24 @@ impl DatasetRef {
matches!(self, Self::Latest { .. })
}
async fn need_reload(&self) -> Result<bool> {
Ok(match self {
Self::Latest { dataset, .. } => {
dataset.latest_version_id().await? != dataset.version().version
}
Self::TimeTravel { dataset, version } => dataset.version().version != *version,
})
fn strong_consistency(&self) -> bool {
matches!(
self,
Self::Latest { read_consistency_interval: Some(interval), .. }
if interval.as_nanos() == 0
)
}
async fn as_latest(&mut self, read_consistency_interval: Option<Duration>) -> Result<()> {
match self {
Self::Latest { .. } => Ok(()),
Self::TimeTravel { dataset, .. } => {
-dataset
-    .checkout_version(dataset.latest_version_id().await?)
-    .await?;
+dataset.checkout_latest().await?;
*self = Self::Latest {
dataset: dataset.clone(),
read_consistency_interval,
last_consistency_check: Some(Instant::now()),
refresh_task: None,
};
Ok(())
}
@@ -114,13 +126,74 @@ impl DatasetRef {
match self {
Self::Latest {
dataset: ref mut ds,
refresh_task,
last_consistency_check,
..
} => {
*ds = dataset;
if let Some(refresh_task) = refresh_task {
refresh_task.abort();
}
*refresh_task = None;
*last_consistency_check = Some(Instant::now());
}
_ => unreachable!("Dataset should be in latest mode at this point"),
}
}
/// Wait for the background refresh task to complete.
async fn await_refresh(&mut self) -> Result<()> {
if let Self::Latest {
refresh_task: Some(refresh_task),
read_consistency_interval,
..
} = self
{
let dataset = refresh_task.await.expect("Refresh task panicked")?;
*self = Self::Latest {
dataset,
read_consistency_interval: *read_consistency_interval,
last_consistency_check: Some(Instant::now()),
refresh_task: None,
};
}
Ok(())
}
/// Check if background refresh task is done, and if so, update the dataset.
fn check_refresh(&mut self) -> Result<()> {
if let Self::Latest {
refresh_task: Some(refresh_task),
read_consistency_interval,
..
} = self
{
if refresh_task.is_finished() {
let dataset = refresh_task
.now_or_never()
.unwrap()
.expect("Refresh task panicked")?;
*self = Self::Latest {
dataset,
read_consistency_interval: *read_consistency_interval,
last_consistency_check: Some(Instant::now()),
refresh_task: None,
};
}
}
Ok(())
}
fn refresh_is_ready(&self) -> bool {
matches!(
self,
Self::Latest {
refresh_task: Some(refresh_task),
..
}
if refresh_task.is_finished()
)
}
}
impl DatasetConsistencyWrapper {
@@ -130,6 +203,7 @@ impl DatasetConsistencyWrapper {
dataset,
read_consistency_interval,
last_consistency_check: Some(Instant::now()),
refresh_task: None,
})))
}
@@ -188,18 +262,9 @@ impl DatasetConsistencyWrapper {
}
pub async fn reload(&self) -> Result<()> {
-if !self.0.read().await.need_reload().await? {
-    return Ok(());
-}
let mut write_guard = self.0.write().await;
-// on lock escalation -- check if someone else has already reloaded
-if !write_guard.need_reload().await? {
-    return Ok(());
-}
-// actually need reloading
-write_guard.reload().await
+write_guard.reload().await?;
+write_guard.await_refresh().await
}
/// Returns the version, if in time travel mode, or None otherwise
@@ -245,9 +310,26 @@ impl DatasetConsistencyWrapper {
/// Ensures that the dataset is loaded and up-to-date with consistency and
/// version parameters.
async fn ensure_up_to_date(&self) -> Result<()> {
// We may have previously created a background task to fetch the new
// version of the dataset. If that task is done, we should update the
// dataset.
{
let read_guard = self.0.read().await;
if read_guard.refresh_is_ready() {
drop(read_guard);
self.0.write().await.check_refresh()?;
}
}
if !self.is_up_to_date().await? {
self.reload().await?;
}
// If we are in strong consistency mode, we should await the refresh task.
if self.0.read().await.strong_consistency() {
self.0.write().await.await_refresh().await?;
}
Ok(())
}
}
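Taken together, a hedged summary of the refresh mechanics introduced above:

// - DatasetRef::reload aborts any stale refresh task, then spawns
//   checkout_latest() on a clone, so the read path is not blocked on I/O.
// - refresh_is_ready() / check_refresh() let a later read adopt the
//   refreshed dataset once the background task has finished.
// - Under strong consistency (read_consistency_interval = Some(0)),
//   ensure_up_to_date() awaits the refresh (await_refresh) before
//   serving the read.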


@@ -135,6 +135,7 @@ pub fn supported_btree_data_type(dtype: &DataType) -> bool {
| DataType::Date32
| DataType::Date64
| DataType::Timestamp(_, _)
| DataType::FixedSizeBinary(_)
)
}