feat: support a bunch of FTS features in JS SDK (#2431)

- operator for match query
- slop for phrase query
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Introduced support for boolean full-text search queries with AND/OR logic and occurrence conditions.
  - Added operator options for match and multi-match queries to control term combination logic.
  - Enabled phrase queries to specify proximity (slop) for flexible phrase matching.
  - Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery` class for enhanced query expressiveness.

- **Bug Fixes**
  - Improved validation and error handling for invalid operator and occurrence inputs in full-text queries.

- **Tests**
  - Expanded test coverage with new cases for boolean queries and operator-based full-text searches.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
This commit is contained in:
BubbleCal
2025-06-12 17:04:19 +08:00
committed by GitHub
parent 84ded9d678
commit fec8d58f06
14 changed files with 291 additions and 8 deletions

View File

@@ -42,6 +42,7 @@ duckdb.query("SELECT * FROM arrow_table")
Have the required imports before doing any querying.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:import-lancedb"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-session-context"
@@ -51,6 +52,7 @@ Have the required imports before doing any querying.
Register the table created with the Datafusion session context.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:lance_sql_basic"
```

View File

@@ -0,0 +1,53 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BooleanQuery
# Class: BooleanQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BooleanQuery()
```ts
new BooleanQuery(queries): BooleanQuery
```
Creates an instance of BooleanQuery.
#### Parameters
* **queries**: [[`Occur`](../enumerations/Occur.md), [`FullTextQuery`](../interfaces/FullTextQuery.md)][]
An array of `[Occur, FullTextQuery]` pairs to combine.
`Occur` specifies whether each query must match or should match.
#### Returns
[`BooleanQuery`](BooleanQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)

View File

@@ -40,6 +40,7 @@ Creates an instance of MatchQuery.
- `boost`: The boost factor for the query (default is 1.0).
- `fuzziness`: The fuzziness level for the query (default is 0).
- `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boost?**: `number`
@@ -47,6 +48,8 @@ Creates an instance of MatchQuery.
* **options.maxExpansions?**: `number`
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MatchQuery`](MatchQuery.md)

View File

@@ -38,9 +38,12 @@ Creates an instance of MultiMatchQuery.
* **options?**
Optional parameters for the multi-match query.
- `boosts`: An array of boost factors for each column (default is 1.0 for all).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boosts?**: `number`[]
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MultiMatchQuery`](MultiMatchQuery.md)

View File

@@ -19,7 +19,10 @@ including methods to retrieve the query type and convert the query to a dictiona
### new PhraseQuery()
```ts
new PhraseQuery(query, column): PhraseQuery
new PhraseQuery(
query,
column,
options?): PhraseQuery
```
Creates an instance of `PhraseQuery`.
@@ -32,6 +35,12 @@ Creates an instance of `PhraseQuery`.
* **column**: `string`
The name of the column to search within.
* **options?**
Optional parameters for the phrase query.
- `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
* **options.slop?**: `number`
#### Returns
[`PhraseQuery`](PhraseQuery.md)
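
The `slop` description above can be made concrete with a small, self-contained sketch. This is a toy model and not LanceDB's implementation: the helpers `tokenize`, `minGaps`, and `phraseMatches` are hypothetical names introduced only for illustration, and the sketch assumes the phrase terms must appear in order, with `slop` bounding the total number of intervening tokens. Real engines may count positions differently.

```ts
// Toy model only (assumption): in-order phrase matching where `slop` bounds
// the total number of intervening tokens between the phrase terms.

// Hypothetical helper: lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Hypothetical helper: smallest total number of intervening tokens over all
// in-order alignments of `phrase` inside `doc`, or null if no alignment exists.
function minGaps(
  doc: string[],
  phrase: string[],
  termIdx: number = 0,
  prevPos: number = -1,
): number | null {
  if (termIdx === phrase.length) return 0;
  let best: number | null = null;
  for (let pos = prevPos + 1; pos < doc.length; pos++) {
    if (doc[pos] !== phrase[termIdx]) continue;
    const rest = minGaps(doc, phrase, termIdx + 1, pos);
    if (rest === null) continue;
    // No gap is counted before the first phrase term.
    const gap = termIdx === 0 ? 0 : pos - prevPos - 1;
    const total = gap + rest;
    if (best === null || total < best) best = total;
  }
  return best;
}

// Hypothetical helper: does `doc` contain `phrase` within `slop` gaps?
function phraseMatches(doc: string, phrase: string, slop: number = 0): boolean {
  const phraseTokens = tokenize(phrase);
  if (phraseTokens.length === 0) return false;
  const gaps = minGaps(tokenize(doc), phraseTokens);
  return gaps !== null && gaps <= slop;
}

const doc = "the quick brown fox";
console.log(phraseMatches(doc, "quick fox"));    // exact phrase, slop 0
console.log(phraseMatches(doc, "quick fox", 1)); // one intervening token allowed
```

With the default `slop` of 0 the terms must be adjacent, so "quick fox" does not match here. Allowing one intervening position lets "brown" sit between the two terms.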

View File

@@ -15,6 +15,14 @@ Enum representing the types of full-text queries supported.
## Enumeration Members
### Boolean
```ts
Boolean: "boolean";
```
***
### Boost
```ts

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Occur
# Enumeration: Occur
Enum representing the occurrence of terms in full-text queries.
- `Must`: The term must be present in the document.
- `Should`: The term should contribute to the document score, but is not required.
## Enumeration Members
### Must
```ts
Must: "MUST";
```
***
### Should
```ts
Should: "SHOULD";
```
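
To illustrate how `Must` and `Should` clauses might combine, here is a minimal sketch that does not use LanceDB at all. It assumes the common search-engine convention: when any `Must` clause is present, every `Must` clause has to match and `Should` clauses only influence scoring; when there are only `Should` clauses, at least one has to match. The helpers `containsTerm` and `booleanMatch` are hypothetical, and scoring is ignored entirely.

```ts
// Toy model only (assumption): filtering semantics of MUST / SHOULD clauses.
enum Occur {
  Must = "MUST",
  Should = "SHOULD",
}

type Predicate = (doc: string) => boolean;

// Hypothetical helper: true when the document contains `term` as a whole token.
function containsTerm(term: string): Predicate {
  const needle = term.toLowerCase();
  return (doc) => doc.toLowerCase().split(/\s+/).indexOf(needle) !== -1;
}

// Hypothetical helper: evaluate a list of (Occur, predicate) clauses.
function booleanMatch(doc: string, clauses: [Occur, Predicate][]): boolean {
  const must = clauses.filter(([occur]) => occur === Occur.Must).map(([, p]) => p);
  const should = clauses.filter(([occur]) => occur === Occur.Should).map(([, p]) => p);
  if (must.length > 0) {
    // SHOULD clauses would only affect ranking here, which this sketch omits.
    return must.every((p) => p(doc));
  }
  return should.some((p) => p(doc));
}

const docs = ["hello world", "goodbye world"];

const shouldHits = docs.filter((d) =>
  booleanMatch(d, [
    [Occur.Should, containsTerm("hello")],
    [Occur.Should, containsTerm("goodbye")],
  ]),
);

const mustHits = docs.filter((d) =>
  booleanMatch(d, [
    [Occur.Must, containsTerm("hello")],
    [Occur.Must, containsTerm("world")],
  ]),
);

console.log(shouldHits.length, mustHits.length);
```

The two sample documents are the same ones used by the boolean query test in this commit, where the `Should` query is expected to return both rows and the `Must` query only one.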

View File

@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Operator
# Enumeration: Operator
Enum representing the logical operators used in full-text queries.
- `And`: All terms must match.
- `Or`: At least one term must match.
## Enumeration Members
### And
```ts
And: "AND";
```
***
### Or
```ts
Or: "OR";
```
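
The difference between the two operators can be sketched without the SDK. The snippet below is a toy model, not LanceDB's matching logic: `tokenize` and `matches` are hypothetical helpers, and tokenization is simply a lowercase whitespace split.

```ts
// Toy model only (assumption): how AND / OR combine the terms of one query.
enum Operator {
  And = "AND",
  Or = "OR",
}

// Hypothetical helper: lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Hypothetical helper: does `doc` match `query` under the given operator?
function matches(
  doc: string,
  query: string,
  operator: Operator = Operator.Or,
): boolean {
  const docTerms = tokenize(doc);
  const queryTerms = tokenize(query);
  if (queryTerms.length === 0) return false;
  const present = (term: string) => docTerms.indexOf(term) !== -1;
  return operator === Operator.And
    ? queryTerms.every(present) // all terms must match
    : queryTerms.some(present); // at least one term must match
}

const docs = ["hello world", "goodbye world"];
const orHits = docs.filter((d) => matches(d, "hello world", Operator.Or));
const andHits = docs.filter((d) => matches(d, "hello world", Operator.And));
console.log(orHits.length, andHits.length);
```

Under `Or`, both documents qualify because each contains "world". Under `And`, only the document containing both "hello" and "world" remains.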

View File

@@ -12,9 +12,12 @@
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
- [Occur](enumerations/Occur.md)
- [Operator](enumerations/Operator.md)
## Classes
- [BooleanQuery](classes/BooleanQuery.md)
- [BoostQuery](classes/BoostQuery.md)
- [Connection](classes/Connection.md)
- [Index](classes/Index.md)

View File

@@ -7,3 +7,4 @@ tantivy==0.20.1
--extra-index-url https://download.pytorch.org/whl/cpu
torch
polars>=0.19, <=1.3.0
datafusion

View File

@@ -33,7 +33,12 @@ import {
register,
} from "../lancedb/embedding";
import { Index } from "../lancedb/indices";
import { instanceOfFullTextQuery } from "../lancedb/query";
import {
BooleanQuery,
Occur,
Operator,
instanceOfFullTextQuery,
} from "../lancedb/query";
import exp = require("constants");
describe.each([arrow15, arrow16, arrow17, arrow18])(
@@ -1531,6 +1536,18 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
const results = await table.search("hello").toArray();
expect(results[0].text).toBe(data[0].text);
const results2 = await table
.search(new MatchQuery("hello world", "text"))
.toArray();
expect(results2.length).toBe(2);
const results3 = await table
.search(
new MatchQuery("hello world", "text", { operator: Operator.And }),
)
.toArray();
expect(results3.length).toBe(1);
});
test("full text search without lowercase", async () => {
@@ -1609,6 +1626,38 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(resultSet.has("food")).toBe(true);
});
test("full text search boolean query", async () => {
const db = await connect(tmpDir.name);
const data = [
{ text: "hello world", vector: [0.1, 0.2, 0.3] },
{ text: "goodbye world", vector: [0.4, 0.5, 0.6] },
];
const table = await db.createTable("test", data);
await table.createIndex("text", {
config: Index.fts({ withPosition: false }),
});
const shouldResults = await table
.search(
new BooleanQuery([
[Occur.Should, new MatchQuery("hello", "text")],
[Occur.Should, new MatchQuery("goodbye", "text")],
]),
)
.toArray();
expect(shouldResults.length).toBe(2);
const mustResults = await table
.search(
new BooleanQuery([
[Occur.Must, new MatchQuery("hello", "text")],
[Occur.Must, new MatchQuery("world", "text")],
]),
)
.toArray();
expect(mustResults.length).toBe(1);
});
test.each([
[0.4, 0.5, 0.599], // number[]
Float32Array.of(0.4, 0.5, 0.599), // Float32Array

View File

@@ -64,7 +64,10 @@ export {
PhraseQuery,
BoostQuery,
MultiMatchQuery,
BooleanQuery,
FullTextQueryType,
Operator,
Occur,
} from "./query";
export {

View File

@@ -762,6 +762,29 @@ export enum FullTextQueryType {
MatchPhrase = "match_phrase",
Boost = "boost",
MultiMatch = "multi_match",
Boolean = "boolean",
}
/**
* Enum representing the logical operators used in full-text queries.
*
* - `And`: All terms must match.
* - `Or`: At least one term must match.
*/
export enum Operator {
And = "AND",
Or = "OR",
}
/**
* Enum representing the occurrence of terms in full-text queries.
*
* - `Must`: The term must be present in the document.
* - `Should`: The term should contribute to the document score, but is not required.
*/
export enum Occur {
Must = "MUST",
Should = "SHOULD",
}
/**
@@ -791,6 +814,7 @@ export function instanceOfFullTextQuery(obj: any): obj is FullTextQuery {
export class MatchQuery implements FullTextQuery {
/** @ignore */
public readonly inner: JsFullTextQuery;
/**
* Creates an instance of MatchQuery.
*
@@ -800,6 +824,7 @@ export class MatchQuery implements FullTextQuery {
* - `boost`: The boost factor for the query (default is 1.0).
* - `fuzziness`: The fuzziness level for the query (default is 0).
* - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
* - `operator`: The logical operator to use for combining terms in the query (default is "OR").
*/
constructor(
query: string,
@@ -808,6 +833,7 @@ export class MatchQuery implements FullTextQuery {
boost?: number;
fuzziness?: number;
maxExpansions?: number;
operator?: Operator;
},
) {
let fuzziness = options?.fuzziness;
@@ -820,6 +846,7 @@ export class MatchQuery implements FullTextQuery {
options?.boost ?? 1.0,
fuzziness,
options?.maxExpansions ?? 50,
options?.operator ?? Operator.Or,
);
}
@@ -836,9 +863,11 @@ export class PhraseQuery implements FullTextQuery {
*
* @param query - The phrase to search for in the specified column.
* @param column - The name of the column to search within.
* @param options - Optional parameters for the phrase query.
* - `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
*/
constructor(query: string, column: string) {
this.inner = JsFullTextQuery.phraseQuery(query, column);
constructor(query: string, column: string, options?: { slop?: number }) {
this.inner = JsFullTextQuery.phraseQuery(query, column, options?.slop ?? 0);
}
queryType(): FullTextQueryType {
@@ -889,18 +918,21 @@ export class MultiMatchQuery implements FullTextQuery {
* @param columns - An array of column names to search within.
* @param options - Optional parameters for the multi-match query.
* - `boosts`: An array of boost factors for each column (default is 1.0 for all).
* - `operator`: The logical operator to use for combining terms in the query (default is "OR").
*/
constructor(
query: string,
columns: string[],
options?: {
boosts?: number[];
operator?: Operator;
},
) {
this.inner = JsFullTextQuery.multiMatchQuery(
query,
columns,
options?.boosts,
options?.operator ?? Operator.Or,
);
}
@@ -908,3 +940,23 @@ export class MultiMatchQuery implements FullTextQuery {
return FullTextQueryType.MultiMatch;
}
}
export class BooleanQuery implements FullTextQuery {
/** @ignore */
public readonly inner: JsFullTextQuery;
/**
* Creates an instance of BooleanQuery.
*
* @param queries - An array of (Occur, FullTextQuery objects) to combine.
* Occur specifies whether the query must match, or should match.
*/
constructor(queries: [Occur, FullTextQuery][]) {
this.inner = JsFullTextQuery.booleanQuery(
queries.map(([occur, query]) => [occur, query.inner]),
);
}
queryType(): FullTextQueryType {
return FullTextQueryType.Boolean;
}
}

View File

@@ -4,7 +4,8 @@
use std::sync::Arc;
use lancedb::index::scalar::{
BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, PhraseQuery,
BooleanQuery, BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, Occur,
Operator, PhraseQuery,
};
use lancedb::query::ExecutableQuery;
use lancedb::query::Query as LanceDbQuery;
@@ -308,6 +309,7 @@ impl JsFullTextQuery {
boost: f64,
fuzziness: Option<u32>,
max_expansions: u32,
operator: String,
) -> napi::Result<Self> {
Ok(Self {
inner: MatchQuery::new(query)
@@ -315,14 +317,22 @@ impl JsFullTextQuery {
.with_boost(boost as f32)
.with_fuzziness(fuzziness)
.with_max_expansions(max_expansions as usize)
.with_operator(
Operator::try_from(operator.as_str()).map_err(|e| {
napi::Error::from_reason(format!("Invalid operator: {}", e))
})?,
)
.into(),
})
}
#[napi(factory)]
pub fn phrase_query(query: String, column: String) -> napi::Result<Self> {
pub fn phrase_query(query: String, column: String, slop: u32) -> napi::Result<Self> {
Ok(Self {
inner: PhraseQuery::new(query).with_column(Some(column)).into(),
inner: PhraseQuery::new(query)
.with_column(Some(column))
.with_slop(slop)
.into(),
})
}
@@ -348,6 +358,7 @@ impl JsFullTextQuery {
query: String,
columns: Vec<String>,
boosts: Option<Vec<f64>>,
operator: String,
) -> napi::Result<Self> {
let q = match boosts {
Some(boosts) => MultiMatchQuery::try_new(query, columns)
@@ -358,7 +369,37 @@ impl JsFullTextQuery {
napi::Error::from_reason(format!("Failed to create multi match query: {}", e))
})?;
Ok(Self { inner: q.into() })
let operator = Operator::try_from(operator.as_str()).map_err(|e| {
napi::Error::from_reason(format!("Invalid operator for multi match query: {}", e))
})?;
Ok(Self {
inner: q.with_operator(operator).into(),
})
}
#[napi(factory)]
pub fn boolean_query(queries: Vec<(String, &JsFullTextQuery)>) -> napi::Result<Self> {
let mut sub_queries = Vec::with_capacity(queries.len());
for (occur, q) in queries {
let occur = Occur::try_from(occur.as_str())
.map_err(|e| napi::Error::from_reason(e.to_string()))?;
sub_queries.push((occur, q.inner.clone()));
}
Ok(Self {
inner: BooleanQuery::new(sub_queries).into(),
})
}
#[napi(getter)]
pub fn query_type(&self) -> String {
match self.inner {
FtsQuery::Match(_) => "match".to_string(),
FtsQuery::Phrase(_) => "phrase".to_string(),
FtsQuery::Boost(_) => "boost".to_string(),
FtsQuery::MultiMatch(_) => "multi_match".to_string(),
FtsQuery::Boolean(_) => "boolean".to_string(),
}
}
}