feat: support a bunch of FTS features in JS SDK (#2431)

- operator for match query
- slop for phrase query
- boolean query

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Introduced support for boolean full-text search queries with AND/OR
    logic and occurrence conditions.
  - Added operator options for match and multi-match queries to control
    term combination logic.
  - Enabled phrase queries to specify proximity (slop) for flexible phrase
    matching.
  - Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery`
    class for enhanced query expressiveness.

- **Bug Fixes**
  - Improved validation and error handling for invalid operator and
    occurrence inputs in full-text queries.

- **Tests**
  - Expanded test coverage with new cases for boolean queries and
    operator-based full-text searches.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
BubbleCal
2025-06-12 17:04:19 +08:00
committed by GitHub
parent 84ded9d678
commit fec8d58f06
14 changed files with 291 additions and 8 deletions


@@ -42,6 +42,7 @@ duckdb.query("SELECT * FROM arrow_table")
Have the required imports before doing any querying.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:import-lancedb"
--8<-- "python/python/tests/docs/test_guide_tables.py:import-session-context"
@@ -51,6 +52,7 @@ Have the required imports before doing any querying.
Register the table created with the Datafusion session context.
=== "Python"
```python
--8<-- "python/python/tests/docs/test_guide_tables.py:lance_sql_basic"
```


@@ -0,0 +1,53 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / BooleanQuery
# Class: BooleanQuery
Represents a full-text query interface.
This interface defines the structure and behavior for full-text queries,
including methods to retrieve the query type and convert the query to a dictionary format.
## Implements
- [`FullTextQuery`](../interfaces/FullTextQuery.md)
## Constructors
### new BooleanQuery()
```ts
new BooleanQuery(queries): BooleanQuery
```
Creates an instance of BooleanQuery.
#### Parameters
* **queries**: [[`Occur`](../enumerations/Occur.md), [`FullTextQuery`](../interfaces/FullTextQuery.md)][]
An array of (Occur, FullTextQuery objects) to combine.
Occur specifies whether the query must match, or should match.
#### Returns
[`BooleanQuery`](BooleanQuery.md)
## Methods
### queryType()
```ts
queryType(): FullTextQueryType
```
The type of the full-text query.
#### Returns
[`FullTextQueryType`](../enumerations/FullTextQueryType.md)
#### Implementation of
[`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype)
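
As an illustration of how the `[Occur, FullTextQuery]` pairs might combine, here is a minimal in-memory sketch. It is a toy model, not LanceDB's implementation: `matchesBoolean`, `hasTerm`, and the rule that `SHOULD` clauses are only required when no `MUST` clause is present are assumptions (the rule is borrowed from the usual Lucene-style convention), and scoring is ignored entirely.

```typescript
// Toy model of MUST/SHOULD combination; not the LanceDB implementation.
// All names here are hypothetical and exist only for this illustration.
type OccurKind = "MUST" | "SHOULD";
type Predicate = (doc: string) => boolean;

// Builds a predicate that is true when the document contains `term`
// as a whitespace-separated word (case-insensitive).
const hasTerm = (term: string): Predicate => (doc) =>
  doc.toLowerCase().split(/\s+/).indexOf(term.toLowerCase()) !== -1;

function matchesBoolean(doc: string, clauses: [OccurKind, Predicate][]): boolean {
  const must = clauses.filter(([occur]) => occur === "MUST");
  const should = clauses.filter(([occur]) => occur === "SHOULD");
  if (must.length > 0) {
    // Assumption: with MUST clauses present, SHOULD clauses only affect scoring.
    return must.every(([, pred]) => pred(doc));
  }
  // Assumption: with no MUST clause, at least one SHOULD clause has to match.
  return should.some(([, pred]) => pred(doc));
}

const docs = ["hello world", "goodbye world"];

const shouldHits = docs.filter((d) =>
  matchesBoolean(d, [
    ["SHOULD", hasTerm("hello")],
    ["SHOULD", hasTerm("goodbye")],
  ]),
);

const mustHits = docs.filter((d) =>
  matchesBoolean(d, [
    ["MUST", hasTerm("hello")],
    ["MUST", hasTerm("world")],
  ]),
);

console.log(shouldHits.length, mustHits.length);
```

Under these assumptions the two counts agree with the expectations in the `full text search boolean query` test added by this change, which uses the same two documents.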


@@ -40,6 +40,7 @@ Creates an instance of MatchQuery.
- `boost`: The boost factor for the query (default is 1.0).
- `fuzziness`: The fuzziness level for the query (default is 0).
- `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boost?**: `number`
@@ -47,6 +48,8 @@ Creates an instance of MatchQuery.
* **options.maxExpansions?**: `number`
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MatchQuery`](MatchQuery.md)


@@ -38,9 +38,12 @@ Creates an instance of MultiMatchQuery.
* **options?**
Optional parameters for the multi-match query.
- `boosts`: An array of boost factors for each column (default is 1.0 for all).
- `operator`: The logical operator to use for combining terms in the query (default is "OR").
* **options.boosts?**: `number`[]
* **options.operator?**: [`Operator`](../enumerations/Operator.md)
#### Returns
[`MultiMatchQuery`](MultiMatchQuery.md)


@@ -19,7 +19,10 @@ including methods to retrieve the query type and convert the query to a dictiona
### new PhraseQuery()
```ts
-new PhraseQuery(query, column): PhraseQuery
+new PhraseQuery(
+   query,
+   column,
+   options?): PhraseQuery
```
Creates an instance of `PhraseQuery`.
@@ -32,6 +35,12 @@ Creates an instance of `PhraseQuery`.
* **column**: `string`
The name of the column to search within.
* **options?**
Optional parameters for the phrase query.
- `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
* **options.slop?**: `number`
#### Returns
[`PhraseQuery`](PhraseQuery.md)
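
To make the `slop` description concrete, the following sketch models it on plain strings. It is only an approximation for illustration: `tokenize` and `matchesPhrase` are hypothetical helpers, terms are required to appear in query order, and `slop` is treated as a total budget of skipped positions across the whole phrase. The actual index may define proximity differently.

```typescript
// Toy model of phrase matching with slop; not the LanceDB implementation.
// `tokenize` and `matchesPhrase` are hypothetical names for this sketch.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function matchesPhrase(doc: string, phrase: string, slop = 0): boolean {
  const docTerms = tokenize(doc);
  const terms = tokenize(phrase);
  if (terms.length === 0) return false;

  // Try to place terms[i] somewhere after position `prev`, spending part of
  // the remaining slop budget on every skipped (unmatched) position.
  function search(i: number, prev: number, budget: number): boolean {
    if (i === terms.length) return true;
    for (let pos = prev + 1; pos < docTerms.length; pos++) {
      const gap = pos - prev - 1;
      if (gap > budget) break;
      if (docTerms[pos] === terms[i] && search(i + 1, pos, budget - gap)) {
        return true;
      }
    }
    return false;
  }

  // The first term may start anywhere in the document.
  for (let start = 0; start < docTerms.length; start++) {
    if (docTerms[start] === terms[0] && search(1, start, slop)) return true;
  }
  return false;
}

const text = "the quick brown fox";
const exact = matchesPhrase(text, "quick brown");       // adjacent words
const tooStrict = matchesPhrase(text, "quick fox");     // one word in between, slop 0
const withSlop = matchesPhrase(text, "quick fox", 1);   // one skipped position allowed
console.log(exact, tooStrict, withSlop);
```

In this model a slop of 0 behaves like an exact phrase match, and each extra unit of slop tolerates one more intervening word.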


@@ -15,6 +15,14 @@ Enum representing the types of full-text queries supported.
## Enumeration Members
### Boolean
```ts
Boolean: "boolean";
```
***
### Boost
```ts ```ts


@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Occur
# Enumeration: Occur
Enum representing the occurrence of terms in full-text queries.
- `Must`: The term must be present in the document.
- `Should`: The term should contribute to the document score, but is not required.
## Enumeration Members
### Must
```ts
Must: "MUST";
```
***
### Should
```ts
Should: "SHOULD";
```
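
On the native side of this change, the occurrence value arrives as a string and is converted with `Occur::try_from`, with a failure surfaced as an error. A rough TypeScript analogue of that check is sketched below. `parseOccur` is a hypothetical helper, the `Occur` object only mirrors the enum values shown above, and both the case-sensitive comparison and the error message are assumptions.

```typescript
// Sketch of validating an occurrence string; not part of the SDK.
// The const object mirrors the documented enum values "MUST" and "SHOULD".
const Occur = { Must: "MUST", Should: "SHOULD" } as const;
type Occur = (typeof Occur)[keyof typeof Occur];

// Hypothetical helper: returns the value if it is a known occurrence,
// otherwise throws. Case-sensitivity is an assumption of this sketch.
function parseOccur(value: string): Occur {
  if (value === Occur.Must || value === Occur.Should) {
    return value;
  }
  throw new Error(`Invalid occur: ${value}`);
}

let rejected = false;
try {
  parseOccur("MAYBE");
} catch (e) {
  rejected = true;
}
console.log(parseOccur("MUST"), rejected);
```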


@@ -0,0 +1,28 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Operator
# Enumeration: Operator
Enum representing the logical operators used in full-text queries.
- `And`: All terms must match.
- `Or`: At least one term must match.
## Enumeration Members
### And
```ts
And: "AND";
```
***
### Or
```ts
Or: "OR";
```
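
The difference between the two operators can be sketched with a small in-memory model. This is an illustration only, not how the index evaluates queries: `tokenize` and `matchesTerms` are hypothetical helpers, tokenization is a plain whitespace split, and scoring is ignored.

```typescript
// Toy model of AND vs OR term combination; not the LanceDB implementation.
// `tokenize` and `matchesTerms` are hypothetical names for this sketch.
type Op = "AND" | "OR";

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function matchesTerms(doc: string, query: string, op: Op = "OR"): boolean {
  const docTerms = tokenize(doc);
  const queryTerms = tokenize(query);
  const present = (term: string) => docTerms.indexOf(term) !== -1;
  // AND: every query term must appear; OR: at least one must appear.
  return op === "AND" ? queryTerms.every(present) : queryTerms.some(present);
}

const docs = ["hello world", "goodbye world"];
const orHits = docs.filter((d) => matchesTerms(d, "hello world", "OR"));
const andHits = docs.filter((d) => matchesTerms(d, "hello world", "AND"));
console.log(orHits.length, andHits.length);
```

With the default `OR`, any document sharing a term with the query matches; `AND` narrows the result to documents containing every term.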


@@ -12,9 +12,12 @@
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
- [Occur](enumerations/Occur.md)
- [Operator](enumerations/Operator.md)
## Classes
- [BooleanQuery](classes/BooleanQuery.md)
- [BoostQuery](classes/BoostQuery.md)
- [Connection](classes/Connection.md)
- [Index](classes/Index.md)


@@ -7,3 +7,4 @@ tantivy==0.20.1
--extra-index-url https://download.pytorch.org/whl/cpu
torch
polars>=0.19, <=1.3.0
datafusion


@@ -33,7 +33,12 @@ import {
  register,
} from "../lancedb/embedding";
import { Index } from "../lancedb/indices";
-import { instanceOfFullTextQuery } from "../lancedb/query";
+import {
+  BooleanQuery,
+  Occur,
+  Operator,
+  instanceOfFullTextQuery,
+} from "../lancedb/query";
import exp = require("constants");

describe.each([arrow15, arrow16, arrow17, arrow18])(
@@ -1531,6 +1536,18 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
      const results = await table.search("hello").toArray();
      expect(results[0].text).toBe(data[0].text);

      const results2 = await table
        .search(new MatchQuery("hello world", "text"))
        .toArray();
      expect(results2.length).toBe(2);

      const results3 = await table
        .search(
          new MatchQuery("hello world", "text", { operator: Operator.And }),
        )
        .toArray();
      expect(results3.length).toBe(1);
    });

    test("full text search without lowercase", async () => {
@@ -1609,6 +1626,38 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
      expect(resultSet.has("food")).toBe(true);
    });

    test("full text search boolean query", async () => {
      const db = await connect(tmpDir.name);
      const data = [
        { text: "hello world", vector: [0.1, 0.2, 0.3] },
        { text: "goodbye world", vector: [0.4, 0.5, 0.6] },
      ];
      const table = await db.createTable("test", data);
      await table.createIndex("text", {
        config: Index.fts({ withPosition: false }),
      });

      const shouldResults = await table
        .search(
          new BooleanQuery([
            [Occur.Should, new MatchQuery("hello", "text")],
            [Occur.Should, new MatchQuery("goodbye", "text")],
          ]),
        )
        .toArray();
      expect(shouldResults.length).toBe(2);

      const mustResults = await table
        .search(
          new BooleanQuery([
            [Occur.Must, new MatchQuery("hello", "text")],
            [Occur.Must, new MatchQuery("world", "text")],
          ]),
        )
        .toArray();
      expect(mustResults.length).toBe(1);
    });

    test.each([
      [0.4, 0.5, 0.599], // number[]
      Float32Array.of(0.4, 0.5, 0.599), // Float32Array


@@ -64,7 +64,10 @@ export {
  PhraseQuery,
  BoostQuery,
  MultiMatchQuery,
  BooleanQuery,
  FullTextQueryType,
  Operator,
  Occur,
} from "./query";

export {


@@ -762,6 +762,29 @@ export enum FullTextQueryType {
  MatchPhrase = "match_phrase",
  Boost = "boost",
  MultiMatch = "multi_match",
  Boolean = "boolean",
}

/**
 * Enum representing the logical operators used in full-text queries.
 *
 * - `And`: All terms must match.
 * - `Or`: At least one term must match.
 */
export enum Operator {
  And = "AND",
  Or = "OR",
}

/**
 * Enum representing the occurrence of terms in full-text queries.
 *
 * - `Must`: The term must be present in the document.
 * - `Should`: The term should contribute to the document score, but is not required.
 */
export enum Occur {
  Must = "MUST",
  Should = "SHOULD",
}

/**
@@ -791,6 +814,7 @@ export function instanceOfFullTextQuery(obj: any): obj is FullTextQuery {
export class MatchQuery implements FullTextQuery {
  /** @ignore */
  public readonly inner: JsFullTextQuery;

  /**
   * Creates an instance of MatchQuery.
   *
@@ -800,6 +824,7 @@ export class MatchQuery implements FullTextQuery {
   * - `boost`: The boost factor for the query (default is 1.0).
   * - `fuzziness`: The fuzziness level for the query (default is 0).
   * - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50).
   * - `operator`: The logical operator to use for combining terms in the query (default is "OR").
   */
  constructor(
    query: string,
@@ -808,6 +833,7 @@ export class MatchQuery implements FullTextQuery {
      boost?: number;
      fuzziness?: number;
      maxExpansions?: number;
      operator?: Operator;
    },
  ) {
    let fuzziness = options?.fuzziness;
@@ -820,6 +846,7 @@ export class MatchQuery implements FullTextQuery {
      options?.boost ?? 1.0,
      fuzziness,
      options?.maxExpansions ?? 50,
      options?.operator ?? Operator.Or,
    );
  }
@@ -836,9 +863,11 @@ export class PhraseQuery implements FullTextQuery {
   *
   * @param query - The phrase to search for in the specified column.
   * @param column - The name of the column to search within.
   * @param options - Optional parameters for the phrase query.
   * - `slop`: The maximum number of intervening unmatched positions allowed between words in the phrase (default is 0).
   */
-  constructor(query: string, column: string) {
-    this.inner = JsFullTextQuery.phraseQuery(query, column);
+  constructor(query: string, column: string, options?: { slop?: number }) {
+    this.inner = JsFullTextQuery.phraseQuery(query, column, options?.slop ?? 0);
  }

  queryType(): FullTextQueryType {
@@ -889,18 +918,21 @@ export class MultiMatchQuery implements FullTextQuery {
   * @param columns - An array of column names to search within.
   * @param options - Optional parameters for the multi-match query.
   * - `boosts`: An array of boost factors for each column (default is 1.0 for all).
   * - `operator`: The logical operator to use for combining terms in the query (default is "OR").
   */
  constructor(
    query: string,
    columns: string[],
    options?: {
      boosts?: number[];
      operator?: Operator;
    },
  ) {
    this.inner = JsFullTextQuery.multiMatchQuery(
      query,
      columns,
      options?.boosts,
      options?.operator ?? Operator.Or,
    );
  }
@@ -908,3 +940,23 @@ export class MultiMatchQuery implements FullTextQuery {
    return FullTextQueryType.MultiMatch;
  }
}

export class BooleanQuery implements FullTextQuery {
  /** @ignore */
  public readonly inner: JsFullTextQuery;

  /**
   * Creates an instance of BooleanQuery.
   *
   * @param queries - An array of (Occur, FullTextQuery objects) to combine.
   * Occur specifies whether the query must match, or should match.
   */
  constructor(queries: [Occur, FullTextQuery][]) {
    this.inner = JsFullTextQuery.booleanQuery(
      queries.map(([occur, query]) => [occur, query.inner]),
    );
  }

  queryType(): FullTextQueryType {
    return FullTextQueryType.Boolean;
  }
}


@@ -4,7 +4,8 @@
use std::sync::Arc;

use lancedb::index::scalar::{
-    BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, PhraseQuery,
+    BooleanQuery, BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, Occur,
+    Operator, PhraseQuery,
};
use lancedb::query::ExecutableQuery;
use lancedb::query::Query as LanceDbQuery;
@@ -308,6 +309,7 @@ impl JsFullTextQuery {
        boost: f64,
        fuzziness: Option<u32>,
        max_expansions: u32,
        operator: String,
    ) -> napi::Result<Self> {
        Ok(Self {
            inner: MatchQuery::new(query)
@@ -315,14 +317,22 @@ impl JsFullTextQuery {
                .with_boost(boost as f32)
                .with_fuzziness(fuzziness)
                .with_max_expansions(max_expansions as usize)
                .with_operator(
                    Operator::try_from(operator.as_str()).map_err(|e| {
                        napi::Error::from_reason(format!("Invalid operator: {}", e))
                    })?,
                )
                .into(),
        })
    }

    #[napi(factory)]
-    pub fn phrase_query(query: String, column: String) -> napi::Result<Self> {
+    pub fn phrase_query(query: String, column: String, slop: u32) -> napi::Result<Self> {
        Ok(Self {
-            inner: PhraseQuery::new(query).with_column(Some(column)).into(),
+            inner: PhraseQuery::new(query)
+                .with_column(Some(column))
+                .with_slop(slop)
+                .into(),
        })
    }
@@ -348,6 +358,7 @@ impl JsFullTextQuery {
        query: String,
        columns: Vec<String>,
        boosts: Option<Vec<f64>>,
        operator: String,
    ) -> napi::Result<Self> {
        let q = match boosts {
            Some(boosts) => MultiMatchQuery::try_new(query, columns)
@@ -358,7 +369,37 @@ impl JsFullTextQuery {
            napi::Error::from_reason(format!("Failed to create multi match query: {}", e))
        })?;

-        Ok(Self { inner: q.into() })
+        let operator = Operator::try_from(operator.as_str()).map_err(|e| {
+            napi::Error::from_reason(format!("Invalid operator for multi match query: {}", e))
+        })?;
+
+        Ok(Self {
+            inner: q.with_operator(operator).into(),
+        })
    }

    #[napi(factory)]
    pub fn boolean_query(queries: Vec<(String, &JsFullTextQuery)>) -> napi::Result<Self> {
        let mut sub_queries = Vec::with_capacity(queries.len());
        for (occur, q) in queries {
            let occur = Occur::try_from(occur.as_str())
                .map_err(|e| napi::Error::from_reason(e.to_string()))?;
            sub_queries.push((occur, q.inner.clone()));
        }
        Ok(Self {
            inner: BooleanQuery::new(sub_queries).into(),
        })
    }

    #[napi(getter)]
    pub fn query_type(&self) -> String {
        match self.inner {
            FtsQuery::Match(_) => "match".to_string(),
            FtsQuery::Phrase(_) => "phrase".to_string(),
            FtsQuery::Boost(_) => "boost".to_string(),
            FtsQuery::MultiMatch(_) => "multi_match".to_string(),
            FtsQuery::Boolean(_) => "boolean".to_string(),
        }
    }
}