diff --git a/docs/src/js/classes/BoostQuery.md b/docs/src/js/classes/BoostQuery.md index 450c3cc3..b69c6c5a 100644 --- a/docs/src/js/classes/BoostQuery.md +++ b/docs/src/js/classes/BoostQuery.md @@ -22,10 +22,13 @@ including methods to retrieve the query type and convert the query to a dictiona new BoostQuery( positive, negative, - negativeBoost): BoostQuery + options?): BoostQuery ``` Creates an instance of BoostQuery. +The boost returns documents that match the positive query, +but penalizes those that match the negative query. +The penalty is controlled by the `negativeBoost` parameter. #### Parameters @@ -35,8 +38,11 @@ Creates an instance of BoostQuery. * **negative**: [`FullTextQuery`](../interfaces/FullTextQuery.md) The negative query that reduces the relevance score. -* **negativeBoost**: `number` - The factor by which the negative query reduces the score. +* **options?** + Optional parameters for the boost query. + - `negativeBoost`: The boost factor for the negative query (default is 0.0). + +* **options.negativeBoost?**: `number` #### Returns @@ -50,6 +56,8 @@ Creates an instance of BoostQuery. queryType(): FullTextQueryType ``` +The type of the full-text query. 
+ #### Returns [`FullTextQueryType`](../enumerations/FullTextQueryType.md) @@ -57,19 +65,3 @@ queryType(): FullTextQueryType #### Implementation of [`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype) - -*** - -### toDict() - -```ts -toDict(): Record -``` - -#### Returns - -`Record`<`string`, `unknown`> - -#### Implementation of - -[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict) diff --git a/docs/src/js/classes/MatchQuery.md b/docs/src/js/classes/MatchQuery.md index 9b0e6a95..aa936930 100644 --- a/docs/src/js/classes/MatchQuery.md +++ b/docs/src/js/classes/MatchQuery.md @@ -22,9 +22,7 @@ including methods to retrieve the query type and convert the query to a dictiona new MatchQuery( query, column, - boost, - fuzziness, - maxExpansions): MatchQuery + options?): MatchQuery ``` Creates an instance of MatchQuery. @@ -37,14 +35,17 @@ Creates an instance of MatchQuery. * **column**: `string` The name of the column to search within. -* **boost**: `number` = `1.0` - (Optional) The boost factor to influence the relevance score of this query. Default is `1.0`. +* **options?** + Optional parameters for the match query. + - `boost`: The boost factor for the query (default is 1.0). + - `fuzziness`: The fuzziness level for the query (default is 0). + - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50). -* **fuzziness**: `number` = `0` - (Optional) The allowed edit distance for fuzzy matching. Default is `0`. +* **options.boost?**: `number` -* **maxExpansions**: `number` = `50` - (Optional) The maximum number of terms to consider for fuzzy matching. Default is `50`. +* **options.fuzziness?**: `number` + +* **options.maxExpansions?**: `number` #### Returns @@ -58,6 +59,8 @@ Creates an instance of MatchQuery. queryType(): FullTextQueryType ``` +The type of the full-text query. 
+ #### Returns [`FullTextQueryType`](../enumerations/FullTextQueryType.md) @@ -65,19 +68,3 @@ queryType(): FullTextQueryType #### Implementation of [`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype) - -*** - -### toDict() - -```ts -toDict(): Record -``` - -#### Returns - -`Record`<`string`, `unknown`> - -#### Implementation of - -[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict) diff --git a/docs/src/js/classes/MultiMatchQuery.md b/docs/src/js/classes/MultiMatchQuery.md index f1e5673f..dca5d128 100644 --- a/docs/src/js/classes/MultiMatchQuery.md +++ b/docs/src/js/classes/MultiMatchQuery.md @@ -22,7 +22,7 @@ including methods to retrieve the query type and convert the query to a dictiona new MultiMatchQuery( query, columns, - boosts): MultiMatchQuery + options?): MultiMatchQuery ``` Creates an instance of MultiMatchQuery. @@ -35,10 +35,11 @@ Creates an instance of MultiMatchQuery. * **columns**: `string`[] An array of column names to search within. -* **boosts**: `number`[] = `...` - (Optional) An array of boost factors corresponding to each column. Default is an array of 1.0 for each column. - The `boosts` array should have the same length as `columns`. If not provided, all columns will have a default boost of 1.0. - If the length of `boosts` is less than `columns`, it will be padded with 1.0s. +* **options?** + Optional parameters for the multi-match query. + - `boosts`: An array of boost factors for each column (default is 1.0 for all). + +* **options.boosts?**: `number`[] #### Returns @@ -52,6 +53,8 @@ Creates an instance of MultiMatchQuery. queryType(): FullTextQueryType ``` +The type of the full-text query. 
+ #### Returns [`FullTextQueryType`](../enumerations/FullTextQueryType.md) @@ -59,19 +62,3 @@ queryType(): FullTextQueryType #### Implementation of [`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype) - -*** - -### toDict() - -```ts -toDict(): Record -``` - -#### Returns - -`Record`<`string`, `unknown`> - -#### Implementation of - -[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict) diff --git a/docs/src/js/classes/PhraseQuery.md b/docs/src/js/classes/PhraseQuery.md index 66b62dd4..10315de0 100644 --- a/docs/src/js/classes/PhraseQuery.md +++ b/docs/src/js/classes/PhraseQuery.md @@ -44,6 +44,8 @@ Creates an instance of `PhraseQuery`. queryType(): FullTextQueryType ``` +The type of the full-text query. + #### Returns [`FullTextQueryType`](../enumerations/FullTextQueryType.md) @@ -51,19 +53,3 @@ queryType(): FullTextQueryType #### Implementation of [`FullTextQuery`](../interfaces/FullTextQuery.md).[`queryType`](../interfaces/FullTextQuery.md#querytype) - -*** - -### toDict() - -```ts -toDict(): Record -``` - -#### Returns - -`Record`<`string`, `unknown`> - -#### Implementation of - -[`FullTextQuery`](../interfaces/FullTextQuery.md).[`toDict`](../interfaces/FullTextQuery.md#todict) diff --git a/docs/src/js/classes/Table.md b/docs/src/js/classes/Table.md index aed26c4e..26dfbb16 100644 --- a/docs/src/js/classes/Table.md +++ b/docs/src/js/classes/Table.md @@ -575,7 +575,7 @@ of the given query #### Parameters -* **query**: `string` \| [`IntoVector`](../type-aliases/IntoVector.md) +* **query**: `string` \| [`IntoVector`](../type-aliases/IntoVector.md) \| [`FullTextQuery`](../interfaces/FullTextQuery.md) the query, a vector or string * **queryType?**: `string` diff --git a/docs/src/js/interfaces/FullTextQuery.md b/docs/src/js/interfaces/FullTextQuery.md index bf63433e..8da76558 100644 --- a/docs/src/js/interfaces/FullTextQuery.md +++ 
b/docs/src/js/interfaces/FullTextQuery.md @@ -18,18 +18,8 @@ including methods to retrieve the query type and convert the query to a dictiona queryType(): FullTextQueryType ``` +The type of the full-text query. + #### Returns [`FullTextQueryType`](../enumerations/FullTextQueryType.md) - -*** - -### toDict() - -```ts -toDict(): Record -``` - -#### Returns - -`Record`<`string`, `unknown`> diff --git a/nodejs/__test__/table.test.ts b/nodejs/__test__/table.test.ts index 7976152c..0e977b46 100644 --- a/nodejs/__test__/table.test.ts +++ b/nodejs/__test__/table.test.ts @@ -10,7 +10,7 @@ import * as arrow16 from "apache-arrow-16"; import * as arrow17 from "apache-arrow-17"; import * as arrow18 from "apache-arrow-18"; -import { Table, connect } from "../lancedb"; +import { MatchQuery, PhraseQuery, Table, connect } from "../lancedb"; import { Table as ArrowTable, Field, @@ -33,6 +33,7 @@ import { register, } from "../lancedb/embedding"; import { Index } from "../lancedb/indices"; +import { instanceOfFullTextQuery } from "../lancedb/query"; describe.each([arrow15, arrow16, arrow17, arrow18])( "Given a table", @@ -1302,6 +1303,13 @@ describe.each([arrow15, arrow16, arrow17, arrow18])( const results = await table.search("hello").toArray(); expect(results[0].text).toBe(data[0].text); + + const query = new MatchQuery("goodbye", "text"); + expect(instanceOfFullTextQuery(query)).toBe(true); + const results2 = await table + .search(new MatchQuery("goodbye", "text")) + .toArray(); + expect(results2[0].text).toBe(data[1].text); }); test("full text index on list", async () => { @@ -1375,6 +1383,43 @@ describe.each([arrow15, arrow16, arrow17, arrow18])( expect(results.length).toBe(2); const phraseResults = await table.search('"hello world"').toArray(); expect(phraseResults.length).toBe(1); + const phraseResults2 = await table + .search(new PhraseQuery("hello world", "text")) + .toArray(); + expect(phraseResults2.length).toBe(1); + }); + + test("full text search fuzzy query", async () => 
{ + const db = await connect(tmpDir.name); + const data = [ + { text: "fa", vector: [0.1, 0.2, 0.3] }, + { text: "fo", vector: [0.4, 0.5, 0.6] }, + { text: "fob", vector: [0.4, 0.5, 0.6] }, + { text: "focus", vector: [0.4, 0.5, 0.6] }, + { text: "foo", vector: [0.4, 0.5, 0.6] }, + { text: "food", vector: [0.4, 0.5, 0.6] }, + { text: "foul", vector: [0.4, 0.5, 0.6] }, + ]; + const table = await db.createTable("test", data); + await table.createIndex("text", { + config: Index.fts(), + }); + + const results = await table + .search(new MatchQuery("foo", "text")) + .toArray(); + expect(results.length).toBe(1); + expect(results[0].text).toBe("foo"); + + const fuzzyResults = await table + .search(new MatchQuery("foo", "text", { fuzziness: 1 })) + .toArray(); + expect(fuzzyResults.length).toBe(4); + const resultSet = new Set(fuzzyResults.map((r) => r.text)); + expect(resultSet.has("foo")).toBe(true); + expect(resultSet.has("fob")).toBe(true); + expect(resultSet.has("fo")).toBe(true); + expect(resultSet.has("food")).toBe(true); }); test.each([ diff --git a/nodejs/lancedb/query.ts b/nodejs/lancedb/query.ts index a8143a2e..80c221f1 100644 --- a/nodejs/lancedb/query.ts +++ b/nodejs/lancedb/query.ts @@ -11,6 +11,7 @@ import { } from "./arrow"; import { type IvfPqOptions } from "./indices"; import { + JsFullTextQuery, RecordBatchIterator as NativeBatchIterator, Query as NativeQuery, Table as NativeTable, @@ -177,9 +178,7 @@ export class QueryBase columns: columns, }); } else { - // If query is a FullTextQuery object, convert it to a dict - const queryObj = query.toDict(); - inner.fullTextSearch(queryObj); + inner.fullTextSearch({ query: query.inner }); } }); return this; @@ -743,8 +742,7 @@ export class Query extends QueryBase { columns: columns, }); } else { - const queryObj = query.toDict(); - inner.fullTextSearch(queryObj); + inner.fullTextSearch({ query: query.inner }); } }); return this; @@ -772,130 +770,141 @@ export enum FullTextQueryType { * including methods to retrieve 
the query type and convert the query to a dictionary format. */ export interface FullTextQuery { + /** + * Returns the inner query object. + * This is the underlying query object used by the database engine. + * @ignore + */ + inner: JsFullTextQuery; + + /** + * The type of the full-text query. + */ queryType(): FullTextQueryType; - toDict(): Record; +} + +// biome-ignore lint/suspicious/noExplicitAny: we want any here +export function instanceOfFullTextQuery(obj: any): obj is FullTextQuery { + return obj != null && obj.inner instanceof JsFullTextQuery; } export class MatchQuery implements FullTextQuery { + /** @ignore */ + public readonly inner: JsFullTextQuery; /** * Creates an instance of MatchQuery. * * @param query - The text query to search for. * @param column - The name of the column to search within. - * @param boost - (Optional) The boost factor to influence the relevance score of this query. Default is `1.0`. - * @param fuzziness - (Optional) The allowed edit distance for fuzzy matching. Default is `0`. - * @param maxExpansions - (Optional) The maximum number of terms to consider for fuzzy matching. Default is `50`. + * @param options - Optional parameters for the match query. + * - `boost`: The boost factor for the query (default is 1.0). + * - `fuzziness`: The fuzziness level for the query (default is 0). + * - `maxExpansions`: The maximum number of terms to consider for fuzzy matching (default is 50). */ constructor( - private query: string, - private column: string, - private boost: number = 1.0, - private fuzziness: number = 0, - private maxExpansions: number = 50, - ) {} + query: string, + column: string, + options?: { + boost?: number; + fuzziness?: number; + maxExpansions?: number; + }, + ) { + let fuzziness = options?.fuzziness; + if (fuzziness === undefined) { + fuzziness = 0; + } + this.inner = JsFullTextQuery.matchQuery( + query, + column, + options?.boost ?? 1.0, + fuzziness, + options?.maxExpansions ?? 
50, + ); + } queryType(): FullTextQueryType { return FullTextQueryType.Match; } - - toDict(): Record<string, unknown> { - return { - [this.queryType()]: { - [this.column]: { - query: this.query, - boost: this.boost, - fuzziness: this.fuzziness, - // biome-ignore lint/style/useNamingConvention: use underscore for consistency with the other APIs - max_expansions: this.maxExpansions, - }, - }, - }; - } } export class PhraseQuery implements FullTextQuery { + /** @ignore */ + public readonly inner: JsFullTextQuery; /** * Creates an instance of `PhraseQuery`. * * @param query - The phrase to search for in the specified column. * @param column - The name of the column to search within. */ - constructor( - private query: string, - private column: string, - ) {} + constructor(query: string, column: string) { + this.inner = JsFullTextQuery.phraseQuery(query, column); + } queryType(): FullTextQueryType { return FullTextQueryType.MatchPhrase; } - - toDict(): Record<string, unknown> { - return { - [this.queryType()]: { - [this.column]: this.query, - }, - }; - } } export class BoostQuery implements FullTextQuery { + /** @ignore */ + public readonly inner: JsFullTextQuery; /** * Creates an instance of BoostQuery. + * The boost returns documents that match the positive query, + * but penalizes those that match the negative query. + * The penalty is controlled by the `negativeBoost` parameter. * * @param positive - The positive query that boosts the relevance score. * @param negative - The negative query that reduces the relevance score. - * @param negativeBoost - The factor by which the negative query reduces the score. + * @param options - Optional parameters for the boost query. + * - `negativeBoost`: The boost factor for the negative query (default is 0.0). 
*/ constructor( - private positive: FullTextQuery, - private negative: FullTextQuery, - private negativeBoost: number, - ) {} + positive: FullTextQuery, + negative: FullTextQuery, + options?: { + negativeBoost?: number; + }, + ) { + this.inner = JsFullTextQuery.boostQuery( + positive.inner, + negative.inner, + options?.negativeBoost, + ); + } queryType(): FullTextQueryType { return FullTextQueryType.Boost; } - - toDict(): Record { - return { - [this.queryType()]: { - positive: this.positive.toDict(), - negative: this.negative.toDict(), - // biome-ignore lint/style/useNamingConvention: use underscore for consistency with the other APIs - negative_boost: this.negativeBoost, - }, - }; - } } export class MultiMatchQuery implements FullTextQuery { + /** @ignore */ + public readonly inner: JsFullTextQuery; /** * Creates an instance of MultiMatchQuery. * * @param query - The text query to search for across multiple columns. * @param columns - An array of column names to search within. - * @param boosts - (Optional) An array of boost factors corresponding to each column. Default is an array of 1.0 for each column. - * - * The `boosts` array should have the same length as `columns`. If not provided, all columns will have a default boost of 1.0. - * If the length of `boosts` is less than `columns`, it will be padded with 1.0s. + * @param options - Optional parameters for the multi-match query. + * - `boosts`: An array of boost factors for each column (default is 1.0 for all). 
*/ constructor( - private query: string, - private columns: string[], - private boosts: number[] = columns.map(() => 1.0), - ) {} + query: string, + columns: string[], + options?: { + boosts?: number[]; + }, + ) { + this.inner = JsFullTextQuery.multiMatchQuery( + query, + columns, + options?.boosts, + ); + } queryType(): FullTextQueryType { return FullTextQueryType.MultiMatch; } - - toDict(): Record { - return { - [this.queryType()]: { - query: this.query, - columns: this.columns, - boost: this.boosts, - }, - }; - } } diff --git a/nodejs/lancedb/table.ts b/nodejs/lancedb/table.ts index 81bccb38..8fa456f2 100644 --- a/nodejs/lancedb/table.ts +++ b/nodejs/lancedb/table.ts @@ -22,7 +22,12 @@ import { OptimizeStats, Table as _NativeTable, } from "./native"; -import { Query, VectorQuery } from "./query"; +import { + FullTextQuery, + Query, + VectorQuery, + instanceOfFullTextQuery, +} from "./query"; import { sanitizeType } from "./sanitize"; import { IntoSql, toSQL } from "./util"; export { IndexConfig } from "./native"; @@ -294,7 +299,7 @@ export abstract class Table { * if the query is a string and no embedding function is defined, it will be treated as a full text search query */ abstract search( - query: string | IntoVector, + query: string | IntoVector | FullTextQuery, queryType?: string, ftsColumns?: string | string[], ): VectorQuery | Query; @@ -565,11 +570,11 @@ export class LocalTable extends Table { } search( - query: string | IntoVector, + query: string | IntoVector | FullTextQuery, queryType: string = "auto", ftsColumns?: string | string[], ): VectorQuery | Query { - if (typeof query !== "string") { + if (typeof query !== "string" && !instanceOfFullTextQuery(query)) { if (queryType === "fts") { throw new Error("Cannot perform full text search on a vector query"); } @@ -585,7 +590,10 @@ export class LocalTable extends Table { // The query type is auto or vector // fall back to full text search if no embedding functions are defined and the query is a string - 
if (queryType === "auto" && getRegistry().length() === 0) { + if ( + queryType === "auto" && + (getRegistry().length() === 0 || instanceOfFullTextQuery(query)) + ) { return this.query().fullTextSearch(query, { columns: ftsColumns, }); diff --git a/nodejs/src/query.rs b/nodejs/src/query.rs index 59dc72f6..b5acece4 100644 --- a/nodejs/src/query.rs +++ b/nodejs/src/query.rs @@ -3,7 +3,9 @@ use std::sync::Arc; -use lancedb::index::scalar::{FtsQuery, FullTextSearchQuery, MatchQuery, PhraseQuery}; +use lancedb::index::scalar::{ + BoostQuery, FtsQuery, FullTextSearchQuery, MatchQuery, MultiMatchQuery, PhraseQuery, +}; use lancedb::query::ExecutableQuery; use lancedb::query::Query as LanceDbQuery; use lancedb::query::QueryBase; @@ -18,7 +20,7 @@ use crate::error::NapiErrorExt; use crate::iterator::RecordBatchIterator; use crate::rerankers::Reranker; use crate::rerankers::RerankerCallbacks; -use crate::util::{parse_distance_type, parse_fts_query}; +use crate::util::parse_distance_type; #[napi] pub struct Query { @@ -38,51 +40,8 @@ impl Query { } #[napi] - pub fn full_text_search(&mut self, query: napi::JsUnknown) -> napi::Result<()> { - let query = unsafe { query.cast::() }; - let query = if let Some(query_text) = query.get::<_, String>("query").transpose() { - let mut query_text = query_text?; - let columns = query.get::<_, Option>>("columns")?.flatten(); - - let is_phrase = - query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"'); - let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false); - - if is_phrase { - // Remove the surrounding quotes for phrase queries - query_text = query_text[1..query_text.len() - 1].to_string(); - } - - let query: FtsQuery = match (is_phrase, is_multi_match) { - (false, _) => MatchQuery::new(query_text).into(), - (true, false) => PhraseQuery::new(query_text).into(), - (true, true) => { - return Err(napi::Error::from_reason( - "Phrase queries cannot be used with multiple columns.", - )); - } - 
}; - let mut query = FullTextSearchQuery::new_query(query); - if let Some(cols) = columns { - if !cols.is_empty() { - query = query.with_columns(&cols).map_err(|e| { - napi::Error::from_reason(format!( - "Failed to set full text search columns: {}", - e - )) - })?; - } - } - query - } else if let Some(query) = query.get::<_, napi::JsObject>("query")? { - let query = parse_fts_query(&query)?; - FullTextSearchQuery::new_query(query) - } else { - return Err(napi::Error::from_reason( - "Invalid full text search query object".to_string(), - )); - }; - + pub fn full_text_search(&mut self, query: napi::JsObject) -> napi::Result<()> { + let query = parse_fts_query(query)?; self.inner = self.inner.clone().full_text_search(query); Ok(()) } @@ -243,51 +202,8 @@ impl VectorQuery { } #[napi] - pub fn full_text_search(&mut self, query: napi::JsUnknown) -> napi::Result<()> { - let query = unsafe { query.cast::() }; - let query = if let Some(query_text) = query.get::<_, String>("query").transpose() { - let mut query_text = query_text?; - let columns = query.get::<_, Option>>("columns")?.flatten(); - - let is_phrase = - query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"'); - let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false); - - if is_phrase { - // Remove the surrounding quotes for phrase queries - query_text = query_text[1..query_text.len() - 1].to_string(); - } - - let query: FtsQuery = match (is_phrase, is_multi_match) { - (false, _) => MatchQuery::new(query_text).into(), - (true, false) => PhraseQuery::new(query_text).into(), - (true, true) => { - return Err(napi::Error::from_reason( - "Phrase queries cannot be used with multiple columns.", - )); - } - }; - let mut query = FullTextSearchQuery::new_query(query); - if let Some(cols) = columns { - if !cols.is_empty() { - query = query.with_columns(&cols).map_err(|e| { - napi::Error::from_reason(format!( - "Failed to set full text search columns: {}", - e - )) - })?; - } 
- } - query - } else if let Some(query) = query.get::<_, napi::JsObject>("query")? { - let query = parse_fts_query(&query)?; - FullTextSearchQuery::new_query(query) - } else { - return Err(napi::Error::from_reason( - "Invalid full text search query object".to_string(), - )); - }; - + pub fn full_text_search(&mut self, query: napi::JsObject) -> napi::Result<()> { + let query = parse_fts_query(query)?; self.inner = self.inner.clone().full_text_search(query); Ok(()) } @@ -376,3 +292,118 @@ impl VectorQuery { }) } } + +#[napi] +#[derive(Debug, Clone)] +pub struct JsFullTextQuery { + pub(crate) inner: FtsQuery, +} + +#[napi] +impl JsFullTextQuery { + #[napi(factory)] + pub fn match_query( + query: String, + column: String, + boost: f64, + fuzziness: Option<u32>, + max_expansions: u32, + ) -> napi::Result<Self> { + Ok(Self { + inner: MatchQuery::new(query) + .with_column(Some(column)) + .with_boost(boost as f32) + .with_fuzziness(fuzziness) + .with_max_expansions(max_expansions as usize) + .into(), + }) + } + + #[napi(factory)] + pub fn phrase_query(query: String, column: String) -> napi::Result<Self> { + Ok(Self { + inner: PhraseQuery::new(query).with_column(Some(column)).into(), + }) + } + + #[napi(factory)] + pub fn boost_query( + positive: &JsFullTextQuery, + negative: &JsFullTextQuery, + negative_boost: Option<f64>, + ) -> napi::Result<Self> { + Ok(Self { + inner: BoostQuery::new( + positive.inner.clone(), + negative.inner.clone(), + negative_boost.map(|v| v as f32), + ) + .into(), + }) + } + + #[napi(factory)] + pub fn multi_match_query( + query: String, + columns: Vec<String>, + boosts: Option<Vec<f64>>, + ) -> napi::Result<Self> { + let q = match boosts { + Some(boosts) => MultiMatchQuery::try_new_with_boosts( + query, + columns, + boosts.into_iter().map(|v| v as f32).collect(), + ), + None => MultiMatchQuery::try_new(query, columns), + } + .map_err(|e| { + napi::Error::from_reason(format!("Failed to create multi match query: {}", e)) + })?; + + Ok(Self { inner: q.into() }) + } 
+} + +fn parse_fts_query(query: 
napi::JsObject) -> napi::Result<FullTextSearchQuery> { + if let Ok(Some(query)) = query.get::<_, &JsFullTextQuery>("query") { + Ok(FullTextSearchQuery::new_query(query.inner.clone())) + } else if let Ok(Some(query_text)) = query.get::<_, String>("query") { + let mut query_text = query_text; + let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten(); + + let is_phrase = + query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"'); + let is_multi_match = columns.as_ref().map(|cols| cols.len() > 1).unwrap_or(false); + + if is_phrase { + // Remove the surrounding quotes for phrase queries + query_text = query_text[1..query_text.len() - 1].to_string(); + } + + let query: FtsQuery = match (is_phrase, is_multi_match) { + (false, _) => MatchQuery::new(query_text).into(), + (true, false) => PhraseQuery::new(query_text).into(), + (true, true) => { + return Err(napi::Error::from_reason( + "Phrase queries cannot be used with multiple columns.", + )); + } + }; + let mut query = FullTextSearchQuery::new_query(query); + if let Some(cols) = columns { + if !cols.is_empty() { + query = query.with_columns(&cols).map_err(|e| { + napi::Error::from_reason(format!( + "Failed to set full text search columns: {}", + e + )) + })?; + } + } + Ok(query) + } else { + Err(napi::Error::from_reason( + "Invalid full text search query object".to_string(), + )) + } +} diff --git a/nodejs/src/util.rs b/nodejs/src/util.rs index 18959136..a29a67f9 100644 --- a/nodejs/src/util.rs +++ b/nodejs/src/util.rs @@ -1,7 +1,6 @@ // SPDX-License-Identifier: Apache-2.0 // SPDX-FileCopyrightText: Copyright The LanceDB Authors -use lancedb::index::scalar::{BoostQuery, FtsQuery, MatchQuery, MultiMatchQuery, PhraseQuery}; use lancedb::DistanceType; pub fn parse_distance_type(distance_type: impl AsRef<str>) -> napi::Result<DistanceType> { @@ -16,144 +15,3 @@ pub fn parse_distance_type(distance_type: impl AsRef) -> napi::Result napi::Result { - let query_type = query - .get_property_names()? 
- .get_element::(0)?; - let query_type = query_type.into_utf8()?.into_owned()?; - let query_value = - query - .get::<_, napi::JsObject>(&query_type)? - .ok_or(napi::Error::from_reason(format!( - "query value {} not found", - query_type - )))?; - - match query_type.as_str() { - "match" => { - let column = query_value - .get_property_names()? - .get_element::(0)? - .into_utf8()? - .into_owned()?; - let params = - query_value - .get::<_, napi::JsObject>(&column)? - .ok_or(napi::Error::from_reason(format!( - "column {} not found", - column - )))?; - - let query = params - .get::<_, napi::JsString>("query")? - .ok_or(napi::Error::from_reason("query not found"))? - .into_utf8()? - .into_owned()?; - let boost = params - .get::<_, napi::JsNumber>("boost")? - .ok_or(napi::Error::from_reason("boost not found"))? - .get_double()? as f32; - let fuzziness = params - .get::<_, napi::JsNumber>("fuzziness")? - .map(|f| f.get_uint32()) - .transpose()?; - let max_expansions = params - .get::<_, napi::JsNumber>("max_expansions")? - .ok_or(napi::Error::from_reason("max_expansions not found"))? - .get_uint32()? as usize; - - let query = MatchQuery::new(query) - .with_column(Some(column)) - .with_boost(boost) - .with_fuzziness(fuzziness) - .with_max_expansions(max_expansions); - Ok(query.into()) - } - - "match_phrase" => { - let column = query_value - .get_property_names()? - .get_element::(0)? - .into_utf8()? - .into_owned()?; - let query = query_value - .get::<_, napi::JsString>(&column)? - .ok_or(napi::Error::from_reason(format!( - "column {} not found", - column - )))? - .into_utf8()? - .into_owned()?; - - let query = PhraseQuery::new(query).with_column(Some(column)); - Ok(query.into()) - } - - "boost" => { - let positive = query_value - .get::<_, napi::JsObject>("positive")? - .ok_or(napi::Error::from_reason("positive not found"))?; - - let negative = query_value - .get::<_, napi::JsObject>("negative")? 
- .ok_or(napi::Error::from_reason("negative not found"))?; - let negative_boost = query_value - .get::<_, napi::JsNumber>("negative_boost")? - .ok_or(napi::Error::from_reason("negative_boost not found"))? - .get_double()? as f32; - - let positive = parse_fts_query(&positive)?; - let negative = parse_fts_query(&negative)?; - let query = BoostQuery::new(positive, negative, Some(negative_boost)); - Ok(query.into()) - } - - "multi_match" => { - let query = query_value - .get::<_, napi::JsString>("query")? - .ok_or(napi::Error::from_reason("query not found"))? - .into_utf8()? - .into_owned()?; - let columns_array = query_value - .get::<_, napi::JsTypedArray>("columns")? - .ok_or(napi::Error::from_reason("columns not found"))?; - let columns_num = columns_array.get_array_length()?; - let mut columns = Vec::with_capacity(columns_num as usize); - for i in 0..columns_num { - let column = columns_array - .get_element::(i)? - .into_utf8()? - .into_owned()?; - columns.push(column); - } - let boost_array = query_value - .get::<_, napi::JsTypedArray>("boost")? - .ok_or(napi::Error::from_reason("boost not found"))?; - if boost_array.get_array_length()? != columns_num { - return Err(napi::Error::from_reason(format!( - "boost array length ({}) does not match columns length ({})", - boost_array.get_array_length()?, - columns_num - ))); - } - let mut boost = Vec::with_capacity(columns_num as usize); - for i in 0..columns_num { - let b = boost_array.get_element::(i)?.get_double()? 
as f32; - boost.push(b); - } - - let query = - MultiMatchQuery::try_new_with_boosts(query, columns, boost).map_err(|e| { - napi::Error::from_reason(format!("Error creating MultiMatchQuery: {}", e)) - })?; - - Ok(query.into()) - } - - _ => Err(napi::Error::from_reason(format!( - "Unsupported query type: {}", - query_type - ))), - } -} diff --git a/python/python/lancedb/query.py b/python/python/lancedb/query.py index 06d4b1b6..6db7e78a 100644 --- a/python/python/lancedb/query.py +++ b/python/python/lancedb/query.py @@ -266,8 +266,8 @@ class MultiMatchQuery(FullTextQuery): Parameters ---------- - query : str | list[Query] - If a string, the query string to match against. + query : str + The query string to match against. columns : list[str] The list of columns to match against. diff --git a/python/python/lancedb/table.py b/python/python/lancedb/table.py index 7e28237d..d940b46c 100644 --- a/python/python/lancedb/table.py +++ b/python/python/lancedb/table.py @@ -2141,6 +2141,8 @@ class LanceTable(Table): and also the "_distance" column which is the distance between the query vector and the returned vector. 
""" + if isinstance(query, FullTextQuery): + query_type = "fts" vector_column_name = infer_vector_column_name( schema=self.schema, query_type=query_type, @@ -3223,8 +3225,10 @@ class AsyncTable: async def get_embedding_func( vector_column_name: Optional[str], query_type: QueryType, - query: Optional[Union[VEC, str, "PIL.Image.Image", Tuple]], + query: Optional[Union[VEC, str, "PIL.Image.Image", Tuple, FullTextQuery]], ) -> Tuple[str, EmbeddingFunctionConfig]: + if isinstance(query, FullTextQuery): + query_type = "fts" schema = await self.schema() vector_column_name = infer_vector_column_name( schema=schema, diff --git a/python/python/lancedb/util.py b/python/python/lancedb/util.py index 2139252e..64573f6b 100644 --- a/python/python/lancedb/util.py +++ b/python/python/lancedb/util.py @@ -253,9 +253,14 @@ def infer_vector_column_name( query: Optional[Any], # inferred later in query builder vector_column_name: Optional[str], ): - if (vector_column_name is None and query is not None and query_type != "fts") or ( - vector_column_name is None and query_type == "hybrid" - ): + if vector_column_name is not None: + return vector_column_name + + if query_type == "fts": + # FTS queries do not require a vector column + return None + + if query is not None or query_type == "hybrid": try: vector_column_name = inf_vector_column_query(schema) except Exception as e: diff --git a/python/python/tests/docs/test_search.py b/python/python/tests/docs/test_search.py index fe276fe4..545ac161 100644 --- a/python/python/tests/docs/test_search.py +++ b/python/python/tests/docs/test_search.py @@ -6,7 +6,9 @@ import lancedb # --8<-- [end:import-lancedb] # --8<-- [start:import-numpy] +from lancedb.query import BoostQuery, MatchQuery import numpy as np +import pyarrow as pa # --8<-- [end:import-numpy] # --8<-- [start:import-datetime] @@ -154,6 +156,84 @@ async def test_vector_search_async(): # --8<-- [end:search_result_async_as_list] +def test_fts_fuzzy_query(): + uri = "data/fuzzy-example" + db 
= lancedb.connect(uri) + + table = db.create_table( + "my_table_fts_fuzzy", + data=pa.table( + { + "text": [ + "fa", + "fo", # spellchecker:disable-line + "fob", + "focus", + "foo", + "food", + "foul", + ] + } + ), + mode="overwrite", + ) + table.create_fts_index("text", use_tantivy=False, replace=True) + + results = table.search(MatchQuery("foo", "text", fuzziness=1)).to_pandas() + assert len(results) == 4 + assert set(results["text"].to_list()) == { + "foo", + "fo", # 1 deletion # spellchecker:disable-line + "fob", # 1 substitution + "food", # 1 insertion + } + + +def test_fts_boost_query(): + uri = "data/boost-example" + db = lancedb.connect(uri) + + table = db.create_table( + "my_table_fts_boost", + data=pa.table( + { + "title": [ + "The Hidden Gems of Travel", + "Exploring Nature's Wonders", + "Cultural Treasures Unveiled", + "The Nightlife Chronicles", + "Scenic Escapes and Challenges", + ], + "desc": [ + "A vibrant city with occasional traffic jams.", + "Beautiful landscapes but overpriced tourist spots.", + "Rich cultural heritage but humid summers.", + "Bustling nightlife but noisy streets.", + "Scenic views but limited public transport options.", + ], + } + ), + mode="overwrite", + ) + table.create_fts_index("desc", use_tantivy=False, replace=True) + + results = table.search( + BoostQuery( + MatchQuery("beautiful, cultural, nightlife", "desc"), + MatchQuery("bad traffic jams, overpriced", "desc"), + ), + ).to_pandas() + + # we will hit 3 results because the positive query has 3 hits + assert len(results) == 3 + # the one containing "overpriced" will be negatively boosted, + # so it will be the last one + assert ( + results["desc"].to_list()[2] + == "Beautiful landscapes but overpriced tourist spots." + ) + + def test_fts_native(): # --8<-- [start:basic_fts] uri = "data/sample-lancedb"