mirror of
https://github.com/lancedb/lancedb.git
synced 2026-04-28 18:50:41 +00:00
Fixes #2716

## Summary

Add support for querying with Float16Array, Float64Array, and Uint8Array vectors in the Node.js SDK, eliminating precision loss from the previous `Float32Array.from()` conversion.

## Implementation

Follows @wjones127's [5-step plan](https://github.com/lancedb/lancedb/issues/2716#issuecomment-3447750543):

### Rust (`nodejs/src/query.rs`)

1. `bytes_to_arrow_array(data: Uint8Array, dtype: String)` helper that:
   - Creates an Arrow `Buffer` from the raw bytes
   - Wraps it in a typed `ScalarBuffer<T>` based on the dtype enum
   - Constructs a `PrimitiveArray` and returns `Arc<dyn Array>`
2. `nearest_to_raw(data, dtype)` and `add_query_vector_raw(data, dtype)` NAPI methods that pass the type-erased array to the core `nearest_to`/`add_query_vector`, which already accept `impl IntoQueryVector` for `Arc<dyn Array>`

### TypeScript (`nodejs/lancedb/query.ts`, `arrow.ts`)

3. Extended `IntoVector` type to include `Uint8Array` (and `Float16Array` via runtime check for Node 22+)
4. `extractVectorBuffer()` helper detects non-Float32 typed arrays and extracts their underlying byte buffer + dtype string
5. `nearestTo()` and `addQueryVector()` route through the raw NAPI path when the input is Float16/Float64/Uint8

### Backward compatibility

Existing `Float32Array` and `number[]` inputs are unchanged -- they still use the original `nearest_to(Float32Array)` NAPI method. The new raw path is only used when a non-Float32 typed array is detected.

## Usage

```typescript
// Float16Array (Node 22+) -- no precision loss
const f16vec = new Float16Array([0.1, 0.2, 0.3]);
const f16Results = await table.query().nearestTo(f16vec).limit(10).toArray();

// Float64Array -- no precision loss
const f64vec = new Float64Array([0.1, 0.2, 0.3]);
const f64Results = await table.query().nearestTo(f64vec).limit(10).toArray();

// Uint8Array (binary embeddings)
const u8vec = new Uint8Array([1, 0, 1, 1, 0]);
const u8Results = await table.query().nearestTo(u8vec).limit(10).toArray();

// Existing usage unchanged
const results = await table.query().nearestTo([0.1, 0.2, 0.3]).limit(10).toArray();
```

## Note on dependencies

The Rust side uses the `arrow_array`, `arrow_buffer`, and `half` crates. These should already be in the dependency tree via `lancedb` core, but `Cargo.toml` may need explicit entries for `half` and the arrow sub-crates in the nodejs workspace.

---------

Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
263 lines
5.3 KiB
Markdown
# Class: `abstract` TextEmbeddingFunction\<M\>

An abstract class for implementing embedding functions that take text as input.
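
As a rough illustration of the contract, the sketch below uses a local stand-in class instead of importing from `@lancedb/lancedb`, so that it runs on its own. `TextEmbedderSketch`, `HashingEmbedder`, and `hashEmbed` are hypothetical names, and the delegation from the query and source methods to `generateEmbeddings` is an assumption about how such a class is typically wired, not a copy of the library source.

```ts
// Stand-in for the documented class shape; NOT the real library class.
abstract class TextEmbedderSketch {
  // Subclasses supply the batch embedding logic.
  abstract generateEmbeddings(texts: string[], ...args: unknown[]): Promise<number[][]>;

  // Assumed wiring: a single query is embedded as a batch of one.
  async computeQueryEmbeddings(data: string): Promise<number[]> {
    const [first] = await this.generateEmbeddings([data]);
    return first;
  }

  // Assumed wiring: source rows go straight to the batch method.
  computeSourceEmbeddings(data: string[]): Promise<number[][]> {
    return this.generateEmbeddings(data);
  }
}

// Toy deterministic "embedding": counts character codes into ndims buckets.
function hashEmbed(text: string, ndims: number): number[] {
  const vec = new Array<number>(ndims).fill(0);
  for (let i = 0; i < text.length; i++) {
    vec[text.charCodeAt(i) % ndims] += 1;
  }
  return vec;
}

// Hypothetical concrete subclass: only generateEmbeddings must be written.
class HashingEmbedder extends TextEmbedderSketch {
  private readonly dims: number;

  constructor(dims: number = 4) {
    super();
    this.dims = dims;
  }

  ndims(): number {
    return this.dims;
  }

  async generateEmbeddings(texts: string[]): Promise<number[][]> {
    return texts.map((t) => hashEmbed(t, this.dims));
  }
}
```

Under these assumptions, a subclass only has to implement `generateEmbeddings`; the single-query and batch entry points follow from it.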

## Extends

- [`EmbeddingFunction`](EmbeddingFunction.md)<`string`, `M`>

## Type Parameters

• **M** *extends* [`FunctionOptions`](../interfaces/FunctionOptions.md) = [`FunctionOptions`](../interfaces/FunctionOptions.md)

## Constructors

### new TextEmbeddingFunction()

```ts
new TextEmbeddingFunction<M>(): TextEmbeddingFunction<M>
```

#### Returns

[`TextEmbeddingFunction`](TextEmbeddingFunction.md)<`M`>

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`constructor`](EmbeddingFunction.md#constructors)

## Methods

### computeQueryEmbeddings()

```ts
computeQueryEmbeddings(data): Promise<number[] | Uint8Array | Float32Array | Float64Array>
```

Compute the embeddings for a single query.

#### Parameters

* **data**: `string`

#### Returns

`Promise`<`number`[] \| `Uint8Array` \| `Float32Array` \| `Float64Array`>

#### Overrides

[`EmbeddingFunction`](EmbeddingFunction.md).[`computeQueryEmbeddings`](EmbeddingFunction.md#computequeryembeddings)
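
Because the resolved value can be any of four representations, calling code that inspects the vector may want to normalize it first. The helpers below are a minimal sketch of one way to do that; `QueryVector`, `toNumberArray`, and `describeVector` are hypothetical names and are not part of the library.

```ts
// The union documented as the resolved type of computeQueryEmbeddings().
type QueryVector = number[] | Uint8Array | Float32Array | Float64Array;

// Convert any of the documented shapes into a plain number[] copy.
function toNumberArray(vec: QueryVector): number[] {
  return Array.isArray(vec) ? vec.slice() : Array.from(vec);
}

// Report which representation was returned, e.g. for logging.
function describeVector(vec: QueryVector): string {
  if (Array.isArray(vec)) return `number[] (${vec.length})`;
  return `${vec.constructor.name} (${vec.length})`;
}
```

Note that converting a `Float32Array` to `number[]` widens each element to a 64-bit float; it does not recover precision that was already lost.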

***

### computeSourceEmbeddings()

```ts
computeSourceEmbeddings(data): Promise<number[][] | Float32Array[] | Float64Array[]>
```

Creates a vector representation for the given values.

#### Parameters

* **data**: `string`[]

#### Returns

`Promise`<`number`[][] \| `Float32Array`[] \| `Float64Array`[]>

#### Overrides

[`EmbeddingFunction`](EmbeddingFunction.md).[`computeSourceEmbeddings`](EmbeddingFunction.md#computesourceembeddings)

***

### embeddingDataType()

```ts
embeddingDataType(): Float<Floats>
```

The datatype of the embeddings.

#### Returns

`Float`<`Floats`>

#### Overrides

[`EmbeddingFunction`](EmbeddingFunction.md).[`embeddingDataType`](EmbeddingFunction.md#embeddingdatatype)

***

### generateEmbeddings()

```ts
abstract generateEmbeddings(texts, ...args): Promise<number[][] | Float32Array[] | Float64Array[]>
```

#### Parameters

* **texts**: `string`[]

* ...**args**: `any`[]

#### Returns

`Promise`<`number`[][] \| `Float32Array`[] \| `Float64Array`[]>

***

### getSensitiveKeys()

```ts
protected getSensitiveKeys(): string[]
```

Provide a list of keys in the function options that should be treated as
sensitive. If users pass raw values for these keys, they will be rejected.

#### Returns

`string`[]

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`getSensitiveKeys`](EmbeddingFunction.md#getsensitivekeys)

***

### init()?

```ts
optional init(): Promise<void>
```

Optionally load any resources needed for the embedding function.

This method is called after the embedding function has been initialized
but before any embeddings are computed. It is useful for loading local models
or other resources that are needed for the embedding function to work.

#### Returns

`Promise`<`void`>

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`init`](EmbeddingFunction.md#init)

***

### ndims()

```ts
ndims(): undefined | number
```

The number of dimensions of the embeddings.

#### Returns

`undefined` \| `number`

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`ndims`](EmbeddingFunction.md#ndims)
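
Since the return type includes `undefined`, code that validates vector lengths has to handle the case where no dimension count is available. The helper below is a small sketch of that check; `checkDimensions` is a hypothetical name, and treating `undefined` as "nothing to validate against" is an assumption rather than documented behavior.

```ts
// Check a vector's length against an optional dimension count.
function checkDimensions(vec: ArrayLike<number>, ndims: number | undefined): boolean {
  // Assumption: undefined means the dimensionality is not known up front.
  if (ndims === undefined) return true;
  return vec.length === ndims;
}
```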

***

### resolveVariables()

```ts
protected resolveVariables(config): Partial<M>
```

Apply variables to the config.

#### Parameters

* **config**: `Partial`<`M`>

#### Returns

`Partial`<`M`>

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`resolveVariables`](EmbeddingFunction.md#resolvevariables)
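
To make the interaction between `getSensitiveKeys()` and `resolveVariables()` more concrete, here is a self-contained sketch of the general pattern: raw values for sensitive keys are rejected, and variable references are substituted from a lookup table. The `$var:` prefix, `resolveConfigSketch`, and the `Map`-based variable store are assumptions made for illustration; consult the library's own documentation for its actual reference syntax and registry API.

```ts
type Config = Record<string, string>;

// Assumed reference syntax, for illustration only.
const VAR_PREFIX = "$var:";

function resolveConfigSketch(
  config: Config,
  sensitiveKeys: string[],
  variables: Map<string, string>,
): Config {
  const resolved: Config = {};
  for (const [key, value] of Object.entries(config)) {
    const isReference = value.startsWith(VAR_PREFIX);
    // Raw values for sensitive keys are rejected outright.
    if (sensitiveKeys.includes(key) && !isReference) {
      throw new Error(`"${key}" is sensitive; pass a variable reference instead`);
    }
    // Non-sensitive literal values pass through unchanged.
    if (!isReference) {
      resolved[key] = value;
      continue;
    }
    // Look up the referenced variable and fail loudly if it is missing.
    const name = value.slice(VAR_PREFIX.length);
    const substituted = variables.get(name);
    if (substituted === undefined) {
      throw new Error(`variable "${name}" is not defined`);
    }
    resolved[key] = substituted;
  }
  return resolved;
}
```

The point of this design is that secrets such as API keys never appear in the serialized configuration; only the reference does.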

***

### sourceField()

```ts
sourceField(): [DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
```

sourceField is used in combination with `LanceSchema` to provide a declarative data model.

#### Returns

[`DataType`<`Type`, `any`>, `Map`<`string`, [`EmbeddingFunction`](EmbeddingFunction.md)<`any`, [`FunctionOptions`](../interfaces/FunctionOptions.md)>>]

#### See

[LanceSchema](../functions/LanceSchema.md)

#### Overrides

[`EmbeddingFunction`](EmbeddingFunction.md).[`sourceField`](EmbeddingFunction.md#sourcefield)

***

### toJSON()

```ts
toJSON(): Record<string, any>
```

Get the original arguments to the constructor, to serialize them so they
can be used to recreate the embedding function later.

#### Returns

`Record`<`string`, `any`>

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`toJSON`](EmbeddingFunction.md#tojson)

***

### vectorField()

```ts
vectorField(optionsOrDatatype?): [DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
```

vectorField is used in combination with `LanceSchema` to provide a declarative data model.

#### Parameters

* **optionsOrDatatype?**: `DataType`<`Type`, `any`> \| `Partial`<[`FieldOptions`](../interfaces/FieldOptions.md)<`DataType`<`Type`, `any`>>>

The options for the field

#### Returns

[`DataType`<`Type`, `any`>, `Map`<`string`, [`EmbeddingFunction`](EmbeddingFunction.md)<`any`, [`FunctionOptions`](../interfaces/FunctionOptions.md)>>]

#### See

[LanceSchema](../functions/LanceSchema.md)

#### Inherited from

[`EmbeddingFunction`](EmbeddingFunction.md).[`vectorField`](EmbeddingFunction.md#vectorfield)