lancedb/docs/src/js/namespaces/embedding/classes/TextEmbeddingFunction.md
Vedant Madane 1ba19d728e feat(node): support Float16, Float64, and Uint8 vector queries (#3193)
Fixes #2716

## Summary

Add support for querying with Float16Array, Float64Array, and Uint8Array
vectors in the Node.js SDK, eliminating precision loss from the previous
`Float32Array.from()` conversion.
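You can reproduce the precision loss described above in plain TypeScript, without LanceDB, by performing the same narrowing that `Float32Array.from()` does. This is an illustration, not code from this change:

```typescript
// A query vector held as 64-bit floats.
const f64 = new Float64Array([0.1, 0.2, 0.3]);

// The previous conversion narrowed every query vector to 32-bit floats.
const f32 = Float32Array.from(f64);

// Count the components whose value changed after narrowing.
const changed = Array.from(f64).filter((value, i) => value !== f32[i]).length;

console.log(f32[0]); // 0.10000000149011612
console.log(changed); // 3: none of 0.1, 0.2, 0.3 survives the Float32 round trip exactly
```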

## Implementation

Follows @wjones127's [5-step
plan](https://github.com/lancedb/lancedb/issues/2716#issuecomment-3447750543):

### Rust (`nodejs/src/query.rs`)

1. `bytes_to_arrow_array(data: Uint8Array, dtype: String)` helper that:
   - Creates an Arrow `Buffer` from the raw bytes
   - Wraps it in a typed `ScalarBuffer<T>` based on the dtype enum
   - Constructs a `PrimitiveArray` and returns `Arc<dyn Array>`
2. `nearest_to_raw(data, dtype)` and `add_query_vector_raw(data, dtype)` NAPI
   methods that pass the type-erased array to the core `nearest_to`/`add_query_vector`
   methods, which already accept `impl IntoQueryVector` (implemented for `Arc<dyn Array>`)

### TypeScript (`nodejs/lancedb/query.ts`, `arrow.ts`)

3. Extended the `IntoVector` type to include `Uint8Array` (and
   `Float16Array` via a runtime check for Node 22+)
4. `extractVectorBuffer()` helper detects non-Float32 typed arrays and
   extracts their underlying byte buffer + dtype string
5. `nearestTo()` and `addQueryVector()` route through the raw NAPI path when
   the input is Float16/Float64/Uint8
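Steps 3 to 5 can be sketched in isolation. This is a hypothetical reimplementation for illustration, not the code in the PR. The dtype strings (`"float16"`, `"float64"`, `"uint8"`), the `RawVector` shape, and the `queryPath` helper are all assumptions.

```typescript
// Hypothetical dtype tags; the strings the real NAPI methods expect may differ.
type RawDtype = "float16" | "float64" | "uint8";

interface RawVector {
  data: Uint8Array; // the vector's bytes, in platform byte order
  dtype: RawDtype;
}

// Float16Array is missing on older runtimes (and from older TypeScript libs),
// so it is looked up at runtime instead of referenced directly.
const Float16Ctor: any = (globalThis as any).Float16Array;

// Returns raw bytes plus a dtype tag for Float16/Float64/Uint8 typed arrays,
// or undefined when the vector should keep using the Float32 path.
function extractVectorBuffer(vec: unknown): RawVector | undefined {
  let dtype: RawDtype | undefined;
  if (typeof Float16Ctor === "function" && vec instanceof Float16Ctor) {
    dtype = "float16";
  } else if (vec instanceof Float64Array) {
    dtype = "float64";
  } else if (vec instanceof Uint8Array) {
    dtype = "uint8";
  }
  if (dtype === undefined) return undefined;

  // Reinterpret the same memory as bytes (no copy). byteOffset/byteLength
  // matter when the typed array is a view into a larger buffer.
  const view = vec as ArrayBufferView;
  return { data: new Uint8Array(view.buffer, view.byteOffset, view.byteLength), dtype };
}

// Step 5 in miniature: which query path would a given input take?
function queryPath(vec: unknown): "raw" | "float32" {
  return extractVectorBuffer(vec) === undefined ? "float32" : "raw";
}
```

Returning a `Uint8Array` view instead of a copy avoids duplicating the vector on the JavaScript side. The receiver still needs the dtype tag to reinterpret the bytes, which is the job step 1 assigns to `bytes_to_arrow_array`.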

### Backward compatibility

Existing `Float32Array` and `number[]` inputs are unchanged; they still use the original
`nearest_to(Float32Array)` NAPI method. The new raw path is only used when
a non-Float32 typed array is detected.

## Usage

```typescript
// Assumes `table` is an open LanceDB Table with a vector column.

// Float16Array (Node 22+): no precision loss
const f16vec = new Float16Array([0.1, 0.2, 0.3]);
const f16Results = await table.query().nearestTo(f16vec).limit(10).toArray();

// Float64Array: no precision loss
const f64vec = new Float64Array([0.1, 0.2, 0.3]);
const f64Results = await table.query().nearestTo(f64vec).limit(10).toArray();

// Uint8Array (binary embeddings)
const u8vec = new Uint8Array([1, 0, 1, 1, 0]);
const u8Results = await table.query().nearestTo(u8vec).limit(10).toArray();

// Existing usage unchanged
const results = await table.query().nearestTo([0.1, 0.2, 0.3]).limit(10).toArray();
```

## Note on dependencies

The Rust side uses the `arrow_array`, `arrow_buffer`, and `half` crates.
These should already be in the dependency tree via `lancedb` core, but
`Cargo.toml` may need explicit entries for `half` and the arrow
sub-crates in the nodejs workspace.

---------

Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2026-03-30 11:15:35 -07:00


[**@lancedb/lancedb**](../../../README.md) • **Docs**
***
[@lancedb/lancedb](../../../globals.md) / [embedding](../README.md) / TextEmbeddingFunction
# Class: `abstract` TextEmbeddingFunction&lt;M&gt;
An abstract class for implementing embedding functions that take text as input.
## Extends
- [`EmbeddingFunction`](EmbeddingFunction.md)&lt;`string`, `M`&gt;
## Type Parameters
**M** *extends* [`FunctionOptions`](../interfaces/FunctionOptions.md) = [`FunctionOptions`](../interfaces/FunctionOptions.md)
## Constructors
### new TextEmbeddingFunction()
```ts
new TextEmbeddingFunction<M>(): TextEmbeddingFunction<M>
```
#### Returns
[`TextEmbeddingFunction`](TextEmbeddingFunction.md)&lt;`M`&gt;
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`constructor`](EmbeddingFunction.md#constructors)
## Methods
### computeQueryEmbeddings()
```ts
computeQueryEmbeddings(data): Promise<number[] | Uint8Array | Float32Array | Float64Array>
```
Compute the embeddings for a single query.
#### Parameters
* **data**: `string`
#### Returns
`Promise`&lt;`number`[] \| `Uint8Array` \| `Float32Array` \| `Float64Array`&gt;
#### Overrides
[`EmbeddingFunction`](EmbeddingFunction.md).[`computeQueryEmbeddings`](EmbeddingFunction.md#computequeryembeddings)
***
### computeSourceEmbeddings()
```ts
computeSourceEmbeddings(data): Promise<number[][] | Float32Array[] | Float64Array[]>
```
Creates a vector representation for the given values.
#### Parameters
* **data**: `string`[]
#### Returns
`Promise`&lt;`number`[][] \| `Float32Array`[] \| `Float64Array`[]&gt;
#### Overrides
[`EmbeddingFunction`](EmbeddingFunction.md).[`computeSourceEmbeddings`](EmbeddingFunction.md#computesourceembeddings)
***
### embeddingDataType()
```ts
embeddingDataType(): Float<Floats>
```
The datatype of the embeddings.
#### Returns
`Float`&lt;`Floats`&gt;
#### Overrides
[`EmbeddingFunction`](EmbeddingFunction.md).[`embeddingDataType`](EmbeddingFunction.md#embeddingdatatype)
***
### generateEmbeddings()
```ts
abstract generateEmbeddings(texts, ...args): Promise<number[][] | Float32Array[] | Float64Array[]>
```
#### Parameters
* **texts**: `string`[]
* ...**args**: `any`[]
#### Returns
`Promise`&lt;`number`[][] \| `Float32Array`[] \| `Float64Array`[]&gt;
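Subclasses only have to supply `generateEmbeddings()`. The sketch below assumes, as the method descriptions above suggest, that `computeSourceEmbeddings()` embeds a whole batch and `computeQueryEmbeddings()` embeds a batch of one, both through `generateEmbeddings()`. `TextEmbeddingFunctionSketch` is a hypothetical stand-in for the real base class, so the example runs without `@lancedb/lancedb`. The letter-count embedding is a toy. In real code you would extend `TextEmbeddingFunction` from the `embedding` namespace instead.

```typescript
// Hypothetical stand-in for the documented TextEmbeddingFunction contract so the
// sketch runs on its own; the real class also provides sourceField(), toJSON(), etc.
abstract class TextEmbeddingFunctionSketch {
  abstract generateEmbeddings(
    texts: string[],
    ...args: any[]
  ): Promise<number[][] | Float32Array[] | Float64Array[]>;

  // Assumed: source embeddings are generated as one batch.
  computeSourceEmbeddings(data: string[]): Promise<number[][] | Float32Array[] | Float64Array[]> {
    return this.generateEmbeddings(data);
  }

  // Assumed: a single query is embedded as a batch of one.
  async computeQueryEmbeddings(data: string): Promise<number[] | Float32Array | Float64Array> {
    const batch = await this.generateEmbeddings([data]);
    return batch[0];
  }
}

// Toy embedding: counts of the letters a..z, scaled to unit length. Illustrative only.
class LetterCountEmbedding extends TextEmbeddingFunctionSketch {
  ndims(): number {
    return 26;
  }

  async generateEmbeddings(texts: string[]): Promise<Float32Array[]> {
    return texts.map((text) => {
      const v = new Float32Array(26);
      const lower = text.toLowerCase();
      for (let c = 0; c < lower.length; c++) {
        const i = lower.charCodeAt(c) - 97; // 97 is the char code of "a"
        if (i >= 0 && i < 26) v[i] += 1;
      }
      let sumSq = 0;
      for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
      const norm = Math.sqrt(sumSq);
      if (norm > 0) for (let i = 0; i < v.length; i++) v[i] /= norm;
      return v;
    });
  }
}

async function demo(): Promise<void> {
  const fn = new LetterCountEmbedding();
  const docs = await fn.computeSourceEmbeddings(["lance", "arrow"]);
  const query = await fn.computeQueryEmbeddings("lance");
  console.log(docs.length, query.length); // 2 26
}
demo();
```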
***
### getSensitiveKeys()
```ts
protected getSensitiveKeys(): string[]
```
Provide a list of keys in the function options that should be treated as
sensitive. If users pass raw values for these keys, they will be rejected.
#### Returns
`string`[]
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`getSensitiveKeys`](EmbeddingFunction.md#getsensitivekeys)
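Here is a hypothetical illustration of the rule above: a subclass lists its secret option names, and any raw value supplied for one of them is rejected. The `$var:` prefix marking a variable reference, the helper `rejectRawSensitiveValues`, and the class name are assumptions made for this sketch. Where the real library performs the check is not described here.

```typescript
// Assumed marker for a variable reference (as opposed to a raw secret).
const VAR_PREFIX = "$var:";

// Hypothetical helper: throws if a sensitive key holds anything other than
// a variable reference. Keys that were not supplied are ignored.
function rejectRawSensitiveValues(
  options: Record<string, unknown>,
  sensitiveKeys: string[],
): void {
  for (const key of sensitiveKeys) {
    const value = options[key];
    if (value === undefined) continue;
    if (typeof value !== "string" || !value.startsWith(VAR_PREFIX)) {
      throw new Error(
        `"${key}" is sensitive; pass a variable such as "${VAR_PREFIX}my_key", not a raw value`,
      );
    }
  }
}

// Hypothetical API-backed embedding function that declares "apiKey" as sensitive.
class ApiBackedEmbeddingSketch {
  constructor(options: Record<string, unknown>) {
    rejectRawSensitiveValues(options, this.getSensitiveKeys());
  }

  protected getSensitiveKeys(): string[] {
    return ["apiKey"];
  }
}
```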
***
### init()?
```ts
optional init(): Promise<void>
```
Optionally load any resources needed for the embedding function.
This method is called after the embedding function has been initialized
but before any embeddings are computed. It is useful for loading local models
or other resources that are needed for the embedding function to work.
#### Returns
`Promise`&lt;`void`&gt;
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`init`](EmbeddingFunction.md#init)
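The lifecycle described above (construct, then `init()`, then compute embeddings) can be sketched with a hypothetical class. It defers loading a small vocabulary, standing in for a local model, until `init()` runs. The class, vocabulary, and error message are illustrative assumptions. The library calls `init()` itself, whereas here it would be called by hand.

```typescript
// Hypothetical sketch: a vocabulary "loaded" in init() stands in for a local
// model, and generateEmbeddings() refuses to run before it is available.
class VocabEmbeddingSketch {
  private vocab?: Map<string, number>;

  // Stand-in for expensive I/O such as reading model weights from disk.
  async init(): Promise<void> {
    const words = ["vector", "database", "query"];
    this.vocab = new Map(words.map((w, i) => [w, i] as [string, number]));
  }

  // Bag-of-words counts over the loaded vocabulary.
  async generateEmbeddings(texts: string[]): Promise<number[][]> {
    const vocab = this.vocab;
    if (vocab === undefined) {
      throw new Error("init() must run before embeddings are computed");
    }
    return texts.map((text) => {
      const v: number[] = new Array(vocab.size).fill(0);
      for (const word of text.toLowerCase().split(/\s+/)) {
        const i = vocab.get(word);
        if (i !== undefined) v[i] += 1;
      }
      return v;
    });
  }
}
```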
***
### ndims()
```ts
ndims(): undefined | number
```
The number of dimensions of the embeddings.
#### Returns
`undefined` \| `number`
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`ndims`](EmbeddingFunction.md#ndims)
***
### resolveVariables()
```ts
protected resolveVariables(config): Partial<M>
```
Apply variables to the config.
#### Parameters
* **config**: `Partial`&lt;`M`&gt;
#### Returns
`Partial`&lt;`M`&gt;
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`resolveVariables`](EmbeddingFunction.md#resolvevariables)
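`resolveVariables()` is described only as applying variables to the config. Here is a minimal sketch, assuming variables are referenced as `$var:name` strings and looked up in a store (here a plain `Map` named `variables`, which is hypothetical). It uses a simplified `Record<string, unknown>` signature instead of `Partial<M>`. Throwing on an unknown variable is a choice made for the sketch, not documented behavior.

```typescript
// Hypothetical variable store; it exists only for this sketch.
const variables = new Map<string, string>([["openai_key", "sk-example"]]);

// Assumed syntax for a variable reference inside a config value.
const VAR_PREFIX = "$var:";

// Returns a copy of `config` in which every "$var:name" string is replaced
// by the stored value for `name`; other values pass through unchanged.
function resolveVariablesSketch(config: Record<string, unknown>): Record<string, unknown> {
  const resolved: Record<string, unknown> = { ...config };
  for (const [key, value] of Object.entries(resolved)) {
    if (typeof value === "string" && value.startsWith(VAR_PREFIX)) {
      const name = value.slice(VAR_PREFIX.length);
      const stored = variables.get(name);
      if (stored === undefined) {
        throw new Error(`Variable "${name}" used by "${key}" has not been set`);
      }
      resolved[key] = stored;
    }
  }
  return resolved;
}
```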
***
### sourceField()
```ts
sourceField(): [DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
```
`sourceField` is used in combination with `LanceSchema` to provide a declarative data model.
#### Returns
[`DataType`&lt;`Type`, `any`&gt;, `Map`&lt;`string`, [`EmbeddingFunction`](EmbeddingFunction.md)&lt;`any`, [`FunctionOptions`](../interfaces/FunctionOptions.md)&gt;&gt;]
#### See
[LanceSchema](../functions/LanceSchema.md)
#### Overrides
[`EmbeddingFunction`](EmbeddingFunction.md).[`sourceField`](EmbeddingFunction.md#sourcefield)
***
### toJSON()
```ts
toJSON(): Record<string, any>
```
Get the original arguments to the constructor, to serialize them so they
can be used to recreate the embedding function later.
#### Returns
`Record`&lt;`string`, `any`&gt;
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`toJSON`](EmbeddingFunction.md#tojson)
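The `toJSON()` contract can be illustrated with a hypothetical class (`SerializableEmbeddingSketch` and `ModelOptions` are not part of the library). The method returns the constructor arguments, and `JSON.stringify()` calls it automatically. Parsing the result then lets you rebuild an equivalent instance.

```typescript
// Hypothetical options type for an embedding function.
interface ModelOptions {
  model: string;
  apiKey?: string;
}

// Sketch of the documented contract: toJSON() hands back the constructor
// arguments so an equivalent instance can be recreated later.
class SerializableEmbeddingSketch {
  constructor(private readonly options: ModelOptions) {}

  toJSON(): Record<string, any> {
    // Return a copy so callers cannot mutate this instance's options.
    return { ...this.options };
  }
}

const original = new SerializableEmbeddingSketch({ model: "small", apiKey: "$var:openai_key" });

// JSON.stringify() invokes toJSON() on the object automatically.
const saved = JSON.stringify(original);

// Recreate the embedding function from the stored arguments.
const restored = new SerializableEmbeddingSketch(JSON.parse(saved) as ModelOptions);
```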
***
### vectorField()
```ts
vectorField(optionsOrDatatype?): [DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
```
`vectorField` is used in combination with `LanceSchema` to provide a declarative data model.
#### Parameters
* **optionsOrDatatype?**: `DataType`&lt;`Type`, `any`&gt; \| `Partial`&lt;[`FieldOptions`](../interfaces/FieldOptions.md)&lt;`DataType`&lt;`Type`, `any`&gt;&gt;&gt;
  The options for the field.
#### Returns
[`DataType`&lt;`Type`, `any`&gt;, `Map`&lt;`string`, [`EmbeddingFunction`](EmbeddingFunction.md)&lt;`any`, [`FunctionOptions`](../interfaces/FunctionOptions.md)&gt;&gt;]
#### See
[LanceSchema](../functions/LanceSchema.md)
#### Inherited from
[`EmbeddingFunction`](EmbeddingFunction.md).[`vectorField`](EmbeddingFunction.md#vectorfield)