BREAKING CHANGE: embedding function implementations in Node now need to call `resolveVariables()` in their constructors and should **not** implement `toJSON()`.

This tries to address the handling of secrets. In Node, they are currently lost. In Python, they are currently leaked into the table schema metadata.

This PR introduces an in-memory variable store on the function registry. It also allows embedding function definitions to label certain config values as "sensitive"; the preprocessing logic raises an error if users try to pass hard-coded values for them.

Closes #2110
Closes #521

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
@lancedb/lancedb • Docs
Class: abstract EmbeddingFunction<T, M>
An embedding function that automatically creates a vector representation for a given column.
Subclasses must pass the original options to the super constructor and then
pass those same options to resolveVariables, so that any variables are resolved
before the option values are used.
Example
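class MyEmbeddingFunction extends EmbeddingFunction {
  model?: string;
  timeout?: number;

  constructor(optionsRaw: {model: string, timeout: number}) {
    // Pass the original, unresolved options to the super constructor.
    super(optionsRaw);
    // Resolve variables before reading any option value.
    const options = this.resolveVariables(optionsRaw);
    this.model = options.model;
    this.timeout = options.timeout;
  }
}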
Type Parameters
• T = any
• M extends FunctionOptions = FunctionOptions
Constructors
new EmbeddingFunction()
new EmbeddingFunction<T, M>(): EmbeddingFunction<T, M>
Returns
EmbeddingFunction<T, M>
Methods
computeQueryEmbeddings()
computeQueryEmbeddings(data): Promise<number[] | Float32Array | Float64Array>
Compute the embeddings for a single query
Parameters
- data: T
Returns
Promise<number[] | Float32Array | Float64Array>
computeSourceEmbeddings()
abstract computeSourceEmbeddings(data): Promise<number[][] | Float32Array[] | Float64Array[]>
Creates a vector representation for the given values.
Parameters
- data: T[]
Returns
Promise<number[][] | Float32Array[] | Float64Array[]>
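The relationship between the batch method and the single-query method can be sketched without the library. Everything below is a hypothetical stand-in: `SketchEmbeddingFunction` and `CharBucketEmbedding` are illustrative names, the toy "embedding" only counts characters into buckets, and the sketch assumes that a single query is embedded by running a batch of one, which this page does not state.

```typescript
// Hypothetical stand-in for the abstract class; not the library's implementation.
abstract class SketchEmbeddingFunction<T> {
  // Batch method: returns one vector per input value.
  abstract computeSourceEmbeddings(data: T[]): Promise<number[][]>;

  // Assumption: a single query is embedded by running a batch of one.
  async computeQueryEmbeddings(data: T): Promise<number[]> {
    const [vector] = await this.computeSourceEmbeddings([data]);
    return vector;
  }

  // Dimensionality of each vector, if it is known up front.
  ndims(): number | undefined {
    return undefined;
  }
}

// Toy embedding: counts characters into a fixed number of buckets.
class CharBucketEmbedding extends SketchEmbeddingFunction<string> {
  ndims(): number {
    return 4;
  }

  async computeSourceEmbeddings(data: string[]): Promise<number[][]> {
    return data.map((text) => {
      const vector: number[] = new Array<number>(this.ndims()).fill(0);
      for (let i = 0; i < text.length; i++) {
        vector[text.charCodeAt(i) % vector.length] += 1;
      }
      return vector;
    });
  }
}
```

A real subclass would call a model or a remote service inside `computeSourceEmbeddings`; only the shape of the inputs and outputs carries over from this sketch.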
embeddingDataType()
abstract embeddingDataType(): Float<Floats>
The datatype of the embeddings
Returns
Float<Floats>
getSensitiveKeys()
protected getSensitiveKeys(): string[]
Provide a list of keys in the function options that should be treated as sensitive. If users pass raw values for these keys, they will be rejected.
Returns
string[]
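The rejection of raw values for sensitive keys can be illustrated with a small standalone check. This is a sketch under stated assumptions, not the library's preprocessing logic: `rejectRawSensitiveValues` is a hypothetical helper, and the `$var:` prefix used to recognise a variable reference is assumed for illustration.

```typescript
// Assumption for illustration: a value counts as a variable reference
// when it is a string starting with "$var:".
const VAR_PREFIX = "$var:";

// Hypothetical helper; the library performs its own check internally.
function rejectRawSensitiveValues(
  config: Record<string, unknown>,
  sensitiveKeys: string[],
): void {
  for (const key of sensitiveKeys) {
    const value = config[key];
    if (value === undefined) continue; // key not supplied, nothing to check
    const isReference =
      typeof value === "string" && value.startsWith(VAR_PREFIX);
    if (!isReference) {
      throw new Error(
        `The key "${key}" is sensitive and must be passed as a variable reference`,
      );
    }
  }
}

// What a subclass override of getSensitiveKeys() might return.
const sensitiveKeys = ["apiKey"];

// Accepted: the sensitive key holds a reference, not the secret itself.
rejectRawSensitiveValues({ model: "small", apiKey: "$var:api_key" }, sensitiveKeys);
```

Passing `{ apiKey: "sk-hardcoded" }` to the same helper throws, which mirrors the behaviour described above for raw values.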
init()?
optional init(): Promise<void>
Optionally load any resources needed for the embedding function.
This method is called after the embedding function has been initialized but before any embeddings are computed. It is useful for loading local models or other resources that are needed for the embedding function to work.
Returns
Promise<void>
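Because `init()` is optional, calling code has to cope with functions that define it and functions that do not. The sketch below shows one way to do that; `SketchEmbedder`, `LocalModelEmbedding`, `StatelessEmbedding`, and `prepare` are hypothetical names, and the "model" is just a hard-coded array standing in for a resource loaded from disk or the network.

```typescript
// Hypothetical shape that mirrors only the optional init() hook.
interface SketchEmbedder {
  init?(): Promise<void>;
  embed(x: number): number[];
}

class LocalModelEmbedding implements SketchEmbedder {
  private weights: number[] | undefined;

  // Stand-in for loading a local model or another expensive resource.
  async init(): Promise<void> {
    this.weights = [0.5, 0.25];
  }

  embed(x: number): number[] {
    if (this.weights === undefined) {
      throw new Error("init() must be awaited before computing embeddings");
    }
    return this.weights.map((w) => w * x);
  }
}

class StatelessEmbedding implements SketchEmbedder {
  // No init(): there is nothing to load.
  embed(x: number): number[] {
    return [x];
  }
}

// Caller side: run the hook only when the function defines it.
async function prepare<F extends SketchEmbedder>(fn: F): Promise<F> {
  await fn.init?.();
  return fn;
}
```

Keeping the loading step out of the constructor lets construction stay synchronous while the expensive work is awaited once, before the first embedding is computed.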
ndims()
ndims(): undefined | number
The number of dimensions of the embeddings
Returns
undefined | number
resolveVariables()
protected resolveVariables(config): Partial<M>
Apply variables from the function registry's variable store to the config and return the resolved config.
Parameters
- config: Partial<M>
Returns
Partial<M>
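The interplay between variable resolution and serialization can be sketched as follows. This is a minimal illustration, not the library's code: `variableStore`, `SketchFunction`, and `SketchRemoteEmbedding` are hypothetical stand-ins, the `$var:<name>` placeholder syntax is an assumption, all option values are strings for simplicity, and the stand-in base constructor is assumed to accept the raw options.

```typescript
// Hypothetical in-memory store standing in for the one on the function registry.
const variableStore = new Map<string, string>();

// Assumed placeholder syntax for this sketch: "$var:<name>".
const PLACEHOLDER_PREFIX = "$var:";

type SketchConfig = Record<string, string>;

abstract class SketchFunction {
  // The unresolved options, kept only for serialization.
  private readonly rawConfig: SketchConfig;

  constructor(rawConfig: SketchConfig) {
    this.rawConfig = rawConfig;
  }

  // Returns a copy of the config with every placeholder replaced by its stored value.
  protected resolveVariables(config: SketchConfig): SketchConfig {
    const resolved: SketchConfig = { ...config };
    for (const [key, value] of Object.entries(config)) {
      if (!value.startsWith(PLACEHOLDER_PREFIX)) continue;
      const name = value.slice(PLACEHOLDER_PREFIX.length);
      const stored = variableStore.get(name);
      if (stored === undefined) {
        throw new Error(`Variable "${name}" is not set`);
      }
      resolved[key] = stored;
    }
    return resolved;
  }

  // Serializes the placeholders, never the resolved secrets.
  toJSON(): SketchConfig {
    return { ...this.rawConfig };
  }
}

class SketchRemoteEmbedding extends SketchFunction {
  readonly model: string;
  readonly apiKey: string;

  constructor(optionsRaw: { model: string; apiKey: string }) {
    super(optionsRaw);
    const options = this.resolveVariables(optionsRaw);
    this.model = options.model;
    this.apiKey = options.apiKey;
  }
}

variableStore.set("api_key", "sk-example");
const fn = new SketchRemoteEmbedding({ model: "small", apiKey: "$var:api_key" });
// fn.apiKey holds the resolved value, while JSON.stringify(fn) yields
// {"model":"small","apiKey":"$var:api_key"}
```

In this sketch the instance works with the resolved secret in memory, but anything written out through `toJSON()` contains only the placeholder, so the secret never leaves the variable store.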
sourceField()
sourceField(optionsOrDatatype): [DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
sourceField is used in combination with LanceSchema to provide a declarative data model
Parameters
- optionsOrDatatype: DataType<Type, any> | Partial<FieldOptions<DataType<Type, any>>>
The options for the field or the datatype
Returns
[DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
toJSON()
toJSON(): Record<string, any>
Get the original arguments to the constructor, to serialize them so they can be used to recreate the embedding function later.
Returns
Record<string, any>
vectorField()
vectorField(optionsOrDatatype?): [DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]
vectorField is used in combination with LanceSchema to provide a declarative data model
Parameters
- optionsOrDatatype?: DataType<Type, any> | Partial<FieldOptions<DataType<Type, any>>>
The options for the field
Returns
[DataType<Type, any>, Map<string, EmbeddingFunction<any, FunctionOptions>>]