mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-07 12:22:59 +00:00
feat!: add variable store to embeddings registry (#2112)
BREAKING CHANGE: embedding function implementations in Node need to now call `resolveVariables()` in their constructors and should **not** implement `toJSON()`.

This tries to address the handling of secrets. In Node, they are currently lost. In Python, they are currently leaked into the table schema metadata.

This PR introduces an in-memory variable store on the function registry. It also allows embedding function definitions to label certain config values as "sensitive", and the preprocessing logic will raise an error if users try to pass in hard-coded values.

Closes #2110
Closes #521

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
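To make the breaking change concrete, here is a minimal, self-contained sketch of the pattern the message describes: a variable store, a list of sensitive keys, and a constructor that calls `resolveVariables()`. Only the method names `getSensitiveKeys()` and `resolveVariables()` come from this commit. `VariableStore`, `SketchEmbeddingFunction`, `DemoEmbeddingFunction`, and the `$var:<name>` placeholder syntax are assumptions made for illustration and may not match the real implementation.

```ts
// Hypothetical stand-in for the registry's in-memory variable store.
class VariableStore {
  private vars = new Map<string, string>();

  setVar(name: string, value: string): void {
    this.vars.set(name, value);
  }

  getVar(name: string): string | undefined {
    return this.vars.get(name);
  }
}

// Assumed placeholder syntax for this sketch only: "$var:<name>".
const VAR_PREFIX = "$var:";

abstract class SketchEmbeddingFunction<M extends object> {
  protected readonly store: VariableStore;

  constructor(store: VariableStore) {
    this.store = store;
  }

  // Keys whose values must come from the store, never from literals.
  protected getSensitiveKeys(): string[] {
    return [];
  }

  // Swap placeholders for stored values; reject raw sensitive values.
  protected resolveVariables(config: Partial<M>): Partial<M> {
    const sensitive = new Set(this.getSensitiveKeys());
    const resolved: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(config) as [string, unknown][]) {
      if (typeof value === "string" && value.startsWith(VAR_PREFIX)) {
        const name = value.slice(VAR_PREFIX.length);
        const stored = this.store.getVar(name);
        if (stored === undefined) {
          throw new Error(`Variable "${name}" is not set`);
        }
        resolved[key] = stored;
      } else if (sensitive.has(key)) {
        throw new Error(`"${key}" is sensitive; pass a variable reference`);
      } else {
        resolved[key] = value;
      }
    }
    return resolved as Partial<M>;
  }
}

type DemoOptions = { model: string; apiKey: string };

class DemoEmbeddingFunction extends SketchEmbeddingFunction<DemoOptions> {
  readonly options: Partial<DemoOptions>;

  constructor(store: VariableStore, options: Partial<DemoOptions>) {
    super(store);
    // Per the commit message, constructors call resolveVariables().
    this.options = this.resolveVariables(options);
  }

  protected getSensitiveKeys(): string[] {
    return ["apiKey"];
  }
}

const store = new VariableStore();
store.setVar("api_key", "sk-example");

// A variable reference is resolved from the store.
const ok = new DemoEmbeddingFunction(store, { model: "small", apiKey: "$var:api_key" });

// A hard-coded value for a sensitive key is rejected.
let rejected = false;
try {
  new DemoEmbeddingFunction(store, { model: "small", apiKey: "sk-hard-coded" });
} catch {
  rejected = true;
}
```

In this sketch the secret lives only in the store, so the options a user passes around contain a reference rather than the key itself.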
@@ -114,12 +114,37 @@ abstract generateEmbeddings(texts, ...args): Promise<number[][] | Float32Array[]

 ***

+### getSensitiveKeys()
+
+```ts
+protected getSensitiveKeys(): string[]
+```
+
+Provide a list of keys in the function options that should be treated as
+sensitive. If users pass raw values for these keys, they will be rejected.
+
+#### Returns
+
+`string`[]
+
+#### Inherited from
+
+[`EmbeddingFunction`](EmbeddingFunction.md).[`getSensitiveKeys`](EmbeddingFunction.md#getsensitivekeys)
+
+***
+
 ### init()?

 ```ts
 optional init(): Promise<void>
 ```

+Optionally load any resources needed for the embedding function.
+
+This method is called after the embedding function has been initialized
+but before any embeddings are computed. It is useful for loading local models
+or other resources that are needed for the embedding function to work.
+
 #### Returns

 `Promise`<`void`>
@@ -148,6 +173,28 @@ The number of dimensions of the embeddings

 ***

+### resolveVariables()
+
+```ts
+protected resolveVariables(config): Partial<M>
+```
+
+Apply variables to the config.
+
+#### Parameters
+
+* **config**: `Partial`<`M`>
+
+#### Returns
+
+`Partial`<`M`>
+
+#### Inherited from
+
+[`EmbeddingFunction`](EmbeddingFunction.md).[`resolveVariables`](EmbeddingFunction.md#resolvevariables)
+
+***
+
 ### sourceField()

 ```ts
@@ -173,37 +220,15 @@ sourceField is used in combination with `LanceSchema` to provide a declarative d
 ### toJSON()

 ```ts
-abstract toJSON(): Partial<M>
+toJSON(): Record<string, any>
 ```

-Convert the embedding function to a JSON object
-It is used to serialize the embedding function to the schema
-It's important that any object returned by this method contains all the necessary
-information to recreate the embedding function
-
-It should return the same object that was passed to the constructor
-If it does not, the embedding function will not be able to be recreated, or could be recreated incorrectly
+Get the original arguments to the constructor, to serialize them so they
+can be used to recreate the embedding function later.

 #### Returns

-`Partial`<`M`>
-
-#### Example
-
-```ts
-class MyEmbeddingFunction extends EmbeddingFunction {
-  constructor(options: {model: string, timeout: number}) {
-    super();
-    this.model = options.model;
-    this.timeout = options.timeout;
-  }
-  toJSON() {
-    return {
-      model: this.model,
-      timeout: this.timeout,
-    };
-  }
-```
+`Record`<`string`, `any`>

 #### Inherited from

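The new `toJSON()` description says it returns the original constructor arguments. One way this could work, sketched under assumptions: the base class snapshots the raw config inside `resolveVariables()` and hands that snapshot back from `toJSON()`, so a serialized function carries the variable reference and not the resolved secret. `RememberingFunction`, `KeyedFunction`, the `$var:<name>` syntax, and the extra `vars` parameter are hypothetical; the documented `resolveVariables(config)` takes only the config.

```ts
// Hypothetical base class; the real EmbeddingFunction may differ.
abstract class RememberingFunction<M extends object> {
  // Snapshot of the constructor arguments before any substitution.
  private original: Record<string, unknown> = {};

  // Records the raw config, then substitutes "$var:<name>" references
  // (an assumed syntax) using the supplied lookup table.
  protected resolveVariables(
    config: Partial<M>,
    vars: Map<string, string>,
  ): Partial<M> {
    const resolved: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(config) as [string, unknown][]) {
      this.original[key] = value;
      if (typeof value === "string" && value.startsWith("$var:")) {
        // Unknown variables are left as references in this sketch.
        resolved[key] = vars.get(value.slice("$var:".length)) ?? value;
      } else {
        resolved[key] = value;
      }
    }
    return resolved as Partial<M>;
  }

  // Subclasses do not override this; it returns the unresolved arguments.
  toJSON(): Record<string, any> {
    return { ...this.original };
  }
}

type KeyedOptions = { model: string; apiKey: string };

class KeyedFunction extends RememberingFunction<KeyedOptions> {
  readonly apiKey: string | undefined;

  constructor(options: Partial<KeyedOptions>, vars: Map<string, string>) {
    super();
    this.apiKey = this.resolveVariables(options, vars).apiKey;
  }
}

const vars = new Map([["api_key", "sk-example"]]);
const keyed = new KeyedFunction({ model: "small", apiKey: "$var:api_key" }, vars);

// JSON.stringify calls toJSON(), so only the reference is serialized.
const serialized = JSON.stringify(keyed);
```

Here `keyed.apiKey` holds the resolved value for use at runtime, while `serialized` contains only `$var:api_key`.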