mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-08 21:02:58 +00:00
docs: add lancedb embedding fcn on cloud docs (#1521)
@@ -2,8 +2,8 @@ Representing multi-modal data as vector embeddings is becoming a standard practi

 For this purpose, LanceDB introduces an **embedding functions API** that you set up once, during the configuration stage of your project. After this, the table remembers it, effectively making the embedding functions *disappear in the background*, so you don't have to manually pass callables and can instead focus on the rest of your data engineering pipeline.

-!!! Note "LanceDB cloud doesn't support embedding functions yet"
-    LanceDB Cloud does not support embedding functions yet. You need to generate embeddings before ingesting into the table or querying.
+!!! Note "Embedding functions on LanceDB cloud"
+    When using embedding functions with LanceDB cloud, the embeddings will be generated on the source device and sent to the cloud. This means that the source device must have the necessary resources to generate the embeddings.

 !!! warning
     Using the embedding function registry means that you don't have to explicitly generate the embeddings yourself.

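The note added in this hunk says that embeddings are computed on the source device and only then sent to the cloud. As a rough illustration of that flow, here is a stdlib-only sketch. It deliberately does not use the LanceDB API: `toy_embed`, `RemoteTableStub`, and `add_with_local_embeddings` are hypothetical names invented for this example, and the hash-based "embedding" is a placeholder for a real model.

```python
import hashlib
from typing import Dict, List

NDIMS = 8  # hypothetical embedding size for the toy model


def toy_embed(text: str, ndims: int = NDIMS) -> List[float]:
    """Hypothetical stand-in for an embedding model running on the client."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `ndims` bytes of the digest to floats in [0, 1].
    return [b / 255.0 for b in digest[:ndims]]


class RemoteTableStub:
    """Hypothetical stand-in for a cloud table: it stores rows, it never embeds."""

    def __init__(self, ndims: int) -> None:
        self.ndims = ndims
        self.rows: List[Dict] = []

    def add(self, rows: List[Dict]) -> None:
        for row in rows:
            vector = row.get("vector")
            if vector is None or len(vector) != self.ndims:
                raise ValueError("rows must arrive with a precomputed vector")
            self.rows.append(row)


def add_with_local_embeddings(table: RemoteTableStub, texts: List[str]) -> None:
    # The embedding step happens here, on the source device...
    rows = [{"text": t, "vector": toy_embed(t, table.ndims)} for t in texts]
    # ...and only the finished rows are handed to the remote side.
    table.add(rows)


table = RemoteTableStub(ndims=NDIMS)
add_with_local_embeddings(table, ["hello world", "goodbye world"])
```

The point of the split is that the remote side needs no model or API key of its own, which is why the note stresses that the client machine must have the resources to run the embedding function.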
@@ -99,28 +99,28 @@ LanceDB registers the Sentence Transformers embeddings function in the registry

 Coming Soon!

-### Jina Embeddings
-
-LanceDB registers the JinaAI embeddings function in the registry as `jina`. You can pass any supported model name to the `create`. By default it uses `"jina-clip-v1"`.
-`jina-clip-v1` can handle both text and images and other models only support `text`.
-
-You need to pass `JINA_API_KEY` in the environment variable or pass it as `api_key` to `create` method.
+### Embedding function with LanceDB cloud
+Embedding functions are now supported on LanceDB cloud. The embeddings will be generated on the source device and sent to the cloud. This means that the source device must have the necessary resources to generate the embeddings. Here's an example using the OpenAI embedding function:

 ```python
 import os
 import lancedb
 from lancedb.pydantic import LanceModel, Vector
 from lancedb.embeddings import get_registry
-os.environ['JINA_API_KEY'] = "jina_*"
+os.environ['OPENAI_API_KEY'] = "..."

-db = lancedb.connect("/tmp/db")
-func = get_registry().get("jina").create(name="jina-clip-v1")
+db = lancedb.connect(
+  uri="db://....",
+  api_key="sk_...",
+  region="us-east-1"
+)
+func = get_registry().get("openai").create()

 class Words(LanceModel):
     text: str = func.SourceField()
     vector: Vector(func.ndims()) = func.VectorField()

-table = db.create_table("words", schema=Words, mode="overwrite")
+table = db.create_table("words", schema=Words)
 table.add(
     [
         {"text": "hello world"},