The legacy `with_embeddings` API is Python-only and is deprecated.

### Hugging Face

The most popular open source option is to use the [sentence-transformers](https://www.sbert.net/) library, which can be installed via pip.

```bash
pip install sentence-transformers
```

The example below shows how to use the `paraphrase-albert-small-v2` model to generate embeddings for a given document.

```python
from sentence_transformers import SentenceTransformer

name = "paraphrase-albert-small-v2"
model = SentenceTransformer(name)

# used for both ingestion and querying
def embed_func(batch):
    return [model.encode(sentence) for sentence in batch]
```

### OpenAI

Another popular alternative is to use an external API like OpenAI's [embeddings API](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

```python
import os

import openai

# Set the OPENAI_API_KEY environment variable,
# or pass the key to the client directly
if "OPENAI_API_KEY" not in os.environ:
    client = openai.OpenAI(api_key="sk-...")
else:
    client = openai.OpenAI()

def embed_func(c):
    rs = client.embeddings.create(input=c, model="text-embedding-ada-002")
    return [record.embedding for record in rs.data]
```

## Applying an embedding function to data

Once you have an embedding function, you can apply it to raw data to generate embeddings for each record. Say you have a pandas DataFrame with a `text` column that you want embedded; you can use the `with_embeddings` function to generate embeddings and add them to a table.

```python
import pandas as pd

from lancedb.embeddings import with_embeddings

df = pd.DataFrame(
    [
        {"text": "pepperoni"},
        {"text": "pineapple"}
    ]
)
data = with_embeddings(embed_func, df)

# The output is used to create / append to a table
tbl = db.create_table("my_table", data=data)
```

If your data is in a different column, you can specify the `column` kwarg to `with_embeddings`. By default, LanceDB calls the function with batches of 1000 rows; this can be configured using the `batch_size` parameter to `with_embeddings` (a short sketch appears at the end of this section). LanceDB automatically wraps the function with retry and rate-limit logic to ensure that calls to the OpenAI API are reliable.

## Querying using an embedding function

!!! warning
    At query time, you **must** use the same embedding function you used to vectorize your data. If you use a different embedding function, the embeddings will not reside in the same vector space and the results will be nonsensical.

=== "Python"

    ```python
    query = "What's the best pizza topping?"
    query_vector = embed_func([query])[0]
    results = (
        tbl.search(query_vector)
        .limit(10)
        .to_pandas()
    )
    ```

The above snippet returns a pandas DataFrame with the 10 closest vectors to the query.
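Since the stored vectors and the query vector must come from the same model, one quick sanity check is to compare dimensionalities. The snippet below is a minimal sketch, assuming one of the `embed_func` definitions from earlier in this section; the printed dimension depends on the model you chose.

```python
# Embed a single query string and inspect the vector's dimensionality.
# It should match the dimensionality of the vectors stored in the table;
# a mismatch usually means a different embedding model was used.
query_vector = embed_func(["What's the best pizza topping?"])[0]
print(len(query_vector))  # e.g. 1536 for text-embedding-ada-002
```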
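Finally, here is a minimal sketch of the `column` and `batch_size` options on `with_embeddings` mentioned above. The `body` column name, the batch size of 64, and the `my_table_2` table name are illustrative assumptions; `embed_func` is any of the embedding functions defined earlier in this section.

```python
import pandas as pd

from lancedb.embeddings import with_embeddings

# Hypothetical DataFrame whose text lives in a column named "body"
df = pd.DataFrame([{"body": "pepperoni"}, {"body": "pineapple"}])

# Embed the "body" column, calling embed_func with batches of 64 rows
# instead of the default 1000
data = with_embeddings(embed_func, df, column="body", batch_size=64)
tbl = db.create_table("my_table_2", data=data)
```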