docs(nodejs): add @lancedb/lancedb examples everywhere (#1411)

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-05-30 18:30:40 +00:00 · 2024-07-10 13:29:03 -05:00
parent cef24801f4
commit 31be9212da
24 changed files with 1631 additions and 449 deletions
--- a/docs/src/embeddings/embedding_functions.md
+++ b/docs/src/embeddings/embedding_functions.md
@@ -6,8 +6,8 @@ For this purpose, LanceDB introduces an **embedding functions API**, that allow
    LanceDB Cloud does not support embedding functions yet. You need to generate embeddings before ingesting into the table or querying.

 !!! warning
-    Using the embedding function registry means that you don't have to explicitly generate the embeddings yourself. 
-    However, if your embedding function changes, you'll have to re-configure your table with the new embedding function 
+    Using the embedding function registry means that you don't have to explicitly generate the embeddings yourself.
+    However, if your embedding function changes, you'll have to re-configure your table with the new embedding function
    and regenerate the embeddings. In the future, we plan to support the ability to change the embedding function via
    table metadata and have LanceDB automatically take care of regenerating the embeddings.

@@ -16,7 +16,7 @@ For this purpose, LanceDB introduces an **embedding functions API**, that allow

 === "Python"
    In the LanceDB python SDK, we define a global embedding function registry with
-    many different embedding models and even more coming soon. 
+    many different embedding models and even more coming soon.
    Here's an implementation of CLIP as an example.

    ```python
@@ -26,20 +26,35 @@ For this purpose, LanceDB introduces an **embedding functions API**, that allow
    clip = registry.get("open-clip").create()
    ```

-    You can also define your own embedding function by implementing the `EmbeddingFunction` 
+    You can also define your own embedding function by implementing the `EmbeddingFunction`
    abstract base interface. It subclasses Pydantic Model which can be utilized to write complex schemas simply as we'll see next!

-=== "JavaScript""
+=== "TypeScript"
    In the TypeScript SDK, the choices are more limited. For now, only the OpenAI
    embedding function is available.

    ```javascript
-    const lancedb = require("vectordb");
+    import * as lancedb from '@lancedb/lancedb'
+    import { getRegistry } from '@lancedb/lancedb/embeddings'

    // You need to provide an OpenAI API key
    const apiKey = "sk-..."
    // The embedding function will create embeddings for the 'text' column
-    const embedding = new lancedb.OpenAIEmbeddingFunction('text', apiKey)
+    const func = getRegistry().get("openai").create({apiKey})
+    ```
+=== "Rust"
+    In the Rust SDK, the choices are more limited. For now, only the OpenAI
+    embedding function is available. But unlike the Python and TypeScript SDKs, you need to manually register the OpenAI embedding function.
+
+    ```toml
+    # Make sure to include the `openai` feature
+    [dependencies]
+    lancedb = {version = "*", features = ["openai"]}
+    ```
+
+    ```rust
+    --8<-- "rust/lancedb/examples/openai.rs:imports"
+    --8<-- "rust/lancedb/examples/openai.rs:openai_embeddings"
    ```

 ## 2. Define the data model or schema
@@ -55,14 +70,14 @@ For this purpose, LanceDB introduces an **embedding functions API**, that allow

    `VectorField` tells LanceDB to use the clip embedding function to generate query embeddings for the `vector` column and `SourceField` ensures that when adding data, we automatically use the specified embedding function to encode `image_uri`.

-=== "JavaScript"
+=== "TypeScript"

    For the TypeScript SDK, a schema can be inferred from input data, or an explicit
    Arrow schema can be provided.

 ## 3. Create table and add data

-Now that we have chosen/defined our embedding function and the schema, 
+Now that we have chosen/defined our embedding function and the schema,
 we can create the table and ingest data without needing to explicitly generate
 the embeddings at all:

@@ -74,17 +89,26 @@ the embeddings at all:
    table.add([{"image_uri": u} for u in uris])
    ```

-=== "JavaScript"
+=== "TypeScript"

-    ```javascript
-    const db = await lancedb.connect("data/sample-lancedb");
-    const data = [
-    { text: "pepperoni"},
-    { text: "pineapple"}
-    ]
+    === "@lancedb/lancedb"

-    const table = await db.createTable("vectors", data, embedding)
-    ```
+        ```ts
+        --8<-- "nodejs/examples/embedding.ts:imports"
+        --8<-- "nodejs/examples/embedding.ts:embedding_function"
+        ```
+
+    === "vectordb (deprecated)"
+
+        ```ts
+        const db = await lancedb.connect("data/sample-lancedb");
+        const data = [
+            { text: "pepperoni"},
+            { text: "pineapple"}
+        ]
+
+        const table = await db.createTable("vectors", data, embedding)
+        ```

 ## 4. Querying your table
 Not only can you forget about the embeddings during ingestion, you also don't
@@ -97,8 +121,8 @@ need to worry about it when you query the table:
    ```python
    results = (
        table.search("dog")
-        .limit(10)
-        .to_pandas()
+            .limit(10)
+            .to_pandas()
    )
    ```

@@ -109,22 +133,32 @@ need to worry about it when you query the table:
    query_image = Image.open(p)
    results = (
        table.search(query_image)
-        .limit(10)
-        .to_pandas()
+            .limit(10)
+            .to_pandas()
    )
    ```

    Both of the above snippets return a pandas DataFrame with the 10 closest vectors to the query.

-=== "JavaScript"
+=== "TypeScript"
+
+    === "@lancedb/lancedb"
+
+        ```ts
+        const results = await table.search("What's the best pizza topping?")
+            .limit(10)
+            .toArray()
+        ```
+
+    === "vectordb (deprecated)"
+
+        ```ts
+        const results = await table
+            .search("What's the best pizza topping?")
+            .limit(10)
+            .execute()
+        ```

-    ```javascript
-    const results = await table
-    .search("What's the best pizza topping?")
-    .limit(10)
-    .execute()
-    ```    
-    
    The above snippet returns an array of records with the top 10 nearest neighbors to the query.

 ---
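The flow documented in the hunks above (embed the source column on `add`, embed the query on `search`, return the closest rows) can be sketched without any library. This is a toy illustration under stated assumptions, not LanceDB's implementation: `ToyTable`, `toy_embed`, and the letter-count embedding are hypothetical names invented for this sketch, and the ranking is a brute-force cosine similarity.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical embedding: unit-normalized counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


class ToyTable:
    """Hypothetical table that embeds rows on add() and queries on search()."""

    def __init__(self, embed: Callable[[str], Vector], source_column: str):
        self.embed = embed
        self.source_column = source_column
        self.rows: List[Dict] = []

    def add(self, rows: List[Dict]) -> None:
        # The caller never supplies a vector; it is derived from the source column.
        for row in rows:
            vector = self.embed(row[self.source_column])
            self.rows.append({**row, "vector": vector})

    def search(self, query: str, limit: int = 10) -> List[Dict]:
        # The query string is embedded with the same function as the data.
        q = self.embed(query)

        def score(row: Dict) -> float:
            # Vectors are unit length, so the dot product is the cosine similarity.
            return sum(a * b for a, b in zip(q, row["vector"]))

        return sorted(self.rows, key=score, reverse=True)[:limit]


table = ToyTable(toy_embed, source_column="text")
table.add([{"text": "pepperoni"}, {"text": "pineapple"}])
results = table.search("pepper", limit=1)
print(results[0]["text"])  # prints "pepperoni"
```

The point of the design is that the same function embeds both the stored column and the query, which is why the documented examples pass plain strings to `add` and `search`.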
--- a/docs/src/embeddings/index.md
+++ b/docs/src/embeddings/index.md
@@ -1,13 +1,13 @@
-Due to the nature of vector embeddings, they can be used to represent any kind of data, from text to images to audio. 
-This makes them a very powerful tool for machine learning practitioners. 
-However, there's no one-size-fits-all solution for generating embeddings - there are many different libraries and APIs 
+Due to the nature of vector embeddings, they can be used to represent any kind of data, from text to images to audio.
+This makes them a very powerful tool for machine learning practitioners.
+However, there's no one-size-fits-all solution for generating embeddings - there are many different libraries and APIs
 (both commercial and open source) that can be used to generate embeddings from structured/unstructured data.

 LanceDB supports 3 methods of working with embeddings.

 1. You can manually generate embeddings for the data and queries. This is done outside of LanceDB.
 2. You can use the built-in [embedding functions](./embedding_functions.md) to embed the data and queries in the background.
-3. For python users, you can define your own [custom embedding function](./custom_embedding_function.md)
+3. You can define your own [custom embedding function](./custom_embedding_function.md)
   that extends the default embedding functions.

 For python users, there is also a legacy [with_embeddings API](./legacy.md).
@@ -18,62 +18,89 @@ It is retained for compatibility and will be removed in a future version.
 To get started with embeddings, you can use the built-in embedding functions.

 ### OpenAI Embedding function
+
 LanceDB registers the OpenAI embeddings function in the registry as `openai`. You can pass any supported model name to the `create`. By default it uses `"text-embedding-ada-002"`.

-```python
-import lancedb
-from lancedb.pydantic import LanceModel, Vector
-from lancedb.embeddings import get_registry
+=== "Python"

-db = lancedb.connect("/tmp/db")
-func = get_registry().get("openai").create(name="text-embedding-ada-002")
+    ```python
+    import lancedb
+    from lancedb.pydantic import LanceModel, Vector
+    from lancedb.embeddings import get_registry

-class Words(LanceModel):
-    text: str = func.SourceField()
-    vector: Vector(func.ndims()) = func.VectorField()
+    db = lancedb.connect("/tmp/db")
+    func = get_registry().get("openai").create(name="text-embedding-ada-002")

-table = db.create_table("words", schema=Words, mode="overwrite")
-table.add(
-    [
-        {"text": "hello world"},
-        {"text": "goodbye world"}
-    ]
-    )
+    class Words(LanceModel):
+        text: str = func.SourceField()
+        vector: Vector(func.ndims()) = func.VectorField()

-query = "greetings"
-actual = table.search(query).limit(1).to_pydantic(Words)[0]
-print(actual.text)
-```
+    table = db.create_table("words", schema=Words, mode="overwrite")
+    table.add(
+        [
+            {"text": "hello world"},
+            {"text": "goodbye world"}
+        ]
+        )
+
+    query = "greetings"
+    actual = table.search(query).limit(1).to_pydantic(Words)[0]
+    print(actual.text)
+    ```
+
+=== "TypeScript"
+
+    ```typescript
+    --8<-- "nodejs/examples/embedding.ts:imports"
+    --8<-- "nodejs/examples/embedding.ts:openai_embeddings"
+    ```
+
+=== "Rust"
+
+    ```rust
+    --8<-- "rust/lancedb/examples/openai.rs:imports"
+    --8<-- "rust/lancedb/examples/openai.rs:openai_embeddings"
+    ```

 ### Sentence Transformers Embedding function
 LanceDB registers the Sentence Transformers embeddings function in the registry as `sentence-transformers`. You can pass any supported model name to the `create`. By default it uses `"sentence-transformers/paraphrase-MiniLM-L6-v2"`.

-```python
-import lancedb
-from lancedb.pydantic import LanceModel, Vector
-from lancedb.embeddings import get_registry
+=== "Python"
+    ```python
+    import lancedb
+    from lancedb.pydantic import LanceModel, Vector
+    from lancedb.embeddings import get_registry

-db = lancedb.connect("/tmp/db")
-model = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5", device="cpu")
+    db = lancedb.connect("/tmp/db")
+    model = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5", device="cpu")

-class Words(LanceModel):
-    text: str = model.SourceField()
-    vector: Vector(model.ndims()) = model.VectorField()
+    class Words(LanceModel):
+        text: str = model.SourceField()
+        vector: Vector(model.ndims()) = model.VectorField()

-table = db.create_table("words", schema=Words)
-table.add(
-    [
-        {"text": "hello world"},
-        {"text": "goodbye world"}
-    ]
-)
+    table = db.create_table("words", schema=Words)
+    table.add(
+        [
+            {"text": "hello world"},
+            {"text": "goodbye world"}
+        ]
+    )

-query = "greetings"
-actual = table.search(query).limit(1).to_pydantic(Words)[0]
-print(actual.text)
-```
+    query = "greetings"
+    actual = table.search(query).limit(1).to_pydantic(Words)[0]
+    print(actual.text)
+    ```
+
+=== "TypeScript"
+
+    Coming Soon!
+
+=== "Rust"
+
+    Coming Soon!

 ### Jina Embeddings
+
 LanceDB registers the JinaAI embeddings function in the registry as `jina`. You can pass any supported model name to the `create`. By default it uses `"jina-clip-v1"`.
 `jina-clip-v1` can handle both text and images and other models only support `text`.

@@ -104,4 +131,4 @@ table.add(
 query = "greetings"
 actual = table.search(query).limit(1).to_pydantic(Words)[0]
 print(actual.text)
-```
+```
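Every example in this file looks an embedding function up by name (`openai`, `sentence-transformers`, `jina`) and then calls `create` with model options. A minimal, library-free sketch of that lookup-then-create pattern follows. It only mirrors the shape of the `get(...).create(...)` calls shown above: `ToyRegistry`, `ToyEmbedding`, `LengthEmbedding`, and the `"length"` key are hypothetical and do not correspond to LanceDB internals.

```python
from typing import Dict, List, Type


class ToyEmbedding:
    """Hypothetical base class: subclasses report a dimension and embed texts."""

    def __init__(self, name: str = "default-model"):
        self.name = name

    @classmethod
    def create(cls, **kwargs) -> "ToyEmbedding":
        # Mirrors the documented call shape: registry.get(key).create(name=...)
        return cls(**kwargs)

    def ndims(self) -> int:
        raise NotImplementedError

    def embed(self, texts: List[str]) -> List[List[float]]:
        raise NotImplementedError


class ToyRegistry:
    """Hypothetical registry mapping a string key to an embedding class."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type[ToyEmbedding]] = {}

    def register(self, key: str):
        def decorator(cls: Type[ToyEmbedding]) -> Type[ToyEmbedding]:
            self._classes[key] = cls
            return cls  # return the class so the decorated name stays usable

        return decorator

    def get(self, key: str) -> Type[ToyEmbedding]:
        if key not in self._classes:
            raise KeyError(f"no embedding function registered as {key!r}")
        return self._classes[key]


registry = ToyRegistry()


@registry.register("length")
class LengthEmbedding(ToyEmbedding):
    """Toy 2-dimensional embedding: [character count, space count]."""

    def ndims(self) -> int:
        return 2

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


func = registry.get("length").create(name="toy-v1")
print(func.name, func.ndims(), func.embed(["hello world"]))
# prints: toy-v1 2 [[11.0, 1.0]]
```

Keeping `ndims()` on the function object is what lets a schema declare its vector width from the chosen model, as the `Vector(func.ndims())` fields in the Python examples do.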