[Doc] Metrics types. (#135)

Closes #129
2025-12-27 07:09:57 +00:00 · 2023-06-02 17:18:01 -07:00
parent adcb2a1387
commit cdd08ef35c
3 changed files with 91 additions and 4 deletions
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -38,13 +38,15 @@ nav:
 - Home: index.md
 - Basics: basic.md
 - Embeddings: embedding.md
- Indexing: ann_indexes.md
 - Python full-text search: fts.md
 - Python integrations: integrations.md
 - Python examples:
  - YouTube Transcript Search using OpenAI: notebooks/youtube_transcript_search.ipynb
  - Documentation QA Bot using LangChain: notebooks/code_qa_bot.ipynb
  - Multimodal search using CLIP: notebooks/multimodal_search.ipynb
+- References:
+  - Vector Search: search.md
+  - Indexing: ann_indexes.md
 - API references:
  - Python API: python/python.md
  - Javascript API: javascript/modules.md
--- a/docs/src/ann_indexes.md
+++ b/docs/src/ann_indexes.md
@@ -18,7 +18,7 @@ In the future we will look to automatically create and configure the ANN index.
     ```python
     import lancedb
     import numpy as np
-     uri = "~/.lancedb"
+     uri = "data/sample-lancedb"
     db = lancedb.connect(uri)

     # Create 10,000 sample vectors
@@ -48,7 +48,7 @@ In the future we will look to automatically create and configure the ANN index.
 Since `create_index` has a training step, it can take a few minutes to finish for large tables. You can control the index
 creation by providing the following parameters:

- **metric** (default: "L2"): The distance metric to use. By default we use euclidean distance. We also support cosine distance.
+- **metric** (default: "L2"): The distance metric to use. By default we use Euclidean distance. We also support "cosine" distance.
 - **num_partitions** (default: 256): The number of partitions of the index. The number of partitions should be configured so each partition has 3-5K vectors. For example, a table
 with ~1M vectors should use 256 partitions. You can specify arbitrary number of partitions but powers of 2 is most conventional.
 A higher number leads to faster queries, but it makes index generation slower.
@@ -87,7 +87,7 @@ There are a couple of parameters that can be used to fine-tune the search:
 === "Javascript"
     ```javascript
     const results = await table
-         .search(Array(1536).fill(1.2))
+         .search(Array(768).fill(1.2))
         .limit(2)
         .nprobes(20)
         .refineFactor(10)
--- a/docs/src/search.md
+++ b/docs/src/search.md
@@ -0,0 +1,85 @@
+# Vector Search
+
+`Vector Search` finds the vectors in the database that are nearest to a query vector.
+In a recommendation system or search engine, it finds products similar to
+the one you searched for.
+In LLM and other AI applications,
+each data point can be [represented by embeddings generated from a model](embedding.md),
+and the search returns the most relevant features.
+
+A search in a high-dimensional vector space finds the `K-Nearest-Neighbors (KNN)` of the query vector.
+
+## Metric
+
+In LanceDB, a `Metric` is the way to describe the distance between a pair of vectors.
+Currently, we support the following metrics:
+
+| Metric      | Description                          |
+| ----------- | ------------------------------------ |
+| `L2`        | [Euclidean / L2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
+| `cosine`    | Cosine distance, derived from [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) |
+
+
+## Search
+
+### Flat Search
+
+
+If no [vector index has been created](ann_indexes.md), LanceDB performs a brute-force scan of
+the vector column and computes the distance to every vector.
+
+=== "Python"
+
+    ```python
+    import lancedb
+    import numpy as np
+
+    db = lancedb.connect("data/sample-lancedb")
+
+    tbl = db.open_table("my_vectors")
+
+    # Parentheses let the method chain span multiple lines
+    df = (
+        tbl.search(np.random.random((768)))
+        .limit(10)
+        .to_df()
+    )
+    ```
+
+=== "JavaScript"
+
+    ```javascript
+    const vectordb = require('vectordb')
+    const db = await vectordb.connect('data/sample-lancedb')
+
+    const tbl = await db.openTable('my_vectors')
+
+    // Fill the query vector with values; Array(768) alone has only empty slots
+    const results = await tbl.search(Array(768).fill(1.2))
+        .limit(20)
+        .execute()
+    ```
+
+By default, `L2` is used as the `Metric` type. You can also specify a different
+metric type for a search.
+
+=== "Python"
+
+    ```python
+    # Parentheses let the method chain span multiple lines
+    df = (
+        tbl.search(np.random.random((768)))
+        .metric("cosine")
+        .limit(10)
+        .to_df()
+    )
+    ```
+
+=== "JavaScript"
+
+    ```javascript
+    const vectordb = require('vectordb')
+    const db = await vectordb.connect('data/sample-lancedb')
+
+    const tbl = await db.openTable('my_vectors')
+
+    // Fill the query vector with values; Array(768) alone has only empty slots
+    const results = await tbl.search(Array(768).fill(1.2))
+        .metric("cosine")
+        .limit(20)
+        .execute()
+    ```
+
+### Search with Vector Index
+
+See [ANN Index](ann_indexes.md) for more details.
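
To illustrate the flat search and the two metrics described in `docs/src/search.md`, here is a minimal NumPy sketch of a brute-force KNN scan. It does not use LanceDB and is not LanceDB's implementation; the helper names (`l2_distance`, `cosine_distance`, `flat_search`) are hypothetical, and cosine distance is assumed here to mean `1 - cosine similarity`.

```python
import numpy as np


def l2_distance(query, vectors):
    """Euclidean (L2) distance from `query` to each row of `vectors`."""
    return np.linalg.norm(vectors - query, axis=1)


def cosine_distance(query, vectors):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    return 1.0 - (vectors @ query) / norms


# Hypothetical lookup table mapping metric names to distance functions.
METRICS = {"L2": l2_distance, "cosine": cosine_distance}


def flat_search(query, vectors, k=10, metric="L2"):
    """Brute-force KNN: score every row, then keep the k closest."""
    distances = METRICS[metric](query, vectors)
    nearest = np.argsort(distances)[:k]
    return nearest, distances[nearest]


rng = np.random.default_rng(0)
vectors = rng.random((1000, 768))  # stand-in for a vector column
query = vectors[42]                # a stored vector is its own nearest neighbor

ids, dists = flat_search(query, vectors, k=5, metric="cosine")
ids_l2, dists_l2 = flat_search(query, vectors, k=5)
```

Because every row is scored, the cost of this scan grows linearly with the number of vectors, which is the motivation for building an ANN index on larger tables.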