# Vector Search
Vector search finds the vectors in the database that are nearest to a query vector.
In a recommendation system or search engine, this lets you find products similar to
the one you searched for.
In LLM and other AI applications,
each data point can be represented by an embedding generated by a model,
and vector search returns the most relevant data points.
A search in high-dimensional vector space finds the K nearest neighbors (KNN) of the query vector.
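A minimal NumPy sketch of exact KNN under L2 distance, for illustration only: the `knn` helper and the random toy data are hypothetical and are not part of the LanceDB API. It compares the query with every stored vector and keeps the K closest.

```python
import numpy as np

def knn(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return the row indices of the k vectors closest to `query` under L2 distance."""
    # Euclidean distance from the query to every stored vector (an exhaustive scan).
    distances = np.linalg.norm(vectors - query, axis=1)
    # Sort by distance and keep the k smallest.
    return np.argsort(distances)[:k]

rng = np.random.default_rng(0)
vectors = rng.random((1000, 128))  # 1,000 toy 128-dimensional embeddings
query = vectors[42] + 0.001        # a query that sits very close to row 42
print(knn(vectors, query, k=5))    # row 42 is listed first
```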
## Metric
In LanceDB, a metric defines how the distance between a pair of vectors is measured.
Currently, we support the following metrics:
| Metric | Description |
|---|---|
| L2 | Euclidean / L2 distance |
| Cosine | Cosine similarity |
| Dot | Dot product |
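The three metrics can be sketched in NumPy using their plain mathematical definitions. The helper names and toy vectors are hypothetical, and this sketch does not show how LanceDB converts cosine and dot scores into the distance it sorts by. For L2 a smaller value means closer; for cosine similarity and dot product a larger value means more similar.

```python
import numpy as np

def l2_distance(a, b):
    # Euclidean distance: smaller means closer.
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Cosine of the angle between a and b: larger means more similar; ignores vector length.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot_product(a, b):
    # Raw dot product: larger means more similar; grows with vector length.
    return float(np.dot(a, b))

query = np.array([1.0, 0.0])
short_same_dir = np.array([0.5, 0.0])  # same direction as the query, but shorter
long_off_dir = np.array([3.0, 3.0])    # 45 degrees away from the query, much longer

for name, v in [("short_same_dir", short_same_dir), ("long_off_dir", long_off_dir)]:
    print(name, l2_distance(query, v), cosine_similarity(query, v), dot_product(query, v))
```

Here L2 and cosine both rank `short_same_dir` as closer to the query, while the dot product prefers `long_off_dir` because its larger magnitude inflates the score. For unit-length vectors the three metrics give the same ranking, since then ||a - b||² = 2 - 2(a · b) and the dot product equals the cosine similarity.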
## Search
### Flat Search
If you do not create a vector index, LanceDB exhaustively scans the entire vector column (a flat search)
and computes the distance for every vector to find the closest matches. This is effectively a KNN search.
=== "Python"
```python
import lancedb
import numpy as np

db = lancedb.connect("data/sample-lancedb")
tbl = db.open_table("my_vectors")

# No index: every vector is compared with the query; the 10 closest are returned
results = (
    tbl.search(np.random.random(1536))
    .limit(10)
    .to_list()
)
```
=== "JavaScript"
```javascript
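const vectordb = require('vectordb')
// top-level `await` needs an ES module or an enclosing async function
const db = await vectordb.connect('data/sample-lancedb')
const tbl = await db.openTable("my_vectors")
// No index: every vector is compared with the query; the 10 closest are returned
const results_1 = await tbl.search(Array(1536).fill(1.2))
  .limit(10)
  .execute()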
```
By default, LanceDB uses the L2 metric. You can choose a different metric
for a query:
=== "Python"
```python
# Same kind of query, ranked with the cosine metric instead of L2
results = (
    tbl.search(np.random.random(1536))
    .metric("cosine")
    .limit(10)
    .to_list()
)
```
=== "JavaScript"
```javascript
const results_2 = await tbl.search(Array(1536).fill(1.2))
.metricType("cosine")
.limit(10)
.execute()
```
### Approximate Nearest Neighbor (ANN) Search with Vector Index
To accelerate vector retrieval, it is common to build a vector index. A vector index is a data structure designed to organize vector data so that it can be searched efficiently by similarity under the chosen distance metric. By constructing a vector index, you reduce the search space and avoid brute-force scanning of the entire vector column.
However, fast vector search with an index usually gives up some accuracy: the results may miss some of the true nearest neighbors. This is why it is called Approximate Nearest Neighbor (ANN) search, whereas flat search (KNN) always returns 100% recall.
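The accuracy trade-off can be seen with a simplified, IVF-style index: vectors are grouped into partitions with k-means, and a query scans only the `nprobe` partitions whose centroids are nearest. This is a hypothetical NumPy sketch for illustration, not LanceDB's ANN index implementation; `build_partitions`, `ann_search`, `nprobe`, and the random data are all assumptions. Recall@k is measured as the fraction of the exact (flat search) top-k that the approximate search also returns.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((5000, 32)).astype(np.float32)  # toy "table" of embeddings
queries = rng.random((50, 32)).astype(np.float32)    # toy query vectors
k = 10

def exact_knn(query, k):
    # Flat search: L2 distance to every vector, keep the k smallest.
    d = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(d)[:k]

def nearest_centroid(data, centroids):
    # Index of the closest centroid for each row of `data`.
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def build_partitions(data, n_partitions, n_iters=10):
    # A few rounds of k-means (Lloyd's algorithm) to split the data into partitions.
    centroids = data[rng.choice(len(data), n_partitions, replace=False)].copy()
    for _ in range(n_iters):
        assign = nearest_centroid(data, centroids)
        for p in range(n_partitions):
            members = data[assign == p]
            if len(members) > 0:  # keep the old centroid if a partition empties out
                centroids[p] = members.mean(axis=0)
    return centroids, nearest_centroid(data, centroids)

def ann_search(query, centroids, assign, k, nprobe):
    # Scan only the vectors in the `nprobe` partitions whose centroids are closest.
    probed = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.flatnonzero(np.isin(assign, probed))
    d = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(d)[:k]]

def recall(approx, exact):
    # Fraction of the exact top-k that the approximate search also found.
    return len(set(approx.tolist()) & set(exact.tolist())) / len(exact)

n_partitions = 32
centroids, assign = build_partitions(vectors, n_partitions)
results = {}
for nprobe in (1, 4, n_partitions):
    results[nprobe] = float(np.mean(
        [recall(ann_search(q, centroids, assign, k, nprobe), exact_knn(q, k)) for q in queries]
    ))
    print(f"nprobe={nprobe:2d}  recall@{k}={results[nprobe]:.2f}")
```

Probing more partitions scans more vectors and raises recall at the cost of speed; probing every partition degenerates into a flat search, so recall reaches 1.0. Production indexes typically add further techniques (for example, product quantization), so real numbers will differ from this toy.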
See ANN Index for more details.