Mirror of https://github.com/lancedb/lancedb.git, synced 2025-12-27 07:09:57 +00:00
@@ -38,13 +38,15 @@ nav:
 - Home: index.md
 - Basics: basic.md
 - Embeddings: embedding.md
-- Indexing: ann_indexes.md
 - Python full-text search: fts.md
 - Python integrations: integrations.md
 - Python examples:
   - YouTube Transcript Search using OpenAI: notebooks/youtube_transcript_search.ipynb
   - Documentation QA Bot using LangChain: notebooks/code_qa_bot.ipynb
   - Multimodal search using CLIP: notebooks/multimodal_search.ipynb
+- References:
+  - Vector Search: search.md
+  - Indexing: ann_indexes.md
 - API references:
   - Python API: python/python.md
   - Javascript API: javascript/modules.md
@@ -18,7 +18,7 @@ In the future we will look to automatically create and configure the ANN index.
 ```python
 import lancedb
 import numpy as np
-uri = "~/.lancedb"
+uri = "data/sample-lancedb"
 db = lancedb.connect(uri)

 # Create 10,000 sample vectors
@@ -48,7 +48,7 @@ In the future we will look to automatically create and configure the ANN index.
 Since `create_index` has a training step, it can take a few minutes to finish for large tables. You can control the index
 creation by providing the following parameters:

-- **metric** (default: "L2"): The distance metric to use. By default we use euclidean distance. We also support cosine distance.
+- **metric** (default: "L2"): The distance metric to use. By default we use euclidean distance. We also support "cosine" distance.
 - **num_partitions** (default: 256): The number of partitions of the index. The number of partitions should be configured so each partition has 3-5K vectors. For example, a table
   with ~1M vectors should use 256 partitions. You can specify an arbitrary number of partitions, but powers of 2 are most conventional.
   A higher number leads to faster queries, but it makes index generation slower.
@@ -87,7 +87,7 @@ There are a couple of parameters that can be used to fine-tune the search:
 === "Javascript"
 ```javascript
 const results = await table
-.search(Array(1536).fill(1.2))
+.search(Array(768).fill(1.2))
 .limit(2)
 .nprobes(20)
 .refineFactor(10)
85
docs/src/search.md
Normal file
@@ -0,0 +1,85 @@
# Vector Search

Vector search finds the vectors in the database that are nearest to a query vector.
In a recommendation system or search engine, you can use it to find products similar to
the one you searched for.
In LLM and other AI applications,
each data point can be [represented by embeddings generated from a model](embedding.md),
and vector search returns the most relevant features.

A search in a high-dimensional vector space finds the `K-Nearest-Neighbors (KNN)` of the query vector.

## Metric

In LanceDB, a `Metric` describes how the distance between a pair of vectors is computed.
Currently, we support the following metrics:

| Metric   | Description |
| -------- | ----------- |
| `L2`     | [Euclidean / L2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) |
| `cosine` | [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) |

## Search

### Flat Search

If no [vector index has been created](ann_indexes.md), LanceDB brute-force scans
the vector column and computes the distance from the query to every vector.

=== "Python"

```python
import lancedb
import numpy as np

db = lancedb.connect("data/sample-lancedb")

tbl = db.open_table("my_vectors")

# Query with a random 768-dimensional vector and return the 10 nearest rows.
df = (
    tbl.search(np.random.random(768))
    .limit(10)
    .to_df()
)
```

=== "JavaScript"

```javascript
const vectordb = require('vectordb')
const db = await vectordb.connect('data/sample-lancedb')

const tbl = await db.openTable("my_vectors")

// Query with a 768-dimensional vector and return the 20 nearest rows.
const results = await tbl.search(Array(768).fill(1.2))
    .limit(20)
    .execute()
```
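
To make the brute-force scan described above concrete, here is a minimal NumPy sketch of a flat search. It is an illustration only, not LanceDB's implementation: `flat_search` is a hypothetical helper, and the random array stands in for a table's vector column.

```python
import numpy as np


def flat_search(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return the row indices of the k vectors closest to `query` (L2 distance)."""
    # One full scan: compute the distance from the query to every stored vector.
    distances = np.linalg.norm(vectors - query, axis=1)
    # Sort rows by distance and keep the k smallest.
    return np.argsort(distances)[:k]


rng = np.random.default_rng(0)
vectors = rng.random((1000, 768))  # stand-in for the table's vector column
query = vectors[42] + 0.001        # a query that is almost identical to row 42

nearest = flat_search(vectors, query, k=10)
print(nearest[0])  # row 42 is the closest match
```

The cost of this scan grows linearly with the number of rows, which is why larger tables benefit from an index.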

By default, `L2` is used as the metric type. You can also choose a different
metric.

=== "Python"

```python
# Same query as above, ranked by cosine distance instead of L2.
df = (
    tbl.search(np.random.random(768))
    .metric("cosine")
    .limit(10)
    .to_df()
)
```

=== "JavaScript"

```javascript
const vectordb = require('vectordb')
const db = await vectordb.connect('data/sample-lancedb')

const tbl = await db.openTable("my_vectors")

const results = await tbl.search(Array(768).fill(1.2))
    .metric("cosine")
    .limit(20)
    .execute()
```
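
The choice of metric can change which vectors count as nearest. The sketch below compares the two metrics on tiny vectors. It assumes the common convention that cosine distance is `1 - cosine similarity`; whether LanceDB reports exactly these values is an assumption, and both helper functions are hypothetical.

```python
import numpy as np


def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance: sensitive to both direction and length."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity: depends only on the angle between the vectors."""
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(1.0 - similarity)


a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction as a, twice the length
c = np.array([0.0, 1.0])  # orthogonal to a

print(l2_distance(a, b), cosine_distance(a, b))  # apart under L2, identical under cosine
print(l2_distance(a, c), cosine_distance(a, c))  # apart under both metrics
```

Under this convention, cosine distance ignores vector length, so it is a natural fit when only the direction of an embedding matters.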

### Search with Vector Index

See [ANN Index](ann_indexes.md) for more details.
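
When building such an index, the indexing page in this change suggests sizing `num_partitions` so that each partition holds 3-5K vectors, with powers of 2 as the conventional choice. A rough way to turn that guidance into a number is sketched below. `suggest_num_partitions` is a hypothetical helper, not part of LanceDB, and the 4,000-vector target is an assumed midpoint of that range.

```python
import math


def suggest_num_partitions(num_vectors: int, target_per_partition: int = 4000) -> int:
    """Pick the power of two closest (in log space) to num_vectors / target_per_partition."""
    ideal = max(1.0, num_vectors / target_per_partition)
    # Round the exponent so the result is always a power of two.
    return 2 ** round(math.log2(ideal))


print(suggest_num_partitions(1_000_000))  # 256, matching the ~1M-vector example
```

Because the result is rounded to a power of two, the actual partition size can land slightly outside the 3-5K range, so treat it as a starting point.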