mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-05 19:32:56 +00:00
docs: update integration docs & fix links (#1423)
1. Updated LangChain docs.
2. Minor update to LlamaIndex doc.
3. Added notebook examples and linked them correctly.
@@ -2,7 +2,7 @@
## Quick Start
You can load your document data using LangChain's loaders; for this example we use `TextLoader` with `OpenAIEmbeddings` as the embedding model. Check out the complete example here: [LangChain demo](../notebooks/langchain_example.ipynb)
```python
import os

from langchain.document_loaders import TextLoader
```
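The rest of this snippet is cut off by the diff hunk; a minimal sketch of how such a quick start typically continues (the file name, import paths, and `from_documents` usage here are assumptions, not the exact lines from the doc):

```python
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import LanceDB

documents = TextLoader("state_of_the_union.txt").load()  # hypothetical input file
docs = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)

embeddings = OpenAIEmbeddings()  # requires OPENAI_API_KEY in the environment
docsearch = LanceDB.from_documents(docs, embeddings)
```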
@@ -38,6 +38,8 @@ The exhaustive list of parameters for the `LanceDB` vector store is:
- `api_key`: (Optional) API key to use for LanceDB cloud database. Defaults to `None`.
- `region`: (Optional) Region to use for LanceDB cloud database. Only for LanceDB Cloud; defaults to `None`.
- `mode`: (Optional) Mode to use for adding data to the table. Defaults to `'overwrite'`.
- `reranker`: (Optional) The reranker to use for LanceDB.
- `relevance_score_fn`: (Optional[Callable[[float], float]]) LangChain relevance score function to be used. Defaults to `None`.

```python
db_url = "db://lang_test"  # URL of the db you created
```
@@ -54,12 +56,14 @@ vector_store = LanceDB(
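The constructor call itself is elided by the hunk above; a hedged sketch of what it might look like for LanceDB Cloud, built from the parameter list above (the API key, region, and `table_name` are placeholders, not values from the original doc):

```python
# placeholders: substitute your own LanceDB Cloud credentials
vector_store = LanceDB(
    uri=db_url,
    api_key="sk_...",             # hypothetical API key
    region="us-east-1",           # hypothetical region
    embedding=embeddings,
    table_name="langchain_test",  # hypothetical table name
)
```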
### Methods
To add texts and store the respective embeddings automatically:

##### add_texts()
- `texts`: `Iterable` of strings to add to the vectorstore.
- `metadatas`: Optional `list[dict]` of metadatas associated with the texts.
- `ids`: Optional `list` of ids to associate with the texts.
- `kwargs`: `Any`

This method adds the texts and stores the respective embeddings automatically.

```python
vector_store.add_texts(texts=["test_123"], metadatas=[{"source": "wiki"}])
```
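As a hedged variant (the id value is hypothetical), `ids` can be supplied alongside the texts so rows can be referenced later:

```python
# one id per text; "doc-001" is a hypothetical identifier
vector_store.add_texts(
    texts=["test_123"],
    metadatas=[{"source": "wiki"}],
    ids=["doc-001"],
)
```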
@@ -74,7 +78,6 @@ pd_df.to_csv("docsearch.csv", index=False)
```python
# you can also create a new vector store object using an older connection object:
vector_store = LanceDB(connection=tbl, embedding=embeddings)
```

For index creation, make sure your table has enough data in it. An ANN index is usually not needed for datasets of ~100K vectors or fewer. For large-scale (>1M) or high-dimensional vectors, it is beneficial to create an ANN index.

##### create_index()
- `col_name`: `Optional[str] = None`
- `vector_col`: `Optional[str] = None`
@@ -82,6 +85,8 @@ For index creation make sure your table has enough data in it. An ANN index is u
- `num_sub_vectors`: `Optional[int] = 96`
- `index_cache_size`: `Optional[int] = None`

This method creates an index for the vector store. Make sure your table has enough data in it: an ANN index is usually not needed for datasets of ~100K vectors or fewer, while for large-scale (>1M) or high-dimensional vectors it is beneficial to create one.

```python
# for creating a vector index
vector_store.create_index(vector_col="vector", metric="cosine")
```
@@ -89,4 +94,108 @@ vector_store.create_index(vector_col='vector', metric = 'cosine')
```python
# for creating a scalar index (for non-vector columns)
vector_store.create_index(col_name="text")
```
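The tuning parameters from the list above can be combined with the metric; a rough sketch with illustrative values (not tuned recommendations):

```python
# illustrative IVF-PQ settings; adjust for your dataset size and dimensionality
vector_store.create_index(
    vector_col="vector",
    metric="cosine",
    num_sub_vectors=96,
)
```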
##### similarity_search()
- `query`: `str`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `fts`: `Optional[bool] = False`
- `name`: `Optional[str] = None`
- `kwargs`: `Any`

Returns documents most similar to the query, without relevance scores.

```python
docs = docsearch.similarity_search(query)
print(docs[0].page_content)
```
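A hedged illustration of the optional arguments (the metadata key and value below are hypothetical, and `k` simply caps the number of results):

```python
# hypothetical metadata filter; keys depend on what you stored in `metadatas`
docs = docsearch.similarity_search(query, k=5, filter={"source": "wiki"})
print(len(docs))
```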
##### similarity_search_by_vector()
- `embedding`: `List[float]`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `name`: `Optional[str] = None`
- `kwargs`: `Any`

Returns documents most similar to the given query vector.

```python
# embed the query string first; the method expects a vector, not a string
query_embedding = embeddings.embed_query(query)
docs = docsearch.similarity_search_by_vector(query_embedding)
print(docs[0].page_content)
```
##### similarity_search_with_score()
- `query`: `str`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `kwargs`: `Any`

Returns documents most similar to the query string, with relevance scores. It is called by the base class's `similarity_search_with_relevance_scores`, which selects the relevance score based on `_select_relevance_score_fn`.

```python
docs = docsearch.similarity_search_with_relevance_scores(query)
print("relevance score -", docs[0][1])
print("text -", docs[0][0].page_content[:1000])
```
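If the default scores do not suit your use case, the `relevance_score_fn` constructor parameter from the list above can supply a custom mapping; a sketch with an illustrative lambda (not from the original doc):

```python
# illustrative: map a raw distance to a (0, 1] relevance score
custom_store = LanceDB(
    connection=tbl,
    embedding=embeddings,
    relevance_score_fn=lambda distance: 1.0 / (1.0 + distance),
)
```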
##### similarity_search_by_vector_with_relevance_scores()
- `embedding`: `List[float]`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `name`: `Optional[str] = None`
- `kwargs`: `Any`

Returns documents most similar to the query vector, with relevance scores.

```python
# query_embedding is a vector, e.g. embeddings.embed_query(query)
docs = docsearch.similarity_search_by_vector_with_relevance_scores(query_embedding)
print("relevance score -", docs[0][1])
print("text -", docs[0][0].page_content[:1000])
```
##### max_marginal_relevance_search()
- `query`: `str`
- `k`: `Optional[int] = None`
- `fetch_k`: Number of documents to fetch to pass to the MMR algorithm. `Optional[int] = None`
- `lambda_mult`: Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5. `float = 0.5`
- `filter`: `Optional[Dict[str, str]] = None`
- `kwargs`: `Any`

Returns docs selected using maximal marginal relevance (MMR). Maximal marginal relevance optimizes for similarity to the query AND diversity among the selected documents.

Similarly, the `max_marginal_relevance_search_by_vector()` function returns docs most similar to the given embedding using MMR; instead of a string query, you pass the embedding to search for.

```python
result = docsearch.max_marginal_relevance_search(
    query="text"
)
result_texts = [doc.page_content for doc in result]
print(result_texts)

# search by vector:
result = docsearch.max_marginal_relevance_search_by_vector(
    embeddings.embed_query("text")
)
result_texts = [doc.page_content for doc in result]
print(result_texts)
```
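The `fetch_k` and `lambda_mult` knobs from the parameter list above control the candidate pool and the similarity/diversity trade-off; a sketch with illustrative values:

```python
# fetch 20 candidates, return 4, leaning toward diversity (lambda_mult < 0.5)
result = docsearch.max_marginal_relevance_search(
    query="text",
    k=4,
    fetch_k=20,
    lambda_mult=0.25,
)
```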
##### add_images()
- `uris`: File paths to the images. `List[str]`.
- `metadatas`: Optional list of metadatas. `(Optional[List[dict]], optional)`
- `ids`: Optional list of IDs. `(Optional[List[str]], optional)`

Adds images by automatically creating their embeddings and adding them to the vectorstore.

```python
# here image_uris are local filesystem paths to the images
vec_store.add_images(uris=image_uris)
```
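As with texts, optional metadata can accompany each image; a hedged sketch (the paths and labels are hypothetical):

```python
# hypothetical local paths and labels, one metadata dict per image
vec_store.add_images(
    uris=["./images/cat.png", "./images/dog.png"],
    metadatas=[{"label": "cat"}, {"label": "dog"}],
)
```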
@@ -2,7 +2,8 @@

## Quick start
You would need to install the integration via `pip install llama-index-vector-stores-lancedb` in order to use it.
You can run the script below to try it out:

```python
import logging
import sys
```
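The remainder of the script is elided by the hunk; a rough sketch of the usual LlamaIndex wiring it leads into (the data directory, URI, and table name are assumptions):

```python
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.lancedb import LanceDBVectorStore

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data dir

vector_store = LanceDBVectorStore(uri="/tmp/lancedb", table_name="demo")  # placeholders
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```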
@@ -43,6 +44,8 @@ retriever = index.as_retriever(vector_store_kwargs={"where": lance_filter})
```python
response = retriever.retrieve("What did the author do growing up?")
```

Check out the complete example here: [LlamaIndex demo](../notebooks/LlamaIndex_example.ipynb)
### Filtering
For metadata filtering, you can use a Lance SQL-like string filter, as demonstrated in the example above. Alternatively, you can filter using the `MetadataFilters` class from LlamaIndex:
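The code block that followed was cut off in this view; a sketch of what `MetadataFilters`-based filtering can look like (the metadata key and value are hypothetical):

```python
from llama_index.core.vector_stores import (
    FilterCondition,
    MetadataFilter,
    MetadataFilters,
)

# hypothetical filter: restrict retrieval to nodes from one source file
query_filters = MetadataFilters(
    filters=[MetadataFilter(key="file_name", value="paul_graham_essay.txt")],
    condition=FilterCondition.AND,
)

retriever = index.as_retriever(filters=query_filters)
response = retriever.retrieve("What did the author do growing up?")
```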