[DOCS] Update integrations + small typos (#432)

Depends on - https://github.com/lancedb/lancedb/pull/430

---------

Co-authored-by: Kevin Tse <NivekT@users.noreply.github.com>
This commit is contained in:
Ayush Chaurasia
2023-08-18 09:59:22 +05:30
committed by GitHub
parent 0b9924b432
commit f35f8e451f
4 changed files with 39 additions and 20 deletions

View File

@@ -67,6 +67,7 @@ nav:
- LlamaIndex 🦙: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
- Pydantic: python/pydantic.md
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
- Python examples:
- YouTube Transcript Search: notebooks/youtube_transcript_search.ipynb
- Documentation QA Bot using LangChain: notebooks/code_qa_bot.ipynb

View File

@@ -63,6 +63,25 @@ A Table is a collection of Records in a LanceDB Database.
table = db.create_table("table3", data, schema=custom_schema)
```
### From PyArrow Tables
You can also create LanceDB tables directly from pyarrow tables
```python
table = pa.Table.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]],
pa.list_(pa.float32(), 2)),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
)
db = lancedb.connect("db")
tbl = db.create_table("test1", table)
```
### From Pydantic Models
LanceDB supports to create Apache Arrow Schema from a Pydantic BaseModel via pydantic_to_schema() method.
@@ -86,10 +105,14 @@ A Table is a collection of Records in a LanceDB Database.
table = db.create_table(table_name, schema=Content.to_arrow_schema())
```
### Using RecordBatch Iterator / Writing Large Datasets
### Using Iterators / Writing Large Datasets
It is recommended to use RecordBatch itertator to add large datasets in batches when creating your table in one go. This does not create multiple versions of your dataset unlike manually adding batches using `table.add()`
It is recommended to use itertators to add large datasets in batches when creating your table in one go. This does not create multiple versions of your dataset unlike manually adding batches using `table.add()`
LanceDB additionally supports pyarrow's `RecordBatch` Iterators or other generators producing supported data types.
Here's an example using using `RecordBatch` iterator for creating tables.
```python
import pyarrow as pa
@@ -97,7 +120,8 @@ A Table is a collection of Records in a LanceDB Database.
for i in range(5):
yield pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]]),
pa.array([[3.1, 4.1], [5.9, 26.5]],
pa.list_(pa.float32(), 2)),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
@@ -105,7 +129,7 @@ A Table is a collection of Records in a LanceDB Database.
)
schema = pa.schema([
pa.field("vector", pa.list_(pa.float32())),
pa.field("vector", pa.list_(pa.float32(), 2)),
pa.field("item", pa.utf8()),
pa.field("price", pa.float32()),
])
@@ -113,20 +137,7 @@ A Table is a collection of Records in a LanceDB Database.
db.create_table("table4", make_batches(), schema=schema)
```
You can also use Pandas dataframe directly in the above example by converting it to `RecordBatch` object
```python
import pandas as pd
import pyarrow as pa
df = pd.DataFrame({'vector': [[0,1], [2,3], [4,5],[6,7]],
'month': [3, 5, 7, 9],
'day': [1, 5, 9, 13],
'n_legs': [2, 4, 5, 100],
'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
batch = pa.RecordBatch.from_pandas(df)
```
You can also use iterators of other types like Pandas dataframe or Pylists directly in the above example.
## Creating Empty Table
You can also create empty tables in python. Initialize it with schema and later ingest data into it.

View File

@@ -0,0 +1,7 @@
[PromptTools](https://github.com/hegelai/prompttools) offers a set of free, open-source tools for testing and experimenting with models, prompts, and configurations. The core idea is to enable developers to evaluate prompts using familiar interfaces like code and notebooks. You can use it to experiment with different configurations of LanceDB, and test how LanceDB integrates with the LLM of your choice.
[Evaluating Prompts with PromptTools](./examples/prompttools-eval-prompts/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/prompttools-eval-prompts/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
![Alt text](https://prompttools.readthedocs.io/en/latest/_images/demo.gif "a title")

View File

@@ -149,14 +149,14 @@ class DBConnection(ABC):
... for i in range(5):
... yield pa.RecordBatch.from_arrays(
... [
... pa.array([[3.1, 4.1], [5.9, 26.5]]),
... pa.array([[3.1, 4.1], [5.9, 26.5]], pa.list_(pa.float32(), 2)),
... pa.array(["foo", "bar"]),
... pa.array([10.0, 20.0]),
... ],
... ["vector", "item", "price"],
... )
>>> schema=pa.schema([
... pa.field("vector", pa.list_(pa.float32())),
... pa.field("vector", pa.list_(pa.float32(), 2)),
... pa.field("item", pa.utf8()),
... pa.field("price", pa.float32()),
... ])