It improves the UX as iterators can be of any type supported by the
table (plus recordbatch) & there is no separate requirement.
Also expands the test cases for pydantic & arrow schema.
If this is looks good I'll update the docs.
Example usage:
```
class Content(LanceModel):
vector: vector(2)
item: str
price: float
def make_batches():
for _ in range(5):
yield from [
# pandas
pd.DataFrame({
"vector": [[3.1, 4.1], [1, 1]],
"item": ["foo", "bar"],
"price": [10.0, 20.0],
}),
# pylist
[
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
],
# recordbatch
pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]], pa.list_(pa.float32(), 2)),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
),
# pydantic list
[
Content(vector=[3.1, 4.1], item="foo", price=10.0),
Content(vector=[5.9, 26.5], item="bar", price=20.0),
]]
db = lancedb.connect("db")
tbl = db.create_table("tabley", make_batches(), schema=Content, mode="overwrite")
tbl.add(make_batches())
```
Same should with arrow schema.
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
This adds LanceTable.restore as a temporary feature. It reads data from
a previous version and creates
a new snapshot version using that data. This makes the version writeable
unlike checkout. This should be replaced once the feature is implemented
in pylance.
Co-authored-by: Chang She <chang@lancedb.com>
BREAKING CHANGE: The `score` column has been renamed to `_distance` to
more accurately describe the semantics (smaller means closer / better).
---------
Co-authored-by: Lei Xu <lei@lancedb.com>
#416 Fixed.
added drop_database() method . This deletes all the tables from the
database with a single command.
---------
Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
Saves users from having to explicitly call
`LanceModel.to_arrow_schema()` when creating an empty table.
See new docs for full details.
---------
Co-authored-by: Chang She <chang@lancedb.com>
Makes the following work so all the formats accepted by `create_table()`
are also accepted by `add()`
```
import lancedb
import pyarrow as pa
db = lancedb.connect("/tmp")
def make_batches():
for i in range(5):
yield pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]]),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
)
schema = pa.schema([
pa.field("vector", pa.list_(pa.float32())),
pa.field("item", pa.utf8()),
pa.field("price", pa.float32()),
])
tbl = db.create_table("table4", make_batches(), schema=schema)
tbl.add(make_batches())
```
* `Table::open()` from absolute path, and gives the responsibility of
organizing metadata out of Table object
* Fix Clippy warnings
* Add `Table::checkout(version)` API
It's inconvenient to always require data at table creation time.
Here we enable you to create an empty table and add data and set schema
later.
---------
Co-authored-by: Chang She <chang@lancedb.com>
Sometimes LangChain would insert a single `[np.nan]` as a placeholder if
the embedding function failed. This causes a problem for Lance format
because then the array can't be stored as a FixedSizedListArray.
Instead:
1. By default we remove rows with embedding lengths less than the
maximum length in the batch
2. If `strict=True` kwargs is set to True, then a `ValueError` is raised
if the embeddings aren't all the same length
---------
Co-authored-by: Chang She <chang@lancedb.com>