@eddyxu added instructions for linting here:
7af213801a/python/README.md (L45-L50)
However, we had a lot of failures and weren't checking this in CI. This
PR fixes all lints and adds a check to CI to keep us in compliance with
the lints.
This mimics CREATE TABLE IF NOT EXISTS behavior.
We add `db.create_table(..., exist_ok=True)` parameter.
By default it is set to False, so trying to create
a table with the same name will raise an exception.
If set to True, then it only opens the table if it
already exists. If you pass in a schema, it will
be checked against the existing table to make sure
you get what you want. If you pass in data, it will
NOT be added to the existing table.
Add `to_list` to return query results as list of python dict (so we're
not too pandas-centric). Closes#555
Add `to_pandas` API and add deprecation warning on `to_df`. Closes#545
Co-authored-by: Chang She <chang@lancedb.com>
1. Support persistent embedding function so users can just search using
query string
2. Add fixed size list conversion for multiple vector columns
3. Add support for empty query (just apply select/where/limit).
4. Refactor and simplify some of the data prep code
---------
Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
It improves the UX as iterators can be of any type supported by the
table (plus recordbatch) & there is no separate requirement.
Also expands the test cases for pydantic & arrow schema.
If this is looks good I'll update the docs.
Example usage:
```
class Content(LanceModel):
vector: vector(2)
item: str
price: float
def make_batches():
for _ in range(5):
yield from [
# pandas
pd.DataFrame({
"vector": [[3.1, 4.1], [1, 1]],
"item": ["foo", "bar"],
"price": [10.0, 20.0],
}),
# pylist
[
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
],
# recordbatch
pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]], pa.list_(pa.float32(), 2)),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
),
# pydantic list
[
Content(vector=[3.1, 4.1], item="foo", price=10.0),
Content(vector=[5.9, 26.5], item="bar", price=20.0),
]]
db = lancedb.connect("db")
tbl = db.create_table("tabley", make_batches(), schema=Content, mode="overwrite")
tbl.add(make_batches())
```
Same should with arrow schema.
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
#416 Fixed.
added drop_database() method . This deletes all the tables from the
database with a single command.
---------
Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
Saves users from having to explicitly call
`LanceModel.to_arrow_schema()` when creating an empty table.
See new docs for full details.
---------
Co-authored-by: Chang She <chang@lancedb.com>
Makes the following work so all the formats accepted by `create_table()`
are also accepted by `add()`
```
import lancedb
import pyarrow as pa
db = lancedb.connect("/tmp")
def make_batches():
for i in range(5):
yield pa.RecordBatch.from_arrays(
[
pa.array([[3.1, 4.1], [5.9, 26.5]]),
pa.array(["foo", "bar"]),
pa.array([10.0, 20.0]),
],
["vector", "item", "price"],
)
schema = pa.schema([
pa.field("vector", pa.list_(pa.float32())),
pa.field("item", pa.utf8()),
pa.field("price", pa.float32()),
])
tbl = db.create_table("table4", make_batches(), schema=schema)
tbl.add(make_batches())
```