Commit Graph

15 Commits

Author SHA1 Message Date
Will Jones
8509f73221 feat: better errors for remote SDK (#1722)
* Adds nicer errors to remote SDK, that expose useful properties like
`request_id` and `status_code`.
* Makes sure the Python tracebacks print nicely by mapping the `source`
field from a Rust error to the `__cause__` field.
2024-10-08 22:21:13 -06:00
LuQQiu
abeaae3d80 feat!: upgrade Lance to 0.18.0 (#1657)
BREAKING CHANGE: default file format changed to Lance v2.0.

Upgrade Lance to 0.18.0

Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0
2024-09-19 10:50:26 -07:00
Will Jones
2a6586d6fb feat: add flag to enable faster manifest paths (#1612)
The new V2 manifest path scheme makes discovering the latest version of
a table constant time on object stores, regardless of the number of
versions in the table. See benchmarks in the PR here:
https://github.com/lancedb/lance/pull/2798

Closes #1583
2024-09-09 11:34:36 -07:00
BubbleCal
8dcd328dce feat: support to create table from record batch iterator (#1593) 2024-09-06 10:41:38 +08:00
BubbleCal
f9d5fa88a1 feat!: migrate FTS from tantivy to lance-index (#1483)
Lance now supports FTS, so add it into lancedb Python, TypeScript and
Rust SDKs.

For Python, we still use tantivy based FTS by default because the lance
FTS index now misses some features of tantivy.

For Python:
- Support to create lance based FTS index
- Support to specify columns for full text search (only available for
lance based FTS index)

For TypeScript:
- Change the search method so that it can accept both string and vector
- Support full text search

For Rust
- Support full text search

The others:
- Update the FTS doc

BREAKING CHANGE: 
- for Python, this renames the attached score column of FTS from "score"
to "_score", this could be a breaking change for users that rely the
scores

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-08-08 15:33:15 +08:00
Weston Pace
d4aad82aec fix: don't use v2 by default on empty table (#1469) 2024-07-23 06:47:49 -07:00
Weston Pace
d5586c9c32 feat: make it possible to opt in to using the v2 format (#1352)
This also exposed the max_batch_length configuration option in
python/node (it was needed to verify if we are actually in v2 mode or
not)
2024-06-04 21:52:14 -07:00
zhongpu
3bb7c546d7 fix: the bug of async connection context manager (#1333)
- add `return` for `__enter__`

The buggy code didn't return the object, therefore it will always return
None within a context manager:

```python
with await lancedb.connect_async("./.lancedb") as db:
        # db is always None
```

(BTW, why not to design an async context manager?)

- add a unit test for Async connection context manager

- update return type of `AsyncConnection.open_table` to `AsyncTable`

Although type annotation doesn't affect the functionality, it is helpful
for IDEs.
2024-05-29 09:33:32 -07:00
Weston Pace
3d7c48feca feat: allow the index_cache_size to be configured when opening a table (#1245)
This was already configurable in the rust API but it wasn't actually
being passed down to the underlying dataset. I added this option to both
the async python API and the new nodejs API.

I also added this option to the synchronous python API.

I did not add the option to vectordb.
2024-04-26 13:42:02 -07:00
Raghav Dixit
a6aa67baed python: Bug fixes / tests (#1210)
closes #1194 #1172 #1124 #1208 


@wjones127 : `if query_type != "fts":` is needed because both fts and
vector search create `LanceQueryBuilder` which has `vector_column_name`
as a required attribute.
2024-04-10 10:17:14 -07:00
Lei Xu
473ef7e426 chore: validate table name (#1146)
Closes #1129
2024-04-05 16:33:37 -07:00
Weston Pace
73c69a6b9a feat: page_token / limit to native table_names function. Use async table_names function from sync table_names function (#1059)
The synchronous table_names function in python lancedb relies on arrow's
filesystem which behaves slightly differently than object_store. As a
result, the function would not work properly in GCS.

However, the async table_names function uses object_store directly and
thus is accurate. In most cases we can fallback to using the async
table_names function and so this PR does so. The one case we cannot is
if the user is already in an async context (we can't start a new async
event loop). Soon, we can just redirect those users to use the async API
instead of the sync API and so that case will eventually go away. For
now, we fallback to the old behavior.
2024-04-05 16:31:45 -07:00
Weston Pace
8033a44d68 feat: add support for add to async python API (#1037)
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we cleanup tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-04-05 16:31:36 -07:00
Weston Pace
4299f719ec feat: port create_table to the async python API and the remote rust API (#1031)
I've also started `ASYNC_MIGRATION.MD` to keep track of the breaking
changes from sync to async python.
2024-04-05 16:31:36 -07:00
Weston Pace
2cec2a8937 feat: add a basic async python client starting point (#1014)
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.

The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.

Although this is not technically a breaking change (no APIs are
changing) it is still a considerable change in the way the wheels are
built because they now include the native shared library.
2024-04-05 16:31:34 -07:00