Commit Graph

18 Commits

Author SHA1 Message Date
BubbleCal
4372c231cd feat: support optimize indices in sync API (#1769)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-11-08 08:48:07 -08:00
Weston Pace
55104c5bae feat: allow distance type (metric) to be specified during hybrid search (#1777) 2024-10-29 13:51:18 -07:00
Gagan Bhullar
4d458d5829 feat(python): drop support for dictionary in Table.add (#1725)
PR closes #1706
2024-10-08 20:41:08 -06:00
Will Jones
2c4b07eb17 feat(python): merge_insert in async Python (#1707)
Fixes #1401
2024-10-01 10:06:52 -07:00
Ayush Chaurasia
f81ce68e41 fix(python): force deduce vector column name if running explicit hybrid query (#1692)
Right now when passing vector and query explicitly for hybrid search ,
vector_column_name is not deduced.
(https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/#hybrid-search-in-lancedb
). Because vector and query can be both none when initialising the
QueryBuilder in this case. This PR forces deduction of query type if it
is set to "hybrid"
2024-09-24 19:02:56 +05:30
LuQQiu
c7732585bf fix: support pyarrow input types (#1628)
fixes #1625 
Support PyArrow.RecordBatch, pa.dataset.Dataset, pa.dataset.Scanner,
paRecordBatchReader
2024-09-12 10:59:18 -07:00
James Wu
029b01bbbf feat: enable phrase_query(bool) for hybrid search queries (#1578)
first off, apologies for any folly since i'm new to contributing to
lancedb. this PR is the continuation of [a discord
thread](https://discord.com/channels/1030247538198061086/1030247538667827251/1278844345713299599):

## user story

here's the lance db search query i'd like to run:

```
def search(phrase):
    logger.info(f'Searching for phrase: {phrase}')
    phrase_embedding = get_embedding(phrase)
    df = (table.search((phrase_embedding, phrase), query_type='hybrid')
        .limit(10).to_list())
    logger.info(f'Success search with row count: {len(df)}')

search('howdy (howdy)')
search('howdy(howdy)')
```

the second search fails due to `ValueError: Syntax Error: howdy(howdy)`

i saw on the
[docs](https://lancedb.github.io/lancedb/fts/#phrase-queries-vs-terms-queries)
that i can use `phrase_query()` to [enable a
flag](https://github.com/lancedb/lancedb/blob/main/python/python/lancedb/query.py#L790-L792)
to wrap the query in double quotes (as well as sanitize single quotes)
prior to sending the query to search. this works for [normal
FTS](https://lancedb.github.io/lancedb/fts/), but the command is
unavailable on [hybrid
search](https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/).

## changes

i added `phrase_query()` function to `LanceHybridQueryBuilder` by
propagating the call down to its `self. _fts_query` object. i'm not too
familiar with the codebase and am not sure if this is the best way to
implement the functionality. feel free to riff on this PR or discard


## tests

```
(lancedb) JamesMPB:python james$ pwd
/Users/james/src/lancedb/python
(lancedb) JamesMPB:python james$ pytest python/tests/test_table.py 
python/tests/test_table.py .......................................                                                                   [100%]
====================================================== 39 passed, 1 warning in 2.23s =======================================================
```
2024-09-07 08:58:05 +05:30
BubbleCal
0fa50775d6 feat: support to query/index FTS on RemoteTable/AsyncTable (#1537)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-08-16 12:01:05 +08:00
Gagan Bhullar
20faa4424b feat(python): add delete unverified parameter (#1542)
PR fixes #1527
2024-08-15 09:01:32 -07:00
Lei Xu
2bdf0a02f9 feat!: upgrade lance to 0.16 (#1519) 2024-08-07 13:15:22 -07:00
Gagan Bhullar
32123713fd feat(python): optimize stats repr method (#1510)
PR fixes #1507
2024-08-07 08:47:52 -07:00
Lei Xu
0708428357 feat: support update over binary field (#1440) 2024-07-12 09:22:00 -07:00
Weston Pace
4f512af024 feat: add the optimize function to nodejs and async python (#1257)
The optimize function is pretty crucial for getting good performance
when building a large scale dataset but it was only exposed in rust
(many sync python users are probably doing this via to_lance today)

This PR adds the optimize function to nodejs and to python.

I left the function marked experimental because I think there will
likely be changes to optimization (e.g. if we add features like
"optimize on write"). I also only exposed the `cleanup_older_than`
configuration parameter since this one is very commonly used and the
rest have sensible defaults and we don't really know why we would
recommend different values for these defaults anyways.
2024-05-20 07:09:31 -07:00
Weston Pace
9031ec6878 feat: add update to the async API (#1093) 2024-04-05 16:32:15 -07:00
Weston Pace
47daf9b7b0 feat: add time travel operations to the async API (#1070) 2024-04-05 16:32:15 -07:00
Ayush Chaurasia
b5326d31e9 fix(python): Few fts patches (#1039)
1. filtering with fts mutated the schema, which caused schema mistmatch
problems with hybrid search as it combines fts and vector search tables.
2. fts with filter failed with `with_row_id`. This was because row_id
was calculated before filtering which caused size mismatch on attaching
it after.
3. The fix for 1 meant that now row_id is attached before filtering but
passing a filter to `to_lance` on a dataset that already contains
`_rowid` raises a panic from lance. So temporarily, in case where fts is
used with a filter AND `with_row_id`, we just force user to using the
duckdb pathway.

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-04-05 16:31:45 -07:00
Weston Pace
8033a44d68 feat: add support for add to async python API (#1037)
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we cleanup tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-04-05 16:31:36 -07:00
Weston Pace
2cec2a8937 feat: add a basic async python client starting point (#1014)
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.

The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.

Although this is not technically a breaking change (no APIs are
changing) it is still a considerable change in the way the wheels are
built because they now include the native shared library.
2024-04-05 16:31:34 -07:00