Commit Graph

699 Commits

Author SHA1 Message Date
Lance Release
482f1ee1d3 Bump version: 0.18.1-beta.2 → 0.18.1-beta.3 2025-02-01 01:20:49 +00:00
Will Jones
2f39274a66 feat: upgrade lance to 0.23.0-beta.4 (#2089)
Upstream changelog:
https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.4
2025-01-31 17:20:15 -08:00
Will Jones
2fc174f532 docs: add sync/async tabs to quickstart (#2087)
Closes #2033
2025-01-31 15:43:54 -08:00
Will Jones
dba85f4d6f docs: user guide for merge insert (#2083)
Closes #2062
2025-01-31 10:03:21 -08:00
Will Jones
15f8f4d627 ci: check license headers (#2076)
Based on the same workflow in Lance.
2025-01-29 08:27:07 -08:00
Lance Release
a9897d9d85 Bump version: 0.18.1-beta.1 → 0.18.1-beta.2 2025-01-28 22:31:14 +00:00
Will Jones
acda7a4589 feat: upgrade lance to v0.23.0-beta.3 (#2074)
This includes several bugfixes for `merge_insert` and null handling in
vector search.

https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.3
2025-01-28 14:00:06 -08:00
Vaibhav
dac0857745 feat: add distance_type() parameter to python sync query builders and metric() as an alias (#2073)
This PR aims to fix #2047 by doing the following things:
- Add a `distance_type` parameter to the sync query builders of the
Python SDK.
- Make `metric` an alias for `distance_type`.
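A minimal sketch of the alias pattern, assuming a fluent builder whose setters return the builder itself: `metric()` simply forwards to `distance_type()`, so both spellings configure the same state. The class `SyncQuerySketch`, its `l2` default, and the set of accepted names are hypothetical and not LanceDB's actual query builder.

```python
from __future__ import annotations

# Hypothetical set of accepted distance types (an assumption for this sketch).
_VALID_DISTANCE_TYPES = {"l2", "cosine", "dot", "hamming"}


class SyncQuerySketch:
    """Hypothetical fluent query builder; not LanceDB's real class."""

    def __init__(self) -> None:
        self._distance_type = "l2"  # assumed default for this sketch

    def distance_type(self, distance_type: str) -> SyncQuerySketch:
        """Set the distance type and return the builder for chaining."""
        value = distance_type.lower()
        if value not in _VALID_DISTANCE_TYPES:
            raise ValueError(f"unsupported distance type: {distance_type!r}")
        self._distance_type = value
        return self

    def metric(self, metric: str) -> SyncQuerySketch:
        """Alias kept for compatibility: forwards to distance_type()."""
        return self.distance_type(metric)


q = SyncQuerySketch().metric("Cosine")
print(q._distance_type)  # cosine
```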
2025-01-28 13:59:53 -08:00
Lance Release
e5f42a850e Bump version: 0.18.1-beta.0 → 0.18.1-beta.1 2025-01-23 23:01:13 +00:00
Will Jones
28e1b70e4b fix(python): preserve original distance and score in hybrid queries (#2061)
Fixes #2031

When we do hybrid search, we normalize the scores. We do this
calculation in-place, because the Rerankers expect the `_distance` and
`_score` columns to be the normalized ones. So I've changed the logic so
that we restore the original distance and scores by matching on row ids.
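The normalize-then-restore idea can be sketched in plain Python over a list of row dicts. The min-max normalization, the `_rowid` key, the stand-in reranker, and the function names are assumptions for illustration, not the code from this PR.

```python
from __future__ import annotations


def min_max_normalize(values: list[float]) -> list[float]:
    """Scale values into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def normalize_then_restore(rows: list[dict], column: str) -> list[dict]:
    """Normalize `column` in place, run a stand-in reranker, then restore
    the original values by matching on `_rowid`."""
    # Remember the original values keyed by row id before mutating them.
    originals = {row["_rowid"]: row[column] for row in rows}

    # In-place normalization, because the reranker expects normalized values.
    for row, norm in zip(rows, min_max_normalize([r[column] for r in rows])):
        row[column] = norm

    # Stand-in reranker: derive a relevance score from the normalized
    # distance and sort by it, highest relevance first.
    for row in rows:
        row["_relevance_score"] = 1.0 - row[column]
    reranked = sorted(rows, key=lambda r: r["_relevance_score"], reverse=True)

    # Restore the original values by row id; the reranker reordered rows.
    for row in reranked:
        row[column] = originals[row["_rowid"]]
    return reranked


rows = [
    {"_rowid": 7, "_distance": 4.0},
    {"_rowid": 3, "_distance": 2.0},
    {"_rowid": 9, "_distance": 6.0},
]
print(normalize_then_restore(rows, "_distance"))
```

Keying the originals by row id rather than by position matters because a reranker is free to reorder (or drop) rows.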
2025-01-23 13:54:26 -08:00
Will Jones
52b79d2b1e feat: upgrade lance to v0.23.0-beta.2 (#2063)
Fixes https://github.com/lancedb/lancedb/issues/2043
2025-01-23 13:51:30 -08:00
Will Jones
bcfc93cc88 fix(python): various fixes for async query builders (#2048)
This includes several improvements and fixes to the Python Async query
builders:

1. The API reference docs show all the methods for each builder
2. The hybrid query builder now has all the same setter methods as the
vector search one, so you can now set things like `.distance_type()` on
a hybrid query.
3. Re-rankers are now properly hooked up and tested for FTS and vector
search. Previously the re-rankers were accidentally bypassed in unit
tests, because the builders overrode `.to_arrow()`, but the unit test
called `.to_batches()` which was only defined in the base class. Now all
builders implement `.to_batches()` and leave `.to_arrow()` to the base
class.
4. The `AsyncQueryBase` and `AsyncVectorQueryBase` setter methods now
return `Self`, which provides the appropriate subclass as the type hint
return value. Previously, `AsyncQueryBase` had them all hard-coded to
`AsyncQuery`, which was unfortunate. (This required bringing in
`typing-extensions` for older Python versions, but I think it's worth
it.)
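Point 4 can be illustrated with two hypothetical builder classes. Because the base-class setters are annotated `-> Self`, a type checker keeps the subclass type through a chain such as `.where(...).nprobes(...)`; a return type hard-coded to the base class would make the `.nprobes()` call a type error. The import is guarded by `TYPE_CHECKING`, an assumption chosen so the sketch runs even where `typing_extensions` is not installed; the class and method names are illustrative only.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers evaluate this import. On Python 3.11+ `typing.Self`
    # also works; older versions need the typing-extensions backport.
    from typing_extensions import Self


class QueryBaseSketch:
    """Hypothetical base builder whose setters return `Self`."""

    def __init__(self) -> None:
        self._limit: int | None = None
        self._where: str | None = None

    def limit(self, n: int) -> Self:
        self._limit = n
        return self

    def where(self, predicate: str) -> Self:
        self._where = predicate
        return self


class VectorQueryBaseSketch(QueryBaseSketch):
    """Hypothetical subclass adding a vector-specific setter."""

    def __init__(self) -> None:
        super().__init__()
        self._nprobes: int | None = None

    def nprobes(self, n: int) -> Self:
        self._nprobes = n
        return self


# `where()` is annotated `-> Self`, so a type checker infers that it returns
# VectorQueryBaseSketch here and accepts the chained `.nprobes()` call.
q = VectorQueryBaseSketch().where("id > 10").nprobes(50).limit(5)
```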
2025-01-20 16:14:34 -08:00
BubbleCal
214d0debf5 docs: claim LanceDB supports float16/float32/float64 for multivector (#2040) 2025-01-21 07:04:15 +08:00
Will Jones
f059372137 feat: add drop_index() method (#2039)
Closes #1665
2025-01-20 10:08:51 -08:00
Lance Release
3dc1803c07 Bump version: 0.18.0 → 0.18.1-beta.0 2025-01-17 04:37:23 +00:00
BubbleCal
d0501f65f1 fix: linear reranker applies wrong score to combine (#2035)
Related to #2014. This fixes:
- the linear reranker could lose some results if merging consumed all
vector results before the FTS results
- the linear reranker inverted the FTS score, but only the vector
distance should be inverted
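A minimal sketch of a linear combination that avoids both problems, assuming both inputs are already normalized to [0, 1]: it walks the union of row ids so no result is lost, and it inverts only the vector distance. The function name, the fill values for missing hits, and the 0.7 default weight are illustrative assumptions, not the reranker's exact code.

```python
from __future__ import annotations


def linear_combine(
    vector_hits: dict[int, float],
    fts_hits: dict[int, float],
    weight: float = 0.7,
) -> list[tuple[int, float]]:
    """Hypothetical linear combination of hybrid-search results.

    vector_hits maps row id -> normalized vector distance in [0, 1]
    (lower is better); fts_hits maps row id -> normalized FTS score in
    [0, 1] (higher is better). Returns (row id, combined score), best first.
    """
    combined = []
    # Iterate over the union of ids so a row found by only one of the two
    # searches is never dropped, whichever list runs out first.
    for row_id in vector_hits.keys() | fts_hits.keys():
        # Only the distance is inverted: a small distance is a good match.
        # A missing vector hit counts as the worst distance (1.0).
        vector_score = 1.0 - vector_hits.get(row_id, 1.0)
        # The FTS score is already "higher is better", so it is used as-is.
        # A missing FTS hit counts as score 0.0.
        fts_score = fts_hits.get(row_id, 0.0)
        combined.append((row_id, weight * vector_score + (1 - weight) * fts_score))
    # Sort by score descending, then by row id so ties are deterministic.
    combined.sort(key=lambda pair: (-pair[1], pair[0]))
    return combined


print(linear_combine({1: 0.0, 2: 0.5}, {2: 1.0, 3: 0.4}, weight=0.5))
```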

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-17 11:33:48 +08:00
Bert
4703cc6894 chore: upgrade lance to v0.22.1-beta.3 (#2038) 2025-01-16 12:42:42 -05:00
BubbleCal
493f9ce467 fix: can't infer the vector column for multivector (#2026)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-16 14:08:04 +08:00
Weston Pace
5c759505b8 feat: upgrade lance 0.22.1b1 (#2029)
Now the version actually exists :)
2025-01-15 07:37:37 -08:00
BubbleCal
d57bed90e5 docs: add missing example code (#2025) 2025-01-14 21:17:05 -08:00
BubbleCal
648327e90c docs: show how to pack bits for binary vector (#2020)
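Packing can be sketched in plain Python so the bit order is explicit: each group of 8 binary dimensions becomes one uint8, most significant bit first, which matches the default layout of `numpy.packbits`. The helper names are hypothetical, and the sketch assumes binary vectors are stored as packed uint8 values and compared with Hamming distance.

```python
from __future__ import annotations


def pack_bits(bits: list[int]) -> list[int]:
    """Pack a 0/1 list into uint8 values, most significant bit first.

    Like numpy.packbits with its default bitorder="big", a length that is
    not a multiple of 8 is zero-padded at the end.
    """
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must contain only 0 and 1")
    packed = []
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the final byte
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed


def hamming(a: list[int], b: list[int]) -> int:
    """Hamming distance between two equal-length packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(pack_bits([1, 0, 1, 1, 0, 0, 0, 1, 1, 1]))  # [177, 192]
```

Under this layout a 256-dimension binary vector packs into 32 uint8 values.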
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-14 09:00:57 -08:00
Lance Release
995bd9bf37 Bump version: 0.18.0-beta.1 → 0.18.0 2025-01-14 01:02:26 +00:00
Lance Release
36cc06697f Bump version: 0.18.0-beta.0 → 0.18.0-beta.1 2025-01-14 01:02:25 +00:00
Will Jones
92dcf24b0c feat: upgrade Lance to v0.22.0 (#2017)
Upstream changelog:
https://github.com/lancedb/lance/releases/tag/v0.22.0
2025-01-13 15:06:01 -08:00
BubbleCal
66cbf6b6c5 feat: support multivector type (#2005)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-13 14:10:40 -08:00
Prashant Dixit
b66cd943a7 fix: broken voyageai embedding API (#2013)
This PR fixes the broken embedding API for VoyageAI.
2025-01-13 08:52:38 -08:00
Weston Pace
d8d11f48e7 feat: upgrade to lance 0.22.0b1 (#2011) 2025-01-10 12:51:52 -08:00
Lance Release
fbffe532a8 Bump version: 0.17.2-beta.2 → 0.18.0-beta.0 2025-01-10 19:01:20 +00:00
Will Jones
6eacae18c4 test: fix test failure from merge (#2007) 2025-01-09 11:27:24 -08:00
Bert
f4afe456e8 feat!: change default from postfiltering to prefiltering for sync python (#2000)
BREAKING CHANGE: prefiltering is now the default in the synchronous
Python SDK

resolves: #1872
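The behavioural difference can be sketched over an in-memory list of rows with precomputed distances. With post-filtering, the nearest k rows are chosen first and the filter may discard some of them, so fewer than k rows can come back; with pre-filtering, the filter narrows the candidates before the k nearest are chosen. The rows and helper names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def top_k(rows: list[dict], k: int) -> list[dict]:
    """Return the k rows with the smallest distance."""
    return sorted(rows, key=lambda r: r["distance"])[:k]


def search(
    rows: list[dict], pred: Callable[[dict], bool], k: int, prefilter: bool
) -> list[dict]:
    if prefilter:
        # Filter first, then take the nearest k among the matching rows.
        return top_k([r for r in rows if pred(r)], k)
    # Post-filter: take the nearest k overall, then drop non-matching rows,
    # which can leave fewer than k results.
    return [r for r in top_k(rows, k) if pred(r)]


rows = [
    {"id": 1, "category": "a", "distance": 0.1},
    {"id": 2, "category": "b", "distance": 0.2},
    {"id": 3, "category": "a", "distance": 0.3},
    {"id": 4, "category": "b", "distance": 0.4},
]


def is_b(row: dict) -> bool:
    return row["category"] == "b"


print([r["id"] for r in search(rows, is_b, k=2, prefilter=True)])   # [2, 4]
print([r["id"] for r in search(rows, is_b, k=2, prefilter=False)])  # [2]
```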
2025-01-08 19:13:58 -05:00
Renato Marroquin
ea5c2266b8 feat(python): support .rerank() on non-hybrid queries in Async API (WIP) (#1972)
Fixes https://github.com/lancedb/lancedb/issues/1950

---------

Co-authored-by: Renato Marroquin <renato.marroquin@oracle.com>
2025-01-08 16:42:47 -05:00
Will Jones
c557e77f09 feat(python)!: support inserting and upserting subschemas (#1965)
BREAKING CHANGE: For a field "vector", a list of integers will now be
converted to binary (uint8) vectors instead of f32 vectors. Use float
values instead for f32 vectors.
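The new conversion rule can be sketched as a small, hypothetical inference function: a list of integers maps to uint8, and floats map to float32. The function name, the 0 to 255 range check, the mixed int/float fallback, and the error cases are assumptions; only the integer-to-uint8 and float-to-f32 mapping comes from this change.

```python
from __future__ import annotations


def infer_vector_type(values: list) -> str:
    """Hypothetical rule: pick a storage type for a "vector" field.

    A list made only of integers is treated as a binary (uint8) vector;
    any float in the list makes it a float32 vector.
    """
    if not values:
        raise ValueError("cannot infer a type from an empty vector")
    # bool is a subclass of int in Python; reject it in this sketch.
    if any(isinstance(v, bool) for v in values):
        raise TypeError("boolean values are not supported in this sketch")
    if all(isinstance(v, int) for v in values):
        # Assumed check: uint8 values must fit in 0..255.
        if any(v < 0 or v > 255 for v in values):
            raise ValueError("integer vectors must hold uint8 values (0-255)")
        return "uint8"
    if all(isinstance(v, (int, float)) for v in values):
        return "float32"
    raise TypeError("vector values must be numbers")


print(infer_vector_type([1, 2, 3]))        # uint8
print(infer_vector_type([1.0, 2.0, 3.0]))  # float32
```

Under this rule, writing `[1.0, 2.0, 3.0]` instead of `[1, 2, 3]` is enough to keep an f32 vector.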

* Adds proper support for inserting and upserting subsets of the full
schema. I thought I had previously implemented this in #1827, but it
turns out I had not tested carefully enough.
* Refactors `_sanitize_data` and other utility functions to be simpler
and not require `numpy` or `combine_chunks()`.
* Adds a new suite of unit tests to validate sanitization utilities.

## Examples

```python
import pandas as pd
import lancedb

db = lancedb.connect("memory://demo")
initial_data = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9]
})
table = db.create_table("demo", initial_data)

# Insert a subschema
new_data = pd.DataFrame({"a": [10, 11]})
table.add(new_data)
table.to_pandas()
```
```
    a    b    c
0   1  4.0  7.0
1   2  5.0  8.0
2   3  6.0  9.0
3  10  NaN  NaN
4  11  NaN  NaN
```


```python
# Upsert a subschema
upsert_data = pd.DataFrame({
    "a": [3, 10, 15],
    "b": [6, 7, 8],
})
table.merge_insert(on="a").when_matched_update_all().when_not_matched_insert_all().execute(upsert_data)
table.to_pandas()
```
```
    a    b    c
0   1  4.0  7.0
1   2  5.0  8.0
2   3  6.0  9.0
3  10  7.0  NaN
4  11  NaN  NaN
5  15  8.0  NaN
```
2025-01-08 10:11:10 -08:00
BubbleCal
3c0a64be8f feat: support distance range in queries (#1999)
this also updates the docs
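A distance range keeps only the hits whose distance lies between a lower and an upper bound, for example to drop near-duplicates or weak matches. A pure-Python sketch over precomputed (row id, distance) pairs follows; the function name and the inclusive-lower, exclusive-upper convention are assumptions, not LanceDB's implementation.

```python
from __future__ import annotations


def filter_by_distance_range(
    hits: list[tuple[int, float]],
    lower: float | None = None,
    upper: float | None = None,
) -> list[tuple[int, float]]:
    """Keep (row id, distance) hits with lower <= distance < upper.

    Either bound may be None, meaning unbounded on that side. The
    inclusive-lower / exclusive-upper convention is an assumption.
    """
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return [
        (row_id, d)
        for row_id, d in hits
        if (lower is None or d >= lower) and (upper is None or d < upper)
    ]


hits = [(1, 0.05), (2, 0.2), (3, 0.5), (4, 0.9)]
print(filter_by_distance_range(hits, lower=0.1, upper=0.6))  # [(2, 0.2), (3, 0.5)]
```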

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-08 11:03:27 +08:00
Will Jones
0e496ed3b5 docs: contributing guide (#1970)
* Adds basic contributing guides.
* Simplifies Python development with a Makefile.
2025-01-07 15:11:16 -08:00
QianZhu
17c9e9afea docs: add async examples to doc (#1941)
- added sync and async tabs for python examples
- moved python code to tests/docs

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-01-07 15:10:25 -08:00
Gagan Bhullar
b474f98049 feat(python): flatten in AsyncQuery (#1967)
PR fixes #1949

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-01-06 10:52:03 -08:00
Takahiro Ebato
2c05ffed52 feat(python): add to_polars to AsyncQueryBase (#1986)
Fixes https://github.com/lancedb/lancedb/issues/1952

Added `to_polars` method to `AsyncQueryBase`.
2025-01-06 09:35:28 -08:00
Lance Release
a27c5cf12b Bump version: 0.17.2-beta.1 → 0.17.2-beta.2 2025-01-06 05:34:27 +00:00
BubbleCal
f4dea72cc5 feat: support vector search with distance thresholds (#1993)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-06 13:23:39 +08:00
Lei Xu
f76c4a5ce1 chore: add pyright static type checking and fix some of the table interface (#1996)
* Enable `pyright` in the project
* Fixed some pyright typing errors in `table.py`
2025-01-04 15:24:58 -08:00
BubbleCal
445a312667 fix: selecting columns failed on FTS and hybrid search (#1991)
It raised `AttributeError: 'builtins.FTSQuery' object has no attribute 'select_columns'`
because the `select_columns` method was missing on the Rust side.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-01-03 13:08:12 +08:00
Lance Release
92d845fa72 Bump version: 0.17.2-beta.0 → 0.17.2-beta.1 2024-12-31 23:36:18 +00:00
Lei Xu
397813f6a4 chore: bump pylance to 0.21.1b1 (#1989) 2024-12-31 15:34:27 -08:00
Lei Xu
50c30c5d34 chore(python): fix typo of the synchronized checkout API (#1988) 2024-12-30 18:54:31 -08:00
Renato Marroquin
0cb6da6b7e docs: add new indexes to python docs (#1945)
closes issue #1855

Co-authored-by: Renato Marroquin <renato.marroquin@oracle.com>
2024-12-28 15:35:10 -08:00
BubbleCal
aec8332eb5 chore: add dynamic = ["version"] to pass build check (#1977)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-12-28 10:45:23 -08:00
Lance Release
dae8334d0b Bump version: 0.17.1 → 0.17.2-beta.0 2024-12-25 08:28:59 +00:00
BubbleCal
16cf2990f3 feat: create IVF_FLAT on remote table (#1978)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-12-25 14:57:07 +08:00
Lance Release
27404c8623 Bump version: 0.17.1-beta.7 → 0.17.1 2024-12-24 18:37:28 +00:00
Lance Release
f181c7e77f Bump version: 0.17.1-beta.6 → 0.17.1-beta.7 2024-12-24 18:37:27 +00:00