699 Commits

Author SHA1 Message Date
BubbleCal
e70fd4fecc feat: support IVF_FLAT, binary vectors and hamming distance (#1955)
Binary vectors and Hamming distance work only with IVF_FLAT, so this PR
introduces all three together.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-12-24 10:36:20 -08:00
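
As a rough illustration of the metric named in the commit above, Hamming distance between two bit-packed binary vectors is the number of bit positions where they differ. The sketch below is plain Python with hypothetical function names (`hamming_distance`, `nearest`); it is a brute-force comparison, not the Lance or LanceDB implementation.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length bit-packed vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 bit wherever the inputs differ; count those bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest(query: bytes, vectors: list[bytes]) -> int:
    """Return the index of the stored vector closest to the query."""
    # Brute-force scan (hypothetical helper, for illustration only).
    return min(range(len(vectors)), key=lambda i: hamming_distance(query, vectors[i]))
```

For example, `hamming_distance(b"\x0f", b"\x00")` counts the four set bits of `0x0f`.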
verma nakul
ac0068b80e feat(python): add ignore_missing to the async drop_table() method (#1953)
- feat(db): add `ignore_missing` to async `drop_table` method

Fixes #1951

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-12-24 10:33:47 -08:00
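
The semantics of an `ignore_missing` flag, as described in the commit above, can be sketched with a toy in-memory catalog. `InMemoryCatalog` and the exception type are assumptions made for illustration; this is not the LanceDB API, and the real method may raise a different error.

```python
import asyncio


class InMemoryCatalog:
    """Hypothetical stand-in for a database connection; not the LanceDB API."""

    def __init__(self, names):
        self._tables = set(names)

    async def drop_table(self, name: str, *, ignore_missing: bool = False) -> None:
        if name not in self._tables:
            if ignore_missing:
                return  # Dropping a missing table becomes a no-op.
            # The error type here is an assumption for this sketch.
            raise ValueError(f"table {name!r} does not exist")
        self._tables.remove(name)
```

With `ignore_missing=True`, a second drop of the same name returns quietly instead of raising.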
Hezi Zisman
ebac960571 feat(python): add bypass_vector_index to sync api (#1947)
Hi lancedb team,

This PR adds the `bypass_vector_index` logic to the sync API, as
described in [Issue
#535](https://github.com/lancedb/lancedb/issues/535). (Closes #535).

I've implemented it only for the regular vector search. If you think it
should also be supported for FTS, Hybrid, or Empty queries and for the
cloud solution, please let me know, and I’ll be happy to extend it.

Since there’s no `CONTRIBUTING.md` or contribution guidelines, I opted
for the simplest implementation to get this started.

Looking forward to your feedback!

Thanks!

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-12-24 10:33:26 -08:00
Lance Release
cf8c2edaf4 Bump version: 0.17.1-beta.5 → 0.17.1-beta.6 2024-12-19 19:39:08 +00:00
Will Jones
61a714a459 docs: improve optimization docs (#1957)
* Add `See Also` section to `cleanup_old_files` and `compact_files` so
readers know they are linked to `optimize`.
* Fixes link to `compact_files` arguments
* Improves formatting of note.
2024-12-19 10:55:11 -08:00
Will Jones
5ddd84cec0 feat: upgrade lance to 0.21.0-beta.5 (#1961) 2024-12-19 10:54:59 -08:00
Lance Release
144b7f5d54 Bump version: 0.17.1-beta.4 → 0.17.1-beta.5 2024-12-13 22:37:13 +00:00
LuQQiu
edc9b9adec chore: bump Lance version to v0.21.0-beta.4 (#1939) 2024-12-13 14:36:13 -08:00
Will Jones
980aa70e2d feat(python): async-sync feature parity on Table (#1914)
### Changes to sync API
* Updated `LanceTable` and `LanceDBConnection` reprs
* Add `storage_options`, `data_storage_version`, and
`enable_v2_manifest_paths` to sync create table API.
* Add `storage_options` to `open_table` in sync API.
* Add `list_indices()` and `index_stats()` to sync API
* `create_table()` will now create only 1 version when data is passed.
Previously it would always create two versions: 1 to create an empty
table and 1 to add data to it.

### Changes to async API
* Add `embedding_functions` to async `create_table()` API.
* Added `head()` to async API

### Refactors
* Refactor index parameters into dataclasses so they are easier to use
from Python
* Moved most tests to use an in-memory DB so we don't need to create so
many temp directories

Closes #1792
Closes #1932

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-12-13 12:56:44 -08:00
Lance Release
e3c6213333 Bump version: 0.17.1-beta.3 → 0.17.1-beta.4 2024-12-13 05:33:34 +00:00
Weston Pace
00552439d9 feat: upgrade lance to 0.21.0b3 (#1936) 2024-12-12 21:32:59 -08:00
QianZhu
c0ee370f83 docs: improve schema evolution api examples (#1929) 2024-12-12 10:52:06 -08:00
Lance Release
bcbbeb7a00 Bump version: 0.17.1-beta.2 → 0.17.1-beta.3 2024-12-11 19:17:54 +00:00
Weston Pace
d6c0f75078 feat: upgrade to lance prerelease 0.21.0b2 (#1933) 2024-12-11 11:17:10 -08:00
Lance Release
f9789ec962 Bump version: 0.17.1-beta.1 → 0.17.1-beta.2 2024-12-11 17:57:18 +00:00
Lei Xu
347515aa51 fix: support list of numpy f16 floats as query vector (#1931)
A user reported on Discord that calling
`table.vector_search([np.float16(1.0), np.float16(2.0), ...])`
yields `TypeError: 'numpy.float16' object is not iterable`.
2024-12-10 16:17:28 -08:00
BubbleCal
3324e7d525 feat: support 4bit PQ (#1916) 2024-12-10 10:36:03 +08:00
Will Jones
ab5316b4fa feat: support offset in remote client (#1923)
Closes https://github.com/lancedb/lancedb/issues/1876
2024-12-09 17:04:18 -08:00
Lance Release
6e5927ce6d Bump version: 0.17.1-beta.0 → 0.17.1-beta.1 2024-12-09 08:40:35 +00:00
Lance Release
6ef20b85ca Bump version: 0.17.0 → 0.17.1-beta.0 2024-12-09 04:01:19 +00:00
LuQQiu
35bacdd57e feat: support azure account name storage options in sync db.connect (#1926)
`db.connect` with an Azure storage account name is supported in the
async connect but not in the sync connect.
This PR adds it to the sync connect.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-12-08 20:00:23 -08:00
Will Jones
a5ebe5a6c4 fix: create_scalar_index in cloud (#1922)
Fixes #1920
2024-12-07 19:48:40 -08:00
Bert
2a9e3e2084 feat(python): support hybrid search in async sdk (#1915)
fixes: https://github.com/lancedb/lancedb/issues/1765

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-12-06 13:53:15 -05:00
Lance Release
fe655a15f0 Bump version: 0.17.0-beta.4 → 0.17.0 2024-12-06 17:12:43 +00:00
Lance Release
9d0af794d0 Bump version: 0.17.0-beta.3 → 0.17.0-beta.4 2024-12-06 17:12:43 +00:00
BubbleCal
c663085203 feat: support FTS options on RemoteTable (#1807) 2024-12-06 21:49:03 +08:00
Will Jones
3c487e5fc7 perf: re-use table instance during write (#1909)
Previously, whenever `Table.add()` was called, we would write and
re-open the underlying dataset. This was bad for performance, as it
reset the table cache and initiated a lot of IO. It also could be the
source of bugs, since we didn't necessarily pass all the necessary
connection options down when re-opening the table.

Closes #1655
2024-12-05 14:44:50 -08:00
Bert
239f725b32 feat(python)!: async-sync feature parity on Connections (#1905)
Closes #1791
Closes #1764
Closes #1897 (Makes this unnecessary)

BREAKING CHANGE: when using an Azure connection string `az://...`, the
call to connect now fails if the Azure storage credentials are not set.
Previously, the failure occurred only after connect, when the user
invoked methods on the connection.
2024-12-05 14:54:39 -05:00
Will Jones
5f261cf2d8 feat: upgrade to Lance v0.20.0 (#1908)
Upstream change log:
https://github.com/lancedb/lance/releases/tag/v0.20.0
2024-12-05 10:53:59 -08:00
Will Jones
79eaa52184 feat: schema evolution APIs in all SDKs (#1851)
* Support `add_columns`, `alter_columns`, `drop_columns` in Remote SDK
and async Python
* Add `data_type` parameter to node
* Docs updates
2024-12-04 14:47:50 -08:00
Lei Xu
bd82e1f66d feat(python): add support for Azure OpenAPI SDK (#1906)
Closes #1699
2024-12-04 13:09:38 -08:00
Lance Release
12c7bd18a5 Bump version: 0.17.0-beta.2 → 0.17.0-beta.3 2024-12-04 01:13:18 +00:00
Weston Pace
c998a47e17 feat: add a pyarrow dataset adapter for LanceDB tables (#1902)
This currently only works for local tables (remote tables cannot be
queried).
This is also exclusive to the sync interface. However, since the pyarrow
dataset interface is synchronous I am not sure if there is much value in
making an async-wrapping variant.

In addition, I added a `to_batches` method to the base query in the sync
API. This already exists in the async API. In the sync API this PR only
adds support for vector queries and scalar queries and not for hybrid or
FTS queries.
2024-12-03 15:42:54 -08:00
Frank Liu
d8c758513c feat: add multimodal capabilities for Voyage embedder (#1878)
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-12-03 10:25:48 -08:00
Will Jones
3795e02ee3 chore: fix ci on main (#1899) 2024-12-02 15:21:18 -08:00
Lance Release
4231925476 Bump version: 0.17.0-beta.1 → 0.17.0-beta.2 2024-11-29 22:45:55 +00:00
Lance Release
84a6693294 Bump version: 0.17.0-beta.0 → 0.17.0-beta.1 2024-11-29 18:16:02 +00:00
QianZhu
2616a50502 fix: test errors after setting default limit (#1891) 2024-11-26 16:03:16 -08:00
LuQQiu
d6d9cb7415 feat: bump lance to 0.20.0b3 (#1882)
Bump lance version.
Upstream change log:
https://github.com/lancedb/lance/releases/tag/v0.20.0-beta.3
2024-11-25 16:15:44 -08:00
Lance Release
38b0d91848 Bump version: 0.16.1-beta.0 → 0.17.0-beta.0 2024-11-25 22:05:49 +00:00
Will Jones
6826039575 fix(python): run remote SDK futures in background thread (#1856)
Users who call the remote SDK from code that uses futures (either
`ThreadPoolExecutor` or `asyncio`) can get odd errors like:

```
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7cfe94cdc900> is already entered
```

This PR fixes that by executing all LanceDB futures in a dedicated
thread pool running on a background thread. That way, LanceDB's futures
don't interact with the user's thread pool or event loop.
2024-11-25 13:12:47 -08:00
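
One common way to get the isolation described in the commit above is to run a private event loop on a daemon thread and submit coroutines to it from any caller. The sketch below uses only the standard library; `BackgroundLoop` and `fetch_rows` are hypothetical names, and this is an assumption about the general technique, not the code from the PR.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio event loop on a dedicated daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        """Submit a coroutine to the background loop and block for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()


async def fetch_rows(n: int) -> list[int]:
    # Stand-in for a remote request; hypothetical, for illustration only.
    await asyncio.sleep(0)
    return list(range(n))


LOOP = BackgroundLoop()
```

Because the coroutine runs on the background loop, `LOOP.run(fetch_rows(3))` can be called from synchronous code, from a `ThreadPoolExecutor` worker, or from inside another running event loop, although in the last case it blocks the calling loop until the result is ready.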
Lei Xu
2ded17452b fix(python)!: handle bad openai embeddings gracefully (#1873)
BREAKING-CHANGE: change Pydantic Vector field to be nullable by default.
Closes #1577
2024-11-23 13:33:52 -08:00
Lance Release
96933d7df8 Bump version: 0.16.0 → 0.16.1-beta.0 2024-11-21 21:52:39 +00:00
Lei Xu
d369233b3d feat: bump lance to 0.20.0b2 (#1865)
Bump lance version.
Upstream change log:
https://github.com/lancedb/lance/releases/tag/v0.20.0-beta.2
2024-11-21 13:16:59 -08:00
QianZhu
43a670ed4b fix: limit docstring change (#1860) 2024-11-21 10:50:50 -08:00
Bert
cb9a00a28d feat: add list_versions to typescript, rust and remote python sdks (#1850)
Requires updating the lance dependency to bring in this change, which
makes the version serializable:
https://github.com/lancedb/lance/pull/3143
2024-11-21 13:35:14 -05:00
Max Epstein
72af977a73 fix(CohereReranker): updated default model_name param to newest v3 (#1862) 2024-11-21 09:02:49 -08:00
Bert
7cecb71df0 feat: support for checkout and checkout_latest in remote sdks (#1863) 2024-11-21 11:28:46 -05:00
BubbleCal
b2f88f0b29 feat: support specifying ef search param (#1844)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-11-19 23:12:25 +08:00
Lei Xu
267aa83bf8 feat(python): check vector query is not None (#1847)
Fix the type hints of the `nearest_to` method, and raise `ValueError`
when the input is `None`.
2024-11-18 14:15:22 -08:00
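
The kind of guard described in the commit above can be sketched as an early check that fails with a clear message instead of a later, more obscure error. `validate_query_vector` is a hypothetical helper written for this illustration; it is not the body of `nearest_to`.

```python
from typing import Optional, Sequence


def validate_query_vector(query_vector: Optional[Sequence[float]]) -> list[float]:
    """Reject a missing query vector early (hypothetical helper)."""
    if query_vector is None:
        raise ValueError("query vector must not be None")
    # Normalize to a plain list of Python floats.
    return [float(x) for x in query_vector]
```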