Commit Graph

1073 Commits

Author SHA1 Message Date
Ayush Chaurasia
76fc16c7a1 docs: add retriever guide, address minor onboarding feedbacks & enhancement (#1326)
- Tried to address some of the onboarding feedback listed in
https://github.com/lancedb/lancedb/issues/1224
- Improve visibility of pydantic integration and embedding API. (Based
on onboarding feedback: there are many ways to ingest data and define a
schema, but it is unclear which to use for a specific use case.)
- Add a guide that takes users through testing and improving retriever
performance using built-in utilities like hybrid-search and reranking
- Add some benchmarks for the above
- Add missing cohere docs

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-06-08 06:25:31 +05:30
Weston Pace
007f9c1af8 chore: change build machine for linux arm (#1360) 2024-06-06 13:22:58 -07:00
Lance Release
27e4ad3f11 Updating package-lock.json 2024-06-05 13:47:44 +00:00
Lance Release
df42943ccf Bump version: 0.5.2-beta.0 → 0.5.2 v0.5.2 2024-06-05 13:47:28 +00:00
Lance Release
3eec9ea740 Bump version: 0.5.1 → 0.5.2-beta.0 2024-06-05 13:47:27 +00:00
Lance Release
11fcdb1194 Bump version: 0.8.2-beta.0 → 0.8.2 python-v0.8.2 2024-06-05 13:47:16 +00:00
Lance Release
95a5a0d713 Bump version: 0.8.1 → 0.8.2-beta.0 2024-06-05 13:47:16 +00:00
Weston Pace
c3043a54c6 feat: bump lance dependency to 0.12.1 (#1357) 2024-06-05 06:07:11 -07:00
Weston Pace
d5586c9c32 feat: make it possible to opt in to using the v2 format (#1352)
This also exposed the max_batch_length configuration option in
python/node (it was needed to verify if we are actually in v2 mode or
not)
2024-06-04 21:52:14 -07:00
Rob Meng
d39e7d23f4 feat: fast path for checkout_latest (#1355)
Similar to https://github.com/lancedb/lancedb/pull/1354:
do locked IO less frequently.
2024-06-04 23:01:28 -04:00
Rob Meng
ddceda4ff7 feat: add fast path to dataset reload (#1354)
Most of the time we don't need to reload. Locking the write lock and
performing IO is not an ideal pattern.

This PR tries to make the critical section of `.write()` happen less
frequently.

This isn't the ideal solution. Ideally, we would not lock until the new
dataset has been loaded, but that would require too much refactoring.
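
The fast-path idea described above can be sketched in Python. This is a minimal illustration of the general pattern (check cheaply first, lock and do IO only when a reload is actually due), not the code from this PR. `ReloadingHandle`, `load`, `interval_s`, and `clock` are hypothetical names, and a plain `threading.Lock` stands in for the read/write lock mentioned in the message.

```python
import threading
import time


class ReloadingHandle:
    """Hypothetical wrapper that reloads a cached value at most once per interval."""

    def __init__(self, load, interval_s, clock=time.monotonic):
        self._load = load              # callable that performs the expensive IO
        self._interval_s = interval_s  # how long a loaded value stays fresh
        self._clock = clock            # injectable so the behaviour is testable
        self._lock = threading.Lock()
        self._value = load()
        self._loaded_at = clock()

    def _is_stale(self):
        return self._clock() - self._loaded_at >= self._interval_s

    def get(self):
        # Fast path: no lock and no IO while the cached value is fresh.
        if not self._is_stale():
            return self._value
        # Slow path: take the lock, then re-check, because another thread
        # may have finished a reload while this one was waiting.
        with self._lock:
            if self._is_stale():
                self._value = self._load()
                self._loaded_at = self._clock()
            return self._value
```

Like the change described above, this sketch still performs the reload while holding the lock; it only makes that critical section run less often.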
2024-06-04 19:03:53 -04:00
Cory Grinstead
70f92f19a6 feat(nodejs): table.search functionality (#1341)
closes https://github.com/lancedb/lancedb/issues/1256
2024-06-04 14:04:03 -05:00
Cory Grinstead
d9fb6457e1 fix(nodejs): better support for f16 and f64 (#1343)
closes https://github.com/lancedb/lancedb/issues/1292
closes https://github.com/lancedb/lancedb/issues/1293
2024-06-04 13:41:21 -05:00
Lei Xu
56b4fd2bd9 feat(rust): allow to create execution plan on queries (#1350) 2024-05-31 17:33:58 -07:00
paul n walsh
7c133ec416 feat(nodejs): table.toArrow function (#1282)
Addresses https://github.com/lancedb/lancedb/issues/1254.

---------

Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
2024-05-31 13:24:21 -05:00
QianZhu
1dbb4cd1e2 fix: error msg when query vector dim is wrong (#1339)
- changed the error msg for table.search with wrong query vector dim 
- added missing fields for listIndices and indexStats to be consistent
with Python API - will make changes in node integ test
2024-05-31 10:18:06 -07:00
Paul Rinaldi
af65417d19 fix: update broken blog link on readme (#1310) 2024-05-31 10:04:56 -07:00
Cory Grinstead
01dd6c5e75 feat(rust): openai embedding function (#1275)
Part of https://github.com/lancedb/lancedb/issues/994.

Adds the ability to use the openai embedding functions.


The example can be run with the following:

```sh
export OPENAI_API_KEY="sk-..."
cargo run --example openai --features=openai
```

which should output
```
Closest match: Winter Parka
```
2024-05-30 15:55:55 -05:00
Weston Pace
1e85b57c82 ci: don't update package locks if we are not releasing node (#1323)
This doesn't actually block a python-only release since this step runs
after the version bump has been pushed but it still would be nice for
the git job to finish successfully.
2024-05-30 04:42:06 -07:00
Ayush Chaurasia
16eff254ea feat: add support for new cohere models in cohere and bedrock embedding functions (#1335)
Fixes #1329

Will update docs on https://github.com/lancedb/lancedb/pull/1326
2024-05-30 10:20:03 +05:30
Lance Release
1b2463c5dd Updating package-lock.json 2024-05-30 01:00:43 +00:00
Lance Release
92f74f955f Bump version: 0.5.1-beta.0 → 0.5.1 v0.5.1 2024-05-30 01:00:28 +00:00
Lance Release
53b5ea3f92 Bump version: 0.5.0 → 0.5.1-beta.0 2024-05-30 01:00:28 +00:00
Lance Release
291ed41c3e Bump version: 0.8.1-beta.0 → 0.8.1 python-v0.8.1 2024-05-30 01:00:21 +00:00
Lance Release
fdda7b1a76 Bump version: 0.8.0 → 0.8.1-beta.0 2024-05-30 01:00:21 +00:00
Weston Pace
eb2cbedf19 feat: upgrade lance to 0.11.1 (#1338) 2024-05-29 16:28:09 -07:00
Cory Grinstead
bc139000bd feat(nodejs): add compatibility across arrow versions (#1337)
While adding some more docs & examples for the new JS SDK, I ran across
a few compatibility issues when using different arrow versions. This
should fix those issues.
2024-05-29 17:36:34 -05:00
Cory Grinstead
dbea3a7544 feat: js embedding registry (#1308)
---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-05-29 13:12:19 -05:00
zhongpu
3bb7c546d7 fix: the bug of async connection context manager (#1333)
- add `return` for `__enter__`

The buggy code didn't return the object, so `db` was always `None`
inside the context manager:

```python
with await lancedb.connect_async("./.lancedb") as db:
    print(db)  # before this fix, db was always None
```

(BTW, why not design an async context manager?)

- add a unit test for Async connection context manager

- update return type of `AsyncConnection.open_table` to `AsyncTable`

Although type annotation doesn't affect the functionality, it is helpful
for IDEs.
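
To illustrate why the missing `return` matters, here is a minimal sketch using stand-in classes rather than the real `AsyncConnection`. `BuggyConnection`, `FixedConnection`, and `bound_by_with` are hypothetical names used only for this example. The `with` statement binds whatever `__enter__` returns, so a method that falls off the end binds `None`.

```python
class BuggyConnection:
    """Stand-in for a connection whose __enter__ forgets to return self."""

    def __enter__(self):
        self.is_open = True  # no return statement, so `as` binds None

    def __exit__(self, exc_type, exc, tb):
        self.is_open = False
        return False  # do not swallow exceptions


class FixedConnection(BuggyConnection):
    """Same behaviour, but __enter__ returns the connection object."""

    def __enter__(self):
        super().__enter__()
        return self


def bound_by_with(conn):
    # Returns the object that `with ... as db` binds to `db`.
    with conn as db:
        return db
```

With these definitions, `bound_by_with(BuggyConnection())` gives `None`, while `bound_by_with(conn)` for a `FixedConnection` gives back `conn` itself.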
2024-05-29 09:33:32 -07:00
Cory Grinstead
2f4b70ecfe chore: clippy warnings inside java bindings (#1330)
This was causing unrelated PRs to fail.
https://github.com/lancedb/lancedb/actions/runs/9274579178/job/25517248069?pr=1308
2024-05-28 14:05:07 -05:00
Philip Meier
1ad1c0820d chore: replace semver dependency with packaging (#1311)
Fixes #1296 per title. See
https://github.com/lancedb/lancedb/pull/1298#discussion_r1603931457 Cc
@wjones127

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-05-28 10:05:16 -07:00
LuQQiu
db712b0f99 feat(java): add table names java api (#1279)
Add lancedb-jni and table names API

---------

Co-authored-by: Lei Xu <eddyxu@gmail.com>
2024-05-24 11:49:11 -07:00
BubbleCal
fd1a5ce788 feat: support IVF_HNSW_PQ (#1314)
This also simplifies the index-creation code by using a macro.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-05-24 18:32:00 +08:00
QianZhu
def087fc85 fix: parse index_stats for scalar index (#1319)
parse the index stats for scalar index - it is different from the index
stats for vector index
2024-05-23 13:10:46 -07:00
Lance Release
43f920182a Bump version: 0.8.0-beta.0 → 0.8.0 python-v0.8.0 2024-05-23 17:32:36 +00:00
Lance Release
718963d1fb Bump version: 0.7.0 → 0.8.0-beta.0 2024-05-23 17:32:36 +00:00
Weston Pace
e4dac751e7 chore: remove working-directory from pypi upload step (#1322)
The wheels are built to `WORKDIR/target/wheels` and the step was
configured to look for them at `WORKDIR/python/target/wheels`.
2024-05-23 10:31:32 -07:00
Lance Release
aae02953eb Updating package-lock.json 2024-05-23 16:30:46 +00:00
Lance Release
1d9f76bdda Bump version: 0.5.0-beta.0 → 0.5.0 v0.5.0 2024-05-23 16:30:27 +00:00
Lance Release
affdfc4d48 Bump version: 0.4.20 → 0.5.0-beta.0 2024-05-23 16:30:26 +00:00
Lance Release
41b77f5e25 Bump version: 0.7.0-beta.0 → 0.7.0 python-v0.7.0 2024-05-23 16:30:16 +00:00
Lance Release
eb8b3b8c54 Bump version: 0.6.13 → 0.7.0-beta.0 2024-05-23 16:30:16 +00:00
Weston Pace
f69c3e0595 chore: sync bumpversion.toml with actual version (#1321)
Attempting to create a new minor version failed with:

```
   Specified version (0.4.21-beta.0) does not match last tagged version (0.4.20) 
```

It seems the last release commit for rust/node was made without the new
process and did not adjust bumpversion.toml correctly (or maybe
bumpversion.toml did not exist at that time)
2024-05-23 09:29:40 -07:00
Weston Pace
8511edaaab fix: get the last stable release before we've added a new tag (#1320)
I tried to do a stable release and it failed with:

```
Traceback (most recent call last):
  File "/home/runner/work/lancedb/lancedb/ci/check_breaking_changes.py", line 20, in <module>
    commits = repo.compare(args.base, args.head).commits
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Repository.py", line 1133, in compare
    headers, data = self._requester.requestJsonAndCheck("GET", f"{self.url}/compare/{base}...{head}", params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Requester.py", line 548, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, self.__customConnection(url)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Requester.py", line 609, in __check
    raise self.createException(status, responseHeaders, data)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits"}
```

I believe the problem is that we calculate `LAST_STABLE_RELEASE` after
running bump version. By then the newly created tag is in the list of
tags we search, and because it is the most recent one it is picked as
`LAST_STABLE_RELEASE`. The call to GitHub then fails because we haven't
pushed the tag yet. This changes the logic to grab `LAST_STABLE_RELEASE`
before we create any new tags.
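
The ordering issue can be sketched as follows. This is an illustration only, not the CI script from this commit: `last_stable_release` and `plan_release` are hypothetical helpers, and the tag format (`vX.Y.Z` for stable releases, with anything else treated as a pre-release or another package's tag) is an assumption made for the example.

```python
import re

# Assumed format for stable tags; pre-release and other-package tags do not match.
STABLE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def last_stable_release(tags):
    """Return the highest tag of the form vX.Y.Z, or None if there is none."""
    stable = []
    for tag in tags:
        match = STABLE_TAG.match(tag)
        if match:
            version = tuple(int(part) for part in match.groups())
            stable.append((version, tag))
    return max(stable)[1] if stable else None


def plan_release(existing_tags, new_tag):
    # Capture the comparison base first: the new tag exists only locally
    # until it is pushed, so it must not be chosen as the base.
    base = last_stable_release(existing_tags)
    tags_after_bump = existing_tags + [new_tag]
    return base, tags_after_bump
```

Calling `last_stable_release` on `tags_after_bump` instead would return the new, unpushed tag, which is the failure mode described above.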
2024-05-23 09:11:43 -07:00
Will Jones
657aba3c05 ci: pin aws sdk versions (#1318) 2024-05-22 08:26:09 -07:00
Rob Meng
2e197ef387 feat: upgrade lance to 0.11.0 (#1317)
upgrade lance and make fixes for the upgrade
2024-05-21 18:53:19 -04:00
Weston Pace
4f512af024 feat: add the optimize function to nodejs and async python (#1257)
The optimize function is pretty crucial for getting good performance
when building a large scale dataset but it was only exposed in rust
(many sync python users are probably doing this via to_lance today)

This PR adds the optimize function to nodejs and to python.

I left the function marked experimental because I think there will
likely be changes to optimization (e.g. if we add features like
"optimize on write"). I also only exposed the `cleanup_older_than`
configuration parameter since this one is very commonly used and the
rest have sensible defaults and we don't really know why we would
recommend different values for these defaults anyways.
2024-05-20 07:09:31 -07:00
Will Jones
5349e8b1db ci: make preview releases (#1302)
This PR changes the release process. Some parts are more complex, and
other parts I've simplified.

## Simplifications

* Combined `Create Release Commit` and `Create Python Release Commit`
into a single workflow. By default, it does a release of all packages,
but you can still choose to make just a Python or just Node/Rust release
through the arguments. This will make it rarer that we create a Node
release but forget about Python or vice-versa.
* Releases are automatically generated once a tag is pushed. This
eliminates the manual step of creating the release.
* Release notes are automatically generated and changes are categorized
based on the PR labels.
* Removed the use of `LANCEDB_RELEASE_TOKEN` in favor of just using
`GITHUB_TOKEN` where it wasn't necessary. In the one place it is
necessary, I left a comment as to why it is.
* Reused the version in `python/Cargo.toml` so we don't have two
different versions in Python LanceDB.

## New changes

* We now can create `preview` / `beta` releases. By default `Create
Release Commit` will create a preview release, but you can select a
"stable" release type and it will create a full stable release.
  * For Python, pre-releases go to fury.io instead of PyPI
* `bump2version` was deprecated, so upgraded to `bump-my-version`. This
also seems to better support semantic versioning with pre-releases.
* `ci` changes will now be shown in the changelog, allowing changes like
this to be visible to users. `chore` is still hidden.

## Versioning

**NOTE**: unlike how it is in lance repo right now, the version in main
is the last one released, including beta versions.

---------

Co-authored-by: Lance Release <lance-dev@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-05-17 11:24:38 -07:00
BubbleCal
5e01810438 feat: support IVF_HNSW_SQ (#1284)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-05-16 14:28:06 +08:00
Cory Grinstead
6eaaee59f8 fix: remove accidental console.log (#1307)
I accidentally left a console.log when doing
https://github.com/lancedb/lancedb/pull/1290
2024-05-15 16:07:46 -05:00