Commit Graph

983 Commits

Author SHA1 Message Date
Lance Release
41b77f5e25 Bump version: 0.7.0-beta.0 → 0.7.0 python-v0.7.0 2024-05-23 16:30:16 +00:00
Lance Release
eb8b3b8c54 Bump version: 0.6.13 → 0.7.0-beta.0 2024-05-23 16:30:16 +00:00
Weston Pace
f69c3e0595 chore: sync bumpversion.toml with actual version (#1321)
Attempting to create a new minor version failed with:

```
   Specified version (0.4.21-beta.0) does not match last tagged version (0.4.20) 
```

It seems the last release commit for rust/node was made without the new
process and did not adjust bumpversion.toml correctly (or maybe
bumpversion.toml did not exist at that time)
2024-05-23 09:29:40 -07:00
Weston Pace
8511edaaab fix: get the last stable release before we've added a new tag (#1320)
I tried to do a stable release and it failed with:

```
 Traceback (most recent call last):
  File "/home/runner/work/lancedb/lancedb/ci/check_breaking_changes.py", line 20, in <module>
    commits = repo.compare(args.base, args.head).commits
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Repository.py", line 1133, in compare
    headers, data = self._requester.requestJsonAndCheck("GET", f"{self.url}/compare/{base}...{head}", params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Requester.py", line 548, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, self.__customConnection(url)))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Requester.py", line 609, in __check
    raise self.createException(status, responseHeaders, data)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits"}
```

I believe the problem is that we are calculating the
`LAST_STABLE_RELEASE` after we have run bump version and so the newly
created tag is in the list of tags we search and it is the most recent
one and so it gets included as `LAST_STABLE_RELEASE`. Then, the call to
github fails because we haven't pushed the tag yet. This changes the
logic to grab `LAST_STABLE_RELEASE` before we create any new tags.
2024-05-23 09:11:43 -07:00
Will Jones
657aba3c05 ci: pin aws sdk versions (#1318) 2024-05-22 08:26:09 -07:00
Rob Meng
2e197ef387 feat: upgrade lance to 0.11.0 (#1317)
upgrade lance and make fixes for the upgrade
2024-05-21 18:53:19 -04:00
Weston Pace
4f512af024 feat: add the optimize function to nodejs and async python (#1257)
The optimize function is pretty crucial for getting good performance
when building a large scale dataset but it was only exposed in rust
(many sync python users are probably doing this via to_lance today)

This PR adds the optimize function to nodejs and to python.

I left the function marked experimental because I think there will
likely be changes to optimization (e.g. if we add features like
"optimize on write"). I also only exposed the `cleanup_older_than`
configuration parameter since this one is very commonly used and the
rest have sensible defaults and we don't really know why we would
recommend different values for these defaults anyways.
2024-05-20 07:09:31 -07:00
Will Jones
5349e8b1db ci: make preview releases (#1302)
This PR changes the release process. Some parts are more complex, and
other parts I've simplified.

## Simplifications

* Combined `Create Release Commit` and `Create Python Release Commit`
into a single workflow. By default, it does a release of all packages,
but you can still choose to make just a Python or just Node/Rust release
through the arguments. This will make it rarer that we create a Node
release but forget about Python or vice-versa.
* Releases are automatically generated once a tag is pushed. This
eliminates the manual step of creating the release.
* Release notes are automatically generated and changes are categorized
based on the PR labels.
* Removed the use of `LANCEDB_RELEASE_TOKEN` in favor of just using
`GITHUB_TOKEN` where it wasn't necessary. In the one place it is
necessary, I left a comment as to why it is.
* Reused the version in `python/Cargo.toml` so we don't have two
different versions in Python LanceDB.

## New changes

* We now can create `preview` / `beta` releases. By default `Create
Release Commit` will create a preview release, but you can select a
"stable" release type and it will create a full stable release.
  * For Python, pre-releases go to fury.io instead of PyPI
* `bump2version` was deprecated, so upgraded to `bump-my-version`. This
also seems to better support semantic versioning with pre-releases.
* `ci` changes will now be shown in the changelog, allowing changes like
this to be visible to users. `chore` is still hidden.

## Versioning

**NOTE**: unlike how it is in lance repo right now, the version in main
is the last one released, including beta versions.

---------

Co-authored-by: Lance Release <lance-dev@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-05-17 11:24:38 -07:00
BubbleCal
5e01810438 feat: support IVF_HNSW_SQ (#1284)
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2024-05-16 14:28:06 +08:00
Cory Grinstead
6eaaee59f8 fix: remove accidental console.log (#1307)
i accidentally left a console.log when doing
https://github.com/lancedb/lancedb/pull/1290
2024-05-15 16:07:46 -05:00
Cory Grinstead
055efdcdb6 refactor(nodejs): use biomejs instead of eslint & prettier (#1304)
I've been noticing a lot of friction with the current toolchain for
'/nodejs'. Particularly with the usage of eslint and prettier.

[Biome](https://biomejs.dev/) is an all in one formatter & linter that
replaces the need for two different ones that can potentially clash with
one another.

I've been using it in the
[nodejs-polars](https://github.com/pola-rs/nodejs-polars) repo for quite
some time & have found it much more pleasant to work with.

---

One other small change included in this PR:

use [ts-jest](https://www.npmjs.com/package/ts-jest) so we can run our
tests without having to rebuild typescript code first
2024-05-14 11:11:18 -05:00
Cory Grinstead
bc582bb702 fix(nodejs): add better error handling when missing embedding functions (#1290)
note: 
running the default lint command `npm run lint -- --fix` seems to have
made a lot of unrelated changes.
2024-05-14 08:43:39 -05:00
Will Jones
df9c41f342 ci: write down breaking change policy (#1294)
* Enforce conventional commit PR titles
* Add automatic labelling of PRs
* Write down breaking change policy.

Left for another PR:
* Validation of breaking change version bumps. (This is complicated due
to separate releases for Python and other package.)
2024-05-13 10:25:55 -07:00
Raghav Dixit
0bd6ac945e Documentation : Langchain doc bug fix (#1301)
nav bar update
2024-05-13 20:56:34 +05:30
Raghav Dixit
c9d5475333 Documentation: Langchain Integration (#1297)
Integration doc update
2024-05-13 10:19:33 -04:00
asmith26
3850d5fb35 Add ollama embeddings function (#1263)
Following the docs
[here](https://lancedb.github.io/lancedb/python/python/#lancedb.embeddings.openai.OpenAIEmbeddings)
I've been trying to use ollama embedding via the OpenAI API interface,
but unfortunately I couldn't get it to work (possibly related to
https://github.com/ollama/ollama/issues/2416)

Given the popularity of ollama I thought it could be helpful to have a
dedicated Ollama Embedding function in lancedb.

Very much welcome any thought on this or my code etc. Thanks!
2024-05-13 13:09:19 +05:30
Lance Release
b37c58342e [python] Bump version: 0.6.12 → 0.6.13 python-v0.6.13 2024-05-10 16:15:13 +00:00
Lance Release
a06e64f22d Updating package-lock.json 2024-05-09 22:46:19 +00:00
Lance Release
e983198f0e Updating package-lock.json 2024-05-09 22:12:17 +00:00
Lance Release
76e7b4abf8 Updating package-lock.json 2024-05-09 21:14:47 +00:00
Lance Release
5f6eb4651e Bump version: 0.4.19 → 0.4.20 v0.4.20 2024-05-09 21:14:30 +00:00
Bert
805c78bb20 chore: bump lance to v0.10.18 (#1287)
https://github.com/lancedb/lance/releases/tag/v0.10.18
2024-05-09 17:06:26 -03:00
QianZhu
4746281b21 fix rename_table api and cache pop (#1283) 2024-05-08 13:41:18 -07:00
Aman Kishore
7b3b6bdccd Remove semvar strict dependancy (#1253) 2024-05-08 11:16:15 -07:00
Ryan Green
37e1124c0f chore: upgrade lance to 0.10.17 (#1280) 2024-05-08 09:56:48 -02:30
Lance Release
93f037ee41 Updating package-lock.json 2024-05-07 20:50:44 +00:00
Lance Release
e4fc06825a Updating package-lock.json 2024-05-07 20:09:25 +00:00
Lance Release
fe89a373a2 [python] Bump version: 0.6.11 → 0.6.12 python-v0.6.12 2024-05-07 19:27:17 +00:00
Lance Release
3d3915edef Updating package-lock.json 2024-05-07 19:04:42 +00:00
Lance Release
e2e8b6aee4 Bump version: 0.4.18 → 0.4.19 v0.4.19 2024-05-07 19:04:31 +00:00
Will Jones
12dbca5248 ci: better test for test_syntax (#1278)
The syntax error was fixed in tantivy 0.22.0, so I changed the test case
to something more wrong.
2024-05-07 11:52:39 -07:00
Will Jones
a6babfa651 fix(node/vectordb): parse value not key (#1276) 2024-05-07 10:16:05 -07:00
Will Jones
75ede86fab fix: clearer error that FTS is not supported on object stores (#1273)
Closes #1272
2024-05-07 10:15:53 -07:00
Will Jones
becd649130 docs: add tip about using allow_http on local servers (#1277)
Based on user question
https://discord.com/channels/1030247538198061086/1197630499926057021/1237350091191222293
2024-05-07 10:15:26 -07:00
Cory Grinstead
9d2fb7d602 feat: rust embedding registry (#1259)
Todo:

- [x] add proper documentation
- [x] add unit tests
- [x] better handling of the registry**1
- [x] allow user defined registry**2

**1 The python implementation just uses a global registry so it makes
things a bit easier. I attached it to the db/connection to prevent
future conflicts if running multiple connections/databases. I mostly
modeled the registry & pattern off of datafusion's
[FunctionRegistry](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html).

**2 Ideally, the user should be able to provide it's own registry
entirely, but currently it just uses an in memory registry by default
(_which isn't configurable_)

`rust/lancedb/examples/embedding_registry.rs` provides a thorough
example of expected usage.

---

Some additional notes:

This does not provide any of the out of box functionality that the
python registry does.

_i.e there are no built-in embedding functions._ 

You can think of this as the ground work for adding those built in
functions, So while this is part of
https://github.com/lancedb/lancedb/issues/994, it does not yet offer
feature parity.
2024-05-06 18:39:07 -05:00
Ben Poulson
fdb5d6fdf1 Update README.md to correct LangChain URL (#1262)
URL in the README for LangChain is currently 404ing. Here's the new URL.
2024-05-06 11:50:34 +05:30
Ayush Chaurasia
2f13fa225f Chore (python): Better retry loop logging when embedding api fails (#1267)
https://github.com/lancedb/lancedb/issues/1266#event-12703166915

This happens because openai API errors out with None values. The current
log level didn't really print out the msg on screen. Changed the log
level to warning, which better suits this case.

Also, retry loop can be disabled by setting `max_retries=0` (I'm not
sure if we should also set this as the default behaviour as hitting api
rate is quite common when ingesting large corpus)

```
func = get_registry().get("openai").create(max_retries=0)
````
2024-05-06 11:49:11 +05:30
Nehil Jain
e933de003d fix: Docs for embed_func fixed in youtube transcript search notebook (#1269)
Fixes issue https://github.com/lancedb/lancedb/issues/1268
2024-05-06 11:48:25 +05:30
Ikko Eltociear Ashimine
05fd387425 docs: update README.md (#1270)
retrevial -> retrieval
2024-05-06 11:46:48 +05:30
Will Jones
82a1da554c fix(python): return ValueError if passed unknown args to connect() (#1265)
It's confusing to users that keyword arguments from the async API like
`storage_options` are accepted by `connect()`, but don't do anything. We
should error if unknown arguments are passed instead.
2024-05-03 17:00:08 -07:00
Rohit Rastogi
a7c0d80b9e Implement convertors to and from Polars DataFrames in Rust SDK using convertors based on C FFI #1099 (#1260)
https://github.com/lancedb/lancedb/issues/1099

Took the same general approach from:
https://github.com/lancedb/lancedb/pull/1235. Instead of using
high-level convertors implemented in polars-arrow (with the arrow-rs
feature flag, which adds a dependency on arrow-rs), I used convertors
based on the C FFI to avoid dependency conflicts.

---------

Co-authored-by: Rohit Rastogi <rohitrastogi@Rohits-MacBook-Pro.local>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-05-03 16:15:14 -07:00
Cory Grinstead
71323a064a chore(nodejs): update docs on "table.ts" (#1255)
closes https://github.com/lancedb/lancedb/issues/1007
2024-05-01 23:00:22 -05:00
asmith26
df48454b70 Update embedding_functions.md (#1250)
`clip.ndims` seems to be a function (I installed with `pip install
open_clip_torch`).
2024-05-01 09:33:42 -07:00
Lance Release
6603414885 Updating package-lock.json 2024-04-30 20:57:12 +00:00
Lance Release
c256f6c502 Updating package-lock.json 2024-04-30 19:58:49 +00:00
Lance Release
cc03f90379 Updating package-lock.json 2024-04-30 19:21:48 +00:00
Lance Release
975da09b02 Bump version: 0.4.17 → 0.4.18 v0.4.18 2024-04-30 19:21:37 +00:00
Cory Grinstead
c32e17b497 chore(nodejs): remove "optionalDependencies" (#1252)
closes #1248 

the binding specific `optionalDependencies` are added automatically as
part of the `prepublishOnly` hook, and they are not supposed to be
committed to `package.json`.



--- 

npm lifecycle scripts: 
https://docs.npmjs.com/cli/v7/using-npm/scripts#life-cycle-scripts
2024-04-30 10:51:10 -05:00
Ryan Green
0528abdf97 fix: fix path on remote create_table and check for error response (#1244) 2024-04-28 11:33:05 -02:30
Lance Release
1090c311e8 [python] Bump version: 0.6.10 → 0.6.11 python-v0.6.11 2024-04-27 03:54:58 +00:00