This PR fixes build issues associated with `aws-lc-rs`, while
simplifying the build process. Previously, we used custom scripts for
the musl and Windows ARM builds. These were complicated and prone to
breaking. This PR switches to a setup that mirrors
https://github.com/napi-rs/package-template/blob/main/.github/workflows/CI.yml.
* linux glibc and musl builds now use the Docker images provided by the
napi project
* Windows ARM build now just cross compiles from Windows x64, which
turns out to work quite well.
@wjones127 is there a standard way you guys setup your virtualenv? I can
either relist all the dependencies in the pyright precommit section, or
specify a venv, or the user has to be in the virtual environment when
they run git commit. If the venv location was standardized or a python
manager like `uv` was used it would be easier to avoid duplicating the
pyright dependency list.
Per your suggestion, in `pyproject.toml` I added in all the passing
files to the `includes` section.
For ruff I upgraded the version and removed "TCH" which doesn't exist as
an option.
I added a `pyright_report.csv` which contains a list of all files sorted
by pyright errors ascending as a todo list to work on.
I fixed about 30 issues in `table.py` stemming from str's being passed
into methods that required a string within a set of string Literals by
extracting them into `types.py`
Can you verify in the rust bridge that the schema should be a property
and not a method here? If it's a method, then there's another place in
the code where `inner.schema` should be `inner.schema()`
``` python
class RecordBatchStream:
@property
def schema(self) -> pa.Schema: ...
```
Also unless the `_lancedb.pyi` file is wrong, then there is no
`__anext__` here for `__inner` when it's not an `AsyncGenerator` and
only `next` is defined:
``` python
async def __anext__(self) -> pa.RecordBatch:
return await self._inner.__anext__()
if isinstance(self._inner, AsyncGenerator):
batch = await self._inner.__anext__()
else:
batch = await self._inner.next()
if batch is None:
raise StopAsyncIteration
return batch
```
in the else statement, `_inner` is a `RecordBatchStream`
```python
class RecordBatchStream:
@property
def schema(self) -> pa.Schema: ...
async def next(self) -> Optional[pa.RecordBatch]: ...
```
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
fixes that all `neon pack-build` packs are named
`vectordb-linux-x64-musl-*.tgz` even when cross-compiling
adds 2nd param:
`TARGET_TRIPLE=${2:-x86_64-unknown-linux-gnu}`
`npm run pack-build -- -t $TARGET_TRIPLE`
This is done as setup for a PR that will fix the OpenAI dependency
issue.
* [x] FTS examples
* [x] Setup mock openai
* [x] Ran `npm audit fix`
* [x] sentences embeddings test
* [x] Double check formatting of docs examples
Building with Node 16 produced this error:
```
npm ERR! code ENOENT
npm ERR! syscall chmod
npm ERR! path /io/nodejs/node_modules/apache-arrow-15/bin/arrow2csv.cjs
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, chmod '/io/nodejs/node_modules/apache-arrow-15/bin/arrow2csv.cjs'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent
```
[CI
Failure](https://github.com/lancedb/lancedb/actions/runs/10117131772/job/27981475770).
This looks like it is https://github.com/apache/arrow/issues/43341
Upgrading to Node 18 makes this goes away. Since Node 18 requires glibc
>= 2_28, we had to upgrade the manylinux version we are using. This is
fine since we already state a minimum Node version of 18.
This also upgrades the openssl version we bundle, as well as
consolidates the build files.
* Labelled jobs `vectordb` and `lancedb` so it's clear which package
they are for
* Fix permission issue in aarch64 Linux `vectordb` build that has been
blocking release for two months.
* Added Slack notifications for failure of these publish jobs.
I tried to do a stable release and it failed with:
```
Traceback (most recent call last):
File "/home/runner/work/lancedb/lancedb/ci/check_breaking_changes.py", line 20, in <module>
commits = repo.compare(args.base, args.head).commits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Repository.py", line 1133, in compare
headers, data = self._requester.requestJsonAndCheck("GET", f"{self.url}/compare/{base}...{head}", params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Requester.py", line 548, in requestJsonAndCheck
return self.__check(*self.requestJson(verb, url, parameters, headers, input, self.__customConnection(url)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/github/Requester.py", line 609, in __check
raise self.createException(status, responseHeaders, data)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits"}
```
I believe the problem is that we are calculating the
`LAST_STABLE_RELEASE` after we have run bump version and so the newly
created tag is in the list of tags we search and it is the most recent
one and so it gets included as `LAST_STABLE_RELEASE`. Then, the call to
github fails because we haven't pushed the tag yet. This changes the
logic to grab `LAST_STABLE_RELEASE` before we create any new tags.
This PR changes the release process. Some parts are more complex, and
other parts I've simplified.
## Simplifications
* Combined `Create Release Commit` and `Create Python Release Commit`
into a single workflow. By default, it does a release of all packages,
but you can still choose to make just a Python or just Node/Rust release
through the arguments. This will make it rarer that we create a Node
release but forget about Python or vice-versa.
* Releases are automatically generated once a tag is pushed. This
eliminates the manual step of creating the release.
* Release notes are automatically generated and changes are categorized
based on the PR labels.
* Removed the use of `LANCEDB_RELEASE_TOKEN` in favor of just using
`GITHUB_TOKEN` where it wasn't necessary. In the one place it is
necessary, I left a comment as to why it is.
* Reused the version in `python/Cargo.toml` so we don't have two
different versions in Python LanceDB.
## New changes
* We now can create `preview` / `beta` releases. By default `Create
Release Commit` will create a preview release, but you can select a
"stable" release type and it will create a full stable release.
* For Python, pre-releases go to fury.io instead of PyPI
* `bump2version` was deprecated, so upgraded to `bump-my-version`. This
also seems to better support semantic versioning with pre-releases.
* `ci` changes will now be shown in the changelog, allowing changes like
this to be visible to users. `chore` is still hidden.
## Versioning
**NOTE**: unlike how it is in lance repo right now, the version in main
is the last one released, including beta versions.
---------
Co-authored-by: Lance Release <lance-dev@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
When we turned on fat LTO builds, we made the release build job **much**
more compute and memory intensive. The ARM runners have particularly low
memory per core, which makes them susceptible to OOM errors. To avoid
issues, I have enabled memory swap on ARM and bumped the side of the
runner.
there's build failure for the rust artifact but the macos arm64 build
for npm publish still passed. So we had a silent failure for 2 releases.
By setting error to immediate this should cause fail immediately.
* Refactors the Node module to load the shared library from a separate
package. When a user does `npm install vectordb`, the correct optional
dependency is automatically downloaded by npm.
* Add scripts and instructions to build Linux and MacOS node artifacts
locally.
* Add instructions for publishing the npm module and crates.
Co-authored-by: Will Jones <willjones127@gmail.com>
Changes:
* Refactors the Node module to load the shared library from a separate
package. When a user does `npm install vectordb`, the correct optional
dependency is automatically downloaded by npm.
* Brings Rust and Node versions in alignment at 0.1.2.
* Add scripts and instructions to build Linux and MacOS node artifacts
locally.
* Add instructions for publishing the npm module and crates.