If we start supporting external catalogs then "drop database" may be
misleading (and not possible). We should be more clear that this is a
utility method to drop all tables. This is also a nice chance for some
consistency cleanup as it was `drop_db` in rust, `drop_database` in
python, and non-existent in typescript.
This PR also adds a public accessor to get the database trait from a
connection.
BREAKING CHANGE: the `drop_database` / `drop_db` methods are now
deprecated.
Similar to
c269524b2f
this PR reworks and exposes an internal trait (this time
`TableInternal`) to be a public trait. These two PRs together should
make it possible for others to integrate LanceDB on top of other
catalogs.
This PR also adds a basic `TableProvider` implementation for tables,
although some work still needs to be done here (pushdown not yet
enabled).
Closes#1106
Unfortunately, these need to be set at the connection level. I
investigated whether if we let users provide a callback they could use
`AsyncLocalStorage` to access their context. However, it doesn't seem
like NAPI supports this right now. I filed an issue:
https://github.com/napi-rs/napi-rs/issues/2456
This opens up the door for more custom database implementations than the
two we have today. The biggest change should be inivisble:
`ConnectionInternal` has been renamed to `Database`, made public, and
refactored
However, there are a few breaking changes. `data_storage_version` and
`enable_v2_manifest_paths` have been moved from options on
`create_table` to options for the database which are now set via
`storage_options`.
Before:
```
db = connect(uri)
tbl = db.create_table("my_table", data, data_storage_version="legacy", enable_v2_manifest_paths=True)
```
After:
```
db = connect(uri, storage_options={
"new_table_enable_v2_manifest_paths": "true",
"new_table_data_storage_version": "legacy"
})
tbl = db.create_table("my_table", data)
```
BREAKING CHANGE: the data_storage_version, enable_v2_manifest_paths
options have moved from options to create_table to storage_options.
BREAKING CHANGE: the use_legacy_format option has been removed,
data_storage_version has replaced it for some time now
* Make `npm run docs` fail if there are any warnings. This will catch
items missing from the API reference.
* Add a check in our CI to make sure `npm run dos` runs without warnings
and doesn't generate any new files (indicating it might be out-of-date.
* Hide constructors that aren't user facing.
* Remove unused enum `WriteMode`.
Closes#2068
Some Rust jobs (such as
[Rust/linux](https://github.com/lancedb/lancedb/actions/runs/13019232960/job/36315830779))
take almost minutes. This can be a bit of a bottleneck.
* Two fixes to make caches more effective
* Check in `Cargo.lock` so that dependencies don't change much between
runs
* Added a new CI job to validate we can build without a lockfile
* Altered build commands so they don't have contradictory features and
therefore don't trigger multiple builds
Sadly, I don't think there's much to be done for windows-arm64, as much
of the compile time is because the base image is so bare we need to
install the build tools ourselves.
This PR aims to fix#2047 by doing the following things:
- Add a distance_type parameter to the sync query builders of Python
SDK.
- Make metric an alias to distance_type.
* `createTable()` now saves embeddings in the schema metadata.
Previously, it would drop them. (`createEmptyTable()` was already tested
and worked.)
* `mergeInsert()` now uses embeddings.
Fixes#2066