Commit Graph

45 Commits

Author SHA1 Message Date
Wyatt Alt
386fc9e466 feat: add num_attempts to merge insert result (#2795)
This pipes the num_attempts field from lance's merge insert result
through lancedb. This allows callers of merge_insert to get a better
idea of whether transaction conflicts are occurring.
2025-11-19 09:32:57 -08:00
Weston Pace
8f8e06a2da feat: add output_schema method to queries (#2717)
This is a helper utility I need for some of my data loader work. It
makes it easy to see the output schema even when a `select` has been
applied.
2025-10-14 05:13:28 -07:00
Weston Pace
5a19cf15a6 feat: a utility for creating "permutation views" (#2552)
I'm working on a lancedb version of pytorch data loading (and hopefully
addressing https://github.com/lancedb/lance/issues/3727).

However, rather than rely on pytorch for everything I'm moving some of
the things that pytorch does into rust. This gives us more control over
data loading (e.g. using shards or a hash-based split) and it allows
permutations to be persistent. In particular I hope to be able to:

* Create a persistent permutation
* This permutation can handle splits, filtering, shuffling, and sharding
* Create a rust data loader that can read a permutation (one or more
splits), or a subset of a permutation (for DDP)
* Create a python data loader that delegates to the rust data loader

Eventually create integrations for other data loading libraries,
including rust & node
2025-10-09 18:07:31 -07:00
Will Jones
ad09234d59 feat: allow setting train=False and name on indices (#2586)
Enables two new parameters when building indices:

* `name`: Allows explicitly setting a name on the index. Default is
`{col_name}_idx`.
* `train` (default `True`): When set to `False`, an empty index will be
immediately created.

The upgrade of Lance means there are also additional behaviors from
cd76a993b8:

* When a scalar index is created on a Table, it will be kept around even
if all rows are deleted or updated.
* Scalar indices can be created on empty tables. They will default to
`train=False` if the table is empty.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-08-15 14:00:26 -07:00
Weston Pace
ed640a76d9 feat: add take_offsets and take_row_ids (#2584)
These operations have existed in lance for a long while and many users
need to drop down to lance for this capability. This PR adds the API and
implements it using filters (e.g. `_rowid IN (...)`) so that in doesn't
currently add any load to `BaseTable`. I'm not sure that is sustainable
as base table implementations may want to specialize how they handle
this method. However, I figure it is a good starting point.

In addition, unlike Lance, this API does not currently guarantee
anything about the order of the take results. This is necessary for the
fallback filter approach to work (SQL filters cannot guarantee result
order)
2025-08-15 06:48:24 -07:00
LuQQiu
ed594b0f76 feat: return version for all write operations (#2368)
return version info for all write operations (add, update, merge_insert
and column modification operations)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Table modification operations (add, update, delete, merge,
add/alter/drop columns) now return detailed result objects including
version numbers and operation statistics.
- Result objects provide clearer feedback such as rows affected and new
table version after each operation.

- **Documentation**
- Updated documentation to describe new result objects and their fields
for all relevant table operations.
- Added documentation for new result interfaces and updated method
return types in Node.js and Python APIs.

- **Tests**
- Enhanced test coverage to assert correctness of returned versioning
and operation metadata after table modifications.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-05-05 14:25:34 -07:00
Ryan Green
af54e0ce06 feat: add table stats API (#2363)
* Add a new "table stats" API to expose basic table and fragment
statistics with local and remote table implementations

### Questions
* This is using `calculate_data_stats` to determine total bytes in the
table. This seems like a potentially expensive operation - are there any
concerns about performance for large datasets?

### Notes
* bytes_on_disk seems to be stored at the column level but there does
not seem to be a way to easily calculate total bytes per fragment. This
may need to be added in lance before we can support fragment size
(bytes) statistics.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added a method to retrieve comprehensive table statistics, including
total rows, index counts, storage size, and detailed fragment size
metrics such as minimum, maximum, mean, and percentiles.
- Enabled fetching of table statistics from remote sources through
asynchronous requests.
- Extended table interfaces across Python, Rust, and Node.js to support
synchronous and asynchronous retrieval of table statistics.
- **Tests**
- Introduced tests to verify the accuracy of the new table statistics
feature for both populated and empty tables.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-29 15:19:08 -02:30
LuQQiu
a9311c4dc0 feat: add list/create/delete/update/checkout tag API (#2353)
add the tag related API to list existing tags, attach tag to a version,
update the tag version, delete tag, get the version of the tag, and
checkout the version that the tag bounded to.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced table version tagging, allowing users to create, update,
delete, and list human-readable tags for specific table versions.
  - Enabled checking out a table by either version number or tag name.
- Added new interfaces for tag management in both Python and Node.js
APIs, supporting synchronous and asynchronous workflows.

- **Bug Fixes**
  - None.

- **Documentation**
- Updated documentation to describe the new tagging features, including
usage examples.

- **Tests**
- Added comprehensive tests for tag creation, updating, deletion,
listing, and version checkout by tag in both Python and Node.js
environments.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-28 10:04:46 -07:00
Ryan Green
3ae90dde80 feat: add new table API to wait for async indexing (#2338)
* Add new wait_for_index() table operation that polls until indices are
created/fully indexed
* Add an optional wait timeout parameter to all create_index operations
* Python and NodeJS interfaces

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Added optional waiting for index creation completion with configurable
timeout.
- Introduced methods to poll and wait for indices to be fully built
across sync and async tables.
  - Extended index creation APIs to accept a wait timeout parameter.
- **Bug Fixes**
- Added a new timeout error variant for improved error reporting on
index operations.
- **Tests**
- Added tests covering successful index readiness waiting, timeout
scenarios, and missing index cases.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-04-21 08:41:21 -02:30
Weston Pace
26080ee4c1 feat: add prewarm_index function (#2342)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added the ability to prewarm (load into memory) table indexes via new
methods in Python, Node.js, and Rust APIs, potentially reducing
cold-start query latency.
- **Bug Fixes**
- Ensured prewarming an index does not interfere with subsequent search
operations.
- **Tests**
- Introduced new test cases to verify full-text search index creation,
prewarming, and search functionalities in both Python and Node.js.
- **Chores**
  - Updated dependencies for improved compatibility and performance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Lu Qiu <luqiujob@gmail.com>
2025-04-17 15:14:36 -07:00
BubbleCal
7ff6ec7fe3 feat: upgrade to lance v0.25.0-beta.5 (#2248)
- adds `loss` into the index stats for vector index
- now `optimize` can retrain the vector index

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-03-21 10:12:23 -07:00
Will Jones
15f8f4d627 ci: check license headers (#2076)
Based on the same workflow in Lance.
2025-01-29 08:27:07 -08:00
Will Jones
f059372137 feat: add drop_index() method (#2039)
Closes #1665
2025-01-20 10:08:51 -08:00
Will Jones
79eaa52184 feat: schema evolution APIs in all SDKs (#1851)
* Support `add_columns`, `alter_columns`, `drop_columns` in Remote SDK
and async Python
* Add `data_type` parameter to node
* Docs updates
2024-12-04 14:47:50 -08:00
Bert
cb9a00a28d feat: add list_versions to typescript, rust and remote python sdks (#1850)
Will require update to lance dependency to bring in this change which
makes the version serializable
https://github.com/lancedb/lance/pull/3143
2024-11-21 13:35:14 -05:00
Will Jones
a324f4ad7a feat(node): enable logging and show full errors (#1775)
This exposes the `LANCEDB_LOG` environment variable in node, so that
users can now turn on logging.

In addition, fixes a bug where only the top-level error from Rust was
being shown. This PR makes sure the full error chain is included in the
error message. In the future, will improve this so the error chain is
set on the [cause](https://nodejs.org/api/errors.html#errorcause)
property of JS errors https://github.com/lancedb/lancedb/issues/1779

Fixes #1774
2024-10-29 15:13:34 -07:00
Will Jones
f958f4d2e8 feat: remote index stats (#1702)
BREAKING CHANGE: the return value of `index_stats` method has changed
and all `index_stats` APIs now take index name instead of UUID. Also
several deprecated index statistics methods were removed.

* Removes deprecated methods for individual index statistics
* Aligns public `IndexStatistics` struct with API response from LanceDB
Cloud.
* Implements `index_stats` for remote Rust SDK and Python async API.
2024-09-27 12:10:00 -07:00
LuQQiu
abeaae3d80 feat!: upgrade Lance to 0.18.0 (#1657)
BREAKING CHANGE: default file format changed to Lance v2.0.

Upgrade Lance to 0.18.0

Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0
2024-09-19 10:50:26 -07:00
Will Jones
2a6586d6fb feat: add flag to enable faster manifest paths (#1612)
The new V2 manifest path scheme makes discovering the latest version of
a table constant time on object stores, regardless of the number of
versions in the table. See benchmarks in the PR here:
https://github.com/lancedb/lance/pull/2798

Closes #1583
2024-09-09 11:34:36 -07:00
Gagan Bhullar
d2caa5e202 feat(nodejs): add delete unverified (#1530)
PR fixes part of #1527
2024-08-14 08:53:53 -07:00
Lei Xu
2bdf0a02f9 feat!: upgrade lance to 0.16 (#1519) 2024-08-07 13:15:22 -07:00
Cory Grinstead
b8a1719174 feat(nodejs): catch unwinds in node bindings (#1414)
this bumps napi version to 2.16 which contains a few bug fixes.
Additionally, it adds `catch_unwind` to any method that may
unintentionally panic.

`catch_unwind` will unwind the panics and return a regular JS error
instead of panicking.
2024-07-01 09:28:10 -05:00
Cory Grinstead
55f88346d0 feat(nodejs): table.indexStats (#1361)
closes https://github.com/lancedb/lancedb/issues/1359
2024-06-21 17:06:52 -05:00
Cory Grinstead
3cd84c9375 feat(nodejs): feature parity [4/N] - add 'name' to 'IndexConfig' for 'listIndices' (#1390)
depends on https://github.com/lancedb/lancedb/pull/1386

see actual diff here
https://github.com/universalmind303/lancedb/compare/create-table-args...universalmind303:list-indices-name
2024-06-21 15:45:02 -05:00
Cory Grinstead
b3e5ac6d2a feat(nodejs): feature parity [2/N] - add table.name and lancedb.connect({args}) (#1380)
depends on https://github.com/lancedb/lancedb/pull/1378

see proper diff here
https://github.com/universalmind303/lancedb/compare/remote-table-node...universalmind303:lancedb:table-name
2024-06-21 11:38:26 -05:00
Cory Grinstead
bc19a75f65 feat(nodejs): merge insert (#1351)
closes https://github.com/lancedb/lancedb/issues/1349
2024-06-11 15:05:15 -05:00
Rob Meng
2e197ef387 feat: upgrade lance to 0.11.0 (#1317)
upgrade lance and make fixes for the upgrade
2024-05-21 18:53:19 -04:00
Weston Pace
4f512af024 feat: add the optimize function to nodejs and async python (#1257)
The optimize function is pretty crucial for getting good performance
when building a large scale dataset but it was only exposed in rust
(many sync python users are probably doing this via to_lance today)

This PR adds the optimize function to nodejs and to python.

I left the function marked experimental because I think there will
likely be changes to optimization (e.g. if we add features like
"optimize on write"). I also only exposed the `cleanup_older_than`
configuration parameter since this one is very commonly used and the
rest have sensible defaults and we don't really know why we would
recommend different values for these defaults anyways.
2024-05-20 07:09:31 -07:00
Weston Pace
968c62cb8f feat: introduce ArrowNative wrapper struct for adding data that is already a RecordBatchReader (#1139)
In
2de226220b
I added a new `IntoArrow` trait for adding data into a table.
Unfortunately, it seems my approach for implementing the trait for
"things that are already record batch readers" was flawed. This PR
corrects that flaw and, conveniently, removes the need to box readers at
all (though it is ok if you do).
2024-04-05 16:33:37 -07:00
Weston Pace
4180b44472 feat: refactor the query API and add query support to the python async API (#1113)
In addition, there are also a number of changes in nodejs to the
docstrings of existing methods because this PR adds a jsdoc linter.
2024-04-05 16:32:47 -07:00
Weston Pace
b6a522d483 feat: add list_indices to the async api (#1074) 2024-04-05 16:32:15 -07:00
Weston Pace
9031ec6878 feat: add update to the async API (#1093) 2024-04-05 16:32:15 -07:00
Weston Pace
47daf9b7b0 feat: add time travel operations to the async API (#1070) 2024-04-05 16:32:15 -07:00
Weston Pace
f822255683 feat: add create_index to the async python API (#1052)
This also refactors the rust lancedb index builder API (and,
correspondingly, the nodejs API)
2024-04-05 16:32:14 -07:00
Weston Pace
8033a44d68 feat: add support for add to async python API (#1037)
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we cleanup tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-04-05 16:31:36 -07:00
Weston Pace
4299f719ec feat: port create_table to the async python API and the remote rust API (#1031)
I've also started `ASYNC_MIGRATION.MD` to keep track of the breaking
changes from sync to async python.
2024-04-05 16:31:36 -07:00
Rob Meng
f3de3d990d chore: upgrade to lance 0.10.1 (#1034)
upgrade to lance 0.10.1 and update doc string to reflect dynamic
projection options
2024-04-05 16:31:36 -07:00
Will Jones
464a36ad38 feat: {add|alter|drop}_columns APIs (#1015)
Initial work for #959. This exposes the basic functionality for each in
all of the APIs. Will add user guide documentation in a later PR.
2024-04-05 16:30:47 -07:00
Weston Pace
2163502b31 refactor: rename the rust crate from vectordb to lancedb (#1012)
This also renames the new experimental node package to lancedb. The
classic node package remains named vectordb.

The goal here is to avoid introducing piecemeal breaking changes to the
vectordb crate. Instead, once the new API is stabilized, we will
officially release the lancedb crate and deprecate the vectordb crate.
The same pattern will eventually happen with the npm package vectordb.
2024-04-05 16:30:40 -07:00
Will Jones
c5b0934bfb feat(node): add read_consistency_interval to Node and Rust (#1002)
This PR adds the same consistency semantics as was added in #828. It
*does not* add the same lazy-loading of tables, since that breaks some
existing tests.

This closes #998.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2024-04-05 16:30:40 -07:00
Weston Pace
cbc0c439ef refactor: rust vectordb API stabilization of the Connection trait (#993)
This is the start of a more comprehensive refactor and stabilization of
the Rust API. The `Connection` trait is cleaned up to not require
`lance` and to match the `Connection` trait in other APIs. In addition,
the concrete implementation `Database` is hidden.

BREAKING CHANGE: The struct `crate::connection::Database` is now gone.
Several examples opened a connection using `Database::connect` or
`Database::connect_with_params`. Users should now use
`vectordb::connect`.

BREAKING CHANGE: The `connect`, `create_table`, and `open_table` methods
now all return a builder object. This means that a call like
`conn.open_table(..., opt1, opt2)` will now become
`conn.open_table(...).opt1(opt1).opt2(opt2).execute()` In addition, the
structure of options has changed slightly. However, no options
capability has been removed.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-04-05 16:30:40 -07:00
Weston Pace
138fc3f66b feat: add a filterable count_rows to all the lancedb APIs (#913)
A `count_rows` method that takes a filter was recently added to
`LanceTable`. This PR adds it everywhere else except `RemoteTable` (that
will come soon).
2024-04-05 16:29:58 -07:00
Lei Xu
db4a979278 feat(napi): Provide a new createIndex API in the napi SDK. (#857) 2024-04-05 16:27:51 -07:00
Lei Xu
dfabbe9081 feat(rust): create index API improvement (#853)
* Extract a minimal Table interface in Rust SDK
* Make create_index composable in Rust.
* Fix compiling issues from ffi
2024-04-05 16:27:51 -07:00
Lei Xu
efcaa433fe feat: rework NodeJS SDK using napi (#847)
Use Napi to write a Node.js SDK that follows Polars for better
maintainability, while keeping most of the logic in Rust.
2024-04-05 16:27:51 -07:00