Commit Graph

550 Commits

Author SHA1 Message Date
Jack Ye
dadb042978 feat: bump lance to 0.38.3-beta.2 and rust to 1.90.0 (#2714) 2025-10-10 14:02:41 -07:00
Weston Pace
5a19cf15a6 feat: a utility for creating "permutation views" (#2552)
I'm working on a lancedb version of pytorch data loading (and hopefully
addressing https://github.com/lancedb/lance/issues/3727).

However, rather than rely on pytorch for everything I'm moving some of
the things that pytorch does into rust. This gives us more control over
data loading (e.g. using shards or a hash-based split) and it allows
permutations to be persistent. In particular I hope to be able to:

* Create a persistent permutation
* This permutation can handle splits, filtering, shuffling, and sharding
* Create a rust data loader that can read a permutation (one or more
splits), or a subset of a permutation (for DDP)
* Create a python data loader that delegates to the rust data loader

Eventually create integrations for other data loading libraries,
including rust & node
2025-10-09 18:07:31 -07:00
LuQQiu
86a6bb9fcb chore: supports limit push down through MetadataEraserExec (#2679)
For limit to sucessfully push down to FilteredReadExec
https://github.com/lancedb/lance/pull/4795/
2025-10-09 09:33:38 -07:00
BubbleCal
b59d1007d3 feat(index): add IVF_RQ index type (#2687)
this expose IVF_RQ (RabitQ quantization) index type to lancedb

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-10-09 15:46:18 +08:00
Lance Release
56a16b1728 Bump version: 0.22.2-beta.3 → 0.22.2 2025-10-08 18:13:08 +00:00
Lance Release
b7afed9beb Bump version: 0.22.2-beta.2 → 0.22.2-beta.3 2025-10-08 18:12:23 +00:00
Jack Ye
5ec12c9971 fix: federated database should not pass namesapce to listing database (#2702)
Fixes error that when converting a federated database operation to a
listing database operation, the namespace parameter is no longer correct
and should be dropped.

Note that with the testing infra we have today, we don't have a good way
to test these changes. I will do a quick follow up on
https://github.com/lancedb/lancedb/issues/2701 but would be great to get
this in first to resolve the related issues.
2025-10-06 14:12:41 -07:00
Lance Release
d7e02c8181 Bump version: 0.22.2-beta.1 → 0.22.2-beta.2 2025-10-06 18:10:40 +00:00
Will Jones
1ac745eb18 ci: fix Python and Node CI on main (#2700)
Example failure:
https://github.com/lancedb/lancedb/actions/runs/18237024283/job/51932651993
2025-10-06 09:40:08 -07:00
LuQQiu
0d78929893 feat: upgrade lance to 0.38.0 (#2695)
https://github.com/lancedb/lance/releases/tag/v0.38.0

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-10-03 16:47:05 -07:00
Lance Release
fec2a05629 Bump version: 0.22.2-beta.0 → 0.22.2-beta.1 2025-09-30 19:31:44 +00:00
Colin Patrick McCabe
82b25a71e9 feat: add support for test_remote_connections (#2666)
Add a new test feature which allows for running the lancedb tests
against a remote server. Convert over a few tests in src/connection.rs
as a proof of concept.

To make local development easier, the remote tests can be run locally
from a Makefile. This file can also be used to run the feature tests,
with a single invocation of 'make'. (The feature tests require bringing
up a docker compose environment.)
2025-09-26 11:24:43 -07:00
Jack Ye
13c613d45f chore: upgrade lance to v0.37.1-beta.1 (#2682) 2025-09-25 23:12:09 -07:00
Weston Pace
e07389a36c feat: allow bitmap indexes on large-string, binary, large-binary, and bitmap (#2678)
The underlying `pylance` already supported this, it was just blocked out
by an over-eager validation function

Closes #1981
2025-09-25 09:46:42 -07:00
Lance Release
e7e9e80b1d Bump version: 0.22.1 → 0.22.2-beta.0 2025-09-24 22:54:54 +00:00
Jack Ye
504bdc471c feat(rust): support namespace backed database (#2664)
This PR adds support for namespace-backed databases through
lance-namespace integration, enabling centralized table management
through namespace APIs.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-24 15:33:31 -07:00
Will Jones
d617cdef4a feat: add use_index parameter to merge insert operations (#2674)
## Summary

Exposes `use_index` Merge Insert parameter, which was created upstream
in https://github.com/lancedb/lance/pull/4688.

## API Examples

### Python
```python
# Force table scan
table.merge_insert(["id"]) \
    .when_not_matched_insert_all() \
    .use_index(False) \
    .execute(data)
```

### Node.js/TypeScript
```typescript
// Force table scan  
await table.mergeInsert("id")
    .whenNotMatchedInsertAll()
    .useIndex(false)
    .execute(data);
```

### Rust
```rust
// Force table scan
let mut builder = table.merge_insert(&["id"]);
builder.when_not_matched_insert_all()
       .use_index(false);
builder.execute(data).await?;
```

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-24 12:50:21 -07:00
Lance Release
d6cc68f671 Bump version: 0.22.1-beta.4 → 0.22.1 2025-09-23 22:07:31 +00:00
Lance Release
55eacfa685 Bump version: 0.22.1-beta.3 → 0.22.1-beta.4 2025-09-23 22:06:45 +00:00
Will Jones
1ab60fae7f feat: upgrade Lance to v0.37.0 (#2672)
Change logs:

* https://github.com/lancedb/lance/releases/tag/v0.37.0
* https://github.com/lancedb/lance/releases/tag/v0.36.0
2025-09-23 13:41:47 -07:00
Lance Release
05a4ea646a Bump version: 0.22.1-beta.2 → 0.22.1-beta.3 2025-09-22 04:49:00 +00:00
Jack Ye
ff71d7e552 feat: support shallow clone (#2653)
Support shallow cloning a dataset at a specific location to create a new
dataset, using the shallow_clone feature in Lance. Also introduce remote
`clone` API for remote tables for this functionality.
2025-09-21 21:28:40 -07:00
Lance Release
b5a39bffec Bump version: 0.22.1-beta.1 → 0.22.1-beta.2 2025-09-18 20:22:35 +00:00
Jack Ye
97e9938dfe fix: add missing validations to namespace operations (#2659) 2025-09-17 23:27:04 -07:00
Weston Pace
1d4b92e01e refactor: remove catalog implementation now that we have namespaces in database (#2662)
We had previously prototyped a `Catalog` trait anticipating a
three-tiered Catalog-Database-Table structure. Now that we have
namespaces in the `Database` we can support any tiering scheme and the
`Catalog` trait is no longer needed.
2025-09-17 08:40:20 -07:00
Jack Ye
0ebc8d45a8 chore: fix no lock build warnings and CI timeouts (#2650)
Example CI failures:
- publish build timeout:
https://github.com/lancedb/lancedb/actions/runs/17626482881/job/50084552906
- doc test build timeout:
https://github.com/lancedb/lancedb/actions/runs/17627058590/job/50086456818
2025-09-11 15:30:35 -07:00
BubbleCal
f7d78c3420 feat: add 'target_partition_size' param (#2642)
this exposes the param `target_partition_size` from lance

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2025-09-11 22:56:16 +08:00
Lance Release
6ea6884260 Bump version: 0.22.1-beta.0 → 0.22.1-beta.1 2025-09-10 20:49:43 +00:00
Jack Ye
8da74dcb37 feat: support per-request header override (#2631)
## Summary

This PR introduces a `HeaderProvider` which is called for all remote
HTTP calls to get the latest headers to inject. This is useful for
features like adding the latest auth tokens where the header provider
can auto-refresh tokens internally and each request always set the
refreshed token.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-10 13:44:00 -07:00
Lance Release
3c7419b392 Bump version: 0.22.0 → 0.22.1-beta.0 2025-09-10 14:24:58 +00:00
Wyatt Alt
e77d57a5b6 chore: update lance to 0.35.0-beta4 (#2639)
Updates lance to 0.35.0-beta4, which also incurs a datafusion update.
This brings in a fix for a memory leak in index caching, resulting from
a cyclical reference.
2025-09-10 06:19:35 -07:00
Jack Ye
9391ad1450 feat: support mTLS for remote database (#2638)
This PR adds mTLS (mutual TLS) configuration support for the LanceDB
remote HTTP client, allowing users to authenticate with client
certificates and configure custom CA certificates for server
verification.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-09 21:04:46 -07:00
LuQQiu
79960b254e fix: add partition statistics to MetadataEraser (#2637)
Some of the data fusion optimizers optimize based on data statistics
(e.g. total bytes, number of rows).
If those statistics are not supplied, optimizers cannot optimize on top.
One example is Anti Hash Join which can optimize from LeftAnti (Left:
big table, Right: small table) to RightAnti (Left: small table, Right:
big table). Left Anti requires reading the whole big & small table while
RightAnti only requires reading the whole left table and supports limit
push down to only read partial of big table
2025-09-09 09:13:22 -07:00
Lance Release
06d5612443 Bump version: 0.22.0-beta.2 → 0.22.0 2025-09-04 08:33:40 +00:00
Lance Release
45f96f4151 Bump version: 0.22.0-beta.1 → 0.22.0-beta.2 2025-09-04 08:33:09 +00:00
Jack Ye
683aaed716 chore: upgrade lance to 0.35.0 (#2625) 2025-09-04 01:31:13 -07:00
Lance Release
48f7b20daa Bump version: 0.22.0-beta.0 → 0.22.0-beta.1 2025-09-03 17:51:36 +00:00
Jack Ye
e6f1da31dc chore: upgrade lance to 0.34.0-beta.4 (#2621) 2025-09-02 21:33:55 -07:00
Lance Release
47747287b6 Bump version: 0.21.4-beta.1 → 0.22.0-beta.0 2025-08-29 21:20:57 +00:00
Wyatt Alt
981f8427e6 chore: update lance (#2610)
Adds storage_options to object_store wrap() to adhere to upstream lance
change.
2025-08-29 13:41:02 -07:00
Jack Ye
faf8973624 feat!: support multi-level namespace (#2603)
This PR adds support of multi-level namespace in a LanceDB database,
according to the Lance Namespace spec.

This allows users to create namespace inside a database connection,
perform create, drop, list, list_tables in a namespace. (other
operations like update, describe will be in a follow-up PR)

The 3 types of database connections behave like the following:
1 Local database connections will continue to have just a flat list of
tables for backwards compatibility.
2. Remote database connections will make REST API calls according to the
APIs in the Lance Namespace spec.
3. Lance Namespace connections will invoke the corresponding operations
against the specific namespace implementation which could have different
behaviors regarding these APIs.

All the table APIs now take identifier instead of name, for example
`/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is
directly in the root namespace, the API call is identical. If the table
is in a namespace, then the full table ID should be used, with `$` as
the default delimiter (`.` is a special character and creates issues
with URL parsing so `$` is used), for example
`/v1/table/ns1$table1/create`. If a different parameter needs to be
passed in, user can configure the `id_delimiter` in client config and
that becomes a query parameter, for example
`/v1/table/ns1__table1/create?delimiter=__`

The Python and Typescript APIs are kept backwards compatible, but the
following Rust APIs are not:
1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>`
is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace:
&[String]) -> Result<()>`
2. `Connection::drop_all_tables(&self) -> Result<()>` is now
`Connection::drop_all_tables(&self, name: impl AsRef<str>) ->
Result<()>`
2025-08-27 12:07:55 -07:00
Lance Release
6839ac3509 Bump version: 0.21.4-beta.0 → 0.21.4-beta.1 2025-08-22 03:55:22 +00:00
Lance Release
d4a41b5663 Bump version: 0.21.3 → 0.21.4-beta.0 2025-08-19 22:56:52 +00:00
Vitali Lovich
d602e9f98c fix: make cloud features optional (#2567) (#2568)
This shrinks the size of a local embedded build that can disable all the
default features. When combined with
https://github.com/lancedb/lance/pull/4362 and the dependencies are
updated to point to the fix, this resolves #2567 fully.

Verified by patching the workspace to redirect to my clone of lance with
the PR applied.
```
cargo tree -p lancedb -e no-build -e no-dev --no-default-features -i aws-config | less
```

The reason that lance itself needs to change too is that many
dependencies within that project depend on lance-io/default and lancedb
depends on them which transitively ends up enabling the cloud
regardless. The PR in lance removes the dependency on lance-io/default
from all sibling crates.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-08-15 16:46:52 -07:00
Will Jones
ad09234d59 feat: allow setting train=False and name on indices (#2586)
Enables two new parameters when building indices:

* `name`: Allows explicitly setting a name on the index. Default is
`{col_name}_idx`.
* `train` (default `True`): When set to `False`, an empty index will be
immediately created.

The upgrade of Lance means there are also additional behaviors from
cd76a993b8:

* When a scalar index is created on a Table, it will be kept around even
if all rows are deleted or updated.
* Scalar indices can be created on empty tables. They will default to
`train=False` if the table is empty.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-08-15 14:00:26 -07:00
Lance Release
0c34ffb252 Bump version: 0.21.3-beta.0 → 0.21.3 2025-08-15 18:03:26 +00:00
Lance Release
d9f333d828 Bump version: 0.21.2 → 0.21.3-beta.0 2025-08-15 18:02:43 +00:00
Yuval Lifshitz
ce550e6c45 feat: add missing rust examples (#2583)
all 3 example are running now with:
```
cargo run --example simple
cargo run --example full_text_search
cargo run --example ivf_pq
```

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
2025-08-15 10:38:58 -07:00
Will Jones
dcf53c4506 fix: limit and offset support paginating through FTS and vector search results (#2592)
Adds tests to ensure that users can paginate through simple scan, FTS,
and vector search results using `limit` and `offset`.

Tests upstream work: https://github.com/lancedb/lance/pull/4318

Closes #2459
2025-08-15 08:55:12 -07:00
Weston Pace
ed640a76d9 feat: add take_offsets and take_row_ids (#2584)
These operations have existed in lance for a long while and many users
need to drop down to lance for this capability. This PR adds the API and
implements it using filters (e.g. `_rowid IN (...)`) so that in doesn't
currently add any load to `BaseTable`. I'm not sure that is sustainable
as base table implementations may want to specialize how they handle
this method. However, I figure it is a good starting point.

In addition, unlike Lance, this API does not currently guarantee
anything about the order of the take results. This is necessary for the
fallback filter approach to work (SQL filters cannot guarantee result
order)
2025-08-15 06:48:24 -07:00