The page_token and limit parameters for table_names() are supported by
both local storage and LanceDB Cloud, not just Cloud as the docstring
incorrectly stated.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Adds `Table.to_lance()` and `Table.to_polars()` methods (non-abstract
methods, defaulting to `NotImplementedError`) so type checkers like
mypy, pyright and ty don’t flag them as unknown attributes on `Table`.
Not making these abstract methods should keep existing remote/other
`Table` implementations instantiable.
This is non-breaking change to existing functionality and is purely for
the purpose of pleasing static type-checkers like mypy, ty and pyright.
<img width="626" height="134" alt="image"
src="https://github.com/user-attachments/assets/f4619bca-a882-432b-bd23-ae8f189ff9e3"
/>
## Summary
- infer integer vector columns as float32 when any value exceeds uint8
range or is negative
- keep uint8 for integer vectors within range and nulls only
- add sync/async tests covering large integer vector inference
## Testing
- ./.venv/bin/pytest python/python/tests/test_table.py -k
"large_int_vectors"
Adds IVF_SQ index config through Rust core and Python bindings, plus
alias names IvfHnswSq/Pq for backward compatibility. Updates
remote/table helpers and types to accept the new index type. Includes
tests covering IVF SQ creation and alias usage.
1. Use generated models in lance-namespace for request response models
to avoid multiple layers of conversions
2. Make sure the API is consistent with the namespace spec
3. Deprecate the table_names API in favor of the list_tables API in
namespace that allows full pagination support without the need to have
sorted table names
4. Add describe_namespace API which was a miss in the original
implementation
Add support for enabling stable row IDs when creating tables via the
`new_table_enable_stable_row_ids` storage option.
Stable row IDs ensure that row identifiers remain constant after
compaction, update, delete, and merge operations. This is useful for
materialized views and other use cases that need to track source rows
across these operations.
The option can be set at two levels:
- Connection level: applies to all tables created with that connection
- Table level: per-table override via create_table storage_options
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- bump all Lance crates to v1.0.0-beta.16 via ci/set_lance_version.py
- refresh Cargo.lock (reqwest/opendal/etc.) to satisfy the new release
## Verification
- cargo clippy --workspace --tests --all-features -- -D warnings
- cargo fmt --all
Triggered by
[refs/tags/v1.0.0-beta.16](https://github.com/lance-format/lance/releases/tag/v1.0.0-beta.16)
---------
Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
This PR improves the docstring for `IVF_RQ` (RabitQ) in Python. The
earlier version referred to it as "residual quantization", which is
confusing to future readers of the code.
In contrast, the TypeScript and Rust codebases defined `IVF_RQ` as
RabitQ. So now the three languages use comments that are consistent with
one another.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Did a full scan of all URLs that used to point to the old mkdocs pages,
and now links to the appropriate pages on lancedb.com/docs or lance.org
docs.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The `self._query` value was not set when wrapping its copy `query` with
quotation marks.
The test for phrase queries has been updated to test the
`.phrase_query()` method as well, which will catch this bug.
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
This request improves support for `pydantic` integration by adding
`to_pydantic` method to asynchronous queries and handling models that
use `alias` in field definitions. Fixes#2436 and closes#2437 .
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added support for converting asynchronous query results to Pydantic
models.
- **Bug Fixes**
- Simplified conversion of query results to Pydantic models for improved
reliability.
- Improved handling of field aliases and computed fields when mapping
query results to Pydantic models.
- **Tests**
- Added tests to verify correct mapping of aliased and computed fields
in both synchronous and asynchronous scenarios.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This pipes the num_attempts field from lance's merge insert result
through lancedb. This allows callers of merge_insert to get a better
idea of whether transaction conflicts are occurring.
* Add `ci` profile for smaller build caches. This had a meaningful
impact in Lance, and I expect a similar impact here.
https://github.com/lancedb/lance/pull/5236
* Get caching working in Rust. Previously was not working due to
`workspaces: rust`.
* Get caching working in NodeJs lint job. Previously wasn't working
because we installed the toolchain **after** we called `- uses:
Swatinem/rust-cache@v2`, which invalidates the cache locally.
* Fix broken pytest from async io transition
(`pytest.PytestRemovedIn9Warning`)
* Altered `get_num_sub_vectors` to handle bug in case of 4-bit PQ. This
was cause of `rust future panicked: unknown error`. Raised an issue
upstream to change panic to error:
https://github.com/lancedb/lance/issues/5257
* Call `npm run docs` to fix doc issue.
* Disable flakey Windows test for consistency. It's just an OS-specific
timer issue, not our fault.
* Fix Windows absolute path handling in namespaces. Was causing CI
failure `OSError: [WinError 123] The filename, directory name, or volume
label syntax is incorrect: `
Fixes pydantic validation errors when creating materialized views with
namespace.
```
> return JsonArrowSchema(fields=fields, metadata=schema.metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E pydantic_core._pydantic_core.ValidationError: 4 validation errors for JsonArrowSchema
E metadata.b'geneva::view::query'
E Input should be a valid string [type=string_type, input_value=b'{"base":{"vector_column...t-image:latest\\"}"}}]}', input_type=bytes]
E For further information visit https://errors.pydantic.dev/2.12/v/string_type
```
Based on https://github.com/lancedb/lance/pull/4984
1. Bump to 1.0.0-beta.2
2. Use DirectoryNamespace in lance to perform all testing in python and
rust for much better coverage
3. Refactor `ListingDatabase` to be able to accept location and
namespace. This is because we have to leverage listing database (local
lancedb connection) for using namespace, namespace only resolves the
location and storage options but we don't want to bind all the way to
rust since user will plug-in namespace from python side. And thus
`ListingDatabase` needs to be able to accept location and namespace that
are created from namespace connection.
4. For credentials vending, we also pass storage options provider all
the way to rust layer, and the rust layer calls back to the python
function to fetch next storage option. This is exactly the same thing we
did in pylance.
This PR does the following:
- Pare down the docs to only what's needed (Python, JS/TS API docs and a
pointer to Rust docs)
- Styling changes to be more in line with the main website theme
The relative URLs remain unchanged, so assuming CI passes, there should
be no breaking changes from the main docs site that points back here.
Support FTS feature parity in SQL to match current Python API
capability.
Add `.to_json()` method to FTS query classes to enable usage with SQL
`fts()` UDTF.
Related: https://github.com/lancedb/blog-lancedb/pull/147
query = MatchQuery("puppy", "text", fuzziness=2)
result = client.execute(f"SELECT * FROM fts('table',
'{query.to_json()}')")
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- Updated all Lance dependencies from v0.38.3-beta.9 to v0.38.3-beta.11
- Migrated `lance-namespace-impls` to use new granular cloud provider
features (`dir-aws`, `dir-gcp`, `dir-azure`, `dir-oss`) instead of
deprecated `dir` feature
- Updated namespace connection API to use `ConnectBuilder` instead of
deprecated `connect()` function
## API Changes
The Lance team refactored the `lance-namespace-impls` package in
v0.38.3-beta.11:
1. **Feature flags**: The single `dir` feature was split into cloud
provider-specific features:
- `dir-aws` for AWS S3 support
- `dir-gcp` for Google Cloud Storage support
- `dir-azure` for Azure Blob Storage support
- `dir-oss` for Alibaba Cloud OSS support
2. **Connection API**: The `connect()` function was replaced with a
`ConnectBuilder` pattern for more flexibility
## Testing
- ✅ Ran `cargo clippy --workspace --tests --all-features -- -D warnings`
- no warnings
- ✅ Ran `cargo fmt --all` - code formatted
- ✅ All changes verified and committed
## Related
This update was triggered by the Lance release:
https://github.com/lancedb/lance/releases/tag/v0.38.3-beta.11🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
I'm working on a lancedb version of pytorch data loading (and hopefully
addressing https://github.com/lancedb/lance/issues/3727).
However, rather than rely on pytorch for everything I'm moving some of
the things that pytorch does into rust. This gives us more control over
data loading (e.g. using shards or a hash-based split) and it allows
permutations to be persistent. In particular I hope to be able to:
* Create a persistent permutation
* This permutation can handle splits, filtering, shuffling, and sharding
* Create a rust data loader that can read a permutation (one or more
splits), or a subset of a permutation (for DDP)
* Create a python data loader that delegates to the rust data loader
Eventually create integrations for other data loading libraries,
including rust & node
## Description of changes
Fixes#2698
This PR uses
[`typing.override`](https://docs.python.org/3/library/typing.html#typing.override)
in favor of the [`overrides`](https://pypi.org/project/overrides/)
dependency when possible. As of Python 3.12, the standard library offers
`typing.override` to perform a static check on overridden methods.
### Motivation
Currently, `overrides` is incompatible with Python 3.14. As a result,
any package that attempts to import `overrides` using Python 3.14+ will
raise an `AttributeError`. An
[issue](https://github.com/mkorpela/overrides/issues/127) has been
raised and a [pull
request](https://github.com/mkorpela/overrides/pull/133) has been
submitted to the GitHub repo for the `overrides` project. But the
maintainer has been unresponsive.
To ensure readiness for Python 3.14, this package (and any other package
directly depending on `overrides`) should consider using
`typing.override` instead.
### Impact
The standard library added `typing.override` as of 3.12. As a result,
this change will affect only users of Python 3.12+. Previous versions
will continue to rely on `overrides`. Notably, the standard library
implementation is slightly different than that of `overrides`. A
thorough discussion of those differences is shown in [PEP
698](https://peps.python.org/pep-0698/), and it is also summarized
nicely by the maintainer of `overrides`
[here](https://github.com/mkorpela/overrides/issues/126#issuecomment-2401327116).
There are 2 main ways that switching from `overrides` to
`typing.override` will have an impact on developers of this repo.
1. `typing.override` does not implement any runtime checking. Instead,
it provides information to type checkers.
2. The stdlib does not provide a mixin class to enforce override
decorators on child classes. (Their reasoning for this is explained in
[the PEP](https://peps.python.org/pep-0698/).) This PR disables that
behavior entirely by replacing the `EnforceOverrides`.
The basic idea of MRR is this -
https://www.evidentlyai.com/ranking-metrics/mean-reciprocal-rank-mrr
I've implemented a weighted version for allowing user to set weightage
between vector and fts.
The gist is something like this
### Scenario A: Document at rank 1 in one set, absent from another
```
# Assuming equal weights: weight_vector = 0.5, weight_fts = 0.5
vector_rr = 1.0 # rank 1 → 1/1 = 1.0
fts_rr = 0.0 # absent → 0.0
weighted_mrr = 0.5 × 1.0 + 0.5 × 0.0 = 0.5
```
### Scenario B: Document at rank 1 in one set, rank 2 in another
```
# Same weights: weight_vector = 0.5, weight_fts = 0.5
vector_rr = 1.0 # rank 1 → 1/1 = 1.0
fts_rr = 0.5 # rank 2 → 1/2 = 0.5
weighted_mrr = 0.5 × 1.0 + 0.5 × 0.5 = 0.5 + 0.25 = 0.75
```
And so with `return_score="all"` the result looks something like this
(this is from the reranker tests).
Because this is a weighted rank based reranker, some results might have
the same score
```
text vector _distance _rowid _score _relevance_score
0 I am your father [-0.010703234, 0.069315575, 0.030076642, 0.002... 8.149148e-13 8589934598 10.978719 1.000000
1 the ground beneath my feet [-0.09500901, 0.00092102867, 0.0755851, 0.0372... 1.376896e+00 8589934604 NaN 0.250000
2 I find your lack of faith disturbing [0.07525753, -0.0100010475, 0.09990541, 0.0209... NaN 8589934595 3.483394 0.250000
3 but I don't wanna die [0.033476487, -0.011235877, -0.057625435, -0.0... 1.538222e+00 8589934610 1.130355 0.238095
4 if you strike me down I shall become more powe... [0.00432201, 0.030120496, 5.3317923e-05, 0.033... 1.381086e+00 8589934594 0.715157 0.216667
5 I see a salty message written in the eves [-0.04213107, 0.0016004723, 0.061052393, -0.02... 1.638301e+00 8589934603 1.043785 0.133333
6 but his son was mortal [0.012462767, 0.049041674, -0.057339743, -0.04... 1.421566e+00 8589934620 NaN 0.125000
7 I've got a bad feeling about this [-0.06973199, -0.029960092, 0.02641632, -0.031... NaN 8589934596 1.043785 0.125000
8 now that's a name I haven't heard in a long time [-0.014374257, -0.013588792, -0.07487557, 0.03... 1.597573e+00 8589934593 0.848772 0.118056
9 he was a god [-0.0258895, 0.11925236, -0.029397793, 0.05888... 1.423147e+00 8589934618 NaN 0.100000
10 I wish they would make another one [-0.14737535, -0.015304729, 0.04318139, -0.061... NaN 8589934622 1.043785 0.100000
11 Kratos had a son [-0.057455737, 0.13734367, -0.03537109, -0.000... 1.488075e+00 8589934617 NaN 0.083333
12 I don't wanna live like this [-0.0028891307, 0.015214227, 0.025183653, 0.08... NaN 8589934609 1.043785 0.071429
13 I see a mansard roof through the trees [0.052383978, 0.087759204, 0.014739997, 0.0239... NaN 8589934602 1.043785 0.062500
14 great kid don't get cocky [-0.047043696, 0.054648954, -0.008509666, -0.0... 1.618125e+00 8589934592 NaN 0.055556
```
Support shallow cloning a dataset at a specific location to create a new
dataset, using the shallow_clone feature in Lance. Also introduce remote
`clone` API for remote tables for this functionality.
## Summary
This PR introduces a `HeaderProvider` which is called for all remote
HTTP calls to get the latest headers to inject. This is useful for
features like adding the latest auth tokens where the header provider
can auto-refresh tokens internally and each request always set the
refreshed token.
---------
Co-authored-by: Claude <noreply@anthropic.com>
This PR adds mTLS (mutual TLS) configuration support for the LanceDB
remote HTTP client, allowing users to authenticate with client
certificates and configure custom CA certificates for server
verification.
---------
Co-authored-by: Claude <noreply@anthropic.com>
This changes the default values for some namespace parameters in the
remote python SDK from None to [], to match the underlying code it
calls.
Prior to this commit, failing to supply "namespace" with the remote SDK
would cause an error because the underlying code it dispatches to does
not consider None to be valid input.
## Summary
This PR adds the missing `name` parameter to `create_scalar_index` and
`create_fts_index` methods in the Python SDK, which was inadvertently
omitted when it was added to `create_index` in PR #2586.
## Changes
- Add `name: Optional[str] = None` parameter to abstract
`Table.create_scalar_index` and `Table.create_fts_index` methods
- Update `LanceTable` implementation to accept and pass the `name`
parameter to the underlying Rust layer
- Update `RemoteTable` implementation to accept and pass the `name`
parameter
- Enhanced tests to verify custom index names work correctly for both
scalar and FTS indices
- When `name` is not provided, default names are generated (e.g.,
`{column}_idx`)
## Test plan
- [x] Added test cases for custom names in scalar index creation
- [x] Added test cases for custom names in FTS index creation
- [x] Verified existing tests continue to pass
- [x] Code formatting and linting checks pass
This ensures API consistency across all index creation methods in the
LanceDB Python SDK.
Fixes#2616🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
This PR adds support of multi-level namespace in a LanceDB database,
according to the Lance Namespace spec.
This allows users to create namespace inside a database connection,
perform create, drop, list, list_tables in a namespace. (other
operations like update, describe will be in a follow-up PR)
The 3 types of database connections behave like the following:
1 Local database connections will continue to have just a flat list of
tables for backwards compatibility.
2. Remote database connections will make REST API calls according to the
APIs in the Lance Namespace spec.
3. Lance Namespace connections will invoke the corresponding operations
against the specific namespace implementation which could have different
behaviors regarding these APIs.
All the table APIs now take identifier instead of name, for example
`/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is
directly in the root namespace, the API call is identical. If the table
is in a namespace, then the full table ID should be used, with `$` as
the default delimiter (`.` is a special character and creates issues
with URL parsing so `$` is used), for example
`/v1/table/ns1$table1/create`. If a different parameter needs to be
passed in, user can configure the `id_delimiter` in client config and
that becomes a query parameter, for example
`/v1/table/ns1__table1/create?delimiter=__`
The Python and Typescript APIs are kept backwards compatible, but the
following Rust APIs are not:
1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>`
is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace:
&[String]) -> Result<()>`
2. `Connection::drop_all_tables(&self) -> Result<()>` is now
`Connection::drop_all_tables(&self, name: impl AsRef<str>) ->
Result<()>`
This PR integrates `lancedb` with `lance-namespace` so that users can
use LanceDB client to access Lance tables in any catalog services. In
general, we expect most of the logic to be delegated to the existing
`LanceDBConnection` and `LanceTable`, but the namespace implemenation
will control how table is created, dropped, and describe where the table
is stored with any related storage options like access credentials.
The implementation currently only supports a 1 level namespace that
directly contains tables. We will introduce nested namespace support in
a separated PR.
Users are expected to use it in the following way:
```python
>>> import lancedb
>>> import pyarrow as pa
>>> # Connect using GlueNamespace
>>> db = lancedb.connect_namespace("glue", {"catalog_id": "123456789012"})
>>> # Create a table with schema
>>> schema = pa.schema([
... pa.field("id", pa.int64()),
... pa.field("vector", pa.list_(pa.float32(), 2))
... ])
>>> table = db.create_table("my_table", schema=schema)
>>> # List tables
>>> db.table_names()
['my_table']
```
Enables two new parameters when building indices:
* `name`: Allows explicitly setting a name on the index. Default is
`{col_name}_idx`.
* `train` (default `True`): When set to `False`, an empty index will be
immediately created.
The upgrade of Lance means there are also additional behaviors from
cd76a993b8:
* When a scalar index is created on a Table, it will be kept around even
if all rows are deleted or updated.
* Scalar indices can be created on empty tables. They will default to
`train=False` if the table is empty.
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>