## Summary
- PR #2957 changed the permutation builder to only select `_rowid` from
the base table, but `Splitter::project()` for hash and calculated splits
replaced the selection entirely, dropping `_rowid`.
- Include `_rowid` in the column selections for hash and calculated
split projections.
- Fix a Python test that queried the permutation table for base table
columns no longer materialized.
Fixes the `test_split_hash`, `test_split_hash_with_discard`,
`test_split_calculated`, `test_shuffle_combined_with_splits`, and
`test_filter_with_splits` failures in `test_permutation.py`.
## Test plan
- [x] `cargo test -p lancedb -- permutation` (22 passed)
- [x] `pytest python/tests/test_permutation.py` (46 passed)
- [x] `npm test __test__/permutation.test.ts` (20 passed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
References #2949 Moved DeleteResult and delete() implementation to
src/table/delete.rs. No functional changes. Added a test delete which
works. Will work on refactoring update next.
Fixes#2898
Problem:
Sync API cancellations didn’t stop remote query coroutines, so requests
could continue after interrupt.
Changes:
- Cancel run_coroutine_threadsafe futures on any BaseException in the
sync background loop
- Update cancellation test to avoid starting a real background thread
and cover GeneratorExit
Fixes the Rust SDK's `create_empty_table` to properly support embedding
column definitions, bringing it to parity with the Python SDK.
## Problem
The Rust SDK's `Connection::create_empty_table` did not support setting
embedding columns. When using `.add_embedding()` on the builder, the
embedding column definitions were lost because
`TableDefinition::new_from_schema(schema)` marks all columns as physical
only, without embedding metadata.
The Python SDK worked around this by creating an empty record batch with
proper schema metadata rather than using `create_empty_table` directly.
## Solution
Modified `CreateTableBuilder<false>` to handle embeddings
Closes#2759
Importing `PIL` alone does not guarantee that the `Image` submodule is
loaded. In a clean environment where no other code has imported
`PIL.Image` before, `PIL.Image` does not exist on the `PIL` package,
which leads to the AttributeError.
The permutation table was always intended to be a small table of row id
pointers (and split id). However, it was accidentally doing a full
materialization of the base table 🤦
This PR changes the permutation builder to only store row id and split
id.
Realized our MSRV check was inert because `rust-toolchain.toml` was
overriding the Rust version. We set the `RUSTUP_TOOLCHAIN` environment
variable, which overrides that.
Also needed to update to MSRV 1.88 (due to dependencies like Lance and
DataFusion) and fix some clippy warnings.
Unlike in Amazon S3, in Azure bucket names are not globally unique.
Instead, the combination of (storage_account_name, bucket_name) is
unique.
Therefore, when using Azure blob store, we always need a way to
configure the storage account name. One way is to use the
storage_options hash map and set azure_storage_account_name. Another way
is to set an environment variable, AZURE_STORAGE_ACCOUNT_NAME.
Prior to this PR, the second way (environment variable) did not work
with remote connections. This is because the existing code that checks
for these environment variables happens inside the Azure object store
implementation itself, which does not run locally when using remote
connections.
This PR addresses that situation by adding a check of the environment
variable. This functions as a default if the relevant storage option is
not set in the storage_options hash map.
Pandas 3.0+ string now converts to Arrow large_utf8. This PR mainly
makes sure our test accounts for the difference across the pandas
versions when constructing schema.
BREAKING CHANGE: removes `aws`, `dynamodb`, `azure`, `gcs`, `oss`,
`huggingface` from default Rust features. They can be enabled by users
as needed.
They are still enabled for Python and NodeJS, since those users don't
control the compilation of artifacts.
Closes#2911
RemoteDBConnection should support passing exist_ok to create_table, just
like LanceDBConnection (the non-remote form) does. It can support this
by passing 'exist_ok' as the mode parameter.
Implement parallel execution of multiple embedding functions using
std:🧵:scope to improve performance when a table has multiple
embedding columns.
Key changes:
- Add compute_embeddings_parallel() helper method to WithEmbeddings
- Use fast path for single embeddings (no threading overhead)
- Use scoped threads for parallel execution of multiple embeddings
- Add comprehensive tests including parallelization timing verification
- Update WithEmbeddings documentation
Performance improvements:
- I/O-bound embeddings (OpenAI, Bedrock): High benefit from concurrent
API calls
- CPU-bound embeddings (sentence-transformers): Medium benefit from core
utilization
- Single embedding: No overhead (fast path)
Closes TODO on line 266 in rust/lancedb/src/embeddings.rs
Aesthetic and styling fixes to the SDK reference docs:
- [x] Improve readability of LanceDB in the header
- [x] Make header more compact, and consistent in gradient color with
the main website/docs
- [x] Updated favicon to match with the docs page
- [x] Enable permalink display to allow users to get anchor links to
each function/method
- [x] Point readers to the main docs at
[docs.lancedb.com](https://docs.lancedb.com)
Convert test_table_names to test both remote and local connections.
This PR also includes some miscellaneous improvements in
src/test_utils/connection.rs. It starts a thread to drain stdout from
the server process. It adds the
PRINT_LANCEDB_TEST_CONNECTION_SCRIPT_OUTPUT environment variable, which
optionally displays server stdout.
Fix a bash conditional in run_with_test_connection.sh.