mirror of https://github.com/lancedb/lancedb.git synced 2026-05-15 11:00:41 +00:00

yaommen a0a2942ad5 fix: respect max_batch_length for Rust vector and hybrid queries (#3172 )

Fixes #1540

I could not reproduce this on current `main` from Python, but I could
still reproduce it from the Rust SDK.

Python no longer reproduces because the current Python vector/hybrid
query paths re-chunk results into a `pyarrow.Table` before returning
batches. Rust still reproduced because `max_batch_length` was passed
into planning/scanning, but vector search could still emit larger
`RecordBatch`es later in execution (for example after KNN / TopK), so it
was not enforced on the final Rust output stream.

This PR enforces `max_batch_length` on the final Rust query output
stream and adds Rust regression coverage.

Before the fix, the Rust repro produced:
`num_batches=2, max_batch=8192, min_batch=1808, all_le_100=false`

After the fix, the same repro produces batches `<= 100`.
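As a rough illustration of what enforcing `max_batch_length` on the final output stream means, the sketch below re-chunks a sequence of batch sizes so that none exceeds the cap. It is not the code from this PR: each batch is modeled only by its row count, and the helper names `rechunk_ranges` and `rechunk_stream` are hypothetical. In a real Arrow pipeline, each `(offset, length)` pair would correspond to a zero-copy `RecordBatch::slice(offset, length)` call.

```rust
/// Split one batch of `num_rows` rows into `(offset, length)` slices,
/// each at most `max_batch_length` rows long. (Hypothetical helper.)
fn rechunk_ranges(num_rows: usize, max_batch_length: usize) -> Vec<(usize, usize)> {
    assert!(max_batch_length > 0, "max_batch_length must be positive");
    let mut ranges = Vec::new();
    let mut offset = 0;
    while offset < num_rows {
        // The last slice may be shorter than the cap.
        let len = max_batch_length.min(num_rows - offset);
        ranges.push((offset, len));
        offset += len;
    }
    ranges
}

/// Apply the cap to a whole stream of batch sizes and return the
/// resulting batch sizes in order. (Hypothetical helper.)
fn rechunk_stream(batch_sizes: &[usize], max_batch_length: usize) -> Vec<usize> {
    batch_sizes
        .iter()
        .flat_map(|&n| rechunk_ranges(n, max_batch_length))
        .map(|(_, len)| len)
        .collect()
}

fn main() {
    // The batch sizes reported before the fix, capped at 100 rows.
    let out = rechunk_stream(&[8192, 1808], 100);
    let total: usize = out.iter().sum();
    println!(
        "num_batches={}, max_batch={}, total_rows={}",
        out.len(),
        out.iter().max().unwrap(),
        total
    );
    // prints "num_batches=101, max_batch=100, total_rows=10000"
}
```

Each oversized batch is split independently, so row order is preserved and no rows are merged across input batches. The trailing slice of each input batch (92 and 8 rows here) may be smaller than the cap.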

## Runnable Rust repro

Before this fix, current `main` could still return batches like `[8192,
1808]` here even with `max_batch_length = 100`:

```rust
use std::sync::Arc;

use arrow_array::{
    types::Float32Type, FixedSizeListArray, RecordBatch, RecordBatchReader, StringArray,
};
use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lancedb::query::{ExecutableQuery, QueryBase, QueryExecutionOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tmp = tempfile::tempdir()?;
    let uri = tmp.path().to_str().unwrap();

    let rows = 10_000;
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Utf8, false),
        Field::new(
            "vector",
            DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4),
            false,
        ),
    ]));

    let ids = StringArray::from_iter_values((0..rows).map(|i| format!("row-{i}")));
    let vectors = FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
        (0..rows).map(|i| Some(vec![Some(i as f32), Some(1.0), Some(2.0), Some(3.0)])),
        4,
    );
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(ids), Arc::new(vectors)])?;
    let reader: Box<dyn RecordBatchReader + Send> = Box::new(
        arrow_array::RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema),
    );

    let db = lancedb::connect(uri).execute().await?;
    let table = db.create_table("test", reader).execute().await?;

    let mut opts = QueryExecutionOptions::default();
    opts.max_batch_length = 100;

    let mut stream = table
        .query()
        .nearest_to(vec![0.0, 1.0, 2.0, 3.0])?
        .limit(rows)
        .execute_with_options(opts)
        .await?;

    let mut sizes = Vec::new();
    while let Some(batch) = stream.try_next().await? {
        sizes.push(batch.num_rows());
    }

    println!("{sizes:?}");
    Ok(())
}
```

Signed-off-by: yaommen <myanstu@163.com>

2026-03-30 15:43:58 -07:00

.cargo

chore: clippy::string_to_string has been replaced by implicit_clone (#2817 )

2025-11-26 16:30:35 +08:00

.github

ci: mitigate template injection attack in build_linux_wheel (#3195 )

2026-03-30 09:29:24 -07:00

ci: modify check_lance_release.py to prefer stable releases over betas (#3146 )

2026-03-17 09:21:30 -07:00

dockerfiles

chore: dependency updates and security fixes (#3116 )

2026-03-09 20:04:27 -07:00

docs

feat(node): support Float16, Float64, and Uint8 vector queries (#3193 )

2026-03-30 11:15:35 -07:00

java

Bump version: 0.27.2-beta.0 → 0.27.2-beta.1

2026-03-25 16:22:09 +00:00

nodejs

feat(node): support Float16, Float64, and Uint8 vector queries (#3193 )

2026-03-30 11:15:35 -07:00

python

fix(python): skip test_url_retrieve_downloads_image when PIL not installed (#3208 )

2026-03-30 14:48:49 -07:00

rust

fix: respect max_batch_length for Rust vector and hybrid queries (#3172 )

2026-03-30 15:43:58 -07:00

.bumpversion.toml

Bump version: 0.27.2-beta.0 → 0.27.2-beta.1

2026-03-25 16:22:09 +00:00

.gitignore

feat: bump lance version to 0.40-0-beta.2 (#2772 )

2025-11-10 14:36:37 -08:00

.pre-commit-config.yaml

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

about.hbs

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

about.toml

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

AGENTS.md

ci: add agents and add reviewing instructions (#2754 )

2025-10-29 17:28:26 -07:00

Cargo.lock

feat(node): support Float16, Float64, and Uint8 vector queries (#3193 )

2026-03-30 11:15:35 -07:00

Cargo.toml

feat: update lance dependency to v4.0.0-rc.3 (#3187 )

2026-03-25 09:20:29 -07:00

CLAUDE.md

ci: add agents and add reviewing instructions (#2754 )

2025-10-29 17:28:26 -07:00

CONTRIBUTING.md

docs: contributing guide (#1970 )

2025-01-07 15:11:16 -08:00

docker-compose.yml

fix(ci): upgrade LocalStack to 4.0 for S3 integration tests (#3147 )

2026-03-16 09:02:11 -07:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

Makefile

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

pyright_report.csv

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

README.md

docs: update REST API link in README.md (#2906 )

2026-01-30 15:49:41 -08:00

release_process.md

ci: enable java auto release (#1602 )

2024-09-19 10:51:03 -07:00

RUST_THIRD_PARTY_LICENSES.html

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

rust-toolchain.toml

chore: update lance dependency to v3.0.0-beta.5 (#3058 )

2026-02-23 00:39:30 -08:00

README.md

The Multimodal AI Lakehouse

How to Install ✦ Detailed Documentation ✦ Tutorials and Recipes ✦ Contributors

The ultimate multimodal data platform for AI/ML applications.

LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.

Demo: Multimodal Search by Keyword, Vector or with SQL

Star LanceDB to get updates!

⭐ Click here ⭐ to see how fast we're growing!

Key Features:

Fast Vector Search: Search billions of vectors in milliseconds with state-of-the-art indexing.
Comprehensive Search: Support for vector similarity search, full-text search and SQL.
Multimodal Support: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
Advanced Features: Zero-copy access, automatic versioning that manages versions of your data without extra infrastructure, and GPU support for building vector indexes.

Products:

Open Source & Local: 100% open source, runs locally or in your cloud. No vendor lock-in.
Cloud and Enterprise: Production-scale vector search with no servers to manage. Complete data sovereignty and security.

Ecosystem:

Columnar Storage: Built on the Lance columnar format for efficient storage and analytics.
Seamless Integration: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and JavaScript/TypeScript support.
Rich Ecosystem: Integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB, and more on the way.

How to Install:

Follow the Quickstart doc to set up LanceDB locally.

API & SDK: We also provide Python, TypeScript, and Rust SDKs, along with a REST API.

Interface	Documentation
Python SDK	https://lancedb.github.io/lancedb/python/python/
TypeScript SDK	https://lancedb.github.io/lancedb/js/globals/
Rust SDK	https://docs.rs/lancedb/latest/lancedb/index.html
REST API	https://docs.lancedb.com/api-reference/rest

Join Us and Contribute

We welcome contributions from everyone, whether you're a developer, a researcher, or just someone who wants to help out.

If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our Discord server.

Check out the GitHub Issues if you would like to work on features that are planned for the future.

Contributors

Stay in Touch With Us

Languages

HTML 39.5%

Rust 29%

Python 23%

TypeScript 8%

Shell 0.3%

Other 0.1%
