mirror of https://github.com/lancedb/lancedb.git synced 2026-05-30 18:30:40 +00:00

devteamaegis 7dba793629 fix(rerankers): inverted scores and incorrect missing-FTS penalty in LinearCombinationReranker (#3437)

## Problem

`LinearCombinationReranker.merge_results` has two related bugs that make
it return **inverted relevance rankings** — the least relevant document
ranks first (closes #3154).

### Bug 1 — `_combine_score` subtracts from 1, inverting the final ranking

```python
def _combine_score(self, vector_score, fts_score):
    return 1 - (self.weight * vector_score + (1 - self.weight) * fts_score)
```

Both `vector_score` (already converted via `_invert_score`) and
`fts_score` (BM25 relevance) are in **higher-is-better** space. Wrapping
the weighted average in `1 - (...)` flips the direction: a perfectly
matching document (`vector_score=1, fts_score=1`) gets `_relevance_score
= 0.0`, while a non-matching document gets a high score.
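
The inversion can be reproduced outside the library. The sketch below is a minimal illustration, not LanceDB's code: `combine_old` and `combine_new` are hypothetical free-function versions of the formula with and without the `1 -` wrapper, and the default `weight=0.7` is borrowed from the verification example in this description.

```python
def combine_old(vector_score, fts_score, weight=0.7):
    """Formula with the `1 -` wrapper: flips higher-is-better into lower-is-better."""
    return 1 - (weight * vector_score + (1 - weight) * fts_score)


def combine_new(vector_score, fts_score, weight=0.7):
    """Plain weighted average: keeps the higher-is-better direction."""
    return weight * vector_score + (1 - weight) * fts_score


perfect = (1.0, 1.0)   # best possible vector and FTS scores
no_match = (0.0, 0.0)  # worst possible vector and FTS scores

# With the wrapper, the perfect match scores below the non-match.
assert combine_old(*perfect) < combine_old(*no_match)
# Without it, the perfect match scores above the non-match.
assert combine_new(*perfect) > combine_new(*no_match)
```

Sorting by `_relevance_score` in descending order therefore puts the perfect match last with the wrapped formula and first with the plain weighted average.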

### Bug 2 — Documents missing an FTS score are rewarded, not penalised

```python
fts_score = result.get("_score", fill)  # fill=1.0 by default
```

When a document has no FTS match, `fts_score = fill = 1.0`. In
`_combine_score` (with the bug-1 formula), this large value becomes a
**negative penalty** via `1 - (... + 0.3 * 1.0)`, counterintuitively
*boosting* the document's score. By contrast, missing vector results
correctly receive `_invert_score(fill) = 0.0` (penalised).
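
A small sketch of this effect, assuming `_invert_score(d)` is simply `1 - d` (consistent with `_invert_score(fill) = 0.0` for `fill=1.0`). `relevance_old` is a hypothetical stand-in for the per-row scoring with both bugs present, and the two rows are made up for illustration.

```python
def relevance_old(row, weight=0.7, fill=1.0):
    """Hypothetical stand-in for scoring one merged row with both bugs present."""
    vector_score = 1 - row.get("_distance", fill)  # assumed form of _invert_score
    fts_score = row.get("_score", fill)            # bug 2: missing FTS -> fill (1.0)
    return 1 - (weight * vector_score + (1 - weight) * fts_score)  # bug 1


# Same distance; only the first row has a full-text match.
with_fts = {"_distance": 0.2, "_score": 2.41}
without_fts = {"_distance": 0.2}

# The row with no FTS match outranks the row that has one.
assert relevance_old(without_fts) > relevance_old(with_fts)
```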

## Fix

**Bug 1** — remove the `1 -` inversion from `_combine_score`:

```python
def _combine_score(self, vector_score, fts_score):
    return self.weight * vector_score + (1 - self.weight) * fts_score
```

**Bug 2** — use `1 - fill` for missing FTS scores so both penalties are
symmetric (mirror of what `_invert_score(fill)` already does for missing
vector scores):

```python
fts_score = result.get("_score", 1 - fill)  # was: fill
```

With `fill=1.0` (default): `1 - 1.0 = 0.0` — missing-FTS entries
contribute `0` to the FTS term, identical to how missing-vector entries
contribute `0` to the vector term.
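
The symmetry can be checked with a standalone version of the corrected calculation. This is a sketch under stated assumptions rather than the library's implementation: `relevance_new` is a hypothetical helper, `_invert_score(d)` is assumed to be `1 - d`, and the rows are invented.

```python
def relevance_new(row, weight=0.7, fill=1.0):
    """Hypothetical stand-in for scoring one merged row with both fixes applied."""
    vector_score = 1 - row.get("_distance", fill)  # missing vector -> 1 - fill
    fts_score = row.get("_score", 1 - fill)        # missing FTS    -> 1 - fill
    return weight * vector_score + (1 - weight) * fts_score


both = {"_distance": 0.2, "_score": 0.9}
vector_only = {"_distance": 0.2}
fts_only = {"_score": 0.9}
neither = {}

# A row matched by both searches beats a row matched by only one of them.
assert relevance_new(both) > relevance_new(vector_only)
assert relevance_new(both) > relevance_new(fts_only)
# A row with neither score contributes nothing from either term.
assert relevance_new(neither) == 0.0
```

With the default `fill=1.0`, a missing side always contributes `0`, so a document can only gain relevance from a search that actually matched it.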

## Verification

Concrete example from the issue. With `weight=0.7`, `fill=1.0`:

| Document | `_distance` | `_score` | Old `_relevance_score` | New `_relevance_score` |
|----------|-------------|----------|------------------------|------------------------|
| `apple orange` | 0.0 (best) | 2.41 (only FTS) | 0.30 (**wrong: ranked 2nd**) | 1.42 (**correct: ranked 1st**) |
| `banana grape` | 0.9999 (worst) | — | 0.70 (**wrong: ranked 1st**) | 0.00 (**correct: ranked last**) |

## Tests

Two regression tests added to `python/python/tests/test_rerankers.py`:

- `test_linear_combination_best_match_ranks_first` — the document with
the smallest distance **and** an FTS match must have the highest
`_relevance_score`.
- `test_linear_combination_missing_fts_is_penalised` — a document with
any FTS score must beat an otherwise-equal document with no FTS match.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>

2026-05-26 15:26:34 -07:00

.cargo

chore: clippy::string_to_string has been replaced by implicit_clone (#2817 )

2025-11-26 16:30:35 +08:00

.github

chore(deps): bump the rust-minor-patch group across 1 directory with 23 updates (#3382 )

2026-05-20 09:09:39 -07:00

fix: use releases API in check_lance_release.py (#3427 )

2026-05-22 15:00:44 -07:00

dockerfiles

refactor(python): remove legacy tantivy FTS support (#3282 )

2026-04-20 09:28:45 +08:00

docs

feat(remote): send read freshness headers for remote table consistency (#3439 )

2026-05-26 13:38:07 -07:00

java

Bump version: 0.30.0-beta.0 → 0.30.0-beta.1

2026-05-22 10:09:01 +00:00

nodejs

feat(remote): send read freshness headers for remote table consistency (#3439 )

2026-05-26 13:38:07 -07:00

python

fix(rerankers): inverted scores and incorrect missing-FTS penalty in LinearCombinationReranker (#3437 )

2026-05-26 15:26:34 -07:00

rust

feat(remote): send read freshness headers for remote table consistency (#3439 )

2026-05-26 13:38:07 -07:00

.bumpversion.toml

Bump version: 0.30.0-beta.0 → 0.30.0-beta.1

2026-05-22 10:09:01 +00:00

.gitignore

feat: bump lance version to 0.40-0-beta.2 (#2772 )

2025-11-10 14:36:37 -08:00

.pre-commit-config.yaml

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

about.hbs

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

about.toml

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

AGENTS.md

docs: clarify PR title requirement for agents (#3433 )

2026-05-22 20:09:20 +08:00

Cargo.lock

chore(deps): bump the rust-minor-patch group across 1 directory with 2 updates (#3440 )

2026-05-26 14:28:40 -07:00

Cargo.toml

chore: update lance dependency to v7.0.0-beta.13 (#3399 )

2026-05-18 13:19:32 -07:00

CLAUDE.md

ci: add agents and add reviewing instructions (#2754 )

2025-10-29 17:28:26 -07:00

CONTRIBUTING.md

docs: contributing guide (#1970 )

2025-01-07 15:11:16 -08:00

deny.toml

feat(python): support model-backed native FTS tokenizers (#3289 )

2026-05-08 23:53:14 +08:00

docker-compose.yml

fix(ci): upgrade LocalStack to 4.0 for S3 integration tests (#3147 )

2026-03-16 09:02:11 -07:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

Makefile

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

pyright_report.csv

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

README.md

docs: fix broken documentation links (#3278 )

2026-04-15 20:56:59 +08:00

release_process.md

ci: enable java auto release (#1602 )

2024-09-19 10:51:03 -07:00

RUST_THIRD_PARTY_LICENSES.html

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

rust-toolchain.toml

chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257 )

2026-04-10 07:57:47 -07:00

README.md

The Multimodal AI Lakehouse

How to Install ✦ Detailed Documentation ✦ Tutorials and Recipes ✦ Contributors

The ultimate multimodal data platform for AI/ML applications.

LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.

Demo: Multimodal Search by Keyword, Vector or with SQL

Star LanceDB to get updates!

⭐ Click here ⭐ to see how fast we're growing!

Key Features:

Fast Vector Search: Search billions of vectors in milliseconds with state-of-the-art indexing.
Comprehensive Search: Support for vector similarity search, full-text search and SQL.
Multimodal Support: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
Advanced Features: Zero-copy access and automatic versioning, so you can manage versions of your data without extra infrastructure. GPU support for building vector indexes.

Products:

Open Source & Local: 100% open source, runs locally or in your cloud. No vendor lock-in.
Cloud and Enterprise: Production-scale vector search with no servers to manage. Complete data sovereignty and security.

Ecosystem:

Columnar Storage: Built on the Lance columnar format for efficient storage and analytics.
Seamless Integration: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and JavaScript/TypeScript support.
Rich Ecosystem: Integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.

How to Install:

Follow the Quickstart doc to set up LanceDB locally.

API & SDK: We also provide Python, TypeScript, and Rust SDKs, plus a REST API.

Interface	Documentation
Python SDK	https://lancedb.github.io/lancedb/python/python/
TypeScript SDK	https://lancedb.github.io/lancedb/js/globals/
Rust SDK	https://docs.rs/lancedb/latest/lancedb/index.html
REST API	https://docs.lancedb.com/api-reference/rest

Join Us and Contribute

We welcome contributions from everyone! Whether you're a developer, researcher, or just someone who wants to help out.

If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our Discord server.

Check out the GitHub Issues if you would like to work on features that are planned for the future.

Contributors

Stay in Touch With Us

Languages

HTML 37.8%

Rust 30.5%

Python 23.1%

TypeScript 8.1%

Shell 0.3%

Other 0.1%
