mirror of https://github.com/lancedb/lancedb.git synced 2026-05-31 10:50:40 +00:00


devteamaegis 7dba793629 fix(rerankers): inverted scores and incorrect missing-FTS penalty in LinearCombinationReranker (#3437)

## Problem

`LinearCombinationReranker.merge_results` has two related bugs that make
it return **inverted relevance rankings** — the least relevant document
ranks first (closes #3154).

### Bug 1 — `_combine_score` subtracts from 1, inverting the final ranking

```python
def _combine_score(self, vector_score, fts_score):
    return 1 - (self.weight * vector_score + (1 - self.weight) * fts_score)
```

Both `vector_score` (already converted via `_invert_score`) and
`fts_score` (BM25 relevance) are in **higher-is-better** space. Wrapping
the weighted average in `1 - (...)` flips the direction: a perfectly
matching document (`vector_score=1, fts_score=1`) gets
`_relevance_score = 0.0`, while a non-matching document
(`vector_score=0, fts_score=0`) gets `_relevance_score = 1.0`.
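A minimal standalone sketch of the inversion. `old_combine` and `new_combine` are hypothetical free functions that copy the two `_combine_score` bodies quoted in this PR; they are not part of the lancedb API, and `weight=0.7` is assumed to match the verification example below.

```python
# Hypothetical standalone copies of the two _combine_score bodies
# (illustration only, not the lancedb API).
WEIGHT = 0.7  # assumed weight, as in the verification example


def old_combine(vector_score: float, fts_score: float, weight: float = WEIGHT) -> float:
    # Buggy version: "1 -" flips an average that is already higher-is-better.
    return 1 - (weight * vector_score + (1 - weight) * fts_score)


def new_combine(vector_score: float, fts_score: float, weight: float = WEIGHT) -> float:
    # Fixed version: plain weighted average, still higher-is-better.
    return weight * vector_score + (1 - weight) * fts_score


perfect = (1.0, 1.0)  # best vector similarity and a full FTS match
miss = (0.0, 0.0)     # no vector similarity and no FTS match

print(old_combine(*perfect), old_combine(*miss))  # 0.0 1.0 (inverted)
print(new_combine(*perfect), new_combine(*miss))  # 1.0 0.0
```

With the old body the perfect match receives the lowest score, so a descending sort on `_relevance_score` puts it last.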

### Bug 2 — Documents missing an FTS score are rewarded, not penalised

```python
fts_score = result.get("_score", fill)  # fill=1.0 by default
```

When a document has no FTS match, `fts_score = fill = 1.0`. Because
`fts_score` is higher-is-better, this fill credits the document with FTS
relevance it never earned. In `_combine_score` (with the bug-1 formula),
this large value becomes a **negative penalty** via
`1 - (... + 0.3 * 1.0)`, counterintuitively *boosting* the document's
score. By contrast, missing vector results correctly receive
`_invert_score(fill) = 0.0` (penalised).
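The asymmetry is easiest to see once Bug 1 is already fixed: the default fill still hands an unmatched row a free FTS term. This is a sketch with hypothetical helpers; it assumes `_invert_score(d) = 1 - d`, which is consistent with `_invert_score(fill) = 0.0` for `fill=1.0`, and uses made-up rows and scores.

```python
# Illustrative sketch of the asymmetric fill (hypothetical helpers, not lancedb API).
WEIGHT = 0.7  # assumed weight
FILL = 1.0    # default fill described above


def invert_score(distance: float) -> float:
    # Assumed form of _invert_score: distance -> higher-is-better score.
    return 1 - distance


def combine(vector_score: float, fts_score: float) -> float:
    # Bug-1-fixed weighted average, so only the fill behaviour differs.
    return WEIGHT * vector_score + (1 - WEIGHT) * fts_score


# Two hypothetical rows with the same vector distance.
with_fts = {"_distance": 0.5, "_score": 0.5}  # weak but real FTS match
no_fts = {"_distance": 0.5}                  # no FTS match at all

# Old lookup: a missing FTS score falls back to fill (1.0).
old_with = combine(invert_score(with_fts["_distance"]), with_fts.get("_score", FILL))
old_without = combine(invert_score(no_fts["_distance"]), no_fts.get("_score", FILL))

# Missing vector scores, by contrast, become invert_score(FILL) == 0.0.
missing_vector_fill = invert_score(FILL)

print(old_without > old_with)  # True: the row without an FTS match wins
print(missing_vector_fill)     # 0.0
```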

## Fix

**Bug 1** — remove the `1 -` inversion from `_combine_score`:

```python
def _combine_score(self, vector_score, fts_score):
    return self.weight * vector_score + (1 - self.weight) * fts_score
```

**Bug 2** — use `1 - fill` for missing FTS scores so both penalties are
symmetric (mirror of what `_invert_score(fill)` already does for missing
vector scores):

```python
fts_score = result.get("_score", 1 - fill)  # was: fill
```

With `fill=1.0` (default): `1 - 1.0 = 0.0` — missing-FTS entries
contribute `0` to the FTS term, identical to how missing-vector entries
contribute `0` to the vector term.
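Putting both fixes together, a merge step might look like the sketch below. `merge_results` here is a hypothetical dict-based stand-in keyed on `_rowid`, not the lancedb implementation (which works on Arrow tables); `invert_score` again assumes `_invert_score(d) = 1 - d`.

```python
from typing import Dict, List

WEIGHT = 0.7  # assumed weight, as in the verification example
FILL = 1.0    # default fill


def invert_score(distance: float) -> float:
    # Assumed form of _invert_score: distance -> higher-is-better score.
    return 1 - distance


def combine_score(vector_score: float, fts_score: float) -> float:
    # Fixed _combine_score: plain weighted average, no "1 -" wrapper.
    return WEIGHT * vector_score + (1 - WEIGHT) * fts_score


def merge_results(vector_rows: List[dict], fts_rows: List[dict], fill: float = FILL) -> List[dict]:
    """Hypothetical merge keyed on _rowid (not the lancedb implementation)."""
    merged: Dict[int, dict] = {}
    for row in vector_rows:
        merged[row["_rowid"]] = dict(row)
    for row in fts_rows:
        # Rows found by both searches gain a _score; FTS-only rows are added.
        merged.setdefault(row["_rowid"], {}).update(row)

    for row in merged.values():
        # Missing vector hit: distance = fill, so the vector term is invert_score(fill).
        vector_score = invert_score(row.get("_distance", fill))
        # Missing FTS hit: 1 - fill, mirroring the vector side.
        fts_score = row.get("_score", 1 - fill)
        row["_relevance_score"] = combine_score(vector_score, fts_score)

    # Higher _relevance_score means more relevant, so sort descending.
    return sorted(merged.values(), key=lambda r: r["_relevance_score"], reverse=True)


vector_rows = [{"_rowid": 1, "_distance": 0.2}, {"_rowid": 2, "_distance": 0.2}]
fts_rows = [{"_rowid": 1, "_score": 0.8}, {"_rowid": 3, "_score": 0.8}]
print([r["_rowid"] for r in merge_results(vector_rows, fts_rows)])  # [1, 2, 3]
```

Row 1 (found by both searches) ranks first. Row 2 (vector only, 0.56) beats row 3 (FTS only, 0.24) here only because `weight=0.7` favours the vector term.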

## Verification

Concrete example from the issue. With `weight=0.7`, `fill=1.0`:

| Document | `_distance` | `_score` | Old `_relevance_score` | New `_relevance_score` |
|----------|-------------|----------|------------------------|------------------------|
| `apple orange` | 0.0 (best) | 2.41 (only FTS) | 0.30 (**wrong: ranked 2nd**) | 1.42 (**correct: ranked 1st**) |
| `banana grape` | 0.9999 (worst) | none | 0.70 (**wrong: ranked 1st**) | 0.00 (**correct: ranked last**) |
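The New column can be reproduced by hand from the fixed `_combine_score`, assuming `_invert_score(d) = 1 - d` (consistent with `_invert_score(fill) = 0.0` for `fill=1.0`) and the fixed missing-FTS value `1 - fill`. The Old column is not re-derived here.

```math
\begin{aligned}
\texttt{apple orange}:\quad & 0.7\,(1 - 0.0) + 0.3 \times 2.41 = 0.7 + 0.723 = 1.423 \approx 1.42 \\
\texttt{banana grape}:\quad & 0.7\,(1 - 0.9999) + 0.3\,(1 - 1.0) = 0.00007 + 0 \approx 0.00
\end{aligned}
```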

## Tests

Two regression tests added to `python/python/tests/test_rerankers.py`:

- `test_linear_combination_best_match_ranks_first` — the document with
the smallest distance **and** an FTS match must have the highest
`_relevance_score`.
- `test_linear_combination_missing_fts_is_penalised` — a document with
any FTS score must beat an otherwise-equal document with no FTS match.
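The real tests in `test_rerankers.py` exercise `LinearCombinationReranker` itself. As a lighter illustration, the two invariants can be written as plain assertions against a hypothetical `relevance` helper that applies the fixed scoring to one row (assuming `_invert_score(d) = 1 - d`); the rows and scores are made up.

```python
# Hypothetical helper; the real tests exercise LinearCombinationReranker directly.
WEIGHT = 0.7  # assumed weight
FILL = 1.0    # default fill


def relevance(row: dict, fill: float = FILL) -> float:
    # Fixed scoring: vector term is 1 - distance, missing FTS filled with 1 - fill.
    vector_score = 1 - row.get("_distance", fill)
    fts_score = row.get("_score", 1 - fill)
    return WEIGHT * vector_score + (1 - WEIGHT) * fts_score


def check_best_match_ranks_first() -> None:
    rows = [
        {"text": "best", "_distance": 0.0, "_score": 1.5},
        {"text": "close, no FTS", "_distance": 0.3},
        {"text": "far, weak FTS", "_distance": 0.9, "_score": 0.2},
    ]
    # Smallest distance and an FTS match: must have the highest score.
    assert max(rows, key=relevance)["text"] == "best"


def check_missing_fts_is_penalised() -> None:
    # Any FTS score, however small, beats an otherwise-equal row without one.
    with_fts = {"_distance": 0.4, "_score": 0.01}
    without_fts = {"_distance": 0.4}
    assert relevance(with_fts) > relevance(without_fts)


if __name__ == "__main__":
    check_best_match_ranks_first()
    check_missing_fts_is_penalised()
    print("ok")
```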

---------

Co-authored-by: Will Jones <willjones127@gmail.com>

2026-05-26 15:26:34 -07:00

| Name | Last commit | Date |
|------|-------------|------|
| `python` | fix(rerankers): inverted scores and incorrect missing-FTS penalty in LinearCombinationReranker (#3437) | 2026-05-26 15:26:34 -07:00 |
| `src` | feat: support setting LSM write spec for a table (#3396) | 2026-05-18 00:11:33 -07:00 |
| `tests` | feat(python): support bytes in lit() expressions (#3387) | 2026-05-14 15:24:52 -07:00 |
| `.bumpversion.toml` | Bump version: 0.33.0-beta.0 → 0.33.0-beta.1 | 2026-05-22 10:08:07 +00:00 |
| `.gitignore` | feat(python): add type-safe expression builder API (#3150) | 2026-03-31 11:32:49 -07:00 |
| `AGENTS.md` | docs: document Python uv agent workflow (#3417) | 2026-05-20 21:35:42 +08:00 |
| `ASYNC_MIGRATION.md` | feat: add support for add to async python API (#1037) | 2024-04-05 16:31:36 -07:00 |
| `build.rs` | ci: check license headers (#2076) | 2025-01-29 08:27:07 -08:00 |
| `Cargo.toml` | Bump version: 0.33.0-beta.0 → 0.33.0-beta.1 | 2026-05-22 10:08:07 +00:00 |
| `CLAUDE.md` | ci: add agents and add reviewing instructions (#2754) | 2025-10-29 17:28:26 -07:00 |
| `CONTRIBUTING.md` | chore!: change support python version from 3.10 to 3.13 (#2955) | 2026-01-30 01:47:50 +08:00 |
| `LICENSE` | chore: bump lance to 0.8.5 (#561) | 2024-04-05 16:22:59 -07:00 |
| `license_header.txt` | ci: check license headers (#2076) | 2025-01-29 08:27:07 -08:00 |
| `Makefile` | ci: add support for lance-format fury index for downloading pylance (#2804) | 2025-11-20 23:29:36 -08:00 |
| `pyproject.toml` | refactor(python): remove legacy tantivy FTS support (#3282) | 2026-04-20 09:28:45 +08:00 |
| `PYTHON_THIRD_PARTY_LICENSES.md` | refactor(python): remove legacy tantivy FTS support (#3282) | 2026-04-20 09:28:45 +08:00 |
| `README.md` | chore: unify component README titles (#3066) | 2026-03-09 21:47:58 +08:00 |
| `RUST_THIRD_PARTY_LICENSES.html` | feat: add third party licenses lists (#3010) | 2026-02-09 16:16:46 -08:00 |
| `uv.lock` | refactor(python): remove legacy tantivy FTS support (#3282) | 2026-04-20 09:28:45 +08:00 |

README.md

# LanceDB Python SDK

A Python library for LanceDB.

## Installation

```shell
pip install lancedb
```

### Preview Releases

Stable releases are created about every 2 weeks. For the latest features and bug fixes, you can install the preview release. These releases receive the same level of testing as stable releases, but are not guaranteed to be available for more than 6 months after they are released. Once your application is stable, we recommend switching to stable releases.

```shell
pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb
```

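## Usage

### Basic Example

```python
import lancedb

# Connect to a LanceDB database directory (replace the placeholder path)
db = lancedb.connect('<PATH_TO_LANCEDB_DATASET>')
# Open an existing table
table = db.open_table('my_table')
# Vector search: return the 20 nearest rows as a list of dicts
results = table.search([0.1, 0.3]).limit(20).to_list()
print(results)
```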

## Development

See `CONTRIBUTING.md` for information on how to contribute to LanceDB.