mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 02:50:41 +00:00

Files

Armaan Sandhu a1261e6299 fix(python): average MRR reciprocal ranks over all rankings (#3599)

## What

`MRRReranker.rerank_multivector` averages each document's reciprocal
ranks over the wrong denominator. It divides by the number of rankings
the document *happens to appear in*, instead of the total number of
rankings being fused.

```python
# python/python/lancedb/rerankers/mrr.py
for result_id, reciprocal_ranks in mrr_score_map.items():
    mean_rr = np.mean(reciprocal_ranks)   # divides by len(present systems)
```

`mrr_score_map[doc]` only accumulates a reciprocal rank for the systems
in which the document was returned, so `np.mean` never accounts for the
systems that missed it.

## Why it's wrong

Mean Reciprocal Rank fusion counts a system that didn't return a
document as contributing a reciprocal rank of `0` and averages across
**all** systems. That is the mechanism by which it rewards cross-system
consensus. Dividing by the appearance count removes that penalty, so a
document liked by a single ranking can beat one ranked highly by every
ranking.

Concretely, fusing 3 vector rankings:

| Doc | Ranks | Current score | Correct score |
|-----|-------|---------------|---------------|
| A | #1 in 1 system only | `mean([1.0]) = 1.000` | `1.0 / 3 = 0.333` |
| B | #1, #1, #2 across all 3 | `mean([1, 1, .5]) = 0.833` | `2.5 / 3 = 0.833` |

The current code ranks **A above B**: a document that two of three
rankings ignored outranks one that all three ranked at or near the top.
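The arithmetic in the table can be reproduced with a few lines of plain Python. This is only an illustrative sketch: `score_by_appearances` and `score_by_total` are hypothetical names that stand in for the two denominators, not functions from `lancedb`.

```python
# Reciprocal ranks each document earned, one entry per ranking that returned it.
# Values mirror the table above; three rankings are being fused.
NUM_RANKINGS = 3
reciprocal_ranks = {
    "A": [1.0],            # #1 in a single ranking, absent from the other two
    "B": [1.0, 1.0, 0.5],  # #1, #1, #2 across all three rankings
}


def score_by_appearances(rrs):
    """Current behaviour: average over the rankings the document appears in."""
    return sum(rrs) / len(rrs)


def score_by_total(rrs, num_rankings):
    """Intended behaviour: a missing ranking contributes 0 to the average."""
    return sum(rrs) / num_rankings


current = {doc: score_by_appearances(rrs) for doc, rrs in reciprocal_ranks.items()}
correct = {doc: score_by_total(rrs, NUM_RANKINGS) for doc, rrs in reciprocal_ranks.items()}

# Sort document IDs by score, best first.
current_order = sorted(current, key=current.get, reverse=True)
correct_order = sorted(correct, key=correct.get, reverse=True)

print(current_order)  # A comes first under the appearance-count denominator
print(correct_order)  # B comes first once all three rankings are counted
```

Only A's score changes between the two versions, because B already appears in every ranking.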

This also makes `rerank_multivector` inconsistent with `rerank_hybrid`
in the same file, which already treats a missing system as `0`
(`vector_rr = 0.0` / `fts_rr = 0.0`), and with the class docstring
("average of reciprocal ranks across different search results").

## Fix

Divide the summed reciprocal ranks by the total number of rankings:

```python
num_systems = len(vector_results)
...
mean_rr = float(np.sum(reciprocal_ranks)) / num_systems
```
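For readers who want to try the corrected averaging outside LanceDB, here is a minimal standalone sketch. `fuse_mrr` is a hypothetical helper, not part of the `lancedb` API. It assumes each ranking is a plain list of document IDs ordered best first, with each ID appearing at most once per ranking.

```python
from collections import defaultdict


def fuse_mrr(rankings):
    """Fuse rankings by mean reciprocal rank over ALL rankings.

    `rankings` is a list of lists of document IDs, best first.
    Hypothetical helper for illustration; not the lancedb implementation.
    """
    num_systems = len(rankings)
    if num_systems == 0:
        return []

    # Sum reciprocal ranks per document. A ranking that omits a document
    # adds nothing, which is equivalent to a reciprocal rank of 0.
    rr_sums = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            rr_sums[doc_id] += 1.0 / rank

    # Divide by the total number of rankings, not the appearance count.
    scored = [(doc_id, total / num_systems) for doc_id, total in rr_sums.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Same shape as the table: A is #1 in one ranking only; B is #2, #1, #1.
rankings = [["A", "B"], ["B", "C"], ["B", "D"]]
fused = fuse_mrr(rankings)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.3f}")
```

With these inputs, the consensus document B ranks ahead of A.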

## Tests

Adds `test_mrr_multivector_rewards_consensus`, which asserts the exact
MRR scores and that the consensus document ranks first. It fails on
`main` and passes with this change. Existing reranker tests are
unaffected.

2026-07-01 15:36:56 -07:00

python

fix(python): average MRR reciprocal ranks over all rankings (#3599)

2026-07-01 15:36:56 -07:00

src

fix(python): route sync namespace connections through rust (#3598)

2026-06-30 14:46:23 -07:00

tests

feat(python): expose OAuth connection config (#3586)

2026-06-29 12:36:35 -07:00

.bumpversion.toml

Bump version: 0.34.0-beta.4 → 0.34.0-beta.5

2026-06-30 22:23:43 +00:00

.gitignore

feat(python): add type-safe expression builder API (#3150)

2026-03-31 11:32:49 -07:00

AGENTS.md

docs: document Python uv agent workflow (#3417)

2026-05-20 21:35:42 +08:00

ASYNC_MIGRATION.md

feat: add support for add to async python API (#1037)

2024-04-05 16:31:36 -07:00

build.rs

ci: check license headers (#2076)

2025-01-29 08:27:07 -08:00

Cargo.toml

Bump version: 0.34.0-beta.4 → 0.34.0-beta.5

2026-06-30 22:23:43 +00:00

CLAUDE.md

ci: add agents and add reviewing instructions (#2754)

2025-10-29 17:28:26 -07:00

CONTRIBUTING.md

chore!: change support python version from 3.10 to 3.13 (#2955)

2026-01-30 01:47:50 +08:00

LICENSE

chore: bump lance to 0.8.5 (#561)

2024-04-05 16:22:59 -07:00

license_header.txt

ci: check license headers (#2076)

2025-01-29 08:27:07 -08:00

Makefile

ci: add support for lance-format fury index for downloading pylance (#2804)

2025-11-20 23:29:36 -08:00

pyproject.toml

refactor(python): remove legacy tantivy FTS support (#3282)

2026-04-20 09:28:45 +08:00

PYTHON_THIRD_PARTY_LICENSES.md

refactor(python): remove legacy tantivy FTS support (#3282)

2026-04-20 09:28:45 +08:00

README.md

chore: unify component README titles (#3066)

2026-03-09 21:47:58 +08:00

RUST_THIRD_PARTY_LICENSES.html

feat: add third party licenses lists (#3010)

2026-02-09 16:16:46 -08:00

uv.lock

ci: update python lockfile weekly (#3498)

2026-06-03 15:24:32 -07:00

README.md

LanceDB Python SDK

A Python library for LanceDB.

Installation

pip install lancedb

Preview Releases

Stable releases are created about every 2 weeks. For the latest features and bug fixes, you can install the preview release. These releases receive the same level of testing as stable releases, but are not guaranteed to be available for more than 6 months after they are released. Once your application is stable, we recommend switching to stable releases.

pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb

Usage

Basic Example

import lancedb
db = lancedb.connect('<PATH_TO_LANCEDB_DATASET>')
table = db.open_table('my_table')
results = table.search([0.1, 0.3]).limit(20).to_list()
print(results)

Development

See CONTRIBUTING.md for information on how to contribute to LanceDB.