Mirror of https://github.com/lancedb/lancedb.git (synced 2026-07-03 11:00:40 +00:00)
fix(python): average MRR reciprocal ranks over all rankings (#3599)
## What
`MRRReranker.rerank_multivector` averages each document's reciprocal
ranks over the wrong denominator. It divides by the number of rankings
the document *happens to appear in*, instead of the total number of
rankings being fused.
```python
# python/python/lancedb/rerankers/mrr.py
for result_id, reciprocal_ranks in mrr_score_map.items():
    mean_rr = np.mean(reciprocal_ranks)  # divides by len(present systems)
```
`mrr_score_map[doc]` only accumulates a reciprocal rank for the systems
in which the document was returned, so `np.mean` never accounts for the
systems that missed it.
## Why it's wrong
Mean Reciprocal Rank fusion treats a system that didn't return a
document as a reciprocal rank of `0` and averages across **all**
systems. That's the exact mechanism by which it rewards cross-system
consensus. Dividing by the appearance count removes that, so a document
liked by a single ranking can beat one ranked highly by every ranking.
Concretely, fusing 3 vector rankings:
| Doc | Ranks | Current score | Correct score |
|-----|-------|---------------|---------------|
| A | #1 in 1 system only | `mean([1.0]) = 1.000` | `1.0 / 3 = 0.333` |
| B | #1, #1, #2 across all 3 | `mean([1, 1, .5]) = 0.833` | `2.5 / 3 = 0.833` |
The current code ranks **A above B** - a document two of three rankings
ignored outranks one all three ranked at or near the top.
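The table's arithmetic can be checked directly. The sketch below is illustrative only: `mean_over_present` and `mean_over_all` are hypothetical helper names, not functions in `mrr.py`, and the reciprocal ranks are the ones listed in the table.

```python
# Reciprocal ranks each document received, one entry per system that returned it.
reciprocal_ranks = {
    "A": [1.0],            # rank 1 in one system, absent from the other two
    "B": [1.0, 1.0, 0.5],  # ranks 1, 1 and 2 across all three systems
}
NUM_SYSTEMS = 3


def mean_over_present(rrs):
    """Old behaviour: divide by the number of systems that returned the doc."""
    return sum(rrs) / len(rrs)


def mean_over_all(rrs, num_systems):
    """Fixed behaviour: absent systems count as 0, so divide by all systems."""
    return sum(rrs) / num_systems


old = {doc: mean_over_present(rrs) for doc, rrs in reciprocal_ranks.items()}
new = {doc: mean_over_all(rrs, NUM_SYSTEMS) for doc, rrs in reciprocal_ranks.items()}

print("old:", old)
print("new:", new)
```

Under the old denominator A scores above B. Under the new one A drops to a third of its former score while B is unchanged, because B already appeared in every ranking.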
This also makes `rerank_multivector` inconsistent with `rerank_hybrid`
in the same file, which already treats a missing system as `0`
(`vector_rr = 0.0` / `fts_rr = 0.0`), and with the class docstring
("average of reciprocal ranks across different search results").
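The missing-as-zero convention attributed to `rerank_hybrid` can be illustrated with two rankings. This is a sketch under assumptions, not the library's code: `hybrid_mrr` is a hypothetical name, the two systems are assumed to be weighted equally, and only the `vector_rr = 0.0` / `fts_rr = 0.0` defaults come from the description above.

```python
def hybrid_mrr(vector_ids, fts_ids):
    """Average reciprocal ranks over two rankings; a missing doc scores 0.0."""
    # Map each id to its reciprocal rank (ranks start at 1).
    vector_rrs = {doc: 1.0 / rank for rank, doc in enumerate(vector_ids, 1)}
    fts_rrs = {doc: 1.0 / rank for rank, doc in enumerate(fts_ids, 1)}

    scores = {}
    for doc in set(vector_rrs) | set(fts_rrs):
        vector_rr = vector_rrs.get(doc, 0.0)  # 0.0 if absent from vector results
        fts_rr = fts_rrs.get(doc, 0.0)  # 0.0 if absent from full-text results
        # Equal weighting is an assumption made for this sketch.
        scores[doc] = (vector_rr + fts_rr) / 2
    return scores


scores = hybrid_mrr(vector_ids=["a", "b"], fts_ids=["b", "c"])
print(scores)
```

Here `"b"` is the only document both rankings returned, so it ends up with the highest score even though it is first in only one of them. The multivector path should reward consensus in the same way.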
## Fix
Divide the summed reciprocal ranks by the total number of rankings:
```python
num_systems = len(vector_results)
...
mean_rr = float(np.sum(reciprocal_ranks)) / num_systems
```
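As a standalone illustration of the corrected scoring, the sketch below fuses plain lists of ids rather than Arrow tables. `fuse_mrr` is a hypothetical function, not part of lancedb, and returning an empty dict for empty input is a choice made for this sketch only. The input mirrors the three rankings used in the new test.

```python
from collections import defaultdict


def fuse_mrr(rankings):
    """Fuse ranked id lists by mean reciprocal rank over *all* rankings."""
    num_systems = len(rankings)
    if num_systems == 0:
        return {}  # sketch-only behaviour; the library may handle this differently

    rr_by_doc = defaultdict(list)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            rr_by_doc[doc_id].append(1.0 / rank)

    # Sum, then divide by the total number of rankings (not by len(rrs)).
    return {doc_id: sum(rrs) / num_systems for doc_id, rrs in rr_by_doc.items()}


rankings = [[1, 2, 3], [2, 3, 4], [2, 5, 6]]
scores = fuse_mrr(rankings)
ordered = sorted(scores, key=scores.get, reverse=True)
print(ordered[:2])
```

Document 2 (ranks 2, 1, 1) comes first and document 1 (rank 1 in a single ranking) second, matching the assertions in the test.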
## Tests
Adds `test_mrr_multivector_rewards_consensus`, which asserts the exact
MRR scores and that the consensus document ranks first. It fails on
`main` and passes with this change. Existing reranker tests are
unaffected.
@@ -156,9 +156,16 @@ class MRRReranker(Reranker):
                 reciprocal_rank = 1.0 / rank
                 mrr_score_map[result_id].append(reciprocal_rank)
 
+        # MRR averages the reciprocal rank across *all* ranking systems, treating
+        # a system in which a document does not appear as a reciprocal rank of 0.
+        # We therefore divide by the total number of systems, not by the number of
+        # systems the document happens to appear in -- otherwise a document found
+        # by a single ranking would outrank one ranked highly by every system,
+        # defeating the purpose of fusing the rankings.
+        num_systems = len(vector_results)
         final_mrr_scores = {}
         for result_id, reciprocal_ranks in mrr_score_map.items():
-            mean_rr = np.mean(reciprocal_ranks)
+            mean_rr = float(np.sum(reciprocal_ranks)) / num_systems
             final_mrr_scores[result_id] = mean_rr
 
         combined = pa.concat_tables(vector_results, **self._concat_tables_args)
@@ -350,6 +350,38 @@ def test_mrr_reranker_empty_input():
         reranker.rerank_multivector([])
 
 
+def test_mrr_multivector_rewards_consensus():
+    # Reciprocal ranks must be averaged across *all* ranking systems, treating a
+    # missing system as 0. A document ranked first by every system must outrank a
+    # document ranked first by only one of them.
+    reranker = MRRReranker()
+
+    def ranking(row_ids):
+        return pa.table({"_rowid": pa.array(row_ids, type=pa.int64())})
+
+    # Doc 1 is rank 1 in only the first system; doc 2 is rank 1 in two systems
+    # and rank 2 in the third (strong cross-system consensus).
+    rs1 = ranking([1, 2, 3])
+    rs2 = ranking([2, 3, 4])
+    rs3 = ranking([2, 5, 6])
+
+    result = reranker.rerank_multivector([rs1, rs2, rs3])
+    scores = {
+        row_id: score
+        for row_id, score in zip(
+            result["_rowid"].to_pylist(),
+            result["_relevance_score"].to_pylist(),
+        )
+    }
+
+    # sum of reciprocal ranks / number of systems
+    assert scores[1] == pytest.approx(1.0 / 3)
+    assert scores[2] == pytest.approx((0.5 + 1.0 + 1.0) / 3)
+    assert scores[2] > scores[1]
+    # The consensus document ranks first overall.
+    assert result["_rowid"].to_pylist()[0] == 2
+
+
 def test_rrf_reranker_distance():
     data = pa.table(
         {