For K∈{2,4,8,16,32} LLVM proves the index bounds and elides the checks,
so get_unchecked buys nothing on the production K=8 path. K=64 was the
only value that defeated bounds-check elision (one check in the tail
scan) and it was instantiated in tests only — drop it.
block_search: cite the k-ary search paper
Document that kary_search is the 'k-ary search on a sorted array' variant
from Schlegel, Gemulla & Lehner (DaMoN 2009), specialized to a lower-bound.
BufferedUnionScorer is the hot path for full union traversal, including (TopDocs, Count) where Count forces all matches to be visited. After the block-wand intersection changes, LLVM started inlining the refill helper into the advance path, which regressed TOP_100_COUNT union queries even though the union algorithm did not change.
Force the refill helper out of line so the advance loop stays small and stable while pruning collectors continue to use Block-WAND.
Benchmark on search-benchmark-game TOP_100_COUNT union query set (301 queries, sum of per-query medians):
- tantivy 0.26: 0.853646s
- main before: 0.918605s
- this change: 0.841659s
search-benchmark-game shows TOP_100_COUNT regression on queries tagged intersection_union.
The regression came from allowing single-term boolean unions to become TermUnion for Block-WAND. https://github.com/quickwit-oss/tantivy/pull/2915
When such a scorer is used as the optional side of RequiredOptionalScorer, boxing converted the lone term into BufferedUnionScorer.
Keep the TermUnion representation available for pruning collection, but unwrap one-term unions when boxing or doing non-pruning iteration.
We ([ParadeDB](https://github.com/paradedb/paradedb)) have restored and been using the removed [index sorting](https://github.com/quickwit-oss/tantivy/issues/2352) feature in our Tantivy fork.
Our use case is sorting the index by Postgres' internal `ctid` identifier. Results returned from Tantivy must be checked against Postgres' visibility map, and checking them in ctid order is much more cache friendly, resulting in up to 80% speedups for certain queries.
This PR is split into 5 commits, corresponding to the index sorting reversal plus bug fixes we uncovered during our usage of index sorting.
| Commit | Maps to | What it does |
|---|---|---|
| `2aea0ad9f` | foundation ([#104](https://github.com/paradedb/tantivy/pull/104)) | Restore `SegmentComponent::TempStore` (revert of upstream #2815). Subsumes fork PR [#104](https://github.com/paradedb/tantivy/pull/104)'s CI fix. |
| `9205bcb0c` | [#92](https://github.com/paradedb/tantivy/pull/92) | Restore sort-by-field (single-segment + merge paths). |
| `39c790f0f` | [#101](https://github.com/paradedb/tantivy/pull/101) | Enable `sort_by` for `Str`/`Bytes` fast fields. |
| `9c4341a87` | [#105](https://github.com/paradedb/tantivy/pull/105) | Native typed numeric sort-key comparison (precision/NULL fix). |
| `2d9ba2418` | [#106](https://github.com/paradedb/tantivy/pull/106) | Preserve NULL ordering in numeric segment merges. |
We have discussed with the Tantivy maintainers and they indicated they would be open to this PR. Another motivation for landing this PR is we are planning on contributing a significant refactor that makes Tantivy's segment components extensible, and landing that without index sorting leads to too many conflicts.
term_counts (one u32/term) was allocated but not charged to
AggregationLimitsGuard, so a memory limit could be exceeded silently.
Charge it, skip allocating it when unbounded, and add a regression test.
Rename the value threaded through build_segment_term_collector and
maybe_build_collector from max_term_id to col_max_val/max_column_val — it
is the column's max value, only later reused as the max term id. Make the
grid-size arithmetic overflow-/zero-safe (saturating_add, checked_div).
Spell out that `counts` is the flattened per-term × time-bucket grid (each
term's own contiguous slice) and that `term_counts` is only needed when the
per-term total can't be derived from that grid (i.e. with hard bounds).
When a histogram's hard_bounds are wider than the column's value range, the
per-doc `bounds.contains` check can never fail. Collapse such bounds to the
unbounded sentinel in `normalize_histogram_req`, so both the general histogram
hot loop and the fused term×histogram path skip the check — the latter then
derives per-term counts from the grid (the ~17% win) instead of falling back to
per-doc counting just because `bounds != [MIN, MAX]`.
Only the collect-time filter is affected: empty-bucket emission reads
`req.hard_bounds` directly, and hard_bounds only ever clips that range, so a
wider-than-data bound leaves results unchanged. Covered by new tests on the
general and fused paths, including mid-interval (bucket-splitting) bounds.
Also tighten the fused-path u32-overflow guard to bound on `num_vals()` (the
per-value increment count) rather than `num_docs()`, and document why the fused
collector's hot-loop fields are hoisted into locals (re-reading them from memory
each iteration measured ~15% slower).
Switch the default serialized output of `sum` on empty / all-missing
buckets back to `"value": 0` to match Elasticsearch, and gate the
SQL-style `"value": null` behavior behind a new
`none_if_no_match: Option<bool>` flag on `SumAggregation`.
`IntermediateSum::finalize` still returns `Option<f64>` internally so
the Rust API stays parallel to min/max/avg, but the ES-vs-SQL choice is
made at the boundary in `IntermediateMetricResult::into_final_metric_result`:
`None` is coerced to `Some(0.0)` unless `none_if_no_match` is set on the
aggregation request.
Adds `AggregationVariants::as_sum()` accessor for that boundary check
and two end-to-end tests covering both the default ES behavior and the
opt-in null behavior on an empty index.
Surface the trade-off in the doc comment so future reviewers see why
this differs from ES (which returns "value": 0 for sum over
empty/all-missing buckets) and what consumers (ParadeDB SQL NULL) the
None variant is meant to serve.