From bf16de550e6ae6040d7c40f627ed2b30bdd0e5a0 Mon Sep 17 00:00:00 2001
From: discord9 <discord9@163.com>
Date: Mon, 13 Apr 2026 20:48:48 +0800
Subject: [PATCH] docs: better rephrase

Signed-off-by: discord9 <discord9@163.com>
---
 .../rfcs/2026-04-10-chinese-fulltext-lexicon-expansion.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/rfcs/2026-04-10-chinese-fulltext-lexicon-expansion.md b/docs/rfcs/2026-04-10-chinese-fulltext-lexicon-expansion.md
index 070ff3573c..c4e47a6c94 100644
--- a/docs/rfcs/2026-04-10-chinese-fulltext-lexicon-expansion.md
+++ b/docs/rfcs/2026-04-10-chinese-fulltext-lexicon-expansion.md
@@ -90,13 +90,14 @@ Those are cross-boundary combinations from adjacent text, not reasonable subterm
 
 1. `@@` is still rewritten to `matches_term(...)`.
 2. Query analysis produces the normal query tokens.
-3. For eligible Chinese analyzed tokens, the engine looks up lexicon tokens according to token position:
+3. This model assumes that query-time and index-time analysis are aligned: expansion operates on compatible analyzed token boundaries and does not attempt to repair a mismatch between raw query text and an index that was tokenized differently.
+4. For eligible Chinese analyzed tokens, the engine looks up lexicon tokens according to token position:
    - a single-token query may use normal contains expansion
    - the first token in a multi-token query may only expand to tokens that use it as a suffix
    - the last token in a multi-token query may only expand to tokens that use it as a prefix
    - middle tokens do not expand
-4. The expanded token set becomes the probe set for bloom/fulltext recall.
-5. Final correctness still uses `matches_term`.
+5. The expanded token set becomes the probe set for bloom/fulltext recall.
+6. Final correctness still uses `matches_term`.
 
 This keeps recall and correctness separate:
 
@@ -170,6 +171,7 @@ That tradeoff is the main reason to propose lexicon expansion as the next step.
 
 - Expansion happens after query analysis, not on the raw full query string.
 - Single-token queries may use normal contains expansion.
+- Expansion assumes that indexed text and query text are analyzed by the same tokenizer, or by tokenizers that produce compatible token boundaries.
 - Multi-token queries use outward-only expansion:
   - the first token may expand only to tokens that use it as a suffix, for example `登录 -> 立即登录`
   - the last token may expand only to tokens that use it as a prefix, for example `手机号 -> 手机号验证码`