feat!: upgrade lance to v0.28.0 (#2404)

This introduces breaking changes to the Rust API for creating an FTS
index, and the default index parameters have changed.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
  - Updated default settings for full-text search (FTS) index creation:
    stemming, stop word removal, and ASCII folding are now enabled by
    default, while token position storage is disabled by default.

- **Refactor**
  - Simplified and streamlined the configuration and handling of FTS index
    parameters for improved maintainability and consistency across
    interfaces.
  - Enhanced serialization and request construction for FTS index
    parameters to reduce manual handling and improve code clarity.
  - Improved test coverage by explicitly enabling positional indexing in
    FTS tests to support phrase queries.

- **Chores**
  - Upgraded all internal dependencies related to FTS indexing to the
    latest version for enhanced compatibility and performance.
  - Updated package versions for Node.js, Python, and Rust components to
    the latest beta releases.
  - Improved CI workflows by adding Rust toolchain setup with formatting
    and linting tools.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
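
The default changes listed above can be sketched as follows. This is a minimal illustration and not the lance API: `FtsParams`, `from_overrides`, and `supports_phrase_queries` are hypothetical names, and only the four defaults named in the release notes are modeled. It also shows why tests that run phrase queries now have to enable positions explicitly.

```rust
/// Hypothetical stand-in for the FTS index parameters; not the lance type.
#[derive(Debug, Clone, PartialEq)]
pub struct FtsParams {
    pub with_position: bool,
    pub stem: bool,
    pub remove_stop_words: bool,
    pub ascii_folding: bool,
}

impl Default for FtsParams {
    fn default() -> Self {
        // Defaults as listed in the release notes of this commit.
        Self {
            with_position: false,
            stem: true,
            remove_stop_words: true,
            ascii_folding: true,
        }
    }
}

impl FtsParams {
    /// Consuming setter, in the same style as the builder calls in the diff.
    pub fn with_position(mut self, value: bool) -> Self {
        self.with_position = value;
        self
    }

    /// Consuming setter for stemming.
    pub fn stem(mut self, value: bool) -> Self {
        self.stem = value;
        self
    }

    /// Phrase queries need token positions to be stored in the index.
    pub fn supports_phrase_queries(&self) -> bool {
        self.with_position
    }
}

/// Apply only the overrides the caller supplied; `None` keeps the default.
pub fn from_overrides(with_position: Option<bool>, stem: Option<bool>) -> FtsParams {
    let mut opts = FtsParams::default();
    if let Some(value) = with_position {
        opts = opts.with_position(value);
    }
    if let Some(value) = stem {
        opts = opts.stem(value);
    }
    opts
}
```

Under these assumptions, a caller that issues phrase queries would pass `Some(true)` for `with_position` rather than rely on the default, which matches the test change described in the notes.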

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
BubbleCal
2025-05-30 06:19:24 +08:00
committed by GitHub
parent d0bc671cac
commit 5c7f63388d
21 changed files with 484 additions and 479 deletions


@@ -125,32 +125,30 @@ impl Index {
         ascii_folding: Option<bool>,
     ) -> Self {
         let mut opts = FtsIndexBuilder::default();
-        let mut tokenizer_configs = opts.tokenizer_configs.clone();
         if let Some(with_position) = with_position {
             opts = opts.with_position(with_position);
         }
         if let Some(base_tokenizer) = base_tokenizer {
-            tokenizer_configs = tokenizer_configs.base_tokenizer(base_tokenizer);
+            opts = opts.base_tokenizer(base_tokenizer);
         }
         if let Some(language) = language {
-            tokenizer_configs = tokenizer_configs.language(&language).unwrap();
+            opts = opts.language(&language).unwrap();
         }
         if let Some(max_token_length) = max_token_length {
-            tokenizer_configs = tokenizer_configs.max_token_length(Some(max_token_length as usize));
+            opts = opts.max_token_length(Some(max_token_length as usize));
         }
         if let Some(lower_case) = lower_case {
-            tokenizer_configs = tokenizer_configs.lower_case(lower_case);
+            opts = opts.lower_case(lower_case);
         }
         if let Some(stem) = stem {
-            tokenizer_configs = tokenizer_configs.stem(stem);
+            opts = opts.stem(stem);
         }
         if let Some(remove_stop_words) = remove_stop_words {
-            tokenizer_configs = tokenizer_configs.remove_stop_words(remove_stop_words);
+            opts = opts.remove_stop_words(remove_stop_words);
         }
         if let Some(ascii_folding) = ascii_folding {
-            tokenizer_configs = tokenizer_configs.ascii_folding(ascii_folding);
+            opts = opts.ascii_folding(ascii_folding);
         }
-        opts.tokenizer_configs = tokenizer_configs;
         Self {
             inner: Mutex::new(Some(LanceDbIndex::FTS(opts))),