Adds IVF_SQ index config through Rust core and Python bindings, plus
alias names IvfHnswSq/Pq for backward compatibility. Updates
remote/table helpers and types to accept the new index type. Includes
tests covering IVF SQ creation and alias usage.
this introduces some breaking changes in terms of rust API of creating
FTS index, and the default index params changed
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Updated default settings for full-text search (FTS) index creation:
stemming, stop word removal, and ASCII folding are now enabled by
default, while token position storage is disabled by default.
- **Refactor**
- Simplified and streamlined the configuration and handling of FTS index
parameters for improved maintainability and consistency across
interfaces.
- Enhanced serialization and request construction for FTS index
parameters to reduce manual handling and improve code clarity.
- Improved test coverage by explicitly enabling positional indexing in
FTS tests to support phrase queries.
- **Chores**
- Upgraded all internal dependencies related to FTS indexing to the
latest version for enhanced compatibility and performance.
- Updated package versions for Node.js, Python, and Rust components to
the latest beta releases.
- Improved CI workflows by adding Rust toolchain setup with formatting
and linting tools.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
binary vectors and hamming distance can work on only IVF_FLAT, so
introduce them all in this PR.
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
### Changes to sync API
* Updated `LanceTable` and `LanceDBConnection` reprs
* Add `storage_options`, `data_storage_version`, and
`enable_v2_manifest_paths` to sync create table API.
* Add `storage_options` to `open_table` in sync API.
* Add `list_indices()` and `index_stats()` to sync API
* `create_table()` will now create only 1 version when data is passed.
Previously it would always create two versions: 1 to create an empty
table and 1 to add data to it.
### Changes to async API
* Add `embedding_functions` to async `create_table()` API.
* Added `head()` to async API
### Refactors
* Refactor index parameters into dataclasses so they are easier to use
from Python
* Moved most tests to use an in-memory DB so we don't need to create so
many temp directories
Closes#1792Closes#1932
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Replaces Python implementation of Remote SDK with Rust one.
* Drops dependency on `attrs` and `cachetools`. Makes `requests` an
optional dependency used only for embeddings feature.
* Adds dependency on `nest-asyncio`. This was required to get hybrid
search working.
* Deprecate `request_thread_pool` parameter. We now use the tokio
threadpool.
* Stop caching the `schema` on a remote table. Schema is mutable and
there's no mechanism in place to invalidate the cache.
* Removed the client-side resolution of the vector column. We should
already be resolving this server-side.
BREAKING CHANGE: default tokenizer no longer does stemming or stop-word
removal. Users should explicitly turn that option on in the future.
- upgrade lance to 0.19.1
- update the FTS docs
- update the FTS API
Upstream change notes:
https://github.com/lancedb/lance/releases/tag/v0.19.1
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
BREAKING CHANGE: the return value of `index_stats` method has changed
and all `index_stats` APIs now take index name instead of UUID. Also
several deprecated index statistics methods were removed.
* Removes deprecated methods for individual index statistics
* Aligns public `IndexStatistics` struct with API response from LanceDB
Cloud.
* Implements `index_stats` for remote Rust SDK and Python async API.
Before this we ignored the `fts_columns` parameter, and for now we
support to search on only one column, it could lead to an error if we
have multiple indexed columns for FTS
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>