lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-22 21:09:58 +00:00

Author	SHA1	Message	Date
Mark McCaskey	fe76496a59	fix: `.nprobes` method in python bindings, improve error messages (#2556 ) `nprobes` with a value greater than 20 fails with the minimum error: ``` self = <lancedb.query.AsyncVectorQuery object at 0x10b749720>, minimum_nprobes = 30 def minimum_nprobes(self, minimum_nprobes: int) -> Self: """Set the minimum number of probes to use. See `nprobes` for more details. These partitions will be searched on every indexed vector query and will increase recall at the expense of latency. """ > self._inner.minimum_nprobes(minimum_nprobes) E ValueError: Invalid input, minimum_nprobes must be less than or equal to maximum_nprobes python/lancedb/query.py:2744: ValueError ``` Putting the max set before the min seems reasonable but it causes this reasonable case to fail: ``` def test_nprobes_min_max_works_sync(table): LanceVectorQueryBuilder(table, [0, 0], "vector").minimum_nprobes(2).maximum_nprobes(4).to_list() ``` with ``` self = <lancedb.query.AsyncVectorQuery object at 0x1203f1c90>, maximum_nprobes = 4 def maximum_nprobes(self, maximum_nprobes: int) -> Self: """Set the maximum number of probes to use. See `nprobes` for more details. If this value is greater than `minimum_nprobes` then the excess partitions will be searched only if we have not found enough results. This can be useful when there is a narrow filter to allow these queries to spend more time searching and avoid potential false negatives. If this value is 0 then no limit will be applied and all partitions could be searched if needed to satisfy the limit. """ > self._inner.maximum_nprobes(maximum_nprobes) E ValueError: Invalid input, maximum_nprobes must be greater than or equal to minimum_nprobes python/lancedb/query.py:2761: ValueError ```. The case I care about is where min == max, but this solution handles it even if they're not. If both min and max exist, we set both to the minimum and then set the max. 
This isn't 100% the same as the minimum setter checks for 0 on the min and `.nprobes` does not do any sanity checking at all. But I figured this was the most reasonable and general solution without touching more of this code. As part of this I noticed the error messages were a bit ambiguous so I made them symmetric and clarified them while I was here.	2025-07-30 09:23:25 -07:00
Lance Release	f79295c697	Bump version: 0.24.2-beta.2 → 0.24.2	2025-07-25 20:31:15 +00:00
Lance Release	381fad9b65	Bump version: 0.24.2-beta.1 → 0.24.2-beta.2	2025-07-25 20:31:15 +00:00
Tristan Zajonc	055bf91d3e	fix: handle empty list with schema in table creation (#2548 ) ## Summary Fixes IndexError when creating tables with empty list data and a provided schema. Previously, `_into_pyarrow_reader()` would attempt to access `data[0]` on empty lists, causing an IndexError. Now properly handles empty lists by using the provided schema. Also adds regression tests for GitHub issues #1968 and #303 to prevent future regressions with empty table scenarios. ## Changes - Fix IndexError in `_into_pyarrow_reader()` for empty list + schema case - Add Optional[pa.Schema] parameter to handle empty data gracefully - Add `test_create_table_empty_list_with_schema` for the IndexError fix - Add `test_create_empty_then_add_data` for issue #1968 - Add `test_search_empty_table` for issue #303 ## Test plan - [x] All new regression tests pass - [x] Existing tests continue to pass - [x] Code formatted with `make format`	2025-07-25 10:23:43 +08:00
Tristan Zajonc	10fa23e0d6	fix(python): expose register function in embeddings module (#2544 ) ## Summary Fixes #2541 Problem: The `register` function was not accessible via `from lancedb.embeddings import register` as documented, causing ImportError for users trying to create custom embedding functions. Solution: Added `register` to the exports in `python/lancedb/embeddings/__init__.py` to match the documented API and follow the same pattern as other registry functions (`get_registry`, `EmbeddingFunctionRegistry`). Root Cause: The function existed in `lancedb.embeddings.registry` but wasn't exposed through the main embeddings module interface. ## Changes - Add `register` to imports in `/python/python/lancedb/embeddings/__init__.py` ## Test Plan - [x] Verified `from lancedb.embeddings import register` works as documented - [x] Confirmed existing embedding tests pass - [x] Checked that the fix follows existing patterns (same as `get_registry`) - [x] Validated linting and formatting passes ## References Fixes #2541	2025-07-24 15:30:06 -07:00
yihong	43d9fc28b0	fix: can not build on python3.9 for dev (#2477 ) This patch fixes the dev build failing on Python 3.9. The reason is that the minimum supported version for ibm-watsonx-ai is Python 3.10; more details are available on `pyoven` (https://pyoven.org/package/ibm-watsonx-ai/). It also fixes a tiny Markdown lint issue. --------- Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-07-24 12:39:04 -07:00
aniaan	f45f0d0431	fix(python): correct type annotations in EmbeddingFunctionRegistry (#2478 ) - Fix register() method's alias parameter type from 'str = None' to 'Optional[str] = None' - Add return type annotation 'Type[EmbeddingFunction]' to get() method - Import Type from typing module for proper type hints	2025-07-24 12:31:49 -07:00
Tristan Zajonc	b9e3c36d82	fix: replace broken documentation URLs in error messages (#2533 ) Replaces broken 404 URL and unhelpful documentation links in type error messages with working URL and inline list of supported data types. Before: Points to https://lancedb.github.io/lance/read_and_write.html (404 error) After: Lists supported types inline and points to https://lancedb.github.io/lancedb/guides/tables/	2025-07-24 12:30:27 -07:00
Chen Chongchen	3cd7dd3375	fix: to_pydantic typing (#2517 ) Currently, `to_pydantic` always returns `LanceModel`. If type checking is enabled in my project, I have to use `cast(List[RealModelType], data)` to resolve the type error. This PR uses generics to solve this problem.	2025-07-24 12:30:15 -07:00
Will Jones	3d1f102087	feat: allow Python and Typescript users to create `Session`s (#2530 ) ## Summary - Exposes `Session` in Python and Typescript so users can set the `index_cache_size_bytes` and `metadata_cache_size_bytes` * The `Session` is attached to the `Connection`, and thus shared across all tables in that connection. - Adds deprecation warnings for table-level cache configuration 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-24 12:06:29 -07:00
Tristan Zajonc	81afd8a42f	fix: use local random state in FTS test fixtures to prevent flaky failures (#2532 ) ## Summary Fixes intermittent CI failures in `test_search_fts[False]` where boolean FTS queries were returning fewer results than expected due to non-deterministic test data generation. ## Problem The test was using global `random` and `np.random` without seeding, causing the boolean query `MatchQuery("puppy", "text") & MatchQuery("runs", "text")` to sometimes return only 3 results instead of the expected 5, leading to `AssertionError: assert 3 == 5`. ## Solution - Replace global random calls with local `random.Random(42)` and `np.random.RandomState(42)` objects in test fixtures - Ensures deterministic test data while maintaining test isolation - No impact on other tests since random state is scoped to fixtures only ## Test Results - ✅ `test_search_fts[False]` now passes consistently - ✅ All other FTS tests continue to pass - ✅ No regression in other test suites (verified with `test_basic`) - ✅ Maintains existing test behavior and coverage	2025-07-24 11:30:02 -07:00
Tristan Zajonc	c2aa03615a	fix: correct grammar in LanceDB cloud connection error message (#2537 ) ## Summary Fixed a minor grammar error in the error message for missing API key when connecting to LanceDB cloud. ## Changes - Changed 'api_key is required to connected LanceDB cloud' to 'api_key is required to connect to LanceDB cloud' - Location: `python/python/lancedb/__init__.py:95` ## Test plan - Error message formatting is correct and grammatical - No functional changes to existing behavior	2025-07-24 09:56:06 -07:00
Tristan Zajonc	d2c6759e7f	fix: use import stubs to prevent MLX doctest collection failures (#2536 ) ## Summary - Add `create_import_stub()` helper to `embeddings/utils.py` for handling optional dependencies - Fix MLX doctest collection failures by using import stubs in `gte_mlx_model.py` - Module now imports successfully for doctest collection even when MLX is not installed ## Changes - New utility function: `create_import_stub()` creates placeholder objects that allow class inheritance but raise helpful errors when used - Updated MLX model: Uses import stubs instead of direct imports that fail immediately - Graceful degradation: Clear error messages when MLX functionality is accessed without MLX installed ## Test Results - ✅ `pytest --doctest-modules python/lancedb` now passes (with and without MLX installed) - ✅ All existing tests continue to pass - ✅ MLX functionality works normally when MLX is installed - ✅ Helpful error messages when MLX functionality is used without MLX installed Fixes #2538 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-07-23 16:25:33 -07:00
Will Jones	fbff244ed8	chore: add claude md files (#2531 ) Gives basic context to Claude about how to do common tasks in the repo.	2025-07-23 12:20:36 -07:00
Lance Release	7a15337e03	Bump version: 0.24.2-beta.0 → 0.24.2-beta.1	2025-07-22 15:40:17 +00:00
Lance Release	ce24457531	Bump version: 0.24.1 → 0.24.2-beta.0	2025-07-18 16:02:37 +00:00
BubbleCal	087fe6343d	test: fix random data may break test case (#2514 ) This test adds a new vector and then performs a vector search with a distance range. It may fail if the new vector becomes the closest one to the query vector. Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-18 16:15:06 +08:00
Ayush Chaurasia	f076bb41f4	feat: add support for returning all scores with rerankers (#2509 ) Previously `return_score="all"` was supported only for the default reranker (RRF) and not the model-based rerankers. This adds support for keeping all scores in the base reranker so that all model-based rerankers can use it. It's a slower path than keeping just the relevance score but can be useful for debugging.	2025-07-15 21:03:03 +05:30
BubbleCal	03b62599d7	feat: support ngram tokenizer (#2507 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-15 16:36:08 +08:00
Lance Release	a300a238db	Bump version: 0.24.1-beta.2 → 0.24.1	2025-07-10 21:36:02 +00:00
Lance Release	a41ff1df0a	Bump version: 0.24.1-beta.1 → 0.24.1-beta.2	2025-07-10 21:36:02 +00:00
CyrusAttoun	167fccc427	fix: change 'return' to 'raise' for unimplemented remote table function (#2484 ) just noticed that we're doing a 'return' instead of a 'raise' while trying to get remote functionality working for my project. I went ahead and implemented tests for both of the unimplemented functions (to_pandas and to_arrow) while I was in there. --------- Co-authored-by: Cyrus Attoun <jattoun1@gmail.com>	2025-07-09 14:27:08 -07:00
Lance Release	905552f993	Bump version: 0.24.1-beta.0 → 0.24.1-beta.1	2025-07-09 05:53:28 +00:00
BubbleCal	cab36d94b2	feat: support to specify num_partitions and num_bits (#2488 )	2025-07-09 11:36:09 +08:00
Lance Release	d4bb59b542	Bump version: 0.24.0 → 0.24.1-beta.0	2025-07-07 21:00:38 +00:00
Weston Pace	1dadb2aefa	feat: upgrade to lance 0.31.0-beta.1 (#2469 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Chores * Updated dependencies to newer versions for improved compatibility and stability. * Refactor * Improved internal handling of data ranges and stream lifetimes for enhanced performance and reliability. * Simplified code style for Python query object conversions without affecting functionality. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-06-30 11:10:53 -07:00
Haoyu Weng	eb9784d7f2	feat(python): batch Ollama embed calls (#2453 ) Other embedding integrations such as Cohere and OpenAI already send requests in batches. We should do that for Ollama too to improve throughput. The Ollama [`.embed` API](`63ca747622/ollama/_client.py (L359-L378)`) was added in version 0.3.0 (almost a year ago) so I updated the version requirement in pyproject. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Bug Fixes - Improved compatibility with newer versions of the "ollama" package by requiring version 0.3.0 or higher. - Enhanced embedding generation to process batches of texts more efficiently and reliably. - Refactor - Improved type consistency and clarity for embedding-related methods. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-06-30 08:28:14 -07:00
Will Jones	4beb2d2877	fix(python): make sure `explain_plan` works with FTS queries (#2466 ) ## Summary Fixes issue #2465 where FTS explain plans only showed basic `LanceScan` instead of detailed execution plans with FTS query details, limits, and offsets. ## Root Cause The `FTSQuery::explain_plan()` and `analyze_plan()` methods were missing the `.full_text_search()` call before calling explain/analyze plan, causing them to operate on the base query without FTS context. ## Changes - Fixed `explain_plan()` and `analyze_plan()` in `src/query.rs` to call `.full_text_search()` - Added comprehensive test coverage for FTS explain plans with limits, offsets, and filters - Updated existing tests to expect correct behavior instead of buggy behavior ## Before/After Before (broken): ``` LanceScan: uri=..., projection=[...], row_id=false, row_addr=false, ordered=true ``` After (fixed): ``` ProjectionExec: expr=[id@2 as id, text@3 as text, _score@1 as _score] Take: columns="_rowid, _score, (id), (text)" CoalesceBatchesExec: target_batch_size=1024 GlobalLimitExec: skip=2, fetch=4 MatchQuery: query=test ``` ## Test Plan - [x] All new FTS explain plan tests pass - [x] Existing tests continue to pass - [x] FTS queries now show proper execution plans with MatchQuery, limits, filters Closes #2465 🤖 Generated with [Claude Code](https://claude.ai/code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Tests * Added new test cases to verify explain plan output for full-text search, vector queries with pagination, and queries with filters. * Bug Fixes * Improved the accuracy of explain plan and analysis output for full-text search queries, ensuring the correct query details are reflected. * Refactor * Enhanced the formatting and hierarchical structure of execution plans for hybrid queries, providing clearer and more detailed plan representations. 
<!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-06-26 23:35:14 -07:00
Lance Release	c625b6f2b2	Bump version: 0.24.0-beta.0 → 0.24.0	2025-06-20 05:46:05 +00:00
Lance Release	bec8fe6547	Bump version: 0.23.1-beta.2 → 0.24.0-beta.0	2025-06-20 05:46:04 +00:00
BubbleCal	cb70ff8cee	feat!: switch default FTS to native lance FTS (#2428 ) This switches the default FTS to the native Lance FTS for the Python sync table API; the other APIs have already switched to the native implementation. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - The default behavior for creating a full-text search index now uses the new implementation rather than the legacy one. - Bug Fixes - Improved handling and error messages for phrase queries in full-text search. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-06-19 10:38:34 +08:00
BubbleCal	cbb5a841b1	feat: support prefix matching and must_not clause (#2441 )	2025-06-19 10:32:32 +08:00
Lance Release	e5a80a5e86	Bump version: 0.23.1-beta.1 → 0.23.1-beta.2	2025-06-18 23:33:05 +00:00
Lance Release	e08d45e090	Bump version: 0.23.1-beta.0 → 0.23.1-beta.1	2025-06-17 23:22:00 +00:00
Wyatt Alt	627ca4c810	chore: update lance to v0.29.1-beta.2 (#2442 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated internal dependencies to use a newer version of the Lance library. - New Features - Added support for a new query occurrence type labeled "MUST NOT" in search filters. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-06-17 14:02:13 -07:00
Lance Release	9eb6119468	Bump version: 0.23.0 → 0.23.1-beta.0	2025-06-16 16:29:22 +00:00
Weston Pace	59b57e30ed	feat: add maximum and minimum nprobes properties (#2430 ) This exposes the maximum_nprobes and minimum_nprobes feature that was added in https://github.com/lancedb/lance/pull/3903 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for specifying minimum and maximum probe counts in vector search queries, allowing finer control over search behavior. - Users can now independently set minimum and maximum probes for vector and hybrid queries via new methods and parameters in Python, Node.js, and Rust APIs. - Bug Fixes - Improved parameter validation to ensure correct usage of minimum and maximum probe values. - Tests - Expanded test coverage to validate correct handling, serialization, and error cases for the new probe parameters. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-06-13 15:18:29 -07:00
BubbleCal	84ded9d678	feat: support new FTS features in python SDK (#2411 ) - AND operator - phrase query slop param - boolean query <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for combining full-text search queries using AND/OR operators, enabling more flexible query composition. - Introduced new query types and parameters, including boolean queries, operator selection, occurrence constraints, and phrase slop for advanced search scenarios. - Enhanced asynchronous search to accept rich full-text query objects directly. - Bug Fixes - Improved handling and validation of full-text search queries in both synchronous and asynchronous search operations. - Tests - Updated and expanded tests to cover new full-text query types and their usage in search functions. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-06-06 14:33:46 +08:00
Lance Release	8f42b5874e	Bump version: 0.23.0-beta.3 → 0.23.0	2025-06-04 21:07:39 +00:00
Lance Release	274f19f560	Bump version: 0.23.0-beta.2 → 0.23.0-beta.3	2025-06-04 21:07:38 +00:00
Lance Release	20e017fedc	Bump version: 0.23.0-beta.1 → 0.23.0-beta.2	2025-06-04 07:13:44 +00:00
Jack Ye	74e578b3c8	feat: upgrade lance to v0.29.0-beta.2 (#2419 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated various internal dependencies to newer versions for improved stability and compatibility. - Increased the version number for the Python package. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-06-03 15:16:26 -07:00
Lance Release	8825c7c1dd	Bump version: 0.23.0-beta.0 → 0.23.0-beta.1	2025-06-03 16:26:58 +00:00
Lance Release	5c7303ab2e	Bump version: 0.22.2-beta.0 → 0.23.0-beta.0	2025-05-31 03:47:13 +00:00
Will Jones	5895ef4039	ci: revert unnecessary version bump (#2415 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Downgraded version numbers for the Node.js, Python, and Rust packages. No other user-facing changes were made. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-30 16:51:14 -07:00
BubbleCal	5c7f63388d	feat!: upgrade lance to v0.28.0 (#2404 ) this introduces some breaking changes in terms of rust API of creating FTS index, and the default index params changed Signed-off-by: BubbleCal <bubble-cal@outlook.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Updated default settings for full-text search (FTS) index creation: stemming, stop word removal, and ASCII folding are now enabled by default, while token position storage is disabled by default. - Refactor - Simplified and streamlined the configuration and handling of FTS index parameters for improved maintainability and consistency across interfaces. - Enhanced serialization and request construction for FTS index parameters to reduce manual handling and improve code clarity. - Improved test coverage by explicitly enabling positional indexing in FTS tests to support phrase queries. - Chores - Upgraded all internal dependencies related to FTS indexing to the latest version for enhanced compatibility and performance. - Updated package versions for Node.js, Python, and Rust components to the latest beta releases. - Improved CI workflows by adding Rust toolchain setup with formatting and linting tools. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-29 15:19:24 -07:00
Renato Marroquin	d0bc671cac	docs: add example for querying a lance table with SQL (#2389 ) Adds example for querying a dataset with SQL <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Documentation - Added new guides on querying LanceDB tables using SQL with DuckDB and Apache Datafusion. - Included detailed instructions for integrating LanceDB with Datafusion in Python. - Updated navigation to include Datafusion and SQL querying documentation. - Improved formatting in TypeScript and vectordb update examples for consistency. - Tests - Added a new test demonstrating SQL querying on Lance tables via DataFusion integration. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-05-29 06:14:38 -07:00
Lance Release	d7a9dbb9fc	Bump version: 0.22.1 → 0.22.2-beta.0	2025-05-23 21:58:17 +00:00
Lance Release	745c34a6a9	Bump version: 0.22.1-beta.6 → 0.22.1	2025-05-22 05:57:20 +00:00
Lance Release	db8fa2454d	Bump version: 0.22.1-beta.5 → 0.22.1-beta.6	2025-05-22 05:57:20 +00:00
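
Commit fe76496a59 in the log above describes how validated `minimum_nprobes` and `maximum_nprobes` setters can reject a reasonable call depending on the order in which they run. The sketch below reproduces that ordering problem with a toy class and shows one order-independent way to set both bounds to the same value. `ProbeRange`, its method names, and the default of 20 are hypothetical stand-ins (the default is only inferred from the commit's remark that values above 20 fail). The strategy shown is not necessarily the one the commit implements, and it ignores the special meaning of a maximum of 0.

```python
class ProbeRange:
    """Toy stand-in for a query builder with validated min/max setters."""

    def __init__(self, minimum=20, maximum=20):
        # Assumed defaults, for illustration only.
        self.minimum = minimum
        self.maximum = maximum

    def set_minimum(self, value):
        # Reject a minimum that would exceed the current maximum.
        if value > self.maximum:
            raise ValueError(
                "minimum_nprobes must be less than or equal to maximum_nprobes"
            )
        self.minimum = value

    def set_maximum(self, value):
        # Reject a maximum that would fall below the current minimum.
        if value < self.minimum:
            raise ValueError(
                "maximum_nprobes must be greater than or equal to minimum_nprobes"
            )
        self.maximum = value

    def set_nprobes(self, value):
        """Set both bounds to `value` without tripping either check."""
        if value > self.maximum:
            # Raising the range: widen the maximum first, then the minimum.
            self.set_maximum(value)
            self.set_minimum(value)
        else:
            # Lowering the range: drop the minimum first, then the maximum.
            self.set_minimum(value)
            self.set_maximum(value)


query = ProbeRange()
query.set_nprobes(30)  # would fail if the minimum were set first
query.set_nprobes(5)   # would fail if the maximum were set first
```

Choosing the setter order from the direction of the change keeps both validation checks in place while letting a single call move the range either way.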


857 Commits
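
Commit 81afd8a42f in the log above replaces unseeded global random calls with local, seeded generators in test fixtures. A minimal sketch of that pattern, using only the standard library, might look like the following. `WORDS` and `make_sentences` are hypothetical names chosen for illustration and are not the repository's actual fixtures.

```python
import random

# Hypothetical vocabulary for generating test documents.
WORDS = ["puppy", "runs", "car", "sleeps", "fast"]


def make_sentences(n, seed=42):
    """Build n pseudo-random sentences from a generator local to this call."""
    rng = random.Random(seed)  # local generator; the global state is untouched
    return [" ".join(rng.choice(WORDS) for _ in range(4)) for _ in range(n)]


first = make_sentences(10)
random.seed(123)   # unrelated code reseeding the global generator...
random.random()    # ...and consuming values from it
second = make_sentences(10)

# The fixture data is identical regardless of what happened to global state.
assert first == second
```

Because the generator is created inside the function, the data is deterministic across runs and the fixture neither depends on nor disturbs other tests that use the module-level `random` functions.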