lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-23 05:19:58 +00:00

Author	SHA1	Message	Date
Tristan Zajonc	055bf91d3e	fix: handle empty list with schema in table creation (#2548 ) ## Summary Fixes IndexError when creating tables with empty list data and a provided schema. Previously, `_into_pyarrow_reader()` would attempt to access `data[0]` on empty lists, causing an IndexError. Now properly handles empty lists by using the provided schema. Also adds regression tests for GitHub issues #1968 and #303 to prevent future regressions with empty table scenarios. ## Changes - Fix IndexError in `_into_pyarrow_reader()` for empty list + schema case - Add Optional[pa.Schema] parameter to handle empty data gracefully - Add `test_create_table_empty_list_with_schema` for the IndexError fix - Add `test_create_empty_then_add_data` for issue #1968 - Add `test_search_empty_table` for issue #303 ## Test plan - [x] All new regression tests pass - [x] Existing tests continue to pass - [x] Code formatted with `make format`	2025-07-25 10:23:43 +08:00
Tristan Zajonc	10fa23e0d6	fix(python): expose register function in embeddings module (#2544 ) ## Summary Fixes #2541 Problem: The `register` function was not accessible via `from lancedb.embeddings import register` as documented, causing ImportError for users trying to create custom embedding functions. Solution: Added `register` to the exports in `python/lancedb/embeddings/__init__.py` to match the documented API and follow the same pattern as other registry functions (`get_registry`, `EmbeddingFunctionRegistry`). Root Cause: The function existed in `lancedb.embeddings.registry` but wasn't exposed through the main embeddings module interface. ## Changes - Add `register` to imports in `/python/python/lancedb/embeddings/__init__.py` ## Test Plan - [x] Verified `from lancedb.embeddings import register` works as documented - [x] Confirmed existing embedding tests pass - [x] Checked that the fix follows existing patterns (same as `get_registry`) - [x] Validated linting and formatting passes ## References Fixes #2541	2025-07-24 15:30:06 -07:00
yihong	43d9fc28b0	fix: can not build on python3.9 for dev (#2477 ) This patch fixes the failure to build on Python 3.9 for dev. The cause is that ibm-watsonx-ai requires Python 3.10 or later; see `pyoven` for details: https://pyoven.org/package/ibm-watsonx-ai/ It also fixes a minor Markdown lint issue. --------- Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-07-24 12:39:04 -07:00
aniaan	f45f0d0431	fix(python): correct type annotations in EmbeddingFunctionRegistry (#2478 ) - Fix register() method's alias parameter type from 'str = None' to 'Optional[str] = None' - Add return type annotation 'Type[EmbeddingFunction]' to get() method - Import Type from typing module for proper type hints	2025-07-24 12:31:49 -07:00
Tristan Zajonc	b9e3c36d82	fix: replace broken documentation URLs in error messages (#2533 ) Replaces broken 404 URL and unhelpful documentation links in type error messages with working URL and inline list of supported data types. Before: Points to https://lancedb.github.io/lance/read_and_write.html (404 error) After: Lists supported types inline and points to https://lancedb.github.io/lancedb/guides/tables/	2025-07-24 12:30:27 -07:00
Chen Chongchen	3cd7dd3375	fix: to_pydantic typing (#2517 ) Currently, to_pydantic always returns LanceModel. If type checking is enabled in a project, the caller has to use `cast(List[RealModelType], data)` to resolve the type error. This PR uses generics to solve this problem.	2025-07-24 12:30:15 -07:00
Will Jones	3d1f102087	feat: allow Python and Typescript users to create `Session`s (#2530 ) ## Summary - Exposes `Session` in Python and Typescript so users can set the `index_cache_size_bytes` and `metadata_cache_size_bytes` * The `Session` is attached to the `Connection`, and thus shared across all tables in that connection. - Adds deprecation warnings for table-level cache configuration 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-24 12:06:29 -07:00
Tristan Zajonc	81afd8a42f	fix: use local random state in FTS test fixtures to prevent flaky failures (#2532 ) ## Summary Fixes intermittent CI failures in `test_search_fts[False]` where boolean FTS queries were returning fewer results than expected due to non-deterministic test data generation. ## Problem The test was using global `random` and `np.random` without seeding, causing the boolean query `MatchQuery("puppy", "text") & MatchQuery("runs", "text")` to sometimes return only 3 results instead of the expected 5, leading to `AssertionError: assert 3 == 5`. ## Solution - Replace global random calls with local `random.Random(42)` and `np.random.RandomState(42)` objects in test fixtures - Ensures deterministic test data while maintaining test isolation - No impact on other tests since random state is scoped to fixtures only ## Test Results - ✅ `test_search_fts[False]` now passes consistently - ✅ All other FTS tests continue to pass - ✅ No regression in other test suites (verified with `test_basic`) - ✅ Maintains existing test behavior and coverage	2025-07-24 11:30:02 -07:00
Tristan Zajonc	c2aa03615a	fix: correct grammar in LanceDB cloud connection error message (#2537 ) ## Summary Fixed a minor grammar error in the error message for missing API key when connecting to LanceDB cloud. ## Changes - Changed 'api_key is required to connected LanceDB cloud' to 'api_key is required to connect to LanceDB cloud' - Location: `python/python/lancedb/__init__.py:95` ## Test plan - Error message formatting is correct and grammatical - No functional changes to existing behavior	2025-07-24 09:56:06 -07:00
Tristan Zajonc	d2c6759e7f	fix: use import stubs to prevent MLX doctest collection failures (#2536 ) ## Summary - Add `create_import_stub()` helper to `embeddings/utils.py` for handling optional dependencies - Fix MLX doctest collection failures by using import stubs in `gte_mlx_model.py` - Module now imports successfully for doctest collection even when MLX is not installed ## Changes - New utility function: `create_import_stub()` creates placeholder objects that allow class inheritance but raise helpful errors when used - Updated MLX model: Uses import stubs instead of direct imports that fail immediately - Graceful degradation: Clear error messages when MLX functionality is accessed without MLX installed ## Test Results - ✅ `pytest --doctest-modules python/lancedb` now passes (with and without MLX installed) - ✅ All existing tests continue to pass - ✅ MLX functionality works normally when MLX is installed - ✅ Helpful error messages when MLX functionality is used without MLX installed Fixes #2538 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-07-23 16:25:33 -07:00
Will Jones	fbff244ed8	chore: add claude md files (#2531 ) Gives basic context to Claude about how to do common tasks in the repo.	2025-07-23 12:20:36 -07:00
Lance Release	7a15337e03	Bump version: 0.24.2-beta.0 → 0.24.2-beta.1	2025-07-22 15:40:17 +00:00
Lance Release	ce24457531	Bump version: 0.24.1 → 0.24.2-beta.0	2025-07-18 16:02:37 +00:00
BubbleCal	087fe6343d	test: fix random data may break test case (#2514 ) This test adds a new vector and then performs a vector search with a distance range. It may fail if the new vector becomes the closest one to the query vector. Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-18 16:15:06 +08:00
Ayush Chaurasia	f076bb41f4	feat: add support for returning all scores with rerankers (#2509 ) Previously, `return_score="all"` was supported only for the default reranker (RRF) and not for the model-based rerankers. This adds support for keeping all scores in the base reranker so that all model-based rerankers can use it. It's a slower path than keeping just the relevance score, but it can be useful for debugging.	2025-07-15 21:03:03 +05:30
BubbleCal	03b62599d7	feat: support ngram tokenizer (#2507 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-15 16:36:08 +08:00
Lance Release	a300a238db	Bump version: 0.24.1-beta.2 → 0.24.1	2025-07-10 21:36:02 +00:00
Lance Release	a41ff1df0a	Bump version: 0.24.1-beta.1 → 0.24.1-beta.2	2025-07-10 21:36:02 +00:00
CyrusAttoun	167fccc427	fix: change 'return' to 'raise' for unimplemented remote table function (#2484 ) just noticed that we're doing a 'return' instead of a 'raise' while trying to get remote functionality working for my project. I went ahead and implemented tests for both of the unimplemented functions (to_pandas and to_arrow) while I was in there. --------- Co-authored-by: Cyrus Attoun <jattoun1@gmail.com>	2025-07-09 14:27:08 -07:00
Lance Release	905552f993	Bump version: 0.24.1-beta.0 → 0.24.1-beta.1	2025-07-09 05:53:28 +00:00
BubbleCal	cab36d94b2	feat: support to specify num_partitions and num_bits (#2488 )	2025-07-09 11:36:09 +08:00
Lance Release	d4bb59b542	Bump version: 0.24.0 → 0.24.1-beta.0	2025-07-07 21:00:38 +00:00
Weston Pace	1dadb2aefa	feat: upgrade to lance 0.31.0-beta.1 (#2469 ) ## Summary by CodeRabbit * Chores * Updated dependencies to newer versions for improved compatibility and stability. * Refactor * Improved internal handling of data ranges and stream lifetimes for enhanced performance and reliability. * Simplified code style for Python query object conversions without affecting functionality.	2025-06-30 11:10:53 -07:00
Haoyu Weng	eb9784d7f2	feat(python): batch Ollama embed calls (#2453 ) Other embedding integrations such as Cohere and OpenAI already send requests in batches. We should do that for Ollama too to improve throughput. The Ollama [`.embed` API](`63ca747622/ollama/_client.py (L359-L378)`) was added in version 0.3.0 (almost a year ago) so I updated the version requirement in pyproject. ## Summary by CodeRabbit - Bug Fixes - Improved compatibility with newer versions of the "ollama" package by requiring version 0.3.0 or higher. - Enhanced embedding generation to process batches of texts more efficiently and reliably. - Refactor - Improved type consistency and clarity for embedding-related methods.	2025-06-30 08:28:14 -07:00
Will Jones	4beb2d2877	fix(python): make sure `explain_plan` works with FTS queries (#2466 ) ## Summary Fixes issue #2465 where FTS explain plans only showed basic `LanceScan` instead of detailed execution plans with FTS query details, limits, and offsets. ## Root Cause The `FTSQuery::explain_plan()` and `analyze_plan()` methods were missing the `.full_text_search()` call before calling explain/analyze plan, causing them to operate on the base query without FTS context. ## Changes - Fixed `explain_plan()` and `analyze_plan()` in `src/query.rs` to call `.full_text_search()` - Added comprehensive test coverage for FTS explain plans with limits, offsets, and filters - Updated existing tests to expect correct behavior instead of buggy behavior ## Before/After Before (broken): ``` LanceScan: uri=..., projection=[...], row_id=false, row_addr=false, ordered=true ``` After (fixed): ``` ProjectionExec: expr=[id@2 as id, text@3 as text, _score@1 as _score] Take: columns="_rowid, _score, (id), (text)" CoalesceBatchesExec: target_batch_size=1024 GlobalLimitExec: skip=2, fetch=4 MatchQuery: query=test ``` ## Test Plan - [x] All new FTS explain plan tests pass - [x] Existing tests continue to pass - [x] FTS queries now show proper execution plans with MatchQuery, limits, filters Closes #2465 🤖 Generated with [Claude Code](https://claude.ai/code) ## Summary by CodeRabbit * Tests * Added new test cases to verify explain plan output for full-text search, vector queries with pagination, and queries with filters. * Bug Fixes * Improved the accuracy of explain plan and analysis output for full-text search queries, ensuring the correct query details are reflected. * Refactor * Enhanced the formatting and hierarchical structure of execution plans for hybrid queries, providing clearer and more detailed plan representations. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-06-26 23:35:14 -07:00
Lance Release	c625b6f2b2	Bump version: 0.24.0-beta.0 → 0.24.0	2025-06-20 05:46:05 +00:00
Lance Release	bec8fe6547	Bump version: 0.23.1-beta.2 → 0.24.0-beta.0	2025-06-20 05:46:04 +00:00
BubbleCal	cb70ff8cee	feat!: switch default FTS to native lance FTS (#2428 ) This switches the default FTS to native lance FTS for the Python sync table API; the other APIs have already switched to the native implementation. ## Summary by CodeRabbit - New Features - The default behavior for creating a full-text search index now uses the new implementation rather than the legacy one. - Bug Fixes - Improved handling and error messages for phrase queries in full-text search. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-06-19 10:38:34 +08:00
BubbleCal	cbb5a841b1	feat: support prefix matching and must_not clause (#2441 )	2025-06-19 10:32:32 +08:00
Lance Release	e5a80a5e86	Bump version: 0.23.1-beta.1 → 0.23.1-beta.2	2025-06-18 23:33:05 +00:00
Lance Release	e08d45e090	Bump version: 0.23.1-beta.0 → 0.23.1-beta.1	2025-06-17 23:22:00 +00:00
Wyatt Alt	627ca4c810	chore: update lance to v0.29.1-beta.2 (#2442 ) ## Summary by CodeRabbit - Chores - Updated internal dependencies to use a newer version of the Lance library. - New Features - Added support for a new query occurrence type labeled "MUST NOT" in search filters.	2025-06-17 14:02:13 -07:00
Lance Release	9eb6119468	Bump version: 0.23.0 → 0.23.1-beta.0	2025-06-16 16:29:22 +00:00
Weston Pace	59b57e30ed	feat: add maximum and minimum nprobes properties (#2430 ) This exposes the maximum_nprobes and minimum_nprobes feature that was added in https://github.com/lancedb/lance/pull/3903 ## Summary by CodeRabbit - New Features - Added support for specifying minimum and maximum probe counts in vector search queries, allowing finer control over search behavior. - Users can now independently set minimum and maximum probes for vector and hybrid queries via new methods and parameters in Python, Node.js, and Rust APIs. - Bug Fixes - Improved parameter validation to ensure correct usage of minimum and maximum probe values. - Tests - Expanded test coverage to validate correct handling, serialization, and error cases for the new probe parameters.	2025-06-13 15:18:29 -07:00
BubbleCal	84ded9d678	feat: support new FTS features in python SDK (#2411 ) - AND operator - phrase query slop param - boolean query ## Summary by CodeRabbit - New Features - Added support for combining full-text search queries using AND/OR operators, enabling more flexible query composition. - Introduced new query types and parameters, including boolean queries, operator selection, occurrence constraints, and phrase slop for advanced search scenarios. - Enhanced asynchronous search to accept rich full-text query objects directly. - Bug Fixes - Improved handling and validation of full-text search queries in both synchronous and asynchronous search operations. - Tests - Updated and expanded tests to cover new full-text query types and their usage in search functions. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-06-06 14:33:46 +08:00
Lance Release	8f42b5874e	Bump version: 0.23.0-beta.3 → 0.23.0	2025-06-04 21:07:39 +00:00
Lance Release	274f19f560	Bump version: 0.23.0-beta.2 → 0.23.0-beta.3	2025-06-04 21:07:38 +00:00
Lance Release	20e017fedc	Bump version: 0.23.0-beta.1 → 0.23.0-beta.2	2025-06-04 07:13:44 +00:00
Jack Ye	74e578b3c8	feat: upgrade lance to v0.29.0-beta.2 (#2419 ) ## Summary by CodeRabbit - Chores - Updated various internal dependencies to newer versions for improved stability and compatibility. - Increased the version number for the Python package.	2025-06-03 15:16:26 -07:00
Lance Release	8825c7c1dd	Bump version: 0.23.0-beta.0 → 0.23.0-beta.1	2025-06-03 16:26:58 +00:00
Lance Release	5c7303ab2e	Bump version: 0.22.2-beta.0 → 0.23.0-beta.0	2025-05-31 03:47:13 +00:00
Will Jones	5895ef4039	ci: revert unnecessary version bump (#2415 ) ## Summary by CodeRabbit - Chores - Downgraded version numbers for the Node.js, Python, and Rust packages. No other user-facing changes were made.	2025-05-30 16:51:14 -07:00
BubbleCal	5c7f63388d	feat!: upgrade lance to v0.28.0 (#2404 ) This introduces some breaking changes to the Rust API for creating an FTS index, and the default index params have changed. Signed-off-by: BubbleCal <bubble-cal@outlook.com> ## Summary by CodeRabbit - New Features - Updated default settings for full-text search (FTS) index creation: stemming, stop word removal, and ASCII folding are now enabled by default, while token position storage is disabled by default. - Refactor - Simplified and streamlined the configuration and handling of FTS index parameters for improved maintainability and consistency across interfaces. - Enhanced serialization and request construction for FTS index parameters to reduce manual handling and improve code clarity. - Improved test coverage by explicitly enabling positional indexing in FTS tests to support phrase queries. - Chores - Upgraded all internal dependencies related to FTS indexing to the latest version for enhanced compatibility and performance. - Updated package versions for Node.js, Python, and Rust components to the latest beta releases. - Improved CI workflows by adding Rust toolchain setup with formatting and linting tools. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-29 15:19:24 -07:00
Renato Marroquin	d0bc671cac	docs: add example for querying a lance table with SQL (#2389 ) Adds example for querying a dataset with SQL ## Summary by CodeRabbit - Documentation - Added new guides on querying LanceDB tables using SQL with DuckDB and Apache Datafusion. - Included detailed instructions for integrating LanceDB with Datafusion in Python. - Updated navigation to include Datafusion and SQL querying documentation. - Improved formatting in TypeScript and vectordb update examples for consistency. - Tests - Added a new test demonstrating SQL querying on Lance tables via DataFusion integration. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-05-29 06:14:38 -07:00
Lance Release	d7a9dbb9fc	Bump version: 0.22.1 → 0.22.2-beta.0	2025-05-23 21:58:17 +00:00
Lance Release	745c34a6a9	Bump version: 0.22.1-beta.6 → 0.22.1	2025-05-22 05:57:20 +00:00
Lance Release	db8fa2454d	Bump version: 0.22.1-beta.5 → 0.22.1-beta.6	2025-05-22 05:57:20 +00:00
Lance Release	e3f2fd3892	Bump version: 0.22.1-beta.4 → 0.22.1-beta.5	2025-05-15 23:42:46 +00:00
Lance Release	04f962f6b0	Bump version: 0.22.1-beta.3 → 0.22.1-beta.4	2025-05-08 20:18:40 +00:00
Will Jones	272e4103b2	feat: provide timeout parameter for merge_insert (#2378 ) Provides the ability to set a timeout for merge insert. The default underlying timeout is however long the first attempt takes, or if there are multiple attempts, 30 seconds. This has two use cases: 1. Make the timeout shorter, when you want to fail if it takes too long. 2. Allow taking more time to do retries. ## Summary by CodeRabbit - New Features - Added support for specifying a timeout when performing merge insert operations in Python, Node.js, and Rust APIs. - Introduced a new option to control the maximum allowed execution time for merge inserts, including retry timeout handling. - Documentation - Updated and added documentation to describe the new timeout option and its usage in APIs. - Tests - Added and updated tests to verify correct timeout behavior during merge insert operations.	2025-05-08 13:07:05 -07:00
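
Commit 81afd8a42f (#2532) in the log above replaces global `random` calls in test fixtures with a local `random.Random(42)`. A minimal sketch of that pattern, using only the standard library, might look like the following. The helper name `make_fixture_sentences` and the vocabulary are hypothetical and are not taken from the LanceDB test suite.

```python
import random


def make_fixture_sentences(n, seed=42):
    """Build n pseudo-random sentences from a private, seeded generator."""
    # Hypothetical vocabulary, chosen only for illustration.
    vocab = ["puppy", "runs", "car", "sleeps", "fast", "slow"]
    # A local Random instance keeps its own state, so seeding or drawing from
    # the module-level `random` elsewhere cannot change what it produces.
    rng = random.Random(seed)
    return [" ".join(rng.choice(vocab) for _ in range(4)) for _ in range(n)]


first = make_fixture_sentences(5)
random.seed(0)    # disturb the global generator between the two calls
random.random()
second = make_fixture_sentences(5)
print(first == second)
```

Because the generator is scoped to the helper, the fixture data is the same on every run, and other tests that use the global generator are unaffected.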

854 Commits