lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-25 06:19:57 +00:00

Author	SHA1	Message	Date
Will Jones	cee2b5ea42	chore: upgrade pyarrow pin (#2192 )	2025-05-05 11:23:13 -07:00
    Closes #2191

    ## Summary by CodeRabbit
    - Chores
      - Updated the required version of the pyarrow package to version 16 or higher.
      - Adjusted automated testing workflows to install pyarrow version 16 for compatibility checks.
Magnus	4f07fea6df	feat: add ColPali embedding support with MultiVector type (#2170 )	2025-04-21 11:47:37 +08:00
    This PR adds ColPali support with a ColPaliEmbeddings class (tagged "colpali") that uses ColQwen2.5 for multi-vector text/image embeddings. It also adds a MultiVector Pydantic type to handle the vector lists. I've added some integration tests for the embedding model and some unit tests for the new Pydantic type. This could serve as a template for other ColPali variants as well, or as a stopgap until 🤗 transformers starts supporting ColPali.

    Still `TODO`:
    - [ ] Documentation
    - [ ] Add an example

    _Images could also be allowed as queries, but this didn't work well in testing._

    [ColPali-Engine](https://github.com/illuin-tech/colpali) version: 0.3.9.dev17+g3faee24

    ## Summary by CodeRabbit
    - New Features
      - Introduced support for ColPali-based multimodal multi-vector embeddings for both text and images.
      - Added a new embedding class for generating multi-vector embeddings, configurable for various model and processing options.
      - Added a new Pydantic type for multi-vector embeddings, supporting validation and schema generation for lists of fixed-dimension vectors.
    - Bug Fixes
      - Ensured proper asynchronous index creation in query tests for improved reliability.
    - Tests
      - Added integration tests for ColPali embeddings, including text-to-image search and validation of multi-vector fields.
      - Added comprehensive tests for the new multi-vector Pydantic type, covering schema, validation, and default value behavior.
    - Chores
      - Updated optional dependencies to include the ColPali engine.
      - Added a utility to check for availability of flash attention support.
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342 )	2025-04-17 15:14:36 -07:00
    ## Summary by CodeRabbit
    - New Features
      - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency.
    - Bug Fixes
      - Ensured prewarming an index does not interfere with subsequent search operations.
    - Tests
      - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js.
    - Chores
      - Updated dependencies for improved compatibility and performance.

    Co-authored-by: Lu Qiu <luqiujob@gmail.com>
PhorstenkampFuzzy	a6fa69ab89	fix(python): add pylance as its own optional dependency (#2336 )	2025-04-14 09:28:16 -07:00
    This change allows the pylance dependency to be managed centrally, so users don't need to monitor its compatibility manually.

    ## Summary by CodeRabbit
    - New Features
      - Introduced an optional dependency that enhances development support. Users can now benefit from improved static analysis capabilities when installing the recommended version (0.23.2 or later).
fzowl	30ed8c4c43	fix: voyageai regression multimodal supersedes text models (#2268 ) fix #2160	2025-04-04 14:45:56 -07:00
Lei Xu	a38f784081	chore: add numpy as dependency (#2308 )	2025-04-04 10:33:39 -07:00
Will Jones	b2a38ac366	fix: make pylance optional again (#2209 )	2025-03-21 11:26:32 -07:00
    The two remaining blockers were:
    * A method `with_embeddings` that was deprecated a year ago
    * A typecheck for `LanceDataset`
Weston Pace	3966b16b63	fix: restore pylance as mandatory dependency (#2204 ) We attempted to make pylance optional in https://github.com/lancedb/lancedb/pull/2156 but it appears this did not quite work. Users are unable to use lancedb from a fresh install. This reverts the optional-ness so we can get back in a working state while we fix the issue.	2025-03-11 06:13:52 -07:00
Weston Pace	4a47150ae7	feat: upgrade to lance 0.24.1 (#2199 )	2025-03-10 15:18:37 -07:00
msu-reevo	cc81f3e1a5	fix(python): typing (#2167 )	2025-03-10 09:01:23 -07:00
    @wjones127 is there a standard way you set up your virtualenv? I can either relist all the dependencies in the pyright pre-commit section, specify a venv, or require the user to be in the virtual environment when they run `git commit`. If the venv location were standardized, or a Python manager like `uv` were used, it would be easier to avoid duplicating the pyright dependency list.

    Per your suggestion, in `pyproject.toml` I added all the passing files to the `includes` section. For ruff, I upgraded the version and removed "TCH", which doesn't exist as an option. I added a `pyright_report.csv` containing a list of all files sorted by pyright error count (ascending) as a to-do list to work on.

    I fixed about 30 issues in `table.py` stemming from plain `str`s being passed into methods that required a string from a set of string `Literal`s, by extracting those literals into `types.py`.

    Can you verify in the Rust bridge that `schema` should be a property and not a method here? If it's a method, then there's another place in the code where `inner.schema` should be `inner.schema()`.

    ```python
    import pyarrow as pa

    class RecordBatchStream:
        @property
        def schema(self) -> pa.Schema: ...
    ```

    Also, unless the `_lancedb.pyi` file is wrong, `_inner` has no `__anext__` when it's not an `AsyncGenerator`; only `next` is defined:

    ```python
    async def __anext__(self) -> pa.RecordBatch:
        # Replaces the unconditional `return await self._inner.__anext__()`, which
        # fails when _inner is a RecordBatchStream (only `next` is defined there).
        if isinstance(self._inner, AsyncGenerator):
            batch = await self._inner.__anext__()
        else:
            batch = await self._inner.next()
        if batch is None:
            raise StopAsyncIteration
        return batch
    ```

    In the `else` branch, `_inner` is a `RecordBatchStream`:

    ```python
    from typing import Optional

    import pyarrow as pa

    class RecordBatchStream:
        @property
        def schema(self) -> pa.Schema: ...
        async def next(self) -> Optional[pa.RecordBatch]: ...
    ```

    Co-authored-by: Will Jones <willjones127@gmail.com>
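The dispatch discussed in this commit (use `__anext__` on an async generator, otherwise await the stream's `next()` and stop on `None`) can be exercised in isolation. A sketch that assumes only what the quoted `.pyi` stub states; `BatchIterator` and `FakeStream` are hypothetical stand-ins, and plain integers stand in for `pa.RecordBatch`.

```python
import asyncio
from collections.abc import AsyncGenerator
from typing import Any, List, Optional, Union


class FakeStream:
    """Hypothetical stand-in for RecordBatchStream: exposes only `async next()`."""

    def __init__(self, items: List[int]) -> None:
        self._items = list(items)

    async def next(self) -> Optional[int]:
        # Returns None once exhausted, mirroring the stub's Optional return type.
        return self._items.pop(0) if self._items else None


class BatchIterator:
    """Async iterator over either an async generator or a next()-style stream."""

    def __init__(self, inner: Union[AsyncGenerator, FakeStream]) -> None:
        self._inner = inner

    def __aiter__(self) -> "BatchIterator":
        return self

    async def __anext__(self) -> Any:
        if isinstance(self._inner, AsyncGenerator):
            # Async generators raise StopAsyncIteration themselves when done.
            batch = await self._inner.__anext__()
        else:
            batch = await self._inner.next()
        if batch is None:
            raise StopAsyncIteration
        return batch


async def gen():
    yield 1
    yield 2


async def collect(it: BatchIterator) -> List[int]:
    return [b async for b in it]


print(asyncio.run(collect(BatchIterator(gen()))))               # [1, 2]
print(asyncio.run(collect(BatchIterator(FakeStream([1, 2])))))  # [1, 2]
```

Both sources end up behind the same `async for` interface, so callers never need to know which kind of inner object they were given.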
Will Jones	01e4291d21	feat(python): drop hard dependency on pylance (#2156 ) Closes #1793	2025-02-26 15:53:45 -08:00
Weston Pace	d6b3ccb37b	feat: upgrade lance to 0.23.2 (#2152 ) This also changes the pylance pin from `==0.23.2` to `~=0.23.2`, which allows the pylance dependency to float to later 0.23.x patch releases. The pylance dependency is actually not used for much anymore, so it should be tolerant of patch changes.	2025-02-26 09:02:51 -08:00
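Under PEP 440, the compatible-release clause `~=0.23.2` is equivalent to `>=0.23.2, ==0.23.*`, so the new pin accepts later 0.23.x patch releases but not 0.24.0. A minimal stdlib sketch of that rule, assuming purely numeric release segments (no pre-release or post-release tags, which the real `packaging` library does handle); `compatible_release` is a hypothetical helper.

```python
from typing import Tuple


def _release(version: str) -> Tuple[int, ...]:
    """Parse a purely numeric version such as '0.23.2' into (0, 23, 2)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Return True if `candidate` satisfies `~=spec` (numeric releases only)."""
    base = _release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments, e.g. '~=1.4'")
    cand = _release(candidate)
    # Pad with zeros so that '0.24' and '0.24.0' compare equal.
    width = max(len(base), len(cand))
    base_p = base + (0,) * (width - len(base))
    cand_p = cand + (0,) * (width - len(cand))
    prefix = base[:-1]  # ~=0.23.2 keeps the '0.23' prefix fixed
    return cand_p >= base_p and cand_p[: len(prefix)] == prefix


for v in ["0.23.2", "0.23.5", "0.23.1", "0.24.0"]:
    print(v, compatible_release(v, "0.23.2"))
# 0.23.2 True
# 0.23.5 True
# 0.23.1 False
# 0.24.0 False
```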
Weston Pace	c4f99e82e5	feat: push filters down into DF table provider (#2128 )	2025-02-25 14:46:28 -08:00
Will Jones	e7574698eb	feat: upgrade Lance to 0.23.0 (#2101 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0	2025-02-07 07:58:07 -08:00
Rob Meng	32716adaa3	chore: bump lance version (#2092 )	2025-02-04 12:25:05 -05:00
Will Jones	2f39274a66	feat: upgrade lance to 0.23.0-beta.4 (#2089 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.4	2025-01-31 17:20:15 -08:00
Will Jones	acda7a4589	feat: upgrade lance to v0.23.0-beta.3 (#2074 ) This includes several bugfixes for `merge_insert` and null handling in vector search. https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.3	2025-01-28 14:00:06 -08:00
Will Jones	52b79d2b1e	feat: upgrade lance to v0.23.0-beta.2 (#2063 ) Fixes https://github.com/lancedb/lancedb/issues/2043	2025-01-23 13:51:30 -08:00
Will Jones	bcfc93cc88	fix(python): various fixes for async query builders (#2048 )	2025-01-20 16:14:34 -08:00
    This includes several improvements and fixes to the Python async query builders:

    1. The API reference docs show all the methods for each builder.
    2. The hybrid query builder now has all the same setter methods as the vector search one, so you can now set things like `.distance_type()` on a hybrid query.
    3. Re-rankers are now properly hooked up and tested for FTS and vector search. Previously the re-rankers were accidentally bypassed in unit tests, because the builders overrode `.to_arrow()`, but the unit tests called `.to_batches()`, which was only defined in the base class. Now all builders implement `.to_batches()` and leave `.to_arrow()` to the base class.
    4. The `AsyncQueryBase` and `AsyncVectorQueryBase` setter methods now return `Self`, which provides the appropriate subclass as the return type hint. Previously, `AsyncQueryBase` had them all hard-coded to `AsyncQuery`, which was unfortunate. (This required bringing in `typing-extensions` for older Python versions, but I think it's worth it.)
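Points 3 and 4 of this commit describe two general patterns: setters annotated to return the caller's own subclass, and a base-class collector that is always built on each subclass's `to_batches()`, so no subclass can bypass shared logic such as re-ranking. The sketch below illustrates both with hypothetical classes (`QueryBase`, `VectorQuery`, and `to_list()` standing in for `.to_arrow()`, with plain lists standing in for Arrow batches). It uses a bound `TypeVar`, which works on every supported Python version, in place of the `Self` type the commit imports from `typing-extensions`.

```python
from typing import Iterator, List, Optional, TypeVar

Q = TypeVar("Q", bound="QueryBase")


class QueryBase:
    """Hypothetical base builder: shared setters plus a to_batches()-based collector."""

    def __init__(self) -> None:
        self._limit: Optional[int] = None

    def limit(self: Q, n: int) -> Q:
        # Annotated with a bound TypeVar, so a type checker sees the subclass type.
        self._limit = n
        return self

    def to_batches(self) -> Iterator[List[int]]:
        raise NotImplementedError  # each concrete builder implements this

    def to_list(self) -> List[int]:
        # Stands in for .to_arrow(): always goes through to_batches(), so any
        # logic a subclass puts there (e.g. re-ranking) cannot be skipped.
        rows: List[int] = []
        for batch in self.to_batches():
            rows.extend(batch)
        return rows


class VectorQuery(QueryBase):
    def __init__(self, scores: List[int]) -> None:
        super().__init__()
        self._scores = scores
        self._rerank = False

    def rerank(self) -> "VectorQuery":
        self._rerank = True
        return self

    def to_batches(self) -> Iterator[List[int]]:
        rows = sorted(self._scores, reverse=True) if self._rerank else list(self._scores)
        if self._limit is not None:
            rows = rows[: self._limit]
        yield rows


q = VectorQuery([3, 1, 2]).limit(2).rerank()  # limit() returns VectorQuery, so rerank() type-checks
print(q.to_list())  # [3, 2]
```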
Bert	4703cc6894	chore: upgrade lance to v0.22.1-beta.3 (#2038 )	2025-01-16 12:42:42 -05:00
Weston Pace	5c759505b8	feat: upgrade lance 0.22.1b1 (#2029 ) Now the version actually exists :)	2025-01-15 07:37:37 -08:00
Will Jones	92dcf24b0c	feat: upgrade Lance to v0.22.0 (#2017 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.22.0	2025-01-13 15:06:01 -08:00
Weston Pace	d8d11f48e7	feat: upgrade to lance 0.22.0b1 (#2011 )	2025-01-10 12:51:52 -08:00
Lei Xu	f76c4a5ce1	chore: add pyright static type checking and fix some of the table interface (#1996 )	2025-01-04 15:24:58 -08:00
    * Enable `pyright` in the project
    * Fixed some pyright typing errors in `table.py`
Lei Xu	397813f6a4	chore: bump pylance to 0.21.1b1 (#1989 )	2024-12-31 15:34:27 -08:00
BubbleCal	aec8332eb5	chore: add `dynamic = ["version"]` to pass build check (#1977 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-12-28 10:45:23 -08:00
Will Jones	5ddd84cec0	feat: upgrade lance to 0.21.0-beta.5 (#1961 )	2024-12-19 10:54:59 -08:00
LuQQiu	edc9b9adec	chore: bump Lance version to v0.21.0-beta.4 (#1939 )	2024-12-13 14:36:13 -08:00
Weston Pace	00552439d9	feat: upgrade lance to 0.21.0b3 (#1936 )	2024-12-12 21:32:59 -08:00
Weston Pace	d6c0f75078	feat: upgrade to lance prerelease 0.21.0b2 (#1933 )	2024-12-11 11:17:10 -08:00
Will Jones	5f261cf2d8	feat: upgrade to Lance v0.20.0 (#1908 ) Upstream change log: https://github.com/lancedb/lance/releases/tag/v0.20.0	2024-12-05 10:53:59 -08:00
LuQQiu	d6d9cb7415	feat: bump lance to 0.20.0b3 (#1882 ) Bump lance version. Upstream change log: https://github.com/lancedb/lance/releases/tag/v0.20.0-beta.3	2024-11-25 16:15:44 -08:00
Will Jones	6826039575	fix(python): run remote SDK futures in background thread (#1856 )	2024-11-25 13:12:47 -08:00
    Users who call the remote SDK from code that uses futures (either `ThreadPoolExecutor` or `asyncio`) can get odd errors like:

    ```
    Traceback (most recent call last):
      File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run
        self._context.run(self._callback, *self._args)
    RuntimeError: cannot enter context: <_contextvars.Context object at 0x7cfe94cdc900> is already entered
    ```

    This PR fixes that by executing all LanceDB futures in a dedicated thread pool running on a background thread. That way, LanceDB's futures don't interact with the caller's thread pool.
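The general technique, submitting every coroutine to an event loop that lives on its own daemon thread so it never shares a loop or context with the caller's `asyncio` code or `ThreadPoolExecutor`, can be sketched with the standard library alone. This is an illustration under assumptions, not LanceDB's implementation; `BackgroundLoop` and `fetch` are hypothetical names.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


class BackgroundLoop:
    """Runs an asyncio event loop on a dedicated daemon thread (hypothetical helper)."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def run(self, coro: Coroutine[Any, Any, T]) -> T:
        # Schedule the coroutine on the background loop and block this thread
        # until it finishes; the caller's own loop or executor is never used.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fetch(x: int) -> int:
    await asyncio.sleep(0.01)
    return x * 2


bg = BackgroundLoop()
print(bg.run(fetch(21)))  # 42, from plain synchronous code


async def caller() -> int:
    # Also works from inside a running event loop, because the coroutine runs
    # on the background loop (this blocks the caller's loop while waiting).
    return bg.run(fetch(5))


print(asyncio.run(caller()))  # 10
bg.close()
```

One caveat of this design: calling `run()` from a coroutine that is itself executing on the background loop would deadlock, since that loop would be blocked waiting on its own work.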
Lei Xu	2ded17452b	fix(python)!: handle bad openai embeddings gracefully (#1873 ) BREAKING-CHANGE: change Pydantic Vector field to be nullable by default. Closes #1577	2024-11-23 13:33:52 -08:00
Lei Xu	d369233b3d	feat: bump lance to 0.20.0b2 (#1865 ) Bump lance version. Upstream change log: https://github.com/lancedb/lance/releases/tag/v0.20.0-beta.2	2024-11-21 13:16:59 -08:00
Bert	cb9a00a28d	feat: add list_versions to typescript, rust and remote python sdks (#1850 )	2024-11-21 13:35:14 -05:00
    This will require an update to the lance dependency to bring in this change, which makes the version serializable: https://github.com/lancedb/lance/pull/3143
Rob Meng	e3ea5cf9b9	chore: bump lance to 0.19.3 (#1839 )	2024-11-16 14:57:52 -05:00
Rob Meng	d8c217b47d	chore: bump lance to 0.19.2 (#1829 )	2024-11-13 23:23:02 -05:00
Will Jones	91cab3b556	feat(python): transition Python remote sdk to use Rust implementation (#1701 )	2024-11-05 13:44:39 -08:00
    * Replaces Python implementation of Remote SDK with Rust one.
    * Drops dependency on `attrs` and `cachetools`. Makes `requests` an optional dependency used only for embeddings feature.
    * Adds dependency on `nest-asyncio`. This was required to get hybrid search working.
    * Deprecate `request_thread_pool` parameter. We now use the tokio threadpool.
    * Stop caching the `schema` on a remote table. Schema is mutable and there's no mechanism in place to invalidate the cache.
    * Removed the client-side resolution of the vector column. We should already be resolving this server-side.
Weston Pace	26f4a80e10	feat: upgrade to lance 0.19.2-beta.3 (#1794 )	2024-11-05 06:43:41 -08:00
BubbleCal	32fdcf97db	feat!: upgrade lance to 0.19.1 (#1762 )	2024-10-29 09:03:52 -07:00
    BREAKING CHANGE: default tokenizer no longer does stemming or stop-word removal. Users who want either behavior must now turn it on explicitly.

    - upgrade lance to 0.19.1
    - update the FTS docs
    - update the FTS API

    Upstream change notes: https://github.com/lancedb/lance/releases/tag/v0.19.1

    Signed-off-by: BubbleCal <bubble-cal@outlook.com>
    Co-authored-by: Will Jones <willjones127@gmail.com>
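To make the tokenizer change in this commit concrete, the toy tokenizer below shows how the two now-opt-in steps alter a token stream. It is not Lance's tokenizer: the tiny stop-word list and the crude suffix-stripping "stemmer" are illustrative assumptions, and `tokenize` is a hypothetical function whose flags merely default to off in the same spirit.

```python
import re
from typing import List

# Tiny illustrative stop-word list; real tokenizers ship much larger, per-language lists.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the"})


def _crude_stem(token: str) -> str:
    """Strip one common English suffix; a toy stand-in for a real stemmer such as Snowball."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str, *, stem: bool = False, remove_stop_words: bool = False) -> List[str]:
    # Both extra steps default to off; this toy only lowercases and splits on word characters.
    tokens = re.findall(r"\w+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [_crude_stem(t) for t in tokens]
    return tokens


print(tokenize("The dogs barked"))  # ['the', 'dogs', 'barked']
print(tokenize("The dogs barked", stem=True, remove_stop_words=True))  # ['dog', 'bark']
```

In this toy, a query token `dog` only matches a document token `dogs` when `stem=True` is applied to both sides.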
Weston Pace	f43cb8bba1	feat: upgrade lance to 0.18.3 (#1748 )	2024-10-16 00:48:31 -07:00
James Wu	38eb05f297	fix(python): remove dependency on retry package (#1749 )	2024-10-15 15:13:57 -07:00
    ## user story

    fixes https://github.com/lancedb/lancedb/issues/1480

    https://github.com/invl/retry has not had an update in 8 years, and one of its sub-dependencies via requirements.txt (https://github.com/pytest-dev/py) is no longer maintained and has a high-severity vulnerability (CVE-2022-42969).

    retry is only used for a single function in the Python codebase, the deprecated helper function `with_embeddings`, which was created for an older tutorial (https://github.com/lancedb/lancedb/pull/12) [but is now deprecated](https://lancedb.github.io/lancedb/embeddings/legacy/).

    ## changes

    I backported a limited range of functionality of the `@retry()` decorator directly into lancedb so that we no longer depend on the `retry` package.

    ## tests

    ```
    /Users/james/src/lancedb/python $ ruff check .
    All checks passed!
    /Users/james/src/lancedb/python $ pytest python/tests/test_embeddings.py
    python/tests/test_embeddings.py .......s.... [100%]
    ================================================================ 11 passed, 1 skipped, 2 warnings in 7.08s ================================================================
    ```
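A retry decorator of the kind backported in this commit can be quite small. A minimal sketch of the technique, loosely modeled on the `exceptions`, `tries`, `delay`, and `backoff` parameters of the `retry` package's `@retry()`; it is not the code that was backported into lancedb, and unlike the original package (where `tries=-1` means unlimited attempts) it requires `tries >= 1`. The `flaky` function is a hypothetical example.

```python
import functools
import logging
import time
from typing import Any, Callable, Tuple, Type, TypeVar, Union

F = TypeVar("F", bound=Callable[..., Any])
logger = logging.getLogger(__name__)


def retry(
    exceptions: Union[Type[BaseException], Tuple[Type[BaseException], ...]] = Exception,
    tries: int = 3,
    delay: float = 0.0,
    backoff: float = 1.0,
) -> Callable[[F], F]:
    """Retry the decorated function up to `tries` times on the given exceptions."""
    if tries < 1:
        raise ValueError("tries must be >= 1")

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            remaining, wait = tries, delay
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    remaining -= 1
                    if remaining == 0:
                        raise  # out of attempts: re-raise the last error
                    logger.warning("%s, retrying in %.2fs", exc, wait)
                    time.sleep(wait)
                    wait *= backoff  # exponential backoff when backoff > 1

        return wrapper  # type: ignore[return-value]

    return decorator


calls = {"n": 0}


@retry(ConnectionError, tries=3, delay=0.01, backoff=2)
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(flaky(), calls["n"])  # ok 3
```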
Will Jones	5f9d8509b3	feat: upgrade Lance to v0.18.2 (#1737 )	2024-10-09 11:46:46 -06:00
    Includes changes from v0.18.1 and v0.18.2:
    * [v0.18.1 change log](https://github.com/lancedb/lance/releases/tag/v0.18.1)
    * [v0.18.2 change log](https://github.com/lancedb/lance/releases/tag/v0.18.2)

    Closes #1656
    Closes #1615
    Closes #1661
LuQQiu	abeaae3d80	feat!: upgrade Lance to 0.18.0 (#1657 )	2024-09-19 10:50:26 -07:00
    BREAKING CHANGE: default file format changed to Lance v2.0.

    Upgrade Lance to 0.18.0. Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0
Antonio Molner Domenech	a405847f9b	fix(python): remove unmaintained ratelimiter dependency (#1603 )	2024-09-09 12:35:53 -07:00
    The `ratelimiter` package hasn't been updated in ages and is no longer maintained. This PR removes the dependency on `ratelimiter` and replaces it with a custom rate limiter implementation.

    Co-authored-by: Will Jones <willjones127@gmail.com>
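A custom limiter like the one this commit describes can be built from a deque of call timestamps. A minimal sketch of a thread-safe sliding-window limiter that allows at most `max_calls` calls per `period` seconds and sleeps when the window is full; `RateLimiter`, its parameters, and `ping` are hypothetical names, and this is not necessarily how the PR implemented it.

```python
import collections
import functools
import threading
import time
from typing import Any, Callable, Deque, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class RateLimiter:
    """Allow at most `max_calls` calls in any `period`-second sliding window."""

    def __init__(self, max_calls: int, period: float) -> None:
        if max_calls < 1 or period <= 0:
            raise ValueError("max_calls must be >= 1 and period must be > 0")
        self.max_calls = max_calls
        self.period = period
        self._calls: Deque[float] = collections.deque()
        self._lock = threading.Lock()

    def acquire(self) -> None:
        # Holding the lock while sleeping serializes callers, which keeps the
        # bookkeeping simple at the cost of some fairness.
        with self._lock:
            now = time.monotonic()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) >= self.max_calls:
                # Sleep until the oldest call in the window expires, then drop it.
                time.sleep(self.period - (now - self._calls[0]))
                now = time.monotonic()
                self._calls.popleft()
            self._calls.append(now)

    def __call__(self, func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            self.acquire()
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]


@RateLimiter(max_calls=2, period=0.2)
def ping() -> str:
    return "pong"


start = time.monotonic()
results = [ping() for _ in range(3)]  # the third call waits for the window to free up
elapsed = time.monotonic() - start
print(results)  # ['pong', 'pong', 'pong']
```

A sliding window of timestamps is exact about "no more than N calls in any period", at the cost of storing up to `max_calls` timestamps; a token bucket would be the usual alternative if bursts should be smoothed instead.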
Will Jones	cd32944e54	feat: upgrade lance to v0.17.0 (#1608 )	2024-09-06 14:10:02 -07:00
    Changelog: https://github.com/lancedb/lance/releases/tag/v0.17.0

    Highlights:
    * You can do "phrase queries" by adding double quotes around phrases (multiple tokens) in FTS.

    Added follow-ups in: https://github.com/lancedb/lancedb/issues/1611
BubbleCal	501817cfac	chore: bump the required python version to 3.9 (#1541 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-14 08:44:31 -07:00
BubbleCal	613f3063b9	chore: upgrade lance to 0.16.1 (#1524 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-09 19:18:05 +08:00
Lei Xu	2bdf0a02f9	feat!: upgrade lance to 0.16 (#1519 )	2024-08-07 13:15:22 -07:00


212 Commits