lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 11:00:40 +00:00

Author	SHA1	Message	Date
Max Epstein	72af977a73	fix(CohereReranker): updated default model_name param to newest v3 (#1862 )	2024-11-21 09:02:49 -08:00
Bert	7cecb71df0	feat: support for checkout and checkout_latest in remote sdks (#1863 )	2024-11-21 11:28:46 -05:00
BubbleCal	b2f88f0b29	feat: support to specify ef search param (#1844 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-11-19 23:12:25 +08:00
Lei Xu	267aa83bf8	feat(python): check vector query is not None (#1847 ) Fix the type hints of `nearest_to` method, and raise `ValueError` when the input is None	2024-11-18 14:15:22 -08:00
Will Jones	72543c8b9d	test(python): test `with_row_id` in sync query (#1835 ) Also remove weird `MockTable` fixture.	2024-11-18 11:32:52 -08:00
Will Jones	587c0824af	feat: flexible null handling and insert subschemas in Python (#1827 ) * Test that we can insert subschemas (omit nullable columns) in Python. * More work is needed to support this in Node. See: https://github.com/lancedb/lancedb/issues/1832 * Test that we can insert data with nullable schema but no nulls in non-nullable schema. * Add `"null"` option for `on_bad_vectors` where we fill with null if the vector is bad. * Make null values not considered bad if the field itself is nullable.	2024-11-15 11:33:00 -08:00
Rob Meng	b724b1a01f	feat: support remote empty query (#1828 ) Support sending empty query types to remote LanceDB. Also include offset and limit, which were previously omitted.	2024-11-13 23:04:52 -05:00
Will Jones	abd75e0ead	feat: search multiple query vectors as one query (#1811 ) Allows users to pass multiple query vector as part of a single query plan. This just runs the queries in parallel without any further optimization. It's mostly a convenience. Previously, I think this was only handled by the sync Python remote API. This makes it common across all SDKs. Closes https://github.com/lancedb/lancedb/issues/1803 ```python >>> import lancedb >>> import asyncio >>> >>> async def main(): ... db = await lancedb.connect_async("./demo") ... table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite") ... return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas() ... >>> asyncio.run(main()) query_index id vector _distance 0 2 2 [4.0, 5.0, 6.0] 0.0 1 1 2 [4.0, 5.0, 6.0] 0.0 2 0 1 [1.0, 2.0, 3.0] 0.0 ```	2024-11-13 16:05:16 -08:00
Lei Xu	4c9bab0d92	fix: use pandas with pydantic embedding column (#1818 ) * Make Pandas `DataFrame` work with embedding function + Subset of columns * Make `lancedb.create_table()` work with embedding function	2024-11-11 14:48:56 -08:00
fzowl	cbbc07d0f5	feat: voyageai support (#1799 ) Adding VoyageAI embedding and rerank support	2024-11-09 00:51:20 +05:30
BubbleCal	4372c231cd	feat: support optimize indices in sync API (#1769 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-11-08 08:48:07 -08:00
Will Jones	91cab3b556	feat(python): transition Python remote sdk to use Rust implementation (#1701 ) * Replaces Python implementation of Remote SDK with Rust one. * Drops dependency on `attrs` and `cachetools`. Makes `requests` an optional dependency used only for embeddings feature. * Adds dependency on `nest-asyncio`. This was required to get hybrid search working. * Deprecate `request_thread_pool` parameter. We now use the tokio threadpool. * Stop caching the `schema` on a remote table. Schema is mutable and there's no mechanism in place to invalidate the cache. * Removed the client-side resolution of the vector column. We should already be resolving this server-side.	2024-11-05 13:44:39 -08:00
Weston Pace	26f4a80e10	feat: upgrade to lance 0.19.2-beta.3 (#1794 )	2024-11-05 06:43:41 -08:00
Will Jones	3604d20ad3	feat(python,node): support with_row_id in Python and remote (#1784 ) Needed to support hybrid search in Remote SDK.	2024-11-04 11:25:45 -08:00
Gagan Bhullar	9708d829a9	fix: explain plan options (#1776 ) PR fixes #1768	2024-11-04 10:25:34 -08:00
Will Jones	15ed7f75a0	feat(python): support post filter on FTS (#1783 )	2024-11-01 10:05:05 -07:00
Will Jones	96181ab421	feat: `fast_search` in Python and Node (#1623 ) Sometimes it is acceptable to users to only search indexed data and skip any new un-indexed data. For example, if un-indexed data will be shortly indexed and they don't mind the delay. In these cases, we can save a lot of CPU time in search, and provide better latency. Users can activate this on queries using `fast_search()`.	2024-11-01 09:29:09 -07:00
Weston Pace	55104c5bae	feat: allow distance type (metric) to be specified during hybrid search (#1777 )	2024-10-29 13:51:18 -07:00
BubbleCal	32fdcf97db	feat!: upgrade lance to 0.19.1 (#1762 ) BREAKING CHANGE: default tokenizer no longer does stemming or stop-word removal. Users should explicitly turn that option on in the future. - upgrade lance to 0.19.1 - update the FTS docs - update the FTS API Upstream change notes: https://github.com/lancedb/lance/releases/tag/v0.19.1 --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-10-29 09:03:52 -07:00
James Wu	38eb05f297	fix(python): remove dependency on retry package (#1749 ) ## user story fixes https://github.com/lancedb/lancedb/issues/1480 https://github.com/invl/retry has not had an update in 8 years, one of its sub-dependencies via requirements.txt (https://github.com/pytest-dev/py) is no longer maintained and has a high severity vulnerability (CVE-2022-42969). retry is only used for a single function in the python codebase for a deprecated helper function `with_embeddings`, which was created for an older tutorial (https://github.com/lancedb/lancedb/pull/12) [but is now deprecated](https://lancedb.github.io/lancedb/embeddings/legacy/). ## changes i backported a limited range of functionality of the `@retry()` decorator directly into lancedb so that we no longer have a dependency on the `retry` package. ## tests ``` /Users/james/src/lancedb/python $ ruff check . All checks passed! /Users/james/src/lancedb/python $ pytest python/tests/test_embeddings.py python/tests/test_embeddings.py .......s.... [100%] ================================================================ 11 passed, 1 skipped, 2 warnings in 7.08s ================================================================ ```	2024-10-15 15:13:57 -07:00
Ryan Green	679a70231e	feat: allow fast_search on python remote table (#1747 ) Add `fast_search` parameter to query builder and remote table to support skipping flat search in remote search	2024-10-14 14:39:54 -06:00
Will Jones	8509f73221	feat: better errors for remote SDK (#1722 ) * Adds nicer errors to remote SDK, that expose useful properties like `request_id` and `status_code`. * Makes sure the Python tracebacks print nicely by mapping the `source` field from a Rust error to the `__cause__` field.	2024-10-08 22:21:13 -06:00
Gagan Bhullar	4d458d5829	feat(python): drop support for dictionary in Table.add (#1725 ) PR closes #1706	2024-10-08 20:41:08 -06:00
Will Jones	f305f34d9b	feat(python): bind python async remote client to rust client (#1700 ) Closes [#1638](https://github.com/lancedb/lancedb/issues/1638) This just binds the Python Async client to the Rust remote client.	2024-10-01 15:46:59 -07:00
Will Jones	2c4b07eb17	feat(python): merge_insert in async Python (#1707 ) Fixes #1401	2024-10-01 10:06:52 -07:00
Will Jones	33b402c861	fix: `list_indices` returns correct index type (#1715 ) Fixes https://github.com/lancedb/lancedb/issues/1711 Doesn't address this https://github.com/lancedb/lance/issues/2039 Instead we load the index statistics, which seems to contain the index type. However, this involves more IO than previously. I'm not sure whether we care that much. If we do, we can fix that upstream Lance issue.	2024-10-01 09:16:18 -07:00
Akash Saravanan	d6b5054778	feat(python): add support for trust_remote_code in hf embeddings (#1712 ) Resolves #1709. Adds `trust_remote_code` as a parameter to the `TransformersEmbeddingFunction` class with a default of False. Updated relevant documentation with the same.	2024-10-01 01:06:28 +05:30
Will Jones	f958f4d2e8	feat: remote index stats (#1702 ) BREAKING CHANGE: the return value of `index_stats` method has changed and all `index_stats` APIs now take index name instead of UUID. Also several deprecated index statistics methods were removed. * Removes deprecated methods for individual index statistics * Aligns public `IndexStatistics` struct with API response from LanceDB Cloud. * Implements `index_stats` for remote Rust SDK and Python async API.	2024-09-27 12:10:00 -07:00
rjrobben	e606a455df	fix(EmbeddingFunction): modify safe_model_dump to explicitly exclude class fields with underscore (#1688 ) Resolve issue #1681 --------- Co-authored-by: rjrobben <rjrobben123@gmail.com>	2024-09-25 11:53:49 -07:00
Ayush Chaurasia	2f2721e242	feat(python): allow explicit hybrid search query pattern in SaaS (feat parity) (#1698 ) - fixes https://github.com/lancedb/lancedb/issues/1697. - unifies vector column inference logic for remote and local table to prevent future disparities. - Updates docstring in RemoteTable to specify empty queries are not supported	2024-09-25 21:04:00 +05:30
QianZhu	f00b21c98c	fix: metric type for python/node search api (#1689 )	2024-09-24 16:10:29 -07:00
Ayush Chaurasia	f81ce68e41	fix(python): force deduce vector column name if running explicit hybrid query (#1692 ) Right now when passing vector and query explicitly for hybrid search, vector_column_name is not deduced (https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/#hybrid-search-in-lancedb), because vector and query can both be None when initialising the QueryBuilder in this case. This PR forces deduction of query type if it is set to "hybrid"	2024-09-24 19:02:56 +05:30
Ayush Chaurasia	86978e7588	feat!: enforce all rerankers always return relevance score & deprecate linear combination fixes (#1687 ) - Enforce all rerankers always return _relevance_score. This was already loosely done in tests before but based on user feedback it's better to always have _relevance_score present in all reranked results - Deprecate LinearCombinationReranker in docs. And also fix a case where it would not return _relevance_score if one result set was missing	2024-09-23 12:12:02 +05:30
Lei Xu	7c314d61cc	chore: add error handling for openai embedding generation (#1680 )	2024-09-23 12:10:56 +05:30
Lei Xu	915d828cee	feat!: set embeddings to Null if embedding function return invalid results (#1674 )	2024-09-19 23:16:20 -07:00
LuQQiu	abeaae3d80	feat!: upgrade Lance to 0.18.0 (#1657 ) BREAKING CHANGE: default file format changed to Lance v2.0. Upgrade Lance to 0.18.0 Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0	2024-09-19 10:50:26 -07:00
Gagan Bhullar	b3c0227065	docs: hnsw documentation (#1640 ) PR closes #1627 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-19 10:32:46 -07:00
Lei Xu	32af962c0c	feat: fix creating empty table and creating table by a list of RecordBatch for remote python sdk (#1650 ) Closes #1637	2024-09-14 11:33:34 -07:00
Ayush Chaurasia	18484d0b6c	fix: allow pass optional args in colbert reranker (#1649 ) Fixes https://github.com/lancedb/lancedb/issues/1641	2024-09-14 11:18:09 -07:00
Lei Xu	c02ee3c80c	chore: make remote client a context manager (#1648 ) Allow `RemoteLanceDBClient` to be used as context manager	2024-09-13 22:08:48 -07:00
Sayandip Dutta	9b8472850e	fix: unterminated string literal on table update (#1573 ) resolves #1429 (python) ```python - return f"'{value}'" + return f'"{value}"' ``` --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-13 12:32:59 -07:00
Sayandip Dutta	36d05ea641	fix: add appropriate QueryBuilder overloads to LanceTable.search (#1558 ) - Add overloads to Table.search, to preserve the return information of different types of QueryBuilder objects for LanceTable - Fix fts_column type annotation by making it `Optional` resolves #1550 --------- Co-authored-by: sayandip-dutta <sayandip.dutta@nevaehtech.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-13 12:32:30 -07:00
LuQQiu	c7732585bf	fix: support pyarrow input types (#1628 ) fixes #1625 Support pa.RecordBatch, pa.dataset.Dataset, pa.dataset.Scanner, pa.RecordBatchReader	2024-09-12 10:59:18 -07:00
BubbleCal	4b79db72bf	docs: improve the docs and API param name (#1629 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-09-11 10:18:29 +08:00
Gagan Bhullar	205fc530cf	feat: expose hnsw indices (#1595 ) PR closes #1522 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-10 11:08:13 -07:00
BubbleCal	2bde5401eb	feat: support to build FTS without positions (#1621 )	2024-09-10 22:51:32 +08:00
Antonio Molner Domenech	a405847f9b	fix(python): remove unmaintained ratelimiter dependency (#1603 ) The `ratelimiter` package hasn't been updated in ages and is no longer maintained. This PR removes the dependency on `ratelimiter` and replaces it with a custom rate limiter implementation. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-09-09 12:35:53 -07:00
Will Jones	2a6586d6fb	feat: add flag to enable faster manifest paths (#1612 ) The new V2 manifest path scheme makes discovering the latest version of a table constant time on object stores, regardless of the number of versions in the table. See benchmarks in the PR here: https://github.com/lancedb/lance/pull/2798 Closes #1583	2024-09-09 11:34:36 -07:00
James Wu	029b01bbbf	feat: enable phrase_query(bool) for hybrid search queries (#1578 ) first off, apologies for any folly since i'm new to contributing to lancedb. this PR is the continuation of [a discord thread](https://discord.com/channels/1030247538198061086/1030247538667827251/1278844345713299599): ## user story here's the lance db search query i'd like to run: ``` def search(phrase): logger.info(f'Searching for phrase: {phrase}') phrase_embedding = get_embedding(phrase) df = (table.search((phrase_embedding, phrase), query_type='hybrid') .limit(10).to_list()) logger.info(f'Success search with row count: {len(df)}') search('howdy (howdy)') search('howdy(howdy)') ``` the second search fails due to `ValueError: Syntax Error: howdy(howdy)` i saw on the [docs](https://lancedb.github.io/lancedb/fts/#phrase-queries-vs-terms-queries) that i can use `phrase_query()` to [enable a flag](https://github.com/lancedb/lancedb/blob/main/python/python/lancedb/query.py#L790-L792) to wrap the query in double quotes (as well as sanitize single quotes) prior to sending the query to search. this works for [normal FTS](https://lancedb.github.io/lancedb/fts/), but the command is unavailable on [hybrid search](https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/). ## changes i added `phrase_query()` function to `LanceHybridQueryBuilder` by propagating the call down to its `self._fts_query` object. i'm not too familiar with the codebase and am not sure if this is the best way to implement the functionality. feel free to riff on this PR or discard ## tests ``` (lancedb) JamesMPB:python james$ pwd /Users/james/src/lancedb/python (lancedb) JamesMPB:python james$ pytest python/tests/test_table.py python/tests/test_table.py ....................................... [100%] ====================================================== 39 passed, 1 warning in 2.23s ======================================================= ```	2024-09-07 08:58:05 +05:30
BubbleCal	8dcd328dce	feat: support to create table from record batch iterator (#1593 )	2024-09-06 10:41:38 +08:00
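
Commit 38eb05f297 in the log above describes backporting "a limited range of functionality of the `@retry()` decorator" so that the `retry` package is no longer a dependency. The log does not show that code, so the following is only a minimal sketch of what a self-contained retry decorator can look like. The name `retry`, its parameters (`tries`, `delay`, `backoff`, `exceptions`, `sleep`), and its behavior are assumptions for illustration, not LanceDB's actual implementation.

```python
import functools
import time


def retry(tries=3, delay=0.0, backoff=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function up to `tries` times (hypothetical signature).

    `delay` is the wait before the second attempt; it is multiplied by
    `backoff` after every failed attempt. `sleep` is injectable for testing.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Out of attempts: let the last error propagate unchanged.
                    if attempt == tries:
                        raise
                    sleep(wait)
                    wait *= backoff

        return wrapper

    return decorator
```

Applied as `@retry(tries=3, delay=1.0, backoff=2.0)`, a function that keeps failing would be attempted three times, with waits of 1 and 2 seconds between attempts, before the final exception is raised to the caller.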

148 Commits
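
Commit a405847f9b in the log above replaces the unmaintained `ratelimiter` package with "a custom rate limiter implementation", which the log does not reproduce. As a rough illustration only, a sliding-window limiter usable as a context manager could be sketched as follows. The class name `SlidingWindowRateLimiter` and its parameters are hypothetical and are not taken from the LanceDB codebase.

```python
import threading
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Allow at most `max_calls` entries per `period` seconds (hypothetical class)."""

    def __init__(self, max_calls, period=1.0, clock=time.monotonic, sleep=time.sleep):
        if max_calls < 1 or period <= 0:
            raise ValueError("max_calls must be >= 1 and period must be > 0")
        self._max_calls = max_calls
        self._period = period
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._calls = deque()  # timestamps of recent entries, oldest first
        self._lock = threading.Lock()

    def __enter__(self):
        with self._lock:
            while True:
                now = self._clock()
                # Forget entries that have left the sliding window.
                while self._calls and now - self._calls[0] >= self._period:
                    self._calls.popleft()
                if len(self._calls) < self._max_calls:
                    break
                # Window is full: wait until the oldest entry expires.
                self._sleep(self._period - (now - self._calls[0]))
            self._calls.append(now)
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions from the guarded block
```

Wrapping each outbound request in `with limiter:` would then block the caller whenever more than `max_calls` requests have started within the last `period` seconds. Holding the lock while sleeping serializes waiting threads, which keeps the sketch simple at the cost of some concurrency.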