lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-25 22:29:58 +00:00

Author	SHA1	Message	Date
Wyatt Alt	f86b20a564	fix: delete tables from DDB on drop_all_tables (#2194 ) Prior to this commit, issuing drop_all_tables on a listing database with an external manifest store would delete physical tables but leave references behind in the manifest store. The table drop would succeed, but subsequent creation of a table with the same name would fail with a conflict. With this patch, the external manifest store is updated to account for the dropped tables so that dropped table names can be reused.	2025-03-10 15:00:53 -07:00
msu-reevo	cc81f3e1a5	fix(python): typing (#2167 ) @wjones127 is there a standard way you guys set up your virtualenv? I can either relist all the dependencies in the pyright precommit section, or specify a venv, or the user has to be in the virtual environment when they run git commit. If the venv location was standardized or a Python manager like `uv` was used it would be easier to avoid duplicating the pyright dependency list. Per your suggestion, in `pyproject.toml` I added in all the passing files to the `includes` section. For ruff I upgraded the version and removed "TCH" which doesn't exist as an option. I added a `pyright_report.csv` which contains a list of all files sorted by pyright errors ascending as a todo list to work on. I fixed about 30 issues in `table.py` stemming from `str` values being passed into methods that required a string within a set of string Literals by extracting them into `types.py`. Can you verify in the rust bridge that the schema should be a property and not a method here? If it's a method, then there's another place in the code where `inner.schema` should be `inner.schema()` ``` python class RecordBatchStream: @property def schema(self) -> pa.Schema: ... ``` Also, unless the `_lancedb.pyi` file is wrong, there is no `__anext__` here for `__inner` when it's not an `AsyncGenerator` and only `next` is defined: ``` python async def __anext__(self) -> pa.RecordBatch: return await self._inner.__anext__() if isinstance(self._inner, AsyncGenerator): batch = await self._inner.__anext__() else: batch = await self._inner.next() if batch is None: raise StopAsyncIteration return batch ``` In the else branch, `_inner` is a `RecordBatchStream` ```python class RecordBatchStream: @property def schema(self) -> pa.Schema: ... async def next(self) -> Optional[pa.RecordBatch]: ... ``` --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-10 09:01:23 -07:00
Lance Release	51437bc228	Bump version: 0.21.0-beta.0 → 0.21.0-beta.1	2025-03-06 19:23:06 +00:00
Bert	fa53cfcfd2	feat: support modifying field metadata in lancedb python (#2178 )	2025-03-04 16:58:46 -05:00
BubbleCal	8877eb020d	feat: record the server version for remote table (#2147 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-27 15:55:59 +08:00
Will Jones	01e4291d21	feat(python): drop hard dependency on pylance (#2156 ) Closes #1793	2025-02-26 15:53:45 -08:00
Lance Release	e1836e54e3	Bump version: 0.20.0 → 0.21.0-beta.0	2025-02-26 20:10:54 +00:00
Will Jones	5b12a47119	feat!: revert query limit to be unbounded for scans (#2151 ) In earlier PRs (#1886, #1191) we made the default limit 10 regardless of the query type. This was confusing for users and in many cases a breaking change. Users would have queries that used to return all results, but instead only returned the first 10, causing silent bugs. Part of the cause was consistency: the Python sync API seems to have always had a limit of 10, while newer APIs (Python async and Nodejs) didn't. This PR sets the default limit only for searches (vector search, FTS), while letting scans (even with filters) be unbounded. It does this consistently for all SDKs. Fixes #1983 Fixes #1852 Fixes #2141	2025-02-26 10:32:14 -08:00
Lance Release	072adc41aa	Bump version: 0.20.0-beta.0 → 0.20.0	2025-02-26 18:15:23 +00:00
Lance Release	c6f25ef1f0	Bump version: 0.19.1-beta.3 → 0.20.0-beta.0	2025-02-26 18:15:23 +00:00
Weston Pace	d6b3ccb37b	feat: upgrade lance to 0.23.2 (#2152 ) This also changes the pylance pin from `==0.23.2` to `~=0.23.2` which should allow the pylance dependency to float a little. The pylance dependency is actually not used for much anymore and so it should be tolerant of patch changes.	2025-02-26 09:02:51 -08:00
Weston Pace	c4f99e82e5	feat: push filters down into DF table provider (#2128 )	2025-02-25 14:46:28 -08:00
andrew-pienso	979a2d3d9d	docs: fixes is_open docstring on AsyncTable (#2150 )	2025-02-25 09:11:25 -08:00
Will Jones	7ac5f74c80	feat!: add variable store to embeddings registry (#2112 ) BREAKING CHANGE: embedding function implementations in Node need to now call `resolveVariables()` in their constructors and should not implement `toJSON()`. This tries to address the handling of secrets. In Node, they are currently lost. In Python, they are currently leaked into the table schema metadata. This PR introduces an in-memory variable store on the function registry. It also allows embedding function definitions to label certain config values as "sensitive", and the preprocessing logic will raise an error if users try to pass in hard-coded values. Closes #2110 Closes #521 --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-02-24 15:52:19 -08:00
Will Jones	ecdee4d2b1	feat(python): add search() method to async API (#2049 ) Reviving #1966. Closes #1938 The `search()` method can apply embeddings for the user. This simplifies hybrid search, so instead of writing: ```python vector_query = embeddings.compute_query_embeddings("flower moon")[0] await ( async_tbl.query() .nearest_to(vector_query) .nearest_to_text("flower moon") .to_pandas() ) ``` You can write: ```python await (await async_tbl.search("flower moon", query_type="hybrid")).to_pandas() ``` Unfortunately, we had to do a double-await here because `search()` needs to be async. This is because it often needs to do IO to retrieve and run an embedding function.	2025-02-24 14:19:25 -08:00
Lei Xu	6fa1f37506	docs: improve pydantic integration docs (#2136 ) Address usage mistakes in https://github.com/lancedb/lancedb/issues/2135. * Add example of how to use `LanceModel` and `Vector` decorator * Add test for pydantic doc * Fix the example to directly use LanceModel instead of calling `MyModel.to_arrow_schema()` in the example. * Add cross-reference link to pydantic doc site * Configure mkdocs to watch code changes in python directory.	2025-02-21 12:48:37 -08:00
BubbleCal	544382df5e	fix: handle batch queries in single request (#2139 )	2025-02-21 13:23:39 +08:00
Lance Release	a33a0670f6	Bump version: 0.19.1-beta.2 → 0.19.1-beta.3	2025-02-20 03:37:27 +00:00
Lei Xu	1865f7decf	fix: support optional nested pydantic model (#2130 ) Closes #2129	2025-02-17 20:43:13 -08:00
BubbleCal	a608621476	test: query with dist range and new rows (#2126 ) We found a bug where the flat KNN plan node's stats were not in the same order as the fields in the schema, which caused an error when querying with a distance range and new unindexed rows. This is fixed in Lance, so this PR adds a test to verify that it works. Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-17 12:57:45 +08:00
Lance Release	40f0dbb64d	Bump version: 0.19.1-beta.1 → 0.19.1-beta.2	2025-02-13 04:39:19 +00:00
Will Jones	78a17ad54c	chore: improve dev instructions for Python (#2088 ) Closes #2042	2025-02-12 14:08:52 -08:00
Lance Release	d18d63c69d	Bump version: 0.19.1-beta.0 → 0.19.1-beta.1	2025-02-11 20:55:23 +00:00
Lance Release	e64712cfa5	Bump version: 0.19.0 → 0.19.1-beta.0	2025-02-07 19:27:07 +00:00
Lance Release	998cd43fe6	Bump version: 0.19.0-beta.0 → 0.19.0	2025-02-07 17:32:26 +00:00
Lance Release	4bc7eebe61	Bump version: 0.18.1-beta.4 → 0.19.0-beta.0	2025-02-07 17:32:26 +00:00
Will Jones	e7574698eb	feat: upgrade Lance to 0.23.0 (#2101 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0	2025-02-07 07:58:07 -08:00
Will Jones	801a9e5f6f	feat(python): streaming larger-than-memory writes (#2094 ) Makes our preprocessing pipeline do transforms in a streaming fashion, so users can do larger-than-memory writes. Closes #2082	2025-02-06 16:37:30 -08:00
Weston Pace	1a449fa49e	refactor: rename drop_db / drop_database to drop_all_tables, expose database from connection (#2098 ) If we start supporting external catalogs then "drop database" may be misleading (and not possible). We should be more clear that this is a utility method to drop all tables. This is also a nice chance for some consistency cleanup as it was `drop_db` in rust, `drop_database` in python, and non-existent in typescript. This PR also adds a public accessor to get the database trait from a connection. BREAKING CHANGE: the `drop_database` / `drop_db` methods are now deprecated.	2025-02-06 13:22:28 -08:00
Weston Pace	6bf742c759	feat: expose table trait (#2097 ) Similar to `c269524b2f` this PR reworks and exposes an internal trait (this time `TableInternal`) to be a public trait. These two PRs together should make it possible for others to integrate LanceDB on top of other catalogs. This PR also adds a basic `TableProvider` implementation for tables, although some work still needs to be done here (pushdown not yet enabled).	2025-02-05 18:13:51 -08:00
Ryan Green	ef3093bc23	feat: drop_index() remote implementation (#2093 ) Support drop_index operation in remote table.	2025-02-05 10:06:19 -03:30
Will Jones	16851389ea	feat: extra headers parameter in client options (#2091 ) Closes #1106 Unfortunately, these need to be set at the connection level. I investigated whether if we let users provide a callback they could use `AsyncLocalStorage` to access their context. However, it doesn't seem like NAPI supports this right now. I filed an issue: https://github.com/napi-rs/napi-rs/issues/2456	2025-02-04 17:26:45 -08:00
Weston Pace	c269524b2f	feat!: refactor ConnectionInternal into a Database trait (#2067 ) This opens up the door for more custom database implementations than the two we have today. The biggest change should be invisible: `ConnectionInternal` has been renamed to `Database`, made public, and refactored. However, there are a few breaking changes. `data_storage_version` and `enable_v2_manifest_paths` have been moved from options on `create_table` to options for the database which are now set via `storage_options`. Before: ``` db = connect(uri) tbl = db.create_table("my_table", data, data_storage_version="legacy", enable_v2_manifest_paths=True) ``` After: ``` db = connect(uri, storage_options={ "new_table_enable_v2_manifest_paths": "true", "new_table_data_storage_version": "legacy" }) tbl = db.create_table("my_table", data) ``` BREAKING CHANGE: the data_storage_version, enable_v2_manifest_paths options have moved from options to create_table to storage_options. BREAKING CHANGE: the use_legacy_format option has been removed, data_storage_version has replaced it for some time now	2025-02-04 14:35:14 -08:00
Lance Release	f6eef14313	Bump version: 0.18.1-beta.3 → 0.18.1-beta.4	2025-02-04 17:25:52 +00:00
Rob Meng	32716adaa3	chore: bump lance version (#2092 )	2025-02-04 12:25:05 -05:00
Lance Release	482f1ee1d3	Bump version: 0.18.1-beta.2 → 0.18.1-beta.3	2025-02-01 01:20:49 +00:00
Will Jones	2f39274a66	feat: upgrade lance to 0.23.0-beta.4 (#2089 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.4	2025-01-31 17:20:15 -08:00
Will Jones	2fc174f532	docs: add sync/async tabs to quickstart (#2087 ) Closes #2033	2025-01-31 15:43:54 -08:00
Will Jones	dba85f4d6f	docs: user guide for merge insert (#2083 ) Closes #2062	2025-01-31 10:03:21 -08:00
Will Jones	15f8f4d627	ci: check license headers (#2076 ) Based on the same workflow in Lance.	2025-01-29 08:27:07 -08:00
Lance Release	a9897d9d85	Bump version: 0.18.1-beta.1 → 0.18.1-beta.2	2025-01-28 22:31:14 +00:00
Will Jones	acda7a4589	feat: upgrade lance to v0.23.0-beta.3 (#2074 ) This includes several bugfixes for `merge_insert` and null handling in vector search. https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.3	2025-01-28 14:00:06 -08:00
Vaibhav	dac0857745	feat: add `distance_type()` parameter to python sync query builders and `metric()` as an alias (#2073 ) This PR aims to fix #2047 by doing the following things: - Add a distance_type parameter to the sync query builders of Python SDK. - Make metric an alias to distance_type.	2025-01-28 13:59:53 -08:00
Lance Release	e5f42a850e	Bump version: 0.18.1-beta.0 → 0.18.1-beta.1	2025-01-23 23:01:13 +00:00
Will Jones	28e1b70e4b	fix(python): preserve original distance and score in hybrid queries (#2061 ) Fixes #2031 When we do hybrid search, we normalize the scores. We do this calculation in-place, because the Rerankers expect the `_distance` and `_score` columns to be the normalized ones. So I've changed the logic so that we restore the original distance and scores by matching on row ids.	2025-01-23 13:54:26 -08:00
Will Jones	52b79d2b1e	feat: upgrade lance to v0.23.0-beta.2 (#2063 ) Fixes https://github.com/lancedb/lancedb/issues/2043	2025-01-23 13:51:30 -08:00
Will Jones	bcfc93cc88	fix(python): various fixes for async query builders (#2048 ) This includes several improvements and fixes to the Python Async query builders: 1. The API reference docs show all the methods for each builder 2. The hybrid query builder now has all the same setter methods as the vector search one, so you can now set things like `.distance_type()` on a hybrid query. 3. Re-rankers are now properly hooked up and tested for FTS and vector search. Previously the re-rankers were accidentally bypassed in unit tests, because the builders overrode `.to_arrow()`, but the unit test called `.to_batches()` which was only defined in the base class. Now all builders implement `.to_batches()` and leave `.to_arrow()` to the base class. 4. The `AsyncQueryBase` and `AsyncVectoryQueryBase` setter methods now return `Self`, which provides the appropriate subclass as the type hint return value. Previously, `AsyncQueryBase` had them all hard-coded to `AsyncQuery`, which was unfortunate. (This required bringing in `typing-extensions` for older Python versions, but I think it's worth it.)	2025-01-20 16:14:34 -08:00
BubbleCal	214d0debf5	docs: claim LanceDB supports float16/float32/float64 for multivector (#2040 )	2025-01-21 07:04:15 +08:00
Will Jones	f059372137	feat: add `drop_index()` method (#2039 ) Closes #1665	2025-01-20 10:08:51 -08:00
Lance Release	3dc1803c07	Bump version: 0.18.0 → 0.18.1-beta.0	2025-01-17 04:37:23 +00:00
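
The default-limit rule described in 5b12a47119 (#2151) can be pictured with a small sketch. This is not LanceDB's implementation: `default_limit`, `SEARCH_TYPES`, and the query-type strings are hypothetical names, and the value 10 is assumed from the commit message. It only illustrates the stated behavior, where searches (vector search, FTS) get a default limit and scans, even filtered ones, are unbounded.

```python
from typing import Optional

# Assumption: searches keep the default of 10 mentioned in the commit message.
SEARCH_DEFAULT_LIMIT = 10

# Hypothetical labels for the query kinds that count as "searches".
SEARCH_TYPES = {"vector", "fts"}


def default_limit(query_type: str, user_limit: Optional[int] = None) -> Optional[int]:
    """Return the row limit to apply, or None for an unbounded query."""
    # An explicit limit from the caller always wins.
    if user_limit is not None:
        return user_limit
    # Searches fall back to the default limit.
    if query_type in SEARCH_TYPES:
        return SEARCH_DEFAULT_LIMIT
    # Plain scans (with or without a filter) return every matching row.
    return None
```

Returning `None` for scans, rather than a very large number, keeps "no limit" distinct from "a limit the user happened to pick".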

834 Commits
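
The hybrid-search fix in 28e1b70e4b (#2061) says scores are normalized in place for the rerankers and the original values are then restored by matching on row ids. The sketch below shows that idea on plain dictionaries. It is an illustration under assumptions, not the code from the PR: the min-max normalization, the `_rowid` key, and both helper names are hypothetical, and a simple sort stands in for the reranker.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def normalize_in_place(rows: List[Row], column: str) -> None:
    """Min-max normalize one column in place (assumed normalization scheme)."""
    values = [row[column] for row in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    for row in rows:
        row[column] = 0.0 if span == 0 else (row[column] - lo) / span


def restore_original(rows: List[Row], originals: Dict[int, float], column: str) -> None:
    """Write the pre-normalization values back, matching rows by row id."""
    for row in rows:
        row[column] = originals[row["_rowid"]]


# Made-up vector-search hits for the example.
hits = [
    {"_rowid": 1, "_distance": 0.2},
    {"_rowid": 2, "_distance": 0.8},
    {"_rowid": 3, "_distance": 0.5},
]

# Remember the original distances, keyed by row id, before normalizing.
originals = {row["_rowid"]: row["_distance"] for row in hits}

normalize_in_place(hits, "_distance")

# Stand-in for a reranker: it only sees the normalized column.
ranked = sorted(hits, key=lambda row: row["_distance"])

# After reranking, put the original distances back on the reordered rows.
restore_original(ranked, originals, "_distance")
```

Keying the saved values by row id matters because the reranker may reorder or drop rows, so positions in the result no longer line up with the input.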