lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-26 14:49:57 +00:00

Author	SHA1	Message	Date
Will Jones	b2a38ac366	fix: make pylance optional again (#2209) The two remaining blockers were: * A method `with_embeddings` that was deprecated a year ago * A typecheck for `LanceDataset`	2025-03-21 11:26:32 -07:00
msu-reevo	cc81f3e1a5	fix(python): typing (#2167) @wjones127 is there a standard way you guys set up your virtualenv? I can either relist all the dependencies in the pyright precommit section, or specify a venv, or the user has to be in the virtual environment when they run git commit. If the venv location was standardized or a python manager like `uv` was used it would be easier to avoid duplicating the pyright dependency list. Per your suggestion, in `pyproject.toml` I added in all the passing files to the `includes` section. For ruff I upgraded the version and removed "TCH" which doesn't exist as an option. I added a `pyright_report.csv` which contains a list of all files sorted by pyright errors ascending as a todo list to work on. I fixed about 30 issues in `table.py` stemming from plain strings being passed into methods that required one of a set of string Literals, by extracting them into `types.py`. Can you verify in the rust bridge that the schema should be a property and not a method here? If it's a method, then there's another place in the code where `inner.schema` should be `inner.schema()` ``` python class RecordBatchStream: @property def schema(self) -> pa.Schema: ... ``` Also unless the `_lancedb.pyi` file is wrong, then there is no `__anext__` here for `__inner` when it's not an `AsyncGenerator` and only `next` is defined: ``` python async def __anext__(self) -> pa.RecordBatch: return await self._inner.__anext__() if isinstance(self._inner, AsyncGenerator): batch = await self._inner.__anext__() else: batch = await self._inner.next() if batch is None: raise StopAsyncIteration return batch ``` in the else statement, `_inner` is a `RecordBatchStream` ```python class RecordBatchStream: @property def schema(self) -> pa.Schema: ... async def next(self) -> Optional[pa.RecordBatch]: ... ``` --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-10 09:01:23 -07:00
Lei Xu	7c12d497b0	ci: bump python to 3.12 in GHA (#2169)	2025-03-01 17:24:02 -08:00
Lei Xu	f76c4a5ce1	chore: add pyright static type checking and fix some of the table interface (#1996) * Enable `pyright` in the project * Fixed some pyright typing errors in `table.py`	2025-01-04 15:24:58 -08:00
Lei Xu	4c9bab0d92	fix: use pandas with pydantic embedding column (#1818) * Make Pandas `DataFrame` work with embedding function + Subset of columns * Make `lancedb.create_table()` work with embedding function	2024-11-11 14:48:56 -08:00
Lei Xu	c9c61eb060	docs: expose merge_insert doc for remote python SDK (#1464) The `merge_insert` API does not show up on [`RemoteTable`](https://lancedb.github.io/lancedb/python/saas-python/#lancedb.remote.table.RemoteTable) today * Also bump `ruff` version	2024-07-22 10:48:16 -07:00
Weston Pace	ea86dad4b7	feat: upgrade lance to 0.12.2-beta.2 (#1381)	2024-06-14 05:43:26 -07:00
Rob Meng	2e197ef387	feat: upgrade lance to 0.11.0 (#1317) upgrade lance and make fixes for the upgrade	2024-05-21 18:53:19 -04:00
Will Jones	1d23af213b	feat: expose storage options in LanceDB (#1204) Exposes `storage_options` in LanceDB. This is provided for Python async, Node `lancedb`, and Node `vectordb` (and Rust of course). Python synchronous is omitted because it's not compatible with the PyArrow filesystems we use there currently. In the future, we will move the sync API to wrap the async one, and then it will get support for `storage_options`. 1. Fixes #1168 2. Closes #1165 3. Closes #1082 4. Closes #439 5. Closes #897 6. Closes #642 7. Closes #281 8. Closes #114 9. Closes #990 10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged to use `storageOptions` instead.	2024-04-10 10:12:04 -07:00
Chang She	a3761f4209	doc: fix langchain link (#1053)	2024-04-05 16:31:36 -07:00
Weston Pace	629c622d15	feat: Initial remote table implementation for rust (#1024) This will eventually replace the remote table implementations in python and node.	2024-04-05 16:31:36 -07:00
Weston Pace	2cec2a8937	feat: add a basic async python client starting point (#1014) This changes `lancedb` from a "pure python" setuptools project to a maturin project and adds a rust lancedb dependency. The async python client is extremely minimal (only `connect` and `Connection.table_names` are supported). The purpose of this PR is to get the infrastructure in place for building out the rest of the async client. Although this is not technically a breaking change (no APIs are changing) it is still a considerable change in the way the wheels are built because they now include the native shared library.	2024-04-05 16:31:34 -07:00
Lei Xu	bd2d187538	ci: bump to new version of python action to use node 20 GitHub action runtime (#909) GitHub Actions is deprecating the old node-16 runtime.	2024-04-05 16:29:05 -07:00
Lei Xu	a617ad35ff	ci: change apple silicon runner to free OSS macos-14 target (#901)	2024-04-05 16:28:56 -07:00
Lei Xu	f2e29eb004	chore: upgrade lance, pylance and datafusion (#879)	2024-04-05 16:28:56 -07:00
Lei Xu	faa5912c3f	chore: bump github actions to v4 due to GHA warnings of node version deprecation (#874)	2024-04-05 16:28:56 -07:00
Will Jones	5f6d13e958	ci: lint and enforce linting (#829) @eddyxu added instructions for linting here: `7af213801a/python/README.md (L45-L50)` However, we had a lot of failures and weren't checking this in CI. This PR fixes all lints and adds a check to CI to keep us in compliance with the lints.	2024-04-05 16:27:31 -07:00
Lei Xu	45b006d68c	chore: remove black as dependency (#808) We use `ruff` in CI and dev workflow now.	2024-04-05 16:25:02 -07:00
Chang She	009297e900	bug(python): fix path handling in windows (#724) Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:45 -07:00
Chang She	374a6f7e78	feat: support nested pydantic schema (#707)	2024-04-05 16:24:30 -07:00
Lei Xu	49306a99ba	chore: apple silicon runner (#633) Close #632	2024-04-05 16:23:49 -07:00
Lei Xu	86efd36689	chore: improve create_table API consistency between local and remote SDK (#627)	2024-04-05 16:23:47 -07:00
Bert	4e9aab9e8b	ci: cancel in progress runs on new push (#620)	2024-04-05 16:22:59 -07:00
Will Jones	166b281d66	increment pylance (#618)	2024-04-05 16:22:59 -07:00
Chang She	f20f19b804	feat: improve pydantic 1.x compat (#503)	2023-09-18 19:01:30 -07:00
Chang She	c21f9cdda0	ci: fix docs build (#496) python/python.md contains typos in the class references --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-09-18 13:07:21 -07:00
Chang She	31dad71c94	multi-modal embedding-function (#484)	2023-09-16 21:23:51 -04:00
gsilvestrin	1daecac648	fix(python): Pin pylance and add pandas as test dependency (#373)	2023-07-27 15:21:45 -07:00
Chang She	e2325c634b	Allow creation of an empty table (#254) It's inconvenient to always require data at table creation time. Here we enable you to create an empty table and add data and set schema later. --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-07-06 20:44:58 -07:00
Rob Meng	d1e8a97a2a	isort entire repo (#200)	2023-06-15 20:12:10 -04:00
Tevin Wang	9b83ce3d2a	add black to python CI (#178) Closes #48	2023-06-12 11:22:34 -07:00
Will Jones	fed33a51d5	wip: make the python API reference a bit nicer (#162) Adds: * Make `mkdocstrings` aware we are using numpy-style docstrings * Fixes broken link on `index.md` to Python API docs (and added link to node ones) * Added examples to various classes. * Added doctest to verify examples work.	2023-06-08 16:07:06 -07:00
Chang She	50cdb16b45	Better handle empty results from tantivy (#155) Closes #154 --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-06-05 18:18:14 -07:00
Chang She	04d97347d7	move tantivy-py installation to be separate from wheel (#97) PyPI does not allow uploading packages that have a direct reference, so for now we'll just ask the user to install tantivy separately --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-05-25 17:57:26 -06:00
Chang She	f485378ea4	Basic full text search capabilities (#62) This is v1 of integrating a full-text search index into LanceDB. # API The query API is roughly the same as before, except that if the input is text instead of a vector, we assume it is an FTS query. ## Example If `table` is a LanceDB LanceTable, then: Build index: `table.create_fts_index("text")` Query: `df = table.search("puppy").limit(10).select(["text"]).to_df()` # Implementation Here we use the tantivy-py package to build the index. We use the row ids as the full-text-search index's doc ids, then do a Take operation to fetch the rows. # Limitations 1. incremental row appends are not supported yet; new data won't show up in search 2. local filesystem only 3. requires building tantivy explicitly --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-05-24 22:25:31 -06:00
Chang She	c45b4dbd27	fix directory	2023-03-22 14:30:27 -07:00
Chang She	4f0785e381	add GHA for python unit tests	2023-03-22 14:28:02 -07:00

37 Commits