lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-26 22:59:57 +00:00

Author	SHA1	Message	Date
Weston Pace	4eb819072a	feat: upgrade to lance 0.9.11 and expose merge_insert (#906) This adds the Python bindings requested in #870. The JavaScript/Rust bindings will be added in a future PR.	2024-04-05 16:29:05 -07:00
Ayush Chaurasia	a41f7be88d	feat(python): Hybrid search & Reranker API (#824)
based on https://github.com/lancedb/lancedb/pull/713

- The Reranker API can be plugged into vector-only or FTS-only search, but this PR doesn't do that (see example: https://txt.cohere.com/rerank/)

### Default reranker -- `LinearCombinationReranker(weight=0.7, fill=1.0)`
```
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
### Available rerankers
LinearCombinationReranker
```
from lancedb.rerankers import LinearCombinationReranker

# Same as default
table.search("hello", query_type="hybrid").rerank(
    normalize="score", reranker=LinearCombinationReranker()
).to_pandas()

# with custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
    normalize="score", reranker=reranker
).to_pandas()
```
Cohere Reranker
```
from lancedb.rerankers import CohereReranker

# default model; English and multilingual are supported. See the docstring for available custom params
table.search("hello", query_type="hybrid").rerank(
    normalize="rank",  # score or rank
    reranker=CohereReranker()
).to_pandas()
```
CrossEncoderReranker
```
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
    normalize="rank", reranker=CrossEncoderReranker()
).to_pandas()
```
## Using custom Reranker
```
from lancedb.rerankers import Reranker

class CustomReranker(Reranker):
    def rerank_hybrid(self, query, vector_results, fts_results):
        # or use custom combination logic
        combined_res = self.merge_results(vector_results, fts_results)
        # Custom rerank logic here
        return combined_res
```
- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (Seeing weird result from cohere reranker in the toy example)
- Support diverse rerankers by default:
  - [x] Cross encoding
  - [x] Cohere
  - [x] Reciprocal Rank Fusion

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>	2024-04-05 16:28:56 -07:00
Lei Xu	97d033dfd6	bug: add a test for fp16 (#837) Add a test that ingests fp16 data into a database	2024-04-05 16:27:42 -07:00
Chang She	ac3d95ec34	feat(python): allow the entire table to be converted to a polars dataframe (#814)	2024-04-05 16:26:36 -07:00
Chang She	17dcb70076	feat(python): basic polars integration (#811) We should now be able to directly ingest polars dataframes and return results as polars dataframes	2024-04-05 16:26:19 -07:00
Chang She	7581cbb38f	chore(python): add docstring for limit behavior (#800) Closes #796	2024-04-05 16:25:01 -07:00
Chang She	f17d16f935	feat(python): add count_rows with filter option (#801) Closes #795	2024-04-05 16:25:01 -07:00
Weston Pace	94e81ff84b	feat: add the ability to create scalar indices (#679) This is a pretty direct binding to the underlying lance capability	2024-04-05 16:24:47 -07:00
Will Jones	48a12e780c	upgrade lance to v0.9.1 (#727) This brings in some important bugfixes related to take and aarch64 Linux. See changes at: https://github.com/lancedb/lance/releases/tag/v0.9.1	2024-04-05 16:24:30 -07:00
Chang She	cc9d74e7a7	feat(python): add option to flatten output in to_pandas (#722) Closes https://github.com/lancedb/lance/issues/1738. We add a `flatten` parameter to the signature of `to_pandas`. By default this is None and does nothing. If set to True or -1, then LanceDB will flatten structs before converting to a pandas dataframe. All nested structs are also flattened. If set to any positive integer, then LanceDB will flatten structs up to the specified level of nesting. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-04-05 16:24:30 -07:00
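The three `flatten` modes described in #722 (None, True/-1, positive integer) can be illustrated on plain nested dicts. This is a minimal sketch, not LanceDB's implementation, which works on Arrow structs before the pandas conversion; the helper name `flatten_row` and the dotted `parent.child` column names (the naming pyarrow's `Table.flatten()` uses) are assumptions for illustration.

```python
def flatten_row(row, flatten=None, _prefix=""):
    """Flatten nested dicts in `row` (hypothetical helper, not LanceDB's API).

    flatten=None: return the row unchanged.
    flatten=True or -1: flatten every level of nesting.
    flatten=n (positive int): flatten at most n levels of nesting.
    """
    if flatten is None or flatten is False:
        return dict(row)
    depth = -1 if flatten is True else flatten
    out = {}
    for key, value in row.items():
        name = f"{_prefix}{key}"
        if isinstance(value, dict) and depth != 0:
            # Go one level deeper; -1 means "unlimited" and stays -1.
            next_depth = depth - 1 if depth > 0 else -1
            out.update(flatten_row(value, next_depth, _prefix=f"{name}."))
        else:
            out[name] = value
    return out


row = {"id": 1, "meta": {"author": {"name": "lance"}, "year": 2023}}
print(flatten_row(row, True))  # every struct level expanded
print(flatten_row(row, 1))     # only the top-level struct expanded
```

With `flatten=1`, `meta` is expanded into `meta.author` and `meta.year`, but `meta.author` itself remains a nested value.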
Chang She	374a6f7e78	feat: support nested pydantic schema (#707)	2024-04-05 16:24:30 -07:00
Will Jones	9356c3b86a	feat(python): add update query support for Python (#654) Closes #69. Will not pass until https://github.com/lancedb/lance/pull/1585 is released.	2024-04-05 16:24:29 -07:00
Rok Mihevc	78ab9068a8	feat(python): expose index cache size (#655) This is to enable https://github.com/lancedb/lancedb/issues/641. Should be merged after https://github.com/lancedb/lance/pull/1587 is released.	2024-04-05 16:23:49 -07:00
Lei Xu	7b5bfadab2	chore: bump lance to 0.8.5 (#561) Bump lance to 0.8.5	2024-04-05 16:22:59 -07:00
Will Jones	c07207c661	feat: cleanup and compaction (#518) #488	2024-04-05 16:22:59 -07:00
Chang She	8469d010f8	feat: add to_list and to_pandas APIs (#556) Add `to_list` to return query results as a list of Python dicts (so we're not too pandas-centric). Closes #555. Add `to_pandas` API and add a deprecation warning on `to_df`. Closes #545. Co-authored-by: Chang She <chang@lancedb.com>	2024-04-05 16:22:59 -07:00
Lei Xu	a26c8f3316	feat: use GPU for index creation. (#540) Bump lance to 0.8.3 to include GPU training --------- Co-authored-by: Rob Meng <rob.xu.meng@gmail.com>	2023-10-05 20:49:00 -07:00
Chang She	c21f9cdda0	ci: fix docs build (#496) python/python.md contains typos in the class references --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-09-18 13:07:21 -07:00
Chang She	31dad71c94	multi-modal embedding-function (#484)	2023-09-16 21:23:51 -04:00
Lei Xu	b315ea3978	[Python] Pydantic vector field with default value (#474) Rename `lance.pydantic.vector` to `Vector` and deprecate `vector(dim)`	2023-09-08 22:35:31 -07:00
Chang She	9a9a73a65d	[python] Use pydantic for embedding function persistence (#467) 1. Support persistent embedding functions so users can search using just a query string. 2. Add fixed size list conversion for multiple vector columns. 3. Add support for empty queries (just apply select/where/limit). 4. Refactor and simplify some of the data prep code. --------- Co-authored-by: Chang She <chang@lancedb.com> Co-authored-by: Weston Pace <weston.pace@gmail.com>	2023-09-05 21:30:45 -07:00
Chang She	0cba0f4f92	[python] Temporary update feature (#457) Combine delete and append to make a temporary update feature that is only enabled for the local Python LanceDB. This is temporary because it first has to load the data matching the where clause into memory, which is technically unbounded. --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-08-30 00:25:26 -07:00
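The delete-plus-append update in #457 boils down to: load the matching rows, delete them, and append modified copies. A minimal sketch over an in-memory list of rows follows; `update_rows`, the Python predicate standing in for the SQL `where` clause, and the list-of-dicts "table" are all hypothetical stand-ins for the Lance dataset.

```python
def update_rows(table, where, values):
    """Emulate UPDATE as delete + append on a list of dict rows (hypothetical).

    1. Load every row matching `where` into memory (the unbounded step).
    2. Delete those rows from the table.
    3. Append copies of them with `values` applied.
    """
    matched = [row for row in table if where(row)]
    remaining = [row for row in table if not where(row)]
    updated = [{**row, **values} for row in matched]
    return remaining + updated


table = [{"id": 1, "price": 10}, {"id": 2, "price": 20}, {"id": 3, "price": 30}]
table = update_rows(table, where=lambda r: r["price"] > 15, values={"price": 0})
print(table)
```

Step 1 is why the commit calls the approach memory-unbounded: `matched` grows with the number of rows the predicate selects. In this sketch the updated rows also move to the end of the table, since they are appended.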
Chang She	e587a17a64	[python] Support schema evolution in local LanceDB (#452) Previously if you needed to add a column to a table you'd have to rewrite the whole table. Instead, we use the merge functionality from Lance format to incrementally add columns from another table or dataframe. --------- Co-authored-by: Chang She <chang@lancedb.com> Co-authored-by: Weston Pace <weston.pace@gmail.com>	2023-08-24 14:40:49 -07:00
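Adding columns "from another table or dataframe" via a merge is, logically, a left join on a key column: existing rows keep their data and gain the new columns. The pure-Python sketch below shows only that row-level semantics. The helper `add_columns`, the `id` key, and filling unmatched rows with `None` are assumptions for illustration; per the commit, the real feature relies on Lance's merge so that the whole table does not have to be rewritten.

```python
def add_columns(rows, new_data, on):
    """Left-join `new_data` onto `rows` by key column `on` (hypothetical)."""
    lookup = {item[on]: item for item in new_data}
    # Every non-key column that appears in the incoming data becomes a new column.
    new_cols = sorted({col for item in new_data for col in item if col != on})
    merged = []
    for row in rows:
        extra = lookup.get(row[on], {})
        # Rows with no match get None for each new column (an assumption).
        merged.append({**row, **{col: extra.get(col) for col in new_cols}})
    return merged


rows = [{"id": 1, "vector": [0.1, 0.2]}, {"id": 2, "vector": [0.3, 0.4]}]
labels = [{"id": 1, "label": "cat"}]
merged = add_columns(rows, labels, on="id")
print(merged)
```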
Chang She	2f1f9f6338	[python] improve restore functionality (#451) Previously the temporary restore feature required copying data. The new feature in pylance does not. --------- Co-authored-by: Chang She <chang@lancedb.com> Co-authored-by: Weston Pace <weston.pace@gmail.com>	2023-08-24 11:00:34 -07:00
Chang She	e3061d4cb4	[python] Temporary restore feature (#428) This adds LanceTable.restore as a temporary feature. It reads data from a previous version and creates a new snapshot version using that data, so, unlike checkout, the restored version is writable. This should be replaced once the feature is implemented in pylance. Co-authored-by: Chang She <chang@lancedb.com>	2023-08-14 20:10:29 -07:00
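The difference between `checkout` and the copy-based `restore` of #428 can be modeled as an append-only list of snapshots. The toy `VersionedTable` class below is hypothetical and is not LanceDB's API. `checkout` only reads an old snapshot. `restore` copies it forward as a new latest version, so later writes build on the restored data while history is preserved.

```python
class VersionedTable:
    """Toy append-only version history (hypothetical, not LanceDB's API)."""

    def __init__(self, rows):
        self.versions = [list(rows)]  # version 1 is stored at index 0

    @property
    def version(self):
        return len(self.versions)

    def add(self, rows):
        # Every write creates a new version on top of the latest one.
        self.versions.append(self.versions[-1] + list(rows))

    def checkout(self, version):
        # Read-only copy of an old snapshot; history is unchanged.
        return list(self.versions[version - 1])

    def restore(self, version):
        # Copy the old snapshot forward as a brand-new latest version.
        self.versions.append(list(self.versions[version - 1]))


t = VersionedTable([1, 2])  # version 1: [1, 2]
t.add([3])                  # version 2: [1, 2, 3]
t.restore(1)                # version 3: [1, 2]
t.add([4])                  # version 4: [1, 2, 4]
print(t.version, t.checkout(4))
```

The explicit copy in `restore` corresponds to the temporary approach. The later pylance-based restore noted in #451 avoids copying the data.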
Chang She	cada35d5b7	Improve pydantic integration (#384)	2023-07-31 12:16:44 -04:00
Lei Xu	f09db4a6d6	[Python] Do not return Table count for every add operation (#328) `Table::count()` becomes linearly slower as more fragments are ingested.	2023-07-18 17:11:17 -07:00
Chang She	e2325c634b	Allow creation of an empty table (#254) It's inconvenient to always require data at table creation time. Here we enable you to create an empty table and add data and set schema later. --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-07-06 20:44:58 -07:00
Chang She	507eeae9c8	Set default to error instead of drop (#259) When encountering bad input data, we default to the principle of least surprise and raise an exception. Co-authored-by: Chang She <chang@lancedb.com>	2023-07-05 22:44:18 -07:00
Chang She	3c46d7f268	Handle NaN input data (#241) Sometimes LangChain would insert a single `[np.nan]` as a placeholder if the embedding function failed. This causes a problem for the Lance format because the array can then no longer be stored as a FixedSizeListArray. Instead: 1. By default we remove rows with embedding lengths less than the maximum length in the batch. 2. If the `strict=True` kwarg is passed, a `ValueError` is raised if the embeddings aren't all the same length. --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-07-04 20:00:46 -07:00
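The two rules from #241 (drop short embeddings by default, raise with `strict=True`) are easy to sketch on plain Python rows. The helper name `filter_embeddings`, the `vector` column name, and the list-of-dicts input are hypothetical. A single `float("nan")` placeholder stands in for the `[np.nan]` case LangChain produced.

```python
def filter_embeddings(rows, column="vector", strict=False):
    """Keep only rows whose embedding has the batch's maximum length (hypothetical).

    With strict=True, raise ValueError instead of silently dropping rows.
    """
    if not rows:
        return []
    max_len = max(len(row[column]) for row in rows)
    short = [row for row in rows if len(row[column]) < max_len]
    if short and strict:
        raise ValueError(
            f"{len(short)} embedding(s) shorter than the batch maximum of {max_len}"
        )
    return [row for row in rows if len(row[column]) == max_len]


rows = [
    {"text": "a", "vector": [0.1, 0.2, 0.3]},
    {"text": "b", "vector": [float("nan")]},  # failed-embedding placeholder
    {"text": "c", "vector": [0.4, 0.5, 0.6]},
]
kept = filter_embeddings(rows)
print([row["text"] for row in kept])  # ['a', 'c']
```

Like the rule it illustrates, this checks only lengths. A full-length vector that happens to contain NaN values would pass through unchanged.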
Lei Xu	4bc676e26a	[Python] Support replace during create_index (#233) Closes #214	2023-06-27 16:02:07 -07:00
Philip Kung	313e66c4c5	Specify an Index Column for Vector Search (#217)	2023-06-26 16:11:08 -07:00
Rob Meng	d1e8a97a2a	isort entire repo (#200)	2023-06-15 20:12:10 -04:00
Rob Meng	cbb56e25ab	port remote connection client into lancedb (#194) * to_df() is now async; added `to_df_blocking` for convenience * add remote lancedb client to public lancedb * make lancedb connection class understand url scheme `lancedb+<connection_type>://<host>:<port>`.	2023-06-15 18:57:52 -04:00
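The `lancedb+<connection_type>://<host>:<port>` scheme from #194 can be split with the standard library alone, because URL schemes may contain `+`. The helper `parse_lancedb_url` and the example host/port below are hypothetical. The sketch only shows how such a URL decomposes and is not the parsing code in the lancedb client.

```python
from urllib.parse import urlparse


def parse_lancedb_url(url):
    """Split `lancedb+<connection_type>://<host>:<port>` (hypothetical helper)."""
    parsed = urlparse(url)
    # e.g. scheme "lancedb+http" -> ("lancedb", "+", "http")
    prefix, sep, connection_type = parsed.scheme.partition("+")
    if prefix != "lancedb" or not sep or not connection_type:
        raise ValueError(f"not a lancedb+<connection_type> URL: {url!r}")
    return connection_type, parsed.hostname, parsed.port


print(parse_lancedb_url("lancedb+http://localhost:8080"))  # ('http', 'localhost', 8080)
```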
Chang She	04d97347d7	move tantivy-py installation to be separate from wheel (#97) PyPI does not allow uploading packages that have a direct reference, so for now we'll just ask the user to install tantivy separately. --------- Co-authored-by: Chang She <chang@lancedb.com>	2023-05-25 17:57:26 -06:00
Chang She	99310e099e	expose methods to work with versioning in tables	2023-04-19 16:48:06 -07:00
Chang She	f0ea1d898b	invalidate cached dataset after create_index and add	2023-04-18 16:51:26 -07:00
gsilvestrin	6865d66d37	renaming test case	2023-04-14 16:32:31 -07:00
gsilvestrin	aeecd809cc	bugfix for LanceTable.add to convert python lists into arrow fixed size lists - Fixed `add` unit test to create the correct expected result - Added a unit test for LanceTable.add - Need to discuss if len(LanceTable) is handled correctly	2023-04-14 14:13:01 -07:00
Chang She	5ef5141812	black	2023-03-22 18:29:07 -07:00
Chang She	690141d357	add unit tests	2023-03-21 22:29:19 -07:00

41 Commits