lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-24 22:09:58 +00:00

Author	SHA1	Message	Date
Will Jones	73b2977bff	chore: upgrade lance to 0.9.16 (#975 )	2024-02-14 14:20:03 -08:00
Lance Release	5b60412d66	[python] Bump version: 0.5.4 → 0.5.5	2024-02-13 23:30:35 +00:00
Ayush Chaurasia	eb31d95fef	feat(python): hybrid search updates, examples, & latency benchmarks (#964 ) - Rename safe_import -> attempt_import_or_raise (closes https://github.com/lancedb/lancedb/pull/923) - Update docs - Add Notebook example (@changhiskhan you can use it for the talk. Comes with "open in colab" button) - Latency benchmark & results comparison, sanity check on real-world data - Updates the default openai model to gpt-4	2024-02-13 17:58:39 +05:30
QianZhu	1b990983b3	Qian/make vector col optional (#950 ) remote SDK tests were completed through lancedb_integtest	2024-02-12 16:35:44 -08:00
Lance Release	82936c77ef	[python] Bump version: 0.5.3 → 0.5.4	2024-02-09 22:56:45 +00:00
Weston Pace	dddcddcaf9	chore: bump lance version to 0.9.15 (#949 )	2024-02-09 14:55:44 -08:00
Weston Pace	a9727eb318	feat: add support for filter during merge insert when matched (#948 ) Closes #940	2024-02-09 10:26:14 -08:00
QianZhu	48d55bf952	added error msg to SaaS APIs (#852 ) 1. improved error msg for SaaS create_table and create_index --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>	2024-02-09 10:07:47 -08:00
Weston Pace	d2e71c8b08	feat: add a filterable count_rows to all the lancedb APIs (#913 ) A `count_rows` method that takes a filter was recently added to `LanceTable`. This PR adds it everywhere else except `RemoteTable` (that will come soon).	2024-02-08 09:40:29 -08:00
Ayush Chaurasia	d982ee934a	feat(python): Reranker DX improvements (#904 ) - Most users might not know how to use a `QueryBuilder` object. Instead, we should just pass the query string. - Add new rerankers: Colbert, openai	2024-02-06 13:59:31 +05:30
Will Jones	57605a2d86	feat(python): add `read_consistency_interval` argument (#828 ) This PR refactors how we handle read consistency: whether the `LanceTable` class picks up modifications to the table made by other instances or processes. Users have three options they can set at the connection level: 1. (Default) `read_consistency_interval=None` means it will not check at all. Users can call `table.checkout_latest()` to manually check for updates. 2. `read_consistency_interval=timedelta(0)` means always check for updates, giving strong read consistency. 3. `read_consistency_interval=timedelta(seconds=20)` means check for updates every 20 seconds. This is eventual consistency, a compromise between the two options above. ## Table reference state There is now an explicit difference between a `LanceTable` that tracks the current version and one that is fixed at a historical version. We now enforce that users cannot write if they have checked out an old version. They are instructed to call `checkout_latest()` before calling the write methods. Since `conn.open_table()` doesn't have a parameter for version, users will only get fixed references if they call `table.checkout()`. The difference between these two can be seen in the repr: Tables that are fixed at a particular version will have a `version` displayed in the repr. Otherwise, the version will not be shown. ```python >>> table LanceTable(connection=..., name="my_table") >>> table.checkout(1) >>> table LanceTable(connection=..., name="my_table", version=1) ``` I decided not to create different classes for these states, because I think we already have enough complexity with the Cloud vs OSS table references. Based on #812	2024-02-05 08:12:19 -08:00
Ayush Chaurasia	738511c5f2	feat(python): add support for new openai embedding functions (#912 ) @PrashantDixit0 --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>	2024-02-04 18:19:42 -08:00
Lance Release	a9088224c5	[python] Bump version: 0.5.2 → 0.5.3	2024-02-03 03:04:04 +00:00
Ayush Chaurasia	688c57a0d8	fix: revert safe_import_pandas usage (#921 )	2024-02-02 18:57:13 -08:00
Lance Release	ce2242e06d	[python] Bump version: 0.5.1 → 0.5.2	2024-02-02 21:33:02 +00:00
Weston Pace	778339388a	chore: bump pylance version to latest in pyproject.toml (#918 )	2024-02-02 13:32:12 -08:00
Weston Pace	7f8637a0b4	feat: add merge_insert to the node and rust APIs (#915 )	2024-02-02 13:16:51 -08:00
QianZhu	09cd08222d	make it explicit about the vector column data type (#916 ) --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-02-02 09:02:02 -08:00
Bert	a248d7feec	fix: add request retry to python client (#917 ) Adds the capability to the remote python SDK to retry requests (fixes #911). This can be configured through environment variables: - `LANCE_CLIENT_MAX_RETRIES` = total number of retries. Set to 0 to disable retries. default = 3 - `LANCE_CLIENT_CONNECT_RETRIES` = number of times to retry a request in case of TCP connect failure. default = 3 - `LANCE_CLIENT_READ_RETRIES` = number of times to retry a request in case of HTTP request failure. default = 3 - `LANCE_CLIENT_RETRY_STATUSES` = HTTP statuses for which the request will be retried, passed as a comma-separated list of ints. default `500, 502, 503` - `LANCE_CLIENT_RETRY_BACKOFF_FACTOR` = controls the time between retry requests. see [here](`23f2287eb5/src/urllib3/util/retry.py (L141-L146)`). default = 0.25 Only read requests will be retried: - list table names - query - describe table - list table indices This does not add retry capabilities for writes, as retrying could cause issues when the retried write isn't idempotent. For example, if the LB times out the request but the server completes the request anyway, we might not want to blindly retry an insert request.	2024-02-02 11:27:29 -05:00
Weston Pace	cc9473a94a	docs: add cleanup_old_versions and compact_files to `Table` for documentation purposes (#900 ) Closes #819	2024-02-01 15:06:00 -08:00
Weston Pace	d77e95a4f4	feat: upgrade to lance 0.9.11 and expose merge_insert (#906 ) This adds the python bindings requested in #870 The javascript/rust bindings will be added in a future PR.	2024-02-01 11:36:29 -08:00
Raghav Dixit	9df6905d86	chore(python): GTE embedding function model name update (#902 ) Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-01-30 23:56:29 +05:30
Ayush Chaurasia	3ffed89793	feat(python): Hybrid search & Reranker API (#824 ) based on https://github.com/lancedb/lancedb/pull/713 - The Reranker api can be plugged into vector-only or FTS-only search, but this PR doesn't do that (see example - https://txt.cohere.com/rerank/) ### Default reranker -- `LinearCombinationReranker(weight=0.7, fill=1.0)` ``` table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas() ``` ### Available rerankers LinearCombinationReranker ``` from lancedb.rerankers import LinearCombinationReranker # Same as default table.search("hello", query_type="hybrid").rerank( normalize="score", reranker=LinearCombinationReranker() ).to_pandas() # with custom params reranker = LinearCombinationReranker(weight=0.3, fill=1.0) table.search("hello", query_type="hybrid").rerank( normalize="score", reranker=reranker ).to_pandas() ``` Cohere Reranker ``` from lancedb.rerankers import CohereReranker # default model. English and multilingual supported. See docstring for available custom params table.search("hello", query_type="hybrid").rerank( normalize="rank", # score or rank reranker=CohereReranker() ).to_pandas() ``` CrossEncoderReranker ``` from lancedb.rerankers import CrossEncoderReranker table.search("hello", query_type="hybrid").rerank( normalize="rank", reranker=CrossEncoderReranker() ).to_pandas() ``` ## Using custom Reranker ``` from lancedb.rerankers import Reranker class CustomReranker(Reranker): def rerank_hybrid(self, vector_results, fts_results): combined_res = self.merge_results(vector_results, fts_results) # or use custom combination logic # Custom rerank logic here return combined_res ``` - [x] Expand testing - [x] Make sure usage makes sense - [x] Run simple benchmarks for correctness (Seeing weird results from cohere reranker in the toy example) - Support diverse rerankers by default: - [x] Cross encoding - [x] Cohere - [x] Reciprocal Rank Fusion --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com> Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>	2024-01-30 19:10:33 +05:30
Raghav Dixit	d1a7257810	feat(python): Embedding fn support for gte-mlx/gte-large (#873 ) Have added tests and an example in the docstring; will be pushing a separate PR to the recipe repo for a RAG example --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>	2024-01-30 11:21:57 +05:30
Ayush Chaurasia	5c5e23bbb9	chore(python): Temporarily extend remote connection timeout (#888 ) Context - https://etoai.slack.com/archives/C05NC5YSW5V/p1706371205883149	2024-01-29 17:34:33 +05:30
Ayush Chaurasia	d84e0d1db8	feat(python): AWS Bedrock embeddings integration (#822 ) Supports the Amazon Titan, Cohere English, and Cohere multilingual base models.	2024-01-28 02:04:15 +05:30
Lei Xu	ac94b2a420	chore: upgrade lance, pylance and datafusion (#879 )	2024-01-27 12:31:38 -08:00
Bert	82cbcf6d07	Bump lance 0.9.9 (#851 )	2024-01-24 08:41:28 -05:00
Lance Release	41f0e32a06	[python] Bump version: 0.5.0 → 0.5.1	2024-01-23 22:01:14 +00:00
QianZhu	b4d451ed21	extend timeout for requests.get and requests.post (#848 )	2024-01-22 20:31:39 -08:00
Bert	66eaa2a00e	allow passing api key as env var (#841 ) Allow passing API key as env var: ```shell export LANCEDB_API_KEY=sh_123... ``` With this set, the `apiKey` argument can be omitted from `connect`: ```js const db = await vectordb.connect({ uri: "db://test-proj-01-ae8343", region: "us-east-1", }) ``` ```py db = lancedb.connect( uri="db://test-proj-01-ae8343", region="us-east-1", ) ```	2024-01-22 16:18:28 -05:00
Lei Xu	83ed8d1e49	bug: add a test for fp16 (#837 ) Add a test that ingests fp16 data into a database	2024-01-20 16:23:28 -08:00
Bert	c89d5e6e6d	fix: remote python client closes idle connections (#831 )	2024-01-19 17:28:36 -05:00
Will Jones	d012db24c2	ci: lint and enforce linting (#829 ) @eddyxu added instructions for linting here: `7af213801a/python/README.md (L45-L50)` However, we had a lot of failures and weren't checking this in CI. This PR fixes all lints and adds a check to CI to keep us in compliance with the lints.	2024-01-19 13:09:14 -08:00
Bert	7af213801a	bump lance to 0.9.7 (#826 )	2024-01-18 20:44:22 -08:00
Prashanth Rao	119b928a52	docs: Updates and refactor (#683 ) This PR makes incremental changes to the documentation. * Closes #697 * Closes #698 ## Chores - [x] Add dark mode - [x] Fix headers in navbar - [x] Add `extra.css` to customize navbar styles - [x] Customize fonts for prose/code blocks, navbar and admonitions - [x] Inspect all admonition boxes (remove redundant dropdowns) and improve clarity and readability - [x] Ensure that all images in the docs have white background (not transparent) to be viewable in dark mode - [x] Improve code formatting in code blocks to make them consistent with autoformatters (eslint/ruff) - [x] Add bolder weight to h1 headers - [x] Add diagram showing the difference between embedded (OSS) and serverless (Cloud) - [x] Fix [Creating an empty table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table) section: right now, the subheaders are not clickable. - [x] In critical data ingestion methods like `table.add` (among others), the type signature often does not match the actual code - [x] Proof-read each documentation section and rewrite as necessary to provide more context, use cases, and explanations so it reads less like reference documentation. This is especially important for CRUD and search sections since those are so central to the user experience. ## Restructure/new content - [x] The section for [Adding data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table) only shows examples for pandas and iterables. We should include pydantic models, arrow tables, etc. - [x] Add conceptual tutorial for IVF-PQ index - [x] Clearly separate vector search, FTS and filtering sections so that these are easier to find - [x] Add docs on refine factor to explain its importance for recall. Closes #716 - [x] Add an FAQ page showing answers to commonly asked questions about LanceDB. Closes #746 - [x] Add simple polars example to the integrations section. 
Closes #756 and closes #153 - [ ] Add basic docs for the Rust API (more detailed API docs can come later). Closes #781 - [x] Add a section on the various storage options on local vs. cloud (S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782 - [x] Revamp filtering docs: add pre-filtering examples and redo headers and update content for SQL filters. Closes #783 and closes #784. - [x] Add docs for data management: compaction, cleaning up old versions and incremental indexing. Closes #785 - [ ] Add a benchmark section that also discusses some best practices. Closes #787 --------- Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2024-01-19 00:18:37 +05:30
Lance Release	8bcdc81fd3	[python] Bump version: 0.4.4 → 0.5.0	2024-01-18 01:53:15 +00:00
Chang She	39e14c70c5	chore(python): turn off lazy frame ingestion (#821 )	2024-01-16 19:11:16 -08:00
Chang She	af8263af94	feat(python): allow the entire table to be converted to a polars dataframe (#814 )	2024-01-15 15:49:16 -08:00
Chang She	be4ab9eef3	feat(python): add exist_ok option to create table (#813 ) This mimics CREATE TABLE IF NOT EXISTS behavior. We add `db.create_table(..., exist_ok=True)` parameter. By default it is set to False, so trying to create a table with the same name will raise an exception. If set to True, then it only opens the table if it already exists. If you pass in a schema, it will be checked against the existing table to make sure you get what you want. If you pass in data, it will NOT be added to the existing table.	2024-01-15 11:09:18 -08:00
Ayush Chaurasia	184d2bc969	chore(python): get rid of Pydantic deprecation warning in embedding fcn (#816 ) ``` UserWarning: Valid config keys have changed in V2: * 'keep_untouched' has been renamed to 'ignored_types' warnings.warn(message, UserWarning) ```	2024-01-15 12:19:51 +05:30
Anton Shevtsov	ff6f005336	Add openai api key not found help (#815 ) This pull request adds a check for the presence of the `OPENAI_API_KEY` environment variable and removes an unused parameter from the `retry_with_exponential_backoff` function.	2024-01-15 02:44:09 +05:30
Chang She	49333e522c	feat(python): basic polars integration (#811 ) We should now be able to directly ingest polars dataframes and return results as polars dataframes.	2024-01-13 16:38:16 -08:00
Ayush Chaurasia	4568df422d	feat(python): Add gemini text embedding function (#806 ) Named it Gemini-text for now. Not sure how complicated it will be to support both text and multimodal embeddings under the same class "gemini". But it's not something to worry about for now, I guess.	2024-01-12 22:38:55 -08:00
Lance Release	0a16e29b93	[python] Bump version: 0.4.3 → 0.4.4	2024-01-11 21:29:00 +00:00
Will Jones	cf7d7a19f5	upgrade lance (#809 )	2024-01-11 13:28:10 -08:00
Lei Xu	fe2fb91a8b	chore: remove black as dependency (#808 ) We use `ruff` in CI and dev workflow now.	2024-01-11 10:58:49 -08:00
Sebastian Law	99adfe065a	use requests instead of aiohttp for underlying http client (#803 ) Instead of starting and stopping the current thread's event loop on every HTTP call, just make a plain HTTP call.	2024-01-10 00:07:50 -05:00
Chang She	277406509e	chore(python): add docstring for limit behavior (#800 ) Closes #796	2024-01-09 20:20:13 -08:00
Chang She	63411b4d8b	feat(python): add phrase query option for fts (#798 ) Addresses #797. Problem: tantivy does not expose an option to explicitly request a phrase query. Proposed solution: 1. Add a `.phrase_query()` option. 2. Under the hood, LanceDB takes care of wrapping the input in quotes and replacing nested double quotes with single quotes. I've also filed an upstream issue; if they support phrase queries natively, then we can get rid of our manual custom processing here.	2024-01-09 19:41:31 -08:00
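The three `read_consistency_interval` settings described in #828 amount to a simple staleness policy. The following is a minimal sketch of that policy under stated assumptions, not LanceDB's implementation: `should_check_for_updates` and its `last_checked` bookkeeping are hypothetical names introduced only for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_check_for_updates(
    read_consistency_interval: Optional[timedelta],
    last_checked: Optional[datetime],
    now: datetime,
) -> bool:
    """Hypothetical helper mirroring the three documented options."""
    if read_consistency_interval is None:
        # Option 1 (default): never check automatically; the user calls
        # table.checkout_latest() to pick up new versions manually.
        return False
    if read_consistency_interval == timedelta(0):
        # Option 2: strong consistency, check before every read.
        return True
    if last_checked is None:
        # Never checked yet, so check now.
        return True
    # Option 3: eventual consistency, check once the interval has elapsed.
    return now - last_checked >= read_consistency_interval
```

Per the commit message, the real setting is chosen once at the connection level; the sketch only shows how the three values trade freshness against the cost of checking.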
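The retry settings listed in #917 are plain environment variables with documented defaults. A minimal sketch of reading them follows; `RetryConfig`, `retry_config_from_env`, and the operation names in `RETRYABLE_OPERATIONS` are hypothetical. The commit links to urllib3's `retry.py`, so the real client presumably hands these values to urllib3's `Retry`, but that wiring is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Hypothetical labels for the read-only requests that the commit says are retried.
RETRYABLE_OPERATIONS = frozenset(
    {"list_table_names", "query", "describe_table", "list_table_indices"}
)


@dataclass(frozen=True)
class RetryConfig:
    # Defaults as documented in the commit message.
    max_retries: int = 3
    connect_retries: int = 3
    read_retries: int = 3
    retry_statuses: Tuple[int, ...] = (500, 502, 503)
    backoff_factor: float = 0.25


def retry_config_from_env(env: Optional[Mapping[str, str]] = None) -> RetryConfig:
    """Build a RetryConfig from LANCE_CLIENT_* variables, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = RetryConfig()
    statuses = env.get("LANCE_CLIENT_RETRY_STATUSES")
    return RetryConfig(
        max_retries=int(env.get("LANCE_CLIENT_MAX_RETRIES", defaults.max_retries)),
        connect_retries=int(env.get("LANCE_CLIENT_CONNECT_RETRIES", defaults.connect_retries)),
        read_retries=int(env.get("LANCE_CLIENT_READ_RETRIES", defaults.read_retries)),
        # Comma-separated list of ints, e.g. "500, 502, 503"; int() tolerates spaces.
        retry_statuses=(
            tuple(int(s) for s in statuses.split(",")) if statuses else defaults.retry_statuses
        ),
        backoff_factor=float(
            env.get("LANCE_CLIENT_RETRY_BACKOFF_FACTOR", defaults.backoff_factor)
        ),
    )
```

Setting `LANCE_CLIENT_MAX_RETRIES=0` disables retries entirely, and writes are never in the retryable set because a retried insert may not be idempotent.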
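#824 lists Reciprocal Rank Fusion among the rerankers supported by default for hybrid search. The standard RRF formula scores each result by summing 1 / (k + rank) over every ranked list it appears in. The sketch below applies it to plain lists of ids; it is a generic illustration, not the `lancedb.rerankers` code, and `k=60` is the conventional constant rather than a value taken from this PR.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse ranked lists of ids; each id scores sum(1 / (k + rank)), rank starting at 1."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["a", "b", "c"]  # e.g. ids from a vector search
fts_hits = ["b", "d", "a"]     # e.g. ids from a full-text search
print(reciprocal_rank_fusion([vector_hits, fts_hits]))
```

Because RRF only uses ranks, it sidesteps the question of how to normalize vector distances against FTS scores, which is what the `normalize="score"` / `normalize="rank"` option above deals with for score-based rerankers.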
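#841 lets the API key come from `LANCEDB_API_KEY` so the argument can be left out of `connect`. A minimal sketch of that fallback follows; `resolve_api_key` is a hypothetical helper, and the rule that an explicitly passed key wins over the environment variable is an assumption the commit does not state.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> str:
    """Hypothetical helper: fall back to LANCEDB_API_KEY when api_key is omitted."""
    env = os.environ if env is None else env
    # Assumption: an explicitly passed key takes precedence over the variable.
    key = api_key or env.get("LANCEDB_API_KEY")
    if not key:
        raise ValueError("pass api_key or set the LANCEDB_API_KEY environment variable")
    return key
```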
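#813 describes `db.create_table(..., exist_ok=True)` as CREATE TABLE IF NOT EXISTS behavior. The toy class below models only the rules stated in that commit: raise by default, open the existing table when `exist_ok=True`, check a supplied schema, and never append the passed data. `ToyConnection`, its tuple-of-column-names schema, and the use of `ValueError` are hypothetical and are not LanceDB's API.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical schema representation: an ordered tuple of column names.
Schema = Tuple[str, ...]


class ToyConnection:
    """In-memory stand-in that models only the documented exist_ok rules."""

    def __init__(self) -> None:
        self._tables: Dict[str, Tuple[Schema, List[dict]]] = {}

    def create_table(
        self,
        name: str,
        data: Iterable[dict] = (),
        schema: Optional[Schema] = None,
        exist_ok: bool = False,
    ) -> List[dict]:
        if name in self._tables:
            if not exist_ok:
                # Default behaviour: creating a duplicate table is an error.
                raise ValueError(f"table {name!r} already exists")
            existing_schema, rows = self._tables[name]
            # A supplied schema is checked against the existing table.
            if schema is not None and schema != existing_schema:
                raise ValueError(f"schema {schema} does not match existing {existing_schema}")
            # `data` is deliberately NOT added to the existing table.
            return rows
        rows = list(data)
        if schema is None:
            # Infer column names from the first row (empty schema if no data).
            schema = tuple(rows[0]) if rows else ()
        self._tables[name] = (schema, rows)
        return rows
```

The design point worth noting is the last rule: with `exist_ok=True`, re-running a script does not silently duplicate rows.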
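#798 says `.phrase_query()` wraps the input in quotes and replaces nested double quotes with single quotes, so that tantivy's query parser treats the whole string as one phrase. That transformation can be sketched as follows; `to_phrase_query` is a hypothetical name, not the function LanceDB uses internally.

```python
def to_phrase_query(text: str) -> str:
    """Hypothetical helper: turn free text into a double-quoted phrase query."""
    # Nested double quotes would end the phrase early, so swap them for single quotes.
    return '"' + text.replace('"', "'") + '"'


print(to_phrase_query('the "quick" brown fox'))  # prints "the 'quick' brown fox"
```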


265 Commits